DiDi Cloud A100 40GB + TensorFlow 1.15.2 + Ubuntu 18.04 Performance Test

Today I got access to an A100 instance in DiDi Cloud's closed beta and ran the TensorFlow benchmarks on it. Here are the results.

Test environment

Platform: DiDi Cloud
OS: Ubuntu 18.04
GPU: A100-SXM4-40GB
Python version: 3.6
TensorFlow version: 1.15.2 (NVIDIA build)


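Test method

Throughput was measured with tf_cnn_benchmarks from the TensorFlow benchmarks repository on a single GPU (--num_gpus=1). Each model was run once at the default FP32 precision and once with --use_fp16: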

https://github.com/tensorflow/benchmarks
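All fourteen runs below use the same script. Only --model, --batch_size and --use_fp16 change between them. The following Python sketch rebuilds that command matrix from the (model, batch size) pairs used in this post and prints it.

The helper names CONFIGS, build_commands and run_all are illustrative only. Before setting dry_run=False, launch it from the directory that contains tf_cnn_benchmarks.py (scripts/tf_cnn_benchmarks in the repository above).

```python
import subprocess

# (model, batch_size) pairs exactly as used in the runs below
CONFIGS = [
    ("resnet50_v1.5", 64),
    ("resnet50", 64),
    ("alexnet", 512),
    ("inception3", 64),
    ("vgg16", 64),
    ("googlenet", 128),
    ("resnet152", 32),
]


def build_commands(script="tf_cnn_benchmarks.py", num_gpus=1):
    """Return one default-precision and one --use_fp16 argument list per config."""
    commands = []
    for model, batch_size in CONFIGS:
        base = ["python", script,
                "--num_gpus={}".format(num_gpus),
                "--batch_size={}".format(batch_size),
                "--model={}".format(model)]
        commands.append(base)                   # default precision (FP32)
        commands.append(base + ["--use_fp16"])  # same run with --use_fp16
    return commands


def run_all(dry_run=True):
    """Print every command; execute them only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # Assumption: the current directory contains tf_cnn_benchmarks.py
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

The dry-run default keeps the script harmless on a machine without the benchmark checkout. The printed lines should match the commands shown under each heading.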

ResNet50 v1.5 BS64

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50_v1.5
Step    Img/sec total_loss
1 images/sec: 602.4 +/- 0.0 (jitter = 0.0) 7.847
10 images/sec: 606.8 +/- 1.2 (jitter = 5.4) 8.053
20 images/sec: 606.3 +/- 0.8 (jitter = 4.4) 8.102
30 images/sec: 605.8 +/- 0.8 (jitter = 3.8) 8.117
40 images/sec: 606.2 +/- 0.7 (jitter = 3.8) 7.893
50 images/sec: 606.1 +/- 0.5 (jitter = 3.0) 7.919
60 images/sec: 606.2 +/- 0.5 (jitter = 2.9) 8.104
70 images/sec: 606.6 +/- 0.5 (jitter = 2.9) 7.985
80 images/sec: 606.6 +/- 0.4 (jitter = 2.8) 7.805
90 images/sec: 606.6 +/- 0.4 (jitter = 2.8) 7.973
100 images/sec: 606.7 +/- 0.4 (jitter = 2.8) 7.644
----------------------------------------------------------------
total images/sec: 606.23
----------------------------------------------------------------
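Each per-step line in these logs has the form "step images/sec: mean +/- uncertainty (jitter = j) total_loss". Each run ends with one "total images/sec:" summary line. A small regex parser is enough to tabulate runs like the one above.

The sketch below assumes the lines look exactly as printed here, with fields separated by spaces or tabs. The function and field names are hypothetical.

```python
import re

# Matches per-step lines such as "10 images/sec: 606.8 +/- 1.2 (jitter = 5.4) 8.053"
STEP_RE = re.compile(
    r"^(?P<step>\d+)\s+images/sec:\s+(?P<ips>[\d.]+)\s+\+/-\s+(?P<err>[\d.]+)"
    r"\s+\(jitter = (?P<jitter>[\d.]+)\)\s+(?P<loss>\S+)$"
)
# Matches the summary line, e.g. "total images/sec: 606.23"
TOTAL_RE = re.compile(r"^total images/sec:\s+(?P<total>[\d.]+)$")


def parse_log(text):
    """Return (list of per-step dicts, total images/sec or None)."""
    steps, total = [], None
    for raw in text.splitlines():
        line = raw.strip()
        m = STEP_RE.match(line)
        if m:
            steps.append({
                "step": int(m.group("step")),
                "ips": float(m.group("ips")),
                "err": float(m.group("err")),
                "jitter": float(m.group("jitter")),
                # float("nan") also handles the nan losses in the AlexNet FP32 run
                "loss": float(m.group("loss")),
            })
            continue
        t = TOTAL_RE.match(line)
        if t:
            total = float(t.group("total"))
    return steps, total
```

Header lines ("Step    Img/sec total_loss") and the dashed separators are simply skipped, because neither pattern matches them.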


--use_fp16

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50_v1.5 --use_fp16
Step    Img/sec total_loss
1 images/sec: 1327.1 +/- 0.0 (jitter = 0.0) 7.972
10 images/sec: 1321.2 +/- 5.7 (jitter = 27.6) 7.885
20 images/sec: 1323.5 +/- 4.4 (jitter = 25.9) 8.073
30 images/sec: 1323.6 +/- 3.7 (jitter = 27.3) 7.934
40 images/sec: 1322.1 +/- 3.3 (jitter = 32.9) 8.102
50 images/sec: 1321.4 +/- 3.0 (jitter = 27.7) 7.876
60 images/sec: 1322.2 +/- 2.8 (jitter = 32.3) 7.883
70 images/sec: 1322.3 +/- 2.5 (jitter = 32.6) 7.962
80 images/sec: 1324.0 +/- 2.4 (jitter = 32.2) 8.049
90 images/sec: 1324.2 +/- 2.2 (jitter = 31.2) 7.909
100 images/sec: 1325.1 +/- 2.1 (jitter = 29.6) 7.874
----------------------------------------------------------------
total images/sec: 1322.76
----------------------------------------------------------------
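The "+/-" and "jitter" columns are statistics over all timed steps so far. That is why step 1 always shows 0.0 for both.

As far as I understand the tf_cnn_benchmarks source (not checked against the exact revision used here), the columns are computed as follows:

- The mean is the batch size divided by the mean step time.
- The "+/-" value is the standard deviation of per-step throughput divided by the square root of the number of steps.
- Jitter is 1.4826 times the median absolute deviation of per-step throughput.

Below is a dependency-free sketch of that computation. The function name perf_timing is hypothetical.

```python
import math
import statistics


def perf_timing(batch_size, step_times):
    """Summarize per-step wall times (seconds) as (mean img/s, +/- value, jitter).

    Sketch of how tf_cnn_benchmarks appears to derive its log columns;
    written from a reading of the project, not copied from it.
    """
    speeds = [batch_size / t for t in step_times]
    # Mean throughput uses the mean step time, not the mean of per-step speeds
    speed_mean = batch_size / statistics.mean(step_times)
    # Standard error of per-step throughput (population std / sqrt(n))
    speed_uncertainty = statistics.pstdev(speeds) / math.sqrt(len(speeds))
    # Robust spread: 1.4826 * median absolute deviation from the median speed
    median_speed = statistics.median(speeds)
    speed_jitter = 1.4826 * statistics.median(abs(s - median_speed) for s in speeds)
    return speed_mean, speed_uncertainty, speed_jitter
```

With a single step time both spread measures are zero, which matches the step-1 lines in every log here.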

ResNet50 BS64

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50
Step    Img/sec total_loss
1 images/sec: 653.5 +/- 0.0 (jitter = 0.0) 8.219
10 images/sec: 646.2 +/- 2.0 (jitter = 6.0) 7.879
20 images/sec: 646.1 +/- 1.4 (jitter = 7.2) 7.909
30 images/sec: 646.0 +/- 1.2 (jitter = 6.0) 7.820
40 images/sec: 646.2 +/- 1.0 (jitter = 6.3) 8.006
50 images/sec: 646.0 +/- 1.0 (jitter = 8.6) 7.769
60 images/sec: 646.0 +/- 0.9 (jitter = 8.6) 8.114
70 images/sec: 645.7 +/- 0.9 (jitter = 9.5) 7.811
80 images/sec: 645.8 +/- 0.8 (jitter = 9.5) 7.979
90 images/sec: 645.8 +/- 0.8 (jitter = 8.0) 8.095
100 images/sec: 645.8 +/- 0.7 (jitter = 6.4) 8.038
----------------------------------------------------------------
total images/sec: 645.26
----------------------------------------------------------------

--use_fp16

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50 --use_fp16
Step    Img/sec total_loss
1 images/sec: 1300.1 +/- 0.0 (jitter = 0.0) 8.101
10 images/sec: 1310.1 +/- 7.5 (jitter = 7.4) 7.758
20 images/sec: 1309.7 +/- 8.0 (jitter = 42.3) 7.912
30 images/sec: 1315.0 +/- 5.9 (jitter = 32.1) 7.776
40 images/sec: 1315.5 +/- 4.7 (jitter = 28.2) 7.918
50 images/sec: 1317.5 +/- 3.9 (jitter = 27.7) 7.895
60 images/sec: 1316.5 +/- 3.4 (jitter = 18.6) 7.711
70 images/sec: 1317.3 +/- 3.1 (jitter = 16.1) 8.008
80 images/sec: 1316.9 +/- 2.8 (jitter = 11.4) 7.777
90 images/sec: 1317.7 +/- 2.6 (jitter = 11.8) 7.808
100 images/sec: 1317.1 +/- 2.4 (jitter = 9.9) 8.036
----------------------------------------------------------------
total images/sec: 1315.11
----------------------------------------------------------------

AlexNet BS512

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=512 --model=alexnet
Step    Img/sec total_loss
1 images/sec: 8294.2 +/- 0.0 (jitter = 0.0) nan
10 images/sec: 8290.2 +/- 1.6 (jitter = 5.3) nan
20 images/sec: 8290.6 +/- 1.0 (jitter = 3.7) nan
30 images/sec: 8290.8 +/- 0.7 (jitter = 2.8) nan
40 images/sec: 8291.3 +/- 0.6 (jitter = 2.7) nan
50 images/sec: 8289.8 +/- 1.4 (jitter = 2.9) nan
60 images/sec: 8290.2 +/- 1.2 (jitter = 2.9) nan
70 images/sec: 8290.4 +/- 1.3 (jitter = 3.6) nan
80 images/sec: 8291.1 +/- 1.1 (jitter = 3.5) nan
90 images/sec: 8291.9 +/- 1.0 (jitter = 4.4) nan
100 images/sec: 8291.9 +/- 1.1 (jitter = 5.2) nan
----------------------------------------------------------------
total images/sec: 8282.46
----------------------------------------------------------------

--use_fp16

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=512 --model=alexnet --use_fp16
Step    Img/sec total_loss
1 images/sec: 10618.6 +/- 0.0 (jitter = 0.0) 7.250
10 images/sec: 10607.7 +/- 4.4 (jitter = 16.3) 7.251
20 images/sec: 10602.5 +/- 3.0 (jitter = 13.1) 7.251
30 images/sec: 10604.1 +/- 2.3 (jitter = 11.2) 7.251
40 images/sec: 10601.0 +/- 2.5 (jitter = 13.4) 7.251
50 images/sec: 10601.7 +/- 2.5 (jitter = 13.8) 7.251
60 images/sec: 10603.0 +/- 2.2 (jitter = 14.0) 7.250
70 images/sec: 10605.1 +/- 2.1 (jitter = 12.5) 7.251
80 images/sec: 10605.4 +/- 1.9 (jitter = 12.2) 7.251
90 images/sec: 10605.4 +/- 1.7 (jitter = 12.1) 7.251
100 images/sec: 10605.8 +/- 1.7 (jitter = 12.3) 7.251
----------------------------------------------------------------
total images/sec: 10587.67
----------------------------------------------------------------

Inception v3 BS64

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=inception3
Step    Img/sec total_loss
1 images/sec: 436.8 +/- 0.0 (jitter = 0.0) 7.276
10 images/sec: 437.9 +/- 1.2 (jitter = 0.8) 7.337
20 images/sec: 437.8 +/- 1.0 (jitter = 2.2) 7.269
30 images/sec: 437.9 +/- 0.8 (jitter = 2.2) 7.422
40 images/sec: 437.9 +/- 0.6 (jitter = 3.5) 7.299
50 images/sec: 438.6 +/- 0.6 (jitter = 4.1) 7.277
60 images/sec: 439.2 +/- 0.5 (jitter = 3.7) 7.363
70 images/sec: 439.5 +/- 0.5 (jitter = 4.8) 7.347
80 images/sec: 440.3 +/- 0.5 (jitter = 5.3) 7.410
90 images/sec: 440.3 +/- 0.5 (jitter = 5.2) 7.325
100 images/sec: 440.3 +/- 0.4 (jitter = 5.0) 7.346
----------------------------------------------------------------
total images/sec: 440.01
----------------------------------------------------------------

--use_fp16

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=inception3 --use_fp16
Step    Img/sec total_loss
1 images/sec: 901.5 +/- 0.0 (jitter = 0.0) 7.305
10 images/sec: 945.5 +/- 7.0 (jitter = 5.0) 7.354
20 images/sec: 945.6 +/- 4.9 (jitter = 7.1) 7.330
30 images/sec: 945.3 +/- 3.9 (jitter = 6.9) 7.382
40 images/sec: 946.3 +/- 3.2 (jitter = 7.3) 7.278
50 images/sec: 946.6 +/- 2.8 (jitter = 7.5) 7.373
60 images/sec: 946.3 +/- 2.5 (jitter = 7.6) 7.299
70 images/sec: 946.8 +/- 2.3 (jitter = 7.5) 7.323
80 images/sec: 946.5 +/- 2.1 (jitter = 7.6) 7.317
90 images/sec: 946.6 +/- 2.0 (jitter = 7.6) 7.357
100 images/sec: 947.2 +/- 1.8 (jitter = 7.3) 7.327
----------------------------------------------------------------
total images/sec: 946.03
----------------------------------------------------------------

VGG16 BS64

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=vgg16
Step    Img/sec total_loss
1 images/sec: 442.1 +/- 0.0 (jitter = 0.0) 7.321
10 images/sec: 442.4 +/- 0.1 (jitter = 0.4) 7.315
20 images/sec: 442.4 +/- 0.1 (jitter = 0.3) 7.269
30 images/sec: 442.4 +/- 0.0 (jitter = 0.2) 7.271
40 images/sec: 442.4 +/- 0.0 (jitter = 0.2) 7.282
50 images/sec: 442.4 +/- 0.0 (jitter = 0.2) 7.291
60 images/sec: 442.4 +/- 0.0 (jitter = 0.2) 7.250
70 images/sec: 442.4 +/- 0.1 (jitter = 0.2) 7.278
80 images/sec: 442.4 +/- 0.0 (jitter = 0.2) 7.274
90 images/sec: 442.4 +/- 0.0 (jitter = 0.2) 7.286
100 images/sec: 442.4 +/- 0.0 (jitter = 0.2) 7.283
----------------------------------------------------------------
total images/sec: 442.20
----------------------------------------------------------------

--use_fp16

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=vgg16 --use_fp16
Step    Img/sec total_loss
1 images/sec: 687.4 +/- 0.0 (jitter = 0.0) 7.279
10 images/sec: 688.2 +/- 0.2 (jitter = 0.5) 7.255
20 images/sec: 688.0 +/- 0.1 (jitter = 0.5) 7.283
30 images/sec: 688.0 +/- 0.1 (jitter = 0.7) 7.254
40 images/sec: 687.9 +/- 0.1 (jitter = 0.7) 7.283
50 images/sec: 687.8 +/- 0.1 (jitter = 0.7) 7.249
60 images/sec: 687.7 +/- 0.1 (jitter = 0.8) 7.294
70 images/sec: 687.6 +/- 0.1 (jitter = 0.9) 7.278
80 images/sec: 687.6 +/- 0.1 (jitter = 0.9) 7.268
90 images/sec: 687.7 +/- 0.1 (jitter = 0.9) 7.264
100 images/sec: 687.6 +/- 0.1 (jitter = 0.9) 7.268
----------------------------------------------------------------
total images/sec: 687.07
----------------------------------------------------------------


GoogLeNet BS128

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=128 --model=googlenet
Step    Img/sec total_loss
1 images/sec: 1577.4 +/- 0.0 (jitter = 0.0) 7.104
10 images/sec: 1565.9 +/- 4.1 (jitter = 12.5) 7.105
20 images/sec: 1561.7 +/- 3.1 (jitter = 20.4) 7.094
30 images/sec: 1562.3 +/- 2.5 (jitter = 15.1) 7.087
40 images/sec: 1561.5 +/- 2.2 (jitter = 16.1) 7.067
50 images/sec: 1561.6 +/- 2.0 (jitter = 15.6) 7.091
60 images/sec: 1561.5 +/- 1.8 (jitter = 15.7) 7.049
70 images/sec: 1560.3 +/- 1.9 (jitter = 15.3) 7.074
80 images/sec: 1558.8 +/- 1.9 (jitter = 17.2) 7.077
90 images/sec: 1558.2 +/- 1.8 (jitter = 17.2) 7.079
100 images/sec: 1557.5 +/- 1.8 (jitter = 17.6) 7.066
----------------------------------------------------------------
total images/sec: 1556.06
----------------------------------------------------------------

--use_fp16

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=128 --model=googlenet --use_fp16
Step    Img/sec total_loss
1 images/sec: 2690.1 +/- 0.0 (jitter = 0.0) 7.173
10 images/sec: 2675.3 +/- 13.9 (jitter = 35.5) 7.068
20 images/sec: 2682.4 +/- 9.9 (jitter = 55.4) 7.086
30 images/sec: 2686.6 +/- 8.3 (jitter = 36.6) 7.075
40 images/sec: 2687.8 +/- 6.9 (jitter = 30.6) 7.084
50 images/sec: 2686.7 +/- 6.0 (jitter = 36.4) 7.076
60 images/sec: 2687.5 +/- 5.4 (jitter = 36.4) 7.075
70 images/sec: 2681.0 +/- 6.8 (jitter = 41.6) 7.075
80 images/sec: 2683.2 +/- 6.1 (jitter = 34.0) 7.065
90 images/sec: 2684.1 +/- 5.6 (jitter = 35.6) 7.092
100 images/sec: 2683.9 +/- 5.2 (jitter = 36.1) 7.052
----------------------------------------------------------------
total images/sec: 2680.27
----------------------------------------------------------------

ResNet152 BS32

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=resnet152
Step    Img/sec total_loss
1 images/sec: 225.6 +/- 0.0 (jitter = 0.0) 9.060
10 images/sec: 228.3 +/- 1.0 (jitter = 2.0) 8.594
20 images/sec: 228.3 +/- 0.6 (jitter = 2.0) 8.635
30 images/sec: 228.2 +/- 0.5 (jitter = 2.5) 8.719
40 images/sec: 227.9 +/- 0.5 (jitter = 2.8) 8.599
50 images/sec: 228.1 +/- 0.5 (jitter = 2.9) 8.791
60 images/sec: 228.3 +/- 0.4 (jitter = 3.6) 8.668
70 images/sec: 228.3 +/- 0.4 (jitter = 3.3) 9.072
80 images/sec: 228.3 +/- 0.4 (jitter = 3.5) 8.874
90 images/sec: 228.4 +/- 0.3 (jitter = 3.7) 9.030
100 images/sec: 228.4 +/- 0.3 (jitter = 3.7) 8.839
----------------------------------------------------------------
total images/sec: 228.29
----------------------------------------------------------------

--use_fp16

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=resnet152 --use_fp16
Step    Img/sec total_loss
1 images/sec: 392.9 +/- 0.0 (jitter = 0.0) 9.147
10 images/sec: 397.9 +/- 2.8 (jitter = 6.0) 9.000
20 images/sec: 399.0 +/- 2.1 (jitter = 8.6) 8.842
30 images/sec: 393.7 +/- 2.9 (jitter = 14.7) 8.813
40 images/sec: 394.4 +/- 2.3 (jitter = 15.2) 8.984
50 images/sec: 394.9 +/- 2.0 (jitter = 13.9) 8.647
60 images/sec: 395.7 +/- 1.8 (jitter = 13.9) 8.838
70 images/sec: 396.5 +/- 1.6 (jitter = 15.3) 8.941
80 images/sec: 395.9 +/- 1.4 (jitter = 13.4) 8.913
90 images/sec: 396.2 +/- 1.3 (jitter = 14.1) 8.807
100 images/sec: 395.7 +/- 1.3 (jitter = 14.5) 8.729
----------------------------------------------------------------
total images/sec: 395.34
----------------------------------------------------------------

Performance comparison

For an A100 vs. V100 vs. RTX 2080 Ti performance comparison, see:

https://www.tonyisstark.com/383.html
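To put the A100 numbers from this post side by side, the sketch below copies each run's "total images/sec" value. From those it derives two figures per model:

- the FP16/FP32 speedup
- the implied time per training step (batch size divided by throughput)

Nothing here is a new measurement. The RESULTS layout and the function names are just one hypothetical way to organize it.

```python
# model: (batch_size, FP32 total images/sec, FP16 total images/sec), from the runs above
RESULTS = {
    "resnet50_v1.5": (64, 606.23, 1322.76),
    "resnet50": (64, 645.26, 1315.11),
    "alexnet": (512, 8282.46, 10587.67),
    "inception3": (64, 440.01, 946.03),
    "vgg16": (64, 442.20, 687.07),
    "googlenet": (128, 1556.06, 2680.27),
    "resnet152": (32, 228.29, 395.34),
}


def fp16_speedup(model):
    """Ratio of --use_fp16 throughput to default (FP32) throughput."""
    _, fp32, fp16 = RESULTS[model]
    return fp16 / fp32


def step_time_ms(model, use_fp16=False):
    """Implied milliseconds per training step: 1000 * batch_size / images_per_sec."""
    batch_size, fp32, fp16 = RESULTS[model]
    return 1000.0 * batch_size / (fp16 if use_fp16 else fp32)


if __name__ == "__main__":
    print("{:<14}{:>8}{:>10}{:>10}{:>9}".format(
        "model", "batch", "fp32 ms", "fp16 ms", "speedup"))
    for name, (batch_size, _, _) in RESULTS.items():
        print("{:<14}{:>8}{:>10.1f}{:>10.1f}{:>8.2f}x".format(
            name, batch_size, step_time_ms(name),
            step_time_ms(name, use_fp16=True), fp16_speedup(name)))
```

On these numbers the ResNet50 and Inception v3 runs gain a little over 2x from --use_fp16. AlexNet gains the least, at about 1.28x.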