Error: recognizing the MNIST dataset with TensorFlow raises ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[10000,32,28,28] and type float on

Error: Recently, while trying to run the code from one of my earlier blog posts, I ran into the following error:

2020-04-03 10:53:22.982491: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 957.03MiB.  Current allocation summary follows.
2020-04-03 10:53:22.982951: I tensorflow/core/common_runtime/bfc_allocator.cc:610] Bin (256):   Total Chunks: 32, Chunks in use: 32. 8.0KiB allocated for chunks. 8.0KiB in use in bin. 2.1KiB client-requested in use in bin.
2020-04-03 10:53:23.028460: I tensorflow/core/common_runtime/bfc_allocator.cc:610] Bin (512):   Total Chunks: 1, Chunks in use: 0. 768B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-04-03 10:53:23.029622: I tensorflow/core/common_runtime/bfc_allocator.cc:610] Bin (1024):  Total Chunks: 1, Chunks in use: 1. 1.3KiB allocated for chunks. 1.3KiB in use in bin. 1.0KiB client-requested in use in bin.
2020-04-03 10:53:23.030901: I tensorflow/core/common_runtime/bfc_allocator.cc:610] Bin (2048):  Total Chunks: 5, Chunks in use: 5. 16.3KiB allocated for chunks. 16.3KiB in use in bin. 15.6KiB client-requested in use in bin.
2020-04-03 10:53:23.032178: I tensorflow/core/common_runtime/bfc_allocator.cc:610] Bin (4096):  Total Chunks: 5, Chunks in use: 5. 20.0KiB allocated for chunks. 20.0KiB in use in bin. 20.0KiB client-requested in use in bin.
2020-04-03 10:53:23.034338: I tensorflow/core/common_runtime/bfc_allocator.cc:610] Bin (8192):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-04-03 10:53:23.035662: I tensorflow/core/common_runtime/bfc_allocator.cc:610] Bin (16384):         Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-04-03 10:53:23.036993: I tensorflow/core/common_runtime/bfc_allocator.cc:610] Bin (32768):         Total Chunks: 5, Chunks in use: 4. 196.8KiB allocated for chunks. 160.0KiB in use in bin. 160.0KiB client-requested in use in bin.
2020-04-03 10:53:23.038338: I tensorflow/core/common_runtime/bfc_allocator.cc:610] Bin (65536):         Total Chunks: 2, Chunks in use: 1. 160.0KiB allocated for chunks. 78.3KiB in use in bin. 78.1KiB client-requested in use in bin.
2020-04-03 10:53:23.038780: I tensorflow/core/common_runtime/bfc_allocator.cc:610] Bin (131072):        Total Chunks: 3, Chunks in use: 3. 600.0KiB allocated for chunks. 600.0KiB in use in bin. 600.0KiB client-requested in use in bin.
2020-04-03 10:53:23.039219: I tensorflow/core/common_runtime/bfc_allocator.cc:610] Bin (262144):        Total Chunks: 1, Chunks in use: 1. 312.3KiB allocated for chunks. 312.3KiB in use in bin. 200.0KiB client-requested in use in bin.
2020-04-03 10:53:23.039651: I tensorflow/core/common_runtime/bfc_allocator.cc:610] Bin (524288):        Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-04-03 10:53:23.040041: I tensorflow/core/common_runtime/bfc_allocator.cc:610] Bin (1048576):       Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-04-03 10:53:23.040437: I tensorflow/core/common_runtime/bfc_allocator.cc:610] Bin (2097152):       Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-04-03 10:53:23.040827: I tensorflow/core/common_runtime/bfc_allocator.cc:610] Bin (4194304):       Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-04-03 10:53:23.041222: I tensorflow/core/common_runtime/bfc_allocator.cc:610] Bin (8388608):       Total Chunks: 2, Chunks in use: 2. 28.05MiB allocated for chunks. 28.05MiB in use in bin. 24.50MiB client-requested in use in bin.
2020-04-03 10:53:23.041652: I tensorflow/core/common_runtime/bfc_allocator.cc:610] Bin (16777216):      Total Chunks: 3, Chunks in use: 2. 65.57MiB allocated for chunks. 49.57MiB in use in bin. 42.16MiB client-requested in use in bin.
2020-04-03 10:53:23.042091: I tensorflow/core/common_runtime/bfc_allocator.cc:610] Bin (33554432):      Total Chunks: 1, Chunks in use: 0. 34.09MiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-04-03 10:53:23.042648: I tensorflow/core/common_runtime/bfc_allocator.cc:610] Bin (67108864):      Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-04-03 10:53:23.043182: I tensorflow/core/common_runtime/bfc_allocator.cc:610] Bin (134217728):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-04-03 10:53:23.043540: I tensorflow/core/common_runtime/bfc_allocator.cc:610] Bin (268435456):     Total Chunks: 1, Chunks in use: 1. 1.00GiB allocated for chunks. 1.00GiB in use in bin. 957.03MiB client-requested in use in bin.
2020-04-03 10:53:23.043879: I tensorflow/core/common_runtime/bfc_allocator.cc:626] Bin for 957.03MiB was 256.00MiB, Chunk State: 
2020-04-03 10:53:23.058396: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 0000000501980000 of size 1280
2020-04-03 10:53:23.058594: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 0000000501980500 of size 256
2020-04-03 10:53:23.058768: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 0000000501980600 of size 256
2020-04-03 10:53:23.058940: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 0000000501980700 of size 256
2020-04-03 10:53:23.059116: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 0000000501980800 of size 256
2020-04-03 10:53:23.059290: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 0000000501980900 of size 4096
2020-04-03 10:53:23.059467: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 0000000501981900 of size 256
2020-04-03 10:53:23.059641: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 0000000501981A00 of size 256
2020-04-03 10:53:23.059819: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 0000000501981B00 of size 256
2020-04-03 10:53:23.060000: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 0000000501981C00 of size 3328
2020-04-03 10:53:23.060177: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 0000000501982900 of size 256
2020-04-03 10:53:23.060351: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 0000000501982A00 of size 256
2020-04-03 10:53:23.060529: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 0000000501982B00 of size 256
2020-04-03 10:53:23.060702: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 0000000501982C00 of size 256
2020-04-03 10:53:23.060878: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 0000000501982D00 of size 204800
2020-04-03 10:53:23.061060: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019B4D00 of size 4096
2020-04-03 10:53:23.061243: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019B5D00 of size 40960
2020-04-03 10:53:23.061422: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019BFD00 of size 256
2020-04-03 10:53:23.061606: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019BFE00 of size 256
2020-04-03 10:53:23.061783: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019BFF00 of size 256
2020-04-03 10:53:23.061960: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019C0000 of size 256
2020-04-03 10:53:23.062136: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019C0100 of size 256
2020-04-03 10:53:23.063085: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019C0200 of size 256
2020-04-03 10:53:23.063489: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019C0300 of size 256
2020-04-03 10:53:23.063677: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019C0400 of size 256
2020-04-03 10:53:23.063872: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019C0500 of size 256
2020-04-03 10:53:23.064045: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019C0600 of size 256
2020-04-03 10:53:23.064220: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Free  at 00000005019C0700 of size 768
2020-04-03 10:53:23.064391: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019C0A00 of size 256
2020-04-03 10:53:23.064562: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019C0B00 of size 40960
2020-04-03 10:53:23.064736: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019CAB00 of size 80128
2020-04-03 10:53:23.064910: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Free  at 00000005019DE400 of size 83712
2020-04-03 10:53:23.065086: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019F2B00 of size 256
2020-04-03 10:53:23.065260: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019F2C00 of size 4096
2020-04-03 10:53:23.065436: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019F3C00 of size 3328
2020-04-03 10:53:23.065611: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Free  at 00000005019F4900 of size 37632
2020-04-03 10:53:23.065788: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019FDC00 of size 256
2020-04-03 10:53:23.065960: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019FDD00 of size 256
2020-04-03 10:53:23.066133: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019FDE00 of size 256
2020-04-03 10:53:23.066305: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019FDF00 of size 3328
2020-04-03 10:53:23.066480: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019FEC00 of size 3328
2020-04-03 10:53:23.066656: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019FF900 of size 256
2020-04-03 10:53:23.066828: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019FFA00 of size 256
2020-04-03 10:53:23.067000: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019FFB00 of size 256
2020-04-03 10:53:23.067175: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019FFC00 of size 256
2020-04-03 10:53:23.067350: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019FFD00 of size 256
2020-04-03 10:53:23.067528: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019FFE00 of size 256
2020-04-03 10:53:23.067703: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005019FFF00 of size 204800
2020-04-03 10:53:23.067883: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 0000000501A31F00 of size 319744
2020-04-03 10:53:23.068063: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Free  at 0000000501A80000 of size 16777216
2020-04-03 10:53:23.068247: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 0000000502A80000 of size 3328
2020-04-03 10:53:23.068426: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 0000000502A80D00 of size 204800
2020-04-03 10:53:23.068605: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 0000000502AB2D00 of size 16569088
2020-04-03 10:53:23.068788: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 0000000503A80000 of size 4096
2020-04-03 10:53:23.068965: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 0000000503A81000 of size 4096
2020-04-03 10:53:23.069142: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 0000000503A82000 of size 40960
2020-04-03 10:53:23.069323: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 0000000503A8C000 of size 40960
2020-04-03 10:53:23.073979: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 0000000503A96000 of size 12845056
2020-04-03 10:53:23.074323: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 00000005046D6000 of size 20619264
2020-04-03 10:53:23.074539: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 0000000505F80000 of size 31360000
2020-04-03 10:53:23.074746: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Free  at 0000000507D68400 of size 35748864
2020-04-03 10:53:23.074955: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Chunk at 0000000509F80000 of size 1073741824
2020-04-03 10:53:23.075162: I tensorflow/core/common_runtime/bfc_allocator.cc:651]      Summary of in-use Chunks by size: 
2020-04-03 10:53:23.075366: I tensorflow/core/common_runtime/bfc_allocator.cc:654] 32 Chunks of size 256 totalling 8.0KiB
2020-04-03 10:53:23.075566: I tensorflow/core/common_runtime/bfc_allocator.cc:654] 1 Chunks of size 1280 totalling 1.3KiB
2020-04-03 10:53:23.075768: I tensorflow/core/common_runtime/bfc_allocator.cc:654] 5 Chunks of size 3328 totalling 16.3KiB
2020-04-03 10:53:23.075970: I tensorflow/core/common_runtime/bfc_allocator.cc:654] 5 Chunks of size 4096 totalling 20.0KiB
2020-04-03 10:53:23.076171: I tensorflow/core/common_runtime/bfc_allocator.cc:654] 4 Chunks of size 40960 totalling 160.0KiB
2020-04-03 10:53:23.076374: I tensorflow/core/common_runtime/bfc_allocator.cc:654] 1 Chunks of size 80128 totalling 78.3KiB
2020-04-03 10:53:23.076581: I tensorflow/core/common_runtime/bfc_allocator.cc:654] 3 Chunks of size 204800 totalling 600.0KiB
2020-04-03 10:53:23.076787: I tensorflow/core/common_runtime/bfc_allocator.cc:654] 1 Chunks of size 319744 totalling 312.3KiB
2020-04-03 10:53:23.076993: I tensorflow/core/common_runtime/bfc_allocator.cc:654] 1 Chunks of size 12845056 totalling 12.25MiB
2020-04-03 10:53:23.077201: I tensorflow/core/common_runtime/bfc_allocator.cc:654] 1 Chunks of size 16569088 totalling 15.80MiB
2020-04-03 10:53:23.077417: I tensorflow/core/common_runtime/bfc_allocator.cc:654] 1 Chunks of size 20619264 totalling 19.66MiB
2020-04-03 10:53:23.077628: I tensorflow/core/common_runtime/bfc_allocator.cc:654] 1 Chunks of size 31360000 totalling 29.91MiB
2020-04-03 10:53:23.077839: I tensorflow/core/common_runtime/bfc_allocator.cc:654] 1 Chunks of size 1073741824 totalling 1.00GiB
2020-04-03 10:53:23.078051: I tensorflow/core/common_runtime/bfc_allocator.cc:658] Sum Total of in-use chunks: 1.08GiB
2020-04-03 10:53:23.078259: I tensorflow/core/common_runtime/bfc_allocator.cc:660] Stats: 
Limit:                  1468615884
InUse:                  1156359936
MaxInUse:               1156760064
NumAllocs:                   77310
MaxAllocSize:           1073741824

2020-04-03 10:53:23.078695: W tensorflow/core/common_runtime/bfc_allocator.cc:275] *********__************************************************************************************xxxxx
2020-04-03 10:53:23.079560: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at conv_ops.cc:693 : Resource exhausted: OOM when allocating tensor with shape[10000,32,28,28] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "E:\Users\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1292, in _do_call
    return fn(*args)
  File "E:\Users\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "E:\Users\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[10000,32,28,28] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node Conv2D}} = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Reshape, Variable/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[10000,32,28,28] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node Conv2D}} = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Reshape, Variable/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

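Code link: Recognizing the MNIST dataset with a convolutional neural network (CNN). The code in that post was trained on the CPU, whereas here I am training on the GPU.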

Code that raises the error:

# Report the accuracy on the test set after training finishes
print("test accuracy %g" % accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

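Cause: My environment uses the tensorflow-gpu build. The first several training runs went through without problems, but I recently reinstalled TensorFlow for unrelated reasons, and after that this error started to appear. I have been puzzled about why a reinstall would trigger it. After searching around, my tentative conclusion is that the run simply uses too much GPU memory, but that does not explain why the earlier runs succeeded. I hope to work this out once I know more; for now I am recording it here.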
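The numbers in the log make the memory pressure concrete. The arithmetic below is a small sketch that uses only values taken from the output above: the tensor shape from the error message and the allocator's Limit and InUse stats. The helper name tensor_bytes is my own, and 4 bytes per element follows from the DT_FLOAT (float32) type shown in the Conv2D node.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Bytes needed for a dense tensor of the given shape (float32 = 4 bytes)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


MIB = 1024 * 1024

# Values copied from the allocator stats in the log above.
LIMIT = 1468615884
IN_USE = 1156359936

# Shape from the error message: the whole test set pushed through Conv2D at once.
requested = tensor_bytes((10000, 32, 28, 28))
free = LIMIT - IN_USE

print("requested:   %.2f MiB" % (requested / MIB))
print("free:        %.2f MiB" % (free / MIB))
print("batch of 50: %.2f MiB" % (tensor_bytes((50, 32, 28, 28)) / MIB))
```

The requested size comes out to 957.03 MiB, which matches the "trying to allocate 957.03MiB" line at the top of the log. The log also shows one chunk with that same client-requested size already in use, so only about 298 MiB of the allocator's limit remained when the second request arrived. With a batch of 50 images the same tensor needs less than 5 MiB.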

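Solutions:

1. The batch size is too large. Reducing it is enough. This is a problem in your own code that makes the GPU run out of memory, so you have to track it down yourself.
2. The GPU has too little memory, or too little of it is still free. Use the nvidia-smi command to see which processes are occupying the GPU, then kill them.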
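For the second point, the memory check can also be scripted. The following is a rough sketch, assuming nvidia-smi is on the PATH and that its CSV query mode is available in your driver version; the function names parse_gpu_memory and query_gpu_memory are hypothetical helpers of mine, not part of any library.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse lines of 'used, total' (in MiB), one line per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append({"used_mib": used, "total_mib": total,
                     "free_mib": total - used})
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for used/total memory per GPU (assumes it is installed)."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)
```

Calling query_gpu_memory() before starting training shows how much memory other processes have already taken. If most of it is in use, running plain nvidia-smi lists the processes and their PIDs.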

In my case, the fix follows this Stack Overflow answer.

Replace the code above with the following:

for i in range(10):
    testSet = mnist.test.next_batch(50)
    print("test accuracy %g"%accuracy.eval(feed_dict={ x: testSet[0], y_: testSet[1], keep_prob: 1.0}))

Or with the following:

accuracy_sum = tf.reduce_sum(tf.cast(correct_prediction, tf.float32))
good = 0
total = 0
for i in range(10):
    testSet = mnist.test.next_batch(50)
    good += accuracy_sum.eval(feed_dict={ x: testSet[0], y_: testSet[1], keep_prob: 1.0})
    total += testSet[0].shape[0]
print("test accuracy %g"%(good/total))

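Looking at the code, the fix essentially works by reducing the batch size: the test images are fed 50 at a time instead of all 10000 in a single call.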
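Note that both replacement loops run 10 batches of 50, so they score 500 test images rather than the full set. A minimal sketch of a loop that walks the whole set in fixed-size batches is shown below. It is written in plain Python so it runs on its own; batched_accuracy and its count_correct argument are hypothetical names of mine, where count_correct stands in for a call such as accuracy_sum.eval(...) from the second snippet.

```python
def batched_accuracy(images, labels, count_correct, batch_size=50):
    """Accuracy over all samples, evaluated batch_size samples at a time.

    count_correct(batch_images, batch_labels) must return how many samples
    in the batch were classified correctly.
    """
    good = 0.0
    total = 0
    for start in range(0, len(images), batch_size):
        batch_x = images[start:start + batch_size]
        batch_y = labels[start:start + batch_size]
        good += count_correct(batch_x, batch_y)
        total += len(batch_x)  # the last batch may be smaller
    return good / total
```

With the graph from the original post, and assuming it runs inside the same default session the post's eval calls already rely on, the call would look roughly like batched_accuracy(mnist.test.images, mnist.test.labels, lambda bx, by: accuracy_sum.eval(feed_dict={x: bx, y_: by, keep_prob: 1.0})). Peak memory then depends on batch_size rather than on the size of the test set.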