Installing and configuring a deep learning environment on Ubuntu 16.04: Ubuntu 16.04/16.10 + CUDA 7.5/8 + cuDNN 4/5 + Caffe

2023-11-13 07:35

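Mainly based on the following two blog posts: http://blog.csdn.net/g0m3e/article/details/51420565 http://blog.csdn.net/xuzhongxiong/article/details/52717285

A note first: I have done this setup in two different environments. Their hardware and software configurations are as follows.

1) Y480 laptop with a GT650 GPU. The software environment was Ubuntu 16.04 + CUDA 7.5 + cuDNN v4. Later, because compiling Caffe failed with an error containing "computer_64", I switched CUDA to 8. The walkthrough below uses configuration (1) as the example.

2) ThinkStation P510 + GTX 1080. With the 1080 card installed, the Ubuntu 16.04 installer went to a black screen with a "signal out of range" message as soon as I clicked Install, so I chose 16.10 instead. During that installation the mouse stops responding; finish the installation with the keyboard, and once the 1080 driver is installed everything works. Also note that Ubuntu 16.10 ships gcc 6, which is too new, so it has to be downgraded to 5; after that the procedure is the same as on 16.04. One more very important point: with a 1080 GPU you must use cuDNN v5, otherwise runtest reports errors.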
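Whether the gcc downgrade mentioned above applies depends only on the major version, so a quick check can be sketched as follows. The function names gcc_major and gcc_needs_downgrade are made up for this illustration, and the threshold of 5 is simply taken from the note above; it is not an official compatibility table.

```shell
#!/bin/sh
# Extract the major version from a string such as "6.2.0"
# (the kind of string `gcc -dumpversion` prints).
gcc_major() {
    printf '%s\n' "${1%%.*}"
}

# Succeed (exit status 0) when the given version is newer than gcc 5,
# i.e. when the downgrade described above applies.
gcc_needs_downgrade() {
    [ "$(gcc_major "$1")" -gt 5 ]
}

# Usage (hypothetical):
#   if gcc_needs_downgrade "$(gcc -dumpversion)"; then
#       echo "gcc is newer than 5, downgrade it first"
#   fi
```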

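1. Download the required software

CUDA 7.5 download (for 8, search Baidu or download it from the official site), cuDNN 4.0 download (remember to use v5 for the 1080).

2. Install the NVIDIA driver

There are generally two methods. 1) Install through "Software & Updates": System Settings -> Software & Updates -> Additional Drivers -> select the latest driver -> Apply Changes.

A problem you may hit: some time after clicking Apply Changes the driver still has not been installed, and clicking again makes the window crash. This puzzled me for a whole evening; it turned out to be a dependency problem. Running the following commands in a terminal and then installing again solved it:

sudo apt-get install -f

sudo apt-get update

2) The second method is to download the driver installer and install it from the command line. It is more of a hassle and I did not try it; other tutorials online say X Windows has to be shut down before installing.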

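3. Install CUDA and cuDNN

(1) In a terminal, cd to the directory that contains the downloaded installer and run

sh cuda_7.5.18_linux.run --override

Once it starts, press space to page through the license text, then type accept. Answer N to the one prompt that asks whether to install the driver, and Y to everything else.

(2) Install cuDNN (this is what provides the GPU acceleration)

Extract the downloaded archive (the commands below assume it was extracted to ~/cuda) and run the following commands in a terminal:

cd ~/cuda/include

sudo cp cudnn.h /usr/local/cuda/include/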

cd ~/cuda/lib64

sudo cp lib* /usr/local/cuda/lib64/

cd /usr/local/cuda/lib64/

sudo rm -rf libcudnn.so libcudnn.so.4

sudo ln -s libcudnn.so.4.0.7 libcudnn.so.4

sudo ln -s libcudnn.so.4 libcudnn.so
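The rm and ln -s commands above rebuild a chain of symlinks, libcudnn.so -> libcudnn.so.4 -> libcudnn.so.4.0.7. The sketch below wraps the same steps in a function so the chain can be tried in a scratch directory first. The name link_cudnn and its arguments are hypothetical, and the version string passed in has to match the file that actually shipped in your archive.

```shell
#!/bin/sh
# link_cudnn DIR FULL_VERSION
# Recreate the libcudnn symlink chain inside DIR, for example
#   libcudnn.so -> libcudnn.so.4 -> libcudnn.so.4.0.7
link_cudnn() {
    dir=$1
    full=$2                 # e.g. 4.0.7
    major=${full%%.*}       # e.g. 4
    (
        cd "$dir" || exit 1
        # Refuse to create dangling links if the real library is missing.
        [ -f "libcudnn.so.$full" ] || exit 1
        rm -f libcudnn.so "libcudnn.so.$major"
        ln -s "libcudnn.so.$full" "libcudnn.so.$major"
        ln -s "libcudnn.so.$major" libcudnn.so
    )
}

# Usage in a scratch directory (no root needed):
#   d=$(mktemp -d); touch "$d/libcudnn.so.4.0.7"
#   link_cudnn "$d" 4.0.7 && ls -l "$d"
```

Running it against /usr/local/cuda/lib64 needs root, exactly like the sudo commands above.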

Then set the environment variables

sudo gedit /etc/profile

Append the following at the end of the file

export PATH=/usr/local/cuda/bin:$PATH

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

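After saving, create the linker configuration file

sudo vim /etc/ld.so.conf.d/cuda.conf

Press i to enter insert mode and add the line

/usr/local/cuda/lib64

Then press Esc and type :wq to save and quit.

Next, run the following in the terminal so that the new library path takes effect:

sudo ldconfig

(Note: if you installed CUDA 8, change cuda in the paths above to cuda-8.0.)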
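To confirm that the two export lines took effect, the path variables can be inspected after logging in again. The helper below is a minimal sketch; the name in_path_list is invented here, and the directories in the usage comment are the ones used in this section, so adjust them if you pointed the variables at a versioned CUDA directory.

```shell
#!/bin/sh
# in_path_list LIST DIR
# Succeed when DIR is one of the colon-separated entries of LIST.
in_path_list() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage (after logging in again so /etc/profile is re-read):
#   in_path_list "$PATH" /usr/local/cuda/bin \
#       || echo "cuda bin directory missing from PATH"
#   in_path_list "$LD_LIBRARY_PATH" /usr/local/cuda/lib64 \
#       || echo "cuda lib64 directory missing from LD_LIBRARY_PATH"
```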

3. Build the CUDA Samples as a test

(1) Before that, install all the required dependencies, in preparation for building Caffe later

sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler

sudo apt-get install --no-install-recommends libboost-all-dev

sudo apt-get install libatlas-base-dev

sudo apt-get install libgflags-dev libgoogle-glog-dev liblmdb-dev

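(2) Change the gcc version. (At first I did not change it; make ran without errors, but the tests failed after the build finished, so it is best to change it here. If you see the error "unsupported GNU version! gcc versions later than 4.9 are not supported!", you definitely have to.) The reason is that this CUDA version does not support gcc 5.0 and later.

Fix 1:

cd /usr/local/cuda-7.5/include

sudo cp host_config.h host_config.h.bak

sudo gedit host_config.h

Press Ctrl+F and search for "4.9"; there should be only one occurrence. In the line just above it,

#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 9)

change both 4s to 5, save and exit, then continue.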
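The same edit can be made without opening an editor. The sketch below assumes the guard line in your host_config.h reads exactly as quoted above, which may differ between CUDA releases; the function name relax_gcc_check is made up for illustration. It prints the patched text instead of modifying the file, so the result can be inspected first.

```shell
#!/bin/sh
# relax_gcc_check FILE
# Print FILE with both 4s in the gcc version guard changed to 5.
# Only the line containing "__GNUC_MINOR__ > 9" is touched.
relax_gcc_check() {
    sed -e '/__GNUC_MINOR__ > 9/ s/__GNUC__ > 4/__GNUC__ > 5/' \
        -e '/__GNUC_MINOR__ > 9/ s/__GNUC__ == 4/__GNUC__ == 5/' "$1"
}

# Usage, reading the backup and writing the live header:
#   relax_gcc_check host_config.h.bak | sudo tee host_config.h > /dev/null
```

Reading from the .bak copy keeps the untouched original around in case the pattern does not match your header.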

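Fix 2:

Downgrade gcc to 4.8; for the procedure see http://blog.csdn.net/linzhaolover/article/details/45023361 (note: after downgrading here, switch gcc back to 5 again when compiling Caffe, otherwise that build fails).

(3) Now build the samples

cd to /home/gomee/NVIDIA_CUDA-7.5_Samples (replace /home/gomee with your own home directory; with CUDA 8 the directory is NVIDIA_CUDA-8.0_Samples)

In the terminal run make all -j4 (-j4 sets how many jobs run in parallel; generally use as many as your machine has cores)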
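Rather than hard-coding 4, the job count can be taken from the machine itself. This is a small sketch that relies on the coreutils nproc command; the wrapper name make_jobs is invented here and falls back to a single job when nproc is not available.

```shell
#!/bin/sh
# Print the number of parallel make jobs to use: the number of available
# processing units reported by nproc, or 1 if nproc cannot be run.
make_jobs() {
    n=$(nproc 2>/dev/null) || n=1
    printf '%s\n' "${n:-1}"
}

# Usage:
#   make all -j"$(make_jobs)"
```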

The build should now start; it takes roughly 4 to 5 minutes. When it finishes:

cd /home/gomee/NVIDIA_CUDA-7.5_Samples/bin/x86_64/linux/release

./deviceQuery

If output like the following appears

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GT 650M"

CUDA Driver Version / Runtime Version 8.0 / 7.5

CUDA Capability Major/Minor version number: 3.0

Total amount of global memory: 1999 MBytes (2096300032 bytes)

( 2) Multiprocessors, (192) CUDA Cores/MP: 384 CUDA Cores

GPU Max Clock rate: 885 MHz (0.88 GHz)

Memory Clock rate: 2000 Mhz

Memory Bus Width: 128-bit

L2 Cache Size: 262144 bytes

Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)

Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers

Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers

Total amount of constant memory: 65536 bytes

Total amount of shared memory per block: 49152 bytes

Total number of registers available per block: 65536

Warp size: 32

Maximum number of threads per multiprocessor: 2048

Maximum number of threads per block: 1024

Max dimension size of a thread block (x,y,z): (1024, 1024, 64)

Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)

Maximum memory pitch: 2147483647 bytes

Texture alignment: 512 bytes

Concurrent copy and kernel execution: Yes with 1 copy engine(s)

Run time limit on kernels: Yes

Integrated GPU sharing Host Memory: No

Support host page-locked memory mapping: Yes

Alignment requirement for Surfaces: Yes

Device has ECC support: Disabled

Device supports Unified Addressing (UVA): Yes

Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0

Compute Mode:

< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = GeForce GT 650M

Result = PASS

then CUDA was installed successfully.
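The line that matters in that output is the final "Result = PASS". A script can check for it instead of reading the whole listing; the function name device_query_passed below is hypothetical, and it only inspects text piped into it.

```shell
#!/bin/sh
# device_query_passed
# Read deviceQuery output on stdin and succeed only if it contains a
# line starting with "Result = PASS".
device_query_passed() {
    grep -q '^Result = PASS'
}

# Usage:
#   ./deviceQuery | device_query_passed && echo "CUDA looks fine"
```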

4. Install Caffe

(1) Download the Caffe source archive from https://github.com/BVLC/caffe

(2) Extract it with the unzip command

(3) Python setup

Install pip:

sudo apt-get install python-pip

sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose

cd to the python directory inside your extracted Caffe directory

sudo su

for req in $(cat requirements.txt); do pip install $req; done

(The Tsinghua University mirror can be used to speed up the downloads:

for req in $(cat requirements.txt); do pip install -i https://pypi.tuna.tsinghua.edu.cn/simple $req; done)
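The loop above splits the file on whitespace, so a comment or blank line in requirements.txt would be passed to pip as if it were a package. A slightly more defensive variant, sketched below with a made-up helper name list_requirements, filters those out first. pip can also read the file directly with pip install -r requirements.txt, at the cost of stopping at the first package that fails.

```shell
#!/bin/sh
# list_requirements FILE
# Print one requirement per line, dropping '#' comments, trailing
# whitespace and blank lines.
list_requirements() {
    sed -e 's/#.*$//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}

# Usage:
#   list_requirements requirements.txt | while read -r req; do
#       pip install -i https://pypi.tuna.tsinghua.edu.cn/simple "$req"
#   done
```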

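(4) Install OpenCV

This step is optional. The first time I installed Caffe I did not install OpenCV and Caffe still ran; I added it later. OpenCV is a computer vision library, so you will most likely want it when working with images. The installation procedure is as follows.

Download OpenCV from the official site (http://opencv.org/downloads.html) and extract it to the location you want; the commands below assume it was extracted to ~/opencv.

Preparation before installing: create the build directory: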

cd ~/opencv
mkdir build
cd build

Configure:

cmake -D CMAKE_BUILD_TYPE=Release -D CMAKE_INSTALL_PREFIX=/usr/local ..

Build:

make -j4

The steps above only compile OpenCV; it is not installed yet. Run the following command to install it:

sudo make install

You may run into this error during installation:

fatal error: LAPACKE_H_PATH-NOTFOUND when building OpenCV 3.2

Fix: sudo apt-get install liblapacke-dev checkinstall

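(5) Configure Caffe

(1) Change to the caffe-master directory (the directory you extracted the Caffe archive into) and run the following commands:

cp Makefile.config.example Makefile.config

gedit Makefile.config

Uncomment USE_CUDNN := 1. Then, on the line

INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include

add a space at the end followed by /usr/include/hdf5/serial. Without this you may ...

If you use OpenCV and its version is 3, change

#OPENCV_VERSION := 3

to

OPENCV_VERSION := 3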
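The three Makefile.config edits can also be applied with sed. The sketch below assumes the lines look the way they do in Makefile.config.example as quoted above; the function name patch_makefile_config is invented for illustration. It uncomments OPENCV_VERSION unconditionally, so drop that expression if you do not use OpenCV 3.

```shell
#!/bin/sh
# patch_makefile_config
# Read a Makefile.config on stdin and print it with three changes:
#   1. USE_CUDNN := 1 uncommented
#   2. OPENCV_VERSION := 3 uncommented (only wanted with OpenCV 3)
#   3. /usr/include/hdf5/serial appended to the INCLUDE_DIRS line
patch_makefile_config() {
    sed -e 's/^#[[:space:]]*USE_CUDNN := 1/USE_CUDNN := 1/' \
        -e 's/^#[[:space:]]*OPENCV_VERSION := 3/OPENCV_VERSION := 3/' \
        -e '/^INCLUDE_DIRS := /s|[[:space:]]*$| /usr/include/hdf5/serial|'
}

# Usage (run once, starting from the pristine example file):
#   patch_makefile_config < Makefile.config.example > Makefile.config
```

Because the third expression appends every time it runs, always start from Makefile.config.example rather than from an already patched file.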


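Note: if you installed OpenCV and made this change, make runtest (after make all and make test) may fail with the following error

libopencv_shape.so.3.0: cannot open shared object file: No such file or directory

Fix:

Go to the directory /etc/ld.so.conf.d

Create the file OpenCV.conf

Add the line /opt/opencv-3.0.0/build/lib (use the directory that actually contains your libopencv_*.so files)

Run: sudo ldconfig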
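Those steps amount to putting one directory on its own line in a file under /etc/ld.so.conf.d. The helper below is a sketch of that, written so that running it twice does not duplicate the entry; the name add_ld_path is hypothetical, and the path in the usage comment is only the example from the note above.

```shell
#!/bin/sh
# add_ld_path CONF LIBDIR
# Append LIBDIR to the linker config file CONF unless it is already listed.
add_ld_path() {
    conf=$1
    libdir=$2
    if [ -f "$conf" ] && grep -qxF -- "$libdir" "$conf"; then
        return 0
    fi
    printf '%s\n' "$libdir" >> "$conf"
}

# Usage (as root, followed by ldconfig):
#   add_ld_path /etc/ld.so.conf.d/OpenCV.conf /opt/opencv-3.0.0/build/lib
#   ldconfig
```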

(2) Open the Makefile and edit it

Search for

NVCCFLAGS += -ccbin=$(CXX) -Xcompiler -fPIC $(COMMON_FLAGS)

and replace it with

NVCCFLAGS += -D_FORCE_INLINES -ccbin=$(CXX) -Xcompiler -fPIC $(COMMON_FLAGS)

Save and exit
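As a non-interactive alternative, the same replacement can be sketched with sed. It assumes the NVCCFLAGS line in your copy of the Caffe Makefile starts exactly as quoted above; the function name add_force_inlines is made up here.

```shell
#!/bin/sh
# add_force_inlines
# Read Caffe's Makefile on stdin and print it with -D_FORCE_INLINES added
# to the NVCCFLAGS line. A line that already carries the flag does not
# match the pattern and is left unchanged.
add_force_inlines() {
    sed -e 's/^NVCCFLAGS += -ccbin=/NVCCFLAGS += -D_FORCE_INLINES -ccbin=/'
}

# Usage:
#   cp Makefile Makefile.bak
#   add_force_inlines < Makefile.bak > Makefile
```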

(3) Edit /usr/local/cuda/include/host_config.h

Comment out line 115, that is, change

#error-- unsupported GNU version! gcc versions later than 4.9 are not supported!

to

//#error-- unsupported GNU version! gcc versions later than 4.9 are not supported!

(4) Change back to the Caffe directory and run the following commands

make all -j4

make test -j4

make runtest

If they run without errors and output like the following appears in the terminal during runtest, the configuration succeeded


[----------] 10 tests from EltwiseLayerTest/2 (408 ms total)


[----------] 6 tests from CuDNNConvolutionLayerTest/1, where TypeParam = double


[ RUN      ] CuDNNConvolutionLayerTest/1.TestSimpleConvolutionCuDNN


[       OK ] CuDNNConvolutionLayerTest/1.TestSimpleConvolutionCuDNN (2 ms)


[ RUN      ] CuDNNConvolutionLayerTest/1.TestSobelConvolutionCuDNN


[       OK ] CuDNNConvolutionLayerTest/1.TestSobelConvolutionCuDNN (2 ms)


[ RUN      ] CuDNNConvolutionLayerTest/1.TestGradientGroupCuDNN


[       OK ] CuDNNConvolutionLayerTest/1.TestGradientGroupCuDNN (529 ms)


[ RUN      ] CuDNNConvolutionLayerTest/1.TestSetupCuDNN


[       OK ] CuDNNConvolutionLayerTest/1.TestSetupCuDNN (3 ms)


[ RUN      ] CuDNNConvolutionLayerTest/1.TestGradientCuDNN


[       OK ] CuDNNConvolutionLayerTest/1.TestGradientCuDNN (1448 ms)


[ RUN      ] CuDNNConvolutionLayerTest/1.TestSimpleConvolutionGroupCuDNN


[       OK ] CuDNNConvolutionLayerTest/1.TestSimpleConvolutionGroupCuDNN (2 ms)


[----------] 6 tests from CuDNNConvolutionLayerTest/1 (1986 ms total)
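The runtest output is long, so it can help to save it and search for failures afterwards. Caffe's tests are built on googletest, which marks failing tests with "[  FAILED  ]". The sketch below counts such lines in a saved log; the function name runtest_failed_count and the log file name are assumptions for illustration. Googletest repeats failed tests in its summary, so treat the number as "zero or not zero" rather than as an exact count of failing tests.

```shell
#!/bin/sh
# runtest_failed_count
# Read a saved `make runtest` log on stdin and print how many lines start
# with googletest's "[  FAILED  ]" marker.
runtest_failed_count() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so the exit status is normalised here.
    grep -c '^\[  FAILED  \]' || true
}

# Usage:
#   make runtest 2>&1 | tee runtest.log
#   runtest_failed_count < runtest.log
```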


*******************************************************************************************************************************


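1. The painful part of the installation: at first the driver simply would not install. It turned out to be a package dependency problem, fixed by running sudo apt-get install -f. It is also best to refresh the package lists with sudo apt-get update.

2. After CUDA, Python and everything else were installed, compiling Caffe failed with an error containing something like "computer_64" and "nvcc fatal". I searched Baidu and Google at length without finding a fix. Then it struck me that the error came from nvcc, so it had to be a CUDA problem. But building the CUDA samples had worked fine, so I wondered whether the NVIDIA driver was too new for CUDA 7.5. I uninstalled 7.5, installed CUDA 8 and redid the configuration. That error was indeed gone, but now there was a cuDNN error, which was thoroughly frustrating. After struggling for quite a while I noticed that the CUDA directory used when configuring cuDNN had changed from cuda to cuda-8.0, which meant cuDNN had never been set up correctly. With that fixed, the build finally succeeded. The lesson is that you cannot blindly type commands from a tutorial; sometimes you need to know what a command actually does.

At this point, import caffe only works without errors when Python is started from caffe-master/python.

On Ubuntu, starting the Python interpreter anywhere else and typing import caffe gives the following error: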

>>>import caffe

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

ImportError: No module named caffe

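The basic idea is to make Caffe's python directory visible to the interpreter.

Solutions

Method 1: set an environment variable

In a terminal, run:

export PYTHONPATH=~/下载/caffe-master/python # the python directory under the Caffe path

This only applies to the current terminal; it is lost once the terminal is closed or a new one is opened.

To make it permanent, put it in a configuration file, as follows:

A. Add the environment variable to ~/.bashrc (sudo is not needed, since the file belongs to your own user)

echo 'export PYTHONPATH="$HOME/下载/caffe-master/python"' >> ~/.bashrc

B. Make the environment variable take effect

source ~/.bashrc

Method 2: do it in code

Use the following code at the top of each Python script (this method is useful when writing Python code):

import os

import sys

# Python does not expand "~" by itself, so expand it explicitly.
caffe_root = os.path.expanduser('~/下载/caffe-master/')

# Put Caffe's python directory at the front of the module search path.
sys.path.insert(0, os.path.join(caffe_root, 'python'))

import caffe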
