Building Google TensorFlow from Source, Part 3: TensorFlow

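Over the past few days I finally got TensorFlow installed. I ran into quite a few problems along the way, so I am recording them here as a reference for anyone who wants to install it from source.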

Installation environment: a POWER8 processor, inside a Docker container running an Ubuntu 14.04 image.

Build TensorFlow for IBM POWER8 CPU from Source Code

1. My OS environment

  14.04.1-Ubuntu SMP

  ppc64le

  gcc 4.8.4

  python 2.7.6

2. Install Bazel and protobuf

  I only have OpenJDK 7, so I installed Bazel 0.1.0, which requires protobuf v3.0.0-alpha-3. For the installation steps, refer to "Build Bazel<v0.1.0> for IBM POWER8 CPU from Source Code".

3. Install other dependencies

  sudo apt-get install python-pip python-dev python-numpy

  sudo apt-get install swig
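Before building, it can help to confirm that the tools from steps 2 and 3 are actually on the PATH. The following is a minimal sketch and is not part of the TensorFlow build. The tool list (bazel, java, swig, pip) is an assumption based on the steps in this post. The sketch scans PATH by hand instead of using shutil.which so that it should also run under the Python 2.7 used here.

```python
import os


def find_executable(name, path=None):
    """Return the full path of `name` if it is an executable on `path`, else None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        # A match must be a regular file with execute permission.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_tools(tools, path=None):
    """Return the subset of `tools` that cannot be found on `path`."""
    return [tool for tool in tools if find_executable(tool, path) is None]


if __name__ == "__main__":
    # Assumed list of tools needed by the steps in this post.
    required = ["bazel", "java", "swig", "pip"]
    missing = missing_tools(required)
    if missing:
        print("missing: " + ", ".join(missing))
    else:
        print("all build tools found")
```

If anything is reported missing, go back to step 2 or 3 before running ./configure.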

4. Get the source code

  git clone --recurse-submodules https://github.com/tensorflow/tensorflow

5. Modify ~/.bazelrc

  Add the following build options (see the Bazel user manual, http://bazel.io/docs/bazel-user-manual.html, for a description of each). In ~/.bazelrc each line starts with the command it applies to, here build.

  to build in standalone mode (actions run locally without a sandbox): build --spawn_strategy=standalone --genrule_strategy=standalone

  to limit CPU and RAM usage: build --jobs=20 --ram_utilization_factor=30

6. Build the source code

  ./configure (answer the prompt to build with GPU support or for CPU only)

  bazel build -c opt //tensorflow/cc:tutorials_example_trainer

7. Create the pip package and install

7.1 Generate the TensorFlow .whl package

  If you want to use TensorFlow from Python, you need to build a pip package:

  $ bazel build -c opt //tensorflow/tools/pip_package:build_pip_package

  # or build with GPU support:

  $ bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

  After about nine hours (the whole night), the build finished with this message:

  Target //tensorflow/tools/pip_package:build_pip_package up-to-date:

  bazel-bin/tensorflow/tools/pip_package/build_pip_package

  INFO: Elapsed time: 32556.820s, Critical Path: 31793.39s

  Then run the generated script, which writes the .whl file into /tmp/tensorflow_pkg:

  $ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

7.2 Location of the TensorFlow .whl package

  opuser@nova:~/tensorflow/tensorflow$ ls /tmp/tensorflow_pkg/

  tensorflow-0.5.0-cp27-none-linux_ppc64le.whl
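The file name records which interpreter and platform the wheel was built for. Wheel names follow the pattern name-version[-build]-pythontag-abitag-platformtag.whl (PEP 427). In this name, cp27 means CPython 2.7, none means no specific ABI, and linux_ppc64le is the platform. pip refuses to install a wheel whose tags do not match the running interpreter and platform, so this wheel will not install on an x86 machine.

The minimal sketch below splits such a name into its parts. It is simplified: it discards the optional build tag and does not implement pip's full compatibility rules.

```python
import collections

WheelName = collections.namedtuple(
    "WheelName", ["name", "version", "python_tag", "abi_tag", "platform_tag"])


def parse_wheel_filename(filename):
    """Split a wheel file name into its PEP 427 components (simplified)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 6:
        # name-version-build-python-abi-platform: drop the optional build tag.
        parts = parts[:2] + parts[3:]
    if len(parts) != 5:
        raise ValueError("unexpected wheel file name: " + filename)
    return WheelName(*parts)


print(parse_wheel_filename("tensorflow-0.5.0-cp27-none-linux_ppc64le.whl"))
# prints: WheelName(name='tensorflow', version='0.5.0', python_tag='cp27', abi_tag='none', platform_tag='linux_ppc64le')
```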

7.3 Install the .whl package using pip

  opuser@nova:~/tensorflow/tensorflow$ sudo pip install /tmp/tensorflow_pkg/tensorflow-0.5.0-cp27-none-linux_ppc64le.whl

7.4 Installed TensorFlow package path

  opuser@nova:~/tensorflow/tensorflow/tensorflow/models/image/mnist$ ls /usr/local/lib/python2.7/dist-packages

  tensorflow tensorflow-0.5.0.dist-info

7.5 Train a model on the MNIST dataset (sudo is needed)

  # Pass the path of the model program file to the Python interpreter:

  opuser@nova:~$ sudo python /usr/local/lib/python2.7/dist-packages/tensorflow/models/image/mnist/convolutional.py

  Succesfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.

  Succesfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.

  Succesfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.

  Succesfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.

  Extracting data/train-images-idx3-ubyte.gz

  Extracting data/train-labels-idx1-ubyte.gz

  Extracting data/t10k-images-idx3-ubyte.gz

  Extracting data/t10k-labels-idx1-ubyte.gz

  can't determine number of CPU cores: assuming 4

  I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 4

  can't determine number of CPU cores: assuming 4

  I tensorflow/core/common_runtime/direct_session.cc:60] Direct session inter op parallelism threads: 4

  Initialized!

  Epoch 0.00

  Minibatch loss: 12.054, learning rate: 0.010000

  Minibatch error: 90.6%

  Validation error: 84.6%

  Minibatch loss: 3.289, learning rate: 0.010000

  ......
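The "Minibatch error" and "Validation error" lines report the percentage of examples whose predicted class (the highest-scoring one) differs from the true label.

Below is a minimal pure-Python sketch of that calculation. It assumes integer class labels, whereas the actual script works on NumPy arrays. The 64-example minibatch in the usage part is a hypothetical example: with 64 examples, 58 mistakes give 90.625%, which prints as 90.6%.

```python
def argmax(scores):
    """Index of the largest score (the first one wins on ties)."""
    best = 0
    for i in range(1, len(scores)):
        if scores[i] > scores[best]:
            best = i
    return best


def error_rate(predictions, labels):
    """Percentage of examples whose highest-scoring class is not the true label.

    predictions: list of per-class score lists, one per example.
    labels: list of integer class labels, same length as predictions.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels differ in length")
    correct = sum(1 for p, y in zip(predictions, labels) if argmax(p) == y)
    return 100.0 - 100.0 * correct / len(predictions)


# Hypothetical minibatch: 64 examples, 10 classes, the model always predicts class 0.
predictions = [[1.0] + [0.0] * 9 for _ in range(64)]
labels = [0] * 6 + [1] * 58  # only the first 6 examples really are class 0
print("Minibatch error: %.1f%%" % error_rate(predictions, labels))
# prints: Minibatch error: 90.6%
```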

8. Problems during compilation

  Error: gcc: internal compiler error: Killed, com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 4.

  This happens when the machine runs out of RAM or swap and the kernel kills the gcc process. You can lower the --jobs value or the --ram_utilization_factor value. You can also check whether another process is using a large amount of RAM and kill it. In my case, two Bazel servers were running at the same time, so I had to kill one of them.
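One way to choose a safer --jobs value is to budget memory per compiler process. The sketch below reads MemAvailable from /proc/meminfo, falling back to MemFree on kernels older than Linux 3.14, which lack that field. It also counts the processor entries in /proc/cpuinfo and caps the job count so that jobs times per-job memory fits in available RAM.

The 1 GB per gcc process default is an assumption, not a measured figure. Adjust it to what you observe (for example with top) while building. Inside a Docker container, these /proc files usually report the host's totals rather than the container's limits, so treat the result as an upper bound.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of field name -> value (in kB)."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def count_processors(cpuinfo_text):
    """Count 'processor : N' entries in /proc/cpuinfo text."""
    return sum(1 for line in cpuinfo_text.splitlines()
               if line.split(":", 1)[0].strip() == "processor")


def suggest_jobs(meminfo_text, cpuinfo_text, mb_per_job=1024):
    """Suggest a --jobs value bounded by CPU count and available memory.

    mb_per_job is an assumed peak memory use per compiler process.
    """
    mem = parse_meminfo(meminfo_text)
    # MemAvailable exists on Linux 3.14+; fall back to MemFree on older kernels.
    available_kb = mem.get("MemAvailable", mem.get("MemFree", 0))
    by_memory = available_kb // (mb_per_job * 1024)
    by_cpu = count_processors(cpuinfo_text)
    # If no processor entries were found, rely on the memory bound alone.
    jobs = min(by_memory, by_cpu) if by_cpu else by_memory
    return max(1, jobs)


if __name__ == "__main__" and os.path.exists("/proc/meminfo") \
        and os.path.exists("/proc/cpuinfo"):
    with open("/proc/meminfo") as m, open("/proc/cpuinfo") as c:
        print("--jobs=%d" % suggest_jobs(m.read(), c.read()))
```

The suggested value can replace --jobs=20 in the build line of ~/.bazelrc from step 5.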

9. References

tensorflow/tensorflow/g3doc/get_started/os_setup.md

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md

Bazel user manual

http://bazel.io/docs/bazel-user-manual.html

CUDA or cuDNN version mismatch

https://github.com/tensorflow/tensorflow/issues/125