Building and Running TensorFlow word2vec

  1. The more complete (non-demo) version of the word2vec code lives in

    tensorflow/models/embedding/

  2. First, install Bazel, which is needed for the build.

    You can download the latest binary installer; version 0.1.0 is used here:

    https://github.com/bazelbuild/bazel/releases/download/0.1.0/bazel-0.1.0-installer-linux-x86_64.sh

    The installer apparently needs to be run as root:

    sh bazel-0.1.0-installer-linux-x86_64.sh

  3. Build word2vec.

    Following README.md:

    bazel build -c opt tensorflow/models/embedding:all

  4. Download the training and evaluation data (a quick sanity check of the downloaded corpus is sketched right after this step):

    wget http://mattmahoney.net/dc/text8.zip -O text8.gz

    gzip -d text8.gz -f

    wget https://word2vec.googlecode.com/svn/trunk/questions-words.txt
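
Before training, you can optionally confirm that the corpus looks right; the numbers printed here should match what the trainer reports at startup in the log below. This is a minimal sketch and assumes the extracted text8 file has been moved into ./data/, matching the run command in step 5:

    import os

    path = "./data/text8"
    print("bytes:", os.path.getsize(path))      # the trainer later reports 100000000 bytes

    with open(path) as f:
        words = f.read().split()

    print("words:", len(words))                 # 17005207
    print("unique words:", len(set(words)))     # 253854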

  5. Run word2vec from the Bazel build output directory:

pwd

/home/users/chenghuige/other/tensorflow/bazel-bin/tensorflow/models/embedding

Run the command:

./word2vec_optimized --train_data ./data/text8 --eval_data ./data/questions-words.txt --save_path ./data/result/

I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 24

I tensorflow/core/common_runtime/direct_session.cc:60] Direct session inter op parallelism threads: 24

I tensorflow/models/embedding/word2vec_kernels.cc:149] Data file: ./data/text8 contains 100000000 bytes, 17005207 words, 253854 unique words, 71290 unique frequent words.

Data file: ./data/text8

Vocab size: 71290 + UNK

Words per epoch: 17005207

Eval analogy file: ./data/questions-words.txt

Questions: 17827

Skipped: 1717
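
A note on these startup numbers: "Vocab size: 71290 + UNK" is the result of frequency filtering. Of the 253854 distinct tokens in text8, only words that occur often enough are kept in the vocabulary, and everything else is mapped to UNK; the 1717 skipped analogy questions are the ones containing a word outside this vocabulary. A rough sketch of the filtering, assuming the classic word2vec cutoff of min_count = 5 (the exact flag name and default in word2vec_optimized may differ):

    from collections import Counter

    MIN_COUNT = 5   # assumption: the classic word2vec default

    with open("./data/text8") as f:
        counts = Counter(f.read().split())

    frequent = [w for w, c in counts.items() if c >= MIN_COUNT]
    print("unique words:  ", len(counts))    # 253854
    print("frequent words:", len(frequent))  # should match the 71290 reported above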

Epoch 1 Step 151381: lr = 0.023 words/sec = 25300

Eval 1419/17827 accuracy = 8.0%

Epoch 2 Step 302768: lr = 0.022 words/sec = 48503

Eval 2445/17827 accuracy = 13.7%

Epoch 3 Step 454147: lr = 0.020 words/sec = 46666

Eval 3211/17827 accuracy = 18.0%

Epoch 4 Step 605540: lr = 0.018 words/sec = 53928

Eval 3608/17827 accuracy = 20.2%

Epoch 5 Step 756907: lr = 0.017 words/sec = 81255

Eval 4081/17827 accuracy = 22.9%

Epoch 6 Step 908251: lr = 0.015 words/sec = 46954
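
Two things are worth noting about this training log. First, the learning rate decays as training progresses: the values at the end of epochs 1-6 (0.023, 0.022, 0.020, 0.018, 0.017, 0.015) are consistent with lr = 0.025 × (1 − epoch/15), i.e. the classic word2vec schedule of a linear decay from 0.025 over 15 epochs. Second, the "Eval N/17827 accuracy" lines come from the analogy test: for each question "a b c d" (e.g. "athens greece baghdad iraq"), the model predicts the word whose embedding is closest to vec(b) − vec(a) + vec(c), excluding the three question words, and the prediction counts as correct if it is d. Below is a minimal numpy sketch of that protocol, not the exact TensorFlow implementation; emb is assumed to be a row-normalized (vocab_size, dim) embedding matrix and word2id a word-to-row mapping:

    import numpy as np

    def analogy_accuracy(emb, word2id, questions):
        # questions: list of (a, b, c, d) tuples read from questions-words.txt
        correct, total = 0, 0
        for a, b, c, d in questions:
            if any(w not in word2id for w in (a, b, c, d)):
                continue                              # out-of-vocabulary question, skipped
            total += 1
            ia, ib, ic, id_ = (word2id[w] for w in (a, b, c, d))
            target = emb[ib] - emb[ia] + emb[ic]      # b - a + c
            scores = emb @ target                     # cosine similarity, rows are unit-length
            scores[[ia, ib, ic]] = -np.inf            # never predict the question words
            if scores.argmax() == id_:
                correct += 1
        return correct / max(total, 1)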