Installing Python, Scrapy, and PhantomJS on CentOS

Install the dependencies:

  • yum install libxslt-devel libffi libffi-devel python-devel gcc openssl openssl-devel sqlite-devel

Install Python 2.7 or later (if several Python versions must coexist, you must pass --prefix)

  • wget http://python.org/ftp/python/2.7.3/Python-2.7.3.tgz
  • tar xvf Python-2.7.3.tgz
  • cd Python-2.7.3
  • ./configure --prefix=/usr/local/python27
  • make && make install
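
Python only builds modules such as ssl and sqlite3 if the matching -devel headers were present at compile time, so it can be worth checking the new interpreter before going further. The following is a minimal sketch, assuming it is run with the freshly built interpreter (for example /usr/local/python27/bin/python); the helper name missing_modules is made up for illustration.

```python
from __future__ import print_function
import importlib
import sys


def missing_modules(names):
    """Return the entries of `names` that cannot be imported (hypothetical helper)."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    # These modules rely on openssl-devel, sqlite-devel and libffi at build time.
    print("missing modules:", missing_modules(["ssl", "sqlite3", "ctypes"]))
```

If the list is not empty, installing the missing -devel package and rebuilding Python is the usual fix.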

Install setuptools and pip (you may need to add the install directory to PATH or create a symlink)

  • wget -q http://peak.telecommunity.com/dist/ez_setup.py
  • python ez_setup.py
  • easy_install pip

Or

  • wget -q https://bootstrap.pypa.io/get-pip.py
  • python get-pip.py

Or

  • wget https://files.pythonhosted.org/packages/66/6d/dad0d39ce1cfa98ef3634463926e7324e342c956aecb066968e2e3696300/setuptools-30.0.0.tar.gz
  • tar -xvf setuptools-30.0.0.tar.gz
  • cd setuptools-30.0.0
  • python setup.py install
  • cd ..
  • wget https://files.pythonhosted.org/packages/5e/53/eaef47e5e2f75677c9de0737acc84b659b78a71c4086f424f55346a341b5/pip-9.0.0.tar.gz
  • tar -xvf pip-9.0.0.tar.gz
  • cd pip-9.0.0
  • python setup.py install

Install Twisted (you may need to add the install directory to PATH or create a symlink)

  • easy_install Twisted
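  • If the Twisted version is too new or too old, the final step may fail with an error. In that case, pin a specific version with pip and try a few versions until one works, for example: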
  • pip install twisted==12.5.0

Install w3lib

  • easy_install -U w3lib

Install lxml

  • easy_install lxml

Install pyOpenSSL

  • easy_install pyOpenSSL
  • If that fails, download and install it manually:
  • wget http://launchpadlibrarian.net/58498441/pyOpenSSL-0.11.tar.gz
  • tar zxvf pyOpenSSL-0.11.tar.gz
  • cd pyOpenSSL-0.11
  • python2.7 setup.py install

Install Scrapy (you may need to add the install directory to PATH or create a symlink)

  • easy_install -U Scrapy
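
Since version mismatches between Twisted, pyOpenSSL, and Scrapy tend to show up only as errors at import time, a quick import check can narrow down which package is at fault. The following is a rough sketch, not part of the original steps; the function name check_imports and the PACKAGES list are assumptions for illustration, and note that pyOpenSSL is imported under the name OpenSSL.

```python
from __future__ import print_function
import importlib

# Import names of the packages installed in the steps above (illustrative list).
PACKAGES = ["twisted", "w3lib", "lxml", "OpenSSL", "scrapy"]


def check_imports(names):
    """Map each import name to its version string or to a failure message."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception as exc:
            # ImportError if the package is absent; other errors (for example
            # AttributeError) can indicate incompatible versions.
            results[name] = "FAILED: %s: %s" % (type(exc).__name__, exc)
        else:
            results[name] = str(getattr(module, "__version__", "unknown"))
    return results


if __name__ == "__main__":
    for name, status in sorted(check_imports(PACKAGES).items()):
        print("%-10s %s" % (name, status))
```

A line starting with FAILED points at the package to reinstall or pin to a different version.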

Install Selenium (only needed if you have to parse dynamic web pages)

  • pip install selenium

Install PhantomJS (only needed if you have to parse dynamic web pages)

  • wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2
  • bzip2 -d phantomjs-2.1.1-linux-x86_64.tar.bz2
  • tar xvf phantomjs-2.1.1-linux-x86_64.tar -C /usr/local/
  • yum -y install wget fontconfig
  • mv /usr/local/phantomjs-2.1.1-linux-x86_64/ /usr/local/phantomjs
  • ln -s /usr/local/phantomjs/bin/phantomjs /usr/bin/
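
Selenium locates PhantomJS through PATH, so it may help to confirm that the symlink is actually visible before running the test further down. The following is a minimal sketch that works on both Python 2.7 and 3; find_on_path is a hypothetical helper written for this example, not a Selenium or PhantomJS API.

```python
from __future__ import print_function
import os


def find_on_path(program, path=None):
    """Return the full path of an executable `program` found on PATH, else None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # this sketch ignores empty PATH entries
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    location = find_on_path("phantomjs")
    print(location if location else "phantomjs not found on PATH")
```

If nothing is found, check the symlink in /usr/bin or add /usr/local/phantomjs/bin to PATH.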

Test Scrapy

  • scrapy shell www.baidu.com

Test Selenium and PhantomJS

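from selenium import webdriver
# webdriver.PhantomJS is available only in Selenium releases before 4.0
driver = webdriver.PhantomJS()  # phantomjs must be on PATH
try:
    driver.get("http://hotel.qunar.com/")
    data = driver.title
    print(data)  # parenthesized form works on Python 2 and 3
finally:
    driver.quit()  # stop the background phantomjs process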

References:

http://www.cnblogs.com/xiaoruoen/archive/2013/02/27/2933854.html

http://blog.csdn.net/diaoruiqing/article/details/8700533

http://blog.csdn.net/liuxiao723846/article/details/51477266

http://www.linuxidc.com/Linux/2016-11/137668.htm

https://stackoverflow.com/questions/42731760/attributeerror-module-object-has-no-attribute-op-no-tlsv1-1/43220861

http://www.cnblogs.com/zengguowang/p/6911812.html

http://blog.csdn.net/feifeilyj/article/details/52678011

http://www.cnblogs.com/zzhzhao/p/5380376.html

http://www.cnblogs.com/luxiaojun/p/6144748.html