Installing and Deploying Hadoop 2.2 on CentOS 6

Environment Preparation

1. Operating system: CentOS 6.0, 64-bit

2. Hadoop version: hadoop-2.2.0

The installation and configuration steps are as follows:

1. Host and IP assignment

IP address       Hostname   Role
192.168.1.112    hadoop1    namenode
192.168.1.113    hadoop2    datanode
192.168.1.114    hadoop3    datanode
192.168.1.115    hadoop4    datanode

2. Change the hostname on all four hosts; hadoop1 is shown as an example:

1) [root@hadoop1 ~]# hostname hadoop1

2) [root@hadoop1 ~]# vi /etc/sysconfig/network — change the HOSTNAME value and save
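For reference, the same change can be made non-interactively; this is a minimal sketch assuming the default CentOS 6 file layout:

# make the change persistent across reboots (CentOS 6 stores it in /etc/sysconfig/network)
sed -i 's/^HOSTNAME=.*/HOSTNAME=hadoop1/' /etc/sysconfig/network

# apply it to the running system immediately
hostname hadoop1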

3. Install the JDK on all four hosts.

4. Disable the firewall. Switch to the root user and run: chkconfig iptables off
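Note that chkconfig only affects the next boot; to stop the firewall that is already running, the usual pair of commands on CentOS 6 is:

service iptables stop      # stop the firewall for the current session
chkconfig iptables off     # keep it disabled after a reboot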

5. On every host, add name resolution entries to /etc/hosts:

192.168.1.112 hadoop1
192.168.1.113 hadoop2
192.168.1.114 hadoop3
192.168.1.115 hadoop4
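A quick sanity check after editing, run for example from hadoop1 (hostnames as in the table above):

ping -c 1 hadoop2    # should resolve to 192.168.1.113 and get a reply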

6. Configure passwordless SSH from the namenode to the datanodes

1) On the namenode machine, run the following command as the hadoop user:

ssh-keygen -t rsa

Press Enter to accept the default for every prompt.

2) Append the public key to the local authorized keys file:

cat ~/.ssh/id_rsa.pub>>~/.ssh/authorized_keys
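sshd is strict about key file permissions; if passwordless login later fails, tightening them is a common fix (an extra precaution, not part of the original steps):

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys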

3) Copy the public key into the authorized keys file on each datanode node:

Reference: http://www.cnblogs.com/zhwl/p/3664178.html

scp ~/.ssh/authorized_keys hadoop@192.168.1.113:/home/hadoop/.ssh/authorized_keys

scp ~/.ssh/authorized_keys hadoop@192.168.1.114:/home/hadoop/.ssh/authorized_keys

scp ~/.ssh/authorized_keys hadoop@192.168.1.115:/home/hadoop/.ssh/authorized_keys

Because this is the first transfer to each host, the system prompts for the hadoop user's password; just enter it.

4) Verify that you can log in to the datanode nodes without a password:

[hadoop@hadoop1 ~]$ ssh 192.168.1.113

If login succeeds without a password prompt, the shell shows:

[hadoop@hadoop2 ~]$

Passwordless login is now verified; the other datanode nodes can be checked the same way.

7. Install Hadoop 2.2

1) Unpack hadoop-2.2.0.tar.gz:

tar -zxf hadoop-2.2.0.tar.gz

By default it unpacks into the current directory; here it is unpacked under /home/hadoop/.
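Equivalently, the target directory can be given explicitly (a sketch; the tarball is assumed to be in the current directory):

tar -zxf hadoop-2.2.0.tar.gz -C /home/hadoop/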

2) Edit the Hadoop configuration files

Open hadoop-2.2.0/etc/hadoop and edit the configuration files inside:

a) hadoop-env.sh: find JAVA_HOME and set it to the actual JDK path.

b) yarn-env.sh: likewise find JAVA_HOME and set it to the actual path, as in the sketch below.
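In both files the line to change looks like the following; /usr/java/jdk1.7.0_45 is only a placeholder, use whatever path the JDK was actually installed to:

export JAVA_HOME=/usr/java/jdk1.7.0_45   # replace with your real JDK install path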

c) slaves: this file lists all datanode nodes so the namenode can find them; in this example it contains:

hadoop2

hadoop3

hadoop4

d) core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop1:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop-2.2.0/mytmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>hadoop1</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
</configuration>

e) hdfs-site.xml

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
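The daemons will usually create the name and data directories configured above on their own, but creating them up front as the hadoop user is a cheap way to catch permission problems early (an extra precaution, not in the original steps):

mkdir -p /home/hadoop/name     # on the namenode (dfs.namenode.name.dir)
mkdir -p /home/hadoop/data     # on each datanode (dfs.datanode.data.dir)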

f) mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop1:19888</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/mr-history/tmp</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/mr-history/done</value>
  </property>
</configuration>
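The JobHistory server configured above is not launched by start-all.sh; if the history web UI (port 19888) is needed, it is started separately from the sbin directory, for example:

$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver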

g) yarn-site.xml

<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop1:18040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop1:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop1:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hadoop1:18041</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hadoop1:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/home/hadoop/mynode/my</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/home/hadoop/mynode/logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.log.retain-seconds</name>
    <value>10800</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
    <value>logs</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>-1</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-check-interval-seconds</name>
    <value>-1</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

3) Once the files above are configured, copy the hadoop-2.2.0 directory to the same path on the other datanode machines (an example command is sketched below).
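A minimal sketch using scp over the passwordless SSH set up earlier, run as the hadoop user on hadoop1 (repeat for hadoop3 and hadoop4):

scp -r /home/hadoop/hadoop-2.2.0 hadoop@hadoop2:/home/hadoop/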

Then edit /etc/profile to set the Hadoop environment variables; switch to the root user first.

Add the following at the end of the file:

#hadoop variable settings

export HADOOP_HOME=/home/hadoop/hadoop-2.2.0

export HADOOP_COMMON_HOME=$HADOOP_HOME

export HADOOP_HDFS_HOME=$HADOOP_HOME

export HADOOP_MAPRED_HOME=$HADOOP_HOME

export HADOOP_YARN_HOME=$HADOOP_HOME

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/lib

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native

Save the file after adding these lines.
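To pick up the new variables in the current shell without logging out, the profile can simply be re-sourced:

source /etc/profile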

A special note on the last two lines: some articles omit them, and without them starting Hadoop 2.2 produced the following errors:

Hadoop 2.2.0 - warning: You have loaded library /home/hadoop/2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard.

Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Starting namenodes on [Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /home/hadoop/2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.

It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.

localhost]

sed: -e expression #1, char 6: unknown option to `s'

HotSpot(TM): ssh: Could not resolve hostname HotSpot(TM): Name or service not known

64-Bit: ssh: Could not resolve hostname 64-Bit: Name or service not known

Java: ssh: Could not resolve hostname Java: Name or service not known

Server: ssh: Could not resolve hostname Server: Name or service not known

VM: ssh: Could not resolve hostname VM: Name or service not known

All datanode nodes also need these additions to their environment variables. After the configuration is complete, reboot (or re-source the profile as above) on every node so the settings take effect.

8. Starting and stopping Hadoop

1) Format the namenode. This initialization is only needed the first time; it is not repeated afterwards:

cd /home/hadoop/hadoop-2.2.0/bin

hdfs namenode -format

2) Start

To start: on the namenode machine, go to the sbin directory of the install, /home/hadoop/hadoop-2.2.0/sbin,

and run the start-all.sh script.
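A quick way to confirm the daemons came up is jps; with the layout in this article the expected processes are roughly as follows (process IDs will differ):

# on hadoop1 (namenode)
jps
# expect: NameNode, SecondaryNameNode, ResourceManager

# on hadoop2/3/4 (datanodes)
jps
# expect: DataNode, NodeManager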

3) Stop

On the namenode node, go to /home/hadoop/hadoop-2.2.0/sbin and run:

stop-all.sh

9. Web interface addresses

After Hadoop is started, the following addresses can be opened in a browser:

http://hadoop1:50070   (HDFS NameNode UI)

http://hadoop1:8088    (YARN ResourceManager UI)

http://hadoop1:19888   (MapReduce JobHistory UI)
