Installing Distributed Hadoop 2.6.0 + HBase + Hive on CentOS 7.5 (CDH 5.14.2 Offline Tarball Install)

Tags: Hadoop

Host environment

Basic configuration:

Nodes: 5
OS: CentOS Linux release 7.5.1804 (Core)
Memory: 8 GB

Recommended configuration:

Nodes: 5
OS: CentOS Linux release 7.5.1804 (Core)
Memory: 16 GB

Note: in production, size memory to actual requirements. If you are only building VMs in VMware, 1-2 GB per host is enough.

Software environment

Software     Version
jdk          jdk-8u172-linux-x64
hadoop       hadoop-2.6.0-cdh5.14.2
zookeeper    zookeeper-3.4.5-cdh5.14.2
hbase        hbase-1.2.0-cdh5.14.2
hive         hive-1.1.0-cdh5.14.2

Note: all CDH5 software can be downloaded from http://archive.cloudera.com/cdh5/cdh/5/
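The CDH tarballs can usually be pulled straight from that archive. As a hedged example (verify the exact file names against the archive listing before downloading; the JDK comes from Oracle, not from this archive):

wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.14.2.tar.gz
wget http://archive.cloudera.com/cdh5/cdh/5/zookeeper-3.4.5-cdh5.14.2.tar.gz
wget http://archive.cloudera.com/cdh5/cdh/5/hbase-1.2.0-cdh5.14.2.tar.gz
wget http://archive.cloudera.com/cdh5/cdh/5/hive-1.1.0-cdh5.14.2.tar.gz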

Host planning

The 5 nodes are assigned roles as follows:

Role                  CDHNode1         CDHNode2         CDHNode3         CDHNode4         CDHNode5
IP                    192.168.223.201  192.168.223.202  192.168.223.203  192.168.223.204  192.168.223.205
namenode              yes              yes              no               no               no
datanode              no               no               yes              yes              yes
resourcemanager       yes              yes              no               no               no
journalnode           yes              yes              yes              yes              yes
zookeeper             yes              yes              yes              no               no
hmaster (hbase)       yes              yes              no               no               no
regionserver (hbase)  no               no               yes              yes              yes
hive (hiveserver2)    no               no               yes              yes              yes

Note: keep the JournalNode and ZooKeeper counts odd, and use at least 3 of each if you need high availability; both rely on a majority quorum, so 2n+1 nodes tolerate n failures and an even count adds no extra fault tolerance. More detail will follow later.

Pre-installation host preparation

  1. Disable SELinux on all nodes
sed -i 's/^SELINUX=.*$/SELINUX=disabled/g' /etc/selinux/config 
setenforce 0
  2. Disable the firewall (firewalld or iptables) on all nodes
systemctl disable firewalld;  
systemctl stop firewalld;
systemctl disable iptables;  
systemctl stop iptables;
  3. Enable time synchronization on all nodes via ntpdate
echo "*/5 * * * * /usr/sbin/ntpdate asia.pool.ntp.org | logger -t NTP" >> /var/spool/cron/root
  4. Set the locale and timezone on all nodes
echo 'export TZ=Asia/Shanghai' >> /etc/profile
echo 'export LANG=en_US.UTF-8' >> /etc/profile  # the original locale value was lost; substitute whatever encoding your environment needs
. /etc/profile
  5. Add the hadoop user on all nodes
useradd -m hadoop
echo '123456' | passwd --stdin hadoop
# set the shell prompt (PS1) for the hadoop user
su - hadoop
echo 'export PS1="\u@\h:\$PWD>"' >> ~/.bash_profile
echo "alias mv='mv -i'
alias rm='rm -i'" >> ~/.bash_profile
. ~/.bash_profile
  6. Set up passwordless SSH between the hadoop users. First generate a key pair on CDHNode1
su - hadoop
ssh-keygen -t rsa       # press Enter at every prompt to generate the hadoop user's key pair
cd .ssh
vi id_rsa.pub  # remove the trailing hostname comment (hadoop@CDHNode1) from the public key
cat id_rsa.pub > authorized_keys
chmod 600 authorized_keys

Zip up the .ssh directory

su - hadoop
zip -r ssh.zip .ssh

Then distribute ssh.zip to the hadoop user's home directory on CDHNode2-5 and unzip it there; passwordless login is then in place. A sketch of this step follows.
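A minimal sketch of the distribution, assuming the hadoop password is still usable for the initial copy and that unzip is installed on every node:

for i in 2 3 4 5; do
  scp ~/ssh.zip CDHNode$i:~/                         # prompts for the password; keys are not in place yet
  ssh CDHNode$i 'unzip -o ssh.zip && rm -f ssh.zip'  # unpacks .ssh into the home directory
done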

  7. Tune kernel parameters, the maximum number of open files, the maximum number of processes, and similar limits. The right values differ from host to host, so no specific recipe is given here, but if the Hadoop environment is headed for production you must tune them: Linux defaults can leave the cluster performing poorly. An illustrative starting point follows this list.
  8. On the datanode hosts (CDHNode3-5), mount a 15 GB data disk at /chunk1 and grant the mounted directory to the hadoop user, as sketched after this list.
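For step 7, a purely illustrative starting point (commonly cited values, not the author's recipe; adjust to your own hardware):

# /etc/security/limits.conf -- raise file and process limits for the hadoop user
hadoop soft nofile 65536
hadoop hard nofile 65536
hadoop soft nproc  65536
hadoop hard nproc  65536

# lower swappiness, a common recommendation for Hadoop nodes
echo 'vm.swappiness = 1' >> /etc/sysctl.conf
sysctl -p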

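For step 8, a hedged sketch of the mount on CDHNode3-5; the device name /dev/sdb1 is an assumption for your environment:

mkfs.xfs /dev/sdb1                                    # format the data disk (assumed device name)
mkdir -p /chunk1
echo '/dev/sdb1 /chunk1 xfs defaults,noatime 0 0' >> /etc/fstab
mount /chunk1
chown -R hadoop:hadoop /chunk1                        # hand the mount point to the hadoop user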
Note: all of the steps above are performed as root. The operating system is now ready; the installation proper starts below. Unless otherwise noted, every subsequent step is performed as the hadoop user.

Install JDK 1.8

Every node needs the JDK, installed the same way. Unpack jdk-8u172-linux-x64.tar.gz:

tar zxvf jdk-8u172-linux-x64.tar.gz
mkdir -p /home/hadoop/app
mv jdk1.8.0_172 /home/hadoop/app/jdk    # the tarball unpacks to jdk1.8.0_172, not the archive name
rm -f jdk-8u172-linux-x64.tar.gz

Configure environment variables: vi ~/.bash_profile and add the following:

#java
export JAVA_HOME=/home/hadoop/app/jdk
export CLASSPATH=.:$JAVA_HOME/lib:$CLASSPATH
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin

Load the environment variables

. ~/.bash_profile

Verify the installation with java -version

java version "1.8.0_172"
Java(TM) SE Runtime Environment (build 1.8.0_172-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.172-b11, mixed mode)

If you see output like the above, the installation succeeded.

Install ZooKeeper

Install on CDHNode1 first.

Unpack zookeeper-3.4.5-cdh5.14.2.tar.gz:

tar zxvf zookeeper-3.4.5-cdh5.14.2.tar.gz
mv zookeeper-3.4.5-cdh5.14.2 /home/hadoop/app/zookeeper
rm -f zookeeper-3.4.5-cdh5.14.2.tar.gz

Set environment variables: vi ~/.bash_profile and add the following:

#zk
export ZOOKEEPER_HOME=/home/hadoop/app/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin

Load the environment variables

. ~/.bash_profile

Create the configuration file: vi /home/hadoop/app/zookeeper/conf/zoo.cfg and add the following:

# The number of milliseconds of each tick  
tickTime=2000
# The number of ticks that the initial  
# synchronization phase can take  
initLimit=10
# The number of ticks that can pass between  
# sending a request and getting an acknowledgement  
syncLimit=5
# the directory where the snapshot is stored.  
# do not use /tmp for storage, /tmp here is just  
# example sakes.  
# data directory and transaction log directory
dataDir=/home/hadoop/data/zookeeper/zkdata
dataLogDir=/home/hadoop/data/zookeeper/zkdatalog
# the port at which the clients will connect  
clientPort=2181
# server.<id>=<hostname>:<quorum/sync port between nodes>:<leader election port>
server.1=CDHNode1:2888:3888
server.2=CDHNode2:2888:3888
server.3=CDHNode3:2888:3888
# When membership changes, just add or remove the corresponding server.N
# entries here (every node's config must be updated), then start the added
# node or stop the removed one.
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#  
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance  
#  
# The number of snapshots to retain in dataDir  
#autopurge.snapRetainCount=3  
# Purge task interval in hours  
# Set to "0" to disable auto purge feature  
#autopurge.purgeInterval=1

Create the required directories

mkdir -p /home/hadoop/data/zookeeper/zkdata
mkdir -p /home/hadoop/data/zookeeper/zkdatalog
mkdir -p /home/hadoop/app/zookeeper/logs

Create the myid file: vim /home/hadoop/data/zookeeper/zkdata/myid and add:

1

Note: this number comes from the server.1=CDHNode1:2888:3888 line in zoo.cfg (the 1 after server.), so CDHNode2 gets 2 and CDHNode3 gets 3.

Configure the log directory: vim /home/hadoop/app/zookeeper/libexec/zkEnv.sh and change these settings to:

ZOO_LOG_DIR="$ZOOKEEPER_HOME/logs"
ZOO_LOG4J_PROP="INFO,ROLLINGFILE"

Note: /home/hadoop/app/zookeeper/libexec/zkEnv.sh and /home/hadoop/app/zookeeper/bin/zkEnv.sh have identical content. The startup script /home/hadoop/app/zookeeper/bin/zkServer.sh reads /home/hadoop/app/zookeeper/libexec/zkEnv.sh first and falls back to /home/hadoop/app/zookeeper/bin/zkEnv.sh only when the former does not exist.

vim /home/hadoop/app/zookeeper/conf/log4j.properties and change these settings to:

zookeeper.root.logger=INFO, ROLLINGFILE
zookeeper.log.dir=/home/hadoop/app/zookeeper/logs
log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender

Copy zookeeper to CDHNode2-3

scp ~/.bash_profile CDHNode2:/home/hadoop
scp ~/.bash_profile CDHNode3:/home/hadoop
scp -pr /home/hadoop/app/zookeeper CDHNode2:/home/hadoop/app
scp -pr /home/hadoop/app/zookeeper CDHNode3:/home/hadoop/app
ssh CDHNode2 "mkdir -p /home/hadoop/data/zookeeper/zkdata;mkdir -p /home/hadoop/data/zookeeper/zkdatalog;mkdir -p /home/hadoop/app/zookeeper/logs"
ssh CDHNode2 "echo 2 > /home/hadoop/data/zookeeper/zkdata/myid"
ssh CDHNode3 "mkdir -p /home/hadoop/data/zookeeper/zkdata;mkdir -p /home/hadoop/data/zookeeper/zkdatalog;mkdir -p /home/hadoop/app/zookeeper/logs"
ssh CDHNode3 "echo 3 > /home/hadoop/data/zookeeper/zkdata/myid"

Start zookeeper on all 3 nodes

/home/hadoop/app/zookeeper/bin/zkServer.sh start

Check each node's status

/home/hadoop/app/zookeeper/bin/zkServer.sh status

If one node reports leader and the other two report follower, ZooKeeper is installed successfully.
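To check all three nodes in one pass, a small hedged helper (it assumes nc is installed; stat is one of ZooKeeper's built-in four-letter admin commands):

for h in CDHNode1 CDHNode2 CDHNode3; do
  printf '%s: ' "$h"
  echo stat | nc "$h" 2181 | grep Mode    # prints Mode: leader or Mode: follower
done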

Check the processes

jps

The QuorumPeerMain process is zookeeper.
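With only zookeeper running, the jps output should resemble the following (the PIDs will differ):

1863 QuorumPeerMain
2217 Jps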

Stop zookeeper

/home/hadoop/app/zookeeper/bin/zkServer.sh stop

Install Hadoop

Install on CDHNode1 first, then copy to the other nodes. Unpack hadoop-2.6.0-cdh5.14.2.tar.gz:

tar zxvf hadoop-2.6.0-cdh5.14.2.tar.gz
mv hadoop-2.6.0-cdh5.14.2 /home/hadoop/app/hadoop
rm -f hadoop-2.6.0-cdh5.14.2.tar.gz

Set environment variables: vi ~/.bash_profile and add the following:

#hadoop
HADOOP_HOME=/home/hadoop/app/hadoop
PATH=$HADOOP_HOME/bin:$PATH
export HADOOP_HOME PATH

Load the environment variables

. ~/.bash_profile

Configure HDFS

Edit /home/hadoop/app/hadoop/etc/hadoop/hadoop-env.sh and set:

export JAVA_HOME=/home/hadoop/app/jdk

Configure /home/hadoop/app/hadoop/etc/hadoop/core-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://cluster1</value>
        </property>
        <!-- The default HDFS path; the nameservice is named cluster1 -->
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/home/hadoop/data/tmp</value>
        </property>
        <!-- Hadoop temporary directory; separate multiple directories with commas. The data directory must be created by hand -->
        <property>
                <name>ha.zookeeper.quorum</name>
                <value>CDHNode1:2181,CDHNode2:2181,CDHNode3:2181</value>
        </property>
        <!-- ZooKeeper quorum that manages HDFS HA -->
</configuration>
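As the hadoop.tmp.dir comment notes, the data directory is not created automatically; create it on every node:

mkdir -p /home/hadoop/data/tmp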

Configure /home/hadoop/app/hadoop/etc/hadoop/hdfs-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>3</value>
        </property>
        <!-- Block replication factor: 3 -->
        <property>
                <name>dfs.name.dir</name>
                <value>/home/hadoop/data/hdfs/name</value>
        </property>
        <!-- NameNode metadata directory; separate multiple directories with ',' -->
        <property>
                <name>dfs.data.dir</name>
                <value>/chunk1</value>
        </property>
        <!-- DataNode data directory; separate multiple directories with ',' -->
        <property>
                <name>dfs.permissions</name>
                <value>false</value>
        </property>
        <property>
                <name>dfs.permissions.enabled</name>
                <value>false</value>
        </property>
        <!-- HDFS permission checking is disabled -->
        <property>
                <name>dfs.nameservices</name>
                <value>cluster1</value>
        </property>
        <!-- Nameservice ID; must match the value of fs.defaultFS. With HA there are two namenodes, and cluster1 is the single external entry point -->
        <property>
                <name>dfs.ha.namenodes.cluster1</name>
                <value>CDHNode1,CDHNode2</value>
        </property>
        <!-- The namenodes under nameservice cluster1; these are logical names, arbitrary as long as they do not collide -->
        <property>
                <name>dfs.namenode.rpc-address.cluster1.CDHNode1</name>
                <value>CDHNode1:9000</value>
        </property>
        <!-- RPC address of CDHNode1 -->
        <property>
                <name>dfs.namenode.http-address.cluster1.CDHNode1</name>
                <value>CDHNode1:50070</value>
        </property>
        <!-- HTTP address of CDHNode1 -->
        <property>
                <name>dfs.namenode.rpc-address.cluster1.CDHNode2</name>
                <value>CDHNode2:9000</value>
        </property>
        <!-- RPC address of CDHNode2 -->
        <property>
                <name>dfs.namenode.http-address.cluster1.CDHNode2</name>
                <value>CDHNode2:50070</value>
        </property>
        <!-- HTTP address of CDHNode2 -->
        <property>
                <name>dfs.ha.automatic-failover.enabled</name>
                <value>true</value>
        </property>
        <!-- Enable automatic failover -->
        <property>
                <name>dfs.namenode.shared.edits.dir</name>
                <value>qjournal://CDHNode1:8485;CDHNode2:8485;CDHNode3:8485;CDHNode4:8485;CDHNode5:8485/cluster1</value>
        </property>
        <!-- JournalNode quorum for the shared edits log -->
        <property>
                <name>dfs.client.failover.proxy.provider.cluster1</name>
                <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        </property>
        <!-- Class responsible for performing failover when cluster1's active namenode fails -->
        <property>
                <name>dfs.journalnode.edits.dir</name>
                <value>/home/hadoop/data/journaldata/jn</value>
        </property>
        <!-- Local path where the JournalNodes store the shared edits data -->
        <property>
                <name>dfs.ha.fencing.methods</name>
                <value>shell(/bin/true)</value>