Hadoop Fully Distributed Cluster Setup in Practice

Hadoop is a foundational architecture for distributed systems that bundles a distributed computation framework (MapReduce) with a distributed file system (HDFS). Fully distributed mode spreads the daemons across multiple machines; it is the deployment mode used on real clusters, in contrast to the single-machine standalone and pseudo-distributed modes.

Lab environment:

  • Three CentOS 7 hosts:

Hostname  IP address
master    10.30.59.130
slave1    10.30.59.131
slave2    10.30.59.132

Software requirements:

Software  Version
JDK       8u77
Hadoop    2.6.0

  • Conventions:
    • Installation packages are placed in /opt/soft
    • Components are installed under /opt

Prerequisites:
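The later steps implicitly depend on hostname resolution and passwordless SSH: start-dfs.sh and start-yarn.sh ssh into every worker, and the configuration files refer to the nodes by name. A minimal sketch, assuming everything runs as root (IP addresses taken from the table above):

[root@master ~]# cat >> /etc/hosts << EOF
> 10.30.59.130 master
> 10.30.59.131 slave1
> 10.30.59.132 slave2
> EOF
[root@master ~]# ssh-keygen -t rsa    # press Enter at every prompt
[root@master ~]# for h in master slave1 slave2; do ssh-copy-id root@$h; done

The same /etc/hosts entries are needed on slave1 and slave2, and slave1 needs the same key setup if start-yarn.sh is run there as in step 6.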

Lab steps:

1. Disable the firewall and SELinux

[root@master ~]# systemctl stop firewalld
[root@master ~]# systemctl disable firewalld
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
[root@master ~]# setenforce 0

This must be done on all three nodes.
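Note that setenforce 0 only disables SELinux until the next reboot. To keep it disabled permanently, also update /etc/selinux/config on each node (a standard CentOS step, sketched here):

[root@master ~]# sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config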

2. Unpack the components

[root@master ~]# cd /opt 
[root@master opt]# tar -xzvf soft/jdk-8u77-linux-x64.tar.gz
[root@master opt]# tar -xzvf soft/hadoop-2.6.0.tar.gz
[root@master opt]# mv jdk1.8.0_77/ jdk
[root@master opt]# mv hadoop-2.6.0/ hadoop

3. Fill in the configuration files

[root@master opt]# vi hadoop/etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <property>
    <!-- RPC address and port of the NameNode -->
    <name>dfs.namenode.rpc-address</name>
    <value>master:9000</value>
  </property>
  <property>
    <!-- HTTP address and port of the NameNode web UI -->
    <name>dfs.namenode.http-address</name>
    <value>master:50070</value>
  </property>
  <property>
    <!-- HTTP address and port of the SecondaryNameNode (a checkpointing
         helper, not a hot standby) -->
    <name>dfs.namenode.secondary.http-address</name>
    <value>slave1:50070</value>
  </property>
  <property>
    <!-- Local paths for NameNode metadata; two paths for redundancy -->
    <name>dfs.namenode.name.dir</name>
    <value>file:///opt/hadoop-repo/name1,file:///opt/hadoop-repo/name2</value>
  </property>
  <property>
    <!-- Local paths for DataNode block storage; two paths here spread I/O
         (redundancy comes from HDFS replication, not from these paths) -->
    <name>dfs.datanode.data.dir</name>
    <value>file:///opt/hadoop-repo/data1,file:///opt/hadoop-repo/data2</value>
  </property>
</configuration>
[root@master opt]# vi hadoop/etc/hadoop/core-site.xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <property>
    <!-- Default file system URI -->
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <!-- Base directory for Hadoop temporary files -->
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop-repo/tmp</value>
  </property>
</configuration>
[root@master opt]# cp hadoop/etc/hadoop/mapred-site.xml.template hadoop/etc/hadoop/mapred-site.xml
[root@master opt]# vi hadoop/etc/hadoop/mapred-site.xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <property>
    <!-- Run MapReduce on YARN -->
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <!-- Staging directory used while jobs are being submitted -->
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/opt/hadoop-repo/history</value>
  </property>
</configuration>
[root@master opt]# vi hadoop/etc/hadoop/yarn-site.xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <property>
    <!-- Host that runs the ResourceManager; the RM service addresses are
         derived from this hostname -->
    <name>yarn.resourcemanager.hostname</name>
    <value>slave1</value>
  </property>
  <property>
    <!-- Auxiliary service NodeManagers must run for MapReduce shuffle -->
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <!-- Enable log aggregation -->
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
</configuration>
[root@master opt]# vi hadoop/etc/hadoop/slaves
master
slave1
slave2
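The slaves file tells start-dfs.sh and start-yarn.sh where to launch DataNodes and NodeManagers; listing master here means it serves as a worker as well, which matches the jps output in the verification section.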

4. Configure environment variables and apply them immediately

[root@master opt]# vi /etc/profile.d/hadoop-etc.sh
export JAVA_HOME=/opt/jdk
export PATH=$PATH:$JAVA_HOME/bin

export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

[root@master opt]# source /etc/profile.d/hadoop-etc.sh
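These environment variables, together with the unpacked /opt/jdk and /opt/hadoop trees and the configuration files above, must be present on all three nodes. A sketch of one way to distribute them (rsync would work equally well):

[root@master opt]# for h in slave1 slave2; do
>   scp -r /opt/jdk /opt/hadoop root@$h:/opt/
>   scp /etc/profile.d/hadoop-etc.sh root@$h:/etc/profile.d/
> done

If the start scripts later complain that JAVA_HOME is not set, set it explicitly in hadoop/etc/hadoop/hadoop-env.sh on every node, because non-login SSH sessions do not source /etc/profile.d.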

5. Format HDFS

[root@master opt]# hdfs namenode -format

Formatting is only required on master.

6. Start Hadoop
  • On master:
[root@master opt]# start-dfs.sh 
  • On slave1, where the ResourceManager runs:
[root@slave1 opt]# start-yarn.sh 
  • On all three nodes:
[root@master opt]# mr-jobhistory-daemon.sh start historyserver 

Verification:

[root@master opt]# jps
14310 NameNode
15046 NodeManager
15159 Jps
14059 QuorumPeerMain
14587 DataNode

[root@slave1 ~]# jps
14501 SecondaryNameNode
14699 Jps
14588 NodeManager
14269 QuorumPeerMain
14415 DataNode

[root@slave2 ~]# jps
13120 NodeManager
13218 Jps
13016 DataNode
12863 QuorumPeerMain

[root@master opt]# hdfs dfsadmin -report
Configured Capacity: 93344772096 (86.93 GB)
Present Capacity: 86899245056 (80.93 GB)
DFS Remaining: 86899232768 (80.93 GB)
DFS Used: 12288 (12 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Live datanodes (3):

Name: 10.30.59.132:50010 (slave2)
Hostname: slave2
Decommission Status : Normal
Configured Capacity: 31114924032 (28.98 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 2017771520 (1.88 GB)
DFS Remaining: 29097148416 (27.10 GB)
DFS Used%: 0.00%
DFS Remaining%: 93.52%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Jun 07 02:32:49 CST 2019


Name: 10.30.59.130:50010 (master)
Hostname: master
Decommission Status : Normal
Configured Capacity: 31114924032 (28.98 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 2409652224 (2.24 GB)
DFS Remaining: 28705267712 (26.73 GB)
DFS Used%: 0.00%
DFS Remaining%: 92.26%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Jun 07 02:32:49 CST 2019


Name: 10.30.59.131:50010 (slave1)
Hostname: slave1
Decommission Status : Normal
Configured Capacity: 31114924032 (28.98 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 2018103296 (1.88 GB)
DFS Remaining: 29096816640 (27.10 GB)
DFS Used%: 0.00%
DFS Remaining%: 93.51%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Jun 07 02:32:50 CST 2019

The web UIs also load correctly: the NameNode UI at http://master:50070 and the ResourceManager UI at http://slave1:8088.
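As a final end-to-end check, the bundled example job can be run from master; the jar path below is the stock location inside the Hadoop 2.6.0 tarball:

[root@master opt]# hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 10

If the job finishes and prints an estimate of pi, HDFS, YARN, and MapReduce are all working together.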