Hadoop Local Pseudo-Distributed Setup in Practice

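Hadoop is a distributed systems infrastructure that includes a distributed computing framework (MapReduce) and a distributed file system (HDFS). Local pseudo-distributed mode runs all Hadoop daemons on a single machine so that developers can test applications locally. It is not recommended for production environments.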

Lab environment:

  • One CentOS 7 host

Hostname    IP address
Master      10.30.59.130

Software requirements:

Software    Version
JDK         8u77
Hadoop      2.6.0

  • Conventions:
    • Installation packages are in /opt/soft
    • Software is installed under /opt

Prerequisites:

  • Passwordless SSH login must be configured for both the localhost and 0.0.0.0 addresses
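
The prerequisite above can be met roughly as follows. This is a minimal sketch, not part of the original procedure: the helper name setup_passwordless_ssh is hypothetical, and it assumes an RSA key without a passphrase is acceptable on a test machine. The 0.0.0.0 address matters because the start scripts in Hadoop 2.x typically launch the SecondaryNameNode over SSH on that address.

```shell
# Hypothetical helper: create a passphrase-less RSA key (if none exists)
# and authorize it for logins to this same account.
setup_passwordless_ssh() {
  ssh_dir="${1:-$HOME/.ssh}"
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
  if [ ! -f "$ssh_dir/id_rsa" ]; then
    ssh-keygen -q -t rsa -N "" -f "$ssh_dir/id_rsa" || return 1
  fi
  # Append the public key only if it is not already authorized.
  touch "$ssh_dir/authorized_keys"
  grep -qxF -- "$(cat "$ssh_dir/id_rsa.pub")" "$ssh_dir/authorized_keys" \
    || cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
  chmod 600 "$ssh_dir/authorized_keys"
}

# Usage on the lab host (run as the user that will start Hadoop):
#   setup_passwordless_ssh
#   ssh -o StrictHostKeyChecking=no localhost true
#   ssh -o StrictHostKeyChecking=no 0.0.0.0 true
```

The two ssh calls in the usage comment also record the host keys, so the start scripts are not interrupted later by a host-key confirmation prompt.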

Steps:

1. Disable the firewall and SELinux

[root@localhost ~]# systemctl stop firewalld
[root@localhost ~]# systemctl disable firewalld
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
[root@localhost ~]# setenforce 0
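
Note that setenforce 0 only switches SELinux to permissive mode until the next reboot. If the setting should persist, the SELINUX= line in /etc/selinux/config has to change as well. The sketch below shows one way to do that. The helper name set_selinux_mode is hypothetical, and the optional second argument exists only so the function can be tried on a copy of the file first.

```shell
# Hypothetical helper: rewrite the SELINUX= line in the SELinux config file.
set_selinux_mode() {
  selinux_mode="$1"                           # enforcing | permissive | disabled
  selinux_conf="${2:-/etc/selinux/config}"
  selinux_tmp=$(mktemp) || return 1
  # SELINUXTYPE= is left untouched because the pattern requires "SELINUX=".
  sed "s/^SELINUX=.*/SELINUX=$selinux_mode/" "$selinux_conf" > "$selinux_tmp" \
    || { rm -f "$selinux_tmp"; return 1; }
  cat "$selinux_tmp" > "$selinux_conf"        # overwrite in place, keeping the original file
  rm -f "$selinux_tmp"
}

# Usage (as root): set_selinux_mode permissive
```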

2. Extract the components

[root@localhost ~]# cd /opt 
[root@localhost opt]# tar -xzvf soft/jdk-8u77-linux-x64.tar.gz
[root@localhost opt]# tar -xzvf soft/hadoop-2.6.0.tar.gz
[root@localhost opt]# mv jdk1.8.0_77/ jdk
[root@localhost opt]# mv hadoop-2.6.0/ hadoop

3. Write the configuration files

[root@localhost opt]# vi hadoop/etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <property>
    <!-- Default number of replicas per block; 1 because there is only one DataNode -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <!-- NameNode metadata directories; two paths are listed for redundancy -->
    <name>dfs.namenode.name.dir</name>
    <value>file:///opt/hadoop-repo/name1,file:///opt/hadoop-repo/name2</value>
  </property>
  <property>
    <!-- DataNode block storage directories; blocks are spread across the listed paths, not mirrored -->
    <name>dfs.datanode.data.dir</name>
    <value>file:///opt/hadoop-repo/data1,file:///opt/hadoop-repo/data2</value>
  </property>
</configuration>
[root@localhost opt]# vi hadoop/etc/hadoop/core-site.xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <property>
    <!-- URI of the default file system (the NameNode address) -->
    <name>fs.defaultFS</name>
    <value>hdfs://127.0.0.1:9000</value>
  </property>
  <property>
    <!-- Base directory for temporary files -->
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop-repo/tmp</value>
  </property>
</configuration>
[root@localhost opt]# cp hadoop/etc/hadoop/mapred-site.xml.template hadoop/etc/hadoop/mapred-site.xml
[root@localhost opt]# vi hadoop/etc/hadoop/mapred-site.xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <property>
    <!-- Framework that runs MapReduce jobs -->
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <!-- Staging directory used when submitting jobs -->
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/opt/hadoop-repo/history</value>
  </property>
</configuration>
[root@localhost opt]# vi hadoop/etc/hadoop/yarn-site.xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <property>
    <!-- Auxiliary service run by the NodeManager; MapReduce needs the shuffle service -->
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <!-- Enable log aggregation -->
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
</configuration>
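
Typos in these four files are easy to miss, so it can help to read a few key values back before moving on. The sketch below is one way to do that with awk alone. The helper name get_hadoop_prop is hypothetical, and it assumes each <name> and <value> element sits on a single line, as in the files above. Once the environment variables from the next step are in place, hdfs getconf -confKey fs.defaultFS should report the value Hadoop itself resolves.

```shell
# Hypothetical helper: print the <value> that follows a given <name>
# in a Hadoop *-site.xml file.
get_hadoop_prop() {
  # $1 = config file, $2 = property name
  awk -v key="$2" '
    /<name>/ {
      name = $0
      sub(/.*<name>/, "", name); sub(/<\/name>.*/, "", name)
    }
    /<value>/ {
      if (name == key) {
        value = $0
        sub(/.*<value>/, "", value); sub(/<\/value>.*/, "", value)
        print value
        exit
      }
    }
  ' "$1"
}

# Usage:
#   get_hadoop_prop /opt/hadoop/etc/hadoop/core-site.xml fs.defaultFS
#   get_hadoop_prop /opt/hadoop/etc/hadoop/hdfs-site.xml dfs.replication
```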

4. Configure environment variables and apply them immediately

[root@localhost opt]# vi /etc/profile.d/hadoop-etc.sh
export JAVA_HOME=/opt/jdk
export PATH=$PATH:$JAVA_HOME/bin

export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

[root@localhost opt]# source /etc/profile.d/hadoop-etc.sh
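
To confirm that the variables took effect in the current shell, a quick check along the following lines may help. It is only a sketch: the helper names path_contains and check_hadoop_env are hypothetical, and the check verifies PATH entries only, not that the binaries actually run.

```shell
# Hypothetical helper: succeed only if directory $1 is an entry in PATH.
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: report the first expected directory missing from PATH.
check_hadoop_env() {
  for env_dir in "$JAVA_HOME/bin" "$HADOOP_HOME/bin" "$HADOOP_HOME/sbin"; do
    path_contains "$env_dir" || { echo "missing from PATH: $env_dir"; return 1; }
  done
  echo "environment looks consistent"
}

# Usage: check_hadoop_env
```

Running java -version and hadoop version afterwards confirms that the binaries themselves start.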

5. Format HDFS

[root@localhost opt]# hdfs namenode -format
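
Formatting is meant to be done once. Reformatting an existing NameNode discards the file system metadata and assigns a new cluster ID, after which DataNode directories written under the old ID usually fail to register. A guard like the one below avoids doing this by accident. It is a sketch: the helper names are hypothetical, and it relies on the current/VERSION file that formatting writes into each NameNode directory.

```shell
# Hypothetical helper: a NameNode directory counts as formatted once it
# contains current/VERSION, which `hdfs namenode -format` writes.
namenode_is_formatted() {
  [ -f "$1/current/VERSION" ]
}

# Hypothetical helper: format only when the given name directory is still empty.
format_once() {
  name_dir="$1"
  if namenode_is_formatted "$name_dir"; then
    echo "already formatted: $name_dir (skipping)"
  else
    hdfs namenode -format
  fi
}

# Usage: format_once /opt/hadoop-repo/name1
```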

6. Start Hadoop

[root@localhost opt]# start-dfs.sh 
[root@localhost opt]# start-yarn.sh
[root@localhost opt]# mr-jobhistory-daemon.sh start historyserver

Verification:

[root@localhost opt]# jps
14400 ResourceManager
14673 NodeManager
14867 JobHistoryServer
14903 Jps
12568 NameNode
12843 SecondaryNameNode
12686 DataNode
[root@localhost opt]# hdfs dfsadmin -report
Configured Capacity: 62229848064 (57.96 GB)
Present Capacity: 57338130432 (53.40 GB)
DFS Remaining: 57155321856 (53.23 GB)
DFS Used: 182808576 (174.34 MB)
DFS Used%: 0.32%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Live datanodes (1):

Name: 127.0.0.1:50010 (localhost)
Hostname: localhost
Decommission Status : Normal
Configured Capacity: 62229848064 (57.96 GB)
DFS Used: 182808576 (174.34 MB)
Non DFS Used: 4891717632 (4.56 GB)
DFS Remaining: 57155321856 (53.23 GB)
DFS Used%: 0.29%
DFS Remaining%: 91.85%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Jun 06 19:41:43 CST 2019
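
The jps listing above shows six Hadoop daemons: NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager and JobHistoryServer. Rather than comparing the list by eye, the check can be scripted. The following is a minimal sketch with a hypothetical helper name, missing_daemons, which reads jps output on standard input.

```shell
# Hypothetical helper: read `jps` output on stdin and print every expected
# daemon that does not appear in it (no output means all six are running).
missing_daemons() {
  jps_output=$(cat)
  for daemon in NameNode DataNode SecondaryNameNode \
                ResourceManager NodeManager JobHistoryServer; do
    # Exact match on the second column, so NameNode does not match SecondaryNameNode.
    printf '%s\n' "$jps_output" \
      | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' \
      || printf '%s\n' "$daemon"
  done
}

# Usage: jps | missing_daemons
```

If a daemon is reported missing, its log file under $HADOOP_HOME/logs is the first place to look.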


At this point, uploading files to HDFS and running MapReduce jobs both work normally.
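
One way to exercise both paths is a short smoke test like the sketch below. The function name hadoop_smoke_test, the /smoke-test HDFS path and the pi arguments are arbitrary choices made for illustration. The examples jar path assumes the stock Hadoop 2.6.0 layout under $HADOOP_HOME.

```shell
# Hypothetical smoke test: upload a small file to HDFS, read it back,
# then run the bundled pi estimator as a MapReduce job on YARN.
hadoop_smoke_test() {
  examples_jar="${HADOOP_HOME:-/opt/hadoop}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar"
  hdfs dfsadmin -safemode wait || return 1              # block until the NameNode leaves safe mode
  hdfs dfs -mkdir -p /smoke-test || return 1
  hdfs dfs -put -f /etc/hosts /smoke-test/hosts || return 1
  hdfs dfs -cat /smoke-test/hosts > /dev/null || return 1
  hadoop jar "$examples_jar" pi 2 10                    # 2 map tasks, 10 samples each
}

# Usage: hadoop_smoke_test && echo "HDFS and MapReduce are working"
```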

The pseudo-distributed setup is now complete.