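This installation runs in a development/test environment and covers only YARN, the cluster-wide resource management and scheduling system. HDFS is still first-generation: HDFS Federation and HDFS HA are not deployed here and will be added later.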

OS: CentOS Linux release 6.0 (Final) x86_64

Machines:

dev80.hadoop 192.168.7.80

dev81.hadoop 192.168.7.81

dev82.hadoop 192.168.7.82

dev83.hadoop 192.168.7.83

dev80 acts as the ResourceManager, NameNode, and SecondaryNameNode. The slave nodes (each running a DataNode and a NodeManager) are dev80, dev81, dev82, and dev83.
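
The host names above have to resolve on every node. As a minimal sketch, assuming the names are resolved through /etc/hosts rather than DNS (the post does not say which), the entries could be generated like this:

```shell
#!/usr/bin/env bash
# Print /etc/hosts lines for the four nodes listed above.
# Assumption: names are resolved through /etc/hosts, not DNS.
hosts_entries() {
  for i in 80 81 82 83; do
    printf '192.168.7.%s dev%s.hadoop\n' "$i" "$i"
  done
}

# Review the output first, then append it as root on every node, e.g.:
#   hosts_entries >> /etc/hosts
hosts_entries
```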

First install the JDK, and make sure ssh from dev80 to every slave node works.
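
One possible way to set up key-based ssh is sketched below. It only prints the commands so they can be reviewed before running. The key path ~/.ssh/id_rsa and the use of the same user account on every node are assumptions; port 58422 is taken from the HADOOP_SSH_OPTS setting used later in this post.

```shell
#!/usr/bin/env bash
# Print the commands that would enable key-based ssh from dev80 to each slave.
# Assumptions: default RSA key path, same user account on all nodes.
SSH_PORT=58422
SLAVES="dev80.hadoop dev81.hadoop dev82.hadoop dev83.hadoop"

ssh_setup_commands() {
  # Generate a key pair without a passphrase.
  echo "ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa"
  # Append the public key to authorized_keys on every slave.
  for host in $SLAVES; do
    echo "cat ~/.ssh/id_rsa.pub | ssh -p $SSH_PORT $host 'mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'"
  done
}

ssh_setup_commands
```

Each printed line can be pasted into a shell on dev80; the first login to each host will still ask for a password.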

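Download the 2.0.5-alpha release from the Hadoop website. At the time of writing it is the latest packaged release; the beta has already been branched from trunk, but you have to build it yourself.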

wget http://apache.fayea.com/apache-mirror/hadoop/common/hadoop-2.0.5-alpha/hadoop-2.0.5-alpha.tar.gz
tar xzvf hadoop-2.0.5-alpha.tar.gz

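After unpacking, you will find that the directory layout has changed a lot from Hadoop 1.0 and now resembles the Linux root filesystem. Client commands live in bin, while the administrator-side daemon scripts live in sbin (system binaries). Configuration files are all under etc/hadoop, which gains yarn-site.xml and yarn-env.sh on top of the existing files. To start YARN, use sbin/yarn-daemon.sh, or sbin/yarn-daemons.sh to start a service on all slaves.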

drwxr-xr-x 2 hadoop hadoop 4096 Aug 16 18:18 bin
drwxr-xr-x 3 hadoop hadoop 4096 Aug 14 10:27 etc
drwxr-xr-x 2 hadoop hadoop 4096 Aug 14 10:27 include
drwxr-xr-x 3 hadoop hadoop 4096 Aug 14 10:27 lib
drwxr-xr-x 2 hadoop hadoop 4096 Aug 16 15:58 libexec
drwxrwxr-x 3 hadoop hadoop 4096 Aug 14 18:15 logs
drwxr-xr-x 2 hadoop hadoop 4096 Aug 16 18:25 sbin
drwxr-xr-x 4 hadoop hadoop 4096 Aug 14 10:27 share

Configuration

Add the following line to /etc/profile so that the variable is set in every login shell:

export HADOOP_HOME=/usr/local/hadoop/hadoop-2.0.5-alpha

In hadoop-env.sh, set JAVA_HOME and the ssh options:

export JAVA_HOME=/usr/local/jdk
export HADOOP_SSH_OPTS="-p 58422"

Add the following nodes to the slaves file:

dev80.hadoop
dev81.hadoop
dev82.hadoop
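dev83.hadoop

core-site.xml (fs.default.name is the Hadoop 1 property name; it still works in 2.x but is deprecated in favor of fs.defaultFS):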

<configuration>
        <property>
                <name>fs.default.name</name>
                <value>hdfs://dev80.hadoop:8020</value>
                <final>true</final>
        </property>
</configuration>

In hdfs-site.xml, configure the directory where the NameNode stores its edit log and fsimage, and the directory where the DataNodes store block data:

<configuration>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>/data/yarn/name</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>/data/yarn/data</value>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>3</value>
        </property>
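</configuration>

yarn-site.xml: in YARN the shuffle has been split out into a separate service, which must be started as an auxiliary service when the NodeManager starts. This makes it possible to plug in a third-party shuffle provider and ShuffleConsumer, for example replacing the current HTTP shuffle with an RDMA shuffle, or using a better-suited merge strategy for intermediate results to improve performance.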

<configuration>
        <!-- Site specific YARN configuration properties -->
        <property>
                <name>yarn.resourcemanager.address</name>
                <value>dev80.hadoop:9080</value>
        </property>
        <property>
                <name>yarn.resourcemanager.scheduler.address</name>
                <value>dev80.hadoop:9081</value>
        </property>
        <property>
                <name>yarn.resourcemanager.resource-tracker.address</name>
                <value>dev80.hadoop:9082</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce.shuffle</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
                <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
</configuration>

In mapred-site.xml, set mapreduce.framework.name to yarn so that MR jobs are submitted to the ResourceManager:

<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>

rsync the configuration files above to every slave node.
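
The rsync step could look roughly like the sketch below. It prints one rsync command per host in the slaves file instead of running it. The rsync flags and the assumption that HADOOP_HOME is the same path on every node are mine, not from the original setup; the ssh port matches HADOOP_SSH_OPTS.

```shell
#!/usr/bin/env bash
# Print one rsync command per slave; pipe the output to sh to actually run it.
# Assumption: HADOOP_HOME is the same path on every node.
HADOOP_HOME=${HADOOP_HOME:-/usr/local/hadoop/hadoop-2.0.5-alpha}
SSH_PORT=58422

sync_conf_commands() {
  slaves_file=$1
  # Skip blank lines and comment lines in the slaves file.
  grep -v -e '^[[:space:]]*$' -e '^#' "$slaves_file" | while read -r host; do
    echo "rsync -az -e 'ssh -p $SSH_PORT' $HADOOP_HOME/etc/hadoop/ $host:$HADOOP_HOME/etc/hadoop/"
  done
}

# Only print something when the slaves file is actually present.
if [ -f "$HADOOP_HOME/etc/hadoop/slaves" ]; then
  sync_conf_commands "$HADOOP_HOME/etc/hadoop/slaves"
fi
```

Because dev80 is itself listed in the slaves file, one of the printed commands copies the directory onto itself, which is harmless.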


Starting the services

Start with HDFS. Format the NameNode first:

bin/hdfs namenode -format

After this command runs, /data/yarn/name has been formatted.

Start the NameNode:

sbin/hadoop-daemon.sh start namenode

Start the DataNodes. Note that this uses hadoop-daemons.sh, which calls slaves.sh; that script reads the slaves file and sshes to each slave node to start the service.

sbin/hadoop-daemons.sh start datanode
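
The fan-out that hadoop-daemons.sh performs through slaves.sh can be pictured roughly as follows. This is a simplified illustration, not the real script: the function name fan_out_commands is hypothetical, and it prints the ssh commands instead of running them.

```shell
#!/usr/bin/env bash
# Simplified picture of the per-slave fan-out; NOT the actual slaves.sh.
# HADOOP_SSH_OPTS normally comes from hadoop-env.sh.
: "${HADOOP_SSH_OPTS:=-p 58422}"

fan_out_commands() {
  slaves_file=$1
  shift
  while read -r host; do
    # Ignore empty lines in the slaves file.
    [ -n "$host" ] || continue
    echo "ssh $HADOOP_SSH_OPTS $host $*"
  done < "$slaves_file"
}

# Example (prints four ssh commands for the slaves file shown earlier):
#   fan_out_commands etc/hadoop/slaves sbin/hadoop-daemon.sh start datanode
```

This is also why HADOOP_SSH_OPTS has to carry the non-default port: every remote start goes through ssh.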

The NameNode and DataNodes are now running.

The HDFS web page is at http://192.168.7.80:50070



Start the ResourceManager:

sbin/yarn-daemon.sh start resourcemanager

Start the NodeManagers:

sbin/yarn-daemons.sh start nodemanager

Check the YARN web page at http://192.168.7.80:8088/cluster


Start the history server:

sbin/mr-jobhistory-daemon.sh start historyserver

View its page at http://192.168.7.80:19888
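
For repeated restarts, the start order used in this section can be collected into one script. The sketch below is a convenience wrapper of my own (the function names and the RUN variable are hypothetical); it echoes the commands by default and is meant to be run from the Hadoop install directory on dev80. The one-time namenode -format step is deliberately left out.

```shell
#!/usr/bin/env bash
# Dry run by default: commands are only echoed.
# Invoke the script with RUN set to an empty string (RUN= ./script) to execute them.
RUN=${RUN-echo}

start_cluster() {
  # HDFS first, then YARN, then the MapReduce history server.
  $RUN sbin/hadoop-daemon.sh start namenode
  $RUN sbin/hadoop-daemons.sh start datanode
  $RUN sbin/yarn-daemon.sh start resourcemanager
  $RUN sbin/yarn-daemons.sh start nodemanager
  $RUN sbin/mr-jobhistory-daemon.sh start historyserver
}

stop_cluster() {
  # Same daemons in reverse order.
  $RUN sbin/mr-jobhistory-daemon.sh stop historyserver
  $RUN sbin/yarn-daemons.sh stop nodemanager
  $RUN sbin/yarn-daemon.sh stop resourcemanager
  $RUN sbin/hadoop-daemons.sh stop datanode
  $RUN sbin/hadoop-daemon.sh stop namenode
}

start_cluster
```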




Running a simple example

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.5-alpha.jar pi 30 30

Number of Maps  = 30
Samples per Map = 30
13/08/19 12:03:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Wrote input for Map #10
Wrote input for Map #11
Wrote input for Map #12
Wrote input for Map #13
Wrote input for Map #14
Wrote input for Map #15
Wrote input for Map #16
Wrote input for Map #17
Wrote input for Map #18
Wrote input for Map #19
Wrote input for Map #20
Wrote input for Map #21
Wrote input for Map #22
Wrote input for Map #23
Wrote input for Map #24
Wrote input for Map #25
Wrote input for Map #26
Wrote input for Map #27
Wrote input for Map #28
Wrote input for Map #29
Starting Job
13/08/19 12:03:52 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/08/19 12:03:52 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
13/08/19 12:03:53 INFO input.FileInputFormat: Total input paths to process : 30
13/08/19 12:03:53 INFO mapreduce.JobSubmitter: number of splits:30
13/08/19 12:03:53 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
13/08/19 12:03:53 WARN conf.Configuration: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
13/08/19 12:03:53 WARN conf.Configuration: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
13/08/19 12:03:53 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
13/08/19 12:03:53 WARN conf.Configuration: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
13/08/19 12:03:53 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
13/08/19 12:03:53 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/08/19 12:03:53 WARN conf.Configuration: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
13/08/19 12:03:53 WARN conf.Configuration: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
13/08/19 12:03:53 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/08/19 12:03:53 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
13/08/19 12:03:53 WARN conf.Configuration: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
13/08/19 12:03:53 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/08/19 12:03:53 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/08/19 12:03:53 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/08/19 12:03:53 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1376884226092_0001
13/08/19 12:03:53 INFO client.YarnClientImpl: Submitted application application_1376884226092_0001 to ResourceManager at dev80.hadoop/192.168.7.80:9080
13/08/19 12:03:53 INFO mapreduce.Job: The url to track the job: http://dev80.hadoop:8088/proxy/application_1376884226092_0001/
13/08/19 12:03:53 INFO mapreduce.Job: Running job: job_1376884226092_0001
13/08/19 12:04:00 INFO mapreduce.Job: Job job_1376884226092_0001 running in uber mode : false
13/08/19 12:04:00 INFO mapreduce.Job:  map 0% reduce 0%
13/08/19 12:04:10 INFO mapreduce.Job:  map 3% reduce 0%
13/08/19 12:04:11 INFO mapreduce.Job:  map 23% reduce 0%
13/08/19 12:04:13 INFO mapreduce.Job:  map 27% reduce 0%
13/08/19 12:04:14 INFO mapreduce.Job:  map 43% reduce 0%
13/08/19 12:04:15 INFO mapreduce.Job:  map 73% reduce 0%
13/08/19 12:04:16 INFO mapreduce.Job:  map 100% reduce 0%
13/08/19 12:04:17 INFO mapreduce.Job:  map 100% reduce 100%
13/08/19 12:04:17 INFO mapreduce.Job: Job job_1376884226092_0001 completed successfully
13/08/19 12:04:17 INFO mapreduce.Job: Counters: 44
        File System Counters
                FILE: Number of bytes read=666
                FILE: Number of bytes written=2258578
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=8060
                HDFS: Number of bytes written=215
                HDFS: Number of read operations=123
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=3
        Job Counters 
                Launched map tasks=30
                Launched reduce tasks=1
                Data-local map tasks=27
                Rack-local map tasks=3
                Total time spent by all maps in occupied slots (ms)=358664
                Total time spent by all reduces in occupied slots (ms)=5182
        Map-Reduce Framework
                Map input records=30
                Map output records=60
                Map output bytes=540
                Map output materialized bytes=840
                Input split bytes=4520
                Combine input records=0
                Combine output records=0
                Reduce input groups=2
                Reduce shuffle bytes=840
                Reduce input records=60
                Reduce output records=0
                Spilled Records=120
                Shuffled Maps =30
                Failed Shuffles=0
                Merged Map outputs=30
                GC time elapsed (ms)=942
                CPU time spent (ms)=14180
                Physical memory (bytes) snapshot=6924914688
                Virtual memory (bytes) snapshot=22422675456
                Total committed heap usage (bytes)=5318574080
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=3540
        File Output Format Counters 
                Bytes Written=97
Job Finished in 24.677 seconds
Estimated value of Pi is 3.13777777777777777778
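
The estimate printed at the end of the job is 4 times the fraction of sample points that fall inside the circle. With 30 maps of 30 samples each there are 900 points, so the printed value corresponds to 706 points inside. That count is inferred by working backwards from the estimate (3.1377... × 900 / 4); it does not appear in the job output. A quick check of the arithmetic:

```shell
#!/usr/bin/env bash
# pi_estimate INSIDE TOTAL: 4 * INSIDE / TOTAL, printed with 8 decimals.
pi_estimate() {
  awk -v n="$1" -v t="$2" 'BEGIN { printf "%.8f\n", 4 * n / t }'
}

total=$((30 * 30))   # 30 maps x 30 samples per map
inside=706           # inferred from the printed estimate, not from the logs
pi_estimate "$inside" "$total"
```

With so few samples the estimate is only good to about two decimal places; larger arguments to the pi example give a closer value.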

The finished job is also listed on the job history page at http://192.168.7.80:19888.

Java processes reported by jps on dev80:

27172 JobHistoryServer
28627 Jps
26699 ResourceManager
26283 NameNode
26507 DataNode
27014 NodeManager

Java processes reported by jps on dev81:

3232 Jps
1858 NodeManager
1709 DataNode
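
Checking the jps listings by eye gets tedious on more than a few nodes. A small helper along the following lines can compare jps output against the daemons expected on a node; check_daemons is a hypothetical name, and the expected lists are simply the ones shown above.

```shell
#!/usr/bin/env bash
# Usage: jps | check_daemons NameNode DataNode ...
# Prints each expected daemon that is missing from the jps output on stdin
# and returns non-zero if any is missing.
check_daemons() {
  jps_output=$(cat)
  missing=0
  for daemon in "$@"; do
    # Column 2 of jps output is the main class name; match it exactly.
    if ! printf '%s\n' "$jps_output" | awk '{print $2}' | grep -x "$daemon" > /dev/null; then
      echo "missing: $daemon"
      missing=1
    fi
  done
  return $missing
}

# On dev80:
#   jps | check_daemons NameNode DataNode ResourceManager NodeManager JobHistoryServer
# On a pure slave such as dev81:
#   jps | check_daemons DataNode NodeManager
```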

With that, the YARN cluster is up and running.


Reference:

http://dongxicheng.org/mapreduce-nextgen/hadoop-yarn-install/


Original post: http://blog.csdn.net/lalaguozhe/article/details/10062619 (please credit the source when reposting)
