First Experience Installing and Configuring YARN
This installation was done in a development/test environment and covers only the global resource management and scheduling system, YARN. HDFS is still the first-generation layout; HDFS Federation and HDFS HA are not deployed here and will be added later.
OS: CentOS Linux release 6.0 (Final) x86_64
Deployment machines:
dev80.hadoop 192.168.7.80
dev81.hadoop 192.168.7.81
dev82.hadoop 192.168.7.82
dev83.hadoop 192.168.7.83
dev80 serves as the ResourceManager, NameNode, and SecondaryNameNode; the slave nodes (each running a DataNode and a NodeManager) are dev80, dev81, dev82, and dev83.
First, install the JDK and make sure passwordless SSH works from the master to every slave node.
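If passwordless SSH is not in place yet, here is a minimal sketch, assuming a hadoop account on every node and, matching the HADOOP_SSH_OPTS setting further down, sshd listening on port 58422:

ssh-keygen -t rsa                      # accept the defaults; creates ~/.ssh/id_rsa and id_rsa.pub
for host in dev80.hadoop dev81.hadoop dev82.hadoop dev83.hadoop; do
  cat ~/.ssh/id_rsa.pub | ssh -p 58422 hadoop@$host \
    'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'
done
ssh -p 58422 dev81.hadoop hostname     # should print dev81.hadoop without prompting for a password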
Download the 2.0.5-alpha release from the Hadoop website (the latest packaged release at the moment; the beta has already been branched off trunk, but you have to build it yourself):
wget http://apache.fayea.com/apache-mirror/hadoop/common/hadoop-2.0.5-alpha/hadoop-2.0.5-alpha.tar.gz
tar xzvf hadoop-2.0.5-alpha.tar.gz
After unpacking, the directory layout turns out to be quite different from Hadoop 1.0 and looks a lot like the Linux root filesystem: client commands live under bin, administrative server start/stop scripts live under sbin, and configuration files are consolidated under etc/hadoop, with two new files, yarn-site.xml and yarn-env.sh, on top of the old set. YARN daemons are started with sbin/yarn-daemon.sh (a single service on the local node) and sbin/yarn-daemons.sh (the same service on all slaves):
drwxr-xr-x 2 hadoop hadoop 4096 Aug 16 18:18 bin
drwxr-xr-x 3 hadoop hadoop 4096 Aug 14 10:27 etc
drwxr-xr-x 2 hadoop hadoop 4096 Aug 14 10:27 include
drwxr-xr-x 3 hadoop hadoop 4096 Aug 14 10:27 lib
drwxr-xr-x 2 hadoop hadoop 4096 Aug 16 15:58 libexec
drwxrwxr-x 3 hadoop hadoop 4096 Aug 14 18:15 logs
drwxr-xr-x 2 hadoop hadoop 4096 Aug 16 18:25 sbin
drwxr-xr-x 4 hadoop hadoop 4096 Aug 14 10:27 share
Configuration
Add export HADOOP_HOME=/usr/local/hadoop/hadoop-2.0.5-alpha to /etc/profile, so it is loaded into the environment of every login shell.
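For reference, the block appended to /etc/profile could look like the following; the PATH line is an optional addition of mine so the bin/ and sbin/ commands work from any directory:

export HADOOP_HOME=/usr/local/hadoop/hadoop-2.0.5-alpha
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Run source /etc/profile afterwards to pick it up in the current shell.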
Set JAVA_HOME and the SSH options in hadoop-env.sh (sshd on these machines listens on the non-default port 58422, hence HADOOP_SSH_OPTS):
export JAVA_HOME=/usr/local/jdk
export HADOOP_SSH_OPTS="-p 58422"
Add the following nodes to the slaves file:
dev80.hadoop
dev81.hadoop
dev82.hadoop
dev83.hadoop
core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://dev80.hadoop:8020</value>
    <final>true</final>
  </property>
</configuration>
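A quick sanity check that the client actually picks this up, using the hdfs getconf tool; the comments show what I would expect it to print, not output captured from this cluster, and -confKey may not exist in every release:

bin/hdfs getconf -namenodes                 # should print dev80.hadoop
bin/hdfs getconf -confKey fs.default.name   # should print hdfs://dev80.hadoop:8020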
In hdfs-site.xml, configure the directory where the NameNode stores its edit log and fsimage, and the directory where DataNodes store block data:
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/yarn/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/yarn/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
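These directories need to exist and be writable by the user running the daemons on every node before HDFS starts. A sketch, assuming the daemons run as the hadoop user, executed as root on each machine:

mkdir -p /data/yarn/name /data/yarn/data   # name dir on the NameNode, data dir on every DataNode
chown -R hadoop:hadoop /data/yarn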
yarn-site.xml: in YARN the shuffle phase has been split out into a standalone service that has to be started together with the NodeManager as an auxiliary service. This makes it possible to plug in a third-party shuffle provider and ShuffleConsumer, for example replacing the current HTTP shuffle with an RDMA shuffle, or adopting a better merge strategy for intermediate map outputs to get a further performance boost:
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>dev80.hadoop:9080</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>dev80.hadoop:9081</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>dev80.hadoop:9082</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
In mapred-site.xml, set mapreduce.framework.name to yarn so that MR jobs are submitted to the ResourceManager:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
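One pitfall: if your tarball, like most 2.x releases, ships only a template for this file, copy it first and then add the property:

cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml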
rsync the configuration files above to every slave node.
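A sketch of that step, assuming Hadoop is unpacked at the same path on every node and using the non-standard SSH port from hadoop-env.sh:

for host in dev81.hadoop dev82.hadoop dev83.hadoop; do
  rsync -av -e 'ssh -p 58422' etc/hadoop/ $host:/usr/local/hadoop/hadoop-2.0.5-alpha/etc/hadoop/
done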
Starting the services
Start HDFS first. Format the NameNode before the very first start:
bin/hdfs namenode -format
After this command runs, /data/yarn/name has been formatted.
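If the format succeeded, the name directory should now contain an initial image, roughly as below; the file names carry the transaction id, so yours may differ:

ls /data/yarn/name/current/
# fsimage_0000000000000000000  fsimage_0000000000000000000.md5  seen_txid  VERSION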
Start the NameNode:
sbin/hadoop-daemon.sh start namenode
Start the DataNodes:
sbin/hadoop-daemons.sh start datanode (note the plural hadoop-daemons.sh: it invokes slaves.sh, which reads the slaves file and SSHes to each slave node to start the service)
At this point the NameNode and DataNodes are up.
You can check the HDFS web UI at http://192.168.7.80:50070
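From the shell, the DataNode count can be checked as well; with all four slaves up, the report should contain a line like the comment below (my expected output, not captured verbatim):

bin/hdfs dfsadmin -report | grep 'Datanodes available'
# Datanodes available: 4 (4 total, 0 dead)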
Start the ResourceManager:
sbin/yarn-daemon.sh start resourcemanager
Start the NodeManagers:
sbin/yarn-daemons.sh start nodemanager
Check the YARN web UI at http://192.168.7.80:8088/cluster
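The registered NodeManagers can also be listed from the command line; once all four slaves have registered, this should report 4 nodes in RUNNING state:

bin/yarn node -list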
Start the history server (its web UI defaults to port 19888):
sbin/mr-jobhistory-daemon.sh start historyserver
Run a simple example:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.5-alpha.jar pi 30 30
Number of Maps = 30
Samples per Map = 30
13/08/19 12:03:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Wrote input for Map #10
Wrote input for Map #11
Wrote input for Map #12
Wrote input for Map #13
Wrote input for Map #14
Wrote input for Map #15
Wrote input for Map #16
Wrote input for Map #17
Wrote input for Map #18
Wrote input for Map #19
Wrote input for Map #20
Wrote input for Map #21
Wrote input for Map #22
Wrote input for Map #23
Wrote input for Map #24
Wrote input for Map #25
Wrote input for Map #26
Wrote input for Map #27
Wrote input for Map #28
Wrote input for Map #29
Starting Job
13/08/19 12:03:52 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/08/19 12:03:52 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
13/08/19 12:03:53 INFO input.FileInputFormat: Total input paths to process : 30
13/08/19 12:03:53 INFO mapreduce.JobSubmitter: number of splits:30
13/08/19 12:03:53 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
13/08/19 12:03:53 WARN conf.Configuration: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
13/08/19 12:03:53 WARN conf.Configuration: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
13/08/19 12:03:53 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
13/08/19 12:03:53 WARN conf.Configuration: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
13/08/19 12:03:53 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
13/08/19 12:03:53 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/08/19 12:03:53 WARN conf.Configuration: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
13/08/19 12:03:53 WARN conf.Configuration: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
13/08/19 12:03:53 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/08/19 12:03:53 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
13/08/19 12:03:53 WARN conf.Configuration: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
13/08/19 12:03:53 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/08/19 12:03:53 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/08/19 12:03:53 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/08/19 12:03:53 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1376884226092_0001
13/08/19 12:03:53 INFO client.YarnClientImpl: Submitted application application_1376884226092_0001 to ResourceManager at dev80.hadoop/192.168.7.80:9080
13/08/19 12:03:53 INFO mapreduce.Job: The url to track the job: http://dev80.hadoop:8088/proxy/application_1376884226092_0001/
13/08/19 12:03:53 INFO mapreduce.Job: Running job: job_1376884226092_0001
13/08/19 12:04:00 INFO mapreduce.Job: Job job_1376884226092_0001 running in uber mode : false
13/08/19 12:04:00 INFO mapreduce.Job: map 0% reduce 0%
13/08/19 12:04:10 INFO mapreduce.Job: map 3% reduce 0%
13/08/19 12:04:11 INFO mapreduce.Job: map 23% reduce 0%
13/08/19 12:04:13 INFO mapreduce.Job: map 27% reduce 0%
13/08/19 12:04:14 INFO mapreduce.Job: map 43% reduce 0%
13/08/19 12:04:15 INFO mapreduce.Job: map 73% reduce 0%
13/08/19 12:04:16 INFO mapreduce.Job: map 100% reduce 0%
13/08/19 12:04:17 INFO mapreduce.Job: map 100% reduce 100%
13/08/19 12:04:17 INFO mapreduce.Job: Job job_1376884226092_0001 completed successfully
13/08/19 12:04:17 INFO mapreduce.Job: Counters: 44
    File System Counters
        FILE: Number of bytes read=666
        FILE: Number of bytes written=2258578
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=8060
        HDFS: Number of bytes written=215
        HDFS: Number of read operations=123
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Job Counters
        Launched map tasks=30
        Launched reduce tasks=1
        Data-local map tasks=27
        Rack-local map tasks=3
        Total time spent by all maps in occupied slots (ms)=358664
        Total time spent by all reduces in occupied slots (ms)=5182
    Map-Reduce Framework
        Map input records=30
        Map output records=60
        Map output bytes=540
        Map output materialized bytes=840
        Input split bytes=4520
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=840
        Reduce input records=60
        Reduce output records=0
        Spilled Records=120
        Shuffled Maps =30
        Failed Shuffles=0
        Merged Map outputs=30
        GC time elapsed (ms)=942
        CPU time spent (ms)=14180
        Physical memory (bytes) snapshot=6924914688
        Virtual memory (bytes) snapshot=22422675456
        Total committed heap usage (bytes)=5318574080
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=3540
    File Output Format Counters
        Bytes Written=97
Job Finished in 24.677 seconds
Estimated value of Pi is 3.13777777777777777778
The completed job can also be inspected afterwards on the job history page.
Java processes listed by jps on dev80:
27172 JobHistoryServer
28627 Jps
26699 ResourceManager
26283 NameNode
26507 DataNode
27014 NodeManager
Java processes listed by jps on dev81:
3232 Jps
1858 NodeManager
1709 DataNode
With that, the YARN cluster is up and running.
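Shutting the cluster down again (not covered above) is symmetric: the same scripts accept stop, roughly in reverse start order:

sbin/mr-jobhistory-daemon.sh stop historyserver
sbin/yarn-daemons.sh stop nodemanager
sbin/yarn-daemon.sh stop resourcemanager
sbin/hadoop-daemons.sh stop datanode
sbin/hadoop-daemon.sh stop namenode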
Reference:
http://dongxicheng.org/mapreduce-nextgen/hadoop-yarn-install/
Permalink: http://blog.csdn.net/lalaguozhe/article/details/10062619. Please credit the source when reposting.