Installing Spark on YARN under Ubuntu 14.10
1 Server layout
Server        | Role
192.168.1.100 | NameNode
192.168.1.101 | DataNode
192.168.1.102 | DataNode
2 Software environment
2.1 Install the JDK and add it to the environment variables
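A minimal sketch for this step on Ubuntu 14.10, assuming the OpenJDK 7 distribution package and the same JAVA_HOME path used later in section 3.6:

# Install OpenJDK 7 from the Ubuntu repositories
sudo apt-get update
sudo apt-get install -y openjdk-7-jdk

# Add to ~/.bashrc (path matches the JAVA_HOME used in section 3.6)
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin

# Verify
java -version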
2.2 Install Scala and add it to the environment variables
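A minimal sketch, assuming the Scala 2.11.4 tarball from the scala-lang.org archive (the download URL is an assumption; any mirror works) and the SCALA_HOME path that appears in section 3.1:

# Download and unpack Scala 2.11.4
wget http://www.scala-lang.org/files/archive/scala-2.11.4.tgz
mkdir -p /home/hadoop/software/spark
tar -xzf scala-2.11.4.tgz -C /home/hadoop/software/spark

# Add to ~/.bashrc (path matches the SCALA_HOME used in section 3.1)
export SCALA_HOME=/home/hadoop/software/spark/scala-2.11.4
export PATH=$PATH:$SCALA_HOME/bin

# Verify
scala -version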
2.3 Passwordless SSH login (A to A, A to B); for reference see http://blog.csdn.net/codepeak/article/details/14447627
ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# copy the public key to machine B, then append it there
scp ~/.ssh/id_rsa.pub username@ipaddress:/location
cat id_rsa.pub >> authorized_keys
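To confirm passwordless login works, a quick check against one of the DataNodes from section 1:

# Should log in without asking for a password
ssh 192.168.1.101
exit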
2.4 Hostname setup
sudo nano /etc/hosts

192.168.1.100 cloud001
192.168.1.101 cloud002
192.168.1.102 cloud003
3 Hadoop cluster configuration (same configuration on every machine)
3.1 Hadoop installation and environment variables
export HADOOP_HOME=/home/hadoop/hadoop-2.2.0
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SCALA_HOME=/home/hadoop/software/spark/scala-2.11.4
export SPARK_EXAMPLES_JAR=/home/hadoop/software/spark/spark-1.0.0/examples/target/scala-2.11.4/spar$
export SPARK_HOME=/home/hadoop/software/spark/spark-1.0.0
export IDEA_HOME=/home/hadoop/software/dev/idea-IU-139.1117.1
export PATH=$PATH:$SCALA_HOME/bin:$SPARK_HOME/bin:$IDEA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$M2_HOME/bin
3.2 core-site.xml configuration
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://cloud001:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop-2.2.0/tmp</value>
  </property>
  <!--
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
  -->
</configuration>
3.3 hdfs-site.xml configuration
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>cloud001:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hadoop-2.2.0/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hadoop-2.2.0/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
3.4 mapred-site.xml configuration
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!--
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoopmaster:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoopmaster:19888</value>
  </property>
  -->
</configuration>
3.5 yarn-site.xml configuration
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!--
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  -->
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>cloud001:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>cloud001:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>cloud001:8031</value>
  </property>
  <!--
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hadoopmaster:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hadoopmaster:8088</value>
  </property>
  -->
</configuration>
3.6 Configure hadoop-env.sh, mapred-env.sh and yarn-env.sh: add the following at the top
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
3.7 DataNode configuration
nano slaves
cloud002
cloud003
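Because section 3 uses the same configuration on every machine, the configured Hadoop directory can simply be copied to the DataNodes; a sketch, assuming the hostnames from section 2.4 and the hadoop user implied by the /home/hadoop paths:

# Copy the configured Hadoop installation to the worker nodes
scp -r /home/hadoop/hadoop-2.2.0 hadoop@cloud002:/home/hadoop/
scp -r /home/hadoop/hadoop-2.2.0 hadoop@cloud003:/home/hadoop/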
4 Spark cluster configuration (same configuration on every machine)
4.1 Spark installation and deployment
Download the Spark binary package (a download sketch follows the exports below) and configure the environment variables:
export SCALA_HOME=/home/hadoop/software/spark/scala-2.11.4
export SPARK_EXAMPLES_JAR=/home/hadoop/software/spark/spark-1.0.0/examples/target/scala-2.11.4/spar$
export SPARK_HOME=/home/hadoop/software/spark/spark-1.0.0
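For the download and unpack step itself, a minimal sketch, assuming the prebuilt-for-Hadoop-2 package from the Apache archive (URL and artifact name are assumptions; adjust to the mirror and build actually used):

# Download and unpack the Spark 1.0.0 binary package
wget https://archive.apache.org/dist/spark/spark-1.0.0/spark-1.0.0-bin-hadoop2.tgz
mkdir -p /home/hadoop/software/spark
tar -xzf spark-1.0.0-bin-hadoop2.tgz -C /home/hadoop/software/spark
# Rename so the directory matches the SPARK_HOME above
mv /home/hadoop/software/spark/spark-1.0.0-bin-hadoop2 /home/hadoop/software/spark/spark-1.0.0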
Configure spark-env.sh and add the following:
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
export SCALA_HOME=/home/hadoop/software/spark/scala-2.11.4
export HADOOP_HOME=/home/hadoop/hadoop-2.2.0
Configure slaves:
cloud002
cloud003
5 Cluster startup
5.1 Format the NameNode
hdfs namenode -format
5.2 Start Hadoop
sbin/start-all.sh
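On Hadoop 2.2, start-all.sh still works but is marked deprecated; the equivalent two-step start, run from $HADOOP_HOME on cloud001, is:

sbin/start-dfs.sh     # NameNode, SecondaryNameNode and the DataNodes
sbin/start-yarn.sh    # ResourceManager and the NodeManagers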
5.3 Start Spark
sbin/start-all.sh
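To check that everything came up, jps on each node should list the expected daemons (the names below are what Hadoop 2.2 plus a standalone Spark master/worker normally report):

# On cloud001
jps    # expect: NameNode, SecondaryNameNode, ResourceManager, Master
# On cloud002 / cloud003
jps    # expect: DataNode, NodeManager, Worker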
6 Tests
6.1 Local test
# bin/run-example org.apache.spark.examples.SparkPi local
6.2 Standalone cluster test
# bin/run-example org.apache.spark.examples.SparkPi spark://cloud001:7077
# bin/run-example org.apache.spark.examples.SparkLR spark://cloud001:7077
# bin/run-example org.apache.spark.examples.SparkKMeans spark://cloud001:7077 file:/usr/local/spark/data/kmeans_data.txt 2 1
6.3 Cluster mode with HDFS
# hadoop fs -put README.md .
# MASTER=spark://cloud001:7077 bin/spark-shell
scala> val file = sc.textFile("hdfs://cloud001:9000/user/root/README.md")
scala> val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
scala> count.collect()
scala> :quit
6.4 YARN mode
# SPARK_JAR=assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar \
  bin/spark-class org.apache.spark.deploy.yarn.Client \
    --jar examples/target/scala-2.10/spark-examples_2.10-assembly-0.9.0-incubating.jar \
    --class org.apache.spark.examples.SparkPi \
    --args yarn-standalone \
    --num-workers 3 \
    --master-memory 4g \
    --worker-memory 2g \
    --worker-cores 1
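The command above uses the 0.9-era yarn.Client entry point and jar names. Since the cluster built here runs Spark 1.0.0, the same example can also be submitted with spark-submit; a sketch, assuming the examples jar shipped under lib/ of the binary package (the exact jar name may differ):

# Run SparkPi on YARN in cluster mode (Spark 1.0.0 syntax)
bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  --num-executors 3 \
  --executor-memory 2g \
  --executor-cores 1 \
  lib/spark-examples-1.0.0-hadoop2.2.0.jar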