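Developing Spark Applications with IDEA on Ubuntu 14.10
1 Environment Setup
1.1 Download IntelliJ IDEA from the official JetBrains website.
1.2 IDEA differs slightly from Eclipse: a Project in IDEA corresponds to a workspace in Eclipse, and a Module corresponds to an Eclipse project, so New Module is what creates a project in the Eclipse sense.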
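2 Creating the Spark Program
2.1 First create a project (New Project); any name will do: Create New Project -> Scala -> SBT -> create a project named SparkExample
2.2 Create a module (New Module); any name will do: New Module -> Scala -> Scala, create a module named FirstApp
2.3 Configure the Project Structure of FirstApp
2.3.1 Add a source directory; lay out the directory structure however you like
2.3.2 Add the Jar files: File -> Project Structure -> Libraries -> + -> java -> select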
spark-assembly-1.0.0-hadoop2.2.0.jar
scala-library.jar
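Since the project in 2.1 was created with the SBT template, the same two dependencies could in principle be declared in a build file instead of being added by hand. The fragment below is only a sketch and not part of the original walkthrough: the file itself, the version number 0.1, and the Scala patch version 2.10.4 are assumptions, chosen to match the Spark 1.0.0 assembly Jar listed above.

```scala
// build.sbt (hypothetical; the original steps add the Jars manually instead)
name := "FirstApp"

version := "0.1"

// Spark 1.0.0 was published for Scala 2.10; the exact patch version is an assumption
scalaVersion := "2.10.4"

// "provided": needed at compile time, but supplied by the Spark cluster at run time
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0" % "provided"
```

Marking the dependency as "provided" has the same intent as step 2.5, where the dependency Jars are removed from the artifact so that only the application's own classes are packaged.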
2.4 Write the code: create an Object under the source package. Three demos are used here
    import org.apache.spark._
    import scala.math.random

    /**
     * Created by hadoop on 15-3-21.
     */
    object SparkPi {
      def main(args: Array[String]) {
        val conf = new SparkConf().setAppName("Spark Pi")
        val spark = new SparkContext(conf)
        // Number of partitions; defaults to 2 when no argument is given
        val slices = if (args.length > 0) args(0).toInt else 2
        val n = 100000 * slices
        // Sample n points in the square [-1, 1] x [-1, 1] and count those inside the unit circle
        val count = spark.parallelize(1 to n, slices).map { i =>
          val x = random * 2 - 1
          val y = random * 2 - 1
          if (x * x + y * y < 1) 1 else 0
        }.reduce(_ + _)
        // Area ratio circle / square = Pi / 4
        println("Pi is roughly " + 4.0 * count / n)
        spark.stop()
      }
    }
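To see what SparkPi computes without a cluster, the same Monte Carlo estimate can be sketched with plain Scala collections. This is only an illustration: the object name LocalPi, the helper estimatePi, and the fixed seed are hypothetical and do not appear in the original demos.

```scala
import scala.util.Random

// Hypothetical local counterpart of SparkPi; no Spark dependency required
object LocalPi {
  // Sample n points uniformly in the square [-1, 1] x [-1, 1] and
  // count how many fall inside the unit circle; the ratio approaches Pi / 4.
  def estimatePi(n: Int, rng: Random): Double = {
    val inside = (1 to n).count { _ =>
      val x = rng.nextDouble() * 2 - 1
      val y = rng.nextDouble() * 2 - 1
      x * x + y * y < 1
    }
    4.0 * inside / n
  }

  def main(args: Array[String]): Unit = {
    // A fixed seed keeps the run reproducible; the seed value is arbitrary
    val estimate = estimatePi(200000, new Random(42))
    println("Pi is roughly " + estimate)
  }
}
```

In the Spark version, the range 1 to n is split into slices partitions, each partition counts its own hits, and reduce(_ + _) adds the partial counts together.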
    import org.apache.spark.{SparkContext, SparkConf}
    import org.apache.spark.SparkContext._

    /**
     * Created by hadoop on 15-3-21.
     */
    object WordCount1 {
      def main(args: Array[String]) {
        if (args.length == 0) {
          System.err.println("Usage: WordCount1 <file1>")
          System.exit(1)
        }
        val conf = new SparkConf().setAppName("WordCount1")
        val sc = new SparkContext(conf)
        // Split each line into words, count every word, and print 10 (word, count) pairs
        sc.textFile(args(0))
          .flatMap(_.split(" "))
          .map(x => (x, 1))
          .reduceByKey(_ + _)
          .take(10)
          .foreach(println)
        sc.stop()
      }
    }
    import org.apache.spark.{SparkContext, SparkConf}
    import org.apache.spark.SparkContext._

    /**
     * Created by hadoop on 15-3-21.
     */
    object WordCount2 {
      def main(args: Array[String]) {
        if (args.length == 0) {
          System.err.println("Usage: WordCount2 <file1>")
          System.exit(1)
        }
        val conf = new SparkConf().setAppName("WordCount2")
        val sc = new SparkContext(conf)
        sc.textFile(args(0))
          .flatMap(_.split(" "))
          .map(x => (x, 1))
          .reduceByKey(_ + _)
          .map(x => (x._2, x._1))   // swap to (count, word) so the count becomes the key
          .sortByKey(false)         // sort by count, descending
          .map(x => (x._2, x._1))   // swap back to (word, count)
          .take(10)
          .foreach(println)
        sc.stop()
      }
    }
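WordCount2 swaps each pair to (count, word) before sorting because sortByKey orders by the key only. The logic can be checked locally with plain Scala collections, where sorting by the count directly is possible. The sketch below is an illustration under that assumption; the names LocalWordCount and topWords and the sample input are hypothetical.

```scala
// Hypothetical local counterpart of WordCount2; no Spark dependency required
object LocalWordCount {
  // Count the words in `lines` and return the n most frequent (word, count) pairs
  def topWords(lines: Seq[String], n: Int): Seq[(String, Int)] =
    lines
      .flatMap(_.split(" "))                        // same tokenization as the demos
      .groupBy(identity)                            // word -> all occurrences of it
      .map { case (word, occurrences) => (word, occurrences.size) }
      .toSeq
      .sortBy { case (_, count) => -count }         // descending by count
      .take(n)

  def main(args: Array[String]): Unit = {
    val lines = Seq("a b a", "b a c")
    topWords(lines, 2).foreach(println)
  }
}
```

For the sample input, "a" occurs three times and "b" twice, so those two pairs come first. Words with equal counts have no guaranteed relative order here.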
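2.5 Build the Jar
Before building the package, create an artifact: File -> Project Structure -> Artifacts -> + -> Jar -> From modules with dependencies, then pick any class as the main class.
After clicking OK, configure the artifact: change Name to FirstApp, and in Output Layout remove the dependency Jars listed under FirstApp.jar so that only the FirstApp project itself remains. The Spark and Scala libraries are already available on the Spark cluster at run time, so they do not need to be packaged.
After clicking OK, run Build -> Build Artifacts -> FirstApp -> Rebuild to package the program. Once compilation finishes, the package is placed in the out/artifacts/FirstApp directory under the file name FirstApp.jar.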
3 Test the Jar. The figure below is taken from http://blog.csdn.net/book_mmicky/article/details/25714545; change the Jar name and the HDFS path to match your own setup
References: http://www.aboutyun.com/thread-8404-1-1.html
http://blog.csdn.net/book_mmicky/article/details/25714545
http://blog.csdn.net/david_xtd/article/details/19081341