Using Spark SQL: the Spark SQL CLI

Date: June 12, 2015

About the Spark SQL CLI

The Spark SQL CLI makes it more convenient to query Hive directly from Spark SQL through the Hive metastore. In the current version, the Spark SQL CLI cannot talk to the Thrift server.

Note: before using the Spark SQL CLI, copy the hive-site.xml configuration file into the $SPARK_HOME/conf directory.
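The copy step in the note can be sketched as a small shell function. The function name install_hive_site is hypothetical, and the sketch assumes hive-site.xml lives under the conf directory of your Hive installation, which the article does not state; adjust the source path if yours is elsewhere.

```shell
# Copy hive-site.xml from a Hive installation into Spark's conf directory.
# Arguments: $1 = Hive home, $2 = Spark home (both supplied by the caller).
# Assumption: the file is at <Hive home>/conf/hive-site.xml.
install_hive_site() {
    hive_home="$1"
    spark_home="$2"
    src="$hive_home/conf/hive-site.xml"

    if [ ! -f "$src" ]; then
        echo "hive-site.xml not found at $src" >&2
        return 1
    fi
    cp "$src" "$spark_home/conf/"
}

# Usage (assumes both variables are set in your environment):
#   install_hive_site "$HIVE_HOME" "$SPARK_HOME"
```

Failing loudly when the file is missing is deliberate: without hive-site.xml the CLI would not be pointed at your existing Hive metastore.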

Spark SQL CLI command-line options:

cd $SPARK_HOME/bin
./spark-sql --help

Usage: ./bin/spark-sql [options] [cli option]
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME                 A name of your application.
  --jars JARS                 Comma-separated list of local jars to include on the driver
                              and executor classpaths.
  --py-files PY_FILES         Comma-separated list of .zip, .egg, or .py files to place
                              on the PYTHONPATH for Python apps.
  --files FILES               Comma-separated list of files to be placed in the working
                              directory of each executor.

  --conf PROP=VALUE           Arbitrary Spark configuration property.
  --properties-file FILE      Path to a file from which to load extra properties. If not
                              specified, this will look for conf/spark-defaults.conf.

  --driver-memory MEM         Memory for driver (e.g. 1000M, 2G) (Default: 512M).
  --driver-java-options       Extra Java options to pass to the driver.
  --driver-library-path       Extra library path entries to pass to the driver.
  --driver-class-path         Extra class path entries to pass to the driver. Note that
                              jars added with --jars are automatically included in the
                              classpath.

  --executor-memory MEM       Memory per executor (e.g. 1000M, 2G) (Default: 1G).

  --help, -h                  Show this help message and exit
  --verbose, -v               Print additional debug output

 Spark standalone with cluster deploy mode only:
  --driver-cores NUM          Cores for driver (Default: 1).
  --supervise                 If given, restarts the driver on failure.

 Spark standalone and Mesos only:
  --total-executor-cores NUM  Total cores for all executors.

 YARN-only:
  --executor-cores NUM        Number of cores per executor (Default: 1).
  --queue QUEUE_NAME          The YARN queue to submit to (Default: "default").
  --num-executors NUM         Number of executors to launch (Default: 2).
  --archives ARCHIVES         Comma separated list of archives to be extracted into the
                              working directory of each executor.

CLI options:
-d,--define <key=value>          Variable subsitution to apply to hive
                                  commands. e.g. -d A=B or --define A=B
    --database <databasename>     Specify the database to use
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -h <hostname>                    connecting to Hive Server on remote host
    --hiveconf <property=value>   Use value for given property
    --hivevar <key=value>         Variable subsitution to apply to hive
                                  commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -p <port>                        connecting to Hive Server on port number
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the console)
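As a rough illustration of the -e and -f options listed above, the following sketch runs a query string and a SQL file non-interactively. The scratch file path and the SHOW statements are illustrative choices, not from the original article, and the spark-sql calls are guarded so that the script still completes on a machine where spark-sql is not on the PATH.

```shell
# Hypothetical scratch file; any writable path works.
SQL_FILE="${TMPDIR:-/tmp}/spark_sql_cli_demo.sql"

# Write a couple of statements to a file for use with -f.
cat > "$SQL_FILE" <<'EOF'
SHOW DATABASES;
SHOW TABLES;
EOF

if command -v spark-sql >/dev/null 2>&1; then
    # -e: run the quoted query string, then exit.
    spark-sql -e "SHOW TABLES;"
    # -f: run every statement in the file, then exit.
    spark-sql -f "$SQL_FILE"
else
    echo "spark-sql not found on PATH; skipping the demo queries" >&2
fi
```

Both forms return to the shell once the statements finish, which makes them convenient for scripts and scheduled jobs.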

If no master is specified when starting spark-sql, it runs in local mode. The master can be either the address of a standalone cluster or yarn.

When the master is set to yarn (spark-sql --master yarn), the whole job can be monitored from the YARN ResourceManager web UI at http://hadoop000:8088.

Note: if $SPARK_HOME/conf/spark-defaults.conf contains the setting spark.master spark://hadoop000:7077, then spark-sql runs on the standalone cluster even when no master is given at startup.
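To see which master a plain spark-sql invocation would pick up, a sketch like the one below can read spark.master from a spark-defaults.conf file. The helper name effective_master is hypothetical, it only handles the whitespace-separated "key value" form used in this article, and the local[*] fallback is an assumption about the spark-submit default (the article simply says "local"). It approximates the lookup described above and does not reproduce Spark's own configuration loader.

```shell
# Print the spark.master value found in a spark-defaults.conf file,
# or "local[*]" (assumed default) when the file or the key is missing.
# Only the whitespace-separated "key value" form is handled.
effective_master() {
    conf_file="$1"
    master=""
    if [ -f "$conf_file" ]; then
        # Commented-out lines do not match because their first field is not
        # exactly "spark.master"; the last matching line wins.
        master=$(awk '$1 == "spark.master" { print $2 }' "$conf_file" | tail -n 1)
    fi
    printf '%s\n' "${master:-local[*]}"
}

# Usage (assumes SPARK_HOME is set):
#   effective_master "$SPARK_HOME/conf/spark-defaults.conf"
```

Passing --master on the command line still overrides whatever the file contains.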

Using spark-sql

Start spark-sql. Because spark.master spark://hadoop000:7077 is already set in my spark-defaults.conf, I do not specify a master on the command line:

cd $SPARK_HOME/bin
./spark-sql

SELECT track_time, url, session_id, referer, ip, end_user_id, city_id FROM page_views WHERE city_id = -1000 limit 10;

SELECT session_id, count(*) c FROM page_views group by session_id order by c desc limit 10;

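The page_views table used by the two SQL statements above already exists in my Hive instance. If you do not have it, create it manually; the statements for creating the table and loading the data are as follows: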

create table page_views(
track_time string,
url string,
session_id string,
referer string,
ip string,
end_user_id string,
city_id string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

load data local inpath '/home/spark/software/data/page_views.dat' overwrite into table page_views;
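If you do not have page_views.dat, a tiny stand-in file in the same layout (seven tab-separated fields, matching the table definition above) can be generated as sketched below. All row values and the output path are made up for illustration; they are not the article's data.

```shell
# Hypothetical sample file; the real page_views.dat is not shown in the article.
DATA_FILE="${TMPDIR:-/tmp}/page_views_sample.dat"

# Columns: track_time, url, session_id, referer, ip, end_user_id, city_id
# printf reuses the format string for every group of seven arguments.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    "2015-06-12 10:00:00" "http://example.com/a" "S001" "http://example.com/" "10.0.0.1" "U1" "-1000" \
    "2015-06-12 10:00:05" "http://example.com/b" "S001" "http://example.com/a" "10.0.0.1" "U1" "-1000" \
    "2015-06-12 10:01:00" "http://example.com/a" "S002" "-" "10.0.0.2" "U2" "1" \
    > "$DATA_FILE"

# Sanity check: fail if any line does not have exactly 7 tab-separated fields.
awk -F'\t' 'NF != 7 { bad = 1 } END { exit bad }' "$DATA_FILE"
```

To load it, point the load data local inpath statement at the generated file instead of /home/spark/software/data/page_views.dat.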
