Phoenix(sql on hbase)简介
介绍:
Phoenix is a SQL skin over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular
JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance
on the order of milliseconds for small queries, or seconds for tens of millions of rows.
部署:
1:wget http://phoenix-bin.github.com/client/phoenix-2.2.1-install.tar,将jar包拷贝至HBASE_HOME/lib即可
2:执行psql.sh localhost ../examples/web_stat.sql ../examples/web_stat.csv ../examples/web_stat_queries.sql,加载示例数据
3:sqlline.sh localhost(zookeeper地址)进入命令行客户端
相关文档:
wiki主页(文档很详细):
https://github.com/forcedotcom/phoenix/wiki
Quick Start
https://github.com/forcedotcom/phoenix/wiki/Phoenix-in-15-minutes-or-less
Recently Implemented Features
https://github.com/forcedotcom/phoenix/wiki/Recently-Implemented-Features
Phoenix Performance vs Hive,Impala
https://github.com/forcedotcom/phoenix/wiki/Performance#salting
官方实时性能测试结果:
http://phoenix-bin.github.io/client/performance/latest.htm
语法:
http://forcedotcom.github.io/phoenix/index.html
二级索引相关(索引的使用需要调用Phoenix API):
二级索引(多列时)使用需要在hbase-site.xml中加入如下配置
<property>
<name>hbase.regionserver.wal.codec</name>
<value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
创建索引例子:
create table usertable (id varchar primary key, firstname varchar, lastname varchar);
create index idx_name on usertable (firstname) include (lastname);
可通过如下方法查看当前SQL语句索引是否生效:
explain select id, firstname, lastname from usertable where firstname = ‘foo‘;
explain的相关说明:
RANGE SCAN means that only a subset of the rows in your table will be scanned over. This occurs if you use one or more leading columns from your primary key constraint. Query that is not filtering on leading PK columns ex. select * from test where pk2=‘x‘
and pk3=‘y‘; will result in full scan whereas the following query will result in range scan select * from test where pk1=‘x‘ and pk2=‘y‘;. Note that you can add a secondary index on your "pk2" and "pk3" columns and that would cause a range scan to be done
for the first query (over the index table).
DEGENERATE SCAN means that a query can‘t possibly return any rows. If we can determine that at compile time, then we don‘t bother to even run the scan.
FULL SCAN means that all rows of the table will be scanned over (potentially with a filter applied if you have a WHERE clause)
SKIP SCAN means that either a subset or all rows in your table will be scanned over, however it will skip large groups of rows depending on the conditions in your filter. See this blog for more detail. We don‘t do a SKIP SCAN if you have no filter on the
leading primary key columns, but you can force a SKIP SCAN by using the /*+ SKIP_SCAN */ hint. Under some conditions, namely when the cardinality of your leading primary key columns is low, it will be more efficient than a FULL SCAN.
索引使用介绍:
- 主键索引:主键索引要按创建时的顺序引用。如primary key(id,name,add),那么会隐式的创建(id),(id,name),(id,name,add)三个索引,如果在where中用这三个条件会用到索引,其他组合则无法使用索引(FULL SCAN)。
- 二级索引:除了要按创建时的顺序引用外,如果查询的列不全在索引或者覆盖索引中则无法使用索引。
举例:
DDL:create table usertable (id varchar primary key, firstname varchar, lastname varchar);
create index idx_name on usertable (firstname);
DML:select id, firstname, lastname from usertable where firstname = ‘foo‘;
此查询不会使用到索引,因为lastname不再索引中。
执行DDL:create idx_name on usertable (firstname) include (lastname)后该查询语句才能使用索引。
遗留问题:include和on在Phoenix中具体有什么区别?
- 查询条件中主键索引+二级索引同时存在的话,Phoenix会自己选择最优索引。
Phoenix的SQL表结构与Hbase结构的映射实验
>>create table user3table (id varchar, firstname varchar, lastname varchar CONSTRAINT PK PRIMARY KEY (id,firstname));
>>!describe user3table
+------------+-------------+------------+-------------+-----------+------------+-------------+---------------+----------------+----------------+------+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | COLUMN_NAME | DATA_TYPE | TYPE_NAME | COLUMN_SIZE | BUFFER_LENGTH | DECIMAL_DIGITS | NUM_PREC_RADIX | NULL |
+------------+-------------+------------+-------------+-----------+------------+-------------+---------------+----------------+----------------+------+
| null | null | USER3TABLE | ID | 12 | VARCHAR | null | null | null | null | 1 |
| null | null | USER3TABLE | FIRSTNAME | 12 | VARCHAR | null | null | null | null | 1 |
| _0 | null | USER3TABLE | LASTNAME | 12 | VARCHAR | null | null | null | null | 1 |
+------------+-------------+------------+-------------+-----------+------------+-------------+---------------+----------------+----------------+------+
>>!index user3table;
+-----------+-------------+------------+------------+-----------------+------------+------+------------------+-------------+-------------+------------+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | NON_UNIQUE | INDEX_QUALIFIER | INDEX_NAME | TYPE | ORDINAL_POSITION | COLUMN_NAME | ASC_OR_DESC | CARDINALIT |
+-----------+-------------+------------+------------+-----------------+------------+------+------------------+-------------+-------------+------------+
+-----------+-------------+------------+------------+-----------------+------------+------+------------------+-------------+-------------+------------+
>>select * from user3table;
+------------+------------+------------+
| ID | FIRSTNAME | LASTNAME |
+------------+------------+------------+
| hup | zhan | feng |
+------------+------------+------------+
>>hbase>>scan ‘USER3TABLE‘
ROW COLUMN+CELL
hup\x00zhan column=_0:LASTNAME, timestamp=1387875092585, value=feng
hup\x00zhan column=_0:_0, timestamp=1387875092585, value=
1 row(s) in 0.0110 seconds
>>create index idx_test on user3table (firstname) include (lastname);
>>!index user3table;
+-----------+-------------+------------+------------+-----------------+------------+------+------------------+-------------+-------------+------------+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | NON_UNIQUE | INDEX_QUALIFIER | INDEX_NAME | TYPE | ORDINAL_POSITION | COLUMN_NAME | ASC_OR_DESC | CARDINALIT |
+-----------+-------------+------------+------------+-----------------+------------+------+------------------+-------------+-------------+------------+
| null | null | USER3TABLE | true | null | IDX_TEST | 3 | 1 | :FIRSTNAME | A | null |
| null | null | USER3TABLE | true | null | IDX_TEST | 3 | 2 | :ID | A | null |
| null | null | USER3TABLE | true | null | IDX_TEST | 3 | 3 | _0:LASTNAME | null | null |
+-----------+-------------+------------+------------+-----------------+------------+------+------------------+-------------+-------------+------------+
>>select * from user3table;
+------------+------------+------------+
| ID | FIRSTNAME | LASTNAME |
+------------+------------+------------+
| hup | zhan | feng |
+------------+------------+------------+
>>hbase>>scan ‘USER3TABLE‘
ROW COLUMN+CELL
hup\x00zhan column=_0:LASTNAME, timestamp=1387875092585, value=feng
hup\x00zhan column=_0:_0, timestamp=1387875092585, value=
1 row(s) in 0.0110 seconds
此外:当表中非主键的列有多个时会统一加后缀:
1:NASalesforce.com\x00Login\x00\x00\x00 column=STATS:ACTIVE_VISITOR, timestamp=1387867968156, value=\x80\x00\x1A"
\x01C%\x17\xFE0
2:NASalesforce.com\x00Login\x00\x00\x00 column=USAGE:CORE, timestamp=1387867968156, value=\x80\x00\x00\x00\x00\x00\x00\xC9
\x01C%\x17\xFE0
3:NASalesforce.com\x00Login\x00\x00\x00 column=USAGE:DB, timestamp=1387867968156, value=\x80\x00\x00\x00\x00\x00\x02\x84
\x01C%\x17\xFE0
4:NASalesforce.com\x00Login\x00\x00\x00 column=USAGE:_0, timestamp=1387867968156, value=
\x01C%\x17\xFE0
结论:
1:Phoenix会把“CONSTRAINT PK PRIMARY KEY (id,firstname)”这样定义的列拼起来加入到Hbase主键中(用\x00进行分割),同时将联合主键涉及到的列合并默认名为"_0"的一列。其值为空。其他列放入Hbase的同名列中存储
2:Phoenix在Hbase中维护了一张系统表(SYSTEM TABLE)来存储相关Phoenix表的scheme元数据。
3:创建二级索引(create index)操作不会影响表结构
4:如果建表时不指定列族,则列族以_0、_1的方式命名
5:如果有多列时value值通过HBase接口获取的并不是直接可用的值(只能通过Phoenix接口获取正常值)
动态scheme相关
1:支持修改列
Example:
ALTER TABLE my_schema.my_table ADD d.dept_id char(10) VERSIONS=10
ALTER TABLE my_table ADD dept_name char(50)
ALTER TABLE my_table ADD parent_id char(15) null primary key
ALTER TABLE my_table DROP COLUMN d.dept_id
ALTER TABLE my_table DROP COLUMN dept_name
ALTER TABLE my_table DROP COLUMN parent_id
ALTER TABLE my_table SET IMMUTABLE_ROWS=true
2:支持修改二级索引
Example:
CREATE INDEX my_idx ON sales.opportunity(last_updated_date DESC)
CREATE INDEX my_idx ON log.event(created_date DESC) INCLUDE (name, payload) SALT_BUCKETS=10
CREATE INDEX IF NOT EXISTS my_comp_idx ON server_metrics ( gc_time DESC, created_date DESC )
DATA_BLOCK_ENCODING=‘NONE‘,VERSIONS=?,MAX_FILESIZE=2000000 split on (?, ?, ?)
ALTER INDEX my_idx ON sales.opportunity DISABLE
ALTER INDEX IF EXISTS my_idx ON server_metrics REBUILD
DROP INDEX my_idx ON sales.opportunity
DROP INDEX IF EXISTS my_idx ON server_metrics
3:应该是不支持修改主键索引(没找到相关信息。理论上也不好支持,因为主键索引就是rowkey的值。)
Java客户端示例代码(直接面向JDBC接口编程):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.PreparedStatement;
import java.sql.Statement;
public class test {
public static void main(String[] args) throws SQLException {
Statement stmt = null;
ResultSet rset = null;
Connection con = DriverManager.getConnection("jdbc:phoenix:zookeeper");
stmt = con.createStatement();
stmt.executeUpdate("create table test (mykey integer not null primary key, mycolumn varchar)");
stmt.executeUpdate("upsert into test values (1,‘Hello‘)");
stmt.executeUpdate("upsert into test values (2,‘World!‘)");
con.commit();
PreparedStatement statement = con.prepareStatement("select * from test");
rset = statement.executeQuery();
while (rset.next()) {
System.out.println(rset.getString("mycolumn"));
}
statement.close();
con.close();
}
}
单节点测试:
建表:
CREATE TABLE IF NOT EXISTS $table (HOST CHAR(2) NOT NULL,DOMAIN VARCHAR NOT NULL,
FEATURE VARCHAR NOT NULL,DATE DATE NOT NULL,USAGE.CORE BIGINT,USAGE.DB BIGINT,STATS.ACTIVE_VISITOR
INTEGER CONSTRAINT PK PRIMARY KEY (HOST, DOMAIN, FEATURE, DATE))
SPLIT ON (‘CSGoogle‘,‘CSSalesforce‘,‘EUApple‘,‘EUGoogle‘,‘EUSalesforce‘,‘NAApple‘,‘NAGoogle‘,‘NASalesforce‘);
performance_10000000数据表中是1000W条如下数据:
+------+------------+------------+---------------------+----------+----------+----------------+
| HOST | DOMAIN | FEATURE | DATE | CORE | DB | ACTIVE_VISITOR |
+------+------------+------------+---------------------+----------+----------+----------------+
| CS | Apple.com | Dashboard | 2013-12-23 | 363 | 795 | 8390 |
+------+------------+------------+---------------------+----------+----------+----------------+
Query # 1 - Count - SELECT COUNT(1) FROM performance_10000000;
COUNT(1)
--------
10000000
Time: 66.044 sec(s)
Query # 2 - Group By First PK - SELECT HOST FROM performance_10000000 GROUP BY HOST;
HOST
----
CS
EU
NA
Time: 51.43 sec(s)
Query # 3 - Group By Second PK - SELECT DOMAIN FROM performance_10000000 GROUP BY DOMAIN;
DOMAIN
----------
Apple.com
Google.com
Salesforce.com
Time: 46.908 sec(s)
Query # 4 - Truncate + Group By - SELECT TRUNC(DATE,‘DAY‘) DAY FROM performance_10000000 GROUP BY TRUNC(DATE,‘DAY‘);
DAY
-------------------
2013-12-23 00:00:00
2013-12-24 00:00:00
2013-12-25 00:00:00
......
Time: 48.132 sec(s)
Query # 5 - Filter + Count - SELECT COUNT(1) FROM performance_10000000 WHERE CORE<10;
COUNT(1)
--------
198669
Time: 31.301 sec(s)
集群(5*RegionServer)测试:
performance_10000000数据表中是1000W条如下数据:
+------+------------+------------+---------------------+----------+----------+----------------+
| HOST | DOMAIN | FEATURE | DATE | CORE | DB | ACTIVE_VISITOR |
+------+------------+------------+---------------------+----------+----------+----------------+
| CS | Apple.com | Dashboard | 2013-12-23 | 363 | 795 | 8390 |
+------+------------+------------+---------------------+----------+----------+----------------+
Query # 1 - Count - SELECT COUNT(1) FROM performance_10000000;
COUNT(1)
--------
19630614
Time: 13.879 sec(s)
Query # 2 - Group By First PK - SELECT HOST FROM performance_10000000 GROUP BY HOST;
HOST
----
CS
EU
NA
Time: 13.545 sec(s)
Query # 3 - Group By Second PK - SELECT DOMAIN FROM performance_10000000 GROUP BY DOMAIN;
DOMAIN
----------
Apple.com
Google.com
Salesforce.com
Time: 12.907 sec(s)
Query # 4 - Truncate + Group By - SELECT TRUNC(DATE,‘DAY‘) DAY FROM performance_10000000 GROUP BY TRUNC(DATE,‘DAY‘);
DAY
-------------------
2013-12-23 00:00:00
2013-12-24 00:00:00
2013-12-25 00:00:00
......
Time: 13.845 sec(s)
Query # 5 - Filter + Count - SELECT COUNT(1) FROM performance_10000000 WHERE CORE<10;
COUNT(1)
--------
393154
Time: 8.522 sec(s)
优点:
1:命令行和java客户端使用都很简单。尤其是java客户端直接面向JDBC接口编程,封装且优化了Hbase很多细节。
2:在单表操作上性能比Hive Handler好很多(但是handler也有可能会升级加入斜处理器相关聚合等特性)
3:支持多列的二级索引,列数不限。其中可变索引时列数越多写入速度越慢,不可变索引不影响写入速度(参考:https://github.com/forcedotcom/phoenix/wiki/Secondary-Indexing#mutable-indexing)。
4:对Top-N查询速度远超Hive(参考:https://github.com/forcedotcom/phoenix/wiki/Performance#top-n)
5:提供对rowkey分桶的特性,可以实现数据在各个region的均匀分布(参考:https://github.com/forcedotcom/phoenix/wiki/Performance#salting)
6:低侵入性,基本对原Hbase的使用没什么影响
7:提供的函数基本都能cover住绝大多数需求了
8:与Hive不同的是,Phoenix的sql语句更接近标准sql规范。
缺点:
1:Phoenix创建的表Hbase可以识别并使用,但是使用Hbase创建的表,Phoenix不能识别,因为Phoenix对每张表都有其相应的元数据信息。
2:硬伤:多表join操作当前不支持(官方文档对当前2.2.3版本的说法不一致,但3.0应该会支持,有可能会引入Apache Drill把大表join分割成小任务的特性)。
3:目前只支持hbase0.94系列(这个应该问题不大)
其他
1:Phoenix对所有数据都是基于内存进行统计。因此从sql语句复杂程度以及java各数据结构的性能,基本能对其消耗的时间有一个大概的估计。
功能扩展(impala不太熟,主要针对Hive说了)
如果将Hive和Phoenix集成的话,还是很不错的,两者刚好互补。Hive并没使用coprocesser,只是通过把数据取出来做MR,而Phoenix刚好是在单表取数据方面做了很多优化。集成后可以享受到Phoenix的单表操作优势,同时可以解决多表join的问题(在Phoenix估计短时间难做出来大表join的方案,说是要模仿Drill,但是现在Drill本身救处于Alpha阶段)。
如果集成的话主要工作是需要在Hive-hbase-handler中适配Hive相关单表操作到Phoenix的java客户端接口。
不太成熟的问题:
1:是把Phoenix的单表操作移植到Hive中还是把Hive的join移植到Phoenix中?
2:是只对外提供Hive的接口还是同时对外提供Hive和Phoenix两种接口呢?
3:适配的过程还有很多细节
尝试调研了下Phoenix二级索引是否可以达到像华为一样创建完可以无需修改HBase任何代码就享受到二级索引的效果
扩展阅读:
sql for hbase(Phoenix、Impala、Drill): http://www.orzota.com/sql-for-hbase/
SQL on Hadoop的最新进展及7项相关技术分享:http://www.csdn.net/article/2013-10-18/2817214-big-data-hadoop
对比华为HBase二级索引:
缺点:华为二级索引需要在建表时指定列(及不支持动态修改),同时华为代码对Hbase本身侵入性太大(比如balancer要用华为的),难以升级维护。
优点:但是索引建好后,在对Hbase的scan、Puts、Deletes操作时使用Hbase原生代码(无需任何改动)即可享受到索引的效果。也不需要指定使用哪个索引,它会自己使用最优索引。
也就是说如果加上华为索引,Hive-hbase-handler无需改动即可使用二级索引。但是phoenix目前只支持通过phoenix sql方式使用二级索引。
性能对比:暂未测试,估计差不太多
综合看移植phoenix比移植华为更靠谱,phoenix侵入性小,功能更强大,且升级维护方面也比华为要靠谱。但是移植phoenix难度也相对比较大。
但是如果只是想短期起效果,可以尝试下华为索引。
淘宝开源项目Lealone:
是一个可用于HBase的分布式SQL引擎,主要功能就是能用SQL方式(JDBC)查询Hbase,避免了HBase使用的繁琐操作。相对与Phoenix的功能弱多了。
- 支持高性能的分布式事务,
- 使用一个非常新颖的基于局部时间戳的多版本冲突与有效性检测的分布式事务模型
- 是对H2关系数据库SQL引擎的改进和扩展
- HBase建的表Lealone只能读;Lealone建的表Lealone可以读写。
基于Solr的HBase多条件查询:
介绍:ApacheSolr 是一个开源的搜索服务器,Solr 使用 Java 语言开发,主要基于 HTTP 和Apache Lucene 实现。
原理:基于Solr的HBase多条件查询原理很简单,将HBase表中涉及条件过滤的字段和rowkey在Solr中建立索引,通过Solr的多条件查询快速获得符合过滤条件的rowkey值,拿到这些rowkey之后在HBASE中通过指定rowkey进行查询。
缺点:
1:ApacheSolr本身并不是专为HBase设计的。需要专门针对ApacheSolr写Hbase的相关应用,比如HBase写数据时同步更新索引的过程需要我们自己写协处理器。
2:ApacheSolr本身是一个WebService服务,需要额外维护一个或多个ApacheSolr服务器。
参考:
1:基于Solr的HBase多条件查询http://www.cnblogs.com/chenz/articles/3229997.html
2:http://blog.csdn.net/chenjia3615349/article/details/8112289#t40
中文wiki:https://github.com/codefollower/Lealone
个人感觉Phoenix是这些sql for hbase项目里最合适、最有前景一个。
Phoenix(sql on hbase)简介,古老的榕树,5-wow.com