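Alex's Hadoop Beginner Tutorial: Lesson 8, Calling HBase from Java
Notes
- This article is based on CentOS 6.x + CDH 5.x
- In this example HBase is installed in cluster mode
- This article uses Maven 3.5+ and Eclipse 4.3
- I strongly recommend reading the references at the end of the tutorial
Environment setup
Create the project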
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>org.crazycake</groupId>
  <artifactId>playhbase</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>playhbase</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-client</artifactId>
      <version>0.98.4-hadoop2</version>
    </dependency>
  </dependencies>

  <build>
    <!-- Maven only accepts <resources> inside <build> -->
    <resources>
      <resource>
        <directory>${basedir}/conf</directory>
        <filtering>false</filtering>
        <includes>
          <include>hbase-site.xml</include>
        </includes>
      </resource>
    </resources>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.0.2</version>
        <configuration>
          <source>1.6</source>
          <target>1.6</target>
          <encoding>UTF-8</encoding>
          <optimize>true</optimize>
          <compilerArgument>-nowarn</compilerArgument>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.3</version>
        <configuration>
          <transformers>
            <transformer implementation="org.apache.maven.plugins.shade.resource.ApacheLicenseResourceTransformer">
            </transformer>
          </transformers>
        </configuration>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
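- Besides the HBase client jar, the POM also pulls in a Maven plugin called maven-shade-plugin. It prevents duplicate license files from ending up in the built jar; duplicated license files cause runtime errors on an HDInsight cluster.
- The POM also declares a resource that points to the configuration file hbase-site.xml (under conf/), which is where the HBase connection settings go.
Create the configuration file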
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
/**
 * Copyright 2010 The Apache Software Foundation
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
-->
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>host1,host2</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
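host1 and host2 are the machines where ZooKeeper is installed. I only set up two machines, so there are only host1 and host2; a normal deployment should have at least 3 ZooKeeper servers and grow in odd numbers.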
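The sizing advice above follows from majority voting: a ZooKeeper ensemble stays available only while more than half of its servers are up. Below is a minimal sketch of that arithmetic in plain Java, with no HBase or ZooKeeper dependency. The class and method names (QuorumSizing, parseQuorum, majority, toleratedFailures) are made up for illustration and are not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

final class QuorumSizing {

    /** Splits a comma-separated hbase.zookeeper.quorum value into host names. */
    static List<String> parseQuorum(String quorum) {
        List<String> hosts = new ArrayList<String>();
        for (String part : quorum.split(",")) {
            String host = part.trim();
            if (!host.isEmpty()) {
                hosts.add(host);
            }
        }
        return hosts;
    }

    /** Smallest number of servers that forms a strict majority of the ensemble. */
    static int majority(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    /** Number of servers that may fail while a majority is still alive. */
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - majority(ensembleSize);
    }

    public static void main(String[] args) {
        // The quorum value used in this tutorial's hbase-site.xml
        List<String> hosts = parseQuorum("host1,host2");
        System.out.println(hosts + " tolerates "
                + toleratedFailures(hosts.size()) + " failure(s)");

        // Compare a few ensemble sizes
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " server(s): majority=" + majority(n)
                    + ", tolerated failures=" + toleratedFailures(n));
        }
    }
}
```

Two servers tolerate no failures at all, and four tolerate only as many as three do, which is why ensembles are normally grown in odd steps.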
Operations
Create a table and insert data
package org.crazycake.playhbase;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.ZooKeeperConnectionException;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class CreateTable {

    public static void main(String[] args)
            throws MasterNotRunningException, ZooKeeperConnectionException, IOException {
        Configuration config = HBaseConfiguration.create();

        // The commented-out lines below set the ZooKeeper parameters in code.
        // Use them if you have no hbase-site.xml or want to change the settings at runtime.
        //
        // config.set("hbase.zookeeper.quorum",
        //         "zookeepernode0,zookeepernode1,zookeepernode2");
        // config.set("hbase.zookeeper.property.clientPort", "2181");
        // config.set("hbase.cluster.distributed", "true");

        // Create an admin object from the configuration
        HBaseAdmin admin = new HBaseAdmin(config);

        // Describe the table
        HTableDescriptor tableDescriptor = new HTableDescriptor(TableName.valueOf("people"));
        // Add 2 column families
        tableDescriptor.addFamily(new HColumnDescriptor("name"));
        tableDescriptor.addFamily(new HColumnDescriptor("contactinfo"));
        admin.createTable(tableDescriptor);
        admin.close();

        // Now put some data in
        String[][] people = {
                { "1", "Marcel", "Haddad", "[email protected]" },
                { "2", "Franklin", "Holtz", "[email protected]" },
                { "3", "Dwayne", "McKee", "[email protected]" },
                { "4", "Rae", "Schroeder", "[email protected]" },
                { "5", "Rosalie", "burton", "[email protected]" },
                { "6", "Gabriela", "Ingram", "[email protected]" } };

        HTable table = new HTable(config, "people");

        // Insert the rows into the table
        for (int i = 0; i < people.length; i++) {
            // The first column is the row key
            Put person = new Put(Bytes.toBytes(people[i][0]));
            // e.g. store "Marcel" in the "first" column of the "name" column family
            person.add(Bytes.toBytes("name"), Bytes.toBytes("first"), Bytes.toBytes(people[i][1]));
            person.add(Bytes.toBytes("name"), Bytes.toBytes("last"), Bytes.toBytes(people[i][2]));
            person.add(Bytes.toBytes("contactinfo"), Bytes.toBytes("email"), Bytes.toBytes(people[i][3]));
            table.put(person);
        }

        // Finally, remember to flush the writes and close the table
        table.flushCommits();
        table.close();
    }
}
tail -200f /var/log/hbase/hbase-hbase-master-host1.localdomain.log
hbase(main):003:0> scan 'people'
ROW    COLUMN+CELL
 1     column=contactinfo:email, timestamp=1421338694666, value=[email protected]
 1     column=name:first, timestamp=1421338694666, value=Marcel
 1     column=name:last, timestamp=1421338694666, value=Haddad
 2     column=contactinfo:email, timestamp=1421338694932, value=[email protected]
 2     column=name:first, timestamp=1421338694932, value=Franklin
 2     column=name:last, timestamp=1421338694932, value=Holtz
 3     column=contactinfo:email, timestamp=1421338694977, value=[email protected]
 3     column=name:first, timestamp=1421338694977, value=Dwayne
 3     column=name:last, timestamp=1421338694977, value=McKee
 4     column=contactinfo:email, timestamp=1421338695034, value=[email protected]
 4     column=name:first, timestamp=1421338695034, value=Rae
 4     column=name:last, timestamp=1421338695034, value=Schroeder
 5     column=contactinfo:email, timestamp=1421338695054, value=[email protected]
 5     column=name:first, timestamp=1421338695054, value=Rosalie
 5     column=name:last, timestamp=1421338695054, value=burton
 6     column=contactinfo:email, timestamp=1421338695076, value=[email protected]
 6     column=name:first, timestamp=1421338695076, value=Gabriela
 6     column=name:last, timestamp=1421338695076, value=Ingram
6 row(s) in 0.3910 seconds
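The scan output reflects how the rows are laid out: each row key maps to a set of family:qualifier cells, and both levels come back sorted. The following is a rough in-memory sketch of that layout using nested TreeMaps. It is only a toy model for reading the output above, not how HBase stores data; the class RowModelSketch and row key "10" are invented for the example, and it compares Java strings, which matches HBase's byte-wise ordering only for plain ASCII keys like these.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class RowModelSketch {

    // row key -> ("family:qualifier" -> value); TreeMap keeps both levels sorted
    private final TreeMap<String, TreeMap<String, String>> rows =
            new TreeMap<String, TreeMap<String, String>>();

    /** Stores one cell, creating the row on first use (loosely like a Put). */
    void put(String rowKey, String family, String qualifier, String value) {
        TreeMap<String, String> cells = rows.get(rowKey);
        if (cells == null) {
            cells = new TreeMap<String, String>();
            rows.put(rowKey, cells);
        }
        cells.put(family + ":" + qualifier, value);
    }

    /** Returns every cell in row-key order, then column order (loosely like a scan). */
    List<String> scan() {
        List<String> lines = new ArrayList<String>();
        for (Map.Entry<String, TreeMap<String, String>> row : rows.entrySet()) {
            for (Map.Entry<String, String> cell : row.getValue().entrySet()) {
                lines.add(row.getKey() + " column=" + cell.getKey()
                        + ", value=" + cell.getValue());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        RowModelSketch people = new RowModelSketch();
        people.put("2", "name", "first", "Franklin");
        people.put("1", "name", "last", "Haddad");
        people.put("1", "name", "first", "Marcel");
        // Hypothetical extra row to show the ordering of string row keys
        people.put("10", "name", "first", "Example");

        for (String line : people.scan()) {
            System.out.println(line);
        }
    }
}
```

Because the row keys are strings rather than numbers, row 10 is listed before row 2. If numeric order matters, a common workaround is to zero-pad the keys (01, 02, ... 10).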
Search by email
package org.crazycake.playhbase;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * Search for users by email
 * @author alexxiyang (https://github.com/alexxiyang)
 *
 */
public class SearchByEmail {

    public static void main(String[] args) throws IOException {
        // Create the configuration
        Configuration config = HBaseConfiguration.create();

        // Open the table
        HTable table = new HTable(config, "people");

        // Define the column families and columns we will use
        byte[] contactFamily = Bytes.toBytes("contactinfo");
        byte[] emailQualifier = Bytes.toBytes("email");
        byte[] nameFamily = Bytes.toBytes("name");
        byte[] firstNameQualifier = Bytes.toBytes("first");
        byte[] lastNameQualifier = Bytes.toBytes("last");

        // Create a regular-expression comparator
        RegexStringComparator emailFilter = new RegexStringComparator("[email protected]");
        // Create a filter on contactinfo:email that uses the regex comparator
        SingleColumnValueFilter filter = new SingleColumnValueFilter(
                contactFamily, emailQualifier, CompareOp.EQUAL, emailFilter);

        // Create a scan and attach the filter
        Scan scan = new Scan();
        scan.setFilter(filter);

        // Run the query and get the results
        ResultScanner results = table.getScanner(scan);

        // Iterate over the results and print them
        for (Result result : results) {
            String id = Bytes.toString(result.getRow());
            String firstName = Bytes.toString(result.getValue(nameFamily, firstNameQualifier));
            String lastName = Bytes.toString(result.getValue(nameFamily, lastNameQualifier));
            System.out.println(firstName + " " + lastName + " - ID: " + id);

            String email = Bytes.toString(result.getValue(contactFamily, emailQualifier));
            System.out.println(firstName + " " + lastName + " - " + email + " - ID: " + id);
        }

        // Close the scanner
        results.close();
        // Close the table
        table.close();
    }
}
Output
Rosalie burton - ID: 5
Rosalie burton - [email protected] - ID: 5
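One detail worth keeping in mind with a regex comparator: an email address used directly as a pattern is still a regular expression, so the dot matches any character, and, to my understanding, RegexStringComparator reports a match when the pattern is found anywhere in the value rather than requiring the whole value to match. The sketch below reproduces that search behaviour with java.util.regex only. It is not HBase's filter code; the class RegexMatchSketch and the sample addresses are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class RegexMatchSketch {

    /** Keeps the values in which the regex is found anywhere (search semantics). */
    static List<String> filter(List<String> values, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> hits = new ArrayList<String>();
        for (String value : values) {
            if (pattern.matcher(value).find()) {
                hits.add(value);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical addresses, not the data inserted earlier
        List<String> emails = Arrays.asList(
                "rosalie@example.com",
                "rosalie@exampleXcom",
                "rae@example.org");

        // Unescaped: '.' is a wildcard, so the second address matches too
        System.out.println(filter(emails, "rosalie@example.com"));

        // Quoted: the pattern is treated as a literal string
        System.out.println(filter(emails, Pattern.quote("rosalie@example.com")));

        // Anchored prefix search
        System.out.println(filter(emails, "^rae@"));
    }
}
```

For an exact-address lookup, quoting the pattern (or anchoring it with ^ and $) avoids accidental matches.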
Delete the table
package org.crazycake.playhbase;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

/**
 * Delete a table
 * @author alexxiyang (https://github.com/alexxiyang)
 *
 */
public class DeleteTable {

    public static void main(String[] args) throws IOException {
        // Create the configuration
        Configuration config = HBaseConfiguration.create();

        // Create the admin
        HBaseAdmin admin = new HBaseAdmin(config);

        // Disable the table first, then delete it
        admin.disableTable("people");
        admin.deleteTable("people");
        admin.close();
    }
}
Check the result in the HBase shell
hbase(main):004:0> list
TABLE
employee
employee2
student
users
4 row(s) in 4.8460 seconds

=> ["employee", "employee2", "student", "users"]
The people table is gone.
Problems I ran into
hbase(main):002:0> create 'users','info'
0 row(s) in 36.5110 seconds

=> Hbase::Table - users
hbase(main):003:0> list
TABLE
employee
employee2
student
users
4 row(s) in 0.4520 seconds

=> ["employee", "employee2", "student", "users"]
hbase(main):004:0> put 'users',1,'info:name','ted'
0 row(s) in 0.8350 seconds

hbase(main):005:0> scan 'users'
ROW    COLUMN+CELL
 1     column=info:name, timestamp=1421252020520, value=ted
1 row(s) in 0.3140 seconds
That means the problem may well be in ZooKeeper: the Java API does not talk to HBase directly, it goes through ZooKeeper first. So I went to look at the ZooKeeper log, following it with tail:
tail -200f /var/log/zookeeper/zookeeper.log
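Since the client has to reach ZooKeeper before anything else, it can also help to confirm from the client machine that the ZooKeeper port accepts connections at all. Here is a minimal sketch using only java.net; the class name ZkPortCheck and the 2-second timeout are assumptions, and the hosts and port are the ones from this tutorial's hbase-site.xml. It only tests TCP reachability and says nothing about whether ZooKeeper itself is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class ZkPortCheck {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers unknown host, connection refused, and timeout
            return false;
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // Nothing useful to do if close fails
            }
        }
    }

    public static void main(String[] args) {
        // Hosts and client port taken from the hbase-site.xml in this tutorial
        String[] quorumHosts = { "host1", "host2" };
        int clientPort = 2181;

        for (String host : quorumHosts) {
            System.out.println(host + ":" + clientPort + " reachable = "
                    + canConnect(host, clientPort, 2000));
        }
    }
}
```

If a host shows up as unreachable, host name resolution on the client (for example /etc/hosts) and the firewall on the ZooKeeper machine are reasonable first things to check before digging through the logs.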
References