Indexing MongoDB Data with Elasticsearch

Date: June 12, 2015

There are three steps:

1. Set up a single-node replica set
2. Install the mongodb-river plugin
3. Create the river meta and verify that it works

Step 1: Set up a single-node MongoDB replica set

1. Edit /etc/mongodb.conf
Add two settings:

replSet=rs0 # name of the replica set
oplogSize=100 # size of the oplog in MB (too large a value is not supported)

Start MongoDB: bin/mongod --fork --logpath /data/db/mongodb.log -f /etc/mongodb.conf

2. Initialize the replica set

root# bin/mongo
>rs.initiate( {"_id" : "rs0", "version" : 1, "members" : [ { "_id" : 0, "host" : "127.0.0.1:27017" } ]})
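The configuration document passed to rs.initiate() above can also be generated rather than typed by hand. The following is only a minimal sketch: the helper name build_replset_config is hypothetical, and it simply reproduces the same fields (_id, version, members) used in the command above.

```python
import json


def build_replset_config(name, hosts):
    """Return a config document shaped like the one passed to rs.initiate()."""
    return {
        "_id": name,
        "version": 1,
        # Member ids are assigned in order, starting from 0.
        "members": [{"_id": i, "host": host} for i, host in enumerate(hosts)],
    }


config = build_replset_config("rs0", ["127.0.0.1:27017"])
# Print a line that can be pasted into the mongo shell.
print(f"rs.initiate({json.dumps(config)})")
```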

3. Once the replica set is initialized, exit the mongo shell and log in again. The prompt changes to:

rs0:PRIMARY>

Step 2: Install the mongodb-river plugin

Plugin project: https://github.com/richardwilly98/elasticsearch-river-mongodb
Install command:

bin/plugin --install com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/2.0.0

When the installation finishes, start Elasticsearch. A normal startup logs messages like these:

root# bin/elasticsearch

...
[2014-03-14 19:28:34,179][INFO ][plugins] [Super Rabbit] loaded [mongodb-river], sites [river-mongodb]
[2014-03-14 19:28:41,032][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] Starting river mongodb_test
[2014-03-14 19:28:41,087][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] MongoDB River Plugin - version[2.0.0] - hash[a0c23f1] - time[2014-02-23T20:40:05Z]
[2014-03-14 19:28:41,087][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] starting mongodb stream. options: secondaryreadpreference [false], drop_collection [false], include_collection [], throttlesize [5000], gridfs [false], filter [null], db [test], collection [page], script [null], indexing to [test]/[page]
[2014-03-14 19:28:41,303][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] MongoDB version - 2.2.7

Step 3: Create the river meta

1. Create the MongoDB connection

root# curl -XPUT "localhost:9200/_river/mongodb_mytest/_meta" -d '
> {
> "type": "mongodb",
> "mongodb": {
> "host": "localhost",
> "port": "27017",
> "db": "testdb",
> "collection": "testcollection"
> },
> "index": {
> "name": "testdbindex",
> "type": "testcollection"} }'
{"_index":"_river","_type":"mongodb_mytest","_id":"_meta","_version":1,"created":true}
A response with "created":true means the river was created successfully. You can also inspect it with curl "http://localhost:9200/_river/mongodb_mytest/_meta".

The meta document has three main parts:

type: the river type, here "mongodb"
mongodb: the MongoDB connection settings
index: the Elasticsearch index and type that receive the MongoDB data
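To make the three parts concrete, here is a minimal sketch that assembles the same meta document in Python and prepares the PUT request without sending it. The function names build_river_meta and build_river_request are hypothetical helpers, not part of the plugin. The Content-Type header is an addition that the curl example does not set, and the port is kept as a string only to mirror that example.

```python
import json
import urllib.request


def build_river_meta(db, collection, index_name, host="localhost", port="27017"):
    """Return the river meta document with its three parts: type, mongodb, index."""
    return {
        "type": "mongodb",
        "mongodb": {"host": host, "port": port, "db": db, "collection": collection},
        # Following the example, the Elasticsearch type reuses the collection name.
        "index": {"name": index_name, "type": collection},
    }


def build_river_request(river_name, meta, es_url="http://localhost:9200"):
    """Build, but do not send, the PUT request that registers the river."""
    return urllib.request.Request(
        url=f"{es_url}/_river/{river_name}/_meta",
        data=json.dumps(meta).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


meta = build_river_meta("testdb", "testcollection", "testdbindex")
request = build_river_request("mongodb_mytest", meta)
print(request.get_method(), request.full_url)
print(json.dumps(meta, indent=2))
# To register the river against a running node, send the request:
# urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to check the URL and body before touching a live node.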

2. Insert data into MongoDB

rs0:PRIMARY> db.testcollection.save({name:"stone"})

3. Run a query

root# curl -XGET 'http://localhost:9200/testdbindex/_search?q=name:stone'
{"took":2,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.30685282,"hits":[{"_index":"testdbindex","_type":"testcollection","_id":"5322eb23fdfc233ffcfa02bb","_score":0.30685282, "_source" : {"_id":"5322eb23fdfc233ffcfa02bb","name":"stone"}}]}}
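The documents that came from MongoDB sit under hits.hits[]._source in the search response. As a small sketch, the snippet below parses the sample response shown above and pulls out those documents. The helper name extract_sources is hypothetical, and the code assumes the response has the same shape as this sample.

```python
import json

# Sample response copied from the query above.
RESPONSE = (
    '{"took":2,"timed_out":false,'
    '"_shards":{"total":5,"successful":5,"failed":0},'
    '"hits":{"total":1,"max_score":0.30685282,"hits":['
    '{"_index":"testdbindex","_type":"testcollection",'
    '"_id":"5322eb23fdfc233ffcfa02bb","_score":0.30685282, '
    '"_source" : {"_id":"5322eb23fdfc233ffcfa02bb","name":"stone"}}]}}'
)


def extract_sources(response_text):
    """Return the _source document of every hit in a search response."""
    body = json.loads(response_text)
    return [hit["_source"] for hit in body["hits"]["hits"]]


sources = extract_sources(RESPONSE)
for source in sources:
    print(source["_id"], source["name"])
```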

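One caveat (I could not reproduce this problem in my own tests: after I created the meta, data that already existed in MongoDB was indexed as well. I am still including the original author's workaround below.)

"Changes made after the river is created show up in Elasticsearch, but changes made before the river was created are not in the oplog and so cannot be synced. The workaround is to iterate over the collection you want to export, reinsert its documents into another collection, and point the river at that new collection. All changes to the new collection are then recorded in the oplog."

You can iterate over a MongoDB collection with a cursor: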

var myCursor = db.oldcollection.find( { }, {html:0} ); 
myCursor.forEach(function(myDoc) {db.newcollection.save(myDoc); });
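The same copy can be sketched in Python. The helper copy_collection below is hypothetical and makes two assumptions: source and target behave like pymongo Collection objects (find(filter, projection) and replace_one(filter, doc, upsert=True)), and an upsert by _id is used in place of the shell's save(). It excludes the html field by default, matching the {html:0} projection above.

```python
def copy_collection(source, target, exclude_fields=("html",)):
    """Copy every document from source to target, dropping exclude_fields.

    Returns the number of documents copied.
    """
    # A projection such as {"html": 0} excludes that field from the results.
    projection = {field: 0 for field in exclude_fields}
    copied = 0
    # Pass None when there is nothing to exclude, so whole documents come back.
    for doc in source.find({}, projection or None):
        # Upsert by _id so rerunning the copy does not create duplicates.
        target.replace_one({"_id": doc["_id"]}, doc, upsert=True)
        copied += 1
    return copied


# With pymongo installed and the replica set running, a call could look like:
# from pymongo import MongoClient
# db = MongoClient("localhost", 27017).testdb
# copy_collection(db.oldcollection, db.newcollection)
```

Because every write goes through the new collection, each copied document produces an oplog entry that the river can pick up.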

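Appendix: MongoDB & mongodb-river (Elasticsearch) deployment

As more data is indexed, the Elasticsearch data directory can grow very large. Deleting the index frees the space:

root# curl -XDELETE 'http://localhost:9200/testdbindex'
root# curl -XDELETE 'http://localhost:9200/_river' (I am not sure whether this line is needed)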
{"acknowledged":true}

