Lucene Usage Notes

Date: June 9, 2015

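This semester I learned how to build my own web search engine. The project used the Lucene library, and I found this open-source package simple and easy to use, so I am writing down some notes.

First, the structure of a Lucene index: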

[Figure: Lucene index structure]

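Read from left to right, the figure shows how an index is read; read from right to left, it shows how an index is built.

An index contains documents (Document), and each document in turn contains fields (Field). The fields are ones you define yourself.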
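To make the index / document / field hierarchy concrete, here is a minimal sketch of an inverted index in plain Java. It is a toy model under stated assumptions, not Lucene's implementation: the class `ToyInvertedIndex` and its methods are hypothetical names, and the tokenizer is a simple lowercase split.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy class for illustration; this is not Lucene's API.
class ToyInvertedIndex {
    // The index holds documents; each document is a map of field name -> field text.
    private final List<Map<String, String>> documents = new ArrayList<>();
    // "field:term" -> sorted ids of the documents whose field contains that term.
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();

    // Building the index: document -> fields -> terms -> postings.
    int addDocument(Map<String, String> fields) {
        int docId = documents.size();
        documents.add(fields);
        for (Map.Entry<String, String> field : fields.entrySet()) {
            String text = field.getValue().toLowerCase(Locale.ROOT);
            for (String term : text.split("[^a-z0-9]+")) {
                if (term.isEmpty()) {
                    continue;
                }
                postings.computeIfAbsent(field.getKey() + ":" + term, k -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Reading the index: term -> postings -> matching document ids.
    List<Integer> search(String field, String term) {
        TreeSet<Integer> hits = postings.get(field + ":" + term);
        return hits == null ? new ArrayList<Integer>() : new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();

        Map<String, String> first = new LinkedHashMap<>();
        first.put("title", "I love China");
        first.put("content", "I love you, China");
        index.addDocument(first);

        Map<String, String> second = new LinkedHashMap<>();
        second.put("title", "I love mom");
        second.put("content", "I love you, my mother");
        index.addDocument(second);

        System.out.println(index.search("content", "china")); // [0]
        System.out.println(index.search("title", "love"));    // [0, 1]
    }
}
```

Terms are keyed by field name, which is why a search always names the field it runs against.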

Let's look at the index-building process first:

1. Create a Field

Field field=new Field(field name, field value, store option, index option[, term vector option]);

For example: field=new Field("content",content,Field.Store.YES,Field.Index.ANALYZED,Field.TermVector.WITH_POSITIONS_OFFSETS);

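Store options:

(1) Field.Store.NO: the value is not stored. Typically used for the body text of a document, which needs to be searchable but does not need to be returned verbatim.

(2) Field.Store.YES: the value is stored in full. A title, for example, can be stored this way.

(3) Field.Store.COMPRESS: the value is stored compressed, which can also be used for body text. This option exists only in older releases; it was removed in Lucene 3.0, so it is not available with the 3.6 API used in the sample code in these notes.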
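The difference between storing and indexing is easy to mix up, so here is a small plain-Java sketch of the idea. It is only an illustration: `StoreVsIndexSketch` and its methods are hypothetical names, not Lucene classes, and the boolean `store` flag stands in for Field.Store.YES / Field.Store.NO.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical toy class for illustration; this is not Lucene's API.
class StoreVsIndexSketch {
    // "field:term" -> ids of documents whose field contains the term (the searchable part).
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // doc id -> (field -> original text), kept only for stored fields (the retrievable part).
    private final Map<Integer, Map<String, String>> stored = new HashMap<>();

    void addField(int docId, String field, String value, boolean store) {
        // Every field is indexed in this sketch, whether or not it is stored.
        for (String term : value.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(field + ":" + term, k -> new HashSet<>()).add(docId);
            }
        }
        if (store) {
            stored.computeIfAbsent(docId, k -> new HashMap<>()).put(field, value);
        }
    }

    boolean matches(int docId, String field, String term) {
        Set<Integer> ids = postings.get(field + ":" + term);
        return ids != null && ids.contains(docId);
    }

    // Returns the original text, or null when the field was not stored.
    String get(int docId, String field) {
        Map<String, String> fields = stored.get(docId);
        return fields == null ? null : fields.get(field);
    }

    public static void main(String[] args) {
        StoreVsIndexSketch index = new StoreVsIndexSketch();
        index.addField(0, "title", "I love China", true);         // like Field.Store.YES
        index.addField(0, "content", "I love you, China", false); // like Field.Store.NO
        System.out.println(index.matches(0, "content", "china")); // true
        System.out.println(index.get(0, "title"));                // I love China
        System.out.println(index.get(0, "content"));              // null
    }
}
```

The unstored field can still be found by a search; it just cannot be read back from the hit. This is why the title is usually stored and a large body often is not.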

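Index options:

(1) Field.Index.NO: the field is not indexed, so it cannot be searched.

(2) Field.Index.NOT_ANALYZED_NO_NORMS (called NO_NORMS in older releases): the value is indexed as a single term without analysis, and no norms are stored for it.

(3) Field.Index.NOT_ANALYZED (called UN_TOKENIZED in older releases): the value is indexed as a single term without tokenization.

(4) Field.Index.ANALYZED (called TOKENIZED in older releases): the value is tokenized by the analyzer and the resulting terms are indexed.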
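The practical difference between the analyzed and not-analyzed options is which terms end up in the index. The sketch below imitates that in plain Java. It is an approximation under stated assumptions: `IndexModeSketch` is a hypothetical class, and its tokenizer only lowercases and splits on non-letter, non-digit characters, whereas a real analyzer such as StandardAnalyzer also does things like stop-word removal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical toy class for illustration; this is not Lucene's API.
class IndexModeSketch {
    // Roughly what an analyzed field contributes: one lowercase term per word.
    static List<String> analyzedTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // What a not-analyzed field contributes: the whole value as one unchanged term.
    static List<String> notAnalyzedTerms(String value) {
        return Collections.singletonList(value);
    }

    public static void main(String[] args) {
        String value = "I love you, China";
        System.out.println(analyzedTerms(value));    // [i, love, you, china]
        System.out.println(notAnalyzedTerms(value)); // [I love you, China]
    }
}
```

An exact-term lookup for "China" would find nothing in the analyzed field, because only "china" was indexed, while the not-analyzed field matches only the complete original string. Not-analyzed indexing therefore suits identifiers, paths, or dates; analyzed indexing suits free text.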

2. Create a Document

Document doc=new Document(); // an empty Document

doc.add(field);

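3. Create an IndexWriter

IndexWriter writer=new IndexWriter(directory that stores the index, IndexWriterConfig wrapping an analyzer instance);

The analyzer instance performs lexical analysis (StandardAnalyzer and so on). For Chinese text I also recommend IK-Analyzer.

The IndexWriter connects the logical index to the physical index. A newly created writer holds an empty index, so Documents have to be added to it:

writer.addDocument(doc);

When indexing is finished, remember to close the writer: writer.close();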

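FSDirectory is Lucene's abstraction for keeping an index in the file system. It has the following three subclasses:

1. SimpleFSDirectory: handles multithreaded access poorly, because it has to synchronize internally when several threads read the same file.

2. NIOFSDirectory: supports multithreaded access well on platforms other than Windows; on Windows its performance is poor.

3. MMapDirectory: reads through memory-mapped I/O and supports multithreading, but consumes virtual address space in proportion to the size of the mapped files.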

Sample code:

import java.io.File;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class BasicIndexer {

   public static void main(String[] args) throws java.io.IOException{
       // Open (or create) the index in the local "index" directory.
       SimpleFSDirectory dir=new SimpleFSDirectory(new File("index"));
       IndexWriter writer=new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_36, new StandardAnalyzer(Version.LUCENE_36)));

       // This example indexes two documents; each Document holds two Fields (title and content).
       // First doc
       Document doc=new Document();
       String title="I love China";
       Field field=new Field("title",title,Field.Store.YES,Field.Index.ANALYZED);
       doc.add(field);
       String content="I love you, China";
       field=new Field("content",content,Field.Store.YES,Field.Index.ANALYZED);
       doc.add(field);
       writer.addDocument(doc);

       //second doc
       doc=new Document();
       title="I love mom";
       field=new Field("title",title,Field.Store.YES,Field.Index.ANALYZED);
       doc.add(field);
       content="I love you, my mother";
       // Index the body text under the "content" field.
       field=new Field("content",content,Field.Store.YES,Field.Index.ANALYZED);
       doc.add(field);
       writer.addDocument(doc);

       writer.close();
       System.out.println("Index Created!");
   }

}

At this point the index has been built.

The next step is to run a search. Sample code follows:

import java.io.File;

import org.apache.lucene.index.*;
import org.apache.lucene.search.*;
import org.apache.lucene.store.SimpleFSDirectory;

public class BasicSearcher {

   public static void main(String[] args) throws java.io.IOException{
       // Open the index directory written by BasicIndexer.
       SimpleFSDirectory indexPath=new SimpleFSDirectory(new File("index"));
       IndexReader reader=IndexReader.open(indexPath);
       IndexSearcher searcher=new IndexSearcher(reader);
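       String searchField="content"; // the Field to search
       // A TermQuery is not analyzed, so the term must equal an indexed token exactly;
       // StandardAnalyzer lowercased "China" to "china" when the index was built.
       String searchPhrase="china"; // the term to search for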


       Term t=new Term(searchField,searchPhrase);
       Query q=new TermQuery(t);

       TopDocs tdc=searcher.search(q,5);
       ScoreDoc[] sdc=tdc.scoreDocs;
       System.out.println(tdc.totalHits);
       for(int i=0;i<sdc.length;i++)
       {
           int doc=sdc[i].doc;
           System.out.println(doc);
           System.out.println(reader.document(sdc[i].doc).get(searchField));
       }
       // Close the searcher first, then the reader it was built on.
       searcher.close();
       reader.close();
   }

}
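
Lucene also offers other kinds of queries, such as fuzzy queries, and it lets you apply your own weighting when it computes vector-space scores for documents.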
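Fuzzy matching of this kind is generally built on edit distance: a term matches if it can be turned into the query term with only a few single-character edits. The following is a minimal plain-Java sketch of that idea using the classic Levenshtein distance. `EditDistanceSketch`, `fuzzyMatch`, and the `maxEdits` threshold are hypothetical names chosen for illustration, and the code is not how Lucene implements its fuzzy queries internally.

```java
// Hypothetical toy class for illustration; this is not Lucene's API.
class EditDistanceSketch {
    // Levenshtein distance: the minimum number of single-character insertions,
    // deletions, and substitutions needed to turn a into b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j characters of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i characters of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // A term is a fuzzy match when it lies within maxEdits of the query term.
    static boolean fuzzyMatch(String indexedTerm, String queryTerm, int maxEdits) {
        return distance(indexedTerm, queryTerm) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(distance("mother", "mothr"));      // 1
        System.out.println(distance("china", "chian"));       // 2
        System.out.println(fuzzyMatch("mother", "mothr", 1)); // true
        System.out.println(fuzzyMatch("china", "chian", 1));  // false
    }
}
```

With a threshold of one edit, a misspelled query such as "mothr" would still reach documents containing "mother". A swapped pair of letters counts as two edits under plain Levenshtein distance, so "chian" would need a looser threshold.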