Adding Chinese Word Segmentation to Elasticsearch
Installing the IK Analysis Plugin
Download the project from GitHub (I downloaded it to /tmp) and unzip it:

```shell
cd /tmp
wget https://github.com/medcl/elasticsearch-analysis-ik/archive/master.zip
unzip master.zip
```
Enter elasticsearch-analysis-ik-master:

```shell
cd elasticsearch-analysis-ik-master/
```
Then use mvn to build the jar, elasticsearch-analysis-ik-1.4.0.jar. This step may take several attempts before it succeeds:

```shell
mvn package
```
As an aside, mvn requires Maven to be installed. On Ubuntu, Maven can be installed as follows:

```shell
apt-cache search maven
sudo apt-get install maven
mvn -version
```
Copy the ik folder under elasticsearch-analysis-ik-master/ into ${ES_HOME}/config/.
Copy elasticsearch-analysis-ik-1.4.0.jar under elasticsearch-analysis-ik-master/target into ${ES_HOME}/lib.
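Assuming the project was unpacked under /tmp as above, and that ES_HOME points at your Elasticsearch install directory (the default below is only a placeholder), the two copy steps can be sketched as:

```shell
# Assumption: adjust ES_HOME to your actual Elasticsearch install path.
ES_HOME=${ES_HOME:-/usr/share/elasticsearch}

# Copy the IK configuration folder into the Elasticsearch config directory.
cp -r /tmp/elasticsearch-analysis-ik-master/ik "$ES_HOME/config/"

# Copy the jar built by `mvn package` into the Elasticsearch lib directory.
cp /tmp/elasticsearch-analysis-ik-master/target/elasticsearch-analysis-ik-1.4.0.jar "$ES_HOME/lib/"
```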
In the configuration file elasticsearch.yml under ${ES_HOME}/config/, add the ik configuration by appending the following at the end:

```yaml
index:
  analysis:
    analyzer:
      ik:
        alias: [ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_max_word:
        type: ik
        use_smart: false
      ik_smart:
        type: ik
        use_smart: true
index.analysis.analyzer.default.type: ik
```
In addition, httpclient-4.3.5.jar and httpcore-4.3.2.jar also need to be placed in ${ES_HOME}/lib.
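If those two jars are not already on hand, one way to fetch them is from Maven Central. The URLs below follow Maven Central's standard repository layout; verify that the versions match what your IK build expects:

```shell
# Assumption: ES_HOME points at your actual Elasticsearch install path.
cd "${ES_HOME:-/usr/share/elasticsearch}/lib"

# Download the Apache HttpComponents jars from Maven Central.
wget https://repo1.maven.org/maven2/org/apache/httpcomponents/httpclient/4.3.5/httpclient-4.3.5.jar
wget https://repo1.maven.org/maven2/org/apache/httpcomponents/httpcore/4.3.2/httpcore-4.3.2.jar
```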
Testing the IK Analyzer
Create an index named index:

```shell
curl -XPUT http://localhost:9200/index
```
Create a mapping for the index index:

```shell
curl -XPOST http://localhost:9200/index/fulltext/_mapping -d '
{
  "fulltext": {
    "_all": {
      "analyzer": "ik"
    },
    "properties": {
      "content": {
        "type": "string",
        "boost": 8.0,
        "term_vector": "with_positions_offsets",
        "analyzer": "ik",
        "include_in_all": true
      }
    }
  }
}'
```
Test it:

```shell
curl 'http://localhost:9200/index/_analyze?analyzer=ik&pretty=true' -d '{ "text":"世界如此之大"}'
```

```json
{
  "tokens" : [ {
    "token" : "text",
    "start_offset" : 4,
    "end_offset" : 8,
    "type" : "ENGLISH",
    "position" : 1
  }, {
    "token" : "世界",
    "start_offset" : 11,
    "end_offset" : 13,
    "type" : "CN_WORD",
    "position" : 2
  }, {
    "token" : "如此之",
    "start_offset" : 13,
    "end_offset" : 16,
    "type" : "CN_WORD",
    "position" : 3
  }, {
    "token" : "如此",
    "start_offset" : 13,
    "end_offset" : 15,
    "type" : "CN_WORD",
    "position" : 4
  }, {
    "token" : "之大",
    "start_offset" : 15,
    "end_offset" : 17,
    "type" : "CN_WORD",
    "position" : 5
  } ]
}
```
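As a further sanity check, you can index a document and run a query against the ik-analyzed content field. This is only a sketch assuming the index and mapping above are in place and Elasticsearch is listening on localhost:9200; the document id 1 is arbitrary:

```shell
# Index a sample document (id 1 is arbitrary).
curl -XPOST http://localhost:9200/index/fulltext/1 -d '
{"content": "世界如此之大"}'

# Search for one of the tokens the IK analyzer produced above.
# A term query is not analyzed, so it should match the stored token 世界.
curl -XPOST http://localhost:9200/index/fulltext/_search -d '
{
  "query": { "term": { "content": "世界" } }
}'
```

If the plugin is working, the search response should contain the document just indexed.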