Adding Chinese Word Segmentation to Elasticsearch

Author: 小梦 | Source: Internet | Date: 2024-05-05

Installing the IK Analysis Plugin

Download the project from GitHub (I downloaded it to /tmp) and unzip it:

cd /tmp
wget https://github.com/medcl/elasticsearch-analysis-ik/archive/master.zip
unzip master.zip

Change into the elasticsearch-analysis-ik-master directory:

cd elasticsearch-analysis-ik-master/

Then build the jar, elasticsearch-analysis-ik-1.4.0.jar, with mvn. The build may take a few attempts before it succeeds:

mvn package

Incidentally, the mvn command requires Maven. On Ubuntu, it can be installed as follows:

apt-cache search maven
sudo apt-get install maven
mvn -version

Copy the ik folder under elasticsearch-analysis-ik-master/ to ${ES_HOME}/config/.

Copy elasticsearch-analysis-ik-1.4.0.jar from elasticsearch-analysis-ik-master/target to ${ES_HOME}/lib.
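The two copy steps above can be wrapped in a small helper so that a failed build is caught before anything is copied. This is only a sketch: the function name install_ik and its two arguments are hypothetical, and the paths simply mirror the ones given in this article.

```shell
#!/bin/sh
# Hypothetical helper (not part of the IK project): copy the dictionary folder
# and the built jar into an Elasticsearch installation.
install_ik() {
    src_dir=$1   # unpacked elasticsearch-analysis-ik-master directory
    es_home=$2   # Elasticsearch installation root (ES_HOME)

    # Fail early if the dictionary folder or the build output is missing.
    [ -d "$src_dir/ik" ] || { echo "missing $src_dir/ik" >&2; return 1; }
    jar="$src_dir/target/elasticsearch-analysis-ik-1.4.0.jar"
    [ -f "$jar" ] || { echo "missing $jar (did mvn package succeed?)" >&2; return 1; }

    # Create the target directories if needed, then copy.
    mkdir -p "$es_home/config" "$es_home/lib"
    cp -r "$src_dir/ik" "$es_home/config/"
    cp "$jar" "$es_home/lib/"
}
```

With the layout used in this article, the call would look like: install_ik /tmp/elasticsearch-analysis-ik-master "$ES_HOME"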

Add the IK configuration to the end of the elasticsearch.yml configuration file under ${ES_HOME}/config/:

index:
  analysis:
    analyzer:
      ik:
        alias: [ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_max_word:
        type: ik
        use_smart: false
      ik_smart:
        type: ik
        use_smart: true
index.analysis.analyzer.default.type: ik

In addition, httpclient-4.3.5.jar and httpcore-4.3.2.jar must be placed in ${ES_HOME}/lib.
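Before restarting Elasticsearch, it can be handy to confirm that every file mentioned so far is in place. The following is a minimal sketch: the function name check_ik_install is hypothetical, and the grep for IkAnalyzerProvider assumes the configuration was added in the form shown in this article.

```shell
#!/bin/sh
# Hypothetical checker: report anything from the installation steps that is
# missing under the given ES_HOME. Returns 0 only if everything is present.
check_ik_install() {
    es_home=$1
    status=0
    for f in \
        "lib/elasticsearch-analysis-ik-1.4.0.jar" \
        "lib/httpclient-4.3.5.jar" \
        "lib/httpcore-4.3.2.jar" \
        "config/ik"
    do
        if [ ! -e "$es_home/$f" ]; then
            echo "missing: $es_home/$f"
            status=1
        fi
    done
    # Assumes the analyzer block from this article was appended verbatim.
    if ! grep -q 'IkAnalyzerProvider' "$es_home/config/elasticsearch.yml" 2>/dev/null; then
        echo "elasticsearch.yml has no IK analyzer entry"
        status=1
    fi
    return $status
}
```

Running check_ik_install "$ES_HOME" prints one line per missing item and prints nothing when all checks pass.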

Testing IK Segmentation

Create an index named index:

curl -XPUT http://localhost:9200/index

Create a mapping for the index:

curl -XPOST http://localhost:9200/index/fulltext/_mapping -d '
{
    "fulltext": {
        "_all": {
            "analyzer": "ik"
        },
        "properties": {
            "content": {
                "type": "string",
                "boost": 8.0,
                "term_vector": "with_positions_offsets",
                "analyzer": "ik",
                "include_in_all": true
            }
        }
    }
}'

Test the analyzer:

curl 'http://localhost:9200/index/_analyze?analyzer=ik&pretty=true' -d '
{
"text":"世界如此之大"
}'

Response:

{
  "tokens" : [ {
    "token" : "text",
    "start_offset" : 4,
    "end_offset" : 8,
    "type" : "ENGLISH",
    "position" : 1
  }, {
    "token" : "世界",
    "start_offset" : 11,
    "end_offset" : 13,
    "type" : "CN_WORD",
    "position" : 2
  }, {
    "token" : "如此之",
    "start_offset" : 13,
    "end_offset" : 16,
    "type" : "CN_WORD",
    "position" : 3
  }, {
    "token" : "如此",
    "start_offset" : 13,
    "end_offset" : 15,
    "type" : "CN_WORD",
    "position" : 4
  }, {
    "token" : "之大",
    "start_offset" : 15,
    "end_offset" : 17,
    "type" : "CN_WORD",
    "position" : 5
  } ]
}

The first token, text, is the JSON key itself: the request body appears to be analyzed as raw text here, which is also why the Chinese tokens start at offset 11 rather than 0.
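To read the segmentation result at a glance, the token strings can be pulled out of the _analyze response with standard text tools. This is a rough sketch, not a JSON parser: the function name extract_tokens is hypothetical, and it assumes tokens contain no escaped double quotes.

```shell
#!/bin/sh
# Hypothetical helper: read an _analyze response on stdin and print one
# token per line. Relies on grep -o and a simple pattern, so it will not
# handle tokens that contain escaped quotes.
extract_tokens() {
    grep -o '"token" *: *"[^"]*"' | sed 's/^"token" *: *"//; s/"$//'
}
```

Piping the output of the curl command from the test above into extract_tokens would list each token on its own line, which makes it easier to compare how different analyzers split the same sentence.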
