Installing and Configuring Elasticsearch Analysis Plugins

elasticsearch-analysis-ik





1. Add the IK Chinese analysis plugin (make sure the plugin version matches your Elasticsearch version)

       git clone https://github.com/medcl/elasticsearch-analysis-ik

       cd elasticsearch-analysis-ik

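       You may see a message that Maven is not installed; if so, install it yourself before continuing.

       mvn package

       When the build finishes, a target/releases directory is created under the current directory. Copy its contents to plugins/ under the elasticsearch directory.

       Copy config/ik from the elasticsearch-analysis-ik directory to config under the elasticsearch directory, or to /etc/elasticsearch/.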
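The two copy steps above can be scripted. The following is only a minimal sketch under stated assumptions: `install_ik` is a hypothetical helper name (not something the plugin provides), and its two arguments, the plugin checkout and the Elasticsearch directory, depend on your own layout.

```shell
# Hypothetical helper: copy a built IK plugin into an Elasticsearch install.
#   $1 = path to the elasticsearch-analysis-ik checkout (after "mvn package")
#   $2 = path to the Elasticsearch directory
install_ik() {
  src="$1"
  es_home="$2"
  if [ ! -d "$src/target/releases" ]; then
    echo "no build output in $src/target/releases; run the Maven build first" >&2
    return 1
  fi
  mkdir -p "$es_home/plugins" "$es_home/config"
  # Copy the contents of target/releases into plugins/
  cp -R "$src/target/releases/." "$es_home/plugins/"
  # Copy the dictionary and settings directory config/ik into config/
  cp -R "$src/config/ik" "$es_home/config/"
}

# Usage (both paths are placeholders):
#   install_ik ./elasticsearch-analysis-ik /path/to/elasticsearch
```

Elasticsearch most likely needs a restart before it picks up the newly copied plugin.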

       Edit config/elasticsearch.yml and add the IK configuration:

      index:
        analysis:
          analyzer:
            ik:
              alias: [news_analyzer_ik,ik_analyzer]
              type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      index.analysis.analyzer.default.type : "ik"

       Note: indent with spaces, never tabs. Indentation is significant in YAML, much as it is in Python.
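A stray tab is hard to spot by eye, so it can help to check the file before restarting Elasticsearch. The following is a minimal sketch, assuming a POSIX shell with `grep`; `check_no_tabs` is a hypothetical helper name.

```shell
# Hypothetical helper: report tab characters in a YAML file.
# Prints offending lines with their line numbers and returns 1 if any are
# found, returns 2 if the file does not exist, and returns 0 otherwise.
check_no_tabs() {
  file="$1"
  if [ ! -f "$file" ]; then
    echo "no such file: $file" >&2
    return 2
  fi
  tab=$(printf '\t')
  if grep -n "$tab" "$file"; then
    echo "tab characters found in $file; replace them with spaces" >&2
    return 1
  fi
  echo "no tabs in $file"
}

# Usage:
#   check_no_tabs config/elasticsearch.yml
```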

      To allow access from other machines on the local network, set the following in config/elasticsearch.yml:

      network.bind_host: 0.0.0.0

  Open http://localhost:9200/index/_analyze?text=我是中国人&analyzer=ik to check the result (index here stands for the name of an existing index).
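A browser percent-encodes the Chinese text in that URL automatically, but command-line tools generally do not. The following is a minimal sketch of building the same request URL in a shell, assuming `od`, `tr` and `sed` are available; `urlencode` and `analyze_url` are hypothetical helper names.

```shell
# Percent-encode every byte of the argument. Encoding plain ASCII letters
# as well is redundant but still valid, and keeps the function short.
urlencode() {
  printf '%s' "$1" | od -An -v -tx1 | tr -d ' \t\n' | sed 's/\(..\)/%\1/g'
}

# Hypothetical helper: build the _analyze URL.
#   $1 = base URL, $2 = index name, $3 = analyzer name, $4 = text to analyze
analyze_url() {
  printf '%s/%s/_analyze?analyzer=%s&text=%s\n' "$1" "$2" "$3" "$(urlencode "$4")"
}

# Usage (requires a running node and an existing index named "index"):
#   curl "$(analyze_url http://localhost:9200 index ik '我是中国人')"
```

With curl specifically, `curl -G --data-urlencode "text=..."` is an alternative that leaves the encoding to curl.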

2. Add the pinyin analysis plugin (make sure the plugin version matches your Elasticsearch version)

elasticsearch-analysis-pinyin

git clone https://github.com/medcl/elasticsearch-analysis-pinyin.git
cd elasticsearch-analysis-pinyin
mvn clean install -Dmaven.test.skip

Copy the *-pinyin.zip file from target/releases and unzip it into elasticsearch/plugins/.

Edit config/elasticsearch.yml and add the pinyin configuration:

index: 
  analysis:
    analyzer:
      pinyin_analyzer:
        tokenizer: xq_pinyin
        filter: [standard,nGram]
    tokenizer:
      xq_pinyin:
        type: pinyin
        first_letter: "prefix"
        padding_char: ""

 Alternatively, apply the configuration to a single index:

curl -XPUT http://localhost:9200/medcl/ -d' 
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "pinyin_analyzer" : {
                    "tokenizer" : "my_pinyin",
                    "filter" : ["word_delimiter","standard"]
                }
            },
            "tokenizer" : {
                "my_pinyin" : {
                    "type" : "pinyin",
                    "first_letter" : "prefix",
                    "padding_char" : " "
                }
            }
        }
    }
}'

Mapping:

curl -XPOST http://localhost:9200/medcl/blog/_mapping -d'
{
    "blog": {
        "properties": {
            "name": {
                "type": "multi_field",
                "fields": {
                    "name": {
                        "type": "string",
                        "store": "no",
                        "term_vector": "with_positions_offsets",
                        "analyzer": "pinyin_analyzer",
                        "boost": 10
                    },
                    "primitive": {
                        "type": "string",
                        "store": "yes",
                        "analyzer": "keyword"
                    }
                }
            }
        }
    }
}'

Import data:

curl -XPOST http://localhost:9200/_bulk -d'
{ "index":  { "_index": "medcl", "_type": "blog"}}
{ "name":    "中华人民共和国"}
{ "index":  { "_index": "medcl", "_type": "blog"}}
{ "name":    "美利坚合众国"}
{ "index":  { "_index": "medcl", "_type": "blog"}}
{ "name":    "大不列颠和北爱尔兰联合王国"}
{ "index":  { "_index": "medcl", "_type": "blog"}}
{ "name":    "俄罗斯联邦共和国"}
{ "index":  { "_index": "medcl", "_type": "blog"}}
{ "name":    "德意志联邦共和国"}
{ "index":  { "_index": "medcl", "_type": "blog"}}
{ "name":    "法兰西共和国"}
{ "index":  { "_index": "medcl", "_type": "blog"}}
{ "name":    "加拿大"}
{ "index":  { "_index": "medcl", "_type": "blog"}}
{ "name":    "土耳其"}
'
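The bulk body is a sequence of line pairs: an action line naming the index and type, then the document itself. For longer lists it can be generated rather than typed. The following is a minimal sketch; `bulk_body` is a hypothetical helper name, and it assumes the names contain no double quotes, backslashes or control characters, since it does no JSON escaping.

```shell
# Hypothetical helper: print a bulk request body that indexes each name.
#   $1 = index name, $2 = type name, remaining arguments = values for "name"
# Assumption: names need no JSON escaping (no quotes or backslashes).
bulk_body() {
  index="$1"
  type="$2"
  shift 2
  for name in "$@"; do
    # Action line, then the document source line
    printf '{ "index": { "_index": "%s", "_type": "%s" } }\n' "$index" "$type"
    printf '{ "name": "%s" }\n' "$name"
  done
}

# Usage (requires a running node; bulk.json is a placeholder file name):
#   bulk_body medcl blog "加拿大" "土耳其" > bulk.json
#   curl -XPOST http://localhost:9200/_bulk --data-binary @bulk.json
```

When the body is read from a file, `--data-binary` is used instead of `-d` because `-d @file` strips the newlines the bulk format depends on.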

Test the plugin:

http://localhost:9200/medcl/blog/_search?q=name:fa lan xi

