Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
636 views
in Technique[技术] by (71.8m points)

elasticsearch - Elastic search- search_analyzer vs index_analyzer

I was looking at http://euphonious-intuition.com/2012/08/more-complicated-mapping-in-elasticsearch/ which explains ElasticSearch analyzers.

I did not understand the part about having different search and index analyzers. The second example of custom mapping goes like this:
->the index analyzer is an edgeNgram
->the search analyzer is:

"full_name":{
    "filter":[
        "standard",
        "lowercase",
        "asciifolding"
    ],
    "type":"custom",
    "tokenizer":"standard"
}

if we wanted the query "Race" to not return results like *ra*pport and *rac*ial due to edgeNgram, why index it with edgeNgram in the first place?

Please explain with an example where different analyzers are useful.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You usually have similar analysis chain at both index time and query time. Similar doesn't mean exactly the same, but usually the way you index documents reflects the way you query them.

The ngrams example is a really good fit though, since it's one of the main reasons why you would use different analyzers at index and query time.

For partial matches you index with edge ngrams, so that "elasticsearch" becomes (with mingram 3 and maxgram 20):

"ela", "elas","elast","elasti","elastic","elastics","elasticse","elasticsea","elasticsear","eleasticsearc" and "elasticsearch"

Let's now query the created field. If we query for the term "elastic" there's a match and we get back the expected result. We basically made become what we called above partial match an exact match, given what we indexed. There's no need to apply ngrams to the query too. If we did we would query for all the following terms:

"ela", "elas","elast","elasti" and "elastic"

That would make the query way more complex and would lead to get weird results as well. Let's say you index the term "elapsed" in another document, same field. You would have the following ngrams:

"ela", "elap", "elaps", "elapse", "elapsed"

If you search for "elastic" and make ngrams to the query, the term "ela" would match this second document too, thus you would get it back together with the first document, even though no terms contain the whole "elastic" term you were looking for.

I would suggest you to have a look at the analyze api to play around around with different analyzer and their different results.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...