Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
230 views
in Technique[技术] by (71.8m points)

c# - Howto perform a 'contains' search rather than 'starts with' using Lucene.Net

We use Lucene.NET to implement a full text search on a clients website. The search itself works already but we now want to implement a modification.

Currently all terms get appended a * which leads Lucene to perform what I would classify as a StartsWith search.

In the future we would like to have a search that performs something like a Contains rather than a StartsWith.

We use

  • Lucene.Net 2.9.2.2
  • StandardAnalyzer
  • default QueryParser

Samples:

(Title:Orch*) matches: Orchestra

but:

(Title:rch*) does not match: Orchestra

We want the first and the second one to both match Orchestra.

Basically I want the exact opposite of what was asked in this question, I'm not sure why for this person Lucene performed a Contains and rather than a StartsWith by default:
Why is this Lucene query a "contains" instead of a "startsWith"?

How can we make this happen?
I have the feeling it has something to do with the Analyzer but I'm not sure.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

First off, I assume you're using StandardAnalyzer, or something similar. Your linked question fail to understand that you search for terms, and his case a* will match "Fleet Africa" because it's tokenized into "fleet" and "africa".

You need to call QueryParser.SetAllowLeadingWildcard(true) to be able to write queries like field:*value*. Are you actually changing the string that's passed to QueryParser?

You could parse the query as usual, and then implement a QueryVisitor that rewrites all TermQuery into WildcardQuery. That way you still support phrase searches.

I see no good things in rewriting queries into prefix- or wildcard-queries. There is very little shared between an orc, or a chest, and an Orchestra, but both words will match. Instead, hook up your customer with an analyzer that supports stemming, synonyms, and provide a spell correction feature to fix simple searching mistakes.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...