Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


full text search - Wrong spell-check suggestions by Solr

Working on Spell Suggest with Solr 4.1.

We configured it, and Solr returns both term suggestions and collations. However, we noticed that the suggested word or collation often returns no results when we search for it.

For example, we searched for the term "confort" and got no results, with two suggestions: "comfort" and "convert". The first suggestion returns results, but the second does not. Searching for "convert" returns nothing and instead yields two more suggestions, "connect" and "content". Of these, "connect" returns a few results, but "content" returns none and yields the suggestions "connect" and "continent". Likewise, "continent" returns no results and suggests "connect".

The same happens for many search terms, and even for collations. We have no idea what is causing this. Can we turn off suggestions that return no results?

My Solr Config

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="df">Name</str>
      <str name="spellcheck.dictionary">default</str>
      <str name="spellcheck.dictionary">wordbreak</str>
      <str name="spellcheck">on</str>
      <str name="spellcheck.extendedResults">true</str>       
      <str name="spellcheck.count">10</str>
      <str name="spellcheck.alternativeTermCount">5</str>
      <str name="spellcheck.maxResultsForSuggest">5</str>       
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.collateExtendedResults">true</str>  
      <str name="spellcheck.maxCollationTries">10</str>
      <str name="spellcheck.maxCollations">5</str>         
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
</requestHandler>

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">Name</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">4</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>

  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">Name</str>
    <str name="combineWords">true</str>
    <str name="breakWords">false</str>
    <int name="maxChanges">10</int>
  </lst>
</searchComponent>

My Schema:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>   
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="Name" type="text" indexed="true" stored="true"  required="false" />

My Query : http://localhost:8983/solr/mycore/spell?q=confort&spellcheck=true&Collate=true&spellcheck.extendedResults=true

Result:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">16</int>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="confort">
        <int name="numFound">2</int>
        <int name="startOffset">0</int>
        <int name="endOffset">7</int>
        <int name="origFreq">0</int>
        <arr name="suggestion">
          <lst>
            <str name="word">comfort</str>
            <int name="freq">6</int>
          </lst>
          <lst>
            <str name="word">convert</str>
            <int name="freq">2</int>
          </lst>
        </arr>
      </lst>
      <bool name="correctlySpelled">false</bool>
    </lst>
  </lst>
</response>

1 Reply


Is the field you search on the same as the field the spellchecker is built from? Do both go through the same analysis?
One possible reason is that the fields differ, so suggestions drawn from the spellcheck field do not exist in the fields being searched.
Another is that the fields are analyzed differently, so the suggested term and the search do not match.
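
As a rough illustration of the first point, the sketch below lines up the handler used for the follow-up search with the spellchecker's source field. The `/select` handler and its `df` value are assumptions here: the question only shows the `/spell` handler, so it is not known which handler or default field the repeated searches actually use. The spellchecker settings are trimmed from the posted config.

```xml
<!-- Assumed handler for the follow-up search; not shown in the question. -->
<!-- The point is only that df names the same field the spellchecker reads. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">Name</str>
  </lst>
</requestHandler>

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- Query terms are analyzed with the field type of the Name field -->
  <str name="queryAnalyzerFieldType">text</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <!-- Suggestions are drawn from the indexed terms of this field -->
    <str name="field">Name</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
  </lst>
</searchComponent>
```

If the repeated searches already run against `Name`, the second point is the one to check: in the posted schema the query analyzer applies a `SynonymFilterFactory` that the index analyzer does not, so the Analysis screen in the Solr admin UI may help compare what each side produces for a suggested term.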

