Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


How can I manipulate MySQL fulltext search relevance to make one field more 'valuable' than another?

Suppose I have two columns, keywords and content. I have a fulltext index across both. I want a row with foo in the keywords to have more relevance than a row with foo in the content. What do I need to do to cause MySQL to weight the matches in keywords higher than those in content?

I'm using the "match against" syntax.

SOLUTION:

I was able to make this work as follows:

SELECT *,
  CASE WHEN Keywords LIKE '%watermelon%' THEN 1 ELSE 0 END AS keywordmatch,
  CASE WHEN Content LIKE '%watermelon%' THEN 1 ELSE 0 END AS contentmatch,
  MATCH (Title, Keywords, Content) AGAINST ('watermelon') AS relevance
FROM about_data
WHERE MATCH (Title, Keywords, Content) AGAINST ('watermelon' IN BOOLEAN MODE)
HAVING relevance > 0
ORDER BY keywordmatch DESC, contentmatch DESC, relevance DESC


1 Reply


Create three full-text indexes:

  • a) one on the keyword column
  • b) one on the content column
  • c) one on both the keyword and content columns
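
The three indexes listed above could be created roughly as follows. This is a minimal sketch: the table name your_table and the index names ft_keyword, ft_content and ft_keyword_content are placeholders that do not appear in the original answer. The statements are issued separately because InnoDB builds only one FULLTEXT index per statement.

```sql
-- Placeholder names: replace your_table and the ft_* index names with your own.
-- a) keyword only: used to score matches in the keyword column
CREATE FULLTEXT INDEX ft_keyword ON your_table (keyword);

-- b) content only: used to score matches in the content column
CREATE FULLTEXT INDEX ft_content ON your_table (content);

-- c) both columns: used by the WHERE clause to decide which rows match
CREATE FULLTEXT INDEX ft_keyword_content ON your_table (keyword, content);
```

The column list in each MATCH() call has to correspond exactly to the column list of one of these indexes, which is why three separate indexes are needed.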

Then, your query:

-- your_table is a placeholder: TABLE is a reserved word and cannot be used unquoted
SELECT id, keyword, content,
  MATCH (keyword) AGAINST ('watermelon') AS rel1,
  MATCH (content) AGAINST ('watermelon') AS rel2
FROM your_table
WHERE MATCH (keyword, content) AGAINST ('watermelon')
ORDER BY (rel1 * 1.5) + rel2 DESC;

The point is that rel1 gives you the relevance of your query in the keyword column alone (because that index covers only that column). rel2 does the same for the content column. You can then add the two relevance scores together, applying any weighting you like.

However, neither of these two indexes is used for the actual search. For that, you use the third index, which covers both columns.

The index on (keyword,content) controls your recall, that is, which rows are returned.

The two separate indexes (one on keyword only, one on content only) control your relevance. And you can apply your own weighting criteria here.

Note that you can use any number of different indexes, or vary the indexes and weightings at query time based on other factors: for example, search only on keyword if the query contains a stop word, or decrease the weighting bias for keywords if the query contains more than 3 words.
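
One possible way to vary the weighting per query is to hold the weights in session variables that the application sets before running the search. The sketch below assumes a table called your_table and the variable names @kw_weight and @content_weight, none of which come from the original answer; the rule for choosing the values (stop words, query length, and so on) is left to the application.

```sql
-- Hypothetical weights: the application picks these per query.
SET @kw_weight = 1.5;
SET @content_weight = 1.0;

SELECT id, keyword, content,
  (MATCH (keyword) AGAINST ('watermelon') * @kw_weight)
  + (MATCH (content) AGAINST ('watermelon') * @content_weight) AS score
FROM your_table                                         -- placeholder table name
WHERE MATCH (keyword, content) AGAINST ('watermelon')   -- recall: combined index
ORDER BY score DESC;
```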

Each index uses disk space, so more indexes mean more disk and, in turn, a higher memory footprint for MySQL. Inserts will also take longer, because there are more indexes to update.

You should benchmark performance for your situation, making sure the MySQL query cache is off while you do (on versions that have one; it was removed in MySQL 8.0), or your results will be skewed. This isn't Google-grade efficient, but it is easy, works out of the box, and is almost certainly much better than using LIKE in your queries.
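
For the benchmarking step, one way to keep a single query out of the query cache is the SQL_NO_CACHE modifier. This is a sketch assuming MySQL 5.7 or earlier and a placeholder table name your_table; in MySQL 8.0 there is no query cache and the modifier has no effect.

```sql
-- SQL_NO_CACHE asks the server not to serve or store this result in the query cache
-- (relevant on MySQL 5.7 and earlier only).
SELECT SQL_NO_CACHE id, keyword, content,
  MATCH (keyword) AGAINST ('watermelon') AS rel1,
  MATCH (content) AGAINST ('watermelon') AS rel2
FROM your_table                                         -- placeholder table name
WHERE MATCH (keyword, content) AGAINST ('watermelon')
ORDER BY (rel1 * 1.5) + rel2 DESC;
```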

I find it works really well.

