Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


regex - MySQL REGEXP word boundaries [[:<:]] [[:>:]] and double quotes

I'm trying to match some whole-word expressions with the MySQL REGEXP function. There is a problem when double quotes are involved.

The MySQL documentation says: "To use a literal instance of a special character in a regular expression, precede it by two backslash (\) characters."

But these queries all return 0:

SELECT '"word"' REGEXP '[[:<:]]"word"[[:>:]]';             -> 0
SELECT '"word"' REGEXP '[[:<:]]\"word\"[[:>:]]';           -> 0
SELECT '"word"' REGEXP '[[:<:]]\\"word\\"[[:>:]]';         -> 0
SELECT '"word"' REGEXP '[[:<:]] word [[:>:]]';             -> 0
SELECT '"word"' REGEXP '[[:<:]][[.".]]word[[.".]][[:>:]]'; -> 0

What else can I try to get a 1? Or is this impossible?


He who fights dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back into you…

1 Reply


Let me quote the documentation first:

[[:<:]], [[:>:]]

These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).

The documentation points to the cause of your problem, and it has nothing to do with escaping. A word boundary is a position between a word character and a non-word character (or the start or end of the string): [[:<:]] can only match immediately before a word character, and [[:>:]] only immediately after one. In every one of your patterns, [[:<:]] is immediately followed by a character that is not a word character (a " or a space), so it can never match; the same goes for the closing " and [[:>:]].

For this to work, put the word boundaries inside the double quotes:

"[[:<:]]word[[:>:]]"
 ^^^^^^^    ^^^^^^^

Now each boundary separates a non-word character from a word character: [[:<:]] sits between the opening " and w, and [[:>:]] sits between d and the closing ".
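To see the boundary rule in action without a MySQL server, here is a minimal Python model. It is a sketch under stated assumptions, not MySQL itself: the hypothetical helper mysql_regexp rewrites [[:<:]] as \b(?=\w) and [[:>:]] as \b(?<=\w), and searches with re.ASCII so that a word character is an ASCII letter, digit, or underscore, matching the documented definition. Case-insensitive collations and the rest of MySQL's regex dialect are not modeled.

```python
import re

# Assumed translation of MySQL's pre-8.0.4 word-boundary markers into Python
# regex syntax. With re.ASCII, \w means [A-Za-z0-9_]: alnum or underscore.
WORD_START = r"\b(?=\w)"  # [[:<:]]: next char is a word char, previous is not (or string start)
WORD_END = r"\b(?<=\w)"   # [[:>:]]: previous char is a word char, next is not (or string end)


def mysql_regexp(subject: str, pattern: str) -> int:
    """Hypothetical helper: 1 if pattern matches anywhere in subject, else 0.

    Only [[:<:]] and [[:>:]] are translated; the rest of the pattern goes to
    Python's re module unchanged, so other MySQL-specific syntax is not modeled.
    """
    translated = pattern.replace("[[:<:]]", WORD_START).replace("[[:>:]]", WORD_END)
    return 1 if re.search(translated, subject, re.ASCII) else 0


if __name__ == "__main__":
    # A boundary right next to a quote can never match.
    print(mysql_regexp('"word"', '[[:<:]]"word"[[:>:]]'))           # 0
    # Boundaries between the quotes and the letters do match.
    print(mysql_regexp('"word"', '"[[:<:]]word[[:>:]]"'))           # 1
    print(mysql_regexp('say "word" here', '"[[:<:]]word[[:>:]]"'))  # 1
    # No boundary between "d" and "s", so "words" is rejected.
    print(mysql_regexp('"words"', '"[[:<:]]word[[:>:]]"'))          # 0
```

Note that [[:<:]] and [[:>:]] belong to the Henry Spencer library used before MySQL 8.0.4. MySQL 8.0.4 and later use ICU, which does not support them; the MySQL manual suggests \b instead, with the backslash doubled inside a string literal, e.g. SELECT '"word"' REGEXP '"\\bword\\b"';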

EDIT: If you always want to anchor the term at a word boundary without knowing in advance whether the data will actually contain one there, you might use the following expression:

([[:<:]]|^)"word"([[:>:]]|$)

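This matches either a word boundary or the start of the string (^) before the term, and either a word boundary or the end of the string ($) after it. Keep in mind that the boundary alternatives can only match next to a word character, so for a term that begins and ends with ", as here, only ^ and $ can apply, and the pattern matches only when the entire value is "word" (quotes included). I strongly advise you to study the data you are trying to match, look for common patterns, and avoid regular expressions if they are not the right tool for the job.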
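The alternation can be checked with the same kind of Python model. This is an assumption-based sketch, not MySQL: ALTERNATION is a hand translation of the expression above with [[:<:]] written as \b(?=\w) and [[:>:]] as \b(?<=\w), re.ASCII keeps word characters to letters, digits, and underscore, and matches is a hypothetical helper.

```python
import re

# Hand translation of ([[:<:]]|^)"word"([[:>:]]|$), assuming
# [[:<:]] -> \b(?=\w) and [[:>:]] -> \b(?<=\w) with ASCII word characters.
ALTERNATION = r'(\b(?=\w)|^)"word"(\b(?<=\w)|$)'


def matches(subject: str) -> bool:
    """Hypothetical helper: True if the translated pattern matches anywhere."""
    return re.search(ALTERNATION, subject, re.ASCII) is not None


if __name__ == "__main__":
    for value in ['"word"', '"word" here', 'say "word"', 'say "word" here']:
        print(f"{value!r}: {matches(value)}")
```

In this model only the first value matches: next to a quote the boundary alternatives can never succeed, so ^ and $ must both apply. For a term that starts and ends with a word character, such as word without the quotes, the boundary alternatives are the ones doing the work.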




...