Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


regex - Perl: Matching string not containing PATTERN

While using Perl regexes to chop a string down into usable pieces, I needed to match everything except a certain pattern. I solved it after I found this hint on Perl Monks:

/^(?:(?!PATTERN).)*$/;    # Matches strings not containing PATTERN

Although I solved my initial problem, I have little clue about how it actually works. I checked perlre, but it is a bit too formal to grasp.

The question "Regular expression to match a line that doesn't contain a word?" helps a lot in understanding, but why are the . and the ?: in my example, and how do the outer parentheses work?

Can someone break up the regex and explain in simple words how it works?



1 Reply


Building it up piece by piece (and throughout assuming no newlines in the string or PATTERN):

This matches any string:

/^.*$/

But we don't want . to match a character that starts PATTERN, so replace

.

with

(?!PATTERN).

This uses a negative look-ahead that tests a given pattern without actually consuming any of the string and only succeeds if the pattern does not match at the given point in the string. So it's like saying:

if PATTERN doesn't match at this point,
    match the next character
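
A minimal sketch of that single step, to show that the look-ahead only inspects the string and that . is what consumes the character. The helper name step_at and the sample string are hypothetical, chosen only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: try "(?!PATTERN)." at exactly one offset.
# Returns the single character consumed, or undef if the look-ahead
# rejects that position.
sub step_at {
    my ($string, $offset, $pattern) = @_;
    pos($string) = $offset;                 # start the next /g match here
    # \G pins the match to the current pos(), so the regex engine
    # cannot slide forward to find a later position that works.
    return $string =~ /\G(?!$pattern)(.)/g ? $1 : undef;
}

for my $offset (0 .. 2) {
    my $char = step_at("xbar", $offset, "bar");
    printf "offset %d: %s\n", $offset,
        defined $char ? "consumed '$char'" : "look-ahead failed";
}
```

At offset 1 the text ahead is "bar", so the step is refused; at offsets 0 and 2 the text ahead is "xba" and "ar" respectively, so one character is consumed.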

This needs to be done for every character in the string, so * is used to match zero or more times, from the beginning to the end of the string.

To make the * apply to the combination of the negative look-ahead and ., not just the ., it needs to be surrounded by parentheses, and since there's no reason to capture, they should be non-capturing parentheses (?: ):

(?:(?!PATTERN).)*
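
To see why the grouping matters, here is a small comparison, sketched with a hypothetical sample string and with ^ and $ added so that the whole string has to be covered. Without the parentheses, the look-ahead is tested once, at the start, and .* then runs unchecked:

```perl
use strict;
use warnings;

my $string = "foo bar";   # hypothetical sample that does contain "bar"

# Look-ahead outside the repetition: checked only at position 0.
# "foo bar" does not start with "bar", so this matches.
my $ungrouped = $string =~ /^(?!bar).*$/ ? 1 : 0;

# Look-ahead inside the repeated group: checked before every character.
# It fails at position 4, so the whole string cannot be covered.
my $grouped = $string =~ /^(?:(?!bar).)*$/ ? 1 : 0;

print "ungrouped: $ungrouped, grouped: $grouped\n";
```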

And putting back the anchors to make sure we test at every position in the string:

/^(?:(?!PATTERN).)*$/
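
As a rough usage sketch, the finished regex can be wrapped in a small helper. The name lacks_pattern and the sample strings are assumptions made for this example, and, as stated at the top of this answer, neither the string nor the pattern is assumed to contain a newline:

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $string contains no match for
# $pattern, 0 otherwise. $pattern is interpolated as a regex.
sub lacks_pattern {
    my ($string, $pattern) = @_;
    return $string =~ /^(?:(?!$pattern).)*$/ ? 1 : 0;
}

for my $candidate ("foo baz", "foo bar baz", "") {
    printf "%-15s %s\n", "'$candidate'",
        lacks_pattern($candidate, "bar") ? "matches" : "does not match";
}
```

The empty string matches because * allows zero repetitions.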

Note that this solution is particularly useful as part of a larger match; e.g. to match any string with foo and later baz but no bar in between:

/foo(?:(?!bar).)*baz/
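
A short sketch of how this behaves on a few hypothetical inputs (the sub name is invented for the example). Only a bar between a foo and the baz that follows it blocks the match:

```perl
use strict;
use warnings;

# Hypothetical helper around the regex above: 1 if some "foo" is
# followed by "baz" with no "bar" in between, 0 otherwise.
sub foo_then_baz_without_bar {
    my ($string) = @_;
    return $string =~ /foo(?:(?!bar).)*baz/ ? 1 : 0;
}

for my $candidate ("foo x baz", "foo bar baz", "foo bar foo baz", "foo baz bar") {
    printf "%-18s %s\n", "'$candidate'",
        foo_then_baz_without_bar($candidate) ? "matches" : "does not match";
}
```

In "foo bar foo baz" the match starts at the second foo, so the earlier bar does not matter; in "foo baz bar" the bar comes after baz and is never examined.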

If there aren't such considerations, you can simply do:

/^(?!.*PATTERN)/

to check that PATTERN does not match anywhere in the string.
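
For single-line strings the two whole-string forms should agree. A minimal sketch that checks this, using hypothetical helper names and sample data:

```perl
use strict;
use warnings;

# Hypothetical helpers; both interpolate $pattern as a regex and
# assume no newlines in the string or the pattern.
sub lacks_pattern_stepwise {
    my ($string, $pattern) = @_;
    return $string =~ /^(?:(?!$pattern).)*$/ ? 1 : 0;   # test at every position
}

sub lacks_pattern_simple {
    my ($string, $pattern) = @_;
    return $string =~ /^(?!.*$pattern)/ ? 1 : 0;         # one look-ahead from the start
}

for my $candidate ("foo baz", "foo bar baz", "bar", "") {
    printf "%-15s stepwise=%d simple=%d\n", "'$candidate'",
        lacks_pattern_stepwise($candidate, "bar"),
        lacks_pattern_simple($candidate, "bar");
}
```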

About newlines: there are two problems with your regex and newlines. First, . doesn't match newlines, so "foo\nbar" =~ /^(?:(?!baz).)*$/ doesn't match, even though the string does not contain baz. You need to add the /s flag to make . match any character; "foo\nbar" =~ /^(?:(?!baz).)*$/s correctly matches.

Second, $ doesn't match only at the very end of the string; it can also match just before a newline at the end of the string. So "foo\n" =~ /^(?:(?!\s).)*$/s does match, even though the string contains whitespace and you are attempting to match only strings with no whitespace. \z always matches only at the end, so "foo\n" =~ /^(?:(?!\s).)*\z/s correctly fails to match the string, which does in fact contain a \s. So the correct general-purpose regex is:

/^(?:(?!PATTERN).)*\z/s
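
The two newline pitfalls can be checked directly. This is a small sketch whose variable names and hash keys are made up for the example; the strings are the ones discussed above, written with an explicit \n:

```perl
use strict;
use warnings;

my $multi_line = "foo\nbar";   # contains no "baz"
my $trailing   = "foo\n";      # ends in whitespace (a newline)

my %result = (
    # Without /s, . stops at the newline, so the match fails.
    no_s_flag   => ($multi_line =~ /^(?:(?!baz).)*$/  ? 1 : 0),
    # With /s, . also matches the newline, so the match succeeds.
    with_s_flag => ($multi_line =~ /^(?:(?!baz).)*$/s ? 1 : 0),
    # $ may match just before a trailing newline, so this matches
    # although the string contains whitespace.
    dollar_end  => ($trailing =~ /^(?:(?!\s).)*$/s    ? 1 : 0),
    # \z matches only at the very end, so this fails as intended.
    z_end       => ($trailing =~ /^(?:(?!\s).)*\z/s   ? 1 : 0),
);

printf "%-12s %d\n", $_, $result{$_} for sort keys %result;
```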
