Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

regex - How do I write more maintainable regular expressions?

I have started to feel that using regular expressions decreases code maintainability. There is something evil about the terseness and power of regular expressions. Perl compounds this with side effects like default operators.

I DO have a habit of documenting regular expressions with at least one sentence giving the basic intent and at least one example of what would match.

Because regular expressions are built up piece by piece, I feel it is an absolute necessity to comment on the largest components of each element in the expression. Despite this, even my own regular expressions have me scratching my head as though I am reading Klingon.

Do you intentionally dumb down your regular expressions? Do you decompose possibly shorter and more powerful ones into simpler steps? I have given up on nesting regular expressions. Are there regular expression constructs that you avoid due to maintainability issues?

Do not let this example cloud the question.

If the following by Michael Ash had some sort of bug in it, would you have any prospect of doing anything but throwing it away entirely?

^(?:(?:(?:0?[13578]|1[02])(/|-|\.)31)\1|(?:(?:0?[13-9]|1[0-2])(/|-|\.)(?:29|30)\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:0?2(/|-|\.)29\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:(?:0?[1-9])|(?:1[0-2]))(/|-|\.)(?:0?[1-9]|1\d|2[0-8])\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$

Per request the exact purpose can be found using Mr. Ash's link above.

Matches 01.1.02 | 11-30-2001 | 2/29/2000

Non-Matches 02/29/01 | 13/01/2002 | 11/00/02

question from:https://stackoverflow.com/questions/708254/how-do-i-write-more-maintainable-regular-expressions


1 Reply


Use Expresso, which gives a hierarchical, English breakdown of a regex.

Or

This tip from Darren Neimke:

.NET allows regular expression patterns to be authored with embedded comments via the RegexOptions.IgnorePatternWhitespace compiler option and the (?#...) syntax embedded within each line of the pattern string.

This allows for pseudo-code-like comments to be embedded in each line and has the following effect on readability:

' With IgnorePatternWhitespace an unescaped # starts a comment that runs to the
' end of the line, so the literal # in the pattern is escaped as \#.
Dim re As New Regex ( _
    "(?<=       (?# Start a positive lookBEHIND assertion ) " & _
    "(\#|@)     (?# Find a # or a @ symbol ) " & _
    ")          (?# End the lookBEHIND assertion ) " & _
    "(?=        (?# Start a positive lookAHEAD assertion ) " & _
    "   \w+     (?# Find at least one word character ) " & _
    ")          (?# End the lookAHEAD assertion ) " & _
    "\w+\b      (?# Match multiple word characters leading up to a word boundary)", _
    RegexOptions.Multiline Or RegexOptions.IgnoreCase Or RegexOptions.IgnorePatternWhitespace _
)
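For readers outside .NET, roughly the same idea can be sketched in Python, whose re.VERBOSE flag plays a similar role to IgnorePatternWhitespace. This is an illustrative translation, not part of Darren Neimke's tip; the name TAG and the sample text are made up for the example.

```python
import re

# re.VERBOSE ignores unescaped whitespace in the pattern and treats an
# unescaped '#' outside a character class as the start of a comment,
# so the literal '#' is placed inside a character class here.
TAG = re.compile(
    r"""
    (?<= [#@] )   # lookbehind: the match must be preceded by '#' or '@'
    (?= \w+ )     # lookahead: at least one word character follows
    \w+ \b        # the word characters, up to a word boundary
    """,
    re.VERBOSE | re.IGNORECASE,
)

sample = "ping @alice about #regex, not bob"  # hypothetical input
print(TAG.findall(sample))
```

Only the words that directly follow a '#' or '@' are returned; the marker itself is left out because it is matched by the lookbehind rather than consumed.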

Here's another .NET example (requires the RegexOptions.Multiline and RegexOptions.IgnorePatternWhitespace options):

static string validEmail = @"\b    # Find a word boundary
                (?<Username>       # Begin group: Username
                [a-zA-Z0-9._%+-]+  #   Characters allowed in username, 1 or more
                )                  # End group: Username
                @                  # The e-mail '@' character
                (?<Domainname>     # Begin group: Domain name
                [a-zA-Z0-9.-]+     #   Domain name(s), we include a dot so that
                                   #   mail.somewhere is also possible
                \.[a-zA-Z]{2,4}    #   The top level domain can only be 4 characters
                                   #   So .info works, .telephone doesn't.
                )                  # End group: Domain name
                \b                 # Ending on a word boundary
                ";

If your RegEx is applicable to a common problem, another option is to document it and submit to RegExLib, where it will be rated and commented upon. Nothing beats many pairs of eyes...

Another RegEx tool is The Regulator

