Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


regex - Regular Expression Vs. String Parsing

At the risk of opening a can of worms and getting negative votes, I find myself needing to ask:

When should I use Regular Expressions and when is it more appropriate to use String Parsing?

And I'm going to need examples and reasoning as to your stance. I'd like you to address things like readability, maintainability, scaling, and probably most of all performance in your answer.

I found another question here in which only one answer even bothered to give an example. I need more than that to understand this.

I'm currently playing around in C++, but regular expressions exist in almost every higher-level language. I'd also like to know how different languages use and handle regular expressions, though that's more of an afterthought.

Thanks for the help in understanding it!

Edit: I'm still looking for more examples and talk on this but the response so far has been great. :)



1 Reply


It depends on how complex the language you're dealing with is.

Splitting

This is great when it works, but it only works when there are no escaping conventions. It does not work for CSV, for example, because commas inside quoted strings are not valid split points.

foo,bar,baz

can be split, but

foo,"bar,baz"

cannot.
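To make the difference concrete, here is a minimal C++ sketch. The names split_naive and split_csv are made up for illustration, and the quoting rule (double quotes, with a doubled quote standing for a literal quote) is an assumption about the CSV dialect, not a complete CSV reader.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Naive split: every separator character is treated as a field boundary.
std::vector<std::string> split_naive(const std::string& line, char sep = ',') {
    std::vector<std::string> fields;
    std::string current;
    for (char c : line) {
        if (c == sep) {
            fields.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    fields.push_back(current);
    return fields;
}

// Quote-aware split: commas inside double quotes stay part of the field.
// Assumed convention: "" inside a quoted field means one literal quote.
std::vector<std::string> split_csv(const std::string& line) {
    std::vector<std::string> fields;
    std::string current;
    bool in_quotes = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        char c = line[i];
        if (in_quotes) {
            if (c == '"') {
                if (i + 1 < line.size() && line[i + 1] == '"') {
                    current += '"';  // escaped quote
                    ++i;
                } else {
                    in_quotes = false;  // closing quote
                }
            } else {
                current += c;
            }
        } else if (c == '"') {
            in_quotes = true;  // opening quote
        } else if (c == ',') {
            fields.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    fields.push_back(current);
    return fields;
}
```

With the second input above, split_naive returns three pieces (foo, "bar, and baz"), while split_csv returns the two intended fields, foo and bar,baz. Once you need the extra state (in_quotes), you are no longer just splitting.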

Regular

Regular expressions are great for simple languages that have a "regular grammar". Perl 5 regular expressions are a little more powerful due to back-references but the general rule of thumb is this:

If you need to match brackets ((...), [...]) or other nesting like HTML tags, then regular expressions by themselves are not sufficient.
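The reason is that matching nested brackets means remembering an arbitrary number of open brackets, which takes a stack (or at least a counter). A rough sketch of what that looks like by hand, with brackets_balanced as a made-up name and only ( ) and [ ] considered:

```cpp
#include <stack>
#include <string>

// Returns true if every ( and [ in text is closed by the matching
// bracket in the right order. All other characters are ignored.
bool brackets_balanced(const std::string& text) {
    std::stack<char> open;  // brackets seen but not yet closed
    for (char c : text) {
        switch (c) {
            case '(':
            case '[':
                open.push(c);
                break;
            case ')':
            case ']': {
                char expected = (c == ')') ? '(' : '[';
                if (open.empty() || open.top() != expected) return false;
                open.pop();
                break;
            }
            default:
                break;
        }
    }
    return open.empty();
}
```

The stack can grow as deep as the input nests, and that unbounded memory is exactly what a plain regular expression lacks.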

You can use regular expressions to break a string into a known number of chunks -- for example, pulling out the month/day/year from a date. They are the wrong tool for parsing complicated arithmetic expressions, though.
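As an illustration of the date case, here is a small sketch using std::regex. The Date struct, the parse_date name, and the M/D/YYYY layout are assumptions made for the example, and it needs C++17 for std::optional. It only checks the shape of the text, so something like 13/45/2023 would still be accepted.

```cpp
#include <optional>
#include <regex>
#include <string>

struct Date {
    int month;
    int day;
    int year;
};

// Pulls three fixed chunks out of text shaped like M/D/YYYY or MM/DD/YYYY.
// Returns std::nullopt when the whole string does not have that shape.
std::optional<Date> parse_date(const std::string& text) {
    static const std::regex pattern(R"((\d{1,2})/(\d{1,2})/(\d{4}))");
    std::smatch m;
    if (!std::regex_match(text, m, pattern)) return std::nullopt;
    return Date{std::stoi(m[1].str()), std::stoi(m[2].str()), std::stoi(m[3].str())};
}
```

This is the sweet spot for a regular expression: a fixed, known number of pieces and no nesting.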

Obviously, if you write a regular expression, walk away for a cup of coffee, come back, and can't easily understand what you just wrote, then you should look for a clearer way to express what you're doing. Email addresses are probably at the limit of what one can handle correctly and readably using regular expressions.

Context free

Parser generators and hand-coded pushdown/PEG parsers are great for dealing with more complicated input where you need to handle nesting so you can build a tree or deal with operator precedence or associativity.

Context free parsers often use regular expressions to first break the input into chunks (spaces, identifiers, punctuation, quoted strings) and then use a grammar to turn that stream of chunks into a tree form.
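A hand-coded version of that two-step split might look roughly like the sketch below, for integer arithmetic only. The names tokenize, Parser, and evaluate are made up for the example, and it evaluates the expression directly instead of building a tree, to keep it short. One shortcut to be aware of: the tokenizer silently skips anything that is not a token, which is fine for whitespace but would also hide stray characters.

```cpp
#include <cctype>
#include <cstddef>
#include <regex>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Step 1: a regular expression breaks the input into chunks
// (numbers, operators, parentheses). Text between matches is skipped.
std::vector<std::string> tokenize(const std::string& input) {
    static const std::regex token(R"(\d+|[-+*/()])");
    std::vector<std::string> tokens;
    for (std::sregex_iterator it(input.begin(), input.end(), token), end; it != end; ++it)
        tokens.push_back(it->str());
    return tokens;
}

// Step 2: a recursive-descent parser applies the grammar
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')'
// One function per rule gives precedence; the loops give left associativity.
class Parser {
public:
    explicit Parser(std::vector<std::string> tokens) : tokens_(std::move(tokens)) {}

    long parse() {
        long value = expr();
        if (pos_ != tokens_.size()) throw std::runtime_error("unexpected trailing token");
        return value;
    }

private:
    bool at(const std::string& t) const {
        return pos_ < tokens_.size() && tokens_[pos_] == t;
    }

    long expr() {
        long value = term();
        while (at("+") || at("-")) {
            bool plus = tokens_[pos_++] == "+";
            long rhs = term();
            value = plus ? value + rhs : value - rhs;
        }
        return value;
    }

    long term() {
        long value = factor();
        while (at("*") || at("/")) {
            bool times = tokens_[pos_++] == "*";
            long rhs = factor();
            if (!times && rhs == 0) throw std::runtime_error("division by zero");
            value = times ? value * rhs : value / rhs;
        }
        return value;
    }

    long factor() {
        if (pos_ >= tokens_.size()) throw std::runtime_error("unexpected end of input");
        if (at("(")) {
            ++pos_;                // consume '('
            long value = expr();   // nesting is handled by recursion
            if (!at(")")) throw std::runtime_error("expected ')'");
            ++pos_;                // consume ')'
            return value;
        }
        const std::string& tok = tokens_[pos_];
        if (!std::isdigit(static_cast<unsigned char>(tok[0])))
            throw std::runtime_error("expected a number");
        ++pos_;
        return std::stol(tok);
    }

    std::vector<std::string> tokens_;
    std::size_t pos_ = 0;
};

long evaluate(const std::string& input) {
    return Parser(tokenize(input)).parse();
}
```

Here evaluate("1 + 2 * 3") gives 7 and evaluate("(1 + 2) * 3") gives 9. The regular expression only ever sees flat chunks; the recursion in factor is what handles the nesting.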

The rule of thumb for CF grammars is

If regular expressions are insufficient but all words in the language have the same meaning regardless of prior declarations, then CF works.

Non context free

If words in your language change meaning depending on context, then you need a more complicated solution. These are almost always hand-coded solutions.

For example, in C,

#ifdef X
  typedef int foo;
#endif

foo * bar;

If foo is a type, then foo * bar declares a pointer to foo named bar. Otherwise it multiplies a variable named foo by a variable named bar.
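In practice this means the parser has to carry a table of the type names declared so far and consult it while reading. A toy sketch of the idea follows; the classify function and its result strings are made up for illustration, it only recognizes two statement shapes, and it ignores the preprocessor entirely.

```cpp
#include <regex>
#include <set>
#include <string>
#include <vector>

// Labels each statement as "typedef", "declaration", "multiplication",
// or "unknown". The same text "foo * bar;" gets a different label
// depending on whether an earlier statement made foo a type name.
std::vector<std::string> classify(const std::vector<std::string>& statements) {
    static const std::regex typedef_stmt(R"(typedef\s+\w+\s+(\w+)\s*;)");
    static const std::regex star_stmt(R"((\w+)\s*\*\s*(\w+)\s*;)");
    std::set<std::string> type_names;  // context built up while reading
    std::vector<std::string> result;
    std::smatch m;
    for (const std::string& s : statements) {
        if (std::regex_match(s, m, typedef_stmt)) {
            type_names.insert(m[1].str());
            result.push_back("typedef");
        } else if (std::regex_match(s, m, star_stmt)) {
            result.push_back(type_names.count(m[1].str()) ? "declaration"
                                                          : "multiplication");
        } else {
            result.push_back("unknown");
        }
    }
    return result;
}
```

Given the statements typedef int foo; and foo * bar; in that order, the second is labeled a declaration. Given foo * bar; alone, it is labeled a multiplication. No fixed grammar can make that call without the table.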

