Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

.net - Regex to parse C# source code to find all strings

I asked this question a long time ago. I wish I had read the answers to "When not to use Regex in C# (or Java, C++ etc)" first!

I wish to use Regex (regular expressions) to get a list of all strings in my C# source code, including strings that have double quotes embedded in them.

This should not be hard. However, before I spend time trying to build the regular expression up, has anyone got a "pre-canned" one already?

This is not as easy as it seems at first, due to cases such as:

  • "av\"d"
  • @"ab""cd"
  • @"ab"""
  • @"""ab"
  • etc.

1 Reply

I am posting this as my answer so it stands out to others reading the question.

As has been pointed out in the helpful comments to my question, it is clear that regex is not a good tool for finding strings in C# code. I could have written a simple "parser" in the time I spent reminding myself of the regex syntax. ("Parser" is an overstatement: there are no " characters in comments etc., and it is my own source code I am dealing with.)

This seems to sum it up well:

Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.

However, until it breaks on my code I will use the regular expression Blixt has posted, but if it gives me problems I will not spend much time trying to fix it before writing my own parser. As a C# string it is:

@"@Q(?:[^Q]+|QQ)*Q|Q(?:[^Q\\]+|\\.)*Q".Replace('Q', '"')
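
A minimal sketch of how a pattern of this shape might be applied with System.Text.RegularExpressions. The class and method names (StringLiteralRegexDemo, FindLiterals) are hypothetical and not from the original answer. The backslashes are doubled inside the verbatim string so that the regex engine sees an escaped backslash rather than an escaped bracket.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StringLiteralRegexDemo
{
    // Q stands in for a double quote so the pattern stays readable
    // inside a verbatim string; Replace swaps the real character in.
    // First alternative:  @"..." where "" is an embedded quote.
    // Second alternative: "..."  where \ escapes the next character.
    private static readonly Regex Literal = new Regex(
        @"@Q(?:[^Q]+|QQ)*Q|Q(?:[^Q\\]+|\\.)*Q".Replace('Q', '"'));

    // Returns every match in the source text, in order of appearance.
    // Comments and char literals are NOT recognised, so a quote inside
    // either one will throw the results off.
    public static List<string> FindLiterals(string source)
    {
        var found = new List<string>();
        foreach (Match m in Literal.Matches(source))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```

Called on the text `var a = "av\"d"; var b = @"ab""cd";`, FindLiterals returns the two literals `"av\"d"` and `@"ab""cd"`. Note that the char literal '"' in the expression above is exactly the kind of input this pattern does not account for.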

Update: the above regex had problems, so I just wrote my own parser. Including the unit tests, it took about 2 hours to write. That's a lot less time than I spent just trying to find (and test) a pre-canned regex on the web.
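
The parser itself was not posted, so the following is only a sketch of what a hand-written scanner of this kind could look like, not the code from this answer. The names StringLiteralScanner and Scan are hypothetical. It assumes, as above, that no quotes appear in comments or char literals, and it does not handle interpolated or raw string literals.

```csharp
using System;
using System.Collections.Generic;

public static class StringLiteralScanner
{
    // Walks the source one character at a time and collects string
    // literals, opening and closing quotes included.
    public static List<string> Scan(string source)
    {
        var found = new List<string>();
        int i = 0;
        while (i < source.Length)
        {
            bool verbatim = source[i] == '@'
                && i + 1 < source.Length && source[i + 1] == '"';
            if (!verbatim && source[i] != '"')
            {
                i++;                       // not the start of a literal
                continue;
            }

            int start = i;
            i += verbatim ? 2 : 1;         // step past the opening @" or "
            bool closed = false;
            while (i < source.Length && !closed)
            {
                char c = source[i];
                if (verbatim && c == '"'
                    && i + 1 < source.Length && source[i + 1] == '"')
                {
                    i += 2;                // "" is an embedded quote
                }
                else if (!verbatim && c == '\\')
                {
                    i += 2;                // backslash escapes the next char
                }
                else
                {
                    i++;
                    closed = c == '"';     // a lone quote ends the literal
                }
            }

            // Unterminated literals are dropped rather than reported.
            if (closed)
            {
                found.Add(source.Substring(start, i - start));
            }
        }
        return found;
    }
}
```

A loop like this is longer than the regex, but each branch corresponds to one rule about C# string literals, which makes it straightforward to cover with unit tests for the cases listed in the question.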

The problem I seem to have is that I tend to avoid Regex and just write the string-handling code myself, and then a lot of people claim I am wasting the client's money by not using Regex. However, whenever I try to use Regex, what seems like a simple match pattern becomes much harder quickly. (None of the online articles on using Regex in .NET that I have read has good instructions that make it clear when NOT to use Regex. Likewise with its MSDN documentation.)

Let's see if we can help solve this problem: I have just created a Stack Overflow question, "When not to use Regex in C# (or Java, C++ etc)".

