Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Using regex to remove comments from source files

I'm writing a program to automate the writing of some C code (the generated code parses strings into enumerations with the same name). C's handling of strings is not that great, so some people have been nagging me to try Python.

I made a function that is supposed to remove C-style /* COMMENT */ and //COMMENT comments from a string. Here is the code:

def removeComments(string):
    re.sub(re.compile("/\*.*?\*/",re.DOTALL ) ,"" ,string) # remove all occurrences of streamed comments (/*COMMENT */) from string
    re.sub(re.compile("//.*?\n" ) ,"" ,string) # remove all occurrences of single-line comments (//COMMENT\n ) from string

So I tried this code out.

str="/* spam * spam */ eggs"
removeComments(str)
print str

And it apparently did nothing.

Any suggestions as to what I've done wrong?

There's a saying I've heard a couple of times:

If you have a problem and you try to solve it with Regex you end up with two problems.


EDIT: Looking back at this years later (after a fair bit more parsing experience):

I think regex may have been the right solution, and the simple regex used here was "good enough". I may not have emphasized this enough in the question: this was for a single, specific file that had no tricky situations. I think it would be a lot less maintenance to keep the file being parsed simple enough for the regex (e.g. require that the file only use // single-line comments) than to complicate the regex into an unreadable symbol soup.


1 Reply


What about "//comment-like strings inside quotes"?
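To make that concern concrete, here is a minimal sketch of what a naive `//` pattern does to a string literal that happens to contain two slashes. The sample line and the `example.com` URL are made up for illustration.

```python
import re

# Naive approach: strip everything from "//" to the end of the line,
# without checking whether the "//" sits inside a quoted string.
naive = re.compile(r"//[^\r\n]*")

# Made-up sample line: a URL inside a string literal, then a real comment.
line = 'char *url = "http://example.com"; // homepage'

mangled = naive.sub("", line)
print(mangled)  # char *url = "http:
```

The first `//` belongs to the URL, so the string literal is cut in half and the resulting C code no longer compiles.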

The OP is asking how to do it using regular expressions, so:

import re

def remove_comments(string):
    pattern = r"(\".*?\"|\'.*?\')|(/\*.*?\*/|//[^\r\n]*$)"
    # first group captures quoted strings (double or single)
    # second group captures comments (//single-line or /* multi-line */)
    regex = re.compile(pattern, re.MULTILINE|re.DOTALL)
    def _replacer(match):
        # if the 2nd group (capturing comments) is not None,
        # it means we have captured a non-quoted (real) comment string.
        if match.group(2) is not None:
            return "" # so we will return empty to remove the comment
        else: # otherwise, we will return the 1st group
            return match.group(1) # captured quoted-string
    return regex.sub(_replacer, string)

This WILL remove:

  • /* multi-line comments */
  • // single-line comments

Will NOT remove:

  • String var1 = "this is /* not a comment. */";
  • char *var2 = "this is // not a comment, either.";
  • url = 'http://not.comment.com';
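
The cases listed above can be checked with a short script. This is only a sketch: it repeats the function in condensed form so that it runs on its own, and the sample source text is made up for illustration.

```python
import re

def remove_comments(string):
    # Group 1: quoted strings (kept). Group 2: comments (removed).
    pattern = r"(\".*?\"|\'.*?\')|(/\*.*?\*/|//[^\r\n]*$)"
    regex = re.compile(pattern, re.MULTILINE | re.DOTALL)

    def _replacer(match):
        # A match on group 2 is a real comment, so drop it;
        # otherwise put the quoted string back unchanged.
        return "" if match.group(2) is not None else match.group(1)

    return regex.sub(_replacer, string)

# Made-up sample input covering the cases listed above.
source = '''/* multi-line
   comment */
int x = 1; // single-line comment
String var1 = "this is /* not a comment. */";
char *var2 = "this is // not a comment, either.";
url = 'http://not.comment.com';
'''

cleaned = remove_comments(source)
print(cleaned)
```

Both comments disappear, while the three quoted strings come through untouched. Note that the result is the function's return value; the input string itself is not changed.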

Note: This will also work for JavaScript source.
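
As for why the function in the question appeared to do nothing: Python strings are immutable, so `re.sub` returns a new string instead of modifying its argument, and the original function neither keeps that result nor returns it. A minimal sketch of the simple approach with that fixed is below. The name `remove_comments_simple` is made up here, and unlike the function above it does not protect comment-like text inside quotes.

```python
import re

def remove_comments_simple(text):
    # re.sub returns a new string, so the result has to be kept each time.
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)  # /* ... */ comments
    text = re.sub(r"//[^\r\n]*", "", text)  # // comments; the newline itself is kept
    return text

source = "/* spam * spam */ eggs"
result = remove_comments_simple(source)  # the caller must use the return value
print(result)
```

Unlike the pattern in the question, this sketch leaves the trailing newline of a `//` comment in place, so the line structure of the file is preserved.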

