Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

regex - Finding words surrounded by quotation marks in Perl

I am reading another Perl file line by line and need to find any word or set of words surrounded by single or double quotation marks. This is an example of the code I am reading in:

#!/usr/bin/env perl
use strict;
use warnings;

my $string = 'Hello World!';
print "$string\n";

Basically, I need to find and print out 'Hello World!' and "$string\n".

I've read my file in fine and stored its contents in an array. From there I loop over each line and try to find the quoted text with a regex, like this:

for(@contents) {
   if(/"|'[^"|']*"|'/) {
       print $_."\n";
   }
}

which gives me the following output:

my $string = 'Hello World!';
print "$string\n";

I tried splitting the contents by whitespace and then trying to find a match, but that gives me this:

'Hello
World!'
"$string\n";

I've tried numerous solutions that others suggested on here, but to no avail. I have also tried Text::ParseWords and its parse_line function, but that gives me completely wrong output.

Any ideas that could help me?


1 Reply


You just need to add capturing parentheses to your regex and print the captured text instead of the whole line:

use strict;
use warnings;

while (<DATA>) {
    if(/(["'][^"']*["'])/) {
        print "$1\n";
    }
}

__DATA__
#!/usr/bin/env perl
use strict;
use warnings;

my $string = 'Hello World!';
print "$string\n";

Note that there are plenty of flaws in your regex, though. For example, '\'' won't match properly. Neither will "He said 'boo'". To get closer you'll have to pair each opening quote with a closing quote of the same kind and account for backslash escapes, but there isn't going to be any perfect solution.
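To make those two failure cases concrete, here is a small sketch that runs the simple pattern over both inputs. The helper name first_simple_match is made up for illustration, and the sample lines are assumptions chosen to mirror the cases mentioned above (an escaped single quote inside single quotes, and single quotes nested inside double quotes).

```perl
use strict;
use warnings;

# Hypothetical helper: returns what the simple pattern captures, or undef.
sub first_simple_match {
    my ($line) = @_;
    return $line =~ /(["'][^"']*["'])/ ? $1 : undef;
}

# q{...} lets the sample lines hold both kinds of quote (and a backslash) verbatim.
for my $line (q{my $s = '\'';}, q{print "He said 'boo'";}) {
    my $got = first_simple_match($line);
    print "$line  =>  ", (defined $got ? $got : '(no match)'), "\n";
}
# The first line yields '\'  (the match stops at the escaped quote).
# The second yields "He said '  (a double quote gets closed by a single quote).
```

In both cases the pattern does report a match, which is what makes the problem easy to miss: the captured text is simply not the string literal that Perl itself would see.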

For a solution that is a little closer, you could use the following:

if(/('(?:(?>[^'\\]+)|\\.)*'|"(?:(?>[^"\\]+)|\\.)*")/) {

That would take care of the exceptions above and also strings like print "how about ' this \" and ' more ";, but there are still edge cases, such as the use of qq{} or q{}, not to mention strings that span more than one line.
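As a rough sketch of how an escape-aware pattern of this kind could be applied, the snippet below wraps one in a helper that returns every quoted literal in a piece of text rather than only the first one on a line. The helper name quoted_strings and the sample text are assumptions for illustration. The /s modifier and the idea of passing the whole file as one string are also additions of mine; they are one possible way to pick up strings that span lines, not part of the pattern as given above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every single- or double-quoted literal in $text.
sub quoted_strings {
    my ($text) = @_;
    # A quote, then runs of characters that are neither that quote nor a
    # backslash, or a backslash followed by any one character, then the
    # closing quote. /s lets "." match a newline after a backslash.
    my $quoted = qr/'(?:(?>[^'\\]+)|\\.)*'|"(?:(?>[^"\\]+)|\\.)*"/s;
    my @found = $text =~ /($quoted)/g;
    return @found;
}

# Sample input (an assumption); the quoted heredoc tag keeps backslashes literal.
my $sample = <<'END';
my $string = 'Hello World!';
print "$string\n";
print "how about ' this \" and ' more ";
my $multi = "first line
second line";
END

print "$_\n" for quoted_strings($sample);
```

Run on the sample, this prints the four literals, with the last one spread over two lines. It still knows nothing about Perl syntax, so an apostrophe in a comment, a q{} or qq{} construct, or a heredoc would all be handled wrongly.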

In other words, if your goal is a perfect solution, this project may be beyond the scope of most people's skills, but hopefully the above will be of some help.

