Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

windows - How to replace multiple any-character (including newline) in Perl RegEx?

This is a simplified description of MyTextString.txt:

Note: BlaBla stands for any characters, including newline characters.

START BlaBla-In END BlaBla-Out-Between START BlaBla-In END BlaBla-Out-Between START BlaBla-In END BlaBla-Out-Between START BlaBla-In END ...

I want to remove the text between END and START (BlaBla-Out-Between) so that the result looks like this:

START BlaBla-In END newline START BlaBla-In END newline START BlaBla-In END newline START BlaBla-In END ...

I have a Perl file, changes.pl:

BEGIN {
    @ARGV = map glob("\"$_\""), @ARGV;
}

s/(END).*?(START)/$1
$2/sg; #TEST

I need to run my replacements using this command line:

perl -i.bak -p changes.pl My/File/Directory/MyTextString.txt

Note: changes.pl and the command line work well, as described in this question, with other regex find-and-replace strings.

But with this regex, MyTextString.txt is not modified at all:

s/(END).*?(START)/$1
$2/sg;

I think my regular expression syntax is fine, since it works in the regex101 tester.

I want to match and replace any characters (including newlines) using the changes.pl and command line above. Put simply, I want to replace BlaBla-Out-Between with a newline.

1 Reply

You need to slurp the whole file into a single string before doing the substitution. The -p command-line switch reads only one line at a time.

This means that the substitution from changes.pl, s/(END).*?(START)/$1\n$2/sg (written here with \n in place of the literal newline), can only match where END is followed by START on the same line. When the text between END and START spans a line break, no single line contains both markers, so nothing is replaced.

To slurp the file you can specify an input record separator of octal 0777:

perl -0777 -p -i.bak changes.pl MyTextString.txt

From perlrun:

-0[octal/hexadecimal]

specifies the input record separator ($/ ) as an octal or hexadecimal number. If there are no digits, the null character is the separator. Other switches may precede or follow the digits. ... The special value 00 will cause Perl to slurp files in paragraph mode. Any value 0400 or above will cause Perl to slurp files whole, but by convention the value 0777 is the one normally used for this purpose.

