Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

awk - Using SED to replace specific patterns found within parentheses?

I'm having a bit of a problem with this... I'm trying to use Bash scripting (Sed, in particular) to process the following text. Other methods are welcome, of course! But I'm hoping it could be a Bash solution...

Tricky input:

("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."E"|"F"

Desired output:

("a"|"b"|"c")."ABC".("e"|"f")."EF"

Mainly, I think what I want to do is replace every "|" string with nothing, but only outside of any existing text in parentheses.

The problem gets crazier with the different forms of text input in my dataset: the combination of blocks (delimited by .) with and without parentheses varies.

Thanks in advance.


Something I've tried with SED:

gsed -E "s/(\.\"[[:graph:]]+)\"\|\"/\1/g" input.txt

The output I get is:

("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."EF"

Looks like I'm only getting the partially desired output...only targeting a limited scope...

question from:https://stackoverflow.com/questions/65836789/using-sed-to-replace-specific-patterns-found-within-parentheses

1 Reply

Assumptions/understandings:

  • fields are separated by periods
  • fields wrapped in parens are to be left alone
  • all other fields keep one leading and one trailing double quote; every other double quote, and every pipe, is removed

Sample data:

$ cat pipes.dat
("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."E"|"F"
"j"|"K"|"L"."m"|"n"|"o"|"p".("x"|"y"|"z")

One awk idea:

awk '
BEGIN { FS=OFS="." }                                      # define input/output field separator as a period

      { printf "############\nbefore: %s\n",$0            # print a record separator and the current input line;
                                                          # solely for display purposes; this line can
                                                          # be removed/commented-out once logic is verified

        for (i=1; i<=NF; i++)                             # loop through fields
            if ( $i !~ "^[(].*[)]$" )                     # if field does not start/end with parens then ...
                $i="\"" gensub(/"|\|/,"","g",$i) "\""     # replace field with a new double quote (+) modified string
                                                          # whereby all double quotes and pipes are removed (+)
                                                          # a new ending double quote

        printf "after : %s\n",$0                          # print the newly modified line;
                                                          # can be replaced with "print" once logic is verified
                                                          # can be replaced with "print" once logic is verified
      }
' pipes.dat                                               # read data from file; to read from a variable remove this line and ...
#' <<< "${variable_name}"                                 # uncomment this line

The above generates:

############
before: ("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."E"|"F"
after : ("a"|"b"|"c")."ABC".("e"|"f")."EF"
############
before: "j"|"K"|"L"."m"|"n"|"o"|"p".("x"|"y"|"z")
after : "jKL"."mnop".("x"|"y"|"z")
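
A side note on portability: gensub() is a GNU awk extension, so the script above needs gawk. Where only a POSIX awk is available, the same per-field logic can probably be written with gsub(), which edits the field in place. The following is a minimal sketch under the same assumptions as listed earlier; the function name strip_pipes is made up for illustration and is not part of the original answer.

```shell
# Hypothetical helper: same per-field logic as the gawk script, but using
# POSIX gsub() instead of the gawk-only gensub(). Reads records from stdin.
strip_pipes() {
    awk '
    BEGIN { FS = OFS = "." }              # fields are separated by periods
    {
        for (i = 1; i <= NF; i++)
            if ($i !~ /^[(].*[)]$/) {     # leave parenthesized fields alone
                gsub(/["|]/, "", $i)      # drop every double quote and pipe
                $i = "\"" $i "\""         # re-add the outer double quotes
            }
        print
    }'
}

# Usage: feed one sample record on stdin.
strip_pipes <<'EOF'
("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."E"|"F"
EOF
# prints: ("a"|"b"|"c")."ABC".("e"|"f")."EF"
```

Because the helper reads stdin, it also works at the end of a pipeline or with a bash here-string such as strip_pipes <<< "${variable_name}".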

After removing the comments from the gawk script and replacing the debugging printf calls with a plain print:

awk '
BEGIN { FS=OFS="." }
      { for (i=1; i<=NF; i++)
            if ( $i !~ "^[(].*[)]$" )
                $i="\"" gensub(/"|\|/,"","g",$i) "\""
        print
      }
' pipes.dat

Which generates:

("a"|"b"|"c")."ABC".("e"|"f")."EF"
"jKL"."mnop".("x"|"y"|"z")

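Since the question asked for sed specifically, here is a hedged sketch of a sed-only approach. The attempt in the question seems to fall short because [[:graph:]]+ is greedy: it runs from the first period all the way to the last "|" on the line, so only that last occurrence is removed. The sketch below instead removes one "|" at a time from any field that begins with a double quote, looping until nothing is left to remove. It assumes field contents never contain a period or a double quote; the function name join_quoted is hypothetical. On a system where GNU sed is installed as gsed, that name can be substituted for sed.

```shell
# Hypothetical helper: repeatedly delete the first "|" sequence found in a
# field that starts with a double quote, until no such sequence remains.
# Fields that start with "(" never match, so they are left untouched.
join_quoted() {
    sed -E -e ':a' \
           -e 's/(^|\.)("[^".]*)"\|"/\1\2/' \
           -e 'ta'
}

# Usage: both sample records from pipes.dat on stdin.
join_quoted <<'EOF'
("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."E"|"F"
"j"|"K"|"L"."m"|"n"|"o"|"p".("x"|"y"|"z")
EOF
# prints:
# ("a"|"b"|"c")."ABC".("e"|"f")."EF"
# "jKL"."mnop".("x"|"y"|"z")
```

The t command branches back to the label only when the preceding substitution succeeded, so the loop ends as soon as a pass finds nothing to change.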