Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.3k views
in Technique[技术] by (71.8m points)

bash - Extract text between two strings on different lines

I have a big email file with the following random hosts:

......
HOSTS: test-host,host2.domain.com,
host3.domain.com,another-testing-host,host.domain.
com,host.anotherdomain.net,host2.anotherdomain.net,
another-local-host, TEST-HOST

DATE: August 11 2015 9:00
.......

The hosts are always delimited with a comma but they can be split on one, two or multiple lines (I can't control this, it's what email clients do, unfortunately).

So I need to extract all the text between the string "HOSTS:" and the string "DATE:", wrap it, and replace the commas with new lines, like this:

test-host
host2.domain.com
host3.domain.com
another-testing-host
host.domain.com
host.anotherdomain.net
host2.anotherdomain.net
another-local-host
TEST-HOST

So far I came up with this, but I lose everything that's on the same line with "HOSTS":

sed '/HOST/,/DATE/!d;//d' ${file} | tr -d '
' | sed -E "s/,s*/
/g"
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Something like this might work for you:

sed -n '/HOSTS:/{:a;N;/DATE/!ba;s/[[:space:]]//g;s/,/
/g;s/.*HOSTS:|DATE.*//g;p}' "$file"

Breakdown:

-n                       # Disable printing
/HOSTS:/ {               # Match line containing literal HOSTS:
  :a;                    # Label used for branching (goto)
  N;                     # Added next line to pattern space
  /DATE/!ba              # As long as literal DATE is not matched goto :a
  s/.*HOSTS:|DATE.*//g; # Remove everything in front of and including literal HOSTS:
                         # and remove everything behind and including literal DATE 
  s/[[:space:]]//g;      # Replace spaces and newlines with nothing
  s/,/
/g;              # Replace comma with newline
  p                      # Print pattern space
}

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

57.0k users

...