Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
892 views
in Technique[技术] by (71.8m points)

bash - Pipe symbol | in AWK field delimiter

I have a file foo that has the following data:

A<|>B<|>C<|>D
1<|>2<|>3<|>4

I want to properly access each column using awk, but it isn't properly interpreting the field separator.

When I run:

head foo | 
  awk 'BEGIN {FS="<|>"} {out=""; for(i=1;i<=NF;i++){out=out" "$i}; print out}'

instead of printing

A B C D
1 2 3 4

it prints

A | B | C | D 
1 | 2 | 3 | 4

What's the reason behind this?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The pipe is a special character in a regex, so you need to escape it with a backslash. But this backslash is also a special character for the string literal, so it needs to be escaped again. So you end up with the following:

awk -F '<\|>' '{$1=$1}1'

awk 'BEGIN {FS="<\|>"} {$1=$1}1' 

The reason for this syntax is explained quite well here: http://www.gnu.org/software/gawk/manual/gawk.html#Computed-Regexps. In short, the expression is parsed twice.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...