Is there a way to subset txt data in unix?

posted Mar 6, 2021 in Technique[技术] by 深蓝 (71.8m points)

I have a txt file that looks like this: [image: Before]

I would like to select the rows whose "score" column is greater than 100, then remove everything except the "Sequence" and "Description" columns. My goal is to obtain a file that looks like this: [image: After]

The problem is that the file is not in a tabular format, so I can't simply select a "column", and I am not sure how to proceed.

I tried to do this by deleting the first 15 rows and then finishing the rest with Excel's "Text to Columns" conversion tool. But I am looking for an automated way to do it in unix, in case more files come up.

I should have mentioned that there is a line below which I'd also like to remove everything, like this: [image]

So I tried the following code to remove all lines below the line containing "inclusion threshold" first.

sed -n '/inclusion threshold/q;p' file

Then I used the code that @Raman Sailopal mentioned:

awk 'NR>15 && $2>99 { printf "%s\t%s\n", $9, $10 }' file

Is there any way to combine the sed and awk commands, or to achieve the same goal with a single command?

Thank you!

question from:https://stackoverflow.com/questions/65913713/is-there-a-way-to-subset-txt-data-in-unix

1 Reply

深蓝 · Answer 1 · 2021-03-06T05:09:03+0000

awk 'NR>15 && $2>100 { printf "%s\t%s\n", $9, $10 }' file

Using awk, when the line number (NR) is greater than 15, check whether the second whitespace-delimited field is greater than 100; if it is, print the 9th and 10th fields separated by a tab.
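As for combining the sed and awk steps, one possible approach is to let awk stop reading as soon as it reaches the "inclusion threshold" line, which is what the sed command does. The sketch below is only an illustration. The sample file is made up (15 placeholder header lines, whitespace-separated rows with the score in field 2, the sequence in field 9 and a one-word description in field 10), so the field numbers and the header length are assumptions to check against the real file.

```shell
# Build a hypothetical sample file; the real input layout may differ.
tmp=$(mktemp)
{
  i=1
  while [ "$i" -le 15 ]; do
    echo "# header line $i"        # placeholder for the 15 header lines
    i=$((i + 1))
  done
  echo "1e-50 180.2 0.1 2e-50 179.0 0.1 1.2 1 seqA kinase"
  echo "3e-20 80.5 0.0 4e-20 79.9 0.0 1.1 1 seqB transporter"
  echo "------ inclusion threshold ------"
  echo "0.5 150.0 0.0 0.9 140.0 0.0 1.0 1 seqC ignored"
} > "$tmp"

# One awk program does both jobs:
#   1. stop at the marker line (replaces sed -n '/inclusion threshold/q;p')
#   2. skip the first 15 lines, keep rows with field 2 > 100,
#      and print fields 9 and 10 separated by a tab
result=$(awk '/inclusion threshold/ { exit }
              NR > 15 && $2 > 100 { printf "%s\t%s\n", $9, $10 }' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Piping the two original commands together, sed -n '/inclusion threshold/q;p' file | awk '...', should give the same result, since line numbering is unchanged by the sed step. Note that $10 only holds the first word of a description, so multi-word descriptions would need the remaining fields printed as well.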
