Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Appending text to specific patterns in FASTA headers with Bash

I have a FASTA file with headers like this:

tr|Q7MX99|Q7MX99_PORGI_BACT

I would like them to say:

tr|Q7MX99|Q7MX99_PORGI_BACT_ORALMICROBIOME

So basically, whenever I have PORGI_BACT I want to append _ORALMICROBIOME to each instance.

I'm sure there is an easy fix through the terminal, but I can't seem to find it.

My first idea is to do something like:

sed 's/>.*/&_ORALMICROBIOME/' file.fa > outfile.fa

But that appends the suffix to every header. I only want to add it to headers with specific endings, and that is where I'm stuck.

question from:https://stackoverflow.com/questions/65881444/appending-text-to-specific-patterns-in-a-fasta-bash


1 Reply


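You are almost there. Restrict the match to header lines (those starting with >) that contain PORGI_BACT. In the replacement, & stands for the entire matched text, so the suffix lands right after PORGI_BACT: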

sed 's/^>.*PORGI_BACT/&_ORALMICROBIOME/' file.fa > outfile.fa
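Note that the pattern above matches PORGI_BACT anywhere in a header, so a header with trailing text after the tag would get the suffix inserted mid-line. If only headers that truly end with the tag should change, anchoring the pattern with $ is one option. The following is a minimal sketch, assuming headers carry no trailing whitespace or carriage returns; the function name append_suffix_at_end is hypothetical and not part of the original answer.

```shell
# Hypothetical helper: append _ORALMICROBIOME only to header lines
# that END with PORGI_BACT (note the $ anchor in the pattern).
# Usage: append_suffix_at_end file.fa > outfile.fa
append_suffix_at_end() {
    # $1 = input FASTA file; the edited text is written to stdout
    sed 's/^>.*PORGI_BACT$/&_ORALMICROBIOME/' "$1"
}
```

With this variant, a header such as ">id|X_PORGI_BACT some description" is left unchanged, while ">id|X_PORGI_BACT" gains the suffix.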

[Edit]
To cover the several header endings the OP asked about, use alternation with extended regular expressions (-E):

sed -E 's/^>.*(PORGI_BACT|HUMAN_MAM|TESTA_BACT)/&_ORALMICROBIOME/' file.fa > outfile.fa
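If the list of tags changes often, the alternation can be built from arguments instead of being hard-coded. This is only a sketch: the function name add_suffix_to_tags and its argument order are hypothetical, and it assumes the tags and the suffix consist of letters, digits, and underscores (no /, &, backslashes, or regex metacharacters).

```shell
# Hypothetical helper: append a suffix after any of the given tags
# on FASTA header lines.
# Usage: add_suffix_to_tags file.fa _ORALMICROBIOME PORGI_BACT HUMAN_MAM > outfile.fa
add_suffix_to_tags() {
    # $1 = FASTA file, $2 = suffix, remaining arguments = header tags
    file=$1
    suffix=$2
    shift 2
    # Join the remaining arguments with "|" to form the alternation,
    # e.g. PORGI_BACT|HUMAN_MAM|TESTA_BACT
    pattern=$(IFS='|'; printf '%s' "$*")
    # Same substitution as above, with the pattern and suffix filled in
    sed -E "s/^>.*(${pattern})/&${suffix}/" "$file"
}
```

Called with the three tags from the command above, this should behave the same as the hard-coded sed -E line.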

Sample input as file.fa:

>SEQ0|tr|Q7MX99|Q7MX99_PORGI_BACT
FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF
>SEQ1|tr|Q7MX88|Q7MX88_HUMAN_MAM
KYRTWEEFTRAAEKLYQADPMKVRVVLKYRHCDGNLCIKVTDDVVCLLYRTDQAQDVKKIEKFHSQLMRLME
LKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
>SEQ2|tr|Q7MX77|Q7MX77_TESTA_BACT
EEYQTWEEFARAAEKLYLTDPMKVRVVLKYRHCDGNLCMKVTDDAVCLQYKTDQAQDVKKVEKLHGK
>SEQ3|tr|Q7MX66|Q7MX66_DUMMY
MYQVWEEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVCLQYKTDQAQDVK

Output:

>SEQ0|tr|Q7MX99|Q7MX99_PORGI_BACT_ORALMICROBIOME
FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF
>SEQ1|tr|Q7MX88|Q7MX88_HUMAN_MAM_ORALMICROBIOME
KYRTWEEFTRAAEKLYQADPMKVRVVLKYRHCDGNLCIKVTDDVVCLLYRTDQAQDVKKIEKFHSQLMRLME
LKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
>SEQ2|tr|Q7MX77|Q7MX77_TESTA_BACT_ORALMICROBIOME
EEYQTWEEFARAAEKLYLTDPMKVRVVLKYRHCDGNLCMKVTDDAVCLQYKTDQAQDVKKVEKLHGK
>SEQ3|tr|Q7MX66|Q7MX66_DUMMY
MYQVWEEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVCLQYKTDQAQDVK
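
On a large file it may be worth sanity-checking the result before using it. The sketch below counts header lines and suffixed header lines with grep -c; the function names count_headers and count_suffixed are hypothetical, and the suffix is assumed to be _ORALMICROBIOME at the very end of the header.

```shell
# Hypothetical helpers for a quick check of the edited FASTA file.

# Count all header lines (lines starting with ">").
count_headers() {
    grep -c '^>' "$1"
}

# Count header lines that end with the _ORALMICROBIOME suffix.
count_suffixed() {
    grep -c '^>.*_ORALMICROBIOME$' "$1"
}

# Usage:
#   count_headers file.fa      # header count before editing
#   count_headers outfile.fa   # should be the same number
#   count_suffixed outfile.fa  # number of headers that were changed
```

For the sample above, outfile.fa should report 4 headers, 3 of them suffixed, and the header count should match that of file.fa.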
