perl - Parsing a GenBank file

1 Reply

深蓝 · Answer 1 · 2022-01-31T07:16:02+0000

If you have access to BioPerl, a solution along the following lines might work.

#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

my $in  = Bio::SeqIO->new( -file   => "input.txt",
                           -format => 'GenBank');

while ( my $seq = $in->next_seq ) {
    my $acc = $seq->accession;
    my $length = $seq->length;
    my $definition = $seq->desc;
    my $type = $seq->molecule;
    my $organism = $seq->species->binomial;

    if ($type eq 'mRNA'              &&
        $organism =~ /homo sapiens/i &&
        $acc =~ /[A-Za-z]{2}_[0-9]{6,}/ )
    {
        print "$acc | $definition | $length\n";
        print $seq->seq, "\n";
        print "\n";
    }
}

With this I was able to capture the five variables (accession, length, definition, molecule type, and organism) from a sample GenBank file I have (input.txt). It should simplify your code.
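
To see what the three conditions in the loop above actually accept, here is a minimal sketch that pulls them into a standalone subroutine and runs them on a few hand-written values. It does not need BioPerl. The name is_human_refseq_mrna and the sample records are my own, purely for illustration, and the undef guard is an addition that the original loop does not have. Note that the accession pattern is unanchored, so it matches anywhere in the string; it looks like it is aimed at RefSeq-style accessions (two letters, an underscore, six or more digits), but that is my reading of it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): returns 1 when a record
# passes the same three checks used in the loop above, 0 otherwise.
sub is_human_refseq_mrna {
    my ($type, $organism, $acc) = @_;

    # Assumption: treat a missing field as a non-match rather than
    # letting Perl warn about an undefined value.
    return 0 unless defined($type) && defined($organism) && defined($acc);

    my $ok = $type eq 'mRNA'
          && $organism =~ /homo sapiens/i
          && $acc =~ /[A-Za-z]{2}_[0-9]{6,}/;

    return $ok ? 1 : 0;
}

# Illustrative values only, not read from any real input file:
# [ molecule type, organism, accession ]
my @records = (
    [ 'mRNA', 'Homo sapiens', 'NM_000001' ],   # passes all three checks
    [ 'DNA',  'Homo sapiens', 'NG_000002' ],   # wrong molecule type
    [ 'mRNA', 'Mus musculus', 'NM_000003' ],   # wrong organism
    [ 'mRNA', 'Homo sapiens', 'U12345'    ],   # accession has no underscore
);

for my $rec (@records) {
    my ($type, $organism, $acc) = @$rec;
    my $verdict = is_human_refseq_mrna($type, $organism, $acc) ? 'keep' : 'skip';
    print "$acc: $verdict\n";
}
```

Only the first record is reported as "keep". In the real script you would pass $seq->molecule, $seq->species->binomial and the accession into the same check; if some records in your file have no organism line, it may be worth confirming that $seq->species is defined before calling binomial on it.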
