perl - Fast alternative to grep -f

Question

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I have two files. The query file, file.contain.query.txt:

ENST001
ENST002
ENST003

The file to search in, file.to.search.in.txt:

ENST001  90
ENST002  80
ENST004  50

Because ENST003 has no entry in the second file and ENST004 has no entry in the first file, the expected output is:

ENST001  90
ENST002  80

To grep for multiple queries in a file, we usually do the following:

grep -f file.contain.query.txt < file.to.search.in.txt > output.file

Since I have about 10,000 queries and almost 100,000 rows in file.to.search.in.txt, it takes a very long time to finish (about 5 hours). Is there a faster alternative to grep -f?

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:23:42+0000

If you want a pure Perl option, read the keys from your query file into a hash table, then check the first field of each line of standard input against those keys:

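#!/usr/bin/env perl
use strict;
use warnings;

# build hash table of keys from the query file (one ID per line)
my %keyring;
open my $keys_fh, '<', 'file.contain.query.txt'
    or die "Cannot open file.contain.query.txt: $!";
while (my $id = <$keys_fh>) {
    chomp $id;
    $keyring{$id} = 1;
}
close $keys_fh;

# look up the first field of each line of standard input
while (my $line = <STDIN>) {
    chomp $line;
    # split ' ' splits on any run of whitespace (spaces or tabs);
    # use split /\t/ instead if the search file is strictly tab-delimited
    my ($key, $value) = split ' ', $line;
    if (defined $key && exists $keyring{$key}) { print "$line\n"; }
}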

You'd use it like so:

perl lookup.pl < file.to.search.in.txt > output.file

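A hash table can take a fair amount of memory, but lookups are much faster: each lookup takes expected constant time, so the whole run scales roughly linearly with the combined size of the two files. That is handy here, since you have about ten times more lines to look up (almost 100,000) than keys to store (about 10,000).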
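As a self-contained illustration of the same hash-lookup idea, here is a minimal sketch that runs on the sample data from the question, held in memory instead of read from files. The helper name filter_by_first_field is hypothetical (it is not part of the answer above), and the sketch assumes the ID is the first whitespace-separated field of each search line.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): given a list of query
# IDs and a list of search-file lines, return the lines whose first
# whitespace-separated field is one of the query IDs.
sub filter_by_first_field {
    my ($queries, $lines) = @_;
    my %wanted = map { $_ => 1 } @$queries;    # hash: ID -> 1, expected O(1) lookups
    my @hits;
    for my $line (@$lines) {
        my ($key) = split ' ', $line;          # ' ' splits on runs of whitespace
        push @hits, $line if defined $key && exists $wanted{$key};
    }
    return @hits;                              # matching lines, unchanged and in input order
}

# Sample data from the question, kept in memory for illustration only.
my @queries = qw(ENST001 ENST002 ENST003);
my @lines   = ("ENST001  90", "ENST002  80", "ENST004  50");

print "$_\n" for filter_by_first_field(\@queries, \@lines);
```

Running it prints the ENST001 and ENST002 lines unchanged; ENST003 (no matching row) and ENST004 (not queried) are skipped, which matches the expected output in the question.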
