python - POS-Tagger is incredibly slow

Question

Welcome To Ask or Share your Answers For Others

python - POS-Tagger is incredibly slow

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - POS-Tagger is incredibly slow

I am using nltk to generate n-grams from sentences by first removing given stop words. However, nltk.pos_tag() is extremely slow taking up to 0.6 sec on my CPU (Intel i7).

The output:

['The first time I went, and was completely taken by the live jazz band and atmosphere, I ordered the Lobster Cobb Salad.']
0.620481014252
["It's simply the best meal in NYC."]
0.640982151031
['You cannot go wrong at the Red Eye Grill.']
0.644664049149

The code:

for sentence in source:

    nltk_ngrams = None

    if stop_words is not None:   
        start = time.time()
        sentence_pos = nltk.pos_tag(word_tokenize(sentence))
        print time.time() - start

        filtered_words = [word for (word, pos) in sentence_pos if pos not in stop_words]
    else:
        filtered_words = ngrams(sentence.split(), n)

Is this really that slow or am I doing something wrong here?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T17:56:01+0000

Use pos_tag_sents for tagging multiple sentences:

>>> import time
>>> from nltk.corpus import brown
>>> from nltk import pos_tag
>>> from nltk import pos_tag_sents
>>> sents = brown.sents()[:10]
>>> start = time.time(); pos_tag(sents[0]); print time.time() - start
0.934092998505
>>> start = time.time(); [pos_tag(s) for s in sents]; print time.time() - start
9.5061340332
>>> start = time.time(); pos_tag_sents(sents); print time.time() - start 
0.939551115036

Categories

python - POS-Tagger is incredibly slow

python - POS-Tagger is incredibly slow

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags