Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
655 views
in Technique[技术] by (71.8m points)

python - How to lemmatize a list of sentences

How can I lemmatize a list of sentences in Python?

from nltk.stem.wordnet import WordNetLemmatizer
a = ['i like cars', 'cats are the best']
lmtzr = WordNetLemmatizer()
lemmatized = [lmtzr.lemmatize(word) for word in a]
print(lemmatized)

This is what I've tried but it gives me the same sentences. Do I need to tokenize the words before to work properly?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

TL;DR:

pip3 install -U pywsd

Then:

>>> from pywsd.utils import lemmatize_sentence

>>> text = 'i like cars'
>>> lemmatize_sentence(text)
['i', 'like', 'car']
>>> lemmatize_sentence(text, keepWordPOS=True)
(['i', 'like', 'cars'], ['i', 'like', 'car'], ['n', 'v', 'n'])

>>> text = 'The cat likes cars'
>>> lemmatize_sentence(text, keepWordPOS=True)
(['The', 'cat', 'likes', 'cars'], ['the', 'cat', 'like', 'car'], [None, 'n', 'v', 'n'])

>>> text = 'The lazy brown fox jumps, and the cat likes cars.'
>>> lemmatize_sentence(text)
['the', 'lazy', 'brown', 'fox', 'jump', ',', 'and', 'the', 'cat', 'like', 'car', '.']

Otherwise, take a look at how the function in pywsd:

  • Tokenize the string
  • Uses the POS tagger and maps to WordNet POS tagset
  • Attempts to stem
  • Finally calling the lemmatizer with the POS and/or stems

See https://github.com/alvations/pywsd/blob/master/pywsd/utils.py#L129


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...