python - How to get rid of punctuation using NLTK tokenizer?

深蓝 · Answer 1 · 2021-10-17T00:56:14+0000

Take a look at the other tokenizing options that NLTK provides in its nltk.tokenize package. For example, you can define a tokenizer that picks out sequences of alphanumeric characters (and underscores) as tokens and drops everything else, including punctuation:

from nltk.tokenize import RegexpTokenizer

# \w+ matches runs of letters, digits, and underscores; punctuation and whitespace are skipped
tokenizer = RegexpTokenizer(r'\w+')
tokenizer.tokenize('Eighty-seven miles to go, yet.  Onward!')

Output:

['Eighty', 'seven', 'miles', 'to', 'go', 'yet', 'Onward']
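
Note that the hyphen in "Eighty-seven" is dropped too, so the word is split in two. To experiment with patterns before settling on one, here is a minimal sketch using only the standard library's re.findall, which for patterns like these should return the same matches that RegexpTokenizer does. The tokenize helper and the HYPHEN_PATTERN variant are hypothetical names introduced for illustration; they are not part of NLTK or of the original answer.

```python
import re

# Same pattern as in the RegexpTokenizer example: runs of word characters.
WORD_PATTERN = r"\w+"

# Hypothetical variant (an assumption, not from the original answer):
# keep hyphens and apostrophes that sit between word characters.
HYPHEN_PATTERN = r"\w+(?:[-']\w+)*"


def tokenize(text, pattern=WORD_PATTERN):
    """Return every non-overlapping match of `pattern` in `text`."""
    return re.findall(pattern, text)


sentence = "Eighty-seven miles to go, yet.  Onward!"

print(tokenize(sentence))
# ['Eighty', 'seven', 'miles', 'to', 'go', 'yet', 'Onward']

print(tokenize(sentence, HYPHEN_PATTERN))
# ['Eighty-seven', 'miles', 'to', 'go', 'yet', 'Onward']
```

Either pattern string can be passed to RegexpTokenizer in place of r'\w+'. The group in the second pattern is non-capturing, so each match comes back as a whole token rather than as a group fragment.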
