machine learning - How to handle <unk> tokens in text generation

Question


posted Jan 31, 2022 in Technique[技术] by 深蓝 (71.8m points)

In my text generation dataset, I have converted all infrequent words into the <unk> token (unknown word), as suggested by most of the text-generation literature.

However, when training an RNN to take part of a sentence as input and predict the rest of the sentence, I am not sure how I should stop the network from generating <unk> tokens. When the network encounters an unknown (infrequent) word in the training set, what should its output be?

Example:
Sentence: I went to the mall and bought a <unk> and some groceries
Network input: I went to the mall and bought a
Current network output: <unk> and some groceries
Desired network output: ??? and some groceries

What should it be outputting instead of the <unk>?

I don't want to build a generator that outputs words it does not know.


1 Reply

深蓝 · Answer 1 · 2022-01-31T07:21:27+0000

An RNN gives you a probability distribution over the tokens that are likely to appear next in your text. In your code you choose the token with the highest probability, which in this case is <unk>.

Instead, you can exclude the <unk> token and simply take the next most likely token that the RNN suggests, based on the probability values it outputs.
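
The idea above can be sketched in a few lines of NumPy. This is only an illustration, not the asker's actual code: the function names (mask_unk, greedy_next_token, sample_next_token), the toy vocabulary, and the probability values are all made up for the example, and it assumes the RNN's output for one step is already available as a vector of probabilities.

```python
import numpy as np

def mask_unk(probs, unk_id):
    """Return a copy of probs with the <unk> entry zeroed and the rest renormalized."""
    masked = np.asarray(probs, dtype=float).copy()
    masked[unk_id] = 0.0
    total = masked.sum()
    if total == 0.0:
        raise ValueError("all probability mass was on <unk>")
    return masked / total

def greedy_next_token(probs, unk_id):
    """Pick the most likely token that is not <unk>."""
    return int(np.argmax(mask_unk(probs, unk_id)))

def sample_next_token(probs, unk_id, rng=None):
    """Sample a token from the distribution with <unk> excluded."""
    rng = np.random.default_rng() if rng is None else rng
    masked = mask_unk(probs, unk_id)
    return int(rng.choice(len(masked), p=masked))

# Hypothetical vocabulary and one step of RNN output (assumed values).
vocab = ["<unk>", "book", "shirt", "phone"]
probs = [0.5, 0.3, 0.15, 0.05]

print(vocab[greedy_next_token(probs, unk_id=0)])  # prints "book"
```

Note that this only changes what happens at generation time. The network is still trained with <unk> as a normal target, so it can learn that "some rare word goes here"; the masking step just prevents that token from ever being emitted.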
