Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
647 views
in Technique[技术] by (71.8m points)

nlp - What do the BILOU tags mean in Named Entity Recognition?

Title pretty much sums up the question. I've noticed that in some papers people have referred to a BILOU encoding scheme for NER as opposed to the typical BIO tagging scheme (Such as this paper by Ratinov and Roth in 2009 http://cogcomp.cs.illinois.edu/page/publication_view/199)

From working with the 2003 CoNLL data I know that

B stands for 'beginning' (signifies beginning of an NE)
I stands for 'inside' (signifies that the word is inside an NE)
O stands for 'outside' (signifies that the word is just a regular word outside of an NE)

While I've been told that the words in BILOU stand for

B - 'beginning'
I - 'inside'
L - 'last'
O - 'outside'
U - 'unit'

I've also seen people reference another tag

E - 'end', use it concurrently with the 'last' tag
S - 'singleton', use it concurrently with the 'unit' tag

I'm pretty new to the NER literature, but I've been unable to find something clearly explaining these tags. My questions in particular relates to what the difference between 'last' and 'end' tags are, and what 'unit' tag stands for.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Based on an issue and a patch in Clear TK, it seems like BILOU stands for "Beginning, Inside and Last tokens of multi-token chunks, Unit-length chunks and Outside" (emphasis added). For instance, the chunking denoted by brackets

(foo foo foo) (bar) no no no (bar bar)

can be encoded with BILOU as

B-foo, I-foo, L-foo, U-bar, O, O, O, B-bar, L-bar

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...