nlp - Stanford CoreNLP - split words ignoring apostrophe


深蓝 · Answer 1 · 2021-10-23T20:07:15+0000

One approach is to let the PTBTokenizer split the text as usual and then re-join each token that starts with an apostrophe (such as 'm, 's, 're or 'll) to the token before it.

Here's an implementation in Java:

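import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.PTBTokenizer;

import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ApostropheTokenizer {

    // True for clitics that PTB tokenization detaches from the preceding word:
    // 'm, 's, 're, 'll, 've, 'd, and n't (as in "do" + "n't").
    // A bare quote token such as ' or '' is not a clitic and stays separate.
    private static boolean isClitic(String word) {
        if (word.equalsIgnoreCase("n't")) {
            return true;
        }
        return word.length() > 1 && word.charAt(0) == '\'' && Character.isLetter(word.charAt(1));
    }

    public static List<String> tokenize(String s) {
        PTBTokenizer<CoreLabel> ptbt = new PTBTokenizer<>(
                new StringReader(s), new CoreLabelTokenFactory(), "");
        List<String> sentence = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        while (ptbt.hasNext()) {
            String word = ptbt.next().word();
            if (isClitic(word) && sb.length() > 0) {
                sb.append(word);                  // glue the clitic onto the previous token
            } else {
                if (sb.length() > 0) {
                    sentence.add(sb.toString());  // flush the previous token
                }
                sb.setLength(0);
                sb.append(word);
            }
        }
        if (sb.length() > 0) {
            sentence.add(sb.toString());
        }
        return sentence;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("I'm 24 years old."));  // [I'm, 24, years, old, .]
    }
}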
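The joining step itself does not need CoreNLP, so it can be tested on its own. As a minimal sketch, the same rule can be applied to any list of already-split tokens. `CliticJoiner`, `isClitic` and `joinClitics` are hypothetical names, and the token lists in `main` are typed by hand to mimic PTB-style splitting rather than produced by the tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CliticJoiner {

    // Hypothetical helper: true for PTB-style clitic tokens such as 'm, 's, 'll, n't.
    // A bare quote token (' or '') is not treated as a clitic.
    static boolean isClitic(String token) {
        if (token.equalsIgnoreCase("n't")) {
            return true;
        }
        return token.length() > 1 && token.charAt(0) == '\'' && Character.isLetter(token.charAt(1));
    }

    // Re-attach each clitic to the token before it; all other tokens pass through unchanged.
    static List<String> joinClitics(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (isClitic(token) && !out.isEmpty()) {
                int last = out.size() - 1;
                out.set(last, out.get(last) + token);
            } else {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hand-written token lists that imitate PTB-style output (assumed, not real tokenizer output).
        System.out.println(joinClitics(Arrays.asList("I", "'m", "24", "years", "old", ".")));
        // prints [I'm, 24, years, old, .]
        System.out.println(joinClitics(Arrays.asList("``", "They", "'re", "here", "''")));
        // prints [``, They're, here, '']
    }
}
```

Mutating the last element of the output list replaces the separate StringBuilder, so each token takes exactly one branch; to use it with CoreNLP, collect `label.word()` from the tokenizer into a list and pass that list in.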
