Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Java regex to filter out non-English text

I found a few references to using regex to filter out non-English text, but none of them is in Java, and they all address somewhat different problems from the two I am trying to solve:

  1. Replace all non-English characters with a space.
  2. Create a method that returns true if a string contains any non-English character.

By "English text" I mean not only actual letters and numbers but also punctuation.

So far, what I have been able to come up with for goal #1 is quite simple:

str.replaceAll("\\W", " ")

In fact, so simple that I suspect that I am missing something... Do you spot any caveats in the above?

As for goal #2, I could simply trim() the string after the above replaceAll() and then check whether it's empty. But is there a more efficient way to do this?

1 Reply

0 votes
by (71.8m points)

In fact, so simple that I suspect that I am missing something... Do you spot any caveats in the above?

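\W is equivalent to [^\w], and \w is equivalent to [a-zA-Z_0-9] (that is Java's default; the Pattern.UNICODE_CHARACTER_CLASS flag widens it to Unicode word characters). Using \W will therefore replace everything that isn't an ASCII letter, a digit, or an underscore: not only non-English letters such as é, but also spaces, tabs, newline characters, and every punctuation mark. Whether or not that's a problem is really up to you.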
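
To make that caveat concrete, here is a small sketch that applies the question's goal #1 regex to a sample string. The class name, method name and sample text are hypothetical, chosen only for illustration; the sample mixes a comma, a space, an accented letter (U+00E9), a tab and an exclamation mark.

```java
public class WordCharDemo {
    // Goal #1 from the question: every character outside [a-zA-Z_0-9] becomes a space.
    static String replaceNonWord(String input) {
        return input.replaceAll("\\W", " ");
    }

    public static void main(String[] args) {
        // Hypothetical sample: comma, space, accented letter, tab and "!".
        String sample = "Hi, caf\u00e9\tok!";
        // Prints [Hi  caf  ok ]: the accented letter is gone,
        // but so is every separator and punctuation mark.
        System.out.println("[" + replaceNonWord(sample) + "]");
    }
}
```

Only the letters, digits and underscores survive, which is why the punctuation you want to keep disappears along with the non-English characters.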

By "English text" I mean not only actual letters and numbers but also punctuation.

In that case, you might want a negated character class that also leaves punctuation alone; something like

[^\w.,;:'"]   (as a Java string literal: "[^\\w.,;:'\"]")
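
A minimal sketch of that class in use, with a second, hypothetical variant for comparison. The printable-ASCII range [^\x20-\x7E] is an assumption of mine, not part of the answer: it treats every character from space to tilde as "English", so it also keeps punctuation such as ? and ! that the short list above would replace. Class, method and field names are illustrative only.

```java
public class KeepPunctuationDemo {
    // The answer's class: anything that is not a word character or one of . , ; : ' "
    // is replaced. Whitespace is not listed, so tabs and newlines also become spaces.
    private static final String NON_ENGLISH = "[^\\w.,;:'\"]";

    // Hypothetical alternative (not from the answer): treat every printable ASCII
    // character, from space (0x20) to tilde (0x7E), as "English".
    private static final String NON_PRINTABLE_ASCII = "[^\\x20-\\x7E]";

    static String replaceNonEnglish(String input) {
        return input.replaceAll(NON_ENGLISH, " ");
    }

    static String replaceNonPrintableAscii(String input) {
        return input.replaceAll(NON_PRINTABLE_ASCII, " ");
    }

    public static void main(String[] args) {
        String sample = "It's \"fine\"; caf\u00e9: ok? yes.";
        System.out.println(replaceNonEnglish(sample));        // It's "fine"; caf : ok  yes.
        System.out.println(replaceNonPrintableAscii(sample)); // It's "fine"; caf : ok? yes.
    }
}
```

Which variant fits depends on how you define "punctuation": an explicit list is easy to audit, while the ASCII range is shorter but lets through symbols such as @ or ~.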

Create a method that returns true if a string contains any non-English character.

Use Pattern and Matcher: Matcher.find() stops at the first match, so there is no need to build a replacement string first.

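import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compile once and reuse; "\\W" is the Java string literal for the regex \W.
private static final Pattern NON_WORD = Pattern.compile("\\W");

boolean containsSpecialChars(String string)
{
    Matcher m = NON_WORD.matcher(string);
    return m.find(); // true as soon as one non-word character is found
}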
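
Putting the two parts of the answer together, here is a self-contained sketch for goal #2. It is an assumption on my part to add \s to the punctuation-tolerant class, so that ordinary spaces, tabs and newlines are not reported as non-English; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class ContainsNonEnglish {
    // Assumption: the answer's punctuation-tolerant class, extended with \s so that
    // ordinary spaces, tabs and newlines are not reported as non-English.
    private static final Pattern NON_ENGLISH = Pattern.compile("[^\\w\\s.,;:'\"]");

    static boolean containsNonEnglish(String text) {
        // find() returns at the first offending character, so the rest of the
        // string is not scanned and no replacement string is built.
        return NON_ENGLISH.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsNonEnglish("Hello, world."));      // false
        System.out.println(containsNonEnglish("Hello, w\u00f6rld.")); // true
        System.out.println(containsNonEnglish("Hello!"));             // true ("!" is not in the list)
    }
}
```

Compared with replaceAll() followed by trim() and an emptiness check, this avoids allocating a new string and can stop early on long input.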
