Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


regex - Java Regular Expression Matcher doesn't find all possible matches

I was looking at some code on TutorialsPoint, and something has been bothering me ever since. Take a look at this code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    public static void main( String args[] ){

      // String to be scanned to find the pattern.
      String line = "This order was placed for QT3000! OK?";
      String pattern = "(.*)(\\d+)(.*)";

      // Create a Pattern object
      Pattern r = Pattern.compile(pattern);

      // Now create matcher object.
      Matcher m = r.matcher(line);
      while(m.find( )) {
         System.out.println("Found value: " + m.group(1));
         System.out.println("Found value: " + m.group(2));
         System.out.println("Found value: " + m.group(3));
      }
   }
}

This code prints:

Found value: This order was placed for QT300
Found value: 0
Found value: ! OK?

But given the regex "(.*)(\d+)(.*)", why doesn't it also return the other possible matches, such as:

Found value: This order was placed for QT30
Found value: 00
Found value: ! OK?

or

Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?

And if this code isn't suited to that, how can I write one that finds all possible matches?


1 Reply


This happens because * is greedy and the regex engine backtracks only as far as it needs to in order to find a match.

String :

This order was placed for QT3000! OK?

Regex:

(.*)(\d+)(.*)

.* is greedy and matches as many characters as possible. So the first .* initially matches everything up to and including the last character, ?, and then backtracks in order to allow a match. The next pattern in our regex is \d+, so the engine gives back one character at a time until \d+ can match a digit. As soon as it finds a digit, \d+ matches it, because a single digit already satisfies \d+ (one or more digits). At that point the first (.*) has captured This order was placed for QT300 and the following (\d+) captures the digit 0 located just before the ! symbol.

The last (.*) then captures all the remaining characters, that is, !<space>OK?. m.group(1) returns the text captured by group 1, m.group(2) the text captured by group 2, and so on.
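The behaviour described above can be checked with a small sketch. The class and method names (GreedyFindDemo, countMatches, groupSpan) are made up for illustration and are not part of the original code. It counts how many times find() succeeds and reports where group 2 sits in the input. Because the first match consumes the whole string, find() has nowhere left to search and reports only one match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyFindDemo {
    // Counts how many times find() succeeds on the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Returns {start, end} offsets of the given group in the first match,
    // or null if the regex does not match at all.
    static int[] groupSpan(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new int[] { m.start(group), m.end(group) };
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        String regex = "(.*)(\\d+)(.*)";

        // The single match spans the whole string, so there is no second match.
        System.out.println("matches: " + countMatches(regex, line)); // matches: 1

        // Group 2 covers only the last digit: offsets 31 (inclusive) to 32 (exclusive).
        int[] span = groupSpan(regex, line, 2);
        System.out.println("group 2: [" + span[0] + ", " + span[1] + ")");
    }
}
```

The digits 3000 occupy offsets 28 to 31 of the input, so a group 2 span of [31, 32) confirms that the greedy (.*) gave back only the final 0.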


To get your first desired output, require exactly two digits with \d{2}:

String line = "This order was placed for QT3000! OK?";
String pattern = "(.*)(\\d{2})(.*)";

// Create a Pattern object
Pattern r = Pattern.compile(pattern);

// Now create matcher object.
Matcher m = r.matcher(line);
while (m.find()) {
   System.out.println("Found value: " + m.group(1));
   System.out.println("Found value: " + m.group(2));
   System.out.println("Found value: " + m.group(3));
}

Output:

Found value: This order was placed for QT30
Found value: 00
Found value: ! OK?

With (.*)(\d{2}), the greedy (.*) backtracks until two digits are available for \d{2} to match.

To get your second desired output, change your pattern to this:

String pattern = "(.*?)(\\d+)(.*)";

This gives the output:

Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?

The ? after the * makes the * non-greedy (lazy), so (.*?) matches as few characters as possible and \d+ gets all four digits.
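To compare the three variants side by side, here is a minimal sketch. The class and helper names (QuantifierComparison, firstMatchGroups) are hypothetical and chosen only for this illustration. It runs each pattern against the same input and prints what group 2 captured.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierComparison {
    // Returns the captured groups (1..groupCount) of the first match,
    // or an empty array if the regex does not match.
    static String[] firstMatchGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] groups = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            groups[i - 1] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        String[] patterns = {
            "(.*)(\\d+)(.*)",    // greedy prefix: digits group gets "0"
            "(.*)(\\d{2})(.*)",  // greedy prefix, exactly two digits: "00"
            "(.*?)(\\d+)(.*)"    // lazy prefix: digits group gets "3000"
        };
        for (String p : patterns) {
            String[] g = firstMatchGroups(p, line);
            System.out.println(p + " -> group 2 = " + g[1]);
        }
    }
}
```

In every case the regex engine returns exactly one way of splitting the string. The quantifiers only decide which of the possible splits is tried first.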

Use extra capturing groups to get both outputs from a single program.

String line = "This order was placed for QT3000! OK?";
String pattern = "((.*?)(\\d{2}))(?:(\\d{2})(.*))";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
while (m.find()) {
   System.out.println("Found value: " + m.group(1));
   System.out.println("Found value: " + m.group(4));
   System.out.println("Found value: " + m.group(5));
   System.out.println("Found value: " + m.group(2));
   System.out.println("Found value: " + m.group(3) + m.group(4));
   System.out.println("Found value: " + m.group(5));
}

Output:

Found value: This order was placed for QT30
Found value: 00
Found value: ! OK?
Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?
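The pattern above is tailored to this particular input, since it expects a run of four digits. If the goal is to list every way (.*)(\d+)(.*) could split the whole string, one option is to enumerate the splits by hand rather than ask the regex engine for them. The following is only a sketch of that idea, and the names AllDigitSplits and allSplits are hypothetical. It finds each maximal run of digits with \d+ and then emits every non-empty contiguous piece of that run as the middle group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllDigitSplits {
    // Returns every {prefix, digits, suffix} triple where digits is a
    // non-empty contiguous run of digits and the three parts concatenate
    // back to the input.
    static List<String[]> allSplits(String input) {
        List<String[]> result = new ArrayList<>();
        Matcher runs = Pattern.compile("\\d+").matcher(input);
        while (runs.find()) {
            // Every digit substring lies inside some maximal digit run,
            // so try all start/end positions within each run.
            for (int start = runs.start(); start < runs.end(); start++) {
                for (int end = start + 1; end <= runs.end(); end++) {
                    result.add(new String[] {
                        input.substring(0, start),
                        input.substring(start, end),
                        input.substring(end)
                    });
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        for (String[] split : allSplits(line)) {
            System.out.println("Found value: " + split[0]);
            System.out.println("Found value: " + split[1]);
            System.out.println("Found value: " + split[2]);
        }
    }
}
```

For the sample string this yields 10 splits, since a run of four digits has 4 + 3 + 2 + 1 non-empty contiguous pieces. The three splits discussed in the question are among them.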


...