Have you reached the point in your studies of J2SE where you want to learn about some of the more complex regex tools and concepts? This article introduces a variety of concepts and offers some advice for increasing the efficiency of your regular expressions. It is excerpted from chapter three of Java Regular Expressions: Taming the java.util.regex Engine, written by Mehran Habibi (Apress, 2004; ISBN: 1590591070).
Advanced Regex - Reluctant Qualifiers
At the other end of the spectrum from greedy qualifiers are reluctant qualifiers, which try to match as little as possible. Reluctant qualifiers are formed by appending ? to an existing greedy qualifier. Thus, X+ becomes X+?, X{n,m} becomes X{n,m}?, and so on. Given the pattern \d+? against the candidate string 1234, for example, the first match is 1, as Listing 3-7 demonstrates.
Listing 3-7. Reluctant Qualifier Example
    import java.util.regex.*;

    public class ReluctantExample {
        public static void main(String[] args) {
            //define the pattern
            String regex = "(\\d+?)";
            //compile the pattern
            Pattern pattern = Pattern.compile(regex);
            //define the candidate string
            String candidate = "1234";
            //extract a matcher for the candidate string
            Matcher matcher = pattern.matcher(candidate);
            while (matcher.find()) {
                //matches once for each digit
                //if this were not an example of a
                //reluctant qualifier, it would match
                //exactly once, and that match would
                //include every digit in the candidate
                //string "1234".
                System.out.println(matcher.group());
            }
            System.out.println("Done");
        }
    }
Every time find() is run, it matches as little as possible, because it’s reluctant to match. The Pattern matches exactly four times: once for each digit. If you weren’t using a reluctant qualifier in the Pattern, there would have been a single match for the entire candidate string, namely 1234, because the Pattern would have been greedy and matched as much as possible.
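To see the contrast side by side, the following sketch runs greedy and reluctant forms of the same qualifiers against the candidate string 1234. It is an illustration rather than a listing from the book: the class name GreedyVsReluctant and the helper findAll are hypothetical names chosen for this example, and the code assumes a Java version with generics and the diamond operator (Java 7 or later).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {

    // Hypothetical helper: collects every successive match that
    // find() reports for the given regex in the candidate string.
    static List<String> findAll(String regex, String candidate) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(candidate);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String candidate = "1234";

        // Greedy: one match that consumes every digit.
        System.out.println(findAll("\\d+", candidate));      // prints [1234]

        // Reluctant: each match stops after the minimum of one digit.
        System.out.println(findAll("\\d+?", candidate));     // prints [1, 2, 3, 4]

        // Greedy {2,3}: takes three digits, leaving too few for a second match.
        System.out.println(findAll("\\d{2,3}", candidate));  // prints [123]

        // Reluctant {2,3}?: takes the minimum of two digits each time.
        System.out.println(findAll("\\d{2,3}?", candidate)); // prints [12, 34]
    }
}
```

The X{n,m}? case shows that a reluctant qualifier still has to satisfy its lower bound: \d{2,3}? cannot match a single digit, so it settles for two digits per match instead of three.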