Introduction to the Java.util.regex Object Model - The Matcher Object
Figure 2-2 illustrates the methods of the Matcher class. Please take a moment to study them.

Figure 2-2. The Matcher class
The following sections describe the various methods of the Matcher class. But first, let’s briefly revisit the concept of groups, as they figure so prominently in the Matcher object.
Groups
Before you can take full advantage of the Matcher object, it’s important that you understand the concept of a group, as some of the more powerful methods of Matcher deal with them. I discuss groups in even greater detail in Chapter 3, but you need an intuitive sense of them to take full advantage of the material in this chapter, so I provide a brief introduction here.
A group is exactly what it sounds like: a cluster of characters. Often, the term refers to a subportion of the original Pattern, though each group is, by definition, a subgroup of itself. You’re probably already familiar with the concept of groups from your study of arithmetic. For example, the expression
6 * 7 + 4
has an implicit sense of grouping. You really read it as
(6 * 7) + 4
where (6 * 7) is thought of as a clustering of numbers. Further, you can think of the expression as
( (6 * 7) + 4)
where you can consider ((6 * 7) + 4) another clustering of numbers, this one including the subcluster (6*7). Here, your group has a subgroup. Similarly, regex allows you to group a sequence of characters together. Why? I discuss that shortly. First, let’s concentrate on how.
Remember that in regular expressions, you describe what you’re looking for in general terms by using a Pattern object. Groups allow you to nest subdescriptions within your expression. As you examine a specific candidate String, the Matcher can keep track of submatches for that expression.
Creating a grouping of regex characters is very easy. You simply put the expression you want to think of as a group inside a pair of parentheses. That’s it. Thus, the pattern (\w)(\d\d)(\w+) consists of four groups, numbered 0 to 3. group(0), which is always the original expression itself, is
(\w)(\d\d)(\w+)
group(1), which consists of an alphanumeric or underscore character, is
(\w)
group(2), which consists of two digits, is
(\d\d)
group(3), which consists of one or more alphanumeric or underscore characters, is
(\w+)
For a specific candidate String, say X99SuperJava, group(0) is always the part of the candidate string that matches the entire pattern (\w)(\d\d)(\w+). Here, that is the whole string:
X99SuperJava
The corresponding section of X99SuperJava for group(1) is
X
The corresponding section of X99SuperJava for group(2) is
99
The corresponding section of X99SuperJava for group(3) is
SuperJava
OK, so you know how to designate groups and how to find the corresponding section in a candidate string. Now, why would you? A common reason is to refer to subsections of the candidate string. For example, you may not know in advance what a particular candidate string, such as X99SuperJava, contains, but you can still write a program that rearranges it by creating a new String made of group(3), followed by group(1), followed by group(2). In this case, that rearranged String would be SuperJavaX99.
Chapter 3 provides detailed examples of groups.
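As a minimal sketch of the rearrangement just described, the following code applies the pattern (\w)(\d\d)(\w+) to the candidate X99SuperJava, prints each group, and then builds the rearranged String. The class name GroupRearrangeExample and the helper method rearrange are hypothetical names chosen for illustration; they don't come from the book's listings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupRearrangeExample {
    // Pattern from the text: one word character, two digits, one or more word characters
    private static final Pattern P = Pattern.compile("(\\w)(\\d\\d)(\\w+)");

    /**
     * Hypothetical helper: returns group(3) + group(1) + group(2),
     * or null if the whole candidate doesn't match the pattern.
     */
    public static String rearrange(String candidate) {
        Matcher m = P.matcher(candidate);
        if (!m.matches()) {
            return null;
        }
        return m.group(3) + m.group(1) + m.group(2);
    }

    public static void main(String[] args) {
        Matcher m = P.matcher("X99SuperJava");
        if (m.matches()) {
            // groupCount() doesn't include group(0), so loop from 0 through groupCount()
            for (int i = 0; i <= m.groupCount(); i++) {
                System.out.println("group(" + i + ") = " + m.group(i));
            }
        }
        System.out.println(rearrange("X99SuperJava")); // SuperJavaX99
    }
}
```

Returning null for a non-matching candidate is just one possible design choice for this sketch; throwing an exception or returning the candidate unchanged would work equally well.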
public Pattern pattern()
The pattern method returns the Pattern that created this particular Matcher object. Consider Listing 2-6.
Listing 2-6. Matcher Pattern Example
import java.util.regex.*;

public class MatcherPatternExample{
    public static void main(String args[]){
        test();
    }
    public static void test(){
        //the regex is passed as a String literal; "\\d" matches a single digit
        Pattern p = Pattern.compile("\\d");
        Matcher m1 = p.matcher("55");
        Matcher m2 = p.matcher("fdshfdgdfh");
        //both matchers were created by p, so this prints true
        System.out.println(m1.pattern() == m2.pattern());
    }
}
You should notice a few important things here. First, both Matcher objects successfully returned a Pattern, even though the candidate string given to m2 contains no digits and so could never produce a match. Second, the Matcher objects returned exactly the same Pattern object, because they were both created by that Pattern. Notice that the line
System.out.println(m1.pattern() == m2.pattern());
did a == compare and not a .equals compare. This could only have worked if the actual object returned by m1 and m2 was, in fact, exactly the same object.
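For contrast, here is a small sketch of what happens when two Matcher objects come from separately compiled patterns. It assumes, as I understand the API, that each call to Pattern.compile produces a new Pattern object, so the == comparison fails even though the regex text is identical. The class name MatcherPatternIdentityExample and the helper samePattern are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherPatternIdentityExample {
    /** Hypothetical helper: true when both matchers were created by the very same Pattern object. */
    public static boolean samePattern(Matcher a, Matcher b) {
        return a.pattern() == b.pattern();
    }

    public static void main(String[] args) {
        Pattern p1 = Pattern.compile("\\d");
        Pattern p2 = Pattern.compile("\\d"); // same regex text, but a separate object

        // two matchers from one Pattern share that Pattern
        System.out.println(samePattern(p1.matcher("55"), p1.matcher("fdshfdgdfh")));
        // matchers from separately compiled Patterns do not
        System.out.println(samePattern(p1.matcher("55"), p2.matcher("55")));
        // the regex text itself can still be compared as a String
        System.out.println(p1.pattern().equals(p2.pattern()));
    }
}
```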
public Matcher reset()
The reset method clears all state information from the Matcher object it’s called on. The Matcher is, in effect, reverted to the state it originally had when you first received a reference to it, as shown in Listing 2-7.
Listing 2-7. Matcher.reset Example
import java.util.regex.*;

/**
 * Demonstrates the usage of the
 * Matcher.reset() method
 */
public class MatcherResetExample{
    public static void main(String args[]){
        test();
    }
    public static void test(){
        //create a pattern, and extract a matcher
        Pattern p = Pattern.compile("\\d");
        Matcher m1 = p.matcher("01234");
        //exhaust the matcher
        while (m1.find()){
            System.out.println("\t\t" + m1.group());
        }
        //now reset the matcher to its original state
        m1.reset();
        System.out.println("After resetting the Matcher");
        //iterate through the matcher again.
        //this would not be possible without a cleared state
        while (m1.find()){
            System.out.println("\t\t" + m1.group());
        }
    }
}
Output 2-2 shows the output of this method.
Output 2-2. Output for the Matcher.reset Example
-------------------------------------------------------------------
0
1
2
3
4
After resetting the Matcher
0
1
2
3
4
You wouldn’t have been able to iterate through the elements of the Matcher again if it hadn’t been reset.
public Matcher reset(CharSequence input)
The reset(CharSequence input) method clears the state of the Matcher object it’s called on and replaces the candidate String with the new input. This has the same effect as creating a new Matcher object, except that it doesn’t have as much of the associated overhead. This can be a useful optimization, and it’s one that I often use. Listing 2-8 demonstrates this method’s usage.
Listing 2-8. Matcher.reset(CharSequence) Example
import java.util.regex.*;

/**
 * Demonstrates the usage of the
 * Matcher.reset(CharSequence) method
 */
public class MatcherResetCharSequenceExample{
    public static void main(String args[]){
        test();
    }
    public static void test(){
        //create a pattern, and extract a matcher
        Pattern p = Pattern.compile("\\d");
        Matcher m1 = p.matcher("01234");
        //exhaust the matcher
        while (m1.find()){
            System.out.println("\t\t" + m1.group());
        }
        //now reset the matcher with new data
        m1.reset("56789");
        System.out.println("After resetting the Matcher");
        //iterate through the matcher again.
        //this would not be possible without resetting the matcher
        while (m1.find()){
            System.out.println("\t\t" + m1.group());
        }
    }
}
Output 2-3 shows the output of this method.
Output 2-3. Output for the Matcher.reset(CharSequence) Example
-------------------------------------------------------------------
0
1
2
3
4
After resetting the Matcher
5
6
7
8
9
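The reuse described for reset(CharSequence) might look something like the following sketch, in which a single Matcher is pointed at each new input in turn instead of a fresh Matcher being created per input. The class name MatcherReuseExample, the method countDigits, and the sample inputs are hypothetical, and I haven't measured how much overhead this actually saves.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherReuseExample {
    /** Hypothetical helper: counts the digits across all inputs using one reused Matcher. */
    public static int countDigits(String[] inputs) {
        Pattern p = Pattern.compile("\\d");
        // create the Matcher once, with an empty placeholder input
        Matcher m = p.matcher("");
        int count = 0;
        for (String input : inputs) {
            // clear the old state and swap in the next candidate string
            m.reset(input);
            while (m.find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] inputs = {"01234", "56789", "no digits here"};
        System.out.println(countDigits(inputs)); // 10
    }
}
```

Because a Matcher carries mutable state, a reused instance like this should stay confined to one thread.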
public int start()
The start method returns the starting index of the last successful match the Matcher object had. Listing 2-9 demonstrates the use of the start method. The code in this listing finds the starting index of each occurrence of the word Bond in the candidate My name is Bond. James Bond.
Listing 2-9. Matcher.start() Example
import java.util.regex.*;

/**
 * Demonstrates the usage of the
 * Matcher.start() method
 */
public class MatcherStartExample{
    public static void main(String args[]){
        test();
    }
    public static void test(){
        //create a Matcher and use the Matcher.start() method
        String candidateString = "My name is Bond. James Bond.";
        //padding that places each caret under the start of a match
        String matchHelper[] =
            {"           ^","                       ^"};
        Pattern p = Pattern.compile("Bond");
        Matcher matcher = p.matcher(candidateString);
        //Find the starting point of the first 'Bond'
        matcher.find();
        int startIndex = matcher.start();
        System.out.println(candidateString);
        System.out.println(matchHelper[0] + startIndex);
        //Find the starting point of the second 'Bond'
        matcher.find();
        int nextIndex = matcher.start();
        System.out.println(candidateString);
        System.out.println(matchHelper[1] + nextIndex);
    }
}
Output 2-4 shows the output of running the start() method.
Output 2-4. Output for the Matcher.start() Example
-------------------------------------------------------------------
My name is Bond. James Bond.
           ^11
My name is Bond. James Bond.
                       ^23
If you execute another find() method
matcher.find();
and then execute start()
int nonIndex = matcher.start(); //throws IllegalStateException
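the start() method will throw an IllegalStateException, because the third find() returns false and the Matcher no longer has a current match. I’m surprised that it doesn’t simply return a negative number to indicate an unsuccessful match. Use the boolean returned by find() (or by matches(), if you’re matching the whole candidate) to determine whether you should call methods such as start().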
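One way to follow that advice is to call start() only inside a loop that is guarded by the boolean result of find(), as in the sketch below. The class name MatcherStartGuardExample and the helper startIndexes are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherStartGuardExample {
    /** Hypothetical helper: collects the start index of every match of regex in candidate. */
    public static List<Integer> startIndexes(String regex, String candidate) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(candidate);
        // start() is only reached when find() has just returned true,
        // so it can't throw IllegalStateException here
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String candidate = "My name is Bond. James Bond.";
        System.out.println(startIndexes("Bond", candidate)); // [11, 23]
        System.out.println(startIndexes("Q", candidate));    // []
    }
}
```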
This article is excerpted from chapter three of Java Regular Expressions Taming the Java.util.regex Engine, written by Mehran Habibi (Apress, 2004; ISBN: 1590591070). Check it out at your favorite bookstore. Buy this book now.