
Introduction to the Java.util.regex Object Model


If you have ever wanted to know all about the Pattern and Matcher classes of Java's new java.util.regex package, this article is an excellent place to start. It is taken from chapter 2 of the book Java Regular Expressions: Taming the java.util.regex Engine, written by Mehran Habibi (Apress, 2004; ISBN: 1590591070).

Author Info:
By: Apress Publishing
August 18, 2005
TABLE OF CONTENTS:
  1. · Introduction to the Java.util.regex Object Model
  2. · public static Pattern compile(String regex, int flags) Throws a PatternSyntaxException
  3. · public String[] split(CharSequence input)
  4. · The Matcher Object
  5. · public int start(int group)
  6. · public int end(int group)
  7. · public String group(int group)
  8. · public boolean find()
  9. · public Matcher appendReplacement (StringBuffer sb, String replacement)
  10. · Special Notes
  11. · New String Regex-Friendly Methods
  12. · Summary


Introduction to the Java.util.regex Object Model - The Matcher Object
(Page 4 of 12 )

Figure 2-2 illustrates the methods of the Matcher class. Please take a moment to study them.


Figure 2-2.  The Matcher class

The following sections describe the various methods of the Matcher class. But first, let's briefly revisit the concept of groups, as they figure so prominently in the Matcher object.

Groups

Before you can take full advantage of the Matcher object, it's important that you understand the concept of a group, as some of the more powerful methods of Matcher deal with them. I discuss groups in even greater detail in Chapter 3, but you need an intuitive sense of them to take full advantage of the material in this chapter, so I provide a brief introduction here.

A group is exactly what it sounds like: a cluster of characters. Often, the term refers to a subportion of the original Pattern, though each group is, by definition, a subgroup of itself. You're probably already familiar with the concept of groups from your study of arithmetic. For example, the expression

6 * 7 + 4

has an implicit sense of grouping. You really read it as

(6 * 7) + 4

where (6 * 7) is thought of as a clustering of numbers. Further, you can think of the expression as

( (6 * 7) + 4)

where you can consider ((6 * 7) + 4) another clustering of numbers, this one including the subcluster (6 * 7). Here, your group has a subgroup. Similarly, regex allows you to group a sequence of characters together. Why? I discuss that shortly. First, let's concentrate on how.

Remember that in regular expressions, you describe what you're looking for in general terms by using a Pattern object. Groups allow you to nest subdescriptions within your expression. As you examine a specific candidate String, the Matcher can keep track of submatches for that expression.

Creating a grouping of regex characters is very easy. You simply put the expression you want to think of as a group inside a pair of parentheses. That's it. Thus, the pattern (\w)(\d\d)(\w+) consists of four groups, numbered 0 to 3:

group(0), which is always the entire expression: (\w)(\d\d)(\w+)
group(1), which consists of a single alphanumeric or underscore character: (\w)
group(2), which consists of two digits: (\d\d)
group(3), which consists of one or more alphanumeric or underscore characters: (\w+)

For a specific candidate String, say X99SuperJava, group(0) is always the part of the candidate string that matches the entire pattern (\w)(\d\d)(\w+). Here, that is the whole string. The corresponding sections of X99SuperJava are as follows:

group(0): X99SuperJava
group(1): X
group(2): 99
group(3): SuperJava

OK, so you know how to designate groups and how to find the corresponding section in a candidate string. Now, why would you? A common reason is to refer to subsections of the candidate string. For example, you may not know in advance what a particular candidate string, such as X99SuperJava, contains, but you can still write a program that rearranges it by creating a new String consisting of group(3), followed by group(1), followed by group(2). In this case, the rearranged String would be SuperJavaX99.
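The rearrangement just described can be sketched in a few lines. This is only an illustration, not code from the book: the class name GroupRearrangeSketch and the method rearrange are hypothetical, and the sketch assumes the entire candidate must match the pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupRearrangeSketch {
    // One word character, two digits, then one or more word characters.
    private static final Pattern PATTERN = Pattern.compile("(\\w)(\\d\\d)(\\w+)");

    /** Returns group(3) + group(1) + group(2), or the input unchanged if it does not match. */
    static String rearrange(String candidate) {
        Matcher m = PATTERN.matcher(candidate);
        if (!m.matches()) {
            return candidate;
        }
        return m.group(3) + m.group(1) + m.group(2);
    }

    public static void main(String[] args) {
        Matcher m = PATTERN.matcher("X99SuperJava");
        if (m.matches()) {
            // groupCount() excludes group 0, so this loop visits groups 0 through 3.
            for (int i = 0; i <= m.groupCount(); i++) {
                System.out.println("group(" + i + ") = " + m.group(i));
            }
        }
        System.out.println(rearrange("X99SuperJava")); // SuperJavaX99
    }
}
```

Returning the input unchanged on a failed match is a design choice made for this sketch; a real program might prefer to signal the failure explicitly.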

Chapter 3 provides detailed examples of groups.

public Pattern pattern()

The pattern method returns the Pattern that created this particular Matcher object. Consider Listing 2-6.

Listing 2-6. Matcher Pattern Example

import java.util.regex.*;
public class MatcherPatternExample{
  public static void main(String args[]){
    test();
  }

  public static void test(){
    Pattern p = Pattern.compile("\\d");
    Matcher m1 = p.matcher("55");
    Matcher m2 = p.matcher("fdshfdgdfh");

    //prints true: both matchers return the same Pattern object
    System.out.println(m1.pattern() == m2.pattern());
  }
}

You should notice a few important things here. First, both Matcher objects successfully returned a Pattern, even though m2's candidate string contains no digits and so cannot match. Second, the Matcher objects returned exactly the same Pattern object, because they were both created by that Pattern. Notice that the line

System.out.println(m1.pattern() == m2.pattern());

performed a == comparison and not an equals comparison. This could only have printed true if the object returned by m1 and m2 was, in fact, exactly the same object.
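One practical consequence is that code handed only a Matcher can still reach the Pattern behind it and build a new Matcher for different input. The following is a minimal sketch of that idea; the class name MatcherPatternSketch and the helper matcherFor are hypothetical names introduced for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherPatternSketch {
    /** Builds a new Matcher for other input, using the Pattern behind an existing Matcher. */
    static Matcher matcherFor(Matcher existing, CharSequence newInput) {
        return existing.pattern().matcher(newInput);
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\d");
        Matcher m1 = p.matcher("55");
        Matcher m2 = matcherFor(m1, "a7");

        System.out.println(m2.pattern() == p);      // true: same Pattern instance
        System.out.println(m2.pattern().pattern()); // \d, the regex source text
        System.out.println(m2.find());              // true: "a7" contains a digit
    }
}
```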

public Matcher reset()

The reset method clears all state information from the Matcher object it's called on. The Matcher is, in effect, reverted to the state it originally had when you first received a reference to it, as shown in Listing 2-7.

Listing 2-7. Matcher.reset Example

import java.util.regex.*;
/**
 * Demonstrates the usage of the
 * Matcher.reset() method
 */
public class MatcherResetExample{
  public static void main(String args[]){
    test();
  }

  public static void test(){
    //create a pattern, and extract a matcher
    Pattern p = Pattern.compile("\\d");
    Matcher m1 = p.matcher("01234");

    //exhaust the matcher
    while (m1.find()){
      System.out.println("\t\t" + m1.group());
    }

    //now reset the matcher to its original state
    m1.reset();
    System.out.println("After resetting the Matcher");

    //iterate through the matcher again.
    //this would not be possible without a cleared state
    while (m1.find()){
      System.out.println("\t\t" + m1.group());
    }
  }
}

Output 2-2 shows the output of this method.

Output 2-2.  Output for the Matcher.reset Example

-------------------------------------------------------------------
        0
        1
        2
        3
        4
After resetting the Matcher
        0
        1
        2
        3
        4

You wouldn't have been able to iterate through the elements of the Matcher again if it hadn't been reset.
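The effect of reset() on the matcher's position can also be seen without exhausting it first. The following is a minimal sketch under that assumption; the class name MatcherResetSketch and the helper nextMatch are hypothetical and not part of the book's listings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherResetSketch {
    /** Returns the text of the next match, or null if find() reports no further match. */
    static String nextMatch(Matcher m) {
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("\\d").matcher("01234");

        System.out.println(nextMatch(m)); // 0
        System.out.println(nextMatch(m)); // 1

        // Without a reset, the next find() would continue after "1".
        m.reset();
        System.out.println(nextMatch(m)); // 0 again: the search restarts at the beginning
    }
}
```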

public Matcher reset(CharSequence input)

The reset(CharSequence input) method clears the state of the Matcher object it's called on and replaces the candidate String with the new input. This has the same effect as creating a new Matcher object, but without as much of the associated overhead. This can be a useful optimization, and it's one that I often use. Listing 2-8 demonstrates this method's usage.

Listing 2-8. Matcher.reset(CharSequence) Example

import java.util.regex.*;
/**
 * Demonstrates the usage of the
 * Matcher.reset(CharSequence) method
 */
public class MatcherResetCharSequenceExample{
  public static void main(String args[]){
    test();
  }

  public static void test(){
    //create a pattern, and extract a matcher
    Pattern p = Pattern.compile("\\d");
    Matcher m1 = p.matcher("01234");

    //exhaust the matcher
    while (m1.find()){
      System.out.println("\t\t" + m1.group());
    }

    //now reset the matcher with new data
    m1.reset("56789");
    System.out.println("After resetting the Matcher");

    //iterate through the matcher again.
    //this would not be possible without resetting the matcher
    while (m1.find()){
      System.out.println("\t\t" + m1.group());
    }
  }
}

Output 2-3 shows the output of this method.

Output 2-3. Output for the Matcher.reset(CharSequence) Example

-------------------------------------------------------------------
        0
        1
        2
        3
        4
After resetting the Matcher
        5
        6
        7
        8
        9
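The optimization mentioned above, reusing one Matcher for many candidate strings, might look something like the following. This is a sketch under stated assumptions rather than the author's code: the class name MatcherReuseSketch and the method digitsIn are hypothetical, and the initial empty input is only a placeholder that is replaced before any matching happens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherReuseSketch {
    /** Collects every digit found in each input, reusing a single Matcher throughout. */
    static List<String> digitsIn(List<String> inputs) {
        // Placeholder input; reset(CharSequence) swaps in the real candidates below.
        Matcher m = Pattern.compile("\\d").matcher("");
        List<String> found = new ArrayList<>();
        for (String input : inputs) {
            m.reset(input); // clears state and replaces the candidate string
            while (m.find()) {
                found.add(m.group());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(digitsIn(Arrays.asList("a1b2", "none", "34"))); // [1, 2, 3, 4]
    }
}
```

Note that a Matcher is not safe for use by multiple concurrent threads, so a reused instance like this one should stay confined to a single thread.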

public int start()

The start method returns the starting index of the most recent successful match the Matcher object had. Listing 2-9 demonstrates the use of the start method. The code in this listing finds the starting index of each occurrence of the word Bond in the candidate string My name is Bond. James Bond.

Listing 2-9. Matcher.start() Example

import java.util.regex.*;
/**
 * Demonstrates the usage of the
 * Matcher.start() method
 */
public class MatcherStartExample{
  public static void main(String args[]){
    test();
  }

  public static void test(){
    //create a Matcher and use the Matcher.start() method
    String candidateString = "My name is Bond. James Bond.";
    String matchHelper[] =
      {"          ^","                      ^"};
    Pattern p = Pattern.compile("Bond");
    Matcher matcher = p.matcher(candidateString);

    //Find the starting point of the first 'Bond'
    matcher.find();
    int startIndex = matcher.start();
    System.out.println(candidateString);
    System.out.println(matchHelper[0] + startIndex);

    //Find the starting point of the second 'Bond'
    matcher.find();
    int nextIndex = matcher.start();
    System.out.println(candidateString);
    System.out.println(matchHelper[1] + nextIndex);
  }
}

Output 2-4 shows the output of running the start() method.

Output 2-4. Output for the Matcher.start() Example

-------------------------------------------------------------------
My name is Bond. James Bond.
          ^11
My name is Bond. James Bond.
                      ^23

If you execute another find() method

matcher.find();

and then execute start()

int nonIndex = matcher.start(); //throws IllegalStateException

the start() method will throw an IllegalStateException, because the third find() found no match. I'm surprised that it doesn't simply return a negative number to indicate an unsuccessful match. Use the boolean returned by the find() method (or by matches(), if that is the method you called) to determine whether you should call methods such as start().
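A guarded version of this logic could be sketched as follows, calling start() only when find() has returned true. The class name MatcherStartGuardSketch and the method startsOf are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherStartGuardSketch {
    /** Returns the start index of every match; start() is only called after a successful find(). */
    static List<Integer> startsOf(String regex, String candidate) {
        Matcher m = Pattern.compile(regex).matcher(candidate);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(startsOf("Bond", "My name is Bond. James Bond.")); // [11, 23]

        // Calling start() after an unsuccessful find() throws IllegalStateException.
        Matcher m = Pattern.compile("Bond").matcher("no agent here");
        if (!m.find()) {
            try {
                m.start();
            } catch (IllegalStateException e) {
                System.out.println("start() threw IllegalStateException");
            }
        }
    }
}
```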

