
Introduction to the Java.util.regex Object Model


If you have ever wanted to know all about the Pattern and Matcher classes of Java's new java.util.regex package, this article is an excellent place to start. It is taken from chapter 2 of the book Java Regular Expressions: Taming the java.util.regex Engine, written by Mehran Habibi (Apress, 2004; ISBN: 1590591070).

Author Info:
By: Apress Publishing
August 18, 2005
TABLE OF CONTENTS:
  1. · Introduction to the Java.util.regex Object Model
  2. · public static Pattern compile(String regex, int flags) Throws a PatternSyntaxException
  3. · public String[] split(CharSequence input)
  4. · The Matcher Object
  5. · public int start(int group)
  6. · public int end(int group)
  7. · public String group(int group)
  8. · public boolean find()
  9. · public Matcher appendReplacement (StringBuffer sb, String replacement)
  10. · Special Notes
  11. · New String Regex-Friendly Methods
  12. · Summary


Introduction to the Java.util.regex Object Model - public int start(int group)
(Page 5 of 12 )

This method lets you specify which group within a match you’re interested in, and it returns the index at which that group starts. If no match has been attempted, or if the most recent match attempt failed, this method throws an IllegalStateException. Listing 2-10, which appears later in this section, demonstrates the use of the start(int) method. But before examining the code, let’s take a step back and consider what the code is actually trying to demonstrate.

In the following example, the regex pattern is B(ond), which means that you have a subgroup within the pattern (the parentheses indicate a subgroup). The following is the portion of the candidate parsed when find() is called for the first time:

 

Thus, when you call the start(0) method, you’re implicitly calling it only for the region that has already been parsed, which is outlined in the box. As far as the Matcher is concerned, this boxed region is the only one we can currently discuss. This is simply the nature of the find method, and it has nothing to do with the start(int) method yet.

The start(0) method returns the index of the first character in group(0), which is the B in Bond. group(0) is circled in the following image.

 

Similarly, when you call start(1), you’re calling it only for the region that has already been parsed—again, the boxed region in the preceding image. This time, you’re asking for the second grouping in the parsed region. The start(1) method returns the index of the first character in group(1), which is the o in Bond. group(1) is circled in the following image:

 

Next, you call matcher.find() again, which results in a new region of the candidate string coming under consideration, as shown in the following image:

 

Calling the start(0) method here implicitly calls it only for the new region that has already been parsed, which appears in the box in the preceding image. This is the only region the associated Matcher will consider. start(0) returns the index of the first character in group(0), which is the B in Bond. group(0) is circled in the following image:

 

Again, calling start(1) asks the Matcher to consider only the new region that has been parsed (again, the boxed region). This time, you’re asking for the second grouping in the parsed region. start(1) returns the index of the first character in group(1), which is the o in Bond. group(1) is circled in the boxed region.

 

When you consider the process visually, it’s easy to understand how the start(int) method interacts with groups, group numbers, and the find() method. find() parses just enough of the candidate string for all groups to match and works in that limited region. Keep this in mind as you read through Listing 2-10. Listing 2-10 is a fully working example of the algorithm discussed in this section. Please refer back to the preceding images as necessary when you read the example.
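The same find-then-start sequence can also be written as a loop. The following is only a minimal sketch, not part of the book's listings: the class name StartIndexLoopSketch and the helper groupStarts are hypothetical, and the code assumes Java 5 or later for generics and the enhanced for loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used here for illustration only.
class StartIndexLoopSketch {

    // Returns one {start(0), start(1)} pair for every successful find().
    // Assumes the regex contains at least one capturing group.
    static List<int[]> groupStarts(String regex, String candidate) {
        Matcher matcher = Pattern.compile(regex).matcher(candidate);
        List<int[]> starts = new ArrayList<>();
        // find() returns false once no further match exists, so start(int)
        // is only ever called while a current match is available.
        while (matcher.find()) {
            starts.add(new int[] {matcher.start(0), matcher.start(1)});
        }
        return starts;
    }

    public static void main(String[] args) {
        String candidate = "My name is Bond. James Bond.";
        for (int[] pair : groupStarts("B(ond)", candidate)) {
            System.out.println("group(0) starts at " + pair[0]
                + ", group(1) starts at " + pair[1]);
        }
    }
}
```

For the candidate string used in this section, the loop reports 11 and 12 for the first match and 23 and 24 for the second. Each call to start(int) describes only the match that the most recent find() produced.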

Listing 2-10. Matcher.start(int) Example

import java.util.regex.*;

/**
 * Demonstrates the usage of the
 * Matcher.start(int) method
 */
public class MatcherStartParamExample {
  public static void main(String args[]) {
    test();
  }

  public static void test() {
    //create a Pattern
    Pattern p = Pattern.compile("B(ond)");

    //create a Matcher and use the Matcher.start(int) method
    String candidateString = "My name is Bond. James Bond.";
    //create a helpful index for the sake of output; each caret
    //sits under the character whose index is printed after it
    String matchHelper[] =
        {"           ^",
         "            ^",
         "                       ^",
         "                        ^"};

    Matcher matcher = p.matcher(candidateString);

    //find the starting point of the first 'B(ond)'
    matcher.find();
    int startIndex = matcher.start(0);
    System.out.println(candidateString);
    System.out.println(matchHelper[0] + startIndex);

    //find the starting point of the first subgroup (ond)
    int nextIndex = matcher.start(1);
    System.out.println(candidateString);
    System.out.println(matchHelper[1] + nextIndex);

    //find the starting point of the second 'B(ond)'
    matcher.find();
    startIndex = matcher.start(0);
    System.out.println(candidateString);
    System.out.println(matchHelper[2] + startIndex);

    //find the starting point of the second subgroup (ond)
    nextIndex = matcher.start(1);
    System.out.println(candidateString);
    System.out.println(matchHelper[3] + nextIndex);
  }
}

Output 2-5 shows the output of running Listing 2-10.

Output 2-5. Output for the Matcher.start(int) Example

My name is Bond. James Bond.
           ^11
My name is Bond. James Bond.
            ^12
My name is Bond. James Bond.
                       ^23
My name is Bond. James Bond.
                        ^24

If you execute another find() method

matcher.find();

and then execute start()

int nonIndex = matcher.start(0); //throws IllegalStateException

the start(int) method will throw an IllegalStateException because the find() method wasn’t successful. Similarly, it will throw an IndexOutOfBoundsException if you try to refer to a group number that doesn’t exist.
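The two failure modes can be observed side by side. This is a minimal sketch under stated assumptions: the class name StartExceptionSketch and the helper describeStart are hypothetical names introduced for illustration, and the helper simply reports which exception, if any, start(int) raised for the Matcher's current state.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used here for illustration only.
class StartExceptionSketch {

    // Calls start(group) and reports either the index or the exception type.
    static String describeStart(Matcher matcher, int group) {
        try {
            return "start(" + group + ") = " + matcher.start(group);
        } catch (IllegalStateException e) {
            // no match attempted yet, or the last find() failed
            return "IllegalStateException";
        } catch (IndexOutOfBoundsException e) {
            // a match exists, but the pattern has no group with this number
            return "IndexOutOfBoundsException";
        }
    }

    public static void main(String[] args) {
        Matcher matcher =
            Pattern.compile("B(ond)").matcher("My name is Bond. James Bond.");

        System.out.println(describeStart(matcher, 0)); // before any find()
        matcher.find();                                // first match
        System.out.println(describeStart(matcher, 1)); // valid group
        System.out.println(describeStart(matcher, 2)); // no group 2 in B(ond)
        matcher.find();                                // second match
        matcher.find();                                // third find() fails
        System.out.println(describeStart(matcher, 0)); // after the failed find()
    }
}
```

In production code it is usually simpler to test the boolean that find() returns than to catch IllegalStateException; the try/catch here exists only to make the behavior visible.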

public int end()

The end method returns the index immediately after the last character of the most recent successful match, that is, the index of the last matched character plus 1. If no match has been attempted, or if the most recent match attempt failed, this method throws an IllegalStateException. Listing 2-11 demonstrates the use of the end method.

Listing 2-11. Matcher.end() Example

import java.util.regex.*;

/**
 * Demonstrates the usage of the
 * Matcher.end() method
 */
public class MatcherEndExample {
  public static void main(String args[]) {
    test();
  }

  public static void test() {
    //create a Matcher and use the Matcher.end() method
    String candidateString = "My name is Bond. James Bond.";
    //each caret sits at the index that end() returns
    String matchHelper[] =
        {"               ^",
         "                           ^"};
    Pattern p = Pattern.compile("Bond");
    Matcher matcher = p.matcher(candidateString);

    //find the end point of the first 'Bond'
    matcher.find();
    int endIndex = matcher.end();
    System.out.println(candidateString);
    System.out.println(matchHelper[0] + endIndex);

    //find the end point of the second 'Bond'
    matcher.find();
    int nextIndex = matcher.end();
    System.out.println(candidateString);
    System.out.println(matchHelper[1] + nextIndex);
  }
}

Output 2-6 shows the output of running the end method.

Output 2-6. Output for the Matcher.end() Example

My name is Bond. James Bond.
               ^15
My name is Bond. James Bond.
                           ^27

If you execute another find method

matcher.find();

and then execute end

int nonIndex = matcher.end(); //throws IllegalStateException

the end method will throw an IllegalStateException, because that find call failed and the Matcher no longer has a current match whose end it could report.
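One practical consequence of the "plus 1" convention is that start() and end() can be passed directly to String.substring, which also treats its second argument as exclusive. The sketch below illustrates this and guards against a failed find; the class name EndIndexSketch and the helper sliceFirstMatch are hypothetical names that do not come from the book.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used here for illustration only.
class EndIndexSketch {

    // Returns the text of the first match, cut out of the candidate with
    // substring(start, end), or null when the pattern does not match.
    static String sliceFirstMatch(String regex, String candidate) {
        Matcher matcher = Pattern.compile(regex).matcher(candidate);
        if (!matcher.find()) {
            // calling start() or end() here would throw IllegalStateException
            return null;
        }
        // end() is one past the last matched character, so no adjustment is needed
        return candidate.substring(matcher.start(), matcher.end());
    }

    public static void main(String[] args) {
        String candidate = "My name is Bond. James Bond.";
        System.out.println(sliceFirstMatch("Bond", candidate));    // prints Bond
        System.out.println(sliceFirstMatch("Blofeld", candidate)); // prints null
    }
}
```

Checking the return value of find() before calling end() avoids the IllegalStateException described above.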

