
Introduction to the Java.util.regex Object Model


If you have ever wanted to know all about the Pattern and Matcher classes of Java's new java.util.regex package, this article is an excellent place to start. It is taken from chapter 2 of the book Java Regular Expressions: Taming the java.util.regex Engine, written by Mehran Habibi (Apress, 2004; ISBN: 1590591070).

Author Info:
By: Apress Publishing
August 18, 2005
TABLE OF CONTENTS:
  1. · Introduction to the Java.util.regex Object Model
  2. · public static Pattern compile(String regex, int flags) Throws a PatternSyntaxException
  3. · public String[] split(CharSequence input)
  4. · The Matcher Object
  5. · public int start(int group)
  6. · public int end(int group)
  7. · public String group(int group)
  8. · public boolean find()
  9. · public Matcher appendReplacement (StringBuffer sb, String replacement)
  10. · Special Notes
  11. · New String Regex-Friendly Methods
  12. · Summary


Introduction to the Java.util.regex Object Model - public int end(int group)

Like the start(int) method, this method lets you specify which subgroup of a match you’re interested in. It returns the index of the last character matched by that group, plus 1 (that is, the index just past the end of the group). Listing 2-12, shown shortly, demonstrates the end(int) method.
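As a quick check of that definition, the following sketch verifies that, for every group taking part in a match, end(g) equals start(g) plus the length of group(g), so end(int) always points one position past the group’s last character. The class name EndStartRelation and its holds method are hypothetical, written only for this illustration; the Matcher calls are the standard java.util.regex API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical illustration: checks that end(g) == start(g) + group(g).length()
 * for every group of every match, i.e. end(g) is one past the group's last character.
 */
public class EndStartRelation {

    public static boolean holds(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            // group 0 is the whole match; groups 1..groupCount() are subgroups
            for (int g = 0; g <= m.groupCount(); g++) {
                String text = m.group(g);
                if (text == null) {
                    continue; // this group took no part in the current match
                }
                if (m.end(g) != m.start(g) + text.length()) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // expected to print true
        System.out.println(holds("B(on)d", "My name is Bond. James Bond."));
    }
}
```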

In the following example, the regex pattern is B(on)d, which contains one subgroup, (on). The area examined by the Matcher after the first call to find() is boxed in the following image:

 

When you call end(0), it applies only to the region that has already been parsed, which is boxed in the preceding image. As far as the Matcher is concerned, this boxed region is the only one it can report on at present.

The end(0) method returns the index of the last character in group(0) plus 1. Remember that group(0) is the entire expression B(on)d. In this region, the last character is the d in Bond, which is at index 14. Because end(int) adds 1 to that last index, 15 is returned. group(0) is circled in the following image:

 

Similarly, when you call end(1), it applies only to the region that has already been parsed: again, the boxed region. This time, you’re asking about group(1), the subgroup (on), within that region. The end(1) method returns the index of the last character in group(1) plus 1. Because the pattern is B(on)d, the last character in group(1) is the n in Bond, at index 13. Because end adds 1 to the index, 14 is returned. group(1) is circled in the following image:

 

Next, you call matcher.find() again, which brings a new region of the candidate String under consideration, as shown here:

 

Calling end(0) now applies only to the newly parsed region, which is boxed in the preceding image. The end(0) method returns the index of the last character in group(0), the d in the second Bond, plus 1. The index of that d is 26, and because end adds 1 to that number, 27 is returned. group(0) is circled in the following image:

 

Calling end(1) now considers only the newly parsed region: again, the boxed region. This time, you’re asking about group(1) within that region. The end(1) method returns the index of the last character in group(1) plus 1. That last character is the n in the second Bond, which is at index 25. Because end(int) adds 1 to that number, 26 is returned. group(1) is circled in the following image:

 

Please refer back to the preceding images as necessary when you read Listing 2-12. The listing is simply a fully working example of the steps you just went through.
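The walkthrough calls find() and end(int) by hand, once per match. A minimal sketch of the same steps written as a loop follows; the class name EndLoopSketch and its endIndexes method are hypothetical. After each successful find(), it records end(0) and end(1), and each pair describes only the match that the most recent find() located.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EndLoopSketch {

    // Hypothetical helper: collects {end(0), end(1)} for each successive
    // match of B(on)d. Each end(int) call refers only to the match found
    // by the most recent successful find().
    public static List<int[]> endIndexes(String candidate) {
        Matcher matcher = Pattern.compile("B(on)d").matcher(candidate);
        List<int[]> result = new ArrayList<>();
        while (matcher.find()) {
            result.add(new int[] {matcher.end(0), matcher.end(1)});
        }
        return result;
    }

    public static void main(String[] args) {
        for (int[] ends : endIndexes("My name is Bond. James Bond.")) {
            System.out.println("end(0)=" + ends[0] + ", end(1)=" + ends[1]);
        }
    }
}
```

Run against My name is Bond. James Bond., this should print end(0)=15, end(1)=14 followed by end(0)=27, end(1)=26, the same values worked out above.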

Listing 2-12. Matcher.end(int) Example

import java.util.regex.*;

/**
 * Demonstrates the usage of the
 * Matcher.end(int) method
 */
public class MatcherEndParamExample {
    public static void main(String[] args) {
        test();
    }

    public static void test() {
        // create a Pattern
        Pattern p = Pattern.compile("B(on)d");

        // create a Matcher and use the Matcher.end(int) method
        String candidateString = "My name is Bond. James Bond.";

        // create a helpful index for the sake of output:
        // each caret sits under the last character of the group
        String matchHelper[] =
            {"              ^",
             "             ^",
             "                          ^",
             "                         ^"};

        Matcher matcher = p.matcher(candidateString);

        // find the end point of the first 'B(on)d'
        matcher.find();
        int endIndex = matcher.end(0);
        System.out.println(candidateString);
        System.out.println(matchHelper[0] + endIndex);

        // find the end point of the first subgroup (on)
        int nextIndex = matcher.end(1);
        System.out.println(candidateString);
        System.out.println(matchHelper[1] + nextIndex);

        // find the end point of the second 'B(on)d'
        matcher.find();
        endIndex = matcher.end(0);
        System.out.println(candidateString);
        System.out.println(matchHelper[2] + endIndex);

        // find the end point of the second subgroup (on)
        nextIndex = matcher.end(1);
        System.out.println(candidateString);
        System.out.println(matchHelper[3] + nextIndex);
    }
}

Output 2-7 shows the output of running Listing 2-12.

Output 2-7. Output for the Matcher.end(int) Example

My name is Bond. James Bond.
              ^15
My name is Bond. James Bond.
             ^14
My name is Bond. James Bond.
                          ^27
My name is Bond. James Bond.
                         ^26

If you call find() a third time, it returns false, because the candidate String contains no third match:

matcher.find();

If you then call end(0),

int nonIndex = matcher.end(0); //throws IllegalStateException

an IllegalStateException is thrown. In general, the end(int) method throws an IllegalStateException if the most recent find() call was unsuccessful or if no match has been attempted at all. Similarly, it throws an IndexOutOfBoundsException if you refer to a group number that doesn’t exist in the pattern.
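The following sketch exercises the three failure cases just described: calling end(0) before any find(), calling it after a find() that returns false, and asking for group 2 when B(on)d defines only group 1. The class EndExceptionsSketch and its tryEnd helper are hypothetical names chosen for this illustration; tryEnd simply reports which exception, if any, end(int) throws.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EndExceptionsSketch {

    private static final Pattern BOND = Pattern.compile("B(on)d");

    // Hypothetical helper: calls matcher.end(group) and reports the outcome.
    static String tryEnd(Matcher matcher, int group) {
        try {
            return "returned " + matcher.end(group);
        } catch (IllegalStateException e) {
            return "IllegalStateException";
        } catch (IndexOutOfBoundsException e) {
            return "IndexOutOfBoundsException";
        }
    }

    // end(0) before find() has ever been called.
    public static String beforeFind(String candidate) {
        return tryEnd(BOND.matcher(candidate), 0);
    }

    // end(0) after a third find() that finds nothing.
    public static String afterFailedFind(String candidate) {
        Matcher matcher = BOND.matcher(candidate);
        matcher.find(); // first "Bond"
        matcher.find(); // second "Bond"
        matcher.find(); // no third "Bond": returns false
        return tryEnd(matcher, 0);
    }

    // end(2) when the pattern only defines group 1.
    public static String missingGroup(String candidate) {
        Matcher matcher = BOND.matcher(candidate);
        matcher.find();
        return tryEnd(matcher, 2);
    }

    public static void main(String[] args) {
        String candidate = "My name is Bond. James Bond.";
        System.out.println(beforeFind(candidate));      // IllegalStateException
        System.out.println(afterFailedFind(candidate)); // IllegalStateException
        System.out.println(missingGroup(candidate));    // IndexOutOfBoundsException
    }
}
```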

public String group()

The group() method can be a powerful and convenient tool in the war against jumbled code. It simply returns the substring of the candidate String matched by the most recent successful match. For example, say you want to extract occurrences of the pattern Bond

Pattern p = Pattern.compile("Bond");

from the candidate String "My name is Bond. James Bond." To do so, you obtain a Matcher

Matcher matcher = p.matcher("My name is Bond. James Bond.");

and call find() on it:

matcher.find();

Now the boxed region in the following image is ready to be scrutinized by the Matcher:

 

You can now extract the part of the candidate String that matches your criteria by using the group() method:

String tmp = matcher.group(); //returns "Bond"

This method extracts the matching part of the region under consideration. That area is circled in the following image:

 

A clumsier way to achieve the same result is to call the start() and end() methods to find the starting and ending indexes of the match within the candidate String, and then pass those indexes to the String.substring method to extract that text.
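The equivalence between group() and the start()/end()/substring approach can be checked directly. In this sketch, every match found by find() is extracted with group() and compared against candidate.substring(matcher.start(), matcher.end()); the class GroupVersusSubstring and its allMatches method are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupVersusSubstring {

    // Hypothetical helper: collects every match with group(), checking along
    // the way that it equals the "clumsier" substring(start(), end()) result.
    public static List<String> allMatches(String regex, String candidate) {
        Matcher matcher = Pattern.compile(regex).matcher(candidate);
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {
            String viaGroup = matcher.group();
            String viaSubstring = candidate.substring(matcher.start(), matcher.end());
            if (!viaGroup.equals(viaSubstring)) {
                throw new AssertionError(viaGroup + " != " + viaSubstring);
            }
            matches.add(viaGroup);
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(allMatches("Bond", "My name is Bond. James Bond."));
    }
}
```

For the pattern Bond and the candidate My name is Bond. James Bond., this should print [Bond, Bond].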

The group() method throws an IllegalStateException if the most recent find() call was unsuccessful or if find() was never called. Listing 2-13 presents a complete working example of this method and the steps just discussed.

Listing 2-13. The Matcher.group() Method

import java.util.regex.*;

/**
 * Demonstrates the usage of the
 * Matcher.group() method
 */
public class MatcherGroupExample {
    public static void main(String[] args) {
        test();
    }

    public static void test() {
        // create a Pattern
        Pattern p = Pattern.compile("Bond");

        // create a Matcher and use the Matcher.group() method
        String candidateString = "My name is Bond. James Bond.";
        Matcher matcher = p.matcher(candidateString);

        // extract the group
        matcher.find();
        System.out.println(matcher.group());
    }
}
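Because group() throws an IllegalStateException when there is no current match, a common defensive habit is to call it only when find() has returned true. The sketch below shows that guard, plus the exception raised if find() is never called. The class GroupGuardSketch and its methods are hypothetical names for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupGuardSketch {

    // Hypothetical helper: returns the first match, or null when find()
    // reports no match, so group() is only called after a successful find().
    public static String firstMatchOrNull(String regex, String candidate) {
        Matcher matcher = Pattern.compile(regex).matcher(candidate);
        return matcher.find() ? matcher.group() : null;
    }

    // Returns true if group() throws IllegalStateException when find()
    // has never been called on a fresh Matcher.
    public static boolean groupThrowsBeforeFind(String regex, String candidate) {
        Matcher matcher = Pattern.compile(regex).matcher(candidate);
        try {
            matcher.group();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String candidate = "My name is Bond. James Bond.";
        System.out.println(firstMatchOrNull("Bond", candidate));      // Bond
        System.out.println(firstMatchOrNull("Smith", candidate));     // null
        System.out.println(groupThrowsBeforeFind("Bond", candidate)); // true
    }
}
```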


blog comments powered by Disqus
JAVA ARTICLES

- Java Too Insecure, Says Microsoft Researcher
- Google Beats Oracle in Java Ruling
- Deploying Multiple Java Applets as One
- Deploying Java Applets
- Understanding Deployment Frameworks
- Database Programming in Java Using JDBC
- Extension Interfaces and SAX
- Entities, Handlers and SAX
- Advanced SAX
- Conversions and Java Print Streams
- Formatters and Java Print Streams
- Java Print Streams
- Wildcards, Arrays, and Generics in Java
- Wildcards and Generic Methods in Java
- Finishing the Project: Java Web Development ...

Watch our Tech Videos 
Dev Articles Forums 
 RSS  Articles
 RSS  Forums
 RSS  All Feeds
Write For Us 
Weekly Newsletter
 
Developer Updates  
Free Website Content 
Contact Us 
Site Map 
Privacy Policy 
Support 

Developer Shed Affiliates

 




© 2003-2017 by Developer Shed. All rights reserved. DS Cluster - Follow our Sitemap
Popular Web Development Topics
All Web Development Tutorials