Introduction to the Java.util.regex Object Model
By: Apress Publishing
    2005-08-18




    Introduction to the Java.util.regex Object Model - The Matcher Object



    Figure 2-2 illustrates the methods of the Matcher class. Please take a moment to study them.


    Figure 2-2.  The Matcher class

    The following sections describe the various methods of the Matcher class. But first, let’s briefly revisit the concept of groups, as they figure so prominently in the Matcher object.

    Groups

    Before you can take full advantage of the Matcher object, you need to understand the concept of a group, because some of the more powerful Matcher methods work with groups. I discuss groups in greater detail in Chapter 3, but you need an intuitive sense of them to follow the material in this chapter, so I provide a brief introduction here.

    A group is exactly what it sounds like: a cluster of characters. Usually the term refers to a subportion of the original Pattern, although the entire pattern also counts as a group in its own right. You’re probably already familiar with the concept of groups from your study of arithmetic. For example, the expression

    6 * 7 + 4

    has an implicit sense of grouping. You really read it as

    (6 * 7) + 4

    where (6 * 7) is thought of as a clustering of numbers. Further, you can think of the expression as

    ((6 * 7) + 4)

    where you can consider ((6 * 7) + 4) another clustering of numbers, this one including the subcluster (6 * 7). Here, your group has a subgroup. Similarly, regex allows you to group a sequence of characters together. Why? I discuss that shortly. First, let’s concentrate on how.

    Remember that in regular expressions, you describe what you’re looking for in general terms by using a Pattern object. Groups allow you to nest subdescriptions within your expression. As you examine a specific candidate String, the Matcher can keep track of submatches for that expression.

    Creating a grouping of regex characters is very easy. You simply put the expression you want to think of as a group inside a pair of parentheses. That’s it. Thus, the pattern (\w)(\d\d)(\w+) consists of four groups, numbered 0 to 3:

    group(0), which is always the entire expression: (\w)(\d\d)(\w+)
    group(1), which matches a single alphanumeric or underscore character: (\w)
    group(2), which matches two digits: (\d\d)
    group(3), which matches one or more alphanumeric or underscore characters: (\w+)

    For a specific candidate String, say X99SuperJava, group(0) is always the part of the candidate string that matches the entire pattern (\w)(\d\d)(\w+), which here is the whole string. The groups correspond to these sections of X99SuperJava:

    group(0): X99SuperJava
    group(1): X
    group(2): 99
    group(3): SuperJava

    OK, so you know how to designate groups and how to find the corresponding section in a candidate string. Now, why would you? A common reason is to refer to subsections of the candidate string. For example, you may not know in advance what a particular candidate string (here, X99SuperJava) will be, but you can still write a program that rearranges it by building a new String from group(3), followed by group(1), followed by group(2). In this case, the rearranged String would be SuperJavaX99.
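The following sketch puts the X99SuperJava walkthrough into code. It is a minimal illustration rather than one of the book’s listings; the class name GroupRearrangeExample and the helper rearrange() are hypothetical. It calls matches() first, because group() may only be called after a successful match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class, not from the book
public class GroupRearrangeExample {
    // The pattern from the text: (\w)(\d\d)(\w+)
    private static final Pattern PATTERN = Pattern.compile("(\\w)(\\d\\d)(\\w+)");

    // Returns group(3) + group(1) + group(2), or null if the whole candidate doesn't match
    public static String rearrange(String candidate) {
        Matcher m = PATTERN.matcher(candidate);
        if (!m.matches()) {
            return null; // group() throws IllegalStateException without a successful match
        }
        return m.group(3) + m.group(1) + m.group(2);
    }

    public static void main(String[] args) {
        Matcher m = PATTERN.matcher("X99SuperJava");
        if (m.matches()) {
            // groupCount() excludes group 0, so it is 3 for this pattern
            for (int i = 0; i <= m.groupCount(); i++) {
                System.out.println("group(" + i + ") = " + m.group(i));
            }
        }
        System.out.println(rearrange("X99SuperJava")); // prints SuperJavaX99
    }
}
```

The loop prints X99SuperJava, X, 99, and SuperJava for groups 0 through 3, matching the sections listed above.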

    Chapter 3 provides detailed examples of groups.
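To tie groups back to the arithmetic analogy, here is a hypothetical sketch (NestedGroupExample and its groups() helper are illustrative names, not from the book) that describes 6 * 7 + 4 as a regex with a nested group. Java numbers capturing groups by counting their opening parentheses from left to right, so the group around 6 * 7 is group(1), the digits inside it are group(2) and group(3), and group(0) plays the role of the outermost parentheses.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class, not from the book
public class NestedGroupExample {
    // Returns group(0) through group(groupCount()), or an empty array if there is no match
    public static String[] groups(String regex, String candidate) {
        Matcher m = Pattern.compile(regex).matcher(candidate);
        if (!m.matches()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // (6 * 7) + 4 written as a regex: group 1 wraps "6 * 7",
        // and groups 2 and 3 are nested inside it
        String regex = "((\\d) \\* (\\d)) \\+ (\\d)";
        String[] g = groups(regex, "6 * 7 + 4");
        for (int i = 0; i < g.length; i++) {
            System.out.println("group(" + i + ") = " + g[i]);
        }
    }
}
```

Running main prints 6 * 7 + 4, 6 * 7, 6, 7, and 4 for groups 0 through 4.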

    public Pattern pattern()

    The pattern method returns the Pattern that created this particular Matcher object. Consider Listing 2-6.

    Listing 2-6. Matcher Pattern Example

    import java.util.regex.*;

    public class MatcherPatternExample {
        public static void main(String[] args) {
            test();
        }

        public static void test() {
            Pattern p = Pattern.compile("\\d");
            Matcher m1 = p.matcher("55");
            Matcher m2 = p.matcher("fdshfdgdfh");

            // prints true: both Matchers were created by the same Pattern object
            System.out.println(m1.pattern() == m2.pattern());
        }
    }

    You should notice a few important things here. First, both Matcher objects returned a Pattern, even though m2 can never produce a successful match, since its candidate string contains no digits. Second, the Matcher objects returned exactly the same Pattern object, because they were both created by the same Pattern. Notice that the line

    System.out.println(m1.pattern() == m2.pattern());

    uses == rather than .equals(). The comparison prints true only because m1.pattern() and m2.pattern() return exactly the same object.
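To see why the == comparison matters, the hypothetical sketch below also compiles the same regex a second time. Pattern.compile builds a new Pattern object on each call, so a Matcher created from the second Pattern reports a different object, even though the regex text (available through Pattern.pattern()) is identical. The class and helper names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class, not from the book
public class PatternIdentityExample {
    // True when both matchers report the very same Pattern object
    public static boolean samePatternObject(Matcher a, Matcher b) {
        return a.pattern() == b.pattern();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\d");
        Pattern q = Pattern.compile("\\d"); // same regex, but a separately compiled object

        Matcher m1 = p.matcher("55");
        Matcher m2 = p.matcher("fdshfdgdfh");
        Matcher m3 = q.matcher("55");

        System.out.println(samePatternObject(m1, m2)); // true: both came from p
        System.out.println(samePatternObject(m1, m3)); // false: p and q are different objects
        // pattern().pattern() returns the source regex as a String, which equals() can compare
        System.out.println(m1.pattern().pattern().equals(m3.pattern().pattern())); // true
    }
}
```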

    public Matcher reset()

    The reset method clears all state information from the Matcher object it’s called on. The Matcher is, in effect, reverted to the state it originally had when you first received a reference to it, as shown in Listing 2-7.

    Listing 2-7. Matcher.reset Example

    import java.util.regex.*;

    /**
     * Demonstrates the usage of the
     * Matcher.reset() method
     */
    public class MatcherResetExample {
        public static void main(String[] args) {
            test();
        }

        public static void test() {
            // create a pattern, and extract a matcher
            Pattern p = Pattern.compile("\\d");
            Matcher m1 = p.matcher("01234");

            // exhaust the matcher
            while (m1.find()) {
                System.out.println("\t\t" + m1.group());
            }
            // now reset the matcher to its original state
            m1.reset();
            System.out.println("After resetting the Matcher");
            // iterate through the matcher again;
            // this would not be possible without a cleared state
            while (m1.find()) {
                System.out.println("\t\t" + m1.group());
            }
        }
    }

    Output 2-2 shows the output of this method.

    Output 2-2.  Output for the Matcher.reset Example

            0
            1
            2
            3
            4
    After resetting the Matcher
            0
            1
            2
            3
            4

    You wouldn’t have been able to iterate through the matches again if the Matcher hadn’t been reset.
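One way to see what reset() buys you is to count how many more times find() succeeds. In this sketch (the class ResetFindExample and the countRemaining() helper are illustrative, not from the book), an exhausted Matcher keeps returning false from find() until reset() is called.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class, not from the book
public class ResetFindExample {
    // Counts how many times find() succeeds from the matcher's current state
    public static int countRemaining(Matcher m) {
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("\\d").matcher("01234");
        System.out.println(countRemaining(m)); // 5
        System.out.println(countRemaining(m)); // 0: the matcher is exhausted
        m.reset();
        System.out.println(countRemaining(m)); // 5 again after reset()
    }
}
```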

    public Matcher reset(CharSequence input)

    The reset(CharSequence input) method clears the state of the Matcher object it’s called on and replaces the candidate String with the new input. This has the same effect as creating a new Matcher object, but without as much of the associated overhead. It’s a useful optimization, and one that I often use. Listing 2-8 demonstrates this method’s usage.

    Listing 2-8. Matcher.reset(CharSequence) Example

    import java.util.regex.*;

    /**
     * Demonstrates the usage of the
     * Matcher.reset(CharSequence) method
     */
    public class MatcherResetCharSequenceExample {
        public static void main(String[] args) {
            test();
        }

        public static void test() {
            // create a pattern, and extract a matcher
            Pattern p = Pattern.compile("\\d");
            Matcher m1 = p.matcher("01234");

            // exhaust the matcher
            while (m1.find()) {
                System.out.println("\t\t" + m1.group());
            }
            // now reset the matcher with new data
            m1.reset("56789");
            System.out.println("After resetting the Matcher");
            // iterate through the matcher again;
            // this would not be possible without a reset
            while (m1.find()) {
                System.out.println("\t\t" + m1.group());
            }
        }
    }

    Output 2-3 shows the output of this method.

    Output 2-3. Output for the Matcher.reset(CharSequence) Example

            0
            1
            2
            3
            4
    After resetting the Matcher
            5
            6
            7
            8
            9
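As a sketch of the reuse pattern described above, the following hypothetical helper creates one Matcher up front and calls reset(CharSequence) for each new candidate instead of calling p.matcher() every time. It assumes the candidates arrive as a List<String>; how much time this actually saves depends on your JVM and workload, so treat it as an illustration rather than a measured optimization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class, not from the book
public class MatcherReuseExample {
    // Counts the digits in each input, reusing a single Matcher via reset(CharSequence)
    public static List<Integer> digitCounts(List<String> inputs) {
        Matcher m = Pattern.compile("\\d").matcher(""); // start with an empty candidate
        List<Integer> counts = new ArrayList<>();
        for (String input : inputs) {
            m.reset(input); // swap in the next candidate and clear the previous state
            int count = 0;
            while (m.find()) {
                count++;
            }
            counts.add(count);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("01234", "56789", "no digits", "a1b2");
        System.out.println(digitCounts(inputs)); // prints [5, 5, 0, 2]
    }
}
```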

    public int start()

    The start method returns the starting index of the Matcher object’s last successful match. Listing 2-9 demonstrates the use of the start method. The code in this listing finds the starting index of each occurrence of the word Bond in the candidate string "My name is Bond. James Bond."

    Listing 2-9. Matcher.start() Example

    import java.util.regex.*;

    /**
     * Demonstrates the usage of the
     * Matcher.start() method
     */
    public class MatcherStartExample {
        public static void main(String[] args) {
            test();
        }

        public static void test() {
            // create a Matcher and use the Matcher.start() method
            String candidateString = "My name is Bond. James Bond.";
            // 11 and 23 leading spaces put each ^ under the first letter of a match
            String[] matchHelper =
                {"           ^", "                       ^"};
            Pattern p = Pattern.compile("Bond");
            Matcher matcher = p.matcher(candidateString);

            // find the starting point of the first 'Bond'
            matcher.find();
            int startIndex = matcher.start();
            System.out.println(candidateString);
            System.out.println(matchHelper[0] + startIndex);

            // find the starting point of the second 'Bond'
            matcher.find();
            int nextIndex = matcher.start();
            System.out.println(candidateString);
            System.out.println(matchHelper[1] + nextIndex);
        }
    }

    Output 2-4 shows the output of running the start() method.

    Output 2-4. Output for the Matcher.start() Example

    My name is Bond. James Bond.
               ^11
    My name is Bond. James Bond.
                           ^23

    If you call find() a third time

    matcher.find();

    and then execute start()

    int nonIndex = matcher.start(); //throws IllegalStateException

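    the start() method throws an IllegalStateException: there is no third Bond in the candidate string, so that last find() failed. I’m surprised that it doesn’t simply return a negative number to indicate an unsuccessful match. Instead, check the boolean returned by the match operation you just ran (find() here, or matches() or lookingAt() elsewhere) to determine whether you can safely call methods such as start().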
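The following sketch (SafeStartExample and its methods are hypothetical names) shows the safe idiom: call start() only inside a loop or branch guarded by the boolean that find() returns. It also confirms that start() throws IllegalStateException once the most recent find() has failed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class, not from the book
public class SafeStartExample {
    // Collects start() for every match, calling start() only after find() returned true
    public static List<Integer> startIndices(String regex, String candidate) {
        Matcher matcher = Pattern.compile(regex).matcher(candidate);
        List<Integer> starts = new ArrayList<>();
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    // Returns true if start() throws once every match has been consumed
    public static boolean startThrowsAfterFailedFind(String regex, String candidate) {
        Matcher matcher = Pattern.compile(regex).matcher(candidate);
        while (matcher.find()) {
            // skip past every match; the final find() call returns false
        }
        try {
            matcher.start();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String candidate = "My name is Bond. James Bond.";
        System.out.println(startIndices("Bond", candidate)); // prints [11, 23]
        System.out.println(startThrowsAfterFailedFind("Bond", candidate)); // prints true
    }
}
```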



     

    This article is excerpted from chapter two of Java Regular Expressions: Taming the java.util.regex Engine, written by Mehran Habibi (Apress, 2004; ISBN: 1590591070).







    © 2003-2008 by Developer Shed. All rights reserved.