
Introduction to the Java.util.regex Object Model


If you have ever wanted to know all about the Pattern and Matcher classes of Java's new java.util.regex package, this article is an excellent place to start. It is taken from chapter 2 of the book Java Regular Expressions: Taming the java.util.regex Engine, written by Mehran Habibi (Apress, 2004; ISBN: 1590591070).

Author Info:
By: Apress Publishing
August 18, 2005
TABLE OF CONTENTS:
  1. · Introduction to the Java.util.regex Object Model
  2. · public static Pattern compile(String regex, int flags) Throws a PatternSyntaxException
  3. · public String[] split(CharSequence input)
  4. · The Matcher Object
  5. · public int start(int group)
  6. · public int end(int group)
  7. · public String group(int group)
  8. · public boolean find()
  9. · public Matcher appendReplacement (StringBuffer sb, String replacement)
  10. · Special Notes
  11. · New String Regex-Friendly Methods
  12. · Summary


Introduction to the Java.util.regex Object Model - public static Pattern compile(String regex, int flags) Throws a PatternSyntaxException
(Page 2 of 12 )

The compile(String regex, int flags) method is a more powerful form of the compile(String) method. Its first parameter, regex, is a String that represents a regular expression, exactly as in the Pattern.compile(String regex) method presented earlier. For details on how you must format the String parameter, please see the “public static Pattern compile(String regex) Throws a PatternSyntaxException” section.

The flexibility of this compile method is fully realized by using the second parameter, int flags. The int flags parameter can consist of the following flags or a bit mask created by OR-ing combinations thereof:

  • CANON_EQ

  • CASE_INSENSITIVE

  • COMMENTS

  • DOTALL

  • MULTILINE

  • UNICODE_CASE

  • UNIX_LINES

For example, if you want a match to be successful regardless of the case of the candidate String, then your pattern might look like the following:

Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);

You can combine the flags by using the | operator. For example, to achieve case-insensitive Unicode matches with a pattern that includes a comment, you might use the following:

Pattern p =
    Pattern.compile("t # a compound flag example",
        Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE |
        Pattern.COMMENTS);

The compile(String regex, int flags) method returns a Pattern object.
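
As a minimal sketch of how OR-ed flags behave together, the following class compiles a commented pattern and tries it against a few candidates. The class name CompoundFlagsSketch and the helper buildPattern() are hypothetical names chosen for illustration; they are not part of the book's listings.

```java
import java.util.regex.Pattern;

class CompoundFlagsSketch {

    // Hypothetical helper: compiles the letter "t" with three flags OR-ed together.
    // With COMMENTS set, the whitespace and everything after '#' are ignored.
    static Pattern buildPattern() {
        return Pattern.compile(
            "t   # match the letter t, in any case",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.COMMENTS);
    }

    public static void main(String[] args) {
        Pattern p = buildPattern();
        System.out.println(p.matcher("t").matches()); // prints true
        System.out.println(p.matcher("T").matches()); // prints true
        System.out.println(p.matcher("x").matches()); // prints false
    }
}
```

Without the COMMENTS flag, the spaces and the # text would be treated as literal characters to match, so the same regex String would no longer match a lone "t".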

public String pattern()

This method returns a simple String representation of the regex that was compiled. It can be misleading in two ways. First, the String that’s returned doesn’t reflect any flags that were set when the pattern was compiled. Second, the regex String you typed into your source code isn’t always character-for-character the pattern String you get back out. Specifically, the escaping used in the original String literal isn’t shown. Thus, if your original code was this:

Pattern p = Pattern.compile("\\d");

you should expect your output to be \d, with a single \ character.
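
Both points can be checked with a short sketch. The class name PatternTextSketch and the helper patternText() are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

class PatternTextSketch {

    // Hypothetical helper: returns the text that pattern() reports for a compiled regex.
    static String patternText(String regex, int flags) {
        return Pattern.compile(regex, flags).pattern();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\d", Pattern.CASE_INSENSITIVE);

        // The returned String holds two characters: a backslash and the letter d.
        System.out.println(p.pattern());          // prints \d
        System.out.println(p.pattern().length()); // prints 2

        // The flag is not visible in pattern(); it has to be read through flags().
        System.out.println((p.flags() & Pattern.CASE_INSENSITIVE) != 0); // prints true
    }
}
```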

A question naturally arises here: If this method strips out the original escaping, can you use the resulting String as a regular expression to compile another Pattern? For example, does Listing 2-2 work?

Listing 2-2. Pattern Matching Example

import java.util.regex.*;

public class PatternMethodExample {
  public static void main(String[] args) {
     reusePatternMethodExample();
  }

  public static void reusePatternMethodExample() {
     // match a single digit
     Pattern p = Pattern.compile("\\d");
     Matcher matcher = p.matcher("5");
     boolean isOk = matcher.matches();
     System.out.println("original pattern matches " + isOk);

     // recycle the pattern: compile the String returned by pattern()
     String tmp = p.pattern();
     Pattern p2 = Pattern.compile(tmp);
     matcher = p2.matcher("5");
     isOk = matcher.matches();
     System.out.println("second pattern matches " + isOk);
  }
}

Will this method throw a RuntimeException? After all, the pattern() method returns \d, and Java source code that uses "\d" as a String literal will fail to compile.

The answer is no, it won’t throw an exception. Remember that the doubling of the \ character is a requirement of Java’s String literal syntax; it has nothing to do with the regex pattern that the String represents. Once the String exists in memory, it holds a single \ character, and the conflict disappears.

public Matcher matcher(CharSequence input)

Remember that you create a Pattern object by compiling a description of what you’re looking for. A Pattern is a bit like a personal ad: It lists the features of the thing you’re looking for. Speaking purely conceptually, your patterns might look like the following:

Pattern p = Pattern.compile("She must have red hair, and a temper");

Correspondingly, you’ll need to compare that description against candidates. That is, you’ll want to examine a given String to see if it matches the description you provided.

The Matcher object is designed specifically to help you do this sort of interrogation. I discuss Matcher in detail in the next major section of this chapter, but for now you should know that the Pattern.matcher(CharSequence input) method returns the Matcher that will help get details about how your candidate String compares with the description you passed in.

Pattern.matcher(CharSequence input) takes a CharSequence as its input parameter. CharSequence is a new interface introduced in J2SE 1.4 and retroactively implemented by the String class. Because String implements CharSequence, you can simply pass a String object as the parameter to the Pattern.matcher(CharSequence input) method. I discuss the CharSequence parameter in detail shortly.

In the preceding example, again speaking purely conceptually, you might get your Matcher object as follows:

Matcher m = p.matcher("Anna");

In J2SE, this Matcher object’s matches() would return true. In real life, YMMV.
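
Leaving the conceptual example aside, here is a minimal sketch with a real regex. It assumes a hypothetical helper, isAllDigits(), that accepts any CharSequence, so both a String and a StringBuffer can be passed in.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherSketch {

    // Hypothetical helper: true if the whole candidate consists of one or more digits.
    static boolean isAllDigits(CharSequence candidate) {
        Pattern p = Pattern.compile("\\d+");
        Matcher m = p.matcher(candidate); // the Matcher is bound to this one candidate
        return m.matches();               // matches() requires the entire input to match
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("2004"));                   // prints true
        System.out.println(isAllDigits(new StringBuffer("20x4"))); // prints false
    }
}
```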

public int flags()

Earlier I discussed the constant flags that you can use in compiling your regex pattern. The flags method simply returns an int that represents those flags. For example, to see whether your Pattern object was compiled with a given flag (say, the Pattern.COMMENTS flag), simply extract the flags:

int flgs = myPattern.flags();

then “and” (&) that flag to the Pattern.COMMENTS flag:

boolean isUsingCommentFlag = (Pattern.COMMENTS == (Pattern.COMMENTS & flgs));

Similarly, to see if you’re using the CASE_INSENSITIVE flag, use the following code:

boolean isUsingCaseInsensitiveFlag =
    (Pattern.CASE_INSENSITIVE == (Pattern.CASE_INSENSITIVE & flgs));
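
The same bit test can be wrapped in a small reusable method. This is only a sketch; the class name FlagCheckSketch and the helper hasFlag() are hypothetical and do not exist in java.util.regex.

```java
import java.util.regex.Pattern;

class FlagCheckSketch {

    // Hypothetical helper: true if every bit of 'flag' is set in the pattern's flags.
    static boolean hasFlag(Pattern p, int flag) {
        return (p.flags() & flag) == flag;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("a b", Pattern.COMMENTS | Pattern.CASE_INSENSITIVE);

        System.out.println(hasFlag(p, Pattern.COMMENTS));         // prints true
        System.out.println(hasFlag(p, Pattern.CASE_INSENSITIVE)); // prints true
        System.out.println(hasFlag(p, Pattern.DOTALL));           // prints false
    }
}
```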

public static boolean matches(String regex, CharSequence input)

Very often, you’ll find that all you need to know about a String is whether it matches a given regular expression exactly. You don’t want to have to create a Pattern object, extract its Matcher object, and interrogate that Matcher.

This static utility method is designed to do exactly that. Internally, it creates the Pattern and Matcher objects you need, compares the regex to the input String, and returns a boolean that tells you whether the two match exactly. Listing 2-3 presents an example of its use.

Listing 2-3. Matches Example

import java.util.regex.*;

public class PatternMatchesTest {
  public static void main(String[] args) {
     String regex = "ad*";
     String input = "add";
     boolean isMatch = Pattern.matches(regex, input);
     System.out.println(isMatch); // prints true
  }
}

If you’re going to do a lot of comparisons, then it’s more efficient to explicitly create a Pattern object and do your matches manually. However, if you aren’t going to do a lot of comparisons, then matches is a handy utility method.
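
For the many-comparisons case, one possible approach is to compile the Pattern once and reuse it for every candidate. The sketch below assumes a hypothetical class ReusePatternSketch with a helper countMatches(); the regex is the one from Listing 2-3.

```java
import java.util.regex.Pattern;

class ReusePatternSketch {

    // Compiled once, then shared by every comparison.
    private static final Pattern A_THEN_DS = Pattern.compile("ad*");

    // Hypothetical helper: counts how many inputs match the regex exactly.
    static int countMatches(String[] inputs) {
        int count = 0;
        for (int i = 0; i < inputs.length; i++) {
            if (A_THEN_DS.matcher(inputs[i]).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] inputs = {"a", "add", "abd", "dda"};
        System.out.println(countMatches(inputs)); // prints 2
    }
}
```

Calling Pattern.matches("ad*", input) inside the loop would give the same answers, but it would recompile the regex on every iteration.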

The Pattern.matches(String regex, CharSequence input) method is also used internally by the String class. As of J2SE 1.4, String has a new method called matches that internally defers to the Pattern.matches method. You might already be using this method without being aware of it.

Of course, this method can throw a PatternSyntaxException if the regex pattern under consideration isn’t well formed.
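
The following sketch shows String.matches alongside one way to guard against a malformed regex. The class name SyntaxCheckSketch and the helper isValidRegex() are hypothetical; the unclosed group "(ad*" is simply an example of a pattern that isn't well formed.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxCheckSketch {

    // Hypothetical helper: true if the regex compiles, false if it is malformed.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // String.matches gives the same answer as Pattern.matches for the same regex.
        System.out.println("add".matches("ad*"));         // prints true
        System.out.println(Pattern.matches("ad*", "add")); // prints true

        System.out.println(isValidRegex("ad*"));  // prints true
        System.out.println(isValidRegex("(ad*")); // prints false
    }
}
```

Because PatternSyntaxException is unchecked, the compiler won't force you to catch it; a check like this is mainly useful when the regex comes from user input.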

