
Introduction to the Java.util.regex Object Model

If you have ever wanted to know all about the Pattern and Matcher classes of Java's new java.util.regex package, this article is an excellent place to start. It is taken from chapter 2 of the book Java Regular Expressions: Taming the java.util.regex Engine, written by Mehran Habibi (Apress, 2004; ISBN: 1590591070).

Author Info:
By: Apress Publishing
August 18, 2005
  1. · Introduction to the Java.util.regex Object Model
  2. · public static Pattern compile(String regex, int flags) Throws a PatternSyntaxException
  3. · public String[] split(CharSequence input)
  4. · The Matcher Object
  5. · public int start(int group)
  6. · public int end(int group)
  7. · public String group(int group)
  8. · public boolean find()
  9. · public Matcher appendReplacement (StringBuffer sb, String replacement)
  10. · Special Notes
  11. · New String Regex-Friendly Methods
  12. · Summary


Introduction to the Java.util.regex Object Model - Summary

In this chapter, I provided detailed documentation and numerous examples for the Pattern and Matcher classes and their methods. I also discussed the new regex methods of the String class. You should now have a better sense of how these objects work, how they work together, and a point of reference when working with these methods. In Chapter 3 you'll learn how to integrate these new tools, the regex language, and the Java language proper into a unified whole.


Q:  How do I start using the regex package?

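A:  Simply import the package's classes with import java.util.regex.*; (or import java.util.regex.Pattern and java.util.regex.Matcher individually).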
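As a minimal sketch of what that import gives you, the class below compiles a Pattern once and uses a Matcher to walk through every match in one input. The class name RegexStart, the method digitRuns, and the sample text are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexStart {
    // Collects every run of digits in the text, in order of appearance.
    static List<String> digitRuns(String text) {
        Pattern pattern = Pattern.compile("\\d+"); // compiled once; a Pattern is immutable
        Matcher matcher = pattern.matcher(text);   // a Matcher holds the state for one input
        List<String> found = new ArrayList<>();
        while (matcher.find()) {                   // each find() resumes after the last match
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(digitRuns("order 66 shipped in 3 boxes")); // [66, 3]
    }
}
```

The split between the two classes is the design point: the Pattern is the reusable compiled regex, while each Matcher is a cheap, stateful cursor over a single CharSequence.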

Q:  How do I find out whether a string contains a substring?

A:  If you're really looking for an explicit substring, rather than a pattern description, use the String.indexOf method. However, if you need to confirm the existence of a pattern, you have two paths open to you. The first is to use a variation of the String.split method with a negative number as the second parameter:

String[] tokens = candidate.split(subStringPattern, -1);

and make sure the resulting array has more than a single element:

boolean isThere = tokens.length > 1;

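The problem here is that this test is easy to get wrong. The negative second parameter is essential: without it (or with a limit of 0), split discards trailing empty strings, so if the phrase you're looking for just happens to be the last element in the candidate sentence, the array will still have a size of 1, which will lead to a false conclusion. Try this with the candidate this is the phrase I want. and the phrase description want. (with a period trailing the t character), calling split with and without the -1.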
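The sketch below shows how the split-based test depends on the limit argument, next to a Matcher.find check for comparison. It assumes the candidate ends with the period, as in the phrase above; SplitCheckDemo and its two helper methods are hypothetical names.

```java
import java.util.regex.Pattern;

class SplitCheckDemo {
    // The split-based test: "present" means split produced more than one piece.
    static boolean splitSaysPresent(String candidate, String regex, int limit) {
        return candidate.split(regex, limit).length > 1;
    }

    // The find-based test, which does not depend on array lengths at all.
    static boolean findSaysPresent(String candidate, String regex) {
        return Pattern.compile(regex).matcher(candidate).find();
    }

    public static void main(String[] args) {
        String candidate = "this is the phrase I want.";
        String regex = "want.";   // "." is a metacharacter: any single character

        // Limit 0 drops trailing empty strings, so a match at the very end
        // leaves a one-element array.
        System.out.println(splitSaysPresent(candidate, regex, 0));  // false
        // Limit -1 keeps the trailing empty string, giving two elements.
        System.out.println(splitSaysPresent(candidate, regex, -1)); // true
        System.out.println(findSaysPresent(candidate, regex));      // true
    }
}
```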

Your second option is to use a short method like the following, which will always work:

import java.util.regex.*;

/**
 * Confirms, or denies, the existence of the regex
 * as part of the candidate String.
 * @param candidate the <code>String</code> candidate
 * @param subStringPattern the <code>String</code> regex to look for
 * @return <code>boolean</code> true if the regex
 * describes part of the candidate
 * @author M Habibi
 */
public static boolean containsSubtring(String candidate, String subStringPattern) {
    boolean retval = false;
    //compile the pattern
    Pattern pattern = Pattern.compile(subStringPattern);
    //see if any part of the candidate contains the pattern
    Matcher matcher = pattern.matcher(candidate);
    retval = matcher.find();
    return retval;
}
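The method works because find() succeeds when the pattern occurs anywhere in the input. The sketch below contrasts it with the two other Matcher tests, lookingAt() and matches(), which are anchored; the class FindVersusMatches and its describe helper are hypothetical.

```java
import java.util.regex.Pattern;

class FindVersusMatches {
    // Reports what each of the three Matcher tests says about the same input.
    static String describe(String regex, String candidate) {
        Pattern pattern = Pattern.compile(regex);
        // A fresh Matcher per call so one method's state cannot affect the next.
        return "find=" + pattern.matcher(candidate).find()           // anywhere in the input
             + " lookingAt=" + pattern.matcher(candidate).lookingAt() // must start at index 0
             + " matches=" + pattern.matcher(candidate).matches();    // must cover the whole input
    }

    public static void main(String[] args) {
        String candidate = "this is the phrase I want";
        System.out.println(describe("phrase", candidate)); // find=true lookingAt=false matches=false
        System.out.println(describe("this.*", candidate)); // find=true lookingAt=true matches=true
    }
}
```

Using matches() in containsSubtring would answer a different question (does the regex describe the entire candidate?), which is why find() is the right call here.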


Q:  How do I confirm the existence of the nth occurrence of a substring?


A:  The solution here is similar to the one given previously, including the option of using the String.split method, and the same limitations apply. As far as the method-based solution is concerned, the only modifications you need to make to the method are the following.

First, adjust the method signature so that it accepts a third parameter, n, the number of occurrences to look for. The signature then looks like the following:

public static boolean containsSubtring(
  String candidate,
  String subStringPattern,
  int n
)

Second, add a loop that calls find() up to n times, stopping as soon as a call fails:

boolean retval = false;
//compile the pattern
Pattern pattern = Pattern.compile(subStringPattern);
//see if any part of the candidate contains the pattern
Matcher matcher = pattern.matcher(candidate);
//each find() resumes after the previous match; stop at the first miss
for (int i = 0; i < n; i++) {
  retval = matcher.find();
  if (!retval) break;
}
return retval;
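Assembled into one runnable class, the modified method might look like the sketch below. The wrapper class NthOccurrence and the sample inputs are hypothetical; the method body follows the two steps above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NthOccurrence {
    /**
     * Returns true if the regex matches at least n times in the candidate.
     * Successive find() calls resume after the previous match, so the
     * occurrences counted here never overlap.
     */
    static boolean containsSubtring(String candidate, String subStringPattern, int n) {
        boolean retval = false;
        Pattern pattern = Pattern.compile(subStringPattern);
        Matcher matcher = pattern.matcher(candidate);
        for (int i = 0; i < n; i++) {
            retval = matcher.find();
            if (!retval) {
                break; // fewer than n occurrences: stop early
            }
        }
        return retval;
    }

    public static void main(String[] args) {
        String candidate = "the cat sat on the mat";
        System.out.println(containsSubtring(candidate, "at", 3)); // true (cat, sat, mat)
        System.out.println(containsSubtring(candidate, "at", 4)); // false
        System.out.println(containsSubtring("aaa", "aa", 2));     // false (matches do not overlap)
    }
}
```

Two consequences of this design are worth keeping in mind: a value of n less than 1 returns false because the loop never runs, and overlapping occurrences such as the two "aa" inside "aaa" count only once.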


Q:  How do I swap out the $ in "I want to use a $ character" so that the resulting string reads "I want to use a \$ character"?


A:  For the candidate String

    String candidate = "I want to use a $ character";

the solution is the somewhat counterintuitive replaceAll call

String newString = candidate.replaceAll("\\$","\\\\\\$");
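Wrapped in a small class, the call can be run directly. DollarEscape and escapeDollar are hypothetical names; the two string arguments are exactly the ones above.

```java
class DollarEscape {
    static String escapeDollar(String candidate) {
        // First argument: the regex \$ (a literal dollar sign).
        // Second argument: the replacement \\\$ , which replaceAll turns into \$ .
        return candidate.replaceAll("\\$", "\\\\\\$");
    }

    public static void main(String[] args) {
        String candidate = "I want to use a $ character";
        System.out.println(escapeDollar(candidate)); // I want to use a \$ character
    }
}
```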

The first parameter, \\$, is clear enough. You want the dollar sign, which just happens to be a regex metacharacter: an anchor meaning end-of-line. Because you want the actual dollar sign character and not the end-of-line anchor, you have to escape the dollar sign, producing the pattern \$.

However, you also need to satisfy the Java compiler, which treats anything following a \ in a string literal as an escape sequence. Because \$ isn't a valid Java escape sequence (it's a regex escape), you need to tell the compiler to keep the \ as an ordinary character. Thus, you need to escape it once again, producing \\$.

This leads to the second argument, the replacement string \\\\\\$. Here, the first \ escapes the second \, the third \ escapes the fourth \, and the fifth \ escapes the sixth. Thus, the literal \\\\\\$ results in the four-character String \\\$.
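To see the compiler's layer on its own, the sketch below stores the two literals and prints what actually ends up in memory. The class LiteralLayers and its constant names are hypothetical.

```java
class LiteralLayers {
    // What remains after the compiler processes the string-literal escapes.
    static final String REGEX = "\\$";           // 2 characters: backslash, dollar
    static final String REPLACEMENT = "\\\\\\$"; // 4 characters: 3 backslashes, dollar

    public static void main(String[] args) {
        System.out.println(REGEX + " has length " + REGEX.length());             // \$ has length 2
        System.out.println(REPLACEMENT + " has length " + REPLACEMENT.length()); // \\\$ has length 4
    }
}
```

Only after this step does the regex engine (for the first argument) or the replacement processor (for the second) apply its own, separate escaping rules.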

Internally, the method has to rip out the $ part of "I want to use a $ character" and replace it with something, but what is that something? The method has decomposed the original String you gave it into two parts: a substring consisting of "I want to use a " and a second substring consisting of " character".

Normally, the Matcher.replaceAll method (which String.replaceAll delegates to) inserts whatever you give it between these two substrings, concatenates the result, and returns that. However, because what you gave it happens to contain the dollar sign, there is an added wrinkle.

As the Matcher.replaceAll description in this chapter shows, the dollar sign has special significance in a replacement string: it's used to refer to a group that has been captured by the pattern (for example, $1). The backslash is special there too, because it escapes the character that follows it. Because you don't want the dollar sign to have that significance, you need to escape it again. Hence, the replacement \\\$, in which the first \ escapes the second \, and the third \ escapes the $, thus logically producing \$.
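The sketch below shows both sides of that significance: a $n group reference doing its normal job, and a way to avoid hand-escaping altogether. Note that Pattern.quote and Matcher.quoteReplacement were added in Java 5, after the Java 1.4 release this chapter targets; the class and method names here are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplacementDollars {
    // In a replacement string, $n inserts capture group n of the current match.
    static String swapAroundHyphen(String text) {
        return text.replaceAll("(\\w+)-(\\w+)", "$2-$1");
    }

    // Java 5+ alternative: quote both arguments instead of escaping by hand.
    // Pattern.quote makes "$" a literal regex; Matcher.quoteReplacement
    // escapes the \ and $ in the replacement so they are inserted as-is.
    static String escapeDollarQuoted(String candidate) {
        return candidate.replaceAll(Pattern.quote("$"), Matcher.quoteReplacement("\\$"));
    }

    public static void main(String[] args) {
        System.out.println(swapAroundHyphen("left-right"));                     // right-left
        System.out.println(escapeDollarQuoted("I want to use a $ character")); // I want to use a \$ character
    }
}
```

The quoting helpers only remove the regex and replacement layers; the compiler layer still applies, which is why the argument to quoteReplacement is written as "\\$".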

