
Introduction to the Java.util.regex Object Model


If you have ever wanted to know all about the Pattern and Matcher classes of Java's new java.util.regex package, this article is an excellent place to start. It is taken from chapter 2 of the book Java Regular Expressions: Taming the java.util.regex Engine, written by Mehran Habibi (Apress, 2004; ISBN: 1590591070).

Author Info:
By: Apress Publishing
August 18, 2005
TABLE OF CONTENTS:
  1. · Introduction to the Java.util.regex Object Model
  2. · public static Pattern compile(String regex, int flags) Throws a PatternSyntaxException
  3. · public String[] split(CharSequence input)
  4. · The Matcher Object
  5. · public int start(int group)
  6. · public int end(int group)
  7. · public String group(int group)
  8. · public boolean find()
  9. · public Matcher appendReplacement (StringBuffer sb, String replacement)
  10. · Special Notes
  11. · New String Regex-Friendly Methods
  12. · Summary


Introduction to the Java.util.regex Object Model - New String Regex-Friendly Methods
(Page 11 of 12)

One of the most obvious changes brought about by the introduction of regular expressions into J2SE is the addition of five powerful new methods in the String class. In the following sections I discuss these changes and offer direction on how you can use them in your future coding adventures.

The Art of Delimiting Strings

There's one very important consideration that you have to keep in mind when you work with regular expressions and String objects: special characters, such as the digit, \d, and the word token, \w, to name just a couple, have to be delimited twice when written as a Java String literal. The Java compiler consumes one \ before the regex engine ever sees the String, so to search for a digit, you must double the number of \ characters you use. Thus, \d becomes \\d when you use it in a Java String object.

This doesn't sound overly complicated, but it can be surprisingly difficult to deal with at times. For example, imagine that you want to replace every occurrence of the character d in I want to use a d character with \d. That is, you want the new String to say I want to use a \d character. How do you start?

Of course, you could try this:

String retval = tmp.replaceAll("d","\d");

which fails to compile with an illegal escape character error. OK, so you double up the \ characters to achieve the following:

String retval = tmp.replaceAll("d","\\d");

This manages to compile, but it returns the bizarre result of no change at all. What's going on here?

Wait. Recall that the second argument of replaceAll isn't treated as plain text. After the Java compiler reduces "\\d" to \d, the regex engine reads a \ in a replacement String as a delimiter for the character that follows it, so \d simply means a literal d. Well, of course nothing appeared to change: every d was replaced with a d. Try adding another \ character to delimit the \, giving \\\d:

String retval = tmp.replaceAll("d","\\\d");

This again fails to compile with an illegal escape character error. This is getting frustrating. Didn't the material in this book say to add a \ character when trying to delimit special characters?

Well, actually, it didn't. The material in this book said to double the number of \ characters. Because there are currently two \ characters, doubling them would create \\\\d as the expression. It looks weird, but try it anyway:

String retval = tmp.replaceAll("d","\\\\d");

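Amazingly, it works! But why did it work? Because two rounds of delimiting take place. First, the Java compiler reads \\\\d and treats the first \ as a delimiter for the second \, and the third \ as a delimiter for the fourth \, which leaves the String \\d. Then replaceAll reads that replacement String and treats the first remaining \ as a delimiter for the second, which leaves a literal \ followed by d.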
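The walkthrough above can be sketched as a small runnable class. The class and method names here are hypothetical, chosen only for illustration. The second method uses Matcher.quoteReplacement, which was added in Java 5, after the J2SE 1.4 release this chapter describes, so treat it as an optional alternative rather than part of the original material.

```java
import java.util.regex.Matcher;

class BackslashReplacementDemo {

    // Four backslashes in source: the compiler reduces them to two,
    // and replaceAll reduces those two to one literal backslash.
    static String escapeByHand(String input) {
        return input.replaceAll("d", "\\\\d");
    }

    // Assumes Java 5 or later: quoteReplacement delimits \ and $ so the
    // replacement text \d is inserted literally.
    static String escapeWithQuoteReplacement(String input) {
        return input.replaceAll("d", Matcher.quoteReplacement("\\d"));
    }

    // Two backslashes in source: the replacement String is \d, which
    // replaceAll reads as a delimited, literal d. Nothing visibly changes.
    static String noVisibleChange(String input) {
        return input.replaceAll("d", "\\d");
    }

    public static void main(String[] args) {
        String tmp = "I want to use a d character";
        System.out.println(escapeByHand(tmp));               // I want to use a \d character
        System.out.println(escapeWithQuoteReplacement(tmp)); // I want to use a \d character
        System.out.println(noVisibleChange(tmp));            // I want to use a d character
    }
}
```

If the replacement text comes from user input or a file, quoting it this way avoids counting backslashes by hand.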

OK, that's all clear now. Try to swap out the $ in I want to use a $ character so that the resulting String reads I want to use a \$ character. See the FAQs section at the end of this chapter for the solution.

public boolean matches(String regex)

The String.matches method is probably the regex method you'll use most often. It compares the String it is called on to the given regex and returns true only if the regex matches the entire String. For example, for the String

String num = "4";

comparing 4 to \d, which represents a single digit, will return true:

num.matches("\\d"); // returns true

However, comparing "4 " (that is, 4 followed by a space) to a digit, \d, will return false. Similarly, comparing "4" (that is, 4 with no space after it) to "\d " (that is, a digit followed by a space) will also return false.

The point here is that, when you use this method, you have to be careful that the regular expression describes the entirety of the String and does not describe anything that is not a part of the String. Even a space, per the preceding example, can throw your match off-kilter.

Behind the scenes, this method makes a pass through to the Pattern.matches method discussed earlier, which compiles a new Pattern object on every call. If you're going to be doing a lot of matches operations, you'll probably find it more efficient to explicitly create Pattern and Matcher objects, and use them directly.

If the regular expression passed in is invalid, then this method will throw a PatternSyntaxException. If the regular expression is null, matches will throw a NullPointerException.
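As a minimal sketch of the whole-String rule and of reusing a compiled Pattern, consider the following. The class name MatchesDemo, the constant SINGLE_DIGIT, and the helper isSingleDigit are hypothetical names introduced for this example.

```java
import java.util.regex.Pattern;

class MatchesDemo {

    // Compiled once and reused, instead of recompiling on each String.matches call.
    private static final Pattern SINGLE_DIGIT = Pattern.compile("\\d");

    // True only when the entire candidate is exactly one digit.
    static boolean isSingleDigit(String candidate) {
        return SINGLE_DIGIT.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println("4".matches("\\d"));   // true: the regex describes the whole String
        System.out.println("4 ".matches("\\d"));  // false: the trailing space is not described
        System.out.println("4".matches("\\d "));  // false: the regex demands a space the String lacks
        System.out.println(isSingleDigit("4"));   // true
    }
}
```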

public String replaceFirst(String regex, String replacement)

The String.replaceFirst method replaces the first substring that matches the regex with the String given as the second parameter of this method. Thus, for the String tmp

String tmp = "I want to eat 5 hamburgers, 7 days a week";

the command

String newTmp = tmp.replaceFirst("\\d","900");

sets newTmp to I want to eat 900 hamburgers, 7 days a week.

Behind the scenes, this method instantiates Pattern and Matcher objects, and simply makes a pass through to the Matcher.replaceFirst method discussed earlier. If you're going to be doing a lot of replaceFirst operations, you'll probably find it more efficient to explicitly create Pattern and Matcher objects, and use them directly.

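NOTE  If you explicitly create Pattern and Matcher objects and use them directly, you may want to tighten your patterns by putting in end-of-line $ and beginning-of-line ^ characters where appropriate. Keep in mind that these anchors change what the pattern matches, so add them only when the match really must sit at the start or end of the input.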

If the regular expression passed in is invalid, then this method will throw a PatternSyntaxException. If the regular expression is null, replaceFirst will throw a NullPointerException.

public String replaceAll (String regex, String replacement)

The String.replaceAll method replaces every substring that matches the regex with the String given as the second parameter of this method. Thus, for the String tmp

String tmp = "I want to eat 5 hamburgers, 7 days a week";

the command

String newTmp = tmp.replaceAll("\\d","900");

sets newTmp to I want to eat 900 hamburgers, 900 days a week.

Behind the scenes, this method instantiates Pattern and Matcher objects, and simply makes a pass through to the Matcher.replaceAll method discussed earlier. If you're going to be doing a lot of replaceAll operations, you'll probably find it more efficient to explicitly create Pattern and Matcher objects, and use them directly.

If the regular expression passed in is invalid, then this method will throw a PatternSyntaxException. If the regular expression is null, replaceAll will throw a NullPointerException.
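The advice about creating Pattern and Matcher objects explicitly might look like the sketch below for both replaceFirst and replaceAll. The class name ReplaceDemo, the constant DIGIT, and the two helper methods are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

class ReplaceDemo {

    // One compiled Pattern shared by every call, rather than one per String method call.
    private static final Pattern DIGIT = Pattern.compile("\\d");

    // Same result as input.replaceFirst("\\d", replacement).
    static String replaceFirstDigit(String input, String replacement) {
        return DIGIT.matcher(input).replaceFirst(replacement);
    }

    // Same result as input.replaceAll("\\d", replacement).
    static String replaceEveryDigit(String input, String replacement) {
        return DIGIT.matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String tmp = "I want to eat 5 hamburgers, 7 days a week";
        System.out.println(replaceFirstDigit(tmp, "900")); // I want to eat 900 hamburgers, 7 days a week
        System.out.println(replaceEveryDigit(tmp, "900")); // I want to eat 900 hamburgers, 900 days a week
    }
}
```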

public String[] split(String regex)

This method can be particularly helpful if you need to break up a String into an array of substrings based on some criteria; in concept, it's similar to the StringTokenizer. However, it's much more powerful, and more resource intensive, than StringTokenizer because it allows your program to use a regular expression as the splitting criteria.

If the regex can't be found anywhere in the String, then a String array is returned that contains exactly one String, namely the original String. If the regex can be found, then the returned array contains each substring that is terminated by a match of the regex or by the end of the String.

Thus, calling the split(",") method on the String Hello,Dolly will return a String array consisting of two elements. The first element of the array will contain Hello, and the second will contain Dolly.

There are some subtleties you should be aware of when working with this method. If the String had been Hello,Dolly, (with a trailing comma character after the y in Dolly), then this method would still have returned a two-element String array consisting of Hello and Dolly. The implicit behavior is that trailing empty strings aren't returned. One consequence is that a String made up only of delimiters, such as a lone comma, splits into an array with no elements at all.

If the String had been Hello,,,Dolly, then the resulting String array would have had four elements, because empty strings between delimiters are kept. The return value of the split method, as applied to the pattern , (a single comma), is as follows:

// "Hello,,,Dolly".split(",") is equal to {"Hello","","","Dolly"}

Behind the scenes, this method instantiates a Pattern object and simply makes a pass through to the Pattern.split method discussed earlier. If you're going to be doing a lot of split operations, you'll probably find it more efficient to explicitly create Pattern objects and use them directly.

If the regular expression passed in is invalid, then this method will throw a PatternSyntaxException. If the regular expression is null, split will throw a NullPointerException.
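The split cases discussed in this section can be checked with a short sketch. The class name SplitDemo, the constant COMMA, and the helper splitOnComma are hypothetical; the helper uses a precompiled Pattern, as the text suggests for repeated splitting.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitDemo {

    // Compiled once; Pattern.split(input) behaves like input.split(",").
    private static final Pattern COMMA = Pattern.compile(",");

    static String[] splitOnComma(String input) {
        return COMMA.split(input);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnComma("HelloDolly")));    // [HelloDolly]: no match, original returned
        System.out.println(Arrays.toString(splitOnComma("Hello,Dolly")));   // [Hello, Dolly]
        System.out.println(Arrays.toString(splitOnComma("Hello,Dolly,")));  // [Hello, Dolly]: trailing empty string dropped
        System.out.println(splitOnComma("Hello,,,Dolly").length);           // 4: inner empty strings are kept
        System.out.println(splitOnComma(",").length);                       // 0: only trailing empty strings remain
    }
}
```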

public String[] split(String regex, int limit)

This method returns an array containing substrings of the String object it was called on. Those substrings are the text surrounding the regex expression described by the first parameter, regex. The actual number of elements in the array is controlled by the second parameter, limit. The following sections explain what the different values of limit can mean.

Limit == 0

If you specify that the second parameter, limit, should equal 0, then this method returns an array containing as many substrings as possible, and trailing empty strings are discarded. Thus, the call

String[] tmp = "Hello, Dolly".split(",",0);

will return an array consisting of two elements when split against the candidate Hello, Dolly. Note that the second element keeps the space that follows the comma.

Similarly, split will return two elements when matched against Hello, Dolly, which has a trailing comma after the y in Dolly:

String tmp[] = "Hello, Dolly,".split(",",0);

However, you may not always want this behavior. For example, there may be times when you want to limit the number of elements returned.

Limit > 0

Use a positive limit if you're interested in only a set number of matches. You should use that number plus 1 as the limit. To split Hello, Dolly, You, Are, My, Favorite when you want only the first two tokens, you would use this:

String[] tmp = "Hello, Dolly, You, Are, My, Favorite".split(",",3);

The first two elements of the resulting array are as follows:

// tmp[0] is "Hello"
// tmp[1] is " Dolly" (with its leading space)

The interesting behavior here is that a third element is returned, holding the unsplit remainder of the String:

// tmp[2] is " You, Are, My, Favorite"

Using a positive limit can potentially lead to performance enhancements, because the regex engine can stop searching when it meets the specified number of matches.

Limit < 0

Using a negative number, any negative number, for the limit tells the regex engine that you want to return as many substrings as possible and that you want trailing empty strings, if any, to be returned. Thus, for the regex pattern , (a comma) and the candidate Hello,Dolly, the command

String tmp[] = "Hello,Dolly".split(",", -1);

results in

//tmp == {"Hello","Dolly"};

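However, for the String Hello,Dolly,   (which has three trailing spaces after the comma following Dolly), the method call

String tmp[] = "Hello,Dolly,   ".split(",", -1);

results in

//tmp is equal to {"Hello","Dolly","   "};

Likewise, when nothing at all follows the final comma, as in Hello,Dolly, with a trailing comma, a negative limit returns a third, empty element ("") that a limit of 0 would discard.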

Notice that the actual value of the negative limit doesn't matter. Thus, for a Pattern object p compiled from the regex , (a comma),

p.split("Hello,Dolly", -1);

is exactly equivalent to

p.split("Hello,Dolly", -100);

Behind the scenes, this method instantiates a Pattern object and simply makes a pass through to the Pattern.split method discussed earlier. If you're going to be doing a lot of split operations, you'll probably find it more efficient to explicitly create the Pattern object and use it directly.

If the regular expression passed in is invalid, then this method will throw a PatternSyntaxException. If the regular expression is null, split will throw a NullPointerException.
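To compare the three kinds of limit side by side, here is a minimal sketch. The class name SplitLimitDemo and the helper splitWithLimit are hypothetical, and the candidate Strings are written without spaces after the commas so that the elements are easy to read.

```java
import java.util.Arrays;

class SplitLimitDemo {

    // Thin wrapper around String.split(regex, limit) using a comma as the regex.
    static String[] splitWithLimit(String input, int limit) {
        return input.split(",", limit);
    }

    public static void main(String[] args) {
        // limit == 0: as many substrings as possible, trailing empty strings discarded
        System.out.println(splitWithLimit("Hello,Dolly,", 0).length);   // 2

        // limit > 0: at most limit elements; the last one holds the unsplit remainder
        System.out.println(Arrays.toString(
                splitWithLimit("Hello,Dolly,You,Are,My,Favorite", 3))); // [Hello, Dolly, You,Are,My,Favorite]

        // limit < 0: as many substrings as possible, trailing empty strings kept
        System.out.println(splitWithLimit("Hello,Dolly,", -1).length);  // 3

        // Any negative value behaves the same way
        System.out.println(Arrays.equals(
                splitWithLimit("Hello,Dolly,", -1),
                splitWithLimit("Hello,Dolly,", -100)));                 // true
    }
}
```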

