
Introduction to the Java.util.regex Object Model


If you have ever wanted to know all about the Pattern and Matcher classes of Java's new java.util.regex package, this article is an excellent place to start. It is taken from chapter 2 of the book Java Regular Expressions: Taming the java.util.regex Engine, written by Mehran Habibi (Apress, 2004; ISBN: 1590591070).

Author Info:
By: Apress Publishing
August 18, 2005
TABLE OF CONTENTS:
  1. · Introduction to the Java.util.regex Object Model
  2. · public static Pattern compile(String regex, int flags) Throws a PatternSyntaxException
  3. · public String[] split(CharSequence input)
  4. · The Matcher Object
  5. · public int start(int group)
  6. · public int end(int group)
  7. · public String group(int group)
  8. · public boolean find()
  9. · public Matcher appendReplacement (StringBuffer sb, String replacement)
  10. · Special Notes
  11. · New String Regex-Friendly Methods
  12. · Summary


Introduction to the Java.util.regex Object Model - public String[] split(CharSequence input)
(Page 3 of 12 )

This method is particularly helpful when you need to break a String into an array of substrings based on some criteria. In concept, it's similar to StringTokenizer, but it's much more powerful, and more resource intensive, because it allows your program to use regular expressions as the splitting criteria.

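This method returns at least one element, except when the input consists entirely of delimiters, in which case every substring is empty and is discarded. If the pattern can't be found in the split candidate, input, then a String array is returned that contains exactly one String, namely the original input.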

If the pattern can be found, then the returned String array contains the substrings of the input that lie around the matches; the matches themselves aren't included. Thus, for the pattern

Pattern p = Pattern.compile(",");

the split method for Hello,Dolly will return a String array consisting of two elements. The first element of the array will contain Hello, and the second will contain Dolly. That String array is obtained as follows:

String tmp[] = p.split("Hello,Dolly");

In this case, the value returned is

//tmp is equal to { "Hello", "Dolly"}

You should be aware of some subtleties when you work with this method. If the candidate String had been Hello,Dolly, with a trailing comma character after the y in Dolly, then this method would still have returned a two-element String array consisting of Hello and Dolly. The implicit behavior is that trailing empty strings aren't returned.

If the input String had been Hello,,,Dolly, the resulting String array would have had four elements, because empty strings between adjacent delimiters are kept. The return value of the split method for that input is

// p.split("Hello,,,Dolly") returns {"Hello","","","Dolly"}
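
The three cases described above can be checked side by side. The following is a minimal sketch rather than one of the book's listings; the class name SplitBasics and the helper splitOnComma are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; not one of the book's listings.
public class SplitBasics {
    private static final Pattern COMMA = Pattern.compile(",");

    // Splits the input around each comma using Pattern.split(CharSequence).
    public static String[] splitOnComma(String input) {
        return COMMA.split(input);
    }

    public static void main(String[] args) {
        // No comma present: the array holds only the original input.
        System.out.println(Arrays.toString(splitOnComma("HelloDolly")));
        // A trailing comma produces a trailing empty string, which is dropped.
        System.out.println(Arrays.toString(splitOnComma("Hello,Dolly,")));
        // Empty strings between adjacent commas are kept.
        System.out.println(Arrays.toString(splitOnComma("Hello,,,Dolly")));
    }
}
```

Printing with Arrays.toString makes the empty elements visible as gaps between the separators, which is easier to read than printing each token on its own line.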

Listing 2-4 provides an example in which the split method is used to split a String into an array based on a single space character.

Listing 2-4. Pattern Splitting Example

import java.util.regex.*;

public class PatternSplitExample {
  public static void main(String args[]) {
    splitTest();
  }

  public static void splitTest() {
    // Split on a single space character
    Pattern p = Pattern.compile(" ");
    String tmp = "this is the String I want to split up";

    String[] tokens = p.split(tmp);

    // Print each token on its own line
    for (int i = 0; i < tokens.length; i++) {
      System.out.println(tokens[i]);
    }
  }
}

Of course, this is a misuse of the method: You could have used a StringTokenizer to achieve the same result, and it would have been less resource intensive. In light of what you now know, consider Listing 2-5, which is a slightly modified version of Listing 1-12 from Chapter 1 in that it uses the Pattern.split method. Output 2-1 shows the result of running the program.

Listing 2-5. PatternSplit.java

import java.util.regex.*;

public class PatternSplit {
  public static void main(String args[]) {
    String statement = "I will not compromise. I will not "+
      "cooperate. There will be no concession, no conciliation, no "+
      "finding the middle ground, and no give and take.";

    String tokens[] = null;

    String splitPattern = "compromise|cooperate|concession|"+
      "conciliation|(finding the middle ground)|(give and take)";
    Pattern p = Pattern.compile(splitPattern);
    tokens = p.split(statement);

    System.out.println("REGEX PATTERN:\n" + splitPattern + "\n");
    System.out.println("STATEMENT:\n" + statement + "\n");

    System.out.println("TOKENS:");
    for (int i = 0; i < tokens.length; i++) {
      System.out.println(tokens[i]);
    }
  }
}

Output 2-1. Result of Running PatternSplit.java

-------------------------------------------------------------------
C:\RegEx\code\chapter1>java Split
REGEX PATTERN:
compromise|cooperate|concession|conciliation|(finding the middle ground)|(give
and take)
STATEMENT:
I will not compromise. I will not cooperate. There
will be no concession, no conciliation,
no finding the middle ground, and no give and take.
TOKENS:
I will not
. I will not
. There will be no
, no
, no
, and no
.

You'll notice that Listing 2-5 uses the Pattern.split method, whereas Listing 1-12 uses the new String.split method. In effect, the two are identical because the String.split method simply defers to this method internally.
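
The equivalence can be checked directly. The sketch below is an illustration only: the class name SplitEquivalence and its two helper methods are hypothetical, and the regex and input are shortened from Listing 2-5.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; regex and input are shortened from Listing 2-5.
public class SplitEquivalence {
    public static final String REGEX = "compromise|cooperate";
    public static final String INPUT =
        "I will not compromise. I will not cooperate.";

    // Splits through an explicitly compiled Pattern.
    public static String[] viaPattern() {
        return Pattern.compile(REGEX).split(INPUT);
    }

    // Splits through the String convenience method.
    public static String[] viaString() {
        return INPUT.split(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(viaPattern()));
        System.out.println(Arrays.toString(viaString()));
        System.out.println("Same tokens: "
            + Arrays.equals(viaPattern(), viaString()));
    }
}
```

When the same regex is applied to many inputs, compiling the Pattern once and reusing it is generally the better choice, since String.split has to work from the regex String on every call.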

What you've done is really quite amazing and might have been ridiculously convoluted without regular expressions. You're actually using complicated artificial constructs, namely English language synonyms, to decompose text. This isn't your father's J2SE.

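NOTE  The String.split method doesn't add an invisible ^ before the pattern or a $ after it. Implicit whole-input matching applies to the String.matches and Pattern.matches methods, which succeed only if the entire String matches the pattern.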

public String[] split(CharSequence input, int limit)

This method works in exactly the same way that Pattern.split(CharSequence input) does, except for one variation. The second parameter, limit, allows you to control how many elements are returned, as shown in the following sections.

Limit == 0

If you specify that the second parameter, limit, should equal 0, then this method behaves exactly like its overloaded counterpart. That is, it returns an array containing as many substrings as possible, and trailing empty strings are discarded. Thus, the pattern

Pattern p = Pattern.compile(",");

will return an array consisting of two elements when split against the candidate Hello,Dolly. An example of the usage of the method follows:

String tmp[] = p.split("Hello,Dolly", 0);

Similarly, split will return two elements when matched against the String Hello,Dolly, with a trailing comma character after the y in Dolly, because the trailing empty string is discarded:

String tmp[] = p.split("Hello,Dolly,", 0);

However, you may not always want this behavior. For example, there may be a time when you want to limit the number of elements returned.

Limit > 0

Use a positive limit if you're interested in only a certain number of tokens. You should use that number + 1 as the limit, because the last element of the array holds the unsplit remainder of the input. To split the String Hello,Dolly,You,Are,My,Favorite when you want only the first two tokens, use this:

String[] tmp = p.split("Hello,Dolly,You,Are,My,Favorite", 3);

The first two elements of the resulting String array are as follows:

//tmp[0] is "Hello",
//tmp[1] is "Dolly";

The interesting behavior here is that a third element is returned. In this case, the third element is

//tmp[2] is "You,Are,My,Favorite";

Using a positive limit can potentially lead to performance enhancements, because the regex engine can stop searching when it meets the specified number of matches.
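
To see how the array grows with the limit, the following sketch splits one input with several positive limits. It is an illustration under assumed names: the class SplitPositiveLimit, its split helper, and the sample input a,b,c,d are all hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class showing Pattern.split with positive limits.
public class SplitPositiveLimit {
    private static final Pattern COMMA = Pattern.compile(",");

    // Delegates to Pattern.split(CharSequence, int).
    public static String[] split(String input, int limit) {
        return COMMA.split(input, limit);
    }

    public static void main(String[] args) {
        String input = "a,b,c,d";
        // The pattern is applied at most limit - 1 times, so the array never
        // has more than limit elements; the last one holds the remainder.
        for (int limit = 1; limit <= 5; limit++) {
            System.out.println(limit + " -> "
                + Arrays.toString(split(input, limit)));
        }
    }
}
```

A limit of 1 returns the whole input as a single element, and a limit larger than the number of available tokens simply returns all of them.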

Limit < 0

Using a negative number (any negative number) for the limit tells the regex engine that you want to return as many substrings as possible and that you want trailing empty strings, if any, to be returned. Thus, for the regex pattern

Pattern p = Pattern.compile(",");

and the candidate String Hello,Dolly, the command

String tmp[] = p.split("Hello,Dolly", -1);

results in the following condition:

//tmp is equal to {"Hello","Dolly"};

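However, for the String Hello,Dolly, with a trailing comma after Dolly, the method call

String tmp[] = p.split("Hello,Dolly,", -1);

results in this:

//tmp is equal to {"Hello","Dolly",""};

Note that whitespace after the final comma, as in "Hello,Dolly,    ", forms a nonempty token of its own and is returned even when the limit is 0.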

Notice that the actual value of the negative limit doesn't matter. Thus,

p.split("Hello,Dolly", -1);

is exactly equivalent to this:

p.split("Hello,Dolly", -100);
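
The difference between a zero limit and a negative limit shows up only when the input ends with delimiters. The following sketch compares the two; the class name SplitNegativeLimit, its split helper, and the sample input are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class comparing zero and negative limits.
public class SplitNegativeLimit {
    private static final Pattern COMMA = Pattern.compile(",");

    // Delegates to Pattern.split(CharSequence, int).
    public static String[] split(String input, int limit) {
        return COMMA.split(input, limit);
    }

    public static void main(String[] args) {
        String input = "Hello,Dolly,,";
        // limit == 0: trailing empty strings are discarded.
        System.out.println(split(input, 0).length
            + " " + Arrays.toString(split(input, 0)));
        // limit < 0: trailing empty strings are kept.
        System.out.println(split(input, -1).length
            + " " + Arrays.toString(split(input, -1)));
        // Any other negative value behaves the same way as -1.
        System.out.println(
            Arrays.equals(split(input, -1), split(input, -100)));
    }
}
```

Printing the array length alongside the contents helps here, because trailing empty strings are hard to spot in the printed array alone.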

