
Regular Expressions


Regular expressions are a mechanism for telling the Java Virtual Machine (JVM) how to find and manipulate text for you. Using regular expressions to do this is different from the traditional approach. This article compares the two approaches. It is excerpted from Java Regular Expressions: Taming the java.util.regex Engine, written by Mehran Habibi (Apress, 2004; ISBN: 1590591070).

Author Info:
By: Apress Publishing
July 28, 2005
TABLE OF CONTENTS:
  1. Regular Expressions
  2. Creating Patterns
  3. Common and Boundary Characters
  4. Character Classes
  5. Back References
  6. Integrating Java with Regular Expressions
  7. Confirming Name Formats Example
  8. Finding Duplicate Words Example
  9. Regular Expression Operations
  10. Search and Replace
  11. Comparing Regex and Perl


Regular Expressions - Integrating Java with Regular Expressions

Thus far, you've worked almost exclusively with regular expressions, but not really with Java. Now it's time to consider how the two interact. The following examples differ from the preceding ones in that they incorporate Java code with regular expressions. They offer a more complete picture of how you can use some J2SE regex syntax.

Some of the regular expressions you'll see here are slightly more advanced than in the examples you've seen previously, as they build on the fundamentals discussed thus far in the chapter. For example, Listing 1-2 combines groups with quantifiers.

Don't be discouraged if the patterns themselves aren't completely clear to you right now. An intuitive understanding will develop as you continue to read this book. Focus on the concepts and become comfortable with how the Java code and the regex complement each other.

There are only three pieces of information you need to take full advantage of the following examples:

  • Any \-delimited regex metacharacter needs to be delimited once again when it's used in Java code. Thus, \d becomes \\d and \s becomes \\s in your Java code. Correspondingly, a more complex expression such as (\d-)?(\d{3}-)?\d{3}-\d{4}\s becomes (\\d-)?(\\d{3}-)?\\d{3}-\\d{4}\\s in Java code. All \ characters are doubled to produce \\ when they're used in a String literal.

  • In this book, when I talk about a regular expression in and of itself, I don't use the double delimiting mechanism. However, I do when working with specific coding examples.

  • The String.matches(String regex) method was added to the String class in J2SE 1.4. It compares the String it's called on to the given regular expression, regex, and returns true if the regex pattern matches the String exactly. To match exactly means that the String in question can't contain any characters (not even invisible characters such as newlines and spaces) that aren't accounted for in the regex pattern.
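As a minimal sketch of the points above, the following class shows the doubled backslashes in a Java String literal and the exact-match behavior of String.matches. The class name EscapingDemo, the constant name, and the helper method matchesExactly are hypothetical, chosen only for illustration.

```java
// Hypothetical demo class; the names are illustrative and not from the book.
final class EscapingDemo {
    // The regex \d{3}\s is written with doubled backslashes in Java source.
    static final String THREE_DIGITS_THEN_SPACE = "\\d{3}\\s";

    // String.matches succeeds only if the whole String is accounted for.
    static boolean matchesExactly(String input) {
        return input.matches(THREE_DIGITS_THEN_SPACE);
    }

    public static void main(String[] args) {
        // Prints the regex as the engine sees it: \d{3}\s
        System.out.println(THREE_DIGITS_THEN_SPACE);
        // Three digits and one whitespace character: true
        System.out.println(matchesExactly("123 "));
        // The trailing whitespace is missing: false
        System.out.println(matchesExactly("123"));
        // An extra newline is not accounted for by the pattern: false
        System.out.println(matchesExactly("123 \n"));
    }
}
```

The last call is the one to note: the pattern does match the first four characters, but because one character is left over, String.matches reports false.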

Confirming Phone Number Formats Example

The code in Listing 1-2 determines whether the given phone number is well formatted. It takes advantage of two metacharacters introduced in Table 1-6. Specifically, it uses the repetition quantifier {n}, the exact-count form of the range {n,m}. Where {n,m} indicates that the previous character or class must be repeated at least n times and no more than m times, {n} requires exactly n repetitions. It also uses the ? character, indicating that the previous character, class, or group must be present zero or one time.

The pattern as a whole checks for seven digits preceded by optional country and area codes. Output 1-2 shows the result of running the program, and Table 1-19 dissects the pattern.

Listing 1-2. MatchPhoneNumber.java

import java.util.regex.*;

public class MatchPhoneNumber{
  public static void main(String args[]){
    isPhoneValid(args[0]);
  }

  /**
   * Confirms that the format for the given phone number is valid.
   * @param phone is a String representing the phone number.
   * @return true if the phone number format is acceptable.
   */
  public static boolean isPhoneValid(String phone){
    boolean retval = false;

    String phoneNumberPattern =
        "(\\d-)?(\\d{3}-)?\\d{3}-\\d{4}";

    //true only if the whole String fits the pattern
    retval = phone.matches(phoneNumberPattern);

    //prepare a message indicating success or failure
    String msg = "   NO MATCH: pattern:" + phone
        + "\r\n             regex: " + phoneNumberPattern;

    if (retval){
      msg = " MATCH    : pattern:" + phone
          + "\r\n            regex: " + phoneNumberPattern;
    }

    System.out.println(msg + "\r\n");
    return retval;
  }
}

Output 1-2. Result of Running MatchPhoneNumber.java

------------------------------------------------------------------
C:\RegEx\Examples\chapter1>java MatchPhoneNumber "1-999-111-2222"
  MATCH   : pattern:1-999-111-2222
            regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

C:\RegEx\Examples\chapter1>java MatchPhoneNumber "999-111-2222"
  MATCH   : pattern:999-111-2222
            regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

C:\RegEx\Examples\chapter1>java MatchPhoneNumber "1-111-2222"
  MATCH   : pattern:1-111-2222
            regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

C:\RegEx\Examples\chapter1>java MatchPhoneNumber "111-2222"
  MATCH   : pattern:111-2222
            regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

C:\RegEx\Examples\chapter1>java MatchPhoneNumber "1.999-111-2222"
  NO MATCH: pattern:1.999-111-2222
            regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

C:\RegEx\Examples\chapter1>java MatchPhoneNumber "999 111-2222"
  NO MATCH: pattern:999 111-2222
            regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

C:\RegEx\Examples\chapter1>java MatchPhoneNumber "1 111 2222"
  NO MATCH: pattern:1 111 2222
            regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

C:\RegEx\Examples\chapter1>java MatchPhoneNumber "111-JAVA"
  NO MATCH: pattern:111-JAVA
            regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

----------------------------------------------------------

Table 1-19. The Pattern (\d-)?(\d{3}-)?\d{3}-\d{4}

Regex   Description
(       A group consisting of
\d      A digit
-       Followed by a hyphen (-)
)       The end of this group
?       Look for zero or one of the preceding
(       Followed by a group consisting of
\d      A digit
{       Repeated exactly
3       Three times
}       End repetition
-       Followed by a hyphen
)       The end of this group
?       Look for zero or one of the preceding
\d      Followed by a digit
{       Repeated exactly
3       Three times
}       End repetition
-       Followed by a hyphen
\d      Followed by a digit
{       Repeated exactly
4       Four times
}       End repetition

* In English: Look for a single digit followed by a hyphen. This is optional. Then, look for three digits followed by a hyphen. This is also optional. Next, look for three digits, followed by a hyphen, followed by four digits.
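String.matches compiles its regex argument each time it is called, so code that validates many phone numbers may prefer to compile the pattern once and reuse it. The following is a sketch of that variation, not the book's listing; the class name PrecompiledPhoneMatcher and the field name PHONE are hypothetical. It relies only on Pattern.compile and Matcher.matches from java.util.regex.

```java
import java.util.regex.Pattern;

// Hypothetical variant of Listing 1-2; the class and field names are illustrative.
final class PrecompiledPhoneMatcher {
    // Same pattern as Listing 1-2, compiled once and reused for every call.
    private static final Pattern PHONE =
        Pattern.compile("(\\d-)?(\\d{3}-)?\\d{3}-\\d{4}");

    static boolean isPhoneValid(String phone) {
        // Matcher.matches() requires the entire input to match,
        // which is the same rule String.matches() applies.
        return PHONE.matcher(phone).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "1-999-111-2222", "111-2222", "999 111-2222", "111-JAVA"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isPhoneValid(sample));
        }
    }
}
```

For the inputs shown in Output 1-2, this method should return the same true or false result as the String.matches version.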

Confirming Zip Codes Example

The code in Listing 1-3 determines if the zip code meets the criterion of being well formatted. It checks for five digits optionally followed by a hyphen and four digits. Output 1-3 shows the result of running the program. Table 1-20 dissects the pattern.

Listing 1-3. MatchZipCodes.java

import java.util.regex.*;
import java.io.*;

public class MatchZipCodes{
  public static void main(String args[]){
    isZipValid(args[0]);
  }

  /**
   * Confirms that the format for the given zip code is valid.
   * @param zip is a String representing the zip code.
   * @return true if the zip code format is acceptable.
   */
  public static boolean isZipValid(String zip){
    boolean retval = false;

    //the pattern must be a quoted String literal
    String zipCodePattern = "\\d{5}(-\\d{4})?";
    retval = zip.matches(zipCodePattern);

    //prepare a message indicating success or failure
    String msg = "   NO MATCH: pattern:" + zip
        + "\r\n              regex: " + zipCodePattern;
    if (retval){
      msg = "   MATCH  : pattern:" + zip
          + "\r\n            regex: " + zipCodePattern;
    }
    System.out.println(msg + "\r\n");
    return retval;
  }
}
 

Output 1-3. Result of Running MatchZipCodes.java

------------------------------------------------------------------
C:\RegEx\Examples\chapter1>java MatchZipCodes "45643-4443"
  MATCH   : pattern:45643-4443
            regex: \d{5}(-\d{4})?

C:\RegEx\Examples\chapter1>java MatchZipCodes "45643" 
  MATCH   : pattern:45643
            regex: \d{5}(-\d{4})?

C:\RegEx\Examples\chapter1>java MatchZipCodes "443"
  NO MATCH: pattern:443
            regex: \d{5}(-\d{4})?

C:\RegEx\Examples\chapter1>java MatchZipCodes "45643-44435"
  NO MATCH: pattern:45643-44435
            regex: \d{5}(-\d{4})?

C:\RegEx\Examples\chapter1>java MatchZipCodes "45643 44435"
  NO MATCH: pattern:45643 44435
            regex: \d{5}(-\d{4})?

Table 1-20. The Pattern \d{5}(-\d{4})?

Regex   Description
\d      A digit
{       Repeated exactly
5       Five times
}       End repetition
(       Open group
-       Consisting of a hyphen
\d      A digit
{       Repeated exactly
4       Four times
}       End repetition
)       The end of this group
?       Look for zero or one of the preceding

* In English: Look for five digits, optionally followed by a hyphen and four digits.
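Because the optional part of this pattern is a group, the same regex can also report whether the four-digit extension was present. The sketch below assumes you want that suffix back as a String; the class name ZipParts and the method plusFourSuffix are hypothetical. It uses Matcher.group(1), which returns null when an optional group did not take part in the match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper built on the pattern from Listing 1-3.
final class ZipParts {
    private static final Pattern ZIP = Pattern.compile("\\d{5}(-\\d{4})?");

    // Returns the optional "-dddd" suffix, an empty String if the zip code
    // has no suffix, or null if the input is not a well-formatted zip code.
    static String plusFourSuffix(String zip) {
        Matcher m = ZIP.matcher(zip);
        if (!m.matches()) {
            return null;
        }
        // group(1) is null when the optional group matched nothing.
        String suffix = m.group(1);
        return suffix == null ? "" : suffix;
    }

    public static void main(String[] args) {
        System.out.println(plusFourSuffix("45643-4443")); // the suffix, hyphen included
        System.out.println(plusFourSuffix("45643"));      // empty String
        System.out.println(plusFourSuffix("443"));        // null: not a zip code
    }
}
```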

Confirming Dates Example

The code in Listing 1-4 checks the format of a given date. It confirms that the given date consists of one or two digits followed by a hyphen, followed by one or two digits, followed by a hyphen, followed by four digits. The pattern checks the format only: as Output 1-4 shows, 15-42-1999 matches even though it is not a real calendar date. Output 1-4 shows the result of running the program. Table 1-21 dissects the pattern.

Listing 1-4. MatchDates.java

import java.util.regex.*;
import java.io.*;

public class MatchDates{
  public static void main(String args[]){
    isDateValid(args[0]);
  }

  /**
   * Confirms that given date format consists of one or two digits
   * followed by a hyphen, followed by one or two digits, followed
   * by a hyphen, followed by four digits
   * @param date is a String representing the date.
   * @return true if date format is acceptable.
   */
  public static boolean isDateValid(String date){
    boolean retval = false;

    //the pattern must be a quoted String literal
    String datePattern = "\\d{1,2}-\\d{1,2}-\\d{4}";
    retval = date.matches(datePattern);

    //prepare a message indicating success or failure
    String msg = "   NO MATCH: pattern:" + date
        + "\r\n             regexLength: " + datePattern;
    if (retval){
      msg = "   MATCH  : pattern:" + date
          + "\r\n            regexLength: " + datePattern;
    }
    System.out.println(msg + "\r\n");
    return retval;
  }
}

Output 1-4. Result of Running MatchDates.java

------------------------------------------------------------------
C:\RegEx\Examples\chapter1>java MatchDates "04-02-1999" 
  MATCH   : pattern:04-02-1999
            regexLength: \d{1,2}-\d{1,2}-\d{4}

C:\RegEx\Examples\chapter1>java MatchDates "15-42-1999" 
  MATCH   : pattern:15-42-1999
            regexLength: \d{1,2}-\d{1,2}-\d{4}

C:\RegEx\Examples\chapter1>java MatchDates "April fourth nineteen ninety nine"
  NO MATCH: pattern:April fourth nineteen ninety nine
            regexLength: \d{1,2}-\d{1,2}-\d{4}

C:\RegEx\Examples\chapter1>java MatchDates "15-42-20002"
  NO MATCH: pattern:15-42-20002
            regexLength: \d{1,2}-\d{1,2}-\d{4}

C:\RegEx\Examples\chapter1>java MatchDates "02-02-20002"
  NO MATCH: pattern:02-02-20002
            regexLength: \d{1,2}-\d{1,2}-\d{4}

C:\RegEx\Examples\chapter1>java MatchDates "04-02-02"
  NO MATCH: pattern:04-02-02
            regexLength: \d{1,2}-\d{1,2}-\d{4}

C:\RegEx\Examples\chapter1>java MatchDates "04-02-garbage"
  NO MATCH: pattern:04-02-garbage
            regexLength: \d{1,2}-\d{1,2}-\d{4}

----------------------------------------------------------

Table 1-21. The Pattern \d{1,2}-\d{1,2}-\d{4}  

Regex   Description
\d      A digit
{       Repeated at least
1       One time
,       But no more than
2       Two times
}       Close repetition
-       Followed by a hyphen
\d      Followed by a digit
{       Repeated at least
1       One time
,       But no more than
2       Two times
}       Close repetition
-       Followed by a hyphen
\d      Followed by a digit
{       Repeated exactly
4       Four times
}       Close repetition

* In English: Look for one or two digits, followed by a hyphen, followed by one or two digits, followed by a hyphen, followed by four digits.
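The pattern accepts any string with the right shape, so a value such as 15-42-1999 passes. One way to go further, sketched here under stated assumptions, is to add capturing groups around each field and hand the numbers to java.time.LocalDate, which rejects out-of-range values. This is not the book's approach: the class name StrictDateCheck and the method isRealDate are hypothetical, the month-day-year field order is an assumption the original text does not state, and java.time requires Java 8 or later.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical extension of Listing 1-4; the names are illustrative.
final class StrictDateCheck {
    // Listing 1-4's pattern with a capturing group added around each field.
    private static final Pattern DATE =
        Pattern.compile("(\\d{1,2})-(\\d{1,2})-(\\d{4})");

    // Assumes month-day-year order; the original text does not state the order.
    static boolean isRealDate(String date) {
        Matcher m = DATE.matcher(date);
        if (!m.matches()) {
            return false; // wrong format, as in Listing 1-4
        }
        int month = Integer.parseInt(m.group(1));
        int day = Integer.parseInt(m.group(2));
        int year = Integer.parseInt(m.group(3));
        try {
            // LocalDate.of throws DateTimeException for out-of-range fields.
            LocalDate.of(year, month, day);
            return true;
        } catch (DateTimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRealDate("04-02-1999")); // well formed and a real date
        System.out.println(isRealDate("15-42-1999")); // well formed, but month 15 is invalid
        System.out.println(isRealDate("04-02-02"));   // fails the format check
    }
}
```

The regex still does the format check; the date library only takes over once the shape is known to be correct.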

