Regular Expressions
By: Apress Publishing
    2005-07-28

    Table of Contents:
  • Regular Expressions
  • Creating Patterns
  • Common and Boundary Characters
  • Character Classes
  • Back References
  • Integrating Java with Regular Expressions
  • Confirming Name Formats Example
  • Finding Duplicate Words Example
  • Regular Expression Operations
  • Search and Replace
  • Comparing Regex and Perl



    Regular Expressions - Integrating Java with Regular Expressions


    (Page 6 of 11 )

    Thus far, you’ve worked almost exclusively with regular expressions, but not really with Java. Now it’s time to consider how the two interact. The following examples differ from the preceding ones in that they incorporate Java code with regular expressions. They offer a more complete picture of how you can use some J2SE regex syntax.

    Some of the regular expressions you’ll see here are slightly more advanced than in the examples you’ve seen previously, as they build on the fundamentals discussed thus far in the chapter. For example, Listing 1-2 combines groups with quantifiers.

    Don’t be discouraged if the patterns themselves aren’t completely clear to you right now. An intuitive understanding will develop as you continue to read this book. Focus on the concepts and become comfortable with how the Java code and the regex complement each other.

    There are only two pieces of information you need to take full advantage of the following examples:

    • Any regex metacharacter that begins with a backslash (\) needs that backslash escaped again when it’s used in Java code. Thus, \d becomes \\d and \s becomes \\s in your Java code. Correspondingly, a more complex expression such as (\d-)?(\d{3}-)?\d{3}-\d{4}\s becomes (\\d-)?(\\d{3}-)?\\d{3}-\\d{4}\\s in Java code. Every \ character is doubled to produce \\ when it’s written in a String literal.

    Note: In this book, when I talk about a regular expression in and of itself, I don’t double the backslashes. However, I do when working with specific coding examples.

    • The String.matches(String regex) method was added to the String class in J2SE 1.4. It compares the String it’s called on to the given regular expression, regex, and returns true only if the pattern matches the entire String. Matching the entire String means that it can’t contain any characters, not even invisible ones such as newlines and spaces, that aren’t accounted for in the regex pattern.
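    Both points can be seen in a minimal sketch. The class name EscapeAndMatchDemo and its members are hypothetical, chosen only for this illustration; the only library call used is String.matches.

```java
class EscapeAndMatchDemo {
    // The regex \d{3}\s written as a Java string literal: each \ is doubled.
    static final String THREE_DIGITS_THEN_SPACE = "\\d{3}\\s";

    static boolean isExactlyThreeDigits(String input) {
        // matches() must account for the entire input, not just part of it.
        return input.matches("\\d{3}");
    }

    public static void main(String[] args) {
        // At run time the String holds single backslashes again.
        System.out.println(THREE_DIGITS_THEN_SPACE);                 // prints \d{3}\s
        System.out.println("123 ".matches(THREE_DIGITS_THEN_SPACE)); // prints true
        System.out.println(isExactlyThreeDigits("123"));             // prints true
        System.out.println(isExactlyThreeDigits("123 "));            // prints false (trailing space)
        System.out.println(isExactlyThreeDigits("123\n"));           // prints false (trailing newline)
    }
}
```

    The doubling exists only in the source code: the compiler turns each \\ into a single \ before the regex engine ever sees the pattern.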

    Confirming Phone Number Formats Example

    The code in Listing 1-2 determines whether the given phone number is well formatted. It takes advantage of two metacharacters introduced in Table 1-6. Specifically, it uses the range quantifier, {n,m}, indicating that the previous character or class must be repeated at least n times and no more than m times; the listing uses the single-number form, {n}, which requires exactly n repetitions. It also uses the ? character, indicating that the previous character or class must be present zero or one time.

    The pattern as a whole checks for seven digits preceded by optional country and area codes. Output 1-2 shows the result of running the program, and Table 1-19 dissects the pattern.
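    Before reading the full listing, it may help to try the two quantifiers in isolation. The following is only a sketch; the class and method names are hypothetical and the small patterns are illustrative, not taken from the book.

```java
class QuantifierDemo {
    // {2,4}: at least two and at most four digits.
    static boolean twoToFourDigits(String s) {
        return s.matches("\\d{2,4}");
    }

    // {3}: exactly three digits, preceded by an optional "digit-hyphen" group.
    static boolean optionalPrefixThenThreeDigits(String s) {
        return s.matches("(\\d-)?\\d{3}");
    }

    public static void main(String[] args) {
        System.out.println(twoToFourDigits("1"));      // prints false (too few)
        System.out.println(twoToFourDigits("12"));     // prints true
        System.out.println(twoToFourDigits("1234"));   // prints true
        System.out.println(twoToFourDigits("12345"));  // prints false (too many)
        System.out.println(optionalPrefixThenThreeDigits("555"));    // prints true (group absent)
        System.out.println(optionalPrefixThenThreeDigits("1-555"));  // prints true (group present once)
        System.out.println(optionalPrefixThenThreeDigits("12-555")); // prints false
    }
}
```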

    Listing 1-2. MatchPhoneNumber.java

    import java.util.regex.*;

    public class MatchPhoneNumber {
      public static void main(String[] args) {
        isPhoneValid(args[0]);
      }

      /**
       * Confirms that the format for the given phone number is valid.
       * @param phone is a String representing the phone number.
       * @return true if the phone number format is acceptable.
       */
      public static boolean isPhoneValid(String phone) {
        boolean retval = false;

        String phoneNumberPattern =
            "(\\d-)?(\\d{3}-)?\\d{3}-\\d{4}";

        // matches() succeeds only if the pattern covers the whole String
        retval = phone.matches(phoneNumberPattern);

        // prepare a message indicating success or failure
        String msg = "   NO MATCH: pattern:" + phone
            + "\r\n             regex: " + phoneNumberPattern;

        if (retval) {
          msg = " MATCH    : pattern:" + phone
              + "\r\n            regex: " + phoneNumberPattern;
        }

        System.out.println(msg + "\r\n");
        return retval;
      }
    }

    Output 1-2. Result of Running MatchPhoneNumber.java

    ------------------------------------------------------------------
    C:\RegEx\Examples\chapter1>java MatchPhoneNumber "1-999-111-2222"
      MATCH   : pattern:1-999-111-2222
                regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchPhoneNumber "999-111-2222"
      MATCH   : pattern:999-111-2222
                regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchPhoneNumber "1-111-2222"
      MATCH   : pattern:1-111-2222
                regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchPhoneNumber "111-2222"
      MATCH   : pattern:111-2222
                regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchPhoneNumber "1.999-111-2222"
      NO MATCH: pattern:1.999-111-2222
                regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchPhoneNumber "999 111-2222"
      NO MATCH: pattern:999 111-2222
                regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchPhoneNumber "1 111 2222"
      NO MATCH: pattern:1 111 2222
                regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchPhoneNumber "111-JAVA"
      NO MATCH: pattern:111-JAVA
                regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

    ----------------------------------------------------------

    Table 1-19. The Pattern (\d-)?(\d{3}-)?\d{3}-\d{4}

    Regex    Description
    (        A group consisting of
    \d       A digit
    -        Followed by a hyphen (-)
    )        The end of this group
    ?        Look for zero or one of the preceding
    (        Followed by a group consisting of
    \d       A digit
    {        Repeated exactly
    3        Three times
    }        End repetition
    -        Followed by a hyphen
    )        The end of this group
    ?        Look for zero or one of the preceding
    \d       Followed by a digit
    {        Repeated exactly
    3        Three times
    }        End repetition
    -        Followed by a hyphen
    \d       Followed by a digit
    {        Repeated exactly
    4        Four times
    }        End repetition

    * In English: Look for a single digit followed by a hyphen. This is optional. Then, look for three digits followed by a hyphen. This is also optional. Next, look for three digits, followed by a hyphen, followed by four digits.
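    The same check can also be written against the java.util.regex classes directly. The Javadoc for String.matches states that str.matches(regex) yields the same result as Pattern.matches(regex, str), so the sketch below should behave like Listing 1-2 while compiling the pattern only once. The class name CompiledPhonePattern and the constant PHONE are hypothetical names for this illustration.

```java
import java.util.regex.Pattern;

class CompiledPhonePattern {
    // Compiled once and reused; the pattern text is the one from Listing 1-2.
    private static final Pattern PHONE =
            Pattern.compile("(\\d-)?(\\d{3}-)?\\d{3}-\\d{4}");

    static boolean isPhoneValid(String phone) {
        // Matcher.matches() requires the whole input to match, like String.matches().
        return PHONE.matcher(phone).matches();
    }

    public static void main(String[] args) {
        // A few of the inputs shown in Output 1-2.
        String[] samples = {
            "1-999-111-2222", "999-111-2222", "1-111-2222", "111-2222",
            "1.999-111-2222", "999 111-2222", "111-JAVA"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isPhoneValid(sample));
        }
    }
}
```

    Precompiling is a design choice rather than a requirement: it mainly pays off when the same pattern is applied to many inputs.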

    Confirming Zip Codes Example

    The code in Listing 1-3 determines if the zip code meets the criterion of being well formatted. It checks for five digits optionally followed by a hyphen and four digits. Output 1-3 shows the result of running the program. Table 1-20 dissects the pattern.

    Listing 1-3. MatchZipCodes.java

    import java.util.regex.*;
    import java.io.*;

    public class MatchZipCodes {
      public static void main(String[] args) {
        isZipValid(args[0]);
      }

      /**
       * Confirms that the format for the given zip code is valid.
       * @param zip is a String representing the zip code.
       * @return true if the zip code format is acceptable.
       */
      public static boolean isZipValid(String zip) {
        boolean retval = false;

        // the pattern must be a quoted String literal
        String zipCodePattern = "\\d{5}(-\\d{4})?";
        retval = zip.matches(zipCodePattern);

        // prepare a message indicating success or failure
        String msg = "   NO MATCH: pattern:" + zip
            + "\r\n              regex: " + zipCodePattern;
        if (retval) {
          msg = "   MATCH  : pattern:" + zip
              + "\r\n            regex: " + zipCodePattern;
        }
        System.out.println(msg + "\r\n");
        return retval;
      }
    }
     

    Output 1-3. Result of Running MatchZipCodes.java

    ------------------------------------------------------------------
    C:\RegEx\Examples\chapter1>java MatchZipCodes "45643-4443"
      MATCH   : pattern:45643-4443
                regex: \d{5}(-\d{4})?

    C:\RegEx\Examples\chapter1>java MatchZipCodes "45643" 
      MATCH   : pattern:45643
                regex: \d{5}(-\d{4})?

    C:\RegEx\Examples\chapter1>java MatchZipCodes "443"
      NO MATCH: pattern:443
                regex: \d{5}(-\d{4})?

    C:\RegEx\Examples\chapter1>java MatchZipCodes "45643-44435"
      NO MATCH: pattern:45643-44435
                regex: \d{5}(-\d{4})?

    C:\RegEx\Examples\chapter1>java MatchZipCodes "45643 44435"
      NO MATCH: pattern:45643 44435
                regex: \d{5}(-\d{4})?

    Table 1-20. The Pattern \d{5}(-\d{4})?

    Regex    Description
    \d       A digit
    {        Repeated exactly
    5        Five times
    }        End repetition
    (        Open group
    -        Consisting of a hyphen
    \d       A digit
    {        Repeated exactly
    4        Four times
    }        End repetition
    )        The end of this group
    ?        Look for zero or one of the preceding

    * In English: Look for five digits, optionally followed by a hyphen and four digits.
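    Because the optional part of this pattern is a group, you can also ask whether it took part in a match. The sketch below assumes you want the four-digit extension as a String; the class ZipParts and the method plusFour are hypothetical names. It relies on Matcher.group returning null for a group that did not participate in the match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZipParts {
    // Same pattern as Listing 1-3; group 1 is the optional "-dddd" part.
    private static final Pattern ZIP = Pattern.compile("\\d{5}(-\\d{4})?");

    /**
     * Returns the four-digit extension, an empty String if the zip code
     * has none, or null if the input is not a well-formatted zip code.
     */
    static String plusFour(String zip) {
        Matcher m = ZIP.matcher(zip);
        if (!m.matches()) {
            return null;
        }
        String extension = m.group(1);  // "-4443", or null when the group was skipped
        return extension == null ? "" : extension.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(plusFour("45643-4443")); // prints 4443
        System.out.println(plusFour("45643"));      // prints an empty line
        System.out.println(plusFour("443"));        // prints null
    }
}
```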

    Confirming Dates Example

    The code in Listing 1-4 checks the format of a given date. It confirms that the given date consists of one or two digits followed by a hyphen, followed by one or two digits, followed by a hyphen, followed by four digits. Note that it checks only the format, not the values: as Output 1-4 shows, 15-42-1999 is accepted. Output 1-4 shows the result of running the program. Table 1-21 dissects the pattern.

    Listing 1-4. MatchDates.java

    import java.util.regex.*;
    import java.io.*;

    public class MatchDates {
      public static void main(String[] args) {
        isDateValid(args[0]);
      }

      /**
       * Confirms that given date format consists of one or two digits
       * followed by a hyphen, followed by one or two digits, followed
       * by a hyphen, followed by four digits
       * @param date is a String representing the date.
       * @return true if date format is acceptable.
       */
      public static boolean isDateValid(String date) {
        boolean retval = false;

        // the pattern must be a quoted String literal
        String datePattern = "\\d{1,2}-\\d{1,2}-\\d{4}";
        retval = date.matches(datePattern);

        // prepare a message indicating success or failure
        String msg = "   NO MATCH: pattern:" + date
            + "\r\n             regexLength: " + datePattern;
        if (retval) {
          msg = "   MATCH  : pattern:" + date
              + "\r\n            regexLength: " + datePattern;
        }
        System.out.println(msg + "\r\n");
        return retval;
      }
    }

    Output 1-4. Result of Running MatchDates.java

    ------------------------------------------------------------------
    C:\RegEx\Examples\chapter1>java MatchDates "04-02-1999" 
      MATCH   : pattern:04-02-1999
                regexLength: \d{1,2}-\d{1,2}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchDates "15-42-1999" 
      MATCH   : pattern:15-42-1999
                regexLength: \d{1,2}-\d{1,2}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchDates "April fourth nineteen ninety nine"
      NO MATCH: pattern:April fourth nineteen ninety nine
                regexLength: \d{1,2}-\d{1,2}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchDates "15-42-20002"
      NO MATCH: pattern:15-42-20002
                regexLength: \d{1,2}-\d{1,2}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchDates "02-02-20002"
      NO MATCH: pattern:02-02-20002
                regexLength: \d{1,2}-\d{1,2}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchDates "04-02-02"
      NO MATCH: pattern:04-02-02
                regexLength: \d{1,2}-\d{1,2}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchDates "04-02-garbage"
      NO MATCH: pattern:04-02-garbage
                regexLength: \d{1,2}-\d{1,2}-\d{4}

    ----------------------------------------------------------

    Table 1-21. The Pattern \d{1,2}-\d{1,2}-\d{4}

    Regex    Description
    \d       A digit
    {        Repeated at least
    1        One time
    ,        But no more than
    2        Two times
    }        Close repetition
    -        Followed by a hyphen
    \d       Followed by a digit
    {        Repeated at least
    1        One time
    ,        But no more than
    2        Two times
    }        Close repetition
    -        Followed by a hyphen
    \d       Followed by a digit
    {        Repeated exactly
    4        Four times
    }        Close repetition

    * In English: Look for one or two digits, followed by a hyphen, followed by one or two digits, followed by a hyphen, followed by four digits.
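    Since the pattern accepts any digits in each position, a value check has to come from Java code. The sketch below is one possible follow-up, not part of the book’s listing: it wraps each part of the pattern in a group and range-checks the numbers. It assumes a month-day-year order, which the text doesn’t specify, and the class DateRangeCheck and method isPlausibleDate are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateRangeCheck {
    // The pattern from Listing 1-4 with each numeric part wrapped in a group.
    private static final Pattern DATE =
            Pattern.compile("(\\d{1,2})-(\\d{1,2})-(\\d{4})");

    static boolean isPlausibleDate(String date) {
        Matcher m = DATE.matcher(date);
        if (!m.matches()) {
            return false;  // the format itself is wrong
        }
        // Assumption: the first number is the month and the second is the day.
        int month = Integer.parseInt(m.group(1));
        int day = Integer.parseInt(m.group(2));
        // Coarse range check only; it does not know how many days each month has.
        return month >= 1 && month <= 12 && day >= 1 && day <= 31;
    }

    public static void main(String[] args) {
        System.out.println(isPlausibleDate("04-02-1999")); // prints true
        System.out.println(isPlausibleDate("15-42-1999")); // prints false (values out of range)
        System.out.println(isPlausibleDate("04-02-02"));   // prints false (format rejected)
    }
}
```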



     

    This article is excerpted from Java Regular Expressions: Taming the java.util.regex Engine, written by Mehran Habibi (Apress, 2004; ISBN: 1590591070). Check it out at your favorite bookstore.

    © 2003-2008 by Developer Shed. All rights reserved.