
Regular Expressions
By: Apress Publishing
    2005-07-28

    Table of Contents:
  • Regular Expressions
  • Creating Patterns
  • Common and Boundary Characters
  • Character Classes
  • Back References
  • Integrating Java with Regular Expressions
  • Confirming Name Formats Example
  • Finding Duplicate Words Example
  • Regular Expression Operations
  • Search and Replace
  • Comparing Regex and Perl


    Regular Expressions - Confirming Name Formats Example


    (Page 7 of 11 )

    The code in Listing 1-5 determines whether a given name is well formatted. It looks for a first name token, an optional middle name token, and finally a last name token. For this example’s purposes, a name token consists of a capital letter followed by one or more lowercase letters.

    This example is interesting because it takes advantage of Java’s robustness to a degree that the previous example didn’t. Specifically, you define what you mean when you say a “name token”:

    String nameToken ="\\p{Upper}(\\p{Lower}+\\s?)";

    Then you use that definition later:

    String namePattern = "("+nameToken+"){2,3}";

    NOTE   \p{Upper} and \p{Lower} are described shortly. They simply mean any uppercase character and any lowercase character, respectively.

    Composing the pattern this way keeps the regex from becoming overwhelming, and it also helps to isolate errors. As the examples in this book grow more ambitious, you’ll start to see that coupling regular expressions with the Java language offers benefits that would be awkward, at best, to achieve with regular expressions alone. Listing 1-5 shows the program MatchNameFormats.java, Output 1-5 shows the result of running the program, and Table 1-22 dissects the pattern.
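    Before the full listing, the composition step can be tried on its own. The following is a minimal sketch, not part of the book's listings; the class name ComposedNamePattern and the constant names are assumptions made for illustration. It prints the regex that the engine actually receives and checks \p{Upper} and \p{Lower} against single characters.

```java
final class ComposedNamePattern {
    // One name token: an uppercase letter, one or more lowercase letters,
    // then an optional whitespace character.
    static final String NAME_TOKEN = "\\p{Upper}(\\p{Lower}+\\s?)";

    // Two or three name tokens in a row, built from the definition above.
    static final String NAME_PATTERN = "(" + NAME_TOKEN + "){2,3}";

    public static void main(String[] args) {
        // Doubled backslashes in Java source become single backslashes in the regex.
        System.out.println(NAME_PATTERN);                 // (\p{Upper}(\p{Lower}+\s?)){2,3}

        // Each class matches exactly one character of the given case.
        System.out.println("J".matches("\\p{Upper}"));    // true
        System.out.println("j".matches("\\p{Upper}"));    // false
        System.out.println("j".matches("\\p{Lower}"));    // true
    }
}
```

    Keeping the token in its own constant means a mistake in the token definition only has to be fixed in one place.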

    Listing 1-5. MatchNameFormats.java

    import java.util.regex.*;
    import java.io.*;

    public class MatchNameFormats{
      public static void main(String args[]){
        isNameValid(args[0]);
      }

      /**
       * Confirms that the format for the given name is valid.
       * @param name is a String representing the name.
       * @return true if the name format is acceptable.
       */
      public static boolean isNameValid(String name){
        boolean retval=false;
        String nameToken ="\\p{Upper}(\\p{Lower}+\\s?)";
        String namePattern = "("+nameToken+"){2,3}";
        retval = name.matches(namePattern);

        //prepare a message indicating success or failure
        String msg = "NO MATCH: pattern:" + name
                   + "\r\n           regex :" + namePattern;
        if (retval){
          msg = "MATCH     pattern:"  + name
              + "\r\n           regex :" + namePattern;
        }
        System.out.println(msg +"\r\n");
        return retval;
      }
    }

    Output 1-5. Result of Running MatchNameFormats.java

    ------------------------------------------------------------------
    C:\RegEx\Examples\chapter1>java MatchNameFormats "John Smith"
    MATCH     pattern:John Smith
               regex :(\p{Upper}(\p{Lower}+\s?)){2,3}

    C:\RegEx\Examples\chapter1>java MatchNameFormats "John McGee"
    MATCH     pattern:John McGee
               regex :(\p{Upper}(\p{Lower}+\s?)){2,3}

    C:\RegEx\Examples\chapter1>java MatchNameFormats "John Willliam Smith"
    MATCH     pattern:John Willliam Smith
               regex :(\p{Upper}(\p{Lower}+\s?)){2,3}

    C:\RegEx\Examples\chapter1>java MatchNameFormats "John Q Smith"
    NO MATCH: pattern:John Q Smith
               regex :(\p{Upper}(\p{Lower}+\s?)){2,3}

    C:\RegEx\Examples\chapter1>java MatchNameFormats "John allen Smith"
    NO MATCH: pattern:John allen Smith
               regex :(\p{Upper}(\p{Lower}+\s?)){2,3}

    C:\RegEx\Examples\chapter1>java MatchNameFormats "John"
    NO MATCH: pattern:John
               regex :(\p{Upper}(\p{Lower}+\s?)){2,3}

    Table 1-22. The Pattern (\p{Upper}(\p{Lower}+\s?)){2,3}

    Regex        Description
    ---------    ----------------------------------------
    (            A group consisting of
    \p{Upper}    An uppercase character
    (            Followed by an inner group consisting of
    \p{Lower}    A lowercase character
    +            Repeated one or more times
    \s?          Followed by an optional space
    )            The end of the inner group
    )            The end of the outer group
    {            Repeated at least
    2            Two times
    ,            But no more than
    3            Three times
    }            End repetition

    * In English: Look for two or three words, each beginning with a capital letter followed by one or more lowercase letters. Each word may be followed by a single space.
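    To see how the token part of the pattern consumes a name, the following sketch applies the token regex repeatedly with Matcher.find() and collects each piece it matches. It is an illustration rather than one of the book's listings; the class name NameTokenSplitter and the method tokens are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NameTokenSplitter {
    // The name token from Table 1-22, compiled once and reused.
    private static final Pattern NAME_TOKEN =
            Pattern.compile("\\p{Upper}(\\p{Lower}+\\s?)");

    // Returns every substring of the input that matches one name token.
    static List<String> tokens(String name) {
        List<String> found = new ArrayList<>();
        Matcher m = NAME_TOKEN.matcher(name);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Brackets make the trailing space captured by \s? visible.
        for (String token : tokens("John William Smith")) {
            System.out.println("[" + token + "]");
        }
        // [John ]
        // [William ]
        // [Smith]
    }
}
```

    Note that find() reports tokens wherever they occur in the input, whereas String.matches(), used in Listing 1-5, succeeds only if the whole input is covered by two or three consecutive tokens.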

    A few questions naturally arise from this example:

    • Why did John Q Smith fail? Because Q is not a name token as you’ve defined name tokens (i.e., a capital letter followed by one or more lowercase letters).

    • Why did John allen Smith fail? Because allen doesn’t start with a capital letter.

    • Why did John fail? Although John is a valid name token, the pattern requires two or three name tokens, and John is only one.

    • Why did John McGee pass? McGee isn’t an uppercase letter followed by any number of lowercase letters. Try to puzzle this one out on your own. It’s answered in the “FAQs” section at the end of the chapter.
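    The cases discussed above can be checked in one run. The sketch below is a hedged variation on Listing 1-5, not the book's code: the class name NameFormatCheck is an assumption, and it swaps String.matches() for a precompiled Pattern so the regex is compiled once rather than on every call.

```java
import java.util.regex.Pattern;

final class NameFormatCheck {
    // Same pattern as Listing 1-5, compiled a single time.
    private static final Pattern NAME =
            Pattern.compile("(\\p{Upper}(\\p{Lower}+\\s?)){2,3}");

    // True only if the entire input consists of two or three name tokens.
    static boolean isNameValid(String name) {
        return NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "John Smith", "John Willliam Smith",
            "John Q Smith", "John allen Smith", "John"
        };
        for (String sample : samples) {
            System.out.println(isNameValid(sample) + "  " + sample);
        }
    }
}
```

    The first two samples print true and the last three print false, in line with Output 1-5.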

    This example uses the composition technique mentioned at the beginning of this chapter. That is, it uses previously defined patterns to compose a new pattern. If you think about it, this is a very engineer-like thing to do: Build small blocks, then use those blocks to build more complicated pieces.

    Confirming Addresses Example

    The code in Listing 1-6 simply determines if the given address meets the criterion of being well formatted. It takes advantage of the name and zip code patterns created earlier, and it adds its own address pattern. Output 1-6 shows the result of running the program. Table 1-23 dissects the pattern.

    Listing 1-6. MatchAddress.java

    import java.util.regex.*;
    import java.io.*;

    public class MatchAddress{
      public static void main(String args[]){
        isAddressValid(args[0]);
      }

      /**
       * Confirms that the format for the given address is valid.
       * @param addr is a String representing the address
       * @return true if the address format is acceptable.
       */
      public static boolean isAddressValid(String addr){
        boolean retval = false;

        //use the name pattern created earlier.
        String nameToken ="\\p{Upper}(\\p{Lower}+\\s?)";
        String namePattern = "("+nameToken+"){2,3}";

        //use the zip code pattern created earlier.
        String zipCodePattern = "\\d{5}(-\\d{4})?";

        //construct an address pattern
        String addressPattern = "^" + namePattern
                              + "\\w+ .*, \\w+ " + zipCodePattern +"$";
        retval= addr.matches(addressPattern);

        //prepare a message indicating success or failure
        String msg = "NO MATCH\npattern:\n" + addr
                   + "\nregexLength:\n "
                   + addressPattern;
        if (retval){
          msg = "MATCH\npattern:\n" + addr
              + "\nregexLength:\n "
              + addressPattern;
        }
        System.out.println(msg +"\r\n");
        return retval;
      }
    }

    Output 1-6. Result of Running MatchAddress.java

    ------------------------------------------------------------------
    C:\RegEx\chapter_1\Examples\chapter1>
    java MatchAddress "John Smith 888 Luck Street, NY 64332"
    MATCH
    pattern:
     John Smith 888 Luck Street, NY 64332
    regexLength:
     ^(\p{Upper}(\p{Lower}+\s?)){2,3}\w+ .*, \w+ \d{5}(-\d{4})?$

    C:\RegEx\chapter_1\Examples\chapter1>
    java MatchAddress "John A. Smith 888 Luck Street, NY 64332-4453"
    NO MATCH
    pattern:
     John A. Smith 888 Luck Street, NY 64332-4453
    regexLength:
     ^(\p{Upper}(\p{Lower}+\s?)){2,3}\w+ .*, \w+ \d{5}(-\d{4})?$

    C:\RegEx\chapter_1\Examples\chapter1>
    java MatchAddress "John Allen Smith 888 Luck Street, NY 64332-4453"
    MATCH
    pattern:
     John Allen Smith 888 Luck Street, NY 64332-4453
    regexLength:
     ^(\p{Upper}(\p{Lower}+\s?)){2,3}\w+ .*, \w+ \d{5}(-\d{4})?$

    C:\RegEx\chapter_1\Examples\chapter1>
    java MatchAddress "888 Luck Street, NY 64332"
    NO MATCH
    pattern:
     888 Luck Street, NY 64332
    regexLength:
     ^(\p{Upper}(\p{Lower}+\s?)){2,3}\w+ .*, \w+ \d{5}(-\d{4})?$

    C:\RegEx\chapter_1\Examples\chapter1>
    java MatchAddress "P.O. BOX 888 Luck Street, NY 64332-4453"
    NO MATCH
    pattern:
     P.O. BOX 888 Luck Street, NY 64332-4453
    regexLength:
     ^(\p{Upper}(\p{Lower}+\s?)){2,3}\w+ .*, \w+ \d{5}(-\d{4})?$

    C:\RegEx\chapter_1\Examples\chapter1>
    java MatchAddress "John Allen Smith 888 Luck st., NY"
    NO MATCH
    pattern:
     John Allen Smith 888 Luck st., NY
    regexLength:
     ^(\p{Upper}(\p{Lower}+\s?)){2,3}\w+ .*, \w+ \d{5}(-\d{4})?$
    ------------------------------------------------------------------
    Table 1-23. The Pattern
    ^(\p{Upper}(\p{Lower}+\s?)){2,3}\w+ .*, \w+ \d{5}(-\d{4})?$

    Regex        Description
    ---------    ----------------------------------------
    ^            The beginning of a line, followed by
    (            A group consisting of
    \p{Upper}    An uppercase character
    (            Followed by an inner group consisting of
    \p{Lower}    A lowercase character
    +            Repeated one or more times
    \s?          Followed by an optional space
    )            The end of the inner group
    )            The end of the outer group
    {            Repeated at least
    2            Two times
    ,            But no more than
    3            Three times
    }            End repetition
    \w           Followed by any word character (letter, digit, or underscore)
    +            Repeated one or more times
    <space>      Followed by a space
    .            Followed by any character
    *            Repeated any number of times
    ,            Followed by a comma
    <space>      Followed by a space
    \w           Followed by any word character (letter, digit, or underscore)
    +            Repeated one or more times
    <space>      Followed by a space
    \d           Followed by a digit
    {            Repeated exactly
    5            Five times
    }            End repetition
    (            Open group
    -            Consisting of a hyphen
    \d           Followed by a digit
    {            Repeated exactly
    4            Four times
    }            End repetition
    )            The end of this group
    ?            Look for zero or one of the preceding group
    $            The end of the line

    * In English: Look for a name, as previously defined, followed by a word, a space, any further text, a comma, one more word, and then a zip code. This example uses the composition technique.
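    The address pattern can be exercised against the inputs from Output 1-6 in a single run. This is a minimal sketch under stated assumptions rather than the book's listing: the class name AddressFormatCheck and the constant names are hypothetical, and the pattern is precompiled with Pattern.compile() instead of being passed to String.matches().

```java
import java.util.regex.Pattern;

final class AddressFormatCheck {
    // Building blocks, composed in the same order as Listing 1-6.
    static final String NAME_TOKEN = "\\p{Upper}(\\p{Lower}+\\s?)";
    static final String NAME_PATTERN = "(" + NAME_TOKEN + "){2,3}";
    static final String ZIP_CODE_PATTERN = "\\d{5}(-\\d{4})?";

    private static final Pattern ADDRESS = Pattern.compile(
            "^" + NAME_PATTERN + "\\w+ .*, \\w+ " + ZIP_CODE_PATTERN + "$");

    // True only if the whole input is a name, street text, a comma,
    // one more word, and a five-digit or ZIP+4 code.
    static boolean isAddressValid(String addr) {
        return ADDRESS.matcher(addr).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "John Smith 888 Luck Street, NY 64332",
            "John A. Smith 888 Luck Street, NY 64332-4453",
            "John Allen Smith 888 Luck Street, NY 64332-4453",
            "888 Luck Street, NY 64332",
            "John Allen Smith 888 Luck st., NY"
        };
        for (String sample : samples) {
            System.out.println(isAddressValid(sample) + "  " + sample);
        }
    }
}
```

    Because the name and zip code pieces are separate constants, either one can be tightened later without touching the rest of the address pattern.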


    This article is excerpted from Java Regular Expressions: Taming the java.util.regex Engine, written by Mehran Habibi (Apress, 2004; ISBN: 1590591070). Check it out at your favorite bookstore.
