
Regular Expressions


Regular expressions are a mechanism for telling the Java Virtual Machine (JVM) how to find and manipulate text for you. Using regular expressions to do this is different from the traditional approach. This article compares the two approaches. It is excerpted from Java Regular Expressions: Taming the java.util.regex Engine, written by Mehran Habibi (Apress, 2004; ISBN: 1590591070).

Author Info:
By: Apress Publishing
July 28, 2005
TABLE OF CONTENTS:
  1. · Regular Expressions
  2. · Creating Patterns
  3. · Common and Boundary Characters
  4. · Character Classes
  5. · Back References
  6. · Integrating Java with Regular Expressions
  7. · Confirming Name Formats Example
  8. · Finding Duplicate Words Example
  9. · Regular Expression Operations
  10. · Search and Replace
  11. · Comparing Regex and Perl


Regular Expressions - Confirming Name Formats Example
(Page 7 of 11 )

The code in Listing 1-5 determines whether the given name is well formatted. It looks for a first name token, an optional middle name token, and finally a last name token. For this example’s purposes, a name token consists of a capital letter followed by one or more lowercase letters.

This example is interesting because it takes advantage of Java’s robustness to a degree that the previous example didn’t. Specifically, you define what you mean when you say a “name token”:

String nameToken ="\\p{Upper}(\\p{Lower}+\\s?)";

Then you use that definition later:

String namePattern = "("+nameToken+"){2,3}";

NOTE   \p{Upper} and \p{Lower} are described shortly. They simply mean any uppercase character and any lowercase character, respectively.

This helps to keep the regex pattern from becoming overwhelming, and it also helps to isolate errors. As the examples in this book grow more ambitious, you’ll start to see that coupling regular expressions with Java’s powerful language can express things that would be, at best, terse and hard to read using regular expressions alone. Listing 1-5 shows the program MatchNameFormats.java, Output 1-5 shows the result of running the program, and Table 1-22 dissects the pattern.

Listing 1-5. MatchNameFormats.java

import java.util.regex.*;
import java.io.*;
public class MatchNameFormats{
  public static void main(String args[]){
    isNameValid(args[0]);
  }

  /**
   * Confirms that the format for the given name is valid.
   * @param name is a String representing the name.
   * @return true if the name format is acceptable.
   */
  public static boolean isNameValid(String name){
    boolean retval=false;
    //a name token: one capital letter, one or more lowercase
    //letters, then an optional whitespace character
    String nameToken ="\\p{Upper}(\\p{Lower}+\\s?)";
    //a well-formatted name is two or three name tokens
    String namePattern = "("+nameToken+"){2,3}";
    retval = name.matches(namePattern);

    //prepare a message indicating success or failure
    String msg = "NO MATCH: pattern:" + name
               + "\r\n           regex :" + namePattern;
    if (retval){
      msg = "MATCH     pattern:"  + name
          + "\r\n           regex :" + namePattern;
    }
    System.out.println(msg +"\r\n");
    return retval;
  }
}

Output 1-5. Result of Running MatchNameFormats.java

------------------------------------------------------------------
C:\RegEx\Examples\chapter1>java MatchNameFormats "John Smith"
MATCH     pattern:John Smith
           regex :(\p{Upper}(\p{Lower}+\s?)){2,3}

C:\RegEx\Examples\chapter1>java MatchNameFormats "John McGee"
MATCH     pattern:John McGee
           regex :(\p{Upper}(\p{Lower}+\s?)){2,3}

C:\RegEx\Examples\chapter1>java MatchNameFormats "John William Smith"
MATCH     pattern:John William Smith
           regex :(\p{Upper}(\p{Lower}+\s?)){2,3}

C:\RegEx\Examples\chapter1>java MatchNameFormats "John Q Smith"
NO MATCH: pattern:John Q Smith
           regex :(\p{Upper}(\p{Lower}+\s?)){2,3}

C:\RegEx\Examples\chapter1>java MatchNameFormats "John allen Smith"
NO MATCH: pattern:John allen Smith
           regex :(\p{Upper}(\p{Lower}+\s?)){2,3}

C:\RegEx\Examples\chapter1>java MatchNameFormats "John"
NO MATCH: pattern:John
           regex :(\p{Upper}(\p{Lower}+\s?)){2,3}
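The six runs in Output 1-5 can also be reproduced in a single pass. The following is a minimal sketch, not part of the book's listings: the class name NameFormatBatch and the constant names are made up for illustration. It builds the same composed pattern but compiles it once with Pattern.compile, because String.matches compiles the regex again on every call.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names here are illustrative, not from the book.
class NameFormatBatch {
    // Same building blocks as Listing 1-5.
    static final String NAME_TOKEN = "\\p{Upper}(\\p{Lower}+\\s?)";

    // Compiled once and reused for every name that is checked.
    static final Pattern NAME_PATTERN =
        Pattern.compile("(" + NAME_TOKEN + "){2,3}");

    static boolean isNameValid(String name) {
        // matches() succeeds only if the whole input fits the pattern.
        return NAME_PATTERN.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] names = {
            "John Smith", "John McGee", "John William Smith",
            "John Q Smith", "John allen Smith", "John"
        };
        for (String name : names) {
            String label = isNameValid(name) ? "MATCH    : " : "NO MATCH : ";
            System.out.println(label + name);
        }
    }
}
```

Keeping the token and the composed pattern as named constants preserves the composition idea from the listing while avoiding repeated compilation when many names are checked.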

Table 1-22. The Pattern (\p{Upper}(\p{Lower}+\s?)){2,3}

Regex       Description
(           A group consisting of
\p{Upper}   An uppercase character
(           Followed by an inner group consisting of
\p{Lower}   A lowercase character
+           Repeated one or more times
\s?         Followed by an optional space
)           The end of the inner group
)           The end of the outer group
{           Repeated at least
2           Two times
,           But no more than
3           Three times
}           End repetition

* In English: Look for two or three words, each beginning with a capital letter followed by one or more lowercase letters. Each word could be followed by a single space.

A few questions naturally arise from this example:

  • Why did John Q Smith fail? Because Q is not a name token, as you’ve defined name tokens (i.e., a capital letter followed by one or more lowercase letters).

  • Why did John allen Smith fail? Because allen doesn’t start with a capital letter.

  • Why did John fail? Although John is a valid name token, the pattern requires two or three name tokens. John is simply one name token.

  • Why did John McGee pass? McGee isn’t an uppercase letter followed by any number of lowercase letters. Try to puzzle this one out on your own. It’s answered in the “FAQs” section at the end of the chapter.

This example uses the composition technique mentioned at the beginning of this chapter. That is, it uses previously defined patterns to compose a new pattern. If you think about it, this is a very engineer-like thing to do: Build small blocks, then use those blocks to build more complicated pieces.
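Because the name token is defined on its own, it can also be tried on its own. The sketch below is an illustration rather than one of the book's listings, and the class name NameTokenExplorer and the method tokensIn are hypothetical. It uses Matcher.find to list every piece of the input that fits the name token, which is a handy way to investigate the questions above. Note that find skips over text that does not fit, whereas matches in Listing 1-5 requires the entire input to be consumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names here are illustrative, not from the book.
class NameTokenExplorer {
    // The same name token used in Listing 1-5, compiled by itself.
    static final Pattern NAME_TOKEN =
        Pattern.compile("\\p{Upper}(\\p{Lower}+\\s?)");

    // Returns each substring of the input that fits the name token,
    // in the order the regex engine finds them.
    static List<String> tokensIn(String name) {
        List<String> tokens = new ArrayList<>();
        Matcher m = NAME_TOKEN.matcher(name);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "John Smith";
        // Brackets make any trailing space inside a token visible.
        for (String token : tokensIn(name)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

Running it with each of the names from Output 1-5 as the argument shows how many name tokens the engine sees in each one.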

Confirming Addresses Example

The code in Listing 1-6 simply determines if the given address meets the criterion of being well formatted. It takes advantage of the name and zip code patterns created earlier, and it adds its own address pattern. Output 1-6 shows the result of running the program. Table 1-23 dissects the pattern.

Listing 1-6. MatchAddress.java

import java.util.regex.*;
import java.io.*;
public class MatchAddress{
  public static void main(String args[]){
    isAddressValid(args[0]);
  }

  /**
   * Confirms that the format for the given address is valid.
   * @param addr is a String representing the address
   * @return true if the address format is acceptable.
   */
  public static boolean isAddressValid(String addr){
    boolean retval = false;
    //use the name pattern created earlier.
    String nameToken ="\\p{Upper}(\\p{Lower}+\\s?)";
    String namePattern = "("+nameToken+"){2,3}";
    //use the zip code pattern created earlier.
    String zipCodePattern = "\\d{5}(-\\d{4})?";
    //construct an address pattern
    String addressPattern = "^" + namePattern
      + "\\w+ .*, \\w+ " + zipCodePattern +"$";
    retval= addr.matches(addressPattern);

    //prepare a message indicating success or failure
    String msg = "NO MATCH\npattern:\n" + addr
      + "\nregexLength:\n "
      + addressPattern;
    if (retval){
      msg = "MATCH\npattern:\n" + addr
        + "\nregexLength:\n "
        + addressPattern;
    }
    System.out.println(msg +"\r\n");
    return retval;
  }
}

Output 1-6. Result of Running MatchAddress.java

------------------------------------------------------------------
C:\RegEx\chapter_1\Examples\chapter1>java MatchAddress "John Smith 888 Luck Street, NY 64332"
MATCH
pattern:
John Smith 888 Luck Street, NY 64332
regexLength:
 ^(\p{Upper}(\p{Lower}+\s?)){2,3}\w+ .*, \w+ \d{5}(-\d{4})?$

C:\RegEx\chapter_1\Examples\chapter1>java MatchAddress "John A. Smith 888 Luck Street, NY 64332-4453"
NO MATCH
pattern:
John A. Smith 888 Luck Street, NY 64332-4453
regexLength:
 ^(\p{Upper}(\p{Lower}+\s?)){2,3}\w+ .*, \w+ \d{5}(-\d{4})?$

C:\RegEx\chapter_1\Examples\chapter1>java MatchAddress "John Allen Smith 888 Luck Street, NY 64332-4453"
MATCH
pattern:
John Allen Smith 888 Luck Street, NY 64332-4453
regexLength:
 ^(\p{Upper}(\p{Lower}+\s?)){2,3}\w+ .*, \w+ \d{5}(-\d{4})?$

C:\RegEx\chapter_1\Examples\chapter1>java MatchAddress "888 Luck Street, NY 64332"
NO MATCH
pattern:
888 Luck Street, NY 64332
regexLength:
 ^(\p{Upper}(\p{Lower}+\s?)){2,3}\w+ .*, \w+ \d{5}(-\d{4})?$

C:\RegEx\chapter_1\Examples\chapter1>java MatchAddress "P.O. BOX 888 Luck Street, NY 64332-4453"
NO MATCH
pattern:
P.O. BOX 888 Luck Street, NY 64332-4453
regexLength:
 ^(\p{Upper}(\p{Lower}+\s?)){2,3}\w+ .*, \w+ \d{5}(-\d{4})?$

C:\RegEx\chapter_1\Examples\chapter1>java MatchAddress "John Allen Smith 888 Luck st., NY"
NO MATCH
pattern:
John Allen Smith 888 Luck st., NY
regexLength:
 ^(\p{Upper}(\p{Lower}+\s?)){2,3}\w+ .*, \w+ \d{5}(-\d{4})?$

----------------------------------------------------------
Table 1-23. The Pattern ^(\p{Upper}(\p{Lower}+\s?)){2,3}\w+ .*, \w+ \d{5}(-\d{4})?$

Regex       Description
^           The beginning of a line followed by
(           A group consisting of
\p{Upper}   An uppercase character
(           Followed by an inner group consisting of
\p{Lower}   A lowercase character
+           Repeated one or more times
\s?         Followed by an optional space
)           The end of the inner group
)           The end of the outer group
{           Repeated at least
2           Two times
,           But no more than
3           Three times
}           End repetition
\w          Followed by any word character (a letter, digit, or underscore)
+           Repeated one or more times
<space>     Followed by a space
.           Followed by any character
*           Repeated any number of times
,           Followed by a comma
<space>     Followed by a space
\w          Followed by any word character (a letter, digit, or underscore)
+           Repeated one or more times
<space>     Followed by a space
\d          Followed by a digit
{           Repeated exactly
5           Five times
}           End repetition
(           Open group
-           Consisting of a hyphen
\d          A digit
{           Repeated exactly
4           Four times
}           End repetition
)           The end of this group
?           Look for zero or one of the preceding
$           Followed by the end of the line

* In English: Look for a name (two or three name tokens, as previously defined), followed by a word, a space, and any further text, then a comma, a space, and one more word, followed by a zip code. This example uses the composition technique.
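The dissection in Table 1-23 can also be carried into the source code itself. The following sketch is an assumption-laden illustration, not one of the book's listings: the class name CommentedAddressPattern is hypothetical. It relies on the Pattern.COMMENTS flag of java.util.regex, which makes the engine ignore whitespace in the pattern and treat everything from # to the end of a line as a comment. Because plain spaces are ignored in that mode, each literal space in the address pattern is written as \x20.

```java
import java.util.regex.Pattern;

// Hypothetical class; the names here are illustrative, not from the book.
class CommentedAddressPattern {
    // The address pattern from Listing 1-6, one piece per line.
    // With Pattern.COMMENTS, whitespace in the pattern is ignored and
    // "#" starts a comment, so literal spaces are spelled \x20.
    static final Pattern ADDRESS = Pattern.compile(
          "^                                   # start of input\n"
        + "(\\p{Upper}(\\p{Lower}+\\s?)){2,3}  # two or three name tokens\n"
        + "\\w+\\x20                           # one word, then a space\n"
        + ".*,\\x20                            # any text, a comma, a space\n"
        + "\\w+\\x20                           # one word, then a space\n"
        + "\\d{5}(-\\d{4})?                    # zip code, optional suffix\n"
        + "$                                   # end of input\n",
        Pattern.COMMENTS);

    static boolean isAddressValid(String addr) {
        return ADDRESS.matcher(addr).matches();
    }

    public static void main(String[] args) {
        String[] addresses = {
            "John Smith 888 Luck Street, NY 64332",
            "John A. Smith 888 Luck Street, NY 64332-4453",
            "888 Luck Street, NY 64332"
        };
        for (String addr : addresses) {
            String label = isAddressValid(addr) ? "MATCH    : " : "NO MATCH : ";
            System.out.println(label + addr);
        }
    }
}
```

This is the same pattern as in Listing 1-6, only laid out differently. Whether the commented layout or string composition reads better is a matter of taste, and the two can be combined.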

