Regular Expressions - Confirming Name Formats Example
The code in Listing 1-5 determines if the given name meets the criterion of being well formatted. It looks for a first name token, an optional middle name token, and finally a last name token. For this example’s purposes, a name token consists of a capital letter followed by one or more lowercase letters.
This example is interesting because it takes advantage of Java’s robustness to a degree that the previous example didn’t. Specifically, you define what you mean when you say a “name token”:
String nameToken ="\\p{Upper}(\\p{Lower}+\\s?)";
Then you use that definition later:
String namePattern = "("+nameToken+"){2,3}";
NOTE: \p{Upper} and \p{Lower} are described shortly. They simply mean any uppercase character and any lowercase character, respectively.
This helps to keep the regex pattern from becoming overwhelming, and it also helps to isolate errors. As the examples in this book grow more ambitious, you’ll start to see that coupling regular expressions with Java’s powerful language features can offer benefits that would be cryptic, at best, using regular expressions alone. Listing 1-5 shows the program MatchNameFormats.java, Output 1-5 shows the result of running the program, and Table 1-22 dissects the pattern.
Listing 1-5. MatchNameFormats.java
import java.util.regex.*;
import java.io.*;
public class MatchNameFormats{
    public static void main(String args[]){
        isNameValid(args[0]);
    }
    /**
    * Confirms that the format for the given name is valid.
    * @param name is a String representing the name.
    * @return true if the name format is acceptable.
    */
    public static boolean isNameValid(String name){
        boolean retval=false;
        //a name token: one capital letter, one or more lowercase
        //letters, and an optional trailing whitespace character
        String nameToken ="\\p{Upper}(\\p{Lower}+\\s?)";
        //a name is two or three name tokens
        String namePattern = "("+nameToken+"){2,3}";
        //matches() succeeds only if the entire name fits the pattern
        retval = name.matches(namePattern);
        //prepare a message indicating success or failure
        String msg = "NO MATCH: pattern:" + name
            + "\r\n regex :" + namePattern;
        if (retval){
            msg = "MATCH pattern:" + name
                + "\r\n regex :" + namePattern;
        }
        System.out.println(msg +"\r\n");
        return retval;
    }
}
Output 1-5. Result of Running MatchNameFormats.java
------------------------------------------------------------------
C:\RegEx\Examples\chapter1>java MatchNameFormats "John Smith"
MATCH pattern:John Smith
regex :(\p{Upper}(\p{Lower}+\s?)){2,3}
C:\RegEx\Examples\chapter1>java MatchNameFormats "John McGee"
MATCH pattern:John McGee
regex :(\p{Upper}(\p{Lower}+\s?)){2,3}
C:\RegEx\Examples\chapter1>java MatchNameFormats "John William Smith"
MATCH pattern:John William Smith
regex :(\p{Upper}(\p{Lower}+\s?)){2,3}
C:\RegEx\Examples\chapter1>java MatchNameFormats "John Q Smith"
NO MATCH: pattern:John Q Smith
regex :(\p{Upper}(\p{Lower}+\s?)){2,3}
C:\RegEx\Examples\chapter1>java MatchNameFormats "John allen Smith"
NO MATCH: pattern:John allen Smith
regex :(\p{Upper}(\p{Lower}+\s?)){2,3}
C:\RegEx\Examples\chapter1>java MatchNameFormats "John"
NO MATCH: pattern:John
regex :(\p{Upper}(\p{Lower}+\s?)){2,3}
Table 1-22. The Pattern
(\p{Upper}(\p{Lower}+\s?)){2,3}
Regex | Description |
( | A group consisting of |
\p{Upper} | An uppercase character |
( | Followed by an inner group consisting of |
\p{Lower} | A lowercase character |
+ | Repeated one or more times |
\s? | Followed by an optional space |
) | The end of the inner group |
) | The end of the outer group |
{ | Repeated at least |
2 | Two times |
, | But no more than |
3 | Three times |
} | End repetition |
* In English: Look for two or three words, each beginning with a capital letter followed by one or more lowercase letters. Each word could be followed by a single space. |
A few questions naturally arise from this example:
- Why did John Q Smith fail? Because Q is not a name token, as you’ve defined name tokens (i.e., a capital letter followed by one or more lowercase letters).
- Why did John allen Smith fail? Because allen doesn’t start with a capital letter.
- Why did John fail? Although John is a valid name token, the pattern requires two or three name tokens, and John is only one.
- Why did John McGee pass? McGee isn’t an uppercase letter followed by any number of lowercase letters. Try to puzzle this one out on your own. It’s answered in the “FAQs” section at the end of the chapter.
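One way to explore questions like these is to print the pieces of the input that the name token matches on its own. The following is a minimal sketch, not part of the book's listings: the class name NameTokenExplorer and the method tokens are hypothetical, and it uses Matcher.find(), which scans for tokens anywhere in the input and silently skips text that doesn't fit. That makes it a diagnostic aid rather than a validator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class NameTokenExplorer {
    // The same name token used in Listing 1-5.
    static final Pattern NAME_TOKEN =
        Pattern.compile("\\p{Upper}(\\p{Lower}+\\s?)");

    // Returns every substring of the input that matches the name token,
    // in the order found. Text that matches nothing is skipped.
    static List<String> tokens(String name) {
        List<String> found = new ArrayList<>();
        Matcher m = NAME_TOKEN.matcher(name);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Defaults to the puzzle input when no argument is given.
        String input = args.length > 0 ? args[0] : "John McGee";
        for (String token : tokens(input)) {
            // Brackets make any trailing whitespace visible.
            System.out.println("[" + token + "]");
        }
    }
}
```

Running it against each name in Output 1-5 and counting the bracketed tokens is a quick way to check your reasoning before turning to the FAQs.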
This example uses the composition technique mentioned at the beginning of this chapter. That is, it uses previously defined patterns to compose a new pattern. If you think about it, this is a very engineer-like thing to do: Build small blocks, then use those blocks to build more complicated pieces.
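The composition idea can also be paired with a precompiled Pattern, so that the composed regex is built once and reused. This is a sketch under assumptions: the class name ComposedPatterns and the constant names are hypothetical, and the pattern strings are the ones from Listing 1-5.

```java
import java.util.regex.Pattern;

// Hypothetical class, for illustration only.
public class ComposedPatterns {
    // Small building block: one name token, as in Listing 1-5.
    static final String NAME_TOKEN = "\\p{Upper}(\\p{Lower}+\\s?)";
    // Larger block composed from the smaller one: two or three tokens.
    static final String NAME_PATTERN = "(" + NAME_TOKEN + "){2,3}";

    // Compile once and reuse. String.matches() compiles the regex
    // again on every call.
    static final Pattern NAME = Pattern.compile(NAME_PATTERN);

    static boolean isNameValid(String name) {
        // matches() requires the whole input to fit the pattern.
        return NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"John Smith", "John Q Smith", "John"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + isNameValid(sample));
        }
    }
}
```

Keeping the building blocks as named constants means that a change to the definition of a name token is made in one place and flows into every pattern composed from it.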
Confirming Addresses Example
The code in Listing 1-6 simply determines if the given address meets the criterion of being well formatted. It takes advantage of the name and zip code patterns created earlier, and it adds its own address pattern. Output 1-6 shows the result of running the program. Table 1-23 dissects the pattern.
Listing 1-6. MatchAddress.java
import java.util.regex.*;
import java.io.*;
public class MatchAddress{
    public static void main(String args[]){
        isAddressValid(args[0]);
    }
    /**
    * Confirms that the format for the given address is valid.
    * @param addr is a String representing the address
    * @return true if the address format is acceptable.
    */
    public static boolean isAddressValid(String addr){
        boolean retval = false;
        //use the name pattern created earlier.
        String nameToken ="\\p{Upper}(\\p{Lower}+\\s?)";
        String namePattern = "("+nameToken+"){2,3}";
        //use the zip code pattern created earlier:
        //five digits, optionally followed by a hyphen and four digits
        String zipCodePattern = "\\d{5}(-\\d{4})?";
        //construct an address pattern
        String addressPattern = "^" + namePattern
            + "\\w+ .*, \\w+ " + zipCodePattern +"$";
        retval= addr.matches(addressPattern);
        //prepare a message indicating success or failure
        String msg = "NO MATCH\npattern:\n" + addr
            + "\nregexLength:\n "
            + addressPattern;
        if (retval){
            msg = "MATCH\npattern:\n" + addr
                + "\nregexLength:\n "
                + addressPattern;
        }
        System.out.println(msg +"\r\n");
        return retval;
    }
}
Output 1-6. Result of Running MatchAddress.java
------------------------------------------------------------------
C:\RegEx\chapter_1\Examples\chapter1>
java MatchAddress "John Smith 888 Luck Street, NY 64332"
MATCH
pattern:
John Smith 888 Luck Street, NY 64332
regexLength:
^(\p{Upper}(\p{Lower}+\s?)){2,3}\w+ .*, \w+ \d{5}(-\d{4})?$
C:\RegEx\chapter_1\Examples\chapter1>
java MatchAddress "John A. Smith 888 Luck Street, NY 64332-4453"
NO MATCH
pattern:
John A. Smith 888 Luck Street, NY 64332-4453
regexLength:
^(\p{Upper}(\p{Lower}+\s?)){2,3}\w+ .*, \w+ \d{5}(-\d{4})?$
C:\RegEx\chapter_1\Examples\chapter1>
java MatchAddress "John Allen Smith 888 Luck Street, NY 64332-4453"
MATCH
pattern:
John Allen Smith 888 Luck Street, NY 64332-4453
regexLength:
^(\p{Upper}(\p{Lower}+\s?)){2,3}\w+ .*, \w+ \d{5}(-\d{4})?$
C:\RegEx\chapter_1\Examples\chapter1>
java MatchAddress "888 Luck Street, NY 64332"
NO MATCH
pattern:
888 Luck Street, NY 64332
regexLength:
^(\p{Upper}(\p{Lower}+\s?)){2,3}\w+ .*, \w+ \d{5}(-\d{4})?$
C:\RegEx\chapter_1\Examples\chapter1>
java MatchAddress "P.O. BOX 888 Luck Street, NY 64332-4453"
NO MATCH
pattern:
P.O. BOX 888 Luck Street, NY 64332-4453
regexLength:
^(\p{Upper}(\p{Lower}+\s?)){2,3}\w+ .*, \w+ \d{5}(-\d{4})?$
C:\RegEx\chapter_1\Examples\chapter1>
java MatchAddress "John Allen Smith 888 Luck st., NY"
NO MATCH
pattern:
John Allen Smith 888 Luck st., NY
regexLength:
^(\p{Upper}(\p{Lower}+\s?)){2,3}\w+ .*, \w+ \d{5}(-\d{4})?$
----------------------------------------------------------
Table 1-23. The Pattern
^(\p{Upper}(\p{Lower}+\s?)){2,3}\w+ .*, \w+ \d{5}(-\d{4})?$
Regex | Description |
^ | The beginning of a line followed by |
( | A group consisting of |
\p{Upper} | An uppercase character |
( | Followed by an inner group consisting of |
\p{Lower} | A lowercase character |
+ | Repeated one or more times |
\s? | Followed by an optional space |
) | The end of the inner group |
) | The end of the outer group |
{ | Repeated at least |
2 | Two times |
, | But no more than |
3 | Three times |
} | End repetition |
\w | Followed by any alphanumeric character |
+ | Repeated one or more times |
<space> | Followed by a space |
. | Followed by any character |
* | Repeated any number of times |
, | Followed by a comma |
<space> | Followed by a space |
\w | Followed by any alphanumeric character |
+ | Repeated one or more times |
<space> | Followed by a space |
\d | Followed by a digit |
{ | Repeated exactly |
5 | Five times |
} | End repetition |
( | Open group |
- | Consisting of a hyphen |
\d | A digit |
{ | Repeated exactly |
4 | Four times |
} | End repetition |
) | The end of this group |
? | Look for zero or one of the preceding |
* In English: Look for a name (two or three name tokens, as previously defined), followed by some words, a comma, and then one more word, followed by a zip code. This example uses the composition technique. |
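Because the address pattern is composed from named pieces, it is a small step to wrap each piece in a capture group and pull the parts back out after a match. The following is a sketch under assumptions, not the book's method: the class name AddressParts, the method zipOf, and the group names name, street, state, and zip are all hypothetical labels chosen for illustration, and named groups require Java 7 or later.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class, for illustration only.
public class AddressParts {
    // Building blocks from Listing 1-6.
    static final String NAME_TOKEN = "\\p{Upper}(\\p{Lower}+\\s?)";
    static final String NAME_PATTERN = "(" + NAME_TOKEN + "){2,3}";
    static final String ZIP_CODE_PATTERN = "\\d{5}(-\\d{4})?";

    // Same shape as the address pattern in Listing 1-6, with a named
    // group around each part. The group names are assumptions.
    static final Pattern ADDRESS = Pattern.compile(
        "^(?<name>" + NAME_PATTERN + ")"
        + "(?<street>\\w+ .*), (?<state>\\w+) "
        + "(?<zip>" + ZIP_CODE_PATTERN + ")$");

    // Returns the zip code if the whole address fits the pattern,
    // or an empty Optional if it does not.
    static Optional<String> zipOf(String addr) {
        Matcher m = ADDRESS.matcher(addr);
        return m.matches() ? Optional.of(m.group("zip")) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(zipOf("John Smith 888 Luck Street, NY 64332"));
        System.out.println(zipOf("888 Luck Street, NY 64332"));
    }
}
```

Note that the name group keeps the trailing space consumed by \s?, so a caller that wants the bare name would likely call trim() on it.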
This article is excerpted from Java Regular Expressions: Taming the java.util.regex Engine, written by Mehran Habibi (Apress, 2004; ISBN: 1590591070).