Regular expressions are a mechanism for telling the Java Virtual Machine (JVM) how to find and manipulate text for you. Using regular expressions to do this is different from the traditional approach. This article compares the two approaches. It is excerpted from Java Regular Expressions: Taming the java.util.regex Engine, written by Mehran Habibi (Apress, 2004; ISBN: 1590591070).
Regular Expressions - Integrating Java with Regular Expressions
Thus far, you’ve worked almost exclusively with regular expressions, but not really with Java. Now it’s time to consider how the two interact. The following examples differ from the preceding ones in that they incorporate Java code with regular expressions. They offer a more complete picture of how you can use some J2SE regex syntax.
Some of the regular expressions you’ll see here are slightly more advanced than in the examples you’ve seen previously, as they build on the fundamentals discussed thus far in the chapter. For example, Listing 1-2 combines groups with quantifiers.
Don’t be discouraged if the patterns themselves aren’t completely clear to you right now. An intuitive understanding will develop as you continue to read this book. Focus on the concepts and become comfortable with how the Java code and the regex complement each other.
There are only two pieces of information you need to take full advantage of the following examples:
Any regex metacharacter that begins with a backslash (\) needs to be delimited once again when it’s used in Java code, because the backslash is also the escape character in Java String literals. Thus, \d becomes \\d and \s becomes \\s in your Java code. Correspondingly, a more complex expression such as (\d-)?(\d{3}-)?\d{3}-\d{4}\s becomes (\\d-)?(\\d{3}-)?\\d{3}-\\d{4}\\s in Java code. Every \ character is doubled to produce \\ when it appears in a String literal.
In this book, when I talk about a regular expression in and of itself, I don’t use the double delimiting mechanism. However, I do when working with specific coding examples.
The String.matches(String regex) method was added to the String class in J2SE 1.4. It compares the String it’s called on to the given regular expression, regex, and returns true only if the pattern matches the entire String. To match exactly means that the String in question can’t contain any characters, not even invisible characters such as newlines and spaces, that aren’t accounted for in the regex pattern.
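The two points above can be tried out in a few lines. The following is a minimal sketch, not part of the book’s listings: the class name EscapeAndMatchDemo and its helper methods are hypothetical, and the Matcher.find() call is included only as a contrast to the exact matching that String.matches performs.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not one of the book's listings.
class EscapeAndMatchDemo {

    /** Returns true only when the whole input is a run of one or more digits. */
    static boolean isAllDigits(String input) {
        // The regex itself is \d+ ; in a Java String literal the
        // backslash is doubled, giving "\\d+".
        return input.matches("\\d+");
    }

    /** Returns true when the input contains at least one digit anywhere. */
    static boolean containsDigit(String input) {
        // Matcher.find() searches for the pattern anywhere in the input,
        // whereas matches() must account for every character.
        return Pattern.compile("\\d").matcher(input).find();
    }

    public static void main(String[] args) {
        // The literal "\\d+" holds three characters: \, d, and +
        System.out.println("\\d+".length());         // prints 3
        System.out.println(isAllDigits("12345"));     // prints true
        System.out.println(isAllDigits("12345 "));    // prints false (trailing space)
        System.out.println(isAllDigits("12345\n"));   // prints false (trailing newline)
        System.out.println(containsDigit("abc 7"));   // prints true
    }
}
```

The trailing space and newline cases fail because nothing in \d+ accounts for those characters, which is exactly the behavior described for String.matches.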
Confirming Phone Number Formats Example
The code in Listing 1-2 simply determines if the given phone number meets the criteria of being well formatted. It takes advantage of two metacharacters introduced in Table 1-6. Specifically, it uses the range quantifier, {n,m}, indicating that the previous character or class must be repeated at least n times and no more than m times (the listing uses the shorthand form {n}, which means exactly n times). It also uses the ? character, indicating the previous character or class must be present zero or one time.
The pattern as a whole checks for seven digits preceded by optional country and area codes. Output 1-2 shows the result of running the program, and Table 1-19 dissects the pattern.
Listing 1-2. MatchPhoneNumber.java
    import java.util.regex.*;

    public class MatchPhoneNumber{
        public static void main(String args[]){
            isPhoneValid(args[0]);
        }

        /**
         * Confirms that the format for the given phone number is valid.
         * @param phone is a String representing the phone number.
         * @return true if the phone number format is acceptable.
         */
        public static boolean isPhoneValid(String phone){
            boolean retval=false;
            String phoneNumberPattern = "(\\d-)?(\\d{3}-)?\\d{3}-\\d{4}";
            retval= phone.matches(phoneNumberPattern);

            //prepare a message indicating success or failure
            String msg = " NO MATCH: pattern:" + phone
                + "\r\n regex: " + phoneNumberPattern;
            if (retval){
                msg = " MATCH : pattern:" + phone
                    + "\r\n regex: " + phoneNumberPattern;
            }
            System.out.println(msg +"\r\n");
            return retval;
        }
    }
Output 1-2. Result of Running MatchPhoneNumber.java
    ------------------------------------------------------------------
    C:\RegEx\Examples\chapter1>java MatchPhoneNumber "1-999-111-2222"
     MATCH : pattern:1-999-111-2222
     regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchPhoneNumber "999-111-2222"
     MATCH : pattern:999-111-2222
     regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchPhoneNumber "1-111-2222"
     MATCH : pattern:1-111-2222
     regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchPhoneNumber "111-2222"
     MATCH : pattern:111-2222
     regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchPhoneNumber "1.999-111-2222"
     NO MATCH: pattern:1.999-111-2222
     regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchPhoneNumber "999 111-2222"
     NO MATCH: pattern:999 111-2222
     regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchPhoneNumber "1 111 2222"
     NO MATCH: pattern:1 111 2222
     regex: (\d-)?(\d{3}-)?\d{3}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchPhoneNumber "111-JAVA"
     NO MATCH: pattern:111-JAVA
     regex: (\d-)?(\d{3}-)?\d{3}-\d{4}
In English: Look for a single digit followed by a hyphen. This is optional. Then, look for three digits followed by a hyphen. This is also optional. Next, look for three digits, followed by a hyphen, followed by four digits.
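Because the two optional parts of the phone pattern are written as groups, you can also ask which of them took part in a match. The following is a rough sketch under stated assumptions rather than the book’s own code: the class PhoneGroupsDemo and its describe method are hypothetical names, and the sketch uses the Pattern and Matcher classes from java.util.regex directly instead of String.matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not one of the book's listings.
class PhoneGroupsDemo {

    // Same pattern as in the phone number listing.
    private static final Pattern PHONE =
            Pattern.compile("(\\d-)?(\\d{3}-)?\\d{3}-\\d{4}");

    /**
     * Reports which optional parts were present in the phone number,
     * or "no match" when the format is not accepted.
     */
    static String describe(String phone) {
        Matcher m = PHONE.matcher(phone);
        if (!m.matches()) {
            return "no match";
        }
        // group(n) returns null when optional group n matched nothing.
        String country = m.group(1);  // for example "1-"
        String area = m.group(2);     // for example "999-"
        return "country=" + country + ", area=" + area;
    }

    public static void main(String[] args) {
        System.out.println(describe("1-999-111-2222")); // country=1-, area=999-
        System.out.println(describe("999-111-2222"));   // country=null, area=999-
        System.out.println(describe("111-2222"));       // country=null, area=null
        System.out.println(describe("999 111-2222"));   // no match
    }
}
```

Note that each captured group includes its trailing hyphen, because the hyphen sits inside the parentheses in the pattern.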
Confirming Zip Codes Example
The code in Listing 1-3 determines if the zip code meets the criterion of being well formatted. It checks for five digits optionally followed by a hyphen and four digits. Output 1-3 shows the result of running the program. Table 1-20 dissects the pattern.
Listing 1-3. MatchZipCodes.java
    import java.util.regex.*;
    import java.io.*;

    public class MatchZipCodes{
        public static void main(String args[]){
            isZipValid(args[0]);
        }

        /**
         * Confirms that the format for the given zip code is valid.
         * @param zip is a String representing the zip code.
         * @return true if the zip code format is acceptable.
         */
        public static boolean isZipValid(String zip){
            boolean retval=false;
            //the pattern must be a quoted String literal
            String zipCodePattern = "\\d{5}(-\\d{4})?";
            retval = zip.matches(zipCodePattern);

            //prepare a message indicating success or failure
            String msg = " NO MATCH: pattern:" + zip
                + "\r\n regex: " + zipCodePattern;
            if (retval){
                msg = " MATCH : pattern:" + zip
                    + "\r\n regex: " + zipCodePattern;
            }
            System.out.println(msg +"\r\n");
            return retval;
        }
    }
Output 1-3. Result of Running MatchZipCodes.java
    ------------------------------------------------------------------
    C:\RegEx\Examples\chapter1>java MatchZipCodes "45643-4443"
     MATCH : pattern:45643-4443
     regex: \d{5}(-\d{4})?

    C:\RegEx\Examples\chapter1>java MatchZipCodes "45643"
     MATCH : pattern:45643
     regex: \d{5}(-\d{4})?

    C:\RegEx\Examples\chapter1>java MatchZipCodes "443"
     NO MATCH: pattern:443
     regex: \d{5}(-\d{4})?

    C:\RegEx\Examples\chapter1>java MatchZipCodes "45643-44435"
     NO MATCH: pattern:45643-44435
     regex: \d{5}(-\d{4})?

    C:\RegEx\Examples\chapter1>java MatchZipCodes "45643 44435"
     NO MATCH: pattern:45643 44435
     regex: \d{5}(-\d{4})?
Table 1-20. The Pattern \d{5}(-\d{4})?

    Regex   Description
    -----   -----------
    \d      A digit
    {       Repeated exactly
    5       Five times
    }       End repetition
    (       Open group
    -       Consisting of a hyphen
    \d      A digit
    {       Repeated exactly
    4       Four times
    }       End repetition
    )       The end of this group
    ?       Look for zero or one of the preceding

In English: Look for five digits, optionally followed by a hyphen and four digits.
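If you need to check many zip codes rather than one command-line argument, it may be worth compiling the pattern once and reusing it, since String.matches compiles the regex again on every call. The following is a minimal sketch of that idea; the class name ZipValidator and the null check are assumptions added for illustration and are not part of the book’s listing.

```java
import java.util.regex.Pattern;

// Hypothetical variant of the zip code check; not one of the book's listings.
class ZipValidator {

    // Compiled once and shared; Pattern instances are safe to reuse.
    private static final Pattern ZIP = Pattern.compile("\\d{5}(-\\d{4})?");

    /** Returns true if zip is five digits, optionally followed by -dddd. */
    static boolean isZipValid(String zip) {
        // Matcher.matches() requires the whole input to fit the pattern,
        // just as String.matches does.
        return zip != null && ZIP.matcher(zip).matches();
    }

    public static void main(String[] args) {
        // Sample inputs taken from the zip code output shown earlier.
        String[] samples = {
            "45643-4443", "45643", "443", "45643-44435", "45643 44435"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isZipValid(sample));
        }
    }
}
```

Run as is, the first two samples should be reported as valid and the remaining three as invalid, in line with the results shown for the zip code listing.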
Confirming Dates Example
The code in Listing 1-4 checks the format of a given date. It confirms that the given date format consists of one or two digits followed by a hyphen, followed by one or two digits, followed by a hyphen, followed by four digits. The pattern checks only the format, so an impossible date such as 15-42-1999 still matches. Output 1-4 shows the result of running the program. Table 1-21 dissects the pattern.
Listing 1-4. MatchDates.java
    import java.util.regex.*;
    import java.io.*;

    public class MatchDates{
        public static void main(String args[]){
            isDateValid(args[0]);
        }

        /**
         * Confirms that given date format consists of one or two digits
         * followed by a hyphen, followed by one or two digits, followed
         * by a hyphen, followed by four digits
         * @param date is a String representing the date.
         * @return true if date format is acceptable.
         */
        public static boolean isDateValid(String date){
            boolean retval=false;
            //the pattern must be a quoted String literal
            String datePattern = "\\d{1,2}-\\d{1,2}-\\d{4}";
            retval = date.matches(datePattern);

            //prepare a message indicating success or failure
            String msg = " NO MATCH: pattern:" + date
                + "\r\n regexLength: " + datePattern;
            if (retval){
                msg = " MATCH : pattern:" + date
                    + "\r\n regexLength: " + datePattern;
            }
            System.out.println(msg +"\r\n");
            return retval;
        }
    }
Output 1-4. Result of Running MatchDates.java
    ------------------------------------------------------------------
    C:\RegEx\Examples\chapter1>java MatchDates "04-02-1999"
     MATCH : pattern:04-02-1999
     regexLength: \d{1,2}-\d{1,2}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchDates "15-42-1999"
     MATCH : pattern:15-42-1999
     regexLength: \d{1,2}-\d{1,2}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchDates "April fourth nineteen ninety nine"
     NO MATCH: pattern:April fourth nineteen ninety nine
     regexLength: \d{1,2}-\d{1,2}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchDates "15-42-20002"
     NO MATCH: pattern:15-42-20002
     regexLength: \d{1,2}-\d{1,2}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchDates "02-02-20002"
     NO MATCH: pattern:02-02-20002
     regexLength: \d{1,2}-\d{1,2}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchDates "04-02-02"
     NO MATCH: pattern:04-02-02
     regexLength: \d{1,2}-\d{1,2}-\d{4}

    C:\RegEx\Examples\chapter1>java MatchDates "04-02-garbage"
     NO MATCH: pattern:04-02-garbage
     regexLength: \d{1,2}-\d{1,2}-\d{4}
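As the 15-42-1999 result shows, a format check accepts dates that can’t exist. One way to go a step further is to capture the three numeric parts with groups and hand them to a date class for a calendar check. The following is only a sketch under several assumptions: the class DateCheckDemo and method isRealDate are hypothetical names, the field order is assumed to be month-day-year, and java.time.LocalDate requires Java 8 or later, which is newer than the J2SE release this chapter targets.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not one of the book's listings.
class DateCheckDemo {

    // Same shape as the date pattern above, with a group around each part.
    private static final Pattern DATE =
            Pattern.compile("(\\d{1,2})-(\\d{1,2})-(\\d{4})");

    /**
     * Returns true if the input has the expected format AND names a real
     * calendar date. Assumes month-day-year order (an assumption).
     */
    static boolean isRealDate(String date) {
        Matcher m = DATE.matcher(date);
        if (!m.matches()) {
            return false; // wrong format
        }
        int month = Integer.parseInt(m.group(1));
        int day = Integer.parseInt(m.group(2));
        int year = Integer.parseInt(m.group(3));
        try {
            // LocalDate.of throws DateTimeException for impossible dates.
            LocalDate.of(year, month, day);
            return true;
        } catch (DateTimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRealDate("04-02-1999")); // prints true
        System.out.println(isRealDate("15-42-1999")); // prints false
        System.out.println(isRealDate("02-30-2000")); // prints false
        System.out.println(isRealDate("04-02-02"));   // prints false
    }
}
```

The regex still does the first pass, rejecting anything with the wrong shape, so the date class only ever sees three well-formed numbers.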