Regular Expressions - Integrating Java with Regular Expressions
Thus far, you’ve worked almost exclusively with regular expressions, but not really with Java. Now it’s time to consider how the two interact. The following examples differ from the preceding ones in that they incorporate Java code with regular expressions. They offer a more complete picture of how you can use some J2SE regex syntax.
Some of the regular expressions you'll see here are slightly more advanced than those in the previous examples, because they build on the fundamentals discussed so far in the chapter. For example, Listing 1-2 combines groups with quantifiers.
Don’t be discouraged if the patterns themselves aren’t completely clear to you right now. An intuitive understanding will develop as you continue to read this book. Focus on the concepts and become comfortable with how the Java code and the regex complement each other.
There are only two things you need to know to take full advantage of the following examples:
- Any regex metacharacter that begins with a backslash needs that backslash escaped again when the regex is written in Java code. Thus, \d becomes \\d and \s becomes \\s in your Java code. Correspondingly, a more complex expression such as (\d-)?(\d{3}-)?\d{3}-\d{4}\s becomes (\\d-)?(\\d{3}-)?\\d{3}-\\d{4}\\s in Java code. Every \ is doubled to \\ when the regex appears in a String literal. In this book, when I talk about a regular expression in and of itself, I write it with single backslashes; I use the doubled form only in specific coding examples.
- The String.matches(String regex) method, added to the String class in J2SE 1.4, compares the String it's called on to the given regular expression, regex, and returns true only if the regex pattern matches the entire String. To match exactly means that the String can't contain any characters, not even invisible characters such as newlines and spaces, that aren't accounted for in the regex pattern.
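Both points can be seen in a few lines of code. The following is a small illustrative sketch; the class name MatchesVsFind and its helper methods are hypothetical, not part of the book's listings. It shows that the regex \d{3} is written "\\d{3}" in Java source, that String.matches() fails when even a trailing space or newline is left unaccounted for, and, for contrast, that Matcher.find() from java.util.regex locates the same pattern anywhere inside a longer string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchesVsFind {
    // The regex \d{3} written as a Java string literal: the backslash is doubled.
    static final String THREE_DIGITS = "\\d{3}";

    // True only if the entire input consists of exactly three digits.
    static boolean wholeStringMatches(String input) {
        return input.matches(THREE_DIGITS);
    }

    // True if three consecutive digits appear anywhere in the input.
    static boolean containsMatch(String input) {
        Matcher m = Pattern.compile(THREE_DIGITS).matcher(input);
        return m.find();
    }

    public static void main(String[] args) {
        System.out.println(THREE_DIGITS);                    // \d{3}
        System.out.println(wholeStringMatches("123"));       // true
        System.out.println(wholeStringMatches("123 "));      // false: trailing space
        System.out.println(wholeStringMatches("123\n"));     // false: trailing newline
        System.out.println(wholeStringMatches("abc123def")); // false: extra characters
        System.out.println(containsMatch("abc123def"));      // true
    }
}
```

In other words, matches() behaves as if the pattern were anchored at both ends, while find() searches for a match anywhere in the input.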
Confirming Phone Number Formats Example
The code in Listing 1-2 determines whether the given phone number is well formatted. It takes advantage of two metacharacters introduced in Table 1-6. Specifically, it uses the range quantifier {n,m}, which indicates that the previous character or class must be repeated at least n times and no more than m times; the listing uses its fixed-count form {n}, which requires exactly n repetitions. It also uses the ? character, which indicates that the previous character, class, or group must be present zero or one time.
The pattern as a whole checks for a seven-digit number in the form ddd-dddd, optionally preceded by a one-digit country code and a three-digit area code, each followed by a hyphen. Output 1-2 shows the result of running the program, and Table 1-19 dissects the pattern.
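The quantifiers described above can be tried in isolation before reading the full listing. This sketch is illustrative only; the class QuantifierDemo and its constant names are hypothetical. Each line tests one input with String.matches() against {n}, {n,m}, or an optional group marked with ?.

```java
public class QuantifierDemo {
    // {n}: the preceding item must appear exactly n times.
    static final String EXACTLY_THREE = "\\d{3}";
    // {n,m}: the preceding item must appear at least n and no more than m times.
    static final String ONE_OR_TWO = "\\d{1,2}";
    // ?: the preceding group (\d-) may appear zero or one time.
    static final String OPTIONAL_PREFIX = "(\\d-)?\\d{3}-\\d{4}";

    public static void main(String[] args) {
        System.out.println("123".matches(EXACTLY_THREE));            // true
        System.out.println("1234".matches(EXACTLY_THREE));           // false
        System.out.println("7".matches(ONE_OR_TWO));                 // true
        System.out.println("123".matches(ONE_OR_TWO));               // false
        System.out.println("111-2222".matches(OPTIONAL_PREFIX));     // true
        System.out.println("1-111-2222".matches(OPTIONAL_PREFIX));   // true
        System.out.println("1-1-111-2222".matches(OPTIONAL_PREFIX)); // false
    }
}
```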
Listing 1-2. MatchPhoneNumber.java
import java.util.regex.*;
public class MatchPhoneNumber {
    public static void main(String args[]) {
        if (args.length < 1) {
            System.out.println("usage: java MatchPhoneNumber <phone number>");
            return;
        }
        isPhoneValid(args[0]);
    }
    /**
     * Confirms that the format for the given phone number is valid.
     * @param phone a String representing the phone number.
     * @return true if the phone number format is acceptable.
     */
    public static boolean isPhoneValid(String phone) {
        String phoneNumberPattern =
            "(\\d-)?(\\d{3}-)?\\d{3}-\\d{4}";
        boolean retval = phone.matches(phoneNumberPattern);
        // prepare a message indicating success or failure
        String msg = " NO MATCH: pattern:" + phone
                   + "\r\n regex: " + phoneNumberPattern;
        if (retval) {
            msg = " MATCH : pattern:" + phone
                + "\r\n regex: " + phoneNumberPattern;
        }
        System.out.println(msg + "\r\n");
        return retval;
    }
}
Output 1-2. Result of Running MatchPhoneNumber.java
------------------------------------------------------------------
C:\RegEx\Examples\chapter1>java MatchPhoneNumber "1-999-111-2222"
MATCH : pattern:1-999-111-2222
regex: (\d-)?(\d{3}-)?\d{3}-\d{4}
C:\RegEx\Examples\chapter1>java MatchPhoneNumber "999-111-2222"
MATCH : pattern:999-111-2222
regex: (\d-)?(\d{3}-)?\d{3}-\d{4}
C:\RegEx\Examples\chapter1>java MatchPhoneNumber "1-111-2222"
MATCH : pattern:1-111-2222
regex: (\d-)?(\d{3}-)?\d{3}-\d{4}
C:\RegEx\Examples\chapter1>java MatchPhoneNumber "111-2222"
MATCH : pattern:111-2222
regex: (\d-)?(\d{3}-)?\d{3}-\d{4}
C:\RegEx\Examples\chapter1>java MatchPhoneNumber "1.999-111-2222"
NO MATCH: pattern:1.999-111-2222
regex: (\d-)?(\d{3}-)?\d{3}-\d{4}
C:\RegEx\Examples\chapter1>java MatchPhoneNumber "999 111-2222"
NO MATCH: pattern:999 111-2222
regex: (\d-)?(\d{3}-)?\d{3}-\d{4}
C:\RegEx\Examples\chapter1>java MatchPhoneNumber "1 111 2222"
NO MATCH: pattern:1 111 2222
regex: (\d-)?(\d{3}-)?\d{3}-\d{4}
C:\RegEx\Examples\chapter1>java MatchPhoneNumber "111-JAVA"
NO MATCH: pattern:111-JAVA
regex: (\d-)?(\d{3}-)?\d{3}-\d{4}
----------------------------------------------------------
Table 1-19. The Pattern (\d-)?(\d{3}-)?\d{3}-\d{4}
Regex | Description |
( | A group consisting of |
\d | A digit |
- | Followed by a hyphen (-) |
) | The end of this group |
? | Look for zero or one of the preceding |
( | Followed by a group consisting of |
\d | A digit |
{ | Repeated exactly |
3 | Three times |
} | End repetition |
- | Followed by a hyphen |
) | The end of this group |
? | Look for zero or one of the preceding |
\d | Followed by a digit |
{ | Repeated exactly |
3 | Three times |
} | End repetition |
- | Followed by a hyphen |
\d | Followed by a digit |
{ | Repeated exactly |
4 | Four times |
} | End repetition |
* In English: Look for a single digit followed by a hyphen. This is optional. Then, look for three digits followed by a hyphen. This is also optional. Next, look for three digits, followed by a hyphen, followed by four digits. |
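Because each optional part of this pattern is a group, you can also recover those parts after a successful match. The sketch below shows one way to do that, assuming you want the pattern compiled once with Pattern.compile() and reused rather than recompiled on every String.matches() call; the class PhoneParts and the method optionalParts are hypothetical names. It relies on Matcher.group(int) returning null for an optional group that did not take part in the match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhoneParts {
    // Same pattern as Listing 1-2, compiled once so it can be reused.
    private static final Pattern PHONE =
        Pattern.compile("(\\d-)?(\\d{3}-)?\\d{3}-\\d{4}");

    /**
     * Returns {countryPart, areaPart} for a well-formatted number, or null
     * if the number does not match. Either part is null when it is absent.
     */
    static String[] optionalParts(String phone) {
        Matcher m = PHONE.matcher(phone);
        if (!m.matches()) {
            return null;
        }
        // group(n) returns null when an optional group did not participate.
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = optionalParts("1-999-111-2222");
        System.out.println(parts[0] + " " + parts[1]);          // 1- 999-
        parts = optionalParts("111-2222");
        System.out.println(parts[0] + " " + parts[1]);          // null null
        System.out.println(optionalParts("111-JAVA") == null);  // true
    }
}
```

Note that for "1-111-2222" the engine assigns "1-" to the first group and leaves the area-code group empty, because only that assignment lets the rest of the pattern match.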
Confirming Zip Codes Example
The code in Listing 1-3 determines whether the given zip code is well formatted. It checks for five digits, optionally followed by a hyphen and four digits. Output 1-3 shows the result of running the program, and Table 1-20 dissects the pattern.
Listing 1-3. MatchZipCodes.java
import java.util.regex.*;
public class MatchZipCodes {
    public static void main(String args[]) {
        if (args.length < 1) {
            System.out.println("usage: java MatchZipCodes <zip code>");
            return;
        }
        isZipValid(args[0]);
    }
    /**
     * Confirms that the format for the given zip code is valid.
     * @param zip a String representing the zip code.
     * @return true if the zip code format is acceptable.
     */
    public static boolean isZipValid(String zip) {
        String zipCodePattern = "\\d{5}(-\\d{4})?";
        boolean retval = zip.matches(zipCodePattern);
        // prepare a message indicating success or failure
        String msg = " NO MATCH: pattern:" + zip
                   + "\r\n regex: " + zipCodePattern;
        if (retval) {
            msg = " MATCH : pattern:" + zip
                + "\r\n regex: " + zipCodePattern;
        }
        System.out.println(msg + "\r\n");
        return retval;
    }
}
Output 1-3. Result of Running MatchZipCodes.java
------------------------------------------------------------------
C:\RegEx\Examples\chapter1>java MatchZipCodes "45643-4443"
MATCH : pattern:45643-4443
regex: \d{5}(-\d{4})?
C:\RegEx\Examples\chapter1>java MatchZipCodes "45643"
MATCH : pattern:45643
regex: \d{5}(-\d{4})?
C:\RegEx\Examples\chapter1>java MatchZipCodes "443"
NO MATCH: pattern:443
regex: \d{5}(-\d{4})?
C:\RegEx\Examples\chapter1>java MatchZipCodes "45643-44435"
NO MATCH: pattern:45643-44435
regex: \d{5}(-\d{4})?
C:\RegEx\Examples\chapter1>java MatchZipCodes "45643 44435"
NO MATCH: pattern:45643 44435
regex: \d{5}(-\d{4})?
Table 1-20. The Pattern \d{5}(-\d{4})?
Regex | Description |
\d | A digit |
{ | Repeated exactly |
5 | Five times |
} | End repetition |
( | Open group |
- | Consisting of a hyphen |
\d | Followed by a digit |
{ | Repeated exactly |
4 | Four times |
} | End repetition |
) | The end of this group |
? | Look for zero or one of the preceding |
* In English: Look for five digits, optionally followed by a hyphen and four digits. |
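The same group can be used to pull out the optional four-digit extension. The sketch below is illustrative; ZipParts and split are hypothetical names, and it swaps in a non-capturing group, (?:...), for the hyphen-plus-extension part so that group 1 holds the five-digit base and group 2 holds the extension. The regex still matches exactly the same strings as \d{5}(-\d{4})?.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZipParts {
    // Matches the same strings as \d{5}(-\d{4})?, with capturing groups for
    // the five-digit base (group 1) and the four-digit extension (group 2).
    private static final Pattern ZIP = Pattern.compile("(\\d{5})(?:-(\\d{4}))?");

    /** Returns {base, extension}; extension is null if absent. Returns null if malformed. */
    static String[] split(String zip) {
        Matcher m = ZIP.matcher(zip);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("45643-4443");
        System.out.println(parts[0] + " / " + parts[1]);  // 45643 / 4443
        parts = split("45643");
        System.out.println(parts[0] + " / " + parts[1]);  // 45643 / null
        System.out.println(split("45643-44435") == null); // true
    }
}
```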
Confirming Dates Example
The code in Listing 1-4 checks the format of a given date. It confirms that the date consists of one or two digits, followed by a hyphen, followed by one or two digits, followed by a hyphen, followed by four digits. Output 1-4 shows the result of running the program. Table 1-21 dissects the pattern.
Listing 1-4. MatchDates.java
import java.util.regex.*;
public class MatchDates {
    public static void main(String args[]) {
        if (args.length < 1) {
            System.out.println("usage: java MatchDates <date>");
            return;
        }
        isDateValid(args[0]);
    }
    /**
     * Confirms that the given date format consists of one or two digits,
     * followed by a hyphen, followed by one or two digits, followed
     * by a hyphen, followed by four digits.
     * @param date a String representing the date.
     * @return true if the date format is acceptable.
     */
    public static boolean isDateValid(String date) {
        String datePattern = "\\d{1,2}-\\d{1,2}-\\d{4}";
        boolean retval = date.matches(datePattern);
        // prepare a message indicating success or failure
        String msg = " NO MATCH: pattern:" + date
                   + "\r\n regex: " + datePattern;
        if (retval) {
            msg = " MATCH : pattern:" + date
                + "\r\n regex: " + datePattern;
        }
        System.out.println(msg + "\r\n");
        return retval;
    }
}
Output 1-4. Result of Running MatchDates.java
------------------------------------------------------------------
C:\RegEx\Examples\chapter1>java MatchDates "04-02-1999"
MATCH : pattern:04-02-1999
regex: \d{1,2}-\d{1,2}-\d{4}
C:\RegEx\Examples\chapter1>java MatchDates "15-42-1999"
MATCH : pattern:15-42-1999
regex: \d{1,2}-\d{1,2}-\d{4}
C:\RegEx\Examples\chapter1>java MatchDates "April fourth nineteen ninety nine"
NO MATCH: pattern:April fourth nineteen ninety nine
regex: \d{1,2}-\d{1,2}-\d{4}
C:\RegEx\Examples\chapter1>java MatchDates "15-42-20002"
NO MATCH: pattern:15-42-20002
regex: \d{1,2}-\d{1,2}-\d{4}
C:\RegEx\Examples\chapter1>java MatchDates "02-02-20002"
NO MATCH: pattern:02-02-20002
regex: \d{1,2}-\d{1,2}-\d{4}
C:\RegEx\Examples\chapter1>java MatchDates "04-02-02"
NO MATCH: pattern:04-02-02
regex: \d{1,2}-\d{1,2}-\d{4}
C:\RegEx\Examples\chapter1>java MatchDates "04-02-garbage"
NO MATCH: pattern:04-02-garbage
regex: \d{1,2}-\d{1,2}-\d{4}
----------------------------------------------------------
Table 1-21. The Pattern \d{1,2}-\d{1,2}-\d{4}
Regex | Description |
\d | A digit |
{ | Repeated at least |
1 | One time |
, | But no more than |
2 | Two times |
} | Close repetition |
- | Followed by a hyphen |
\d | Followed by a digit |
{ | Repeated at least |
1 | One time |
, | But no more than |
2 | Two times |
} | Close repetition |
- | Followed by a hyphen |
\d | Followed by a digit |
{ | Repeated exactly |
4 | Four times |
} | Close repetition |
* In English: Look for one or two digits, followed by a hyphen, followed by one or two digits, followed by a hyphen, followed by four digits. |
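As the output shows, this pattern checks only the shape of a date: "15-42-1999" matches even though 15 and 42 cannot both be a valid month and day. One way to add a value check is to capture each field and hand it to a date API. The sketch below assumes month-day-year order and Java 8 or later (java.time.LocalDate postdates the J2SE 1.4 API this chapter covers); CheckDateValues and isRealDate are hypothetical names.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CheckDateValues {
    // Listing 1-4's pattern, with each field captured in its own group.
    private static final Pattern DATE =
        Pattern.compile("(\\d{1,2})-(\\d{1,2})-(\\d{4})");

    /**
     * Returns true if the text has the shape of Listing 1-4's pattern AND
     * names a real calendar date. Month-first order is an assumption.
     */
    static boolean isRealDate(String text) {
        Matcher m = DATE.matcher(text);
        if (!m.matches()) {
            return false; // wrong format
        }
        int month = Integer.parseInt(m.group(1));
        int day = Integer.parseInt(m.group(2));
        int year = Integer.parseInt(m.group(3));
        try {
            LocalDate.of(year, month, day); // throws if month or day is out of range
            return true;
        } catch (DateTimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRealDate("04-02-1999")); // true
        System.out.println(isRealDate("15-42-1999")); // false: there is no month 15
        System.out.println(isRealDate("02-30-2004")); // false: no February 30
        System.out.println(isRealDate("04-02-02"));   // false: wrong format
    }
}
```

The regex still does the format check; the date API handles the rules (month lengths, leap years) that are awkward to express in a pattern.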
This article is excerpted from Java Regular Expressions: Taming the java.util.regex Engine, written by Mehran Habibi (Apress, 2004; ISBN: 1590591070). Check it out at your favorite bookstore.