Regular Expressions - Finding Duplicate Words Example
I discussed the code in Listing 1-7 in the “Groups and Back References” section earlier. The point in reintroducing it here is to demonstrate how regular expressions actually interact with Java code.
As you read this example, notice that it uses Pattern and Matcher objects rather than the String.matches(regex) method that most of the examples in the previous sections relied on. Try to guess why this approach has been taken. For the answer, look in the “FAQs” section at the end of this chapter. Output 1-7 shows the result of running the program. The pattern is dissected in Table 1-24.
Listing 1-7. MatchDuplicateWords.java
import java.util.regex.*;

public class MatchDuplicateWords{
    public static void main(String args[]){
        hasDuplicate(args[0]);
    }
    /**
    * Reports whether the given phrase contains duplicate words.
    * @param phrase is a String representing the phrase.
    * @return true if the phrase contains at least one
    * duplicated word.
    */
    public static boolean hasDuplicate(String phrase){
        boolean retval=false;
        String duplicatePattern =
            "\\b(\\w+) \\1\\b";
        // Compile the pattern
        Pattern p = null;
        try{
            p = Pattern.compile(duplicatePattern);
        }
        catch (PatternSyntaxException pex){
            pex.printStackTrace();
            // a nonzero status signals the failure to the caller
            System.exit(1);
        }
        //count the number of matches.
        int matches = 0;
        //get the matcher
        Matcher m = p.matcher(phrase);
        String val=null;
        //find all matching Strings
        while (m.find()){
            retval = true;
            val = ":" + m.group() +":";
            System.out.println(val);
            matches++;
        }
        //prepare a message indicating success or failure
        String msg = " NO MATCH: pattern:" + phrase
            + "\r\n regex: "
            + duplicatePattern;
        if (retval){
            msg = " MATCH : pattern:" + phrase
                + "\r\n regex: "
                + duplicatePattern;
        }
        System.out.println(msg +"\r\n");
        return retval;
    }
}
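The listing calls Matcher.find() where earlier examples called String.matches(regex). The sketch below isolates one observable difference between the two calls; it is offered as an illustration only, not as the answer given in the chapter's FAQs. The class name FindVersusMatches and its two helper methods are hypothetical and do not appear in the book.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not one of the book's listings.
class FindVersusMatches {
    // Same pattern as Listing 1-7.
    static final String DUPLICATE = "\\b(\\w+) \\1\\b";

    // String.matches succeeds only if the WHOLE phrase fits the pattern.
    static boolean wholePhraseMatches(String phrase) {
        return phrase.matches(DUPLICATE);
    }

    // Matcher.find succeeds if the pattern occurs ANYWHERE in the phrase.
    static boolean foundSomewhere(String phrase) {
        Matcher m = Pattern.compile(DUPLICATE).matcher(phrase);
        return m.find();
    }

    public static void main(String[] args) {
        String[] phrases = {"pizza pizza", "The mayor of of simpleton"};
        for (String phrase : phrases) {
            System.out.println(phrase
                + " -> matches(): " + wholePhraseMatches(phrase)
                + ", find(): " + foundSomewhere(phrase));
        }
    }
}
```

For "pizza pizza" both calls report true, because the duplicated pair is the entire input. For "The mayor of of simpleton" matches() reports false while find() reports true, because the pair is only part of the input.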
Output 1-7. Result of Running MatchDuplicateWords.java
------------------------------------------------------------------
C:\RegEx\Examples\chapter1>java MatchDuplicateWords "pizza pizza"
:pizza pizza:
MATCH : pattern:pizza pizza
regex: \b(\w+) \1\b
C:\RegEx\Examples\chapter1>java MatchDuplicateWords "Faster pussycat kill kill"
:kill kill:
MATCH : pattern:Faster pussycat kill kill
regex: \b(\w+) \1\b
C:\RegEx\Examples\chapter1>java MatchDuplicateWords "The mayor of of simpleton"
:of of:
MATCH : pattern:The mayor of of simpleton
regex: \b(\w+) \1\b
C:\RegEx\Examples\chapter1>java MatchDuplicateWords "Never Never Never Never Never"
:Never Never:
:Never Never:
MATCH : pattern:Never Never Never Never Never
regex: \b(\w+) \1\b
C:\RegEx\Examples\chapter1>java MatchDuplicateWords "222 2222"
NO MATCH: pattern:222 2222
regex: \b(\w+) \1\b
C:\RegEx\Examples\chapter1>java MatchDuplicateWords "sara sarah"
NO MATCH: pattern:sara sarah
regex: \b(\w+) \1\b
C:\RegEx\Examples\chapter1>java MatchDuplicateWords "Faster pussycat kill, kill"
NO MATCH: pattern:Faster pussycat kill, kill
regex: \b(\w+) \1\b
C:\RegEx\Examples\chapter1>java MatchDuplicateWords ". ."
NO MATCH: pattern:. .
regex: \b(\w+) \1\b
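Two results in Output 1-7 follow directly from how strict the pattern is: "kill, kill" is rejected because a comma sits between the words, and five consecutive copies of "Never" are reported as two separate pairs. If that behavior is not wanted, one possible loosening is sketched below. This variant is an assumption of this example and is not the book's pattern: \W+ accepts any run of non-word characters as the separator, the non-capturing group (?: ... )+ folds a longer run of repeats into a single match, and Pattern.CASE_INSENSITIVE lets "The the" count as a duplicate. The class name TolerantDuplicateWords and the method findRepeats are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant of Listing 1-7, not taken from the book.
class TolerantDuplicateWords {
    // Assumed pattern: a word, then one or more repeats of
    // "non-word separator(s) + the same word", ignoring case.
    static final Pattern TOLERANT = Pattern.compile(
        "\\b(\\w+)(?:\\W+\\1\\b)+", Pattern.CASE_INSENSITIVE);

    // Collects every run of repeated words found in the phrase.
    static List<String> findRepeats(String phrase) {
        List<String> found = new ArrayList<>();
        Matcher m = TOLERANT.matcher(phrase);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findRepeats("Faster pussycat kill, kill"));
        System.out.println(findRepeats("Never Never Never Never Never"));
        System.out.println(findRepeats("sara sarah"));
    }
}
```

With this variant, "Faster pussycat kill, kill" yields the single match "kill, kill", the five-word "Never" phrase yields one match covering the whole phrase, and "sara sarah" still yields nothing because the trailing \b keeps "sara" from matching the start of "sarah".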
Table 1-24. The Pattern \b(\w+) \1\b
Regex | Description
\b | A word boundary
( | Followed by a group consisting of
\w | An alphanumeric or underscore character
+ | Repeated one or more times
) | Close group
<space> | Followed by a space
\1 | Followed by the exact group of characters captured previously
\b | Followed by a word boundary
* In English: Look for a word boundary, followed by a group of alphanumeric characters, followed by a space, followed by the exact same group of alphanumeric characters found previously, followed by a word boundary. In short, look for duplicate words.
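To see the pieces of Table 1-24 at work, it can help to print what group 1 captured and where each match starts and ends. The sketch below does that with the same pattern; the class name DuplicateWordPositions and the method describe are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not one of the book's listings.
class DuplicateWordPositions {
    // Same pattern as Listing 1-7.
    static final Pattern DUPLICATE = Pattern.compile("\\b(\\w+) \\1\\b");

    // For each match, records the captured word (group 1) and the
    // half-open character span [start, end) of the whole match.
    static List<String> describe(String phrase) {
        List<String> out = new ArrayList<>();
        Matcher m = DUPLICATE.matcher(phrase);
        while (m.find()) {
            out.add("word=" + m.group(1)
                + " span=[" + m.start() + "," + m.end() + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : describe("Never Never Never Never Never")) {
            System.out.println(line);
        }
    }
}
```

Run on "Never Never Never Never Never", this reports spans [0,11) and [12,23). Each call to find() resumes where the previous match ended, so matches never overlap, which is why Output 1-7 shows two pairs for five words and leaves the last "Never" unpaired.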
This article is excerpted from Java Regular Expressions: Taming the java.util.regex Engine, written by Mehran Habibi (Apress, 2004; ISBN: 1590591070).