Regular expressions are a mechanism for telling the Java Virtual Machine (JVM) how to find and manipulate text for you. Using regular expressions to do this is different from the traditional approach. This article compares the two approaches. It is excerpted from Java Regular Expressions: Taming the java.util.regex Engine, written by Mehran Habibi (Apress, 2004; ISBN: 1590591070).
Regular Expressions - Finding Duplicate Words Example
I discussed the code in Listing 1-7 in the “Groups and Back References” section earlier. The point in reintroducing it here is to demonstrate how regular expressions actually interact with Java code.
As you read this example, notice that it uses a Pattern and Matcher, and not the String.matches(regex) method, as most of the examples in the previous sections have. Try to guess why this approach has been taken. For the answer, look in the “FAQs” section at the end of this chapter. Output 1-7 shows the result of running the program. The pattern is dissected in Table 1-24.
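One behavioral difference between the two approaches shows up in a small experiment. The sketch below is only an illustration: the class name MatchesVersusFind and its helper methods are invented for this example, and it is not necessarily the answer the FAQ gives. It relies on two documented behaviors: String.matches(regex) succeeds only when the entire string matches, whereas Matcher.find() succeeds when any part of the string matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the book's listings.
class MatchesVersusFind {
    static final String DUPLICATE = "\\b(\\w+) \\1\\b";

    // String.matches succeeds only if the ENTIRE string matches the regex.
    static boolean wholeStringMatches(String phrase) {
        return phrase.matches(DUPLICATE);
    }

    // Matcher.find succeeds if ANY part of the string matches the regex.
    static boolean containsMatch(String phrase) {
        Matcher m = Pattern.compile(DUPLICATE).matcher(phrase);
        return m.find();
    }

    public static void main(String[] args) {
        String phrase = "The mayor of of simpleton";
        System.out.println(wholeStringMatches(phrase));  // false
        System.out.println(containsMatch(phrase));       // true
        System.out.println(wholeStringMatches("of of")); // true
    }
}
```

The phrase contains a duplicate, but it is not itself nothing but a duplicate, so only the find-based check reports it.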
Listing 1-7. MatchDuplicateWords.java
    import java.util.regex.*;

    public class MatchDuplicateWords{
        public static void main(String[] args){
            hasDuplicate(args[0]);
        }

        /**
         * Checks whether the given phrase contains duplicate words.
         * @param phrase is a String representing the phrase.
         * @return true if the phrase contains at least one
         * duplicate word.
         */
        public static boolean hasDuplicate(String phrase){
            boolean retval=false;
            String duplicatePattern = "\\b(\\w+) \\1\\b";

            // Compile the pattern
            Pattern p = null;
            try{
                p = Pattern.compile(duplicatePattern);
            }
            catch (PatternSyntaxException pex){
                pex.printStackTrace();
                System.exit(0);
            }

            //count the number of matches.
            int matches = 0;

            //get the matcher
            Matcher m = p.matcher(phrase);
            String val=null;

            //find all matching Strings
            while (m.find()){
                retval = true;
                val = ":" + m.group() +":";
                System.out.println(val);
                matches++;
            }

            //prepare a message indicating success or failure
            String msg = " NO MATCH: pattern:" + phrase
                + "\r\n regex: " + duplicatePattern;
            if (retval){
                msg = " MATCH : pattern:" + phrase
                    + "\r\n regex: " + duplicatePattern;
            }
            System.out.println(msg +"\r\n");
            return retval;
        }
    }
Output 1-7. Result of Running MatchDuplicateWords.java
C:\RegEx\Examples\chapter1>java MatchDuplicateWords "The mayor of of simpleton"
:of of:
 MATCH : pattern:The mayor of of simpleton
 regex: \b(\w+) \1\b

C:\RegEx\Examples\chapter1>java MatchDuplicateWords "Never Never Never Never Never"
:Never Never:
:Never Never:
 MATCH : pattern:Never Never Never Never Never
 regex: \b(\w+) \1\b
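The second run reports two matches for five repetitions of "Never" because each call to find() resumes where the previous match ended, so matches never overlap. A minimal sketch that makes this visible by printing the start and end index of each match follows; the class name DuplicateSpans and the "start-end:text" format are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the book's listings.
class DuplicateSpans {
    // Returns one "start-end:text" entry per match, in the order find() reports them.
    static List<String> spans(String phrase) {
        Matcher m = Pattern.compile("\\b(\\w+) \\1\\b").matcher(phrase);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            // start() is inclusive, end() is exclusive
            out.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [0-11:Never Never, 12-23:Never Never]
        System.out.println(spans("Never Never Never Never Never"));
    }
}
```

The first match consumes the first two words, the second match consumes the third and fourth, and the fifth "Never" is left with no partner.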
Table 1-24. The Pattern \b(\w+) \1\b

Regex      Description
\b         A word boundary
(\w+)      Followed by a group of alphanumeric characters
(space)    Followed by a space
\1         Followed by the exact group of characters captured previously
\b         Followed by a word boundary

* In English: Look for a word boundary, followed by a group of alphanumeric characters, followed by a space, followed by the exact same group of alphanumeric characters found previously, followed by a word boundary. In short, look for duplicate words.
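To see what the two word boundaries contribute, it can help to remove each one in turn and try phrases that only look like duplicates. The following is a sketch under that assumption; the class name WordBoundaryDemo, the helper finds, and the test phrases are invented for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the book's listings.
class WordBoundaryDemo {
    static final String FULL        = "\\b(\\w+) \\1\\b"; // the pattern from Listing 1-7
    static final String NO_TRAILING = "\\b(\\w+) \\1";    // trailing \b removed
    static final String NO_LEADING  = "(\\w+) \\1\\b";    // leading \b removed

    // True if any part of the phrase matches the regex.
    static boolean finds(String regex, String phrase) {
        return Pattern.compile(regex).matcher(phrase).find();
    }

    public static void main(String[] args) {
        System.out.println(finds(FULL, "the the"));          // true
        System.out.println(finds(FULL, "the theme"));        // false
        System.out.println(finds(NO_TRAILING, "the theme")); // true: "the the" inside "the theme"
        System.out.println(finds(FULL, "bathe the"));        // false
        System.out.println(finds(NO_LEADING, "bathe the"));  // true: "the the" inside "bathe the"
    }
}
```

Without the trailing boundary, the back reference can match the start of a longer word; without the leading boundary, the captured group can begin in the middle of one.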