
Regular Expressions


Regular expressions are a mechanism for telling the Java Virtual Machine (JVM) how to find and manipulate text for you. Using regular expressions to do this is different from the traditional approach. This article compares the two approaches. It is excerpted from Java Regular Expressions: Taming the java.util.regex Engine, written by Mehran Habibi (Apress, 2004; ISBN: 1590591070).

Author Info:
By: Apress Publishing
July 28, 2005
TABLE OF CONTENTS:
  1. · Regular Expressions
  2. · Creating Patterns
  3. · Common and Boundary Characters
  4. · Character Classes
  5. · Back References
  6. · Integrating Java with Regular Expressions
  7. · Confirming Name Formats Example
  8. · Finding Duplicate Words Example
  9. · Regular Expression Operations
  10. · Search and Replace
  11. · Comparing Regex and Perl


Regular Expressions - Finding Duplicate Words Example
(Page 8 of 11 )

I discussed the code in Listing 1-7 in the “Groups and Back References” section earlier. The point in reintroducing it here is to demonstrate how regular expressions actually interact with Java code.

As you read this example, notice that it uses a Pattern and Matcher, and not the String.matches(regex) method, as most of the examples in the previous sections have. Try to guess why this approach has been taken. For the answer, look in the “FAQs” section at the end of this chapter. Output 1-7 shows the result of running the program. The pattern is dissected in Table 1-24.
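Without anticipating the book's FAQ answer, one observable difference between the two approaches can be checked directly: String.matches(regex) succeeds only when the whole string matches the pattern, whereas Matcher.find() looks for the pattern anywhere inside the string. The following is a minimal sketch of that difference. The class name FindVersusMatches and its two helper methods are hypothetical names chosen for this illustration; they do not appear in the book.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only to illustrate the two APIs.
class FindVersusMatches {
    static final String DUPLICATE = "\\b(\\w+) \\1\\b";

    // True only if the entire phrase is a single duplicated pair.
    static boolean wholeStringMatches(String phrase) {
        return phrase.matches(DUPLICATE);
    }

    // True if a duplicated pair occurs anywhere in the phrase.
    static boolean containsMatch(String phrase) {
        Matcher m = Pattern.compile(DUPLICATE).matcher(phrase);
        return m.find();
    }

    public static void main(String[] args) {
        String phrase = "The mayor of of simpleton";
        System.out.println(wholeStringMatches(phrase));        // false
        System.out.println(containsMatch(phrase));             // true
        System.out.println(wholeStringMatches("pizza pizza")); // true
    }
}
```

A phrase such as "pizza pizza" satisfies both checks, but a duplicate buried inside a longer sentence is visible only to find().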

Listing 1-7. MatchDuplicateWords.java

import java.util.regex.*;
public class MatchDuplicateWords{
  public static void main(String args[]){
    hasDuplicate(args[0]);
  }

  /**
   * Checks whether the given phrase contains a duplicated word.
   * @param phrase is a String representing the phrase.
   * @return true if the phrase contains at least one
   * duplicated word.
   */
  public static boolean hasDuplicate(String phrase){
    boolean retval=false;
    String duplicatePattern =
    "\\b(\\w+) \\1\\b";
    // Compile the pattern
    Pattern p = null;
    try{
      p = Pattern.compile(duplicatePattern);
    }
    catch (PatternSyntaxException pex){
      pex.printStackTrace();
      System.exit(0);
    }
    //count the number of matches.
    int matches = 0;
    //get the matcher
    Matcher m = p.matcher(phrase);
    String val=null;
    //find all matching Strings
    while (m.find()){
      retval = true;
      val = ":" + m.group() +":";
      System.out.println(val);
      matches++;
    }
    //prepare a message indicating success or failure
    String msg = "   NO MATCH: pattern:" + phrase
          + "\r\n             regex: "
          + duplicatePattern;
    if (retval){
      msg = "  MATCH    : pattern:" + phrase
          + "\r\n             regex: "
          + duplicatePattern;
    }
    System.out.println(msg +"\r\n");
    return retval;
  }
}

Output 1-7. Result of Running MatchDuplicateWords.java

------------------------------------------------------------------
C:\RegEx\Examples\chapter1>java MatchDuplicateWords "pizza pizza"
:pizza pizza:
  MATCH    : pattern:pizza pizza
             regex: \b(\w+) \1\b

C:\RegEx\Examples\chapter1>java MatchDuplicateWords "Faster pussycat kill kill"
:kill kill:
  MATCH    : pattern:Faster pussycat kill kill
             regex: \b(\w+) \1\b

C:\RegEx\Examples\chapter1>java MatchDuplicateWords "The mayor of of simpleton"
:of of:
  MATCH    : pattern:The mayor of of simpleton
             regex: \b(\w+) \1\b

C:\RegEx\Examples\chapter1>java MatchDuplicateWords "Never Never Never Never Never"
:Never Never:
:Never Never:
  MATCH    : pattern:Never Never Never Never Never
             regex: \b(\w+) \1\b

C:\RegEx\Examples\chapter1>java MatchDuplicateWords "222 2222"
  NO MATCH: pattern:222 2222
            regex: \b(\w+) \1\b

C:\RegEx\Examples\chapter1>java MatchDuplicateWords "sara sarah"
  NO MATCH: pattern:sara sarah
            regex: \b(\w+) \1\b

C:\RegEx\Examples\chapter1>java MatchDuplicateWords "Faster pussycat kill, kill"
  NO MATCH: pattern:Faster pussycat kill, kill
            regex: \b(\w+) \1\b

C:\RegEx\Examples\chapter1>java MatchDuplicateWords ". ." 
  NO MATCH: pattern:. .
            regex: \b(\w+) \1\b
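The "Never Never Never Never Never" run above prints two matches rather than four because find() resumes scanning after the end of the previous match, so matches never overlap. The sketch below collects the repeated word from each match to make that visible. The class DuplicateWordFinder and the method duplicatedWords are hypothetical names introduced for this illustration; only the pattern itself comes from Listing 1-7.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the book's listing.
class DuplicateWordFinder {
    private static final Pattern DUPLICATE =
        Pattern.compile("\\b(\\w+) \\1\\b");

    // Returns the repeated word from each match, in order of appearance.
    static List<String> duplicatedWords(String phrase) {
        List<String> words = new ArrayList<>();
        Matcher m = DUPLICATE.matcher(phrase);
        while (m.find()) {
            // group(1) is the text captured by (\w+)
            words.add(m.group(1));
        }
        return words;
    }

    public static void main(String[] args) {
        // Five words form two non-overlapping pairs; the fifth is left over.
        System.out.println(duplicatedWords("Never Never Never Never Never"));
        // The comma sits between the words, so the literal space cannot match.
        System.out.println(duplicatedWords("Faster pussycat kill, kill"));
    }
}
```

The first call yields a list with two entries and the second an empty list, consistent with Output 1-7.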

Table 1-24. The Pattern \b(\w+) \1\b

Regex      Description
\b         A word boundary
(          Followed by a group consisting of
\w         An alphanumeric or underscore character
+          Repeated one or more times
)          Close group
<space>    Followed by a space
\1         Followed by the exact group of characters captured previously
\b         Followed by a word boundary

* In English: Look for a word boundary, followed by a group of alphanumeric characters, followed by a space, followed by the exact same group of alphanumeric characters found previously, followed by a word boundary. In short, look for duplicate words.
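To see what the two word boundaries contribute, it can help to drop each one in turn and compare results. The following is a rough sketch under that assumption. The class BoundaryDemo, the method found, and the two reduced patterns are made up for this comparison and are not taken from the book.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing the full pattern with reduced variants.
class BoundaryDemo {
    // True if the regex matches anywhere in the phrase.
    static boolean found(String regex, String phrase) {
        return Pattern.compile(regex).matcher(phrase).find();
    }

    public static void main(String[] args) {
        String full = "\\b(\\w+) \\1\\b";
        String noTrailing = "\\b(\\w+) \\1"; // trailing \b removed
        String noLeading = "(\\w+) \\1\\b";  // leading \b removed

        // Without the trailing boundary, "sara sara" inside "sara sarah" matches.
        System.out.println(found(full, "sara sarah"));       // false
        System.out.println(found(noTrailing, "sara sarah")); // true

        // Without the leading boundary, the "is" ending "this" pairs with "is".
        System.out.println(found(full, "this is"));          // false
        System.out.println(found(noLeading, "this is"));     // true
    }
}
```

Both boundaries are needed for the pattern to compare whole words rather than word fragments, which is why "222 2222" and "sara sarah" are reported as NO MATCH in Output 1-7.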

