Regular Expressions
By: Apress Publishing
    2005-07-28

    Table of Contents:
  • Regular Expressions
  • Creating Patterns
  • Common and Boundary Characters
  • Character Classes
  • Back References
  • Integrating Java with Regular Expressions
  • Confirming Name Formats Example
  • Finding Duplicate Words Example
  • Regular Expression Operations
  • Search and Replace
  • Comparing Regex and Perl


    Regular Expressions - Regular Expression Operations
    (Page 9 of 11 )

    In this section, you’ll explore slightly more realistic uses of regular expressions. In practice, people use regular expressions for tasks that fall into three broad categories:

    • Data validation: This is the process of making sure that your candidate String conforms to a specific format (e.g., making sure passwords are at least eight characters long and contain at least two digits).

    • Search and/or replace: This is another popular usage of regular expressions, and for good reason. Say you want to send a letter to all of your customers, and you want each letter to be personalized by interspersing the customer’s name throughout the letter. Of course, this is a little more complex than it sounds, because different names have different lengths, and you don’t want to overwrite the next word in your letter when you insert a longer name. Regex is a perfect solution for these types of problems.

    • Decomposing text: This can also be a challenging activity, particularly if the String in question needs to be split according to complex rules. Fortunately, doing so becomes much easier with regular expressions, as Listing 1-11 (which follows shortly) demonstrates.
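As a rough illustration of the three categories above, here is a minimal sketch that uses only the regex-aware methods on String. The class name, the {name} placeholder convention, and the sample inputs are assumptions made for this example; they are not taken from the book's listings.

```java
import java.util.Arrays;
import java.util.regex.Matcher;

class RegexCategoriesSketch {
    // Data validation: at least eight characters, at least two of them digits.
    // The lookahead checks for two digits; ".{8,}" enforces the length.
    static boolean isValidPassword(String candidate) {
        return candidate.matches("(?=.*\\d.*\\d).{8,}");
    }

    // Search and replace: swap every {name} placeholder (a hypothetical
    // convention) for the customer's name, whatever its length.
    static String personalize(String letter, String name) {
        return letter.replaceAll("\\{name\\}", Matcher.quoteReplacement(name));
    }

    // Decomposing text: split on commas or semicolons, dropping the
    // whitespace around each separator.
    static String[] splitFields(String line) {
        return line.split("\\s*[,;]\\s*");
    }

    public static void main(String[] args) {
        System.out.println(isValidPassword("passw0rd1"));
        System.out.println(personalize("Dear {name}, thank you, {name}!", "Alexandra"));
        System.out.println(Arrays.toString(splitFields("red , green;blue")));
    }
}
```

Matcher.quoteReplacement is used so that a name containing $ or \ is inserted literally instead of being read as a group reference.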

    Data Validation

    Data validation, or making sure that data matches a prescribed format, is one of the most common uses for regular expressions. This can be particularly challenging because data often takes inexact forms and is defined by unspoken rules.

    J2SE 1.4 offers you several ways to validate data. The easiest is using the new method boolean String.matches(String regex). This method confirms that the pattern passed in exactly matches the String that it’s called on.

    This exactness can be tricky, so it’s important to understand it well. For example, say you need to confirm that a given String contains the word Java, followed by a space, followed by a digit. Further, assume that your candidate String is I love Java 4. The next section demonstrates the process of working through this example.

    Data Validation with Strings Example

    This example seems simple enough, so you start out by testing the pattern Java \d. Table 1-25 shows a breakdown of the pattern.

    Table 1-25. The Pattern Java \d  

    Regex      Description
    J          A capital J
    a          Followed by the character a
    v          Followed by the character v
    a          Followed by the character a
    <space>    Followed by a single space
    \d         Followed by a digit

    That was pretty easy, so you confidently write your code, as shown in Listing 1-8.

    Listing 1-8. ValidationTest.java

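    public class ValidationTest {
      public static void main(String args[]) {
        String candidate = "I love Java 4";
        String pattern = "Java \\d";
        // Echo the inputs, then report whether the whole String matches
        System.out.println("Does candidate : " + candidate);
        System.out.println("match pattern  : " + pattern + "?");
        System.out.println();
        System.out.println(candidate.matches(pattern));
      }
    }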

    Then you run it:

    java ValidationTest

    and you watch it fail in Output 1-8.

    Output 1-8. Result of Running ValidationTest.java

    ------------------------------------------------------------------
    C:\RegEx\code>java ValidationTest
    Does candidate : I love Java 4
    match pattern  : Java \d?

    false

    What happened? Because your input string is I love Java 4, and the Java 4 is preceded by I love, the input isn’t an exact match to the pattern Java \d. It’s a partial match. So what do you do now?

    You have two options. You could modify the pattern to allow for characters before and/or after the Java 4 you want to match on, or you could just use the Pattern and Matcher objects. Let’s explore the pros and cons of each option.

    To use the String.matches(String regex) method, you need to account for any and all characters that might precede or follow the pattern Java \d. Thus, you use the pattern .*\bJava \d( |$), which Table 1-26 dissects.

    Table 1-26. The Pattern .*\bJava \d( |$)  

    Regex      Description
    .          Any character
    *          Repeated any number of times
    \b         Followed by a word boundary
    J          Followed by a capital J
    a          Followed by the character a
    v          Followed by the character v
    a          Followed by the character a
    <space>    Followed by a single space
    \d         Followed by a digit
    (          Followed by a group consisting of
    <space>    A space
    |          Or
    $          The end of the line or input
    )          Close group
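To see how the pattern dissected above behaves with String.matches, the following sketch runs it against a few candidates. The class name and every candidate other than I love Java 4 are assumptions chosen for illustration.

```java
class ExtendedPatternSketch {
    // The pattern from Table 1-26, with backslashes doubled for a Java literal.
    static final String PATTERN = ".*\\bJava \\d( |$)";

    // String.matches succeeds only if the whole candidate fits the pattern.
    static boolean check(String candidate) {
        return candidate.matches(PATTERN);
    }

    public static void main(String[] args) {
        String[] candidates = {
            "I love Java 4",       // digit at the end of the input
            "I love Java 4 ",      // digit followed by one trailing space
            "I love Java 42",      // digit followed by another digit
            "I love Java 4 today"  // text after the trailing space
        };
        for (String c : candidates) {
            System.out.println("\"" + c + "\" -> " + check(c));
        }
    }
}
```

The first two candidates match and the last two do not. Note that, as written, the pattern accepts any leading text but only a single space (or nothing) after the digit, so a candidate with further trailing text is still rejected by String.matches.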

    Data Validation with the Pattern and Matcher Objects Example

    Writing the pattern in the preceding section involved a little bit more work than expected. Let’s see if it’s any easier to use the Pattern and Matcher objects in Listing 1-9. The output is shown in Output 1-9.

    Listing 1-9. ValidationTestWithPatternAndMatcher.java

    import java.util.regex.*;

    public class ValidationTestWithPatternAndMatcher {
      public static void main(String args[]) {
        // Compile the pattern
        Pattern p = null;
        try {
          p = Pattern.compile("Java \\d");
        }
        catch (PatternSyntaxException pex) {
          pex.printStackTrace();
          System.exit(1);
        }
        // Define the candidate string
        String candidate = "I love Java 4";
        // Get the matcher
        Matcher m = p.matcher(candidate);
        // find() succeeds if any part of the candidate matches the pattern
        System.out.println("result = " + m.find());
      }
    }

    Output 1-9. Result of Running ValidationTestWithPatternAndMatcher.java

    ------------------------------------------------------------------
    C:\RegEx\Examples\chapter1>java ValidationTestWithPatternAndMatcher
    result = true

    The pattern used in Listing 1-9 is less complicated than the one in Table 1-26. It’s simply the original pattern Java \d. But the Java code requires explicit usage of the Pattern and Matcher objects, which is slightly more demanding of the programmer. You’re doing this because you want explicit access to the Matcher.find method, which allows you to examine the input string and see if any part of it matches the pattern. Again, this is in contrast to the String.matches(String regex) method, which requires an exact match.
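The difference between an exact match and a partial match can be sketched by calling three Matcher methods on the same input: matches() tests the whole input, lookingAt() tests only the beginning, and find() scans for a match anywhere. The class name and the candidates other than I love Java 4 are assumptions for this example.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsSketch {
    static final Pattern JAVA_DIGIT = Pattern.compile("Java \\d");

    // Returns the results of matches(), lookingAt(), and find(), in that order.
    static boolean[] probe(String candidate) {
        Matcher m = JAVA_DIGIT.matcher(candidate);
        boolean whole = m.matches();     // entire input must match
        boolean prefix = m.lookingAt();  // input must start with a match
        m.reset();                       // make find() scan from the beginning
        boolean anywhere = m.find();     // a match anywhere is enough
        return new boolean[] {whole, prefix, anywhere};
    }

    public static void main(String[] args) {
        for (String c : new String[] {"I love Java 4", "Java 4 rocks", "Java 4"}) {
            System.out.println(c + " -> " + Arrays.toString(probe(c)));
        }
    }
}
```

For I love Java 4, only find() returns true, which is the behavior Listing 1-9 relies on.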

    Generally speaking, there are two types of validation. The first type requires an exact match. For these, the easiest validation method is probably the String.matches(String regex) method, because it rejects anything that doesn’t match fully and completely.

    The second type of validation requires that the string contain the pattern at some point, but it doesn’t require an exact match. For example, you might require that a password contain nonalphanumeric characters. These types of validations are best achieved by using the Matcher and Pattern objects. Chapter 5 provides more complex validation examples.
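The two types of validation might be sketched as follows. The five-digit code format, the class name, and the method names are hypothetical and were picked only to show one exact-match check and one contains-style check.

```java
import java.util.regex.Pattern;

class ValidationTypesSketch {
    // Compiled once and reused; matches any single non-alphanumeric character.
    private static final Pattern NON_ALPHANUMERIC = Pattern.compile("[^A-Za-z0-9]");

    // Type 1, exact match: the whole string must be exactly five digits.
    static boolean isFiveDigitCode(String candidate) {
        return candidate.matches("\\d{5}");
    }

    // Type 2, contains: at least one non-alphanumeric character somewhere.
    static boolean hasNonAlphanumeric(String candidate) {
        return NON_ALPHANUMERIC.matcher(candidate).find();
    }

    public static void main(String[] args) {
        System.out.println(isFiveDigitCode("12345"));
        System.out.println(isFiveDigitCode("12345x"));
        System.out.println(hasNonAlphanumeric("s3cret!"));
        System.out.println(hasNonAlphanumeric("s3cret"));
    }
}
```

Keeping the compiled Pattern in a static field avoids recompiling it on every call, which String.matches does implicitly each time it is invoked.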


    This article is excerpted from Java Regular Expressions: Taming the java.util.regex Engine, written by Mehran Habibi (Apress, 2004; ISBN: 1590591070).
