Regular Expressions
By: Apress Publishing
    2005-07-28

    Table of Contents:
  • Regular Expressions
  • Creating Patterns
  • Common and Boundary Characters
  • Character Classes
  • Back References
  • Integrating Java with Regular Expressions
  • Confirming Name Formats Example
  • Finding Duplicate Words Example
  • Regular Expression Operations
  • Search and Replace
  • Comparing Regex and Perl


    Regular Expressions - Comparing Regex and Perl


    (Page 11 of 11)

    Perl is probably the most popular language to offer regular expression support. As such, it makes sense to put Java’s regex support in context by comparing it to that of Perl. The distinctions you should be aware of are highlighted in the sections that follow. Generally speaking, J2SE doesn’t include some Perl constructs, because Java is a full-featured programming language that offers sophisticated conditional and logical paths of execution that are reasonable alternatives to the constructs offered by Perl.

    What Perl Offers That Java Regex Doesn’t

    There are several constructs and concepts you might be familiar with from your Perl experience that you won’t be able to use in the current implementation of Java. Because these are parts of Perl and not Java, I mention them only briefly here.

    In-place String modification by regex isn’t supported in Java. In Perl, a substitution can change the variable it is applied to. In Java, Strings are immutable objects, so you’ll have to use methods that return a new String with the modifications you need, and assign the result yourself. The earlier search and replace example shows how this works. The original String isn’t modified; the replacement method simply returns a new String that represents the modification.
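
    As a minimal sketch of that behavior (the class and method names here are illustrative and not taken from the book), a regex replacement in Java hands back a new String and leaves its input alone:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceReturnsNewString {
    // Hypothetical helper: rewrites "first last" as "last, first"
    // by referring to the two capturing groups in the replacement.
    static String swapNames(String input) {
        Pattern p = Pattern.compile("(\\w+) (\\w+)");
        Matcher m = p.matcher(input);
        // replaceAll builds and returns a new String; input is untouched.
        return m.replaceAll("$2, $1");
    }

    public static void main(String[] args) {
        String original = "John Smith";
        String swapped = swapNames(original);
        System.out.println(original); // still "John Smith"
        System.out.println(swapped);  // "Smith, John"
    }
}
```

    To keep the change, assign the returned value to a variable, as main does with swapped.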

    Perl’s conditional constructs, (?(condition)X) and (?(condition)X|Y), aren’t supported by J2SE’s regex. Because Java offers robust if-then-else support as a language feature, there’s no need for conditional constructs inside the pattern. Chapter 4 provides examples of how this works.
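
    One way to picture the Java alternative, as a sketch under assumed requirements: suppose a number may appear either bare or wrapped in parentheses, but never with only one parenthesis. In Perl this could be written with a conditional, for example ^(\()?\d+(?(1)\))$. In Java the decision can move out of the pattern and into an ordinary if-else. The class and method names below are hypothetical.

```java
import java.util.regex.Pattern;

class ConditionalInJava {
    private static final Pattern PARENTHESIZED = Pattern.compile("\\(\\d+\\)");
    private static final Pattern BARE = Pattern.compile("\\d+");

    // Accepts "(123)" or "123", but rejects "(123" and "123)".
    static boolean isNumber(String s) {
        if (s.startsWith("(")) {
            // An opening parenthesis demands a closing one.
            return PARENTHESIZED.matcher(s).matches();
        } else {
            return BARE.matcher(s).matches();
        }
    }

    public static void main(String[] args) {
        System.out.println(isNumber("(123)")); // true
        System.out.println(isNumber("123"));   // true
        System.out.println(isNumber("(123"));  // false
        System.out.println(isNumber("123)"));  // false
    }
}
```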

    Java doesn’t support the embedded code constructs (?{code}) and (??{code}). Again, these are the sorts of things that can be handled more intuitively, by Java standards, by using Java’s built-in language features.

    Java doesn’t support embedded comments by default, because your patterns can be so easily commented when you create them as Strings. However, you can use the Pattern.COMMENTS flag to compile your regex with comments if you really need to. For more on this, please see Chapter 2.
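
    A short sketch of the comments flag follows. The constant is declared in java.util.regex.Pattern as COMMENTS. The phone-number format and the class name are assumptions made only for illustration.

```java
import java.util.regex.Pattern;

class CommentedPattern {
    // Under COMMENTS, whitespace in the pattern is ignored and
    // everything from # to the end of the line is a comment.
    static final Pattern PHONE = Pattern.compile(
          "\\d{3}   # three-digit prefix\n"
        + "-        # literal hyphen\n"
        + "\\d{4}   # four-digit line number\n",
        Pattern.COMMENTS);

    static boolean isPhone(String s) {
        return PHONE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPhone("555-1234")); // true
        System.out.println(isPhone("555 1234")); // false
    }
}
```

    Because whitespace is ignored in this mode, a literal space in the pattern has to be escaped or written as \s.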

    Java doesn’t support the preprocessing operations \l, \u, \L, and \U, which Perl uses to change the case of the text that follows them.
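
    Case changes of this kind can be done in Java code instead. The following is one possible sketch, not the book’s own example: it capitalizes each lowercase-initial word, roughly what a Perl substitution such as s/\b([a-z])/\u$1/g would do, by building the replacement text in a find loop. The class name and pattern are assumptions.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CapitalizeWords {
    // Group 1 is a lowercase first letter, group 2 is the rest of the word.
    private static final Pattern WORD = Pattern.compile("\\b([a-z])(\\w*)");

    static String capitalize(String input) {
        Matcher m = WORD.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // The case conversion happens in Java, not in the regex.
            String replacement =
                m.group(1).toUpperCase(Locale.ROOT) + m.group(2);
            // quoteReplacement keeps any $ or \ in the text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalize("hello regex world")); // Hello Regex World
    }
}
```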

    What Java Regex Offers That Perl Doesn’t

    At the time of writing, possessive quantifiers are unique to Java, but they’re very likely to be adopted by other regex implementations soon, because they’re such a good idea. A possessive quantifier matches as much as it can, just as a greedy quantifier does, but it never gives back what it has matched. That means that once a possessive match is achieved, it isn’t relinquished, even if backing off would allow the overall match to succeed. I discuss possessive quantifiers in depth in Chapter 3.
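
    The difference is easiest to see side by side. In this small sketch (class and method names are illustrative), both patterns ask for one or more digits followed by one more digit. The greedy version gives back a digit so the match can succeed; the possessive version, written with ++, refuses to.

```java
import java.util.regex.Pattern;

class PossessiveDemo {
    // Greedy: \d+ first takes every digit, then backs off one
    // so that the trailing \d has something to match.
    static boolean greedy(String s) {
        return Pattern.matches("\\d+\\d", s);
    }

    // Possessive: \d++ takes every digit and never releases any,
    // so the trailing \d can never match.
    static boolean possessive(String s) {
        return Pattern.matches("\\d++\\d", s);
    }

    public static void main(String[] args) {
        System.out.println(greedy("12345"));     // true
        System.out.println(possessive("12345")); // false
    }
}
```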

    Summary

    This chapter covered some general regex syntax and introduced the concepts of the Matcher and Pattern classes. You learned some methods for creating your own regular expressions and how you might actually use them in Java. Finally, you explored some concrete examples and reasoned your way through them. Chapter 2 continues to build on this theme and provides you with a deeper understanding of Java’s regex package.

    FAQs

    Q:  The \b metacharacter seems to act inconsistently in regular expressions as I write them. What’s going on?

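    A:  In regex, \b means a word boundary. However, in a Java String literal, \b is the escape sequence for a backspace character. Here’s the rule: the String literal "\b" contains a single backspace character, so the regex engine never sees a word boundary. The String literal "\\b" contains a backslash followed by the letter b, which the regex engine interprets as a word boundary.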
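
    A quick way to convince yourself of the rule is to inspect the literals directly. This is a small sketch with a hypothetical helper method; Pattern.quote is used only so that the searched word is treated literally.

```java
import java.util.regex.Pattern;

class WordBoundaryEscape {
    // Looks for word as a whole word, using \\b on both sides.
    static boolean containsWord(String text, String word) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println("\b".length());   // 1: a single backspace character
        System.out.println("\\b".length());  // 2: a backslash and the letter b
        System.out.println(containsWord("a cat sat", "cat"));   // true
        System.out.println(containsWord("concatenate", "cat")); // false
    }
}
```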

    Q:  When should I use the String.matches method instead of the Pattern and Matcher objects directly?

    A:  Use the String.matches method if you require an exact match of the entire String. For example, if you want exactly seven consecutive digits and nothing else is acceptable, then use String.matches with the pattern \d{7} (written as "\\d{7}" in Java source). In general, if you’re prepared to narrow the definition of acceptable patterns, or if you’re willing to define every possible variation, then use the String.matches method. On the other hand, if you’re looking for the existence of a substring, you’re better served by the Pattern and Matcher objects.
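
    The two approaches can be compared on the seven-digit pattern from the answer above. This is a sketch with illustrative class and method names.

```java
import java.util.regex.Pattern;

class MatchesVersusFind {
    private static final Pattern SEVEN_DIGITS = Pattern.compile("\\d{7}");

    // True only if the whole String is exactly seven digits.
    static boolean isExactlySevenDigits(String s) {
        return s.matches("\\d{7}");
    }

    // True if seven consecutive digits appear anywhere in the String.
    static boolean containsSevenDigits(String s) {
        return SEVEN_DIGITS.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isExactlySevenDigits("1234567"));          // true
        System.out.println(isExactlySevenDigits("call 1234567 now")); // false
        System.out.println(containsSevenDigits("call 1234567 now"));  // true
    }
}
```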

    Q:  Is using the String.matches method less resource-intensive than using the Pattern and Matcher objects?

    A:  No. The String.matches method simply calls the Pattern.matches method, which in turn creates and uses both a Pattern object and a Matcher object.
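
    A practical consequence is that a pattern used many times is usually better compiled once and reused, since each call to String.matches compiles its pattern again. The sketch below assumes a made-up five-digit code format; the class and method names are illustrative.

```java
import java.util.regex.Pattern;

class ReusePattern {
    // Compiled once, then shared by every call to countValid.
    private static final Pattern FIVE_DIGITS = Pattern.compile("\\d{5}");

    // Counts the inputs that consist of exactly five digits.
    static int countValid(String... inputs) {
        int count = 0;
        for (String s : inputs) {
            // Only a new Matcher is created here; the Pattern is reused.
            if (FIVE_DIGITS.matcher(s).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countValid("12345", "1234", "abcde", "54321")); // 2
    }
}
```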

    Q:  Can I modify a String by applying a regular expression to it?

    A:  Absolutely not. Strings are immutable objects in Java, and thus they cannot be changed. However, you can create a new String object that has the requested changes. Thus, if you have

       String tmp = "Hello";

    and you want to change the e to an X by doing the following:

       String newTmp = tmp.replaceFirst("e","X");

    the value of tmp is still Hello, but the value of newTmp is HXllo.

    Q:  Why did the pattern (\p{Upper}(\p{Lower}+\s?)){2,3} match John McGee in the NameFormat.java example?

    A:  Because the group (\p{Upper}(\p{Lower}+\s?)) may repeat two or three times, and John McGee supplies three repetitions: John followed by a space is the first, Mc is the second, and Gee is the third. As a test, try running John Janis McGee through the NameFormat.java program.

    The point here is that John consists of an uppercase letter, followed by one or more lowercase letters, followed by one space. Mc consists of an uppercase letter, followed by one or more lowercase letters, followed by no space, and Gee consists of an uppercase letter, followed by one or more lowercase letters, followed by no space. This isn’t exactly what you may have had in mind, but it seems permissible in this case. It’s very important to be precise and do a lot of testing when working with regular expressions, or unexpected results are sure to follow.
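
    The reasoning above can be checked with a short sketch. NameFormat.java itself isn’t reproduced on this page, so the code below is a stand-in that assumes the whole input must match the pattern (Matcher.matches); the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameFormatCheck {
    private static final Pattern NAME =
        Pattern.compile("(\\p{Upper}(\\p{Lower}+\\s?)){2,3}");
    // One repetition of the group: uppercase, lowercase run, optional space.
    private static final Pattern PIECE =
        Pattern.compile("\\p{Upper}\\p{Lower}+\\s?");

    // Assumes the entire input has to match the pattern.
    static boolean isValidName(String s) {
        return NAME.matcher(s).matches();
    }

    // Counts how many capitalized pieces the input contains.
    static int countPieces(String s) {
        Matcher m = PIECE.matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countPieces("John McGee"));       // 3
        System.out.println(isValidName("John McGee"));       // true
        System.out.println(countPieces("John Janis McGee")); // 4
        System.out.println(isValidName("John Janis McGee")); // false
    }
}
```

    Under that whole-input assumption, John Janis McGee is rejected because it needs four repetitions and the pattern allows at most three.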

    Q:  What type of regex engine does Java use?

    A:  J2SE uses a traditional nondeterministic finite automaton (NFA) engine. This means that when the engine reaches a fork in the road, it chooses one path, remembers where the other path is in case things don’t work out, and goes from there.

    The advantage here is that, if you write efficient expressions, you can lead the engine to a match very quickly. The disadvantage is that, if you write inefficient expressions, you can lead the engine on a wild goose chase before it finally finds the match or gives up.
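
    As a rough illustration of the wild goose chase (a sketch, not a benchmark), the two patterns below accept the same Strings, but the nested quantifier in the second one gives a backtracking engine many more paths to try before it can report failure. The patterns, input, and names are assumptions chosen for the demonstration, and the timings printed will vary by JVM and machine.

```java
import java.util.regex.Pattern;

class BacktrackingCost {
    // Returns whether the whole input matches the regex.
    static boolean run(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Builds count 'a' characters followed by a 'c',
    // so neither pattern in main can match.
    static String buildInput(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append('a');
        }
        return sb.append('c').toString();
    }

    public static void main(String[] args) {
        String input = buildInput(20);
        for (String regex : new String[] {"a+b", "(a+)+b"}) {
            long start = System.nanoTime();
            boolean matched = run(regex, input);
            long micros = (System.nanoTime() - start) / 1000;
            System.out.println(regex + " matched=" + matched
                + " in " + micros + " us");
        }
    }
}
```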



    This article is excerpted from Java Regular Expressions: Taming the java.util.regex Engine, written by Mehran Habibi (Apress, 2004; ISBN: 1590591070).
