
Regular Expressions


Regular expressions are a mechanism for telling the Java Virtual Machine (JVM) how to find and manipulate text for you. Using regular expressions to do this is different from the traditional approach. This article compares the two approaches. It is excerpted from Java Regular Expressions: Taming the java.util.regex Engine, written by Mehran Habibi (Apress, 2004; ISBN: 1590591070).

Author Info:
By: Apress Publishing
July 28, 2005
TABLE OF CONTENTS:
  1. Regular Expressions
  2. Creating Patterns
  3. Common and Boundary Characters
  4. Character Classes
  5. Back References
  6. Integrating Java with Regular Expressions
  7. Confirming Name Formats Example
  8. Finding Duplicate Words Example
  9. Regular Expression Operations
  10. Search and Replace
  11. Comparing Regex and Perl


Regular Expressions - Comparing Regex and Perl

Perl is probably the most popular language to offer regular expression support. As such, it makes sense to put Java's regex support in context by comparing it to that of Perl. The distinctions you should be aware of are highlighted in the sections that follow. Generally speaking, J2SE doesn't include some Perl constructs, because Java is a full-featured programming language that offers sophisticated conditional and logical paths of execution that are reasonable alternatives to the constructs offered by Perl.

What Perl Offers That Java Regex Doesn't

There are several constructs and concepts you might be familiar with from your Perl experience that you won't be able to use in the current implementation of Java. Because these are parts of Perl and not Java, I mention them only briefly here.

Regex cannot modify a String in place in Java. Strings are immutable objects in Java, so instead of an operator that rewrites the variable it is applied to, as Perl's s/// does, you'll have to use methods such as String.replaceAll or Matcher.replaceAll, which return a new String with the modifications you need. The earlier search and replace example shows how this works. The original String isn't modified; the method simply returns a new String that represents the modification.

Perl's conditional constructs, (?(condition)X) and (?(condition)X|Y), aren't supported by J2SE's regex. Because Java offers robust if-then-else support as a language feature, the same decision can be made in ordinary code once the match has been attempted, so there's no need for conditional constructs. Chapter 4 provides examples of how this works.
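As a rough illustration of moving a regex conditional into Java code, consider a three-digit area code that may be wrapped in parentheses, where a closing parenthesis is allowed only if an opening one was present. The class name, method name, and the area-code scenario below are hypothetical and are not taken from the book; this is a minimal sketch, not the approach Chapter 4 necessarily uses.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example: the pairing rule that Perl could express with
// (?(1)\)) is checked with plain Java logic after the match.
final class ConditionalInJava {
    // Both parentheses are optional as far as the pattern is concerned.
    private static final Pattern AREA_CODE = Pattern.compile("(\\()?\\d{3}(\\))?");

    static boolean isValidAreaCode(String input) {
        Matcher m = AREA_CODE.matcher(input);
        if (!m.matches()) {
            return false;
        }
        // A group that did not take part in the match returns null.
        boolean hasOpen = m.group(1) != null;
        boolean hasClose = m.group(2) != null;
        // Accept only "(555)" or "555", never "(555" or "555)".
        return hasOpen == hasClose;
    }

    public static void main(String[] args) {
        System.out.println(isValidAreaCode("(555)")); // true
        System.out.println(isValidAreaCode("555"));   // true
        System.out.println(isValidAreaCode("(555"));  // false
        System.out.println(isValidAreaCode("555)"));  // false
    }
}
```

The pattern stays simple, and the rule that would otherwise be hidden inside the expression is visible as an ordinary boolean comparison.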

Java doesn't support the embedded code constructs (?{code}) and (??{code}). Again, these are the sorts of things that can be handled more intuitively, by Java standards, by using Java's built-in language features.

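Java doesn't support Perl's embedded comment construct, (?#comment), because your patterns can be so easily commented when you create them as Strings. However, you can use the Pattern.COMMENTS flag to compile your regex with comments if you really need to. With that flag set, whitespace in the pattern is ignored and everything from a # to the end of the line is treated as a comment. For more on this, please see Chapter 2.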
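A short sketch of the Pattern.COMMENTS flag follows. The phone-number pattern and the class and method names are made up for illustration; only Pattern.compile and Pattern.COMMENTS come from java.util.regex.

```java
import java.util.regex.Pattern;

// Hypothetical example of a pattern compiled with Pattern.COMMENTS.
final class CommentedPattern {
    // Whitespace inside the pattern is ignored, and each # starts a
    // comment that runs to the end of that line.
    static final Pattern LOCAL_NUMBER = Pattern.compile(
            "\\d{3}   # exchange: three digits\n"
          + "-?       # optional hyphen\n"
          + "\\d{4}   # line number: four digits",
            Pattern.COMMENTS);

    static boolean isLocalNumber(String input) {
        return LOCAL_NUMBER.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isLocalNumber("555-1234")); // true
        System.out.println(isLocalNumber("5551234"));  // true
        System.out.println(isLocalNumber("555 1234")); // false
    }
}
```

Note that the spaces used to line up the comments are not part of the expression, which is why the input with a space in it is rejected.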

Java doesn't support the preprocessing operations \l, \u, \L, and \U.
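In Perl, these operations change the case of text in a replacement. One way to get a similar effect in Java is to compute each replacement in code with Matcher.appendReplacement and Matcher.appendTail. The sketch below capitalizes the first letter of every word; the class name, method name, and word pattern are assumptions chosen for the example.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a Perl replacement that uses \u:
// the case change is done with String methods, not by the regex engine.
final class CapitalizeWords {
    private static final Pattern WORD = Pattern.compile("\\p{Alpha}+");

    static String capitalizeWords(String text) {
        Matcher m = WORD.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String word = m.group();
            String capitalized =
                    word.substring(0, 1).toUpperCase(Locale.ROOT) + word.substring(1);
            // quoteReplacement guards against $ or \ in the replacement text.
            m.appendReplacement(out, Matcher.quoteReplacement(capitalized));
        }
        // Copy whatever follows the last match.
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalizeWords("hello regex world")); // Hello Regex World
    }
}
```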

What Java Regex Offers That Perl Doesn't

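Possessive quantifiers were unique to Java when this was written, and other regex implementations have since adopted them because they're such a good idea (Perl added them in version 5.10). A possessive quantifier, written by appending + to a quantifier (for example, .*+), matches as much as a greedy quantifier would, but it never gives back what it has matched. That means that once a possessive match is achieved, it isn't relinquished, even if giving some of it up would let the rest of the pattern succeed. I discuss possessive quantifiers in depth in Chapter 3.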
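The difference between a greedy and a possessive quantifier can be seen with a pair of tiny patterns. The class and method names below are hypothetical; the behavior shown is that of java.util.regex.

```java
import java.util.regex.Pattern;

// Hypothetical demo: the same input tested against a greedy and a
// possessive version of the same expression.
final class PossessiveDemo {
    static boolean greedyMatches(String input) {
        // .* first consumes everything, then backs off until "foo" fits.
        return Pattern.matches(".*foo", input);
    }

    static boolean possessiveMatches(String input) {
        // .*+ consumes everything and never backs off, so nothing is
        // left for "foo" to match.
        return Pattern.matches(".*+foo", input);
    }

    public static void main(String[] args) {
        System.out.println(greedyMatches("xfoo"));     // true
        System.out.println(possessiveMatches("xfoo")); // false
    }
}
```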

Summary

This chapter covered some general regex syntax and introduced the concepts of the Matcher and Pattern classes. You learned some methods for creating your own regular expressions and how you might actually use them in Java. Finally, you explored some concrete examples and reasoned your way through them. Chapter 2 continues to build on this theme and provide you with a deeper understanding of Java's regex package.

FAQs

Q:  The \b metacharacter seems to act inconsistently in regular expressions as I write them. What's going on?

A:  In regex, \b means a word boundary. In a Java String literal, however, \b is the escape sequence for a backspace character. Here's the rule: The String literal "\b" produces a backspace character. However, the String literal "\\b" produces the two characters \b, which the regex engine reads as a word boundary.
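The two spellings can be compared side by side. In this sketch the class and method names and the sample word are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo of "\\b" (word boundary) versus "\b" (backspace).
final class WordBoundaryDemo {
    // The regex engine receives \bcat\b: "cat" as a whole word.
    static final Pattern WHOLE_WORD = Pattern.compile("\\bcat\\b");
    // The regex engine receives backspace, c, a, t, backspace.
    static final Pattern WITH_BACKSPACES = Pattern.compile("\bcat\b");

    static boolean containsWholeWord(String input) {
        return WHOLE_WORD.matcher(input).find();
    }

    static boolean containsBackspacedCat(String input) {
        return WITH_BACKSPACES.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWholeWord("a cat sat"));     // true
        System.out.println(containsWholeWord("concatenate"));   // false
        System.out.println(containsBackspacedCat("a cat sat")); // false
        System.out.println(containsBackspacedCat("\bcat\b"));   // true
    }
}
```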

Q:  When should I use the String.matches method instead of the Pattern and Matcher objects directly?

A:  Use the String.matches method if you require an exact match of the entire String. For example, if you want exactly seven consecutive digits and nothing else is acceptable, then use String.matches with the pattern \d{7} (written "\\d{7}" in Java source). In general, if you're prepared to narrow the definition of acceptable patterns, or if you're willing to define every possible variation, then use the String.matches method. On the other hand, if you're looking for the existence of a substring, you're better served by the Pattern and Matcher objects.
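The contrast between a whole-String match and a substring search might look like this. The class name, method names, and sample inputs are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo: String.matches tests the whole input, while
// Matcher.find looks for the pattern anywhere inside it.
final class MatchesVersusFind {
    private static final Pattern SEVEN_DIGITS = Pattern.compile("\\d{7}");

    static boolean isExactlySevenDigits(String input) {
        return input.matches("\\d{7}");
    }

    static boolean containsSevenDigits(String input) {
        return SEVEN_DIGITS.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(isExactlySevenDigits("5551234"));          // true
        System.out.println(isExactlySevenDigits("call 5551234 now")); // false
        System.out.println(containsSevenDigits("call 5551234 now"));  // true
    }
}
```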

Q:  Is using the String.matches method less resource-intensive than using the Pattern and Matcher objects?

A:  No. The String.matches method simply calls the Pattern.matches method, which in turn creates and uses both a Pattern object and a Matcher object.
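One practical consequence is that a pattern used many times can be compiled once and reused, rather than recompiled on every String.matches call. The sketch below shows both styles producing the same answer; the class and method names are hypothetical, and whether the difference matters in practice depends on your workload.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical comparison of the two styles; both return the same count.
final class ReusePattern {
    // Compiled once, when the class is loaded.
    private static final Pattern SEVEN_DIGITS = Pattern.compile("\\d{7}");

    static int countWithStringMatches(List<String> inputs) {
        int count = 0;
        for (String s : inputs) {
            // Compiles the pattern again on every call.
            if (s.matches("\\d{7}")) {
                count++;
            }
        }
        return count;
    }

    static int countWithCompiledPattern(List<String> inputs) {
        int count = 0;
        for (String s : inputs) {
            // Reuses the compiled Pattern; only a new Matcher is created.
            if (SEVEN_DIGITS.matcher(s).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("5551234", "555-1234", "1234567", "12345678");
        System.out.println(countWithStringMatches(inputs));   // 2
        System.out.println(countWithCompiledPattern(inputs)); // 2
    }
}
```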

Q:  Can I modify a String by applying a regular expression to it?

A:  Absolutely not. Strings are immutable objects in Java, and thus they cannot be changed. However, you can create a new String object that has the requested changes. Thus, if you have

   String tmp = "Hello";

and you change the e to an X by doing the following:

   String newTmp = tmp.replaceFirst("e","X");

the value of tmp is still Hello, but the value of newTmp is HXllo.

Q:  Why did the pattern (\p{Upper}(\p{Lower}+\s?)){2,3} match John McGee in the NameFormat.java example?

A:  Because the outer group, which the pattern allows to repeat two or three times, matches three times: once for John (with its trailing space), once for Mc, and once for Gee. As a test, try running John Janis McGee through the NameFormat.java program.

The point here is that John consists of an uppercase letter, followed by one or more lowercase letters, followed by one space. Mc consists of an uppercase letter, followed by one or more lowercase letters, followed by no space, and Gee consists of an uppercase letter, followed by one or more lowercase letters, followed by no space. This isn't exactly what you may have had in mind, but it seems permissible in this case. It's very important to be precise and do a lot of testing when working with regular expressions, or unexpected results are sure to follow.
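The behavior can be checked with a few lines of code. NameFormat.java itself is not reproduced here, so this sketch assumes the program tests the whole input with String.matches; the class and method names are hypothetical.

```java
// Hypothetical check of the name pattern, assuming a whole-String match.
final class NameFormatCheck {
    // Two or three repetitions of: uppercase letter, one or more
    // lowercase letters, optional whitespace character.
    private static final String NAME_PATTERN = "(\\p{Upper}(\\p{Lower}+\\s?)){2,3}";

    static boolean isAcceptedName(String name) {
        return name.matches(NAME_PATTERN);
    }

    public static void main(String[] args) {
        // John + Mc + Gee: three repetitions.
        System.out.println(isAcceptedName("John McGee"));       // true
        // John + Janis + Mc + Gee: four repetitions, one too many.
        System.out.println(isAcceptedName("John Janis McGee")); // false
        System.out.println(isAcceptedName("John Smith"));       // true
    }
}
```

Under that assumption, McGee uses up two of the three allowed repetitions, which is why adding a middle name pushes the input past the limit.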

Q:  What type of regex engine does Java use?

A:  J2SE uses a traditional nondeterministic finite automaton (NFA) engine. This means that when the engine reaches a fork in the road, it chooses one path, remembers where the other path is in case things don't work out, and goes from there.

The advantage here is that an efficient expression can lead the engine to a match very, very quickly. The disadvantage is that an inefficient expression can lead the engine on a wild goose chase, trying and abandoning many paths before it finally gets the match or gives up.
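A common illustration of the wild goose chase is a pattern with nested quantifiers. The two patterns below accept exactly the same Strings, but the nested one gives a backtracking engine many different ways to divide a run of a's, all of which may be tried before a failure is reported. The class and method names are hypothetical, and the inputs are kept short on purpose because the extra work can grow very quickly with the length of the input.

```java
import java.util.regex.Pattern;

// Hypothetical demo: same results, potentially very different effort.
final class BacktrackingDemo {
    // Nested quantifiers: many ways to split the a's between the
    // inner + and the outer +.
    static final Pattern NESTED = Pattern.compile("(a+)+b");
    // Accepts the same Strings, with only one way to consume the a's.
    static final Pattern FLAT = Pattern.compile("a+b");

    static boolean matchesNested(String input) {
        return NESTED.matcher(input).matches();
    }

    static boolean matchesFlat(String input) {
        return FLAT.matcher(input).matches();
    }

    public static void main(String[] args) {
        String hit = "aaaaaaaaaaaaaaaab";
        String miss = "aaaaaaaaaaaaaaaa"; // no trailing b, so every path fails
        System.out.println(matchesNested(hit));  // true
        System.out.println(matchesFlat(hit));    // true
        System.out.println(matchesNested(miss)); // false
        System.out.println(matchesFlat(miss));   // false
    }
}
```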


