
Regular Expressions


Regular expressions are a mechanism for telling the Java Virtual Machine (JVM) how to find and manipulate text for you. Using regular expressions to do this is different from the traditional approach. This article compares the two approaches. It is excerpted from Java Regular Expressions: Taming the java.util.regex Engine, written by Mehran Habibi (Apress, 2004; ISBN: 1590591070).

Author Info:
By: Apress Publishing
July 28, 2005
TABLE OF CONTENTS:
  1. Regular Expressions
  2. Creating Patterns
  3. Common and Boundary Characters
  4. Character Classes
  5. Back References
  6. Integrating Java with Regular Expressions
  7. Confirming Name Formats Example
  8. Finding Duplicate Words Example
  9. Regular Expression Operations
  10. Search and Replace
  11. Comparing Regex and Perl


Regular Expressions - Character Classes
(Page 4 of 11)

There are times when you need to describe your search criteria as a class—that is, as a group that shares potentially complex commonalities that you need to be able to describe and for which there are no predefined classes. Fortunately, regex provides a mechanism for doing so through character classes, as shown in Table 1-11.

Table 1-11. Character Classes  

Pattern          Description
[abc]            a, b, or c. (Of course, any character could be used, not just a, b, or c.)
[^abc]           Any character except a, b, or c.
[a-zA-Z]         a through z or A through Z.
[a-d[m-p]]       a through d, or m through p: [a-dm-p].
[a-z&&[def]]     Whatever exists in both sets, namely d, e, or f.
[a-z&&[^bc]]     a through z, except for b and c: [ad-z].
[a-z&&[^m-p]]    a through z, and not m through p: [a-lq-z].
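
The rows of Table 1-11 can be checked directly against java.util.regex. The following is a minimal sketch; the class name CharacterClassDemo and the helper matches are illustrative names, not part of the original text. Each call tests a single input character against one of the classes from the table.

```java
import java.util.regex.Pattern;

class CharacterClassDemo {
    // Returns true when the entire input matches the regex.
    // With a one-character class, that means the input is one matching character.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[abc]", "b"));          // true
        System.out.println(matches("[^abc]", "b"));         // false: b is excluded
        System.out.println(matches("[a-zA-Z]", "Q"));       // true
        System.out.println(matches("[a-d[m-p]]", "n"));     // true: union of a-d and m-p
        System.out.println(matches("[a-z&&[def]]", "e"));   // true: intersection
        System.out.println(matches("[a-z&&[^bc]]", "b"));   // false: b is subtracted
        System.out.println(matches("[a-z&&[^m-p]]", "q"));  // true: q is outside m-p
    }
}
```

The nested-bracket union and the && intersection are Java regex syntax and may not carry over to other regex flavors.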

There are also some predefined Portable Operating System Interface for UNIX (POSIX) character classes. These are American Standard Code for Information Interchange (ASCII) classes that experience has shown to be particularly useful. They’re already defined, so you can simply refer to them by name. Table 1-12 contains the POSIX character classes.

Table 1-12. POSIX Character Classes  

Pattern       Description
\p{Lower}     A lowercase letter: [a-z]
\p{Upper}     An uppercase letter: [A-Z]
\p{ASCII}     All ASCII characters: [\x00-\x7F]
\p{Alpha}     An upper- or lowercase letter: [\p{Lower}\p{Upper}]
\p{Digit}     A digit: [0-9]
\p{Alnum}     A number or a letter: [\p{Alpha}\p{Digit}]
\p{Punct}     Punctuation: one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph}     Any visible character: [\p{Alnum}\p{Punct}]
\p{Print}     A printable character: [\p{Graph}]
\p{Blank}     A tab or space
\p{Cntrl}     A control character: [\x00-\x1F\x7F]
\p{XDigit}    A hexadecimal digit: [0-9a-fA-F]
\p{Space}     A whitespace character: [ \t\n\x0B\f\r]
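
As a rough illustration of Table 1-12, the sketch below tries a few POSIX classes against sample inputs. The class name PosixClassDemo and the helper matches are hypothetical. Note that in a Java string literal the backslash must itself be escaped, so \p{Lower} is written "\\p{Lower}".

```java
import java.util.regex.Pattern;

class PosixClassDemo {
    // Returns true when the entire input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\p{Lower}", "q"));        // true
        System.out.println(matches("\\p{Upper}", "q"));        // false
        System.out.println(matches("\\p{XDigit}+", "CAFE42")); // true: all hex digits
        System.out.println(matches("\\p{Alnum}+", "abc_123")); // false: _ is punctuation
        System.out.println(matches("\\p{Space}", "\t"));       // true
        // These classes are ASCII-only by default, so an accented
        // lowercase letter (here U+00E9) is not matched by \p{Lower}.
        System.out.println(matches("\\p{Lower}", "\u00e9"));   // false
    }
}
```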

Simple Class Example

Let’s step through some simple examples. The pattern [0-5] matches any single character of the input that is a digit from 0 through 5: that is, 0, 1, 2, 3, 4, or 5. It won’t match 6, 8, or any nondigit character. Table 1-13 dissects the pattern.

Table 1-13. The Pattern [0-5]  

Regex    Description
[        A class consisting of
0        The digit 0
-        Ranging through
5        The digit 5
]        Close class

* In English: Look for any digit ranging from 0 to 5, including 0 and 5.
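
To see the pattern [0-5] at work, the following sketch collects every match found in a sample string. The class name DigitRangeDemo, the helper findAll, and the sample input are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitRangeDemo {
    // Collects every substring of the input that the regex matches, in order.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // 4, 1, and 5 fall in the range; 8 and 6 do not.
        System.out.println(findAll("[0-5]", "room 4, seat 86, row 15")); // [4, 1, 5]
    }
}
```

Because the class matches one character at a time, 15 is reported as two separate matches, 1 and 5.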

Negation Example

The pattern [^A] will match any character except the character A. This includes other letters, spaces, tabs, punctuation, and so on. It’s important to notice that the ^ character means “not” only when it is the first character inside a class, that is, immediately after the opening [ bracket. Outside the brackets, it anchors the match to the beginning of the line. I cover this topic in more detail later. Table 1-14 dissects the pattern.

Table 1-14. The Pattern [^A]  

Regex    Description
[        A class consisting of
^        Any character except
A        The character A
]        Close class

* In English: Look for any character except the capital letter A.
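
The two meanings of ^ can be contrasted with a small sketch. The class name NegationDemo and its two helper methods are hypothetical. The first replaces every character that is not A; the second replaces an A only when it sits at the very start of the input.

```java
class NegationDemo {
    // Inside brackets, ^ negates the class: every character other than A is replaced.
    static String maskNonA(String input) {
        return input.replaceAll("[^A]", "-");
    }

    // Outside brackets, ^ is an anchor: only an A at the start of the input is replaced.
    static String maskLeadingA(String input) {
        return input.replaceAll("^A", "-");
    }

    public static void main(String[] args) {
        System.out.println(maskNonA("BANANA"));     // -A-A-A
        System.out.println(maskLeadingA("BANANA")); // BANANA (input does not start with A)
        System.out.println(maskLeadingA("ABA"));    // -BA
    }
}
```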

Groups and Back References

Groups are simply logical divisions of the text. When you describe a group in regex, you’re providing a mechanism for the JVM to treat characters that fall into that group in a specific way.

Back references allow the regex pattern to refer to the text a group has already captured, even while the match is still in progress. Groups are numbered, so a pattern can refer to the last group it found, the one before that, or any earlier one.

In the sections that follow, I cover the topics of groups and back references in more detail and present an example for each.
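
As a quick preview of a back reference, the sketch below uses \1 to require that the text captured by group 1 appear again immediately after a space. The pattern, the class name BackReferenceDemo, and the helper firstRepeatedWord are assumptions chosen for illustration, not taken from the original text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackReferenceDemo {
    // (\w+) captures a word as group 1; \1 then demands the same text again.
    // The \b anchors keep the match aligned to whole words.
    private static final Pattern REPEATED = Pattern.compile("\\b(\\w+) \\1\\b");

    // Returns the first word that appears twice in a row, or null if there is none.
    static String firstRepeatedWord(String input) {
        Matcher m = REPEATED.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstRepeatedWord("this is is a test")); // is
        System.out.println(firstRepeatedWord("no repeats here"));   // null
    }
}
```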

Groups

A group is a submatch. If you’re familiar with SQL, it might be helpful to think of a group as the regex counterpart of a SQL subquery. Groups allow you to define parts of your pattern as logical subunits of the whole and then refer to the results of those subunits. Their syntax follows in Table 1-15.

Table 1-15. Groups  

Regex    Description
(        A group consisting of
         Any regex pattern
)        Close group

Groups Example

As with most things, an example can be more illuminating than a description. Consider the pattern (\w+)_(\w+)@(\w+)\.org, which matches e-mail addresses that have an underscore before the @ sign and end in .org. Table 1-16 dissects this pattern.

Table 1-16. The Pattern (\w+)_(\w+)@(\w+)\.org

Regex    Description
(        A group consisting of
\w       An alphanumeric or underscore character
+        Repeated one or more times
)        Close group
_        Followed by an underscore character
(        A group consisting of
\w       An alphanumeric or underscore character
+        Repeated one or more times
)        Close group
@        Followed by an at character
(        A group consisting of
\w       An alphanumeric or underscore character
+        Repeated one or more times
)        Close group
\.       Followed by the period character
o        Followed by the character o
r        Followed by the character r
g        Followed by the character g

* In English: Look for a group of alphanumeric or underscore characters, followed by _, followed by a second such group, followed by @, followed by a third such group, followed by .org.
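
In Java, the text captured by each group can be read back with Matcher.group(n), where groups are numbered from 1 in the order of their opening parentheses. The following is a minimal sketch; the class name EmailGroupsDemo, the helper parts, and the sample addresses are invented for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailGroupsDemo {
    private static final Pattern EMAIL =
            Pattern.compile("(\\w+)_(\\w+)@(\\w+)\\.org");

    // Returns the three captured groups, or null when the input does not match.
    static String[] parts(String input) {
        Matcher m = EMAIL.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parts("john_smith@example.org")));
        // [john, smith, example]

        // \w also matches underscores and + is greedy, so the first group
        // takes as much as it can while still leaving one _ for the pattern.
        System.out.println(Arrays.toString(parts("a_b_c@x.org")));
        // [a_b, c, x]

        System.out.println(parts("john.smith@example.org") == null); // true: no underscore
    }
}
```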

