Regular Expressions - Back References
Back references are one of the most powerful features offered by regular expressions. Unfortunately, programmers often skip over them because they’re not explained well in the regular expression literature. That’s a mistake I hope to rectify here.
Back references allow a pattern to refer back to parts of itself. They always refer to groups, that is, to parts of the pattern enclosed by the “(” and “)” characters. Table 1-17 presents the syntax for back references.
Table 1-17. Back References
Regex | Description |
\1 | The first group in the pattern |
\2 | The second group in the pattern |
\n | The nth group in the pattern |
NOTE: There are some idiosyncratic behaviors associated with how back references work in Java, which I explain later in this chapter and in Chapter 3. For right now, you have enough information on back references to get started.
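To make the syntax in Table 1-17 concrete, here is a minimal sketch. It isn't one of the book's listings: the class name BackReferenceSyntaxDemo, the method isMirrored, and the pattern (\w)(\w)\2\1 are assumptions chosen purely for illustration. The pattern captures two characters and then requires the same two characters again in reverse order. Note that in a Java string literal each backslash must be doubled, so \1 is written as "\\1".

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the pattern are illustrative assumptions.
class BackReferenceSyntaxDemo {
    // (\w)(\w) captures two characters as groups 1 and 2;
    // \2\1 then requires group 2 followed by group 1.
    static final Pattern MIRRORED = Pattern.compile("(\\w)(\\w)\\2\\1");

    // True only if the entire input has the form "xyyx".
    static boolean isMirrored(String input) {
        return MIRRORED.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abba", "noon", "abab"}) {
            System.out.println(s + " -> " + isMirrored(s));
        }
    }
}
```

Here abba and noon should be accepted, while abab should be rejected because its third character doesn't repeat the second captured group.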
Back References Example
Say you need to find matches in which a word is duplicated. That is, you don’t know what the word you’re looking for is, but you want to be alerted when the same word appears twice in a row. If you’ve used a word processor such as Microsoft Word, you’ll have noticed that the application does this automatically. Let’s explore how you might do this in Java.
You’ll use the pattern \b(\w+) \1\b, which is dissected in Table 1-18. This pattern matches pizza pizza, Faster pussycat kill kill, and Never Never Never Never Never because each contains a word that’s immediately repeated. It won’t match 222 2222, sara sarah, or Faster pussycat kill, kill because none of these contains a word that’s immediately repeated: 222 2222 has a lingering 2 in the second set, sara sarah has a lingering h in the second word, and in Faster pussycat kill, kill the second kill is separated from the first by a comma.
Table 1-18. The Pattern \b(\w+) \1\b
Regex | Description |
\b | A word boundary |
( | Followed by a group consisting of |
\w | Any word character (a letter, a digit, or an underscore) |
+ | Repeated one or more times |
) | Close group |
<space> | Followed by a space |
\1 | Followed by the exact group of characters captured previously |
\b | Followed by a word boundary |
* In English: Look for a word boundary, followed by a group of one or more word characters, followed by a space, followed by the exact same group of characters found previously, followed by a word boundary. In short, look for duplicate words.
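The pattern dissected in Table 1-18 can be tried out with a short sketch along the following lines. This isn't the book's own code: the class name DuplicateWordDemo and the helper findDuplicates are hypothetical names introduced here, and the sample strings are the ones discussed above. The helper uses Matcher.find() to scan the text and collects the contents of group 1 (the repeated word) for each match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative assumptions.
class DuplicateWordDemo {
    // \b(\w+) \1\b written as a Java string literal (backslashes doubled).
    static final Pattern DUPLICATE = Pattern.compile("\\b(\\w+) \\1\\b");

    // Returns the repeated word (group 1) for every match found in the text.
    static List<String> findDuplicates(String text) {
        List<String> words = new ArrayList<>();
        Matcher matcher = DUPLICATE.matcher(text);
        while (matcher.find()) {
            words.add(matcher.group(1));
        }
        return words;
    }

    public static void main(String[] args) {
        String[] samples = {
            "pizza pizza",
            "Faster pussycat kill kill",
            "Never Never Never Never Never",
            "222 2222",
            "sara sarah",
            "Faster pussycat kill, kill"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + findDuplicates(sample));
        }
    }
}
```

The first three samples should each report at least one duplicate, and the last three should report none. Because find() resumes scanning after the end of the previous match, a run of five identical words is reported as two separate duplicate pairs rather than one.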
In the next section, you’ll examine some practical examples with corresponding Java code.
This article is excerpted from Java Regular Expressions: Taming the java.util.regex Engine, written by Mehran Habibi (Apress, 2004; ISBN: 1590591070).