The Mechanics of Expression Processing
This article, the first of a four-part series, discusses how a regular expression engine works. It is excerpted from chapter four of the book Mastering Regular Expressions, Third Edition, written by Jeffrey E.F. Friedl (O'Reilly, 2006; ISBN: 0596528124). Copyright © 2006 O'Reilly Media, Inc. All rights reserved. Used with permission from the publisher. Available from booksellers or direct from O'Reilly Media.
The previous chapter started with an analogy between cars and regular expressions. The bulk of the chapter discussed features, regex flavors, and other “glossy brochure” issues of regular expressions. This chapter continues with that analogy, talking about the all-important regular-expression engine, and how it goes about its work.
Why would you care how it works? As we’ll see, there are several types of regex engines, and the type most commonly used—the type used by Perl, Tcl, Python, the .NET languages, Ruby, PHP, all Java packages I’ve seen, and more—works in such a way that how you craft your expression can influence whether it can match a particular string, where in the string it matches, and how quickly it finds the match or reports the failure. If these issues are important to you, this chapter is for you.
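To make the first two of those claims concrete, here is a minimal sketch in Python, one of the languages listed above, using its standard re module. The sample word and both patterns are invented for illustration and do not come from the book. The two expressions offer the same three alternatives, only in a different order, yet they match different amounts of the same string.

```python
import re

text = "tournament"

# Both patterns list the same three alternatives; only the order differs.
# Python's re module tries alternatives left to right and keeps the first
# one that lets the overall match succeed.
short_first = re.match(r"to|tour|tournament", text)
long_first = re.match(r"tournament|tour|to", text)

print(short_first.group())  # to
print(long_first.group())   # tournament
```

An engine of a different type could report the same match for both orderings, which is why it pays to know which kind of engine you are driving.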
Start Your Engines!
Let’s see how much I can milk this engine analogy. The whole point of having an engine is so that you can get from Point A to Point B without doing much work. The engine does the work for you so you can relax and enjoy the sound system. The engine’s primary task is to turn the wheels, and how it does that isn’t really a concern of yours. Or is it?
Two Kinds of Engines
Well, what if you had an electric car? They’ve been around for a long time, but they aren’t as common as gas cars because they’re hard to design well. If you had one, though, you would have to remember not to put gas in it. If you had a gasoline engine, well, watch out for sparks! An electric engine more or less just runs, but a gas engine might need some babysitting. You can get much better performance just by changing little things like your spark plug gaps, air filter, or brand of gas. Do it wrong and the engine’s performance deteriorates, or, worse yet, it stalls.
Each engine might do its work differently, but the end result is that the wheels turn. You still have to steer properly if you want to get anywhere, but that’s an entirely different issue.
New Standards
Let’s stoke the fire by adding another variable: the California Emissions Standards. Some engines adhere to California’s strict pollution standards, and some engines don’t. These aren’t really different kinds of engines, just new variations on what’s already around. The standard regulates a result of the engine’s work, the emissions, but doesn’t say anything about how the engine should go about achieving those cleaner results. So, our two classes of engine are divided into four types: electric (adhering and non-adhering) and gasoline (adhering and non-adhering).
Come to think of it, I bet that an electric engine can qualify for the standard without much change: the standard just “blesses” the clean results that are already par for the course. The gas engine, on the other hand, needs some major tweaking and a bit of re-tooling before it can qualify. Owners of this kind of engine need to take particular care with what they feed it: use the wrong kind of gas and you’re in big trouble.
The impact of standards
Better pollution standards are a good thing, but they require that the driver exercise more thought and foresight (well, at least for gas engines). Frankly, however, the standard doesn’t impact most people since all the other states still do their own thing and don’t follow California’s standard.
So, you realize that these four types of engines can be classified into three groups (the two kinds for gas, and electric in general). You know about the differences, and that in the end they all still turn the wheels. What you don’t know is what the heck this has to do with regular expressions! More than you might imagine.