PHP and Regular Expressions 101 - What is a regular expression? (Page 2 of 6)
What do you think separates programs like BBEdit and Notepad from the good old console-based text editors? Both kinds support text input and let you save that text to a file, but modern text editors also offer extra functionality, such as find-and-replace tools, that makes editing a text file that much easier.
Regular expressions are similar, only better. Think of a regular expression as an extremely advanced find-replace tool that saves us the pain of having to write custom data validation routines to check e-mail addresses, make sure phone numbers are in the correct format, etc.
One of the most common functions of any program is data validation, and PHP comes bundled with several text validation functions that allow us to match a string using regular expressions, making sure there's a space here, a question mark there, etc.
What you may not know, however, is that regular expressions are simple to implement. A regular expression is a specially formatted string that tells the regular expression engine which portion of a string we want to match, and once you've mastered a few of them you'll be asking yourself why you left regular expressions in the corner for so long.
PHP has two sets of functions for dealing with the two types of regular expression patterns: Perl 5 compatible patterns and POSIX standard compatible patterns. In this article we will be looking at the ereg function and working with search expressions that conform to the POSIX standard. Although they don't offer as much power as Perl 5 patterns, they're a great way to start learning regular expressions. If you're interested in PHP's support for Perl 5 compatible regular expressions, see the PHP.net site for details on the preg set of PHP functions.
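To give a feel for what a POSIX-style pattern looks like, here is a minimal sketch. It makes two assumptions: the phone-number format is a hypothetical example, and the pattern is run through preg_match rather than ereg, because the ereg family was deprecated in PHP 5.3 and removed in PHP 7. The POSIX named class [[:digit:]] is also accepted inside brackets by the preg functions, so the pattern itself stays close to what ereg would take.

```php
<?php
// Hypothetical format: three digits, a dash, four digits (e.g. "555-1234").
// [[:digit:]] is a POSIX named character class meaning "any digit".
// ^ and $ anchor the pattern to the start and end of the string.
$pattern = '/^[[:digit:]]{3}-[[:digit:]]{4}$/';

// preg_match returns 1 when the pattern matches and 0 when it does not.
$okNumber  = preg_match($pattern, '555-1234');
$badNumber = preg_match($pattern, '5551234');
```

The slashes around the pattern are delimiters required by the preg functions; ereg took the bare pattern without them.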
PHP has six functions that work with POSIX regular expressions. Each takes a regular expression string as its first argument:
ereg: The most common regular expression function, ereg allows us to search a string for matches of a regular expression.
ereg_replace: Allows us to search a string for a regular expression and replace any occurrence of that expression with a new string.
eregi: Performs exactly the same matching as ereg, but is case insensitive.
eregi_replace: Performs exactly the same search-replace functionality as ereg_replace, but is case insensitive.
split: Splits a string wherever a regular expression matches and returns the pieces between the matches as an array of strings.
spliti: Case insensitive version of the split function.
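The three kinds of operation in this list (match, replace, split) can be sketched as follows. Since the ereg family was removed in PHP 7, this sketch uses the preg equivalents instead, with the "i" modifier standing in for the case-insensitive variants. The sample string and patterns are hypothetical.

```php
<?php
// Hypothetical sample data, not taken from the article.
$text = "Apples, Oranges;pears";

// ereg / eregi -> preg_match. The "i" modifier ignores case.
$found           = preg_match('/oranges/', $text);   // case differs, no match
$foundIgnoreCase = preg_match('/oranges/i', $text);  // matches "Oranges"

// ereg_replace / eregi_replace -> preg_replace.
$replaced = preg_replace('/apples/i', 'Grapes', $text);

// split / spliti -> preg_split. The pattern describes the delimiter
// (a comma or semicolon plus optional spaces); the result holds the
// pieces between the delimiter matches.
$pieces = preg_split('/[,;] */', $text);
```

Note that the split-style function returns the text around the matches, not the matches themselves.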
Why use regular expressions?
If you're constantly creating functions to validate or manipulate portions of a string, then you might be able to scrap all of these functions and use regular expressions instead. If you answer yes to any of the questions shown below, then you should definitely consider using regular expressions:
Are you writing custom functions to make sure form data contains valid information (such as an @ and a dot in an e-mail address)?
Do you write custom functions to loop through each character in a string and replace it if it matches a certain criterion (such as if it's upper case, or if it's a space)?
Besides being clumsy ways to validate and manipulate strings, the two approaches described above can also slow your program down if coded inefficiently. Would you rather use this code to validate an e-mail address:
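As a rough illustration of the two approaches being compared, and not the article's original listings, here is a minimal sketch. The names validateEmailManual and validateEmailRegex, the checks, and the pattern are all assumptions, and both versions are deliberately loose. Because ereg was removed in PHP 7, the regular expression version uses preg_match from the preg family.

```php
<?php
// Hand-written version (hypothetical): require an "@" that is not the
// first character, followed by a domain containing a "." with at least
// one character on each side of it.
function validateEmailManual($email)
{
    $atPos = strpos($email, '@');
    if ($atPos === false || $atPos === 0) {
        return false; // no "@", or nothing in front of it
    }

    $domain = substr($email, $atPos + 1); // everything after the first "@"
    $dotPos = strpos($domain, '.');

    return $dotPos !== false
        && $dotPos > 0
        && $dotPos < strlen($domain) - 1;
}

// Regular expression version (hypothetical pattern): some characters,
// an "@", then two or more dot-separated parts.
function validateEmailRegex($email)
{
    return preg_match('/^[^@\s]+@[^@\s.]+(\.[^@\s.]+)+$/', $email) === 1;
}
```

Both functions accept an address such as user@example.com and reject user@example or userexample.com, but neither is a complete e-mail validator.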
Sure, the first function looks straightforward and seems well structured, but wouldn't it be simpler to validate an e-mail address using the one-line version of the validateEmail function shown above?
The second function shown above uses regular expressions only, and contains one call to the ereg function. The ereg function returns false when its string argument does not match the regular expression and a non-zero integer (which PHP treats as true) when it does, so its result can be used directly as a pass or fail test.
Many programmers steer clear of regular expressions because they are (in some circumstances) slower than other text manipulation methods. Regular expressions may be slower because they involve copying strings around in memory as each new part of a regular expression is matched against a string. In my experience, however, the performance hit isn't noticeable unless you're running a complex regular expression against several hundred lines of text, which is rarely the case when regular expressions are used to validate input data.