This is the second and final part of our two-part series on pattern matching, or string searching, algorithms. The first part covered the Knuth-Morris-Pratt (KMP) algorithm; this part presents the Boyer-Moore algorithm. It is widely considered one of the most efficient and practical string searching algorithms, and it often serves as the benchmark against which others are measured.
More Pattern Matching Algorithms: B-M (Page 1 of 4)
Before we begin, I’d like to suggest reading the first part of this series. You can find it published here on Dev Articles. It covers much of what you need to know to fully grasp the approach of the Boyer-Moore exact pattern matching algorithm. This article follows the same structure as the first one, so we’ll start with the theory.
It all started back in 1977, when Bob Boyer and J. Strother Moore published their work. You can find the scanned copy of the original published abstract here; kudos to the Univ. of Texas (host). The algorithm surprised most people at the time because it approached string searching differently: although the pattern still slides along the source from left to right, the characters at each position are compared backwards, from right to left. And unlike some other algorithms, it preprocesses the pattern rather than the source.
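In what follows, n is the length of the source string, m is the length of the pattern, and |Σ| is the size of the alphabet. Preprocessing takes Θ(m + |Σ|) time. Matching takes Ω(n / m) time in the best case. In the worst case the algorithm performs at most 3n text character comparisons, which is O(n), but this bound holds only for non-periodic patterns. For periodic patterns that occur many times in the source, the original algorithm can degrade to O(nm) unless it is refined further. For a detailed overview of the asymptotic growth of functions and computational complexity theory, please check out this course from Jack Baskin School of Engineering, UC Santa Cruz.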
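As a rough illustration of what these bounds mean, take hypothetical sizes n = 1,000,000 and m = 10 (numbers chosen only for this example). In the best case every alignment is rejected after a single comparison and the pattern jumps by its full length, while the 3n figure caps the work for a non-periodic pattern:

```latex
\[
\text{best case: } \left\lfloor \frac{n}{m} \right\rfloor
  = \frac{1{,}000{,}000}{10} = 100{,}000 \text{ comparisons},
\qquad
\text{worst case (non-periodic pattern): at most } 3n = 3{,}000{,}000 \text{ comparisons}.
\]
```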
The efficiency of this algorithm lies in the fact that it usually does not have to inspect every character of the source string (the string in which we are searching for the pattern). The preprocessing phase analyzes the pattern and builds one or more lookup tables. During the search, these heuristics tell the algorithm how far the pattern can safely be shifted after a mismatch, which reduces the total number of comparisons. As a rule, the longer the pattern, the larger the possible jumps and the fewer comparisons are needed.
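To make the idea of table-driven jumps concrete, here is a minimal Python sketch. It is a simplification, not the complete Boyer-Moore algorithm: it uses only one of the heuristics, commonly called the bad-character rule, and leaves out the second table that the full algorithm builds. The function names `build_last_occurrence` and `bad_character_search` are invented for this illustration, and the comparison counter is there only to show how little of the source gets inspected.

```python
def build_last_occurrence(pattern):
    """Preprocess the pattern: map each character to its rightmost index."""
    last = {}
    for index, char in enumerate(pattern):
        last[char] = index  # later occurrences overwrite earlier ones
    return last


def bad_character_search(text, pattern):
    """Return (list of match positions, number of character comparisons).

    Simplified sketch: only the bad-character rule is applied.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], 0

    last = build_last_occurrence(pattern)
    matches = []
    comparisons = 0
    start = 0  # index in text where the current alignment begins

    while start <= n - m:
        j = m - 1
        # Compare the pattern against the text from right to left.
        while j >= 0:
            comparisons += 1
            if text[start + j] != pattern[j]:
                break
            j -= 1

        if j < 0:
            matches.append(start)
            start += 1  # conservative shift after a full match
        else:
            # Line up the mismatched text character with its rightmost
            # occurrence in the pattern, or jump past it if it is absent.
            shift = j - last.get(text[start + j], -1)
            start += max(1, shift)  # never move backwards

    return matches, comparisons
```

For example, `bad_character_search("a" * 20, "bbbbb")` finds no match after only 4 comparisons, because each mismatch on a character that is absent from the pattern lets the search skip 5 positions at once. Shifting by a single position after a full match is a deliberately cautious choice in this sketch; the full algorithm can do better.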
Compared to the Knuth-Morris-Pratt pattern matching algorithm, which runs in linear time, Boyer-Moore is usually sub-linear: on typical inputs it examines fewer than n characters of the source. Boyer and Moore make this argument in their original publication. Knuth also pointed out that the algorithm stays linear in the worst case, at least when the pattern does not occur in the source. As a result, an efficient implementation gives some of the best overall results in practice, in terms of both running time and resources.