If you've worked with strings, chances are you have also needed to locate a specific substring within a larger source string. In computer science this is called pattern matching or string matching, and there are a few classic algorithms for the task. In this two-part series we’re going to discuss string searching algorithms, starting with Knuth-Morris-Pratt.
Pattern Matching Algorithms Demystified: KMP
Before we begin, I’d like to point out that both of these articles follow the same scheme. First we will discuss the theory behind the algorithm. We need to grasp its concepts and methodology well enough to run the algorithm by hand on a few sample strings on a piece of paper. After this, we can move on to writing the code that implements K-M-P. I’ve opted for the C++ language.
Knuth-Morris-Pratt (abbreviated as KMP or K-M-P) is an algorithm discovered by Donald Knuth and Vaughan Pratt and, independently, by James H. Morris. The three of them later published it jointly in 1977. It is one of the classic algorithms that is always taught in algorithm analysis courses during string-related chapters. For a pattern of length m and a text of length n, its complexity is O(m) in the preprocessing phase and O(n) in the search phase. Since the algorithm is composed of these two phases, its total running time is O(m + n).
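To make the two phases concrete before we get to the theory, here is a minimal C++ sketch of one common way to write KMP. It is not necessarily the version we will build later in this series; the function names buildFailureTable and kmpSearch are my own choices for illustration, and the sketch only reports the first occurrence of the pattern.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Preprocessing phase, O(m): fail[i] is the length of the longest proper
// prefix of pattern[0..i] that is also a suffix of pattern[0..i].
// (Hypothetical helper name, chosen for this sketch.)
std::vector<std::size_t> buildFailureTable(const std::string& pattern) {
    std::vector<std::size_t> fail(pattern.size(), 0);
    std::size_t k = 0;  // length of the prefix matched so far
    for (std::size_t i = 1; i < pattern.size(); ++i) {
        // On a mismatch, fall back to the next shorter candidate prefix.
        while (k > 0 && pattern[i] != pattern[k]) k = fail[k - 1];
        if (pattern[i] == pattern[k]) ++k;
        fail[i] = k;
    }
    return fail;
}

// Search phase, O(n): returns the index of the first occurrence of
// pattern in text, or std::string::npos if there is none.
// By assumption in this sketch, an empty pattern matches at index 0.
std::size_t kmpSearch(const std::string& text, const std::string& pattern) {
    if (pattern.empty()) return 0;
    const std::vector<std::size_t> fail = buildFailureTable(pattern);
    std::size_t k = 0;  // number of pattern characters currently matched
    for (std::size_t i = 0; i < text.size(); ++i) {
        // The text index i never moves backwards; only k is adjusted.
        while (k > 0 && text[i] != pattern[k]) k = fail[k - 1];
        if (text[i] == pattern[k]) ++k;
        if (k == pattern.size()) return i + 1 - k;  // start of the match
    }
    return std::string::npos;
}
```

For example, kmpSearch("abracadabra", "cad") returns 4, and a search for a pattern that does not occur returns std::string::npos. The point to notice is that the table is built once from the pattern alone, and the search loop then makes a single left-to-right pass over the text.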
The upcoming segment of this two-part series is going to cover the Boyer-Moore (B-M) algorithm. It is a very effective and efficient algorithm in practice, and it is considered the standard benchmark in the practical string search literature. Its preprocessing time is Θ(m + |Σ|), where Σ is the alphabet, and its matching time ranges from Ω(n/m) in the best case to O(n). For a detailed overview on the asymptotic growth of functions and computational complexity theory, please check out this course from the Jack Baskin School of Engineering, UC Santa Cruz.
Now you know what to expect from this series. Ready, set, go!