Pattern Matching Algorithms Demystified: KMP - C Implementation
(Page 3 of 4 )
First things first, I want to make it clear that there are numerous K-M-P implementations. Therefore, the solution you’re going to read here isn’t the only one. You can surely find others with a quick Google search. The coding language is C++ and it is cross-platform compatible. I’ve compiled it with Visual Studio 2005.
I have opted for the “string” C++ objects. Moreover, we are reading from the stdin and printing out to the stdout. Check out the following code snippet. It’s the “main” section of the program. Notice the necessary header files as well as the prototypes of the function and the method we’re going to use later on. It’s straight-forward.
#include <iostream>
#include <stdlib.h>
#include <string>
using namespace std;
int* genprefix(string);
void kmp_match(string, string);
int main()
{
string t;
string p;
cout << "Enter the T string: ";
cin >> t;
cout << "Enter the P string: ";
cin >> p;
kmp_match(t,p);
return 0;
}
Here comes the part where the fun begins. The following function generates the prefix array. It receives the pattern string (p) as a parameter. The key parts of the code snippet are the while loop and the if section: “if (p[k] == p[q]) k++” since this is where the current substring is increased to the maximum length (if there’s a match).
int* genprefix(string p)
{ // declaration and initialization of variables
int *a;
int m = (int)p.length();
int k = 0;
// we allocate dynamically the soon-to-be prefix array
a = (int *)malloc(m*sizeof(int));
a[0] = 0;
// beginning of the prefix generation algorithm
for(int q = 1; q < m; q++)
{
while (k > 0 && p[k] != p[q])
k = a[k-1];
if (p[k] == p[q])
k++;
a[q] = k;
}
// end of the prefix generation algorithm
return a;
}
Finally, here’s where we are coding the searching algorithm. The method has two parameters: string t and string p - source text string and pattern string, respectively. The algorithm is intuitive since we are moving from left to right, taking into consideration the prefix array. Thanks to that said method we can eliminate lots of unnecessary comparisons and switches. We are jumping right to the next possible substrings.
void kmp_match(string t, string p)
{ // declaration and initialization of variables
int *a;
int q = 0;
int n = (int)t.length();
int m = (int)p.length();
// we generate the prefix array
a = genprefix(p);
// we print the content of the prefix array
cout << endl << "Content of the prefix array: ";
for (int i = 0; i < m; i++)
cout << a[i] << " ";
cout << endl;
// beginning of the K-M-P algorithm
for (int i = 0; i < n; i++)
{
while(q > 0 && p[q] != t[i])
q = a[q-1];
if (p[q] == t[i])
q++;
if (q == m)
{
cout << "Matches on " << i - m + 1 << " position." << endl;
q = a[q-1];
}
}
// end of the K-M-P algorithm
}
The above block of code also prints out the content of the prefix array. It will be useful by the time you learn how the algorithm works. All in all, if you would like to download the complete source code with the compiled executable, then just click on the button below. It is archived in ZIP format.

Next: Taking a Break >>
More Development Cycles Articles
More By Barzan "Tony" Antal