Home arrow Development Cycles arrow Page 3 - Pattern Matching Algorithms Demystified: KMP
DEVELOPMENT CYCLES

Pattern Matching Algorithms Demystified: KMP


Chances are that if you've worked with strings then you have also tried to locate a specific substring in the entire source string. In computer science this is called pattern matching or string matching. And to accomplish this task, there are a few classic algorithms. In this two-part series we’re going to discuss string searching algorithms. We will start with Knuth-Morris-Pratt.

Author Info:
By: Barzan "Tony" Antal
Rating: 4 stars4 stars4 stars4 stars4 stars / 19
July 29, 2008
TABLE OF CONTENTS:
  1. · Pattern Matching Algorithms Demystified: KMP
  2. · The Theory
  3. · C Implementation
  4. · Taking a Break

print this article
SEARCH DEVARTICLES

Pattern Matching Algorithms Demystified: KMP - C Implementation
(Page 3 of 4 )

First things first, I want to make it clear that there are numerous K-M-P implementations. Therefore, the solution you’re going to read here isn’t the only one. You can surely find others with a quick Google search. The coding language is C++ and it is cross-platform compatible. I’ve compiled it with Visual Studio 2005.

I have opted for the “string” C++ objects. Moreover, we are reading from the stdin and printing out to the stdout. Check out the following code snippet. It’s the “main” section of the program. Notice the necessary header files as well as the prototypes of the function and the method we’re going to use later on. It’s straight-forward.

#include <iostream>

#include <stdlib.h>

#include <string>

using namespace std;

int* genprefix(string);

void kmp_match(string, string);

int main()

{

string t;

string p;

cout << "Enter the T string: ";

cin >> t;

cout << "Enter the P string: ";

cin >> p;

kmp_match(t,p);

return 0;

}

Here comes the part where the fun begins. The following function generates the prefix array. It receives the pattern string (p) as a parameter. The key parts of the code snippet are the while loop and the if section: “if (p[k] == p[q]) k++” since this is where the current substring is increased to the maximum length (if there’s a match).

int* genprefix(string p)

{ // declaration and initialization of variables

int *a;

int m = (int)p.length();

int k = 0;

// we allocate dynamically the soon-to-be prefix array

a = (int *)malloc(m*sizeof(int));

a[0] = 0;

// beginning of the prefix generation algorithm

for(int q = 1; q < m; q++)

{

while (k > 0 && p[k] != p[q])

k = a[k-1];

if (p[k] == p[q])

k++;

a[q] = k;

}

// end of the prefix generation algorithm

return a;

}

Finally, here’s where we are coding the searching algorithm. The method has two parameters: string t and string p - source text string and pattern string, respectively. The algorithm is intuitive since we are moving from left to right, taking into consideration the prefix array. Thanks to that said method we can eliminate lots of unnecessary comparisons and switches. We are jumping right to the next possible substrings.

void kmp_match(string t, string p)

{ // declaration and initialization of variables

int *a;

int q = 0;

int n = (int)t.length();

int m = (int)p.length();

 

// we generate the prefix array

a = genprefix(p);

// we print the content of the prefix array

cout << endl << "Content of the prefix array: ";

for (int i = 0; i < m; i++)

cout << a[i] << " ";

cout << endl;

 

// beginning of the K-M-P algorithm

for (int i = 0; i < n; i++)

{

while(q > 0 && p[q] != t[i])

q = a[q-1];

if (p[q] == t[i])

q++;

if (q == m)

{

cout << "Matches on " << i - m + 1 << " position." << endl;

q = a[q-1];

}

}

// end of the K-M-P algorithm

}

The above block of code also prints out the content of the prefix array. It will be useful by the time you learn how the algorithm works. All in all, if you would like to download the complete source code with the compiled executable, then just click on the button below. It is archived in ZIP format.


blog comments powered by Disqus
DEVELOPMENT CYCLES ARTICLES

- Division of Large Numbers
- Branch and Bound Algorithm Technique
- Dynamic Programming Algorithm Technique
- Genetic Algorithm Techniques
- Greedy Strategy as an Algorithm Technique
- Divide and Conquer Algorithm Technique
- The Backtracking Algorithm Technique
- More Pattern Matching Algorithms: B-M
- Pattern Matching Algorithms Demystified: KMP
- Coding Standards
- A Peek into the Future: Transactional Memory
- Learning About the Graph Construct using Gam...
- Learning About the Graph Construct using Gam...
- Learning About the Graph Construct using Gam...
- How to Strike a Match

Watch our Tech Videos 
Dev Articles Forums 
 RSS  Articles
 RSS  Forums
 RSS  All Feeds
Write For Us 
Weekly Newsletter
 
Developer Updates  
Free Website Content 
Contact Us 
Site Map 
Privacy Policy 
Support 

Developer Shed Affiliates

 




© 2003-2017 by Developer Shed. All rights reserved. DS Cluster - Follow our Sitemap
Popular Web Development Topics
All Web Development Tutorials