Development Cycles
  Home arrow Development Cycles arrow Page 3 - More Pattern Matching Algorithms: B-M
Dev Articles Forums 
ADO.NET  
Apache  
ASP  
ASP.NET  
C#  
C++  
ColdFusion  
COM/COM+  
Delphi-Kylix  
Design Usability  
Development Cycles  
DHTML  
Embedded Tools  
Flash  
Graphic Design  
HTML  
IIS  
Interviews  
Java  
JavaScript  
MySQL  
Oracle  
Photoshop  
PHP  
Reviews  
Ruby-on-Rails  
SQL  
SQL Server  
Style Sheets  
VB.Net  
Visual Basic  
Web Authoring  
Web Services  
Web Standards  
XML  
Mobile Linux 
App Generation ROI 
IBM® developerWorks 
Weekly Newsletter
 
Developer Updates  
Free Website Content 
 RSS  Articles
 RSS  Forums
 RSS  All Feeds
Write For Us Get Paid 
Request Media Kit
Contact Us 
Site Map 
Privacy Policy 
Support 
 USERNAME
 
 PASSWORD
 
 
  >>> SIGN UP!  
  Lost Password? 
DEVELOPMENT CYCLES

More Pattern Matching Algorithms: B-M
By: Barzan "Tony" Antal
  • Search For More Articles!
  • Disclaimer
  • Author Terms
  • Rating: 5 stars5 stars5 stars5 stars5 stars / 3
    2008-08-05

    Table of Contents:
  • More Pattern Matching Algorithms: B-M
  • The Theory
  • Implementation
  • Final Words

  • Rate this Article: Poor Best 
      ADD THIS ARTICLE TO:
      Del.ici.ous Digg
      Blink Simpy
      Google Spurl
      Y! MyWeb Furl
    Email Me Similar Content When Posted
    Add Developer Shed Article Feed To Your Site
    Email Article To Friend
    Print Version Of Article
    PDF Version Of Article
     
     
    ADVERTISEMENT


    More Pattern Matching Algorithms: B-M - Implementation


    (Page 3 of 4 )

    Before we begin, let me point out that there are hundreds of implementations of the Boyer-Moore algorithms, just as there are with other typical algorithms of computer science. Every now and then, a totally different implementation of it may occur.

    Ultimately, it all comes down to whether it works or not. If you do not increase its complexity, either in space or in time, then the overall efficiency of the implementations will still be about the same. And if resource and time efficiency is that important to you, then you should already have basic knowledge in calculating, doing the tests, and mathematically proving its feasibility.

    Here in this article we present a somewhat simple implementation that focuses mostly on its efficient and compact code. It is written in a style that it is easily understood. The presented code is C/C++. And at the end of this article you can download the entire source code of this Boyer-Moore algorithm. We will also exemplify its execution.

    We are going to have three different procedures, one that represents the main B-M algorithm and the other two will illustrate the two preprocessing phases: the good suffix shift and the bad character shift jump tables. As soon as we have generated these two jump tables, we can run the Boyer-Moore algorithm.

    void BadCharTable(string x, int m, int *tableBC)

    { // generates the Bad Character Shift Jump Table

    for (int i = 0; i < 256; i++)

    tableBC[i] = m;

    for (int i = 0; i < m - 1; i++)

    tableBC[x[i]] = m - i - 1;

     

    printf("\nBad Character Shift Jump Table:\n");

    for (int i = 0; i < m; i++)

    printf("%d ", tableBC[x[i]]);

    printf("\nEvery other character has %d value.\n", m);

    }

    Now comes the second preprocessing procedure:

    void GoodSuffixTable(string x, int m, int *tableGS)

    { // generates the Good Suffix Shift Jump Table

    int i, j, f, *suff;

    suff = (int *)malloc(m * sizeof(int));

     

    suff[m - 1] = m;

    // note: on the last position of suff array the length of the pattern string is stored

    // (the last character of the pattern serving as a suffix matches the string being slid with the entire length of the pattern string)

     

    int g = m - 1; // beginning of the suffix table generation

    for (int i = m - 2; i >= 0; i--)

    {

    if (i > g && (suff[i + m - 1 - f] < i - g))

    suff[i] = suff[i + m - 1 - f];

    else

    {

    if (i < g)

    g = i;

    f = i;

    while (g >= 0 && x[g] == x[g + m - 1 - f])

    g--;

    suff[i] = f - g;

    }

    } // end of the suffix table generation

    // we print out the content of the table for exemplification

    printf("\nSuffixes Array: \n");

    for (int i = 0; i < m; i++)

    printf("%d ", suff[i]);

     

    for (i = 0; i < m; i++)

    tableGS[i] = m;

    // we fill this jump table with the length of the pattern

    j = 0;

    for (i = m - 1; i >= 0; i--)

    if (suff[i] == i + 1)

    while (j < m - 1 - i)

    {

    if (tableGS[j] == m)

    tableGS[j] = m - 1 - i;

    j++;

    }

    for (i = 0; i <= m - 2; i++)

    tableGS[m - 1 - suff[i]] = m - 1 - i;

     

    printf("\nGood Suffix Shift Jump Table:\n");

    for (i = 0; i < m; i++)

    printf("%d ", tableGS[i]);

    }

    And here comes the main Boyer-Moore procedure:

    void BoyerMoore(string x, int m, string y, int n)

    { // the typical BoyerMoore algorithm implementation

    int i, j = 0, tableBC[256], tableGS[256];

     

    // preprocessing procedures - we generate the jump tables

    GoodSuffixTable(x, m, tableGS);

    BadCharTable(x, m, tableBC);

     

    // searching phase

    while (j <= n - m)

    {

    for (i = m - 1; i >= 0 && x[i] == y[i + j]; i--);

    // if the index of the above loop reaches below zero, it is an exact match

    if (i < 0)

    { // if exact match, write it out

    printf("Matches on %d position.\n", j);

    j += tableGS[0];

    }

    else

    // not a match, let's jump further (to the maximum between the values in the jump tables)

    j += max(tableGS[i], tableBC[y[i + j]] - m + 1 + i);

    }

    }

    Please note that this is not the best possible implementation of the B-M algorithm in terms of speed. Therefore, it shouldn’t be used in comparisons with other pattern matching algorithms. If that’s what you want, then you may want to “speed up” the two preprocessing phases. Bottom-line is that it works, just don't take its execution as the fastest or the most efficient.

    Needless to say, the elegance and simplicity of this implementation of the algorithm is to be appreciated the most. I know for a fact that this implementation (accompanied with the source codes) is being taught and tutored at various university courses, which is the reason why I picked this one.

    As you may have already noticed, after each preprocessing stage we print out the content of the specific jump table. This way we can “visually” follow what’s going on “behind the scenes.” I have also thoroughly commented the snippets.

    More Development Cycles Articles
    More By Barzan "Tony" Antal


     

    DEVELOPMENT CYCLES ARTICLES

    - Division of Large Numbers
    - Branch and Bound Algorithm Technique
    - Dynamic Programming Algorithm Technique
    - Genetic Algorithm Techniques
    - Greedy Strategy as an Algorithm Technique
    - Divide and Conquer Algorithm Technique
    - The Backtracking Algorithm Technique
    - More Pattern Matching Algorithms: B-M
    - Pattern Matching Algorithms Demystified: KMP
    - Coding Standards
    - A Peek into the Future: Transactional Memory
    - Learning About the Graph Construct using Gam...
    - Learning About the Graph Construct using Gam...
    - Learning About the Graph Construct using Gam...
    - How to Strike a Match







    © 2003-2009 by Developer Shed. All rights reserved. DS Cluster 5 Hosted by Hostway
    For more Enterprise Application Development news, visit eWeek