Development Cycles
  Home arrow Development Cycles arrow Page 5 - Tame the Beast by Matching Similar Strings
Dev Articles Forums 
ADO.NET  
Apache  
ASP  
ASP.NET  
C#  
C++  
ColdFusion  
COM/COM+  
Delphi-Kylix  
Design Usability  
Development Cycles  
DHTML  
Embedded Tools  
Flash  
Graphic Design  
HTML  
IIS  
Interviews  
Java  
JavaScript  
MySQL  
Oracle  
Photoshop  
PHP  
Reviews  
Ruby-on-Rails  
SQL  
SQL Server  
Style Sheets  
VB.Net  
Visual Basic  
Web Authoring  
Web Services  
Web Standards  
XML  
Mobile Linux 
App Generation ROI 
IBM® developerWorks 
Sun Developer Network 
Weekly Newsletter
 
Developer Updates  
Free Website Content 
 RSS  Articles
 RSS  Forums
 RSS  All Feeds
Write For Us Get Paid 
Request Media Kit
Contact Us 
Site Map 
Privacy Policy 
Support 
 USERNAME
 
 PASSWORD
 
 
  >>> SIGN UP!  
  Lost Password? 
DEVELOPMENT CYCLES

Tame the Beast by Matching Similar Strings
By: Simon White
  • Search For More Articles!
  • Disclaimer
  • Author Terms
  • Rating: 4 stars4 stars4 stars4 stars4 stars / 27
    2004-02-09

    Table of Contents:
  • Tame the Beast by Matching Similar Strings
  • Equivalence Methods
  • Synonyms and Regular Expressions
  • The Soundex Algorithm
  • Similarity Ranking Methods
  • Editing and Hamming Distances
  • Conclusions

  • Rate this Article: Poor Best 
      ADD THIS ARTICLE TO:
      Del.ici.ous Digg
      Blink Simpy
      Google Spurl
      Y! MyWeb Furl
    Email Me Similar Content When Posted
    Add Developer Shed Article Feed To Your Site
    Email Article To Friend
    Print Version Of Article
    PDF Version Of Article
     
     
    ADVERTISEMENT


    Tame the Beast by Matching Similar Strings - Similarity Ranking Methods


    (Page 5 of 7 )

    Similarity ranking methods compare a given string to a set of strings and rank those strings in order of similarity. To produce a ranking, we need a way of saying that one match is better than another. This is done by returning a numeric measure of similarity as the result of each comparison. Alternatively, you can think of the distance between two strings, instead of their similarity. Strings with a large distance between them have low similarity, and vice versa.

    Two very common methods for ranking similarity are the Longest Common Sub-string and Edit Distance.

    Longest Common Substring

    The longest common substring between two strings is the longest contiguous chain of characters that exists in both strings. The longer the substring, the better the match between the two strings. This simple approach can work very well in practice.

    A disadvantage of this approach is that the position of an ‘error’ in the input affects the computed similarity between the two strings. If the error occurs in the middle of the string, then the distance between the two strings will be greater than if the error occurred at one end. For example, suppose we make a simple typing error on the keyboard by pressing the key adjacent to the one intended. With a word such as ‘PINEAPPLE’, typing ‘PINESPPLE’ gives a longest common substring of length 4, whereas ‘OINEAPPLE’ gives a value of 8. The problem is that ‘PINESPPLE’ is deemed to be just as good a match with ‘PINEAPPLE’ as the string ‘PINE’, which is probably not what you want.

    More Development Cycles Articles
    More By Simon White


     

    DEVELOPMENT CYCLES ARTICLES

    - Branch and Bound Algorithm Technique
    - Dynamic Programming Algorithm Technique
    - Genetic Algorithm Techniques
    - Greedy Strategy as an Algorithm Technique
    - Divide and Conquer Algorithm Technique
    - The Backtracking Algorithm Technique
    - More Pattern Matching Algorithms: B-M
    - Pattern Matching Algorithms Demystified: KMP
    - Coding Standards
    - A Peek into the Future: Transactional Memory
    - Learning About the Graph Construct using Gam...
    - Learning About the Graph Construct using Gam...
    - Learning About the Graph Construct using Gam...
    - How to Strike a Match
    - Entity Relationship Modeling






    © 2003-2008 by Developer Shed. All rights reserved. DS Cluster 3 hosted by Hostway
    Stay green...Green IT