Home arrow Development Cycles arrow Page 2 - Tame the Beast by Matching Similar Strings

Tame the Beast by Matching Similar Strings

My interest in string similarity stems from a desire for good user interface design. Computers are seen by many as unfriendly, unforgiving beasts that respond unkindly to requests that are almost meaningful. In this article, I demonstrate how computers can be programmed to be more forgiving of their users’ mistakes, with no additional burden on the user such as learning a special query format. Moreover, the techniques described are very widely applicable and often easy to implement.

Author Info:
By: Simon White
Rating: 5 stars5 stars5 stars5 stars5 stars / 39
February 09, 2004
  1. · Tame the Beast by Matching Similar Strings
  2. · Equivalence Methods
  3. · Synonyms and Regular Expressions
  4. · The Soundex Algorithm
  5. · Similarity Ranking Methods
  6. · Editing and Hamming Distances
  7. · Conclusions

print this article

Tame the Beast by Matching Similar Strings - Equivalence Methods
(Page 2 of 7 )

Equivalence methods compare two strings and return a value of true or false according to whether the method deems those two strings to be, in some sense, equivalent. In terms of user-interface design, your application can be more forgiving of user inputs if it accepts equivalent strings instead of only exact matches. A simple example of equivalence is to treat ‘Tweetle-Beetle Battle’ the same as ‘TWEETLE BEETLE BATTLE’ despite the differences in case, and the replacement of a hyphen with a space in the second string.  

Word Stemming

Word stemming is a technique that reduces closely related words to a basic canonical form or ‘stem’. For example, the user inputs ‘swims’ and ‘swimming’ can be reduced to the basic stem ‘swim’ before performing an exact match against expected inputs. Stemming makes use of a suffix dictionary that contains lists of possible word endings. However, such a list is clearly language-dependent and even regional differences of the same language must be considered (for example, compare British spelling ‘standardise’ with American spelling ‘standardize’). Also, not all languages lend themselves to such treatment, although it has been demonstrated for most languages of the Indo-European family (which includes Latin-based and Germanic languages).

Deriving stemming algorithms is a difficult, time-consuming and error-prone activity. Therefore, for application building, I can only recommend using tools such as Snowball, with its suite of existing stemming algorithms for many languages.

blog comments powered by Disqus

- Division of Large Numbers
- Branch and Bound Algorithm Technique
- Dynamic Programming Algorithm Technique
- Genetic Algorithm Techniques
- Greedy Strategy as an Algorithm Technique
- Divide and Conquer Algorithm Technique
- The Backtracking Algorithm Technique
- More Pattern Matching Algorithms: B-M
- Pattern Matching Algorithms Demystified: KMP
- Coding Standards
- A Peek into the Future: Transactional Memory
- Learning About the Graph Construct using Gam...
- Learning About the Graph Construct using Gam...
- Learning About the Graph Construct using Gam...
- How to Strike a Match

Watch our Tech Videos 
Dev Articles Forums 
 RSS  Articles
 RSS  Forums
 RSS  All Feeds
Write For Us 
Weekly Newsletter
Developer Updates  
Free Website Content 
Contact Us 
Site Map 
Privacy Policy 

Developer Shed Affiliates


© 2003-2019 by Developer Shed. All rights reserved. DS Cluster - Follow our Sitemap
Popular Web Development Topics
All Web Development Tutorials