
Lord Of The Strings Part 2

When I saw the latest film in the Lord of the Rings trilogy a short while ago, I wondered how Tolkien had invented the artificial languages of Middle Earth. In my previous article, I described my desire to discover which real language had most influenced Tolkien's invented ones. As a software developer, I wanted to find this out algorithmically. My idea was to use my own string similarity algorithm to compare each word from a list of Tolkien words with words from 14 real languages. For each Tolkien word, I would find and record the language containing the (lexically) most similar word. The set of most-similar words, and the languages they came from, would provide new insights into the influences on Tolkien.
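The per-word search described above can be sketched as a brute-force scan. This minimal Python sketch makes two assumptions. First, the candidate words are already in memory as a dictionary mapping a language name to its word list. Second, it uses a Dice coefficient over adjacent letter pairs purely as a stand-in similarity measure, not as the author's own algorithm. The names `letter_pairs`, `similarity` and `best_match`, and the sample words, are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def letter_pairs(s: str) -> List[str]:
    """Adjacent character pairs of a lower-cased string, e.g. 'elen' -> ['el', 'le', 'en']."""
    s = s.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]


def similarity(a: str, b: str) -> float:
    """Stand-in measure (Dice coefficient over letter pairs), in the range 0.0 to 1.0."""
    pairs_a = letter_pairs(a)
    pairs_b = letter_pairs(b)
    total = len(pairs_a) + len(pairs_b)
    if total == 0:
        return 0.0
    remaining = list(pairs_b)
    shared = 0
    for p in pairs_a:
        if p in remaining:
            remaining.remove(p)  # each pair in b may be matched only once
            shared += 1
    return 2.0 * shared / total


def best_match(word: str,
               lexicons: Dict[str, List[str]]) -> Optional[Tuple[str, str, float]]:
    """Scan every candidate in every language; return (best_word, language, score) or None."""
    best: Optional[Tuple[str, str, float]] = None
    for lang, words in lexicons.items():
        for candidate in words:
            score = similarity(word, candidate)
            if best is None or score > best[2]:  # ties keep the first candidate seen
                best = (candidate, lang, score)
    return best
```

For example, `best_match("elen", {"lang_a": ["star", "elena"], "lang_b": ["tree", "elan"]})` returns `("elena", "lang_a", 0.857...)`. Because the scan touches every candidate, its cost grows linearly with the number of candidate strings. With over a million candidates per word, that cost is what makes the size of the problem a real concern.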

Author Info:
By: Simon White
March 17, 2004
  1. Lord Of The Strings Part 2
  2. The Size of the Problem
  3. Discovering String Similarities
  4. A Class Called Compare
  5. Analyzing the Results
  6. The Finnish Line
  7. Differences on the Table


Lord Of The Strings Part 2 - The Size of the Problem
(Page 2 of 7)

In fact, I did opt to write the results back to the database, though not for the reason given above. My real concern was the size of the problem. There were 470 Tolkien words in the database, and each could potentially be compared against 1.3 million other strings. That is up to roughly 611 million comparisons in total (470 × 1,300,000). That is a lot of computation, and even for a single word I expected the search for the most similar string to take a while. My idea was to use the persistence of the database to tame the combinatorics of the problem. Suppose the program wrote each result to the database as soon as it was computed, and chose only words that were not already in the results table. Then the computer could be shut down and restarted, and the program would simply pick up where it left off.

In other words, a system shutdown would cost very little processing time, even if the analysis was not complete. This is very desirable behavior, so I created another database table, called 'matches', to store the results of the similarity comparisons. The table needed to store the word (or its id), its best-matching word, the language of the best-matching word, and the similarity score. I used the following SQL command to create it:

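CREATE TABLE matches (
  word       varchar(255),   -- column lengths are assumed
  word_id    int,
  best       varchar(255),
  lang       enum(...),      -- one value per language; the list is not reproduced here
  similarity float,
  primary key (word_id),     -- assumed: one best match per Tolkien word
  index word_i (word)
);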

(Although it is not strictly necessary, I have included the word and the best word in this table as well as in the words table. I realize this duplicates information and risks the data becoming inconsistent, but it keeps things simple in this article. Normalizing and further refining the database schema is left as an exercise for the reader. Good luck!)
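The restart-friendly processing described on this page might look like the following minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs on its own. Several details are assumptions made for illustration: the `words` table layout (`id`, `word`), the function names, and the sample rows. `fake_best` is a hypothetical placeholder for the real similarity search. The key idea is the query for words that have no row in `matches` yet, plus a commit after every result.

```python
import sqlite3
from typing import Callable, List, Tuple

# Hypothetical signature: word -> (best_word, language, similarity_score)
FindBest = Callable[[str], Tuple[str, str, float]]


def unprocessed_words(conn: sqlite3.Connection) -> List[Tuple[int, str]]:
    """Words with no row in `matches` yet (an anti-join via LEFT JOIN ... IS NULL)."""
    return conn.execute(
        "SELECT w.id, w.word FROM words AS w "
        "LEFT JOIN matches AS m ON m.word_id = w.id "
        "WHERE m.word_id IS NULL ORDER BY w.id"
    ).fetchall()


def run(conn: sqlite3.Connection, find_best: FindBest) -> int:
    """Analyze every unprocessed word, committing after each one; return how many were done."""
    done = 0
    for word_id, word in unprocessed_words(conn):
        best, lang, score = find_best(word)
        conn.execute(
            "INSERT INTO matches (word, word_id, best, lang, similarity) "
            "VALUES (?, ?, ?, ?, ?)",
            (word, word_id, best, lang, score),
        )
        conn.commit()  # once committed, this result survives a shutdown
        done += 1
    return done


def fake_best(word: str) -> Tuple[str, str, float]:
    """Hypothetical placeholder for the real (slow) similarity search."""
    return word.upper(), "lang_x", 0.5


def demo() -> Tuple[int, int]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT NOT NULL);
        CREATE TABLE matches (
            word TEXT, word_id INTEGER PRIMARY KEY,
            best TEXT, lang TEXT, similarity REAL
        );
        INSERT INTO words (id, word) VALUES (1, 'elen'), (2, 'mellon'), (3, 'galad');
        -- Simulate an earlier run that finished word 1 before a shutdown.
        INSERT INTO matches VALUES ('elen', 1, 'ELEN', 'lang_x', 0.5);
        """
    )
    first = run(conn, fake_best)   # resumes with words 2 and 3 only
    second = run(conn, fake_best)  # nothing left, so nothing is recomputed
    conn.close()
    return first, second
```

Here `demo()` returns `(2, 0)`. Suppose a run is interrupted between an insert and its commit. The uncommitted row is rolled back, so that word is simply redone on the next start, and at most one word's work is lost. Making `word_id` the primary key of `matches` also guarantees that a word can never be recorded twice.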

© 2003-2018 by Developer Shed. All rights reserved.