Home arrow MySQL arrow Page 4 - Lord Of The Strings Part 1

Lord Of The Strings Part 1

I recently enjoyed the latest in the Lord of the Rings trilogy of movies at the cinema. I was intrigued by Tolkienís invented languages (such as Elvish and Dwarvish) and was curious to know where the languages came from, or more precisely, which real language was the biggest influence on Tolkien for his inventions. As I have been thinking about issues of string similarity recently (see Matching Strings and Algorithms), I wondered whether I could extend my ideas of string similarity to language similarity. In other words, could I discover to which real language Tolkienís artificial language is most similar?

Author Info:
By: Simon White
Rating: 5 stars5 stars5 stars5 stars5 stars / 55
March 15, 2004
  1. · Lord Of The Strings Part 1
  2. · Word Lists and Databases
  3. · Word Lists
  4. · Loading the Text Files to the DB
  5. · Running the Query

print this article

Lord Of The Strings Part 1 - Loading the Text Files to the DB
(Page 4 of 5 )

In addition to the word and its language, there is an identifier (word_id), which is the primary key for the table. Note that I used an enum type for the lang column, since we know we are only going to use a limited set of languages. By using an enum, only one byte need be used to store the value of that column Ė far less data than if Iíd used a varchar. I also added indexes for the lang and word columns to improve execution times for queries that constrain these columns.
Now for each language, I loaded the words from the text file into the database table. The following two SQL commands load the word list for Danish; the other languages were loaded similarly.

load data infile 'C:\temp\danish.txt' into table words(word);
update words set lang
='danish' where lang is null;

Once the word lists were in the database, I carried out one further data cleansing action. I removed any words that were less than three characters long:

mysqldelete from words where length(word)<3;
Query OK
2112 rows affected (4.70 sec)

Checking the Data
At this point it is reassuring to run a query to get an overview of the data that we have stored. First, letís check how many words are in the database. As we have one word per row of the database table, thatís the same as counting the number of rows in the table. The following query counts the number of rows, but also stores that value in a variable, called @total.

mysqlselect @total:=count(*) as wordcount from words;
wordcount |
|   1343410 |
1 row in set 
(0.00 sec)

Now letís look at the breakdown of the word lists into languages:

mysqlselect lang as languagecount(word) as wordcount 
       from words group by lang
language  wordcount |
DANISH    |     25291 |
DUTCH     |    178341 |
ENGLISH   |     56355 |
FINNISH   |    287231 |
FRENCH    |    138168 |
GERMAN    |    159989 |
HUNGARIAN |     17818 |
JAPANESE  |    115291 |
LATIN     |     77049 |
NORWEGIAN |     61679 |
POLISH    |    109343 |
SPANISH   |     85965 |
SWAHILI   |     18363 |
SWEDISH   |     12057 |
TOLKIEN   |       470 |
15 rows in set 
(3.55 sec)

blog comments powered by Disqus

- MySQL and BLOBs
- Two Lessons in ASP and MySQL
- Lord Of The Strings Part 2
- Lord Of The Strings Part 1
- Importing Data into MySQL with Navicat
- Building a Sustainable Web Site
- Creating An Online Photo Album with PHP and ...
- Creating An Online Photo Album with PHP and ...
- PhpED 3.2 Ė More Features Than You Can Poke ...
- Creating An Online Photo Album with PHP and ...
- Creating An Online Photo Album with PHP and ...
- Security and Sessions in PHP
- Setup Your Personal Reminder System Using PHP
- Create a IP-Country Database Using PERL and ...
- Developing a Dynamic Document Search in PHP ...

Watch our Tech Videos 
Dev Articles Forums 
 RSS  Articles
 RSS  Forums
 RSS  All Feeds
Write For Us 
Weekly Newsletter
Developer Updates  
Free Website Content 
Contact Us 
Site Map 
Privacy Policy 

Developer Shed Affiliates


© 2003-2019 by Developer Shed. All rights reserved. DS Cluster - Follow our Sitemap
Popular Web Development Topics
All Web Development Tutorials