I recently enjoyed the latest in the Lord of the Rings trilogy of movies at the cinema. I was intrigued by Tolkien's invented languages (such as Elvish and Dwarvish) and was curious to know where the languages came from, or more precisely, which real language was the biggest influence on Tolkien for his inventions. As I have been thinking about issues of string similarity recently (see Matching Strings and Algorithms), I wondered whether I could extend my ideas of string similarity to language similarity. In other words, could I discover to which real language Tolkien's artificial language is most similar?
Lord Of The Strings Part 1 - Running the Query
Given that we stored the total number of rows in a variable, it is now quite easy to run a query that expresses each language's word count as a percentage of the total number of words. The query rounds each percentage to one decimal place.
> select lang as language, round(100*count(word)/@total,1) as percent from words group by lang;
+-----------+---------+
| language  | percent |
+-----------+---------+
| DANISH    |     1.9 |
| DUTCH     |    13.3 |
| ENGLISH   |     4.2 |
| FINNISH   |    21.4 |
| FRENCH    |    10.3 |
| GERMAN    |    11.9 |
| HUNGARIAN |     1.3 |
| JAPANESE  |     8.6 |
| LATIN     |     5.7 |
| NORWEGIAN |     4.6 |
| POLISH    |     8.1 |
| SPANISH   |     6.4 |
| SWAHILI   |     1.4 |
| SWEDISH   |     0.9 |
| TOLKIEN   |     0.0 |
+-----------+---------+
15 rows in set (3.56 sec)
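To make the arithmetic behind the query concrete, here is a minimal Python sketch that runs an equivalent query against a tiny in-memory SQLite table. It is an illustration only, not the author's setup: the sample rows and the function name `language_percentages` are hypothetical, SQLite stands in for MySQL, and the total is passed as a bound parameter rather than read from the `@total` variable. The `100.0` literal is there because SQLite would otherwise perform integer division.

```python
import sqlite3

# Hypothetical miniature of the article's `words` table (lang, word).
rows = [
    ("FINNISH", "talo"), ("FINNISH", "vesi"), ("FINNISH", "kala"),
    ("DUTCH", "huis"), ("DUTCH", "water"),
    ("LATIN", "aqua"), ("LATIN", "domus"),
    ("TOLKIEN", "mellon"),
]


def language_percentages(rows):
    """Return (language, percent) pairs, rounded to one decimal place."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (lang TEXT, word TEXT)")
    conn.executemany("INSERT INTO words VALUES (?, ?)", rows)

    # Equivalent of storing the total row count in a variable first.
    total = conn.execute("SELECT COUNT(*) FROM words").fetchone()[0]

    # Same shape as the article's query; 100.0 forces floating-point division.
    result = conn.execute(
        "SELECT lang AS language, "
        "ROUND(100.0 * COUNT(word) / ?, 1) AS percent "
        "FROM words GROUP BY lang ORDER BY lang",
        (total,),
    ).fetchall()
    conn.close()
    return result


for language, percent in language_percentages(rows):
    print(language, percent)
```

With eight sample rows, the three Finnish words account for 3/8 of the table, so Finnish is reported as 37.5 percent, and the shares of all languages add up to 100.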
By converting the output of this query to a comma-separated values file, and then loading it into Microsoft Excel, we can generate the following pie chart:
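The article does not say how the query output was converted to CSV. One possible approach, sketched here under that assumption, is to parse the ASCII table printed by the mysql client. The function name `mysql_table_to_csv` and the two-row sample are hypothetical.

```python
import csv
import io


def mysql_table_to_csv(table_text):
    """Convert mysql-client ASCII table output into CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in table_text.splitlines():
        line = line.strip()
        # Header and data rows start with '|'; skip borders and the summary line.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        writer.writerow(cells)
    return out.getvalue()


# Hypothetical excerpt of the query output shown earlier.
sample = """
+-----------+---------+
| language  | percent |
+-----------+---------+
| DUTCH     |    13.3 |
| FINNISH   |    21.4 |
+-----------+---------+
2 rows in set (0.00 sec)
"""

print(mysql_table_to_csv(sample))
```

The resulting text has one header line (`language,percent`) followed by one line per language, which a spreadsheet such as Excel can open directly.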
It is important that we understand how well represented the different languages are in the set of word lists, as this will affect our interpretation of the results of the lexical similarity analysis. Clearly Finnish is well represented in our word lists, as are Dutch, French and German. With Finnish having the largest number of words, we have a good starting point for testing the belief that Finnish was the biggest influence on Tolkien's languages of Middle Earth.
In my next article, I will explain the algorithm that I used to analyze the word lists, present an overview of the Java source code, and reveal the language that, according to my findings, most influenced Tolkien.