I recently enjoyed the latest film in the Lord of the Rings trilogy at the cinema. I was intrigued by Tolkien’s invented languages (such as Elvish and Dwarvish) and was curious to know where they came from, or more precisely, which real language had the biggest influence on Tolkien’s inventions. As I have been thinking about issues of string similarity recently (see Matching Strings and Algorithms), I wondered whether I could extend my ideas of string similarity to language similarity. In other words, could I discover which real language Tolkien’s artificial language most resembles?
Apparently, this is a much discussed topic. The article ‘Are High Elves Finno-Ugric?’ suggests that Finnish had the greatest influence on the development of the Elvish language Quenya. Tolkien first came across a Finnish grammar while he was studying at Oxford, and admitted that it made a strong (even ‘intoxicating’!) impression on him. Indeed, in early versions of Quenya there are many Finnish or near-Finnish words, although the meanings of the words are not those of Finnish. Tolkien himself wrote that Quenya was based on Latin, but with the added ‘phonaesthetic ingredients’ of Finnish and Greek. It has also been argued that some aspects of Tolkien’s invention are more like Uralic languages that are outside of Baltic Finnish, whilst other aspects more closely resemble Hungarian.
An Algorithmic Approach
As a developer, I was thinking about an algorithmic approach to the problem. My idea was to write a program that takes each Tolkien word in turn and finds which real language has the word which is most similar. By inspecting the number of times each language is chosen, we should be able to decide which language was Tolkien’s biggest influence. Of course I would need to look on the Web to find lists of Tolkien words, as well as word lists for other languages, but I assumed that wouldn’t be a problem.
My own string similarity metric could be used for the word-by-word comparison, and is a good choice because it acknowledges similarity for a common substring of any size, and is robust to differences in string size. Of course this would be a comparison of lexical similarity, as my string similarity algorithm makes only lexical comparisons. It is still possible that the inspiration for the grammar and the lexical structure of Tolkien’s languages came from entirely different sources.
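
The word-by-word procedure described above could be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the program itself: the `similarity` function here is a stand-in (a Dice coefficient over adjacent letter pairs) rather than necessarily the metric referred to above, the function names are hypothetical, and the word lists are made-up placeholder strings under the neutral labels "A" and "B", not real Tolkien or natural-language vocabulary.

```python
from collections import Counter


def letter_pairs(word):
    """Return the multiset of adjacent character pairs in a lowercased word."""
    w = word.lower()
    return Counter(w[i:i + 2] for i in range(len(w) - 1))


def similarity(a, b):
    """Stand-in metric (an assumption): Dice coefficient over letter pairs."""
    pairs_a, pairs_b = letter_pairs(a), letter_pairs(b)
    total = sum(pairs_a.values()) + sum(pairs_b.values())
    if total == 0:
        return 0.0  # neither word has a letter pair, so nothing to compare
    shared = sum((pairs_a & pairs_b).values())  # multiset intersection
    return 2.0 * shared / total


def closest_language(word, lexicons):
    """Return (language, score) for the language holding the most similar word.

    On a tie, the language that appears first in `lexicons` is kept.
    """
    best_lang, best_score = None, -1.0
    for lang, words in lexicons.items():
        score = max((similarity(word, w) for w in words), default=0.0)
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang, best_score


def tally_influences(invented_words, lexicons):
    """Count how often each language supplies the closest match."""
    counts = Counter()
    for word in invented_words:
        lang, score = closest_language(word, lexicons)
        if lang is not None and score > 0.0:  # skip words with no match at all
            counts[lang] += 1
    return counts


if __name__ == "__main__":
    # Placeholder data only: invented strings, not real vocabulary.
    lexicons = {"A": ["kala", "talo"], "B": ["terra", "aqua"]}
    invented_words = ["kalamo", "terran"]
    print(tally_influences(invented_words, lexicons))
```

With real word lists in place of the placeholders, the language with the highest count would be the candidate for the biggest lexical influence. Note that the sketch compares every invented word against every word in every list, so large word lists would make it slow without further optimisation.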