Getting Started With MySQL's Full-Text Search Capabilities - Full-Text Rules And The MATCH Command (Page 4 of 6 )
Earlier I listed a number of bullet points, one of which stated that MySQL removes noise words and words of 3 characters or fewer. Let's test that theory with 2 basic full-text search queries:
select firstName, match(firstName, lastName, details) against('devArticles is on the www') as relevance from testTable;
This query returns the following records:
Notice how our last query had only one word longer than 3 characters, "devArticles". If we remove all words of 3 characters or fewer from the search string, the relevance ranking should remain the same:
select firstName, match(firstName, lastName, details) against('devArticles') as relevance from testTable;
Here is the list of records that match the search:
As we can clearly see, the relevance ranking remains the same: MySQL does indeed remove noise words and words of 3 characters or fewer.
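The length cutoff comes from a server setting rather than being hard-coded. As a minimal sketch, assuming a MyISAM table (which is what ft_min_word_len applies to), the current value can be checked like this:

```sql
-- Show the minimum word length that the full-text index will store.
-- The default for MyISAM full-text indexes is 4, which is why words
-- of 3 characters or fewer (such as "www") are ignored.
SHOW VARIABLES LIKE 'ft_min_word_len';
```

If the value is changed in the server configuration, existing FULLTEXT indexes have to be rebuilt before the new length takes effect.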
MySQL's full-text search ranks words based on their semantic value: common words rank lower than uncommon words. This makes sense, as a word that exists in many records is less relevant than a word that appears in only 1 or 2 records. Semantic word rankings are used in most popular full-text searching algorithms. Popular search engines and directories also employ this method.
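To see the ranking at work, the relevance value can be used for sorting. The following is a sketch only, assuming the same testTable and the FULLTEXT index on (firstName, lastName, details) used in the queries above:

```sql
-- Return only the rows that match, most relevant first.
-- Repeating the MATCH expression in WHERE filters out rows whose
-- relevance is zero; ORDER BY makes the ranking explicit.
SELECT firstName,
       MATCH(firstName, lastName, details) AGAINST('devArticles') AS relevance
FROM testTable
WHERE MATCH(firstName, lastName, details) AGAINST('devArticles')
ORDER BY relevance DESC;
```

Rows where the search word is rare across the table should come out with a higher relevance than rows matched only by a word that appears almost everywhere.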
The 50% Threshold
MySQL removes noise words and short words, and it also ignores any word that is present in 50% or more of the records being searched: a search for such a word returns no records at all. MySQL calls this the "50% threshold". In a way this makes sense, as it filters out matches that have a low relevance.
Here's one of the comments from a MySQL user on their site:
"... you should add at least 3 rows to the table before you try to match anything, and what you're searching for should only be contained in one of the three rows. This is because of the 50% threshold. If you insert only one row, then no matter what you search for, it is in 50% or more of the rows in the table, and therefore disregarded."
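The behaviour described in that comment can be reproduced with a small experiment. This is a sketch under stated assumptions: thresholdDemo is a hypothetical table invented for the illustration, it uses the MyISAM engine, and the queries are plain natural-language searches like the ones above.

```sql
-- Hypothetical three-row table for testing the 50% threshold.
CREATE TABLE thresholdDemo (
    id INT AUTO_INCREMENT PRIMARY KEY,
    details TEXT,
    FULLTEXT (details)
) ENGINE=MyISAM;

INSERT INTO thresholdDemo (details) VALUES
    ('A database tutorial for beginners'),
    ('Another database forum thread'),
    ('Scripting news digest');

-- "tutorial" appears in 1 of 3 rows (below 50%),
-- so this search should return the first row.
SELECT id, details
FROM thresholdDemo
WHERE MATCH(details) AGAINST('tutorial');

-- "database" appears in 2 of 3 rows (above 50%),
-- so this search should return no rows.
SELECT id, details
FROM thresholdDemo
WHERE MATCH(details) AGAINST('database');
```

With only one or two rows in the table, every word is in at least half of the rows, which is why the commenter recommends starting with at least three.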