Popular fuzzy string matching methods include Levenshtein distance, Soundex, Metaphone, and Double Metaphone. I covered Levenshtein distance in an earlier post, but its performance was very disappointing: scanning 40,000 records took around 25 seconds.
An alternative is Soundex. It has some deficiencies in accuracy, especially if the language is not English. However, since it is the most widely known phonetic algorithm, it has native support in MySQL. It is also extremely fast: when I switched algorithms, the query time dropped from 25 seconds with Levenshtein to a fraction of a second with Soundex.
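To give a feel for what a phonetic code looks like, here is a minimal sketch of the classic four-character American Soundex in plain Java, which can be called from Groovy as is. The class and method names are my own, chosen for illustration. It is not MySQL's implementation: MySQL's SOUNDEX() can return codes longer than four characters and may disagree with this sketch on some inputs, so codes stored in the database should be compared with codes produced by the database.

```java
// Illustrative sketch of classic American Soundex (hypothetical class name).
// Not MySQL's SOUNDEX(): results may differ for some inputs.
final class Soundex {
    // Digit for each letter A..Z; '0' means the letter is not coded.
    private static final String CODES = "01230120022455012623010202";

    static String encode(String input) {
        StringBuilder out = new StringBuilder(4);
        char lastCode = '0';
        for (int i = 0; i < input.length(); i++) {
            char c = Character.toUpperCase(input.charAt(i));
            if (c < 'A' || c > 'Z') {
                continue; // ignore anything that is not a basic Latin letter
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);   // the first letter is kept as is
                lastCode = code; // but its code still suppresses a repeat
                continue;
            }
            if (c == 'H' || c == 'W') {
                continue; // H and W do not separate letters with the same code
            }
            if (code != '0' && code != lastCode) {
                out.append(code);
                if (out.length() == 4) {
                    break;
                }
            }
            lastCode = code; // a vowel resets this, allowing a repeated digit
        }
        if (out.length() == 0) {
            return ""; // no letters at all
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }
}
```

With this sketch, Soundex.encode("Robert") and Soundex.encode("Rupert") both give R163, which is why two names that sound alike can be matched with a plain equality check on the code instead of a per-row distance computation.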
This is a hackish approach to fuzzy string matching in Grails, as an alternative to the Searchable plugin, assuming a MySQL database is used.
As an introduction, the Levenshtein distance measures how similar two strings are. It computes the minimum number of single-character edits (insertions, deletions, or substitutions) needed to convert one string into the other. For example, if the Levenshtein distance is 0, the two strings are exactly the same. If the distance is 1, the two strings differ by a single edit. We can use this function for fuzzy matching by requiring the distance to stay within a very small threshold.
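As a rough illustration of the definition above, the distance can be computed with the standard dynamic programming approach. This is a sketch in plain Java, usable from Groovy. The class name and the isFuzzyMatch helper are hypothetical names made up for this example, not part of Grails or MySQL.

```java
// Illustrative sketch of Levenshtein distance (hypothetical class name).
final class Levenshtein {
    // Minimum number of insertions, deletions, and substitutions
    // needed to turn a into b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // empty prefix of a to b[0..j): j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // a[0..i) to empty string: i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Hypothetical helper: treat two strings as a match when their
    // distance does not exceed a small threshold.
    static boolean isFuzzyMatch(String a, String b, int threshold) {
        return distance(a, b) <= threshold;
    }
}
```

For example, Levenshtein.distance("kitten", "sitting") is 3 (two substitutions and one insertion), so isFuzzyMatch("kitten", "sitting", 1) is false while isFuzzyMatch("grails", "grail", 1) is true. Each comparison costs time proportional to the product of the two string lengths, which helps explain why running it against every record is slow.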