The popular fuzzy string matching methods are Levenshtein distance, Soundex, Metaphone and Double Metaphone. I created a post on Levenshtein distance here
. But the performance is very disappointing. Scanning 40,000 records took around 25 seconds.
An alternative is Soundex. It has some deficiencies in terms of accuracy, specially if the language is not English. However, since this is the most widely known phonetic algorithm, it has native support in MySQL. It is also extremely fast. When I switched algorithms, the query ran from 25 seconds from Levenshtein, down to a fraction of a second for Soundex.