The popular fuzzy string matching methods are Levenshtein distance, Soundex, Metaphone and Double Metaphone. I created a post on Levenshtein distance here. But the performance is very disappointing. Scanning 40,000 records took around 25 seconds.
An alternative is Soundex. It has some deficiencies in terms of accuracy, specially if the language is not English. However, since this is the most widely known phonetic algorithm, it has native support in MySQL. It is also extremely fast. When I switched algorithms, the query ran from 25 seconds from Levenshtein, down to a fraction of a second for Soundex.
This is a hackish approach for fuzzy string matching in Grails, as an alternative to searchable plugin, assuming MySQL database is used.
As an introduction, the levenshtein distance is used on how similar two strings are. It computes the minimum number of character substitution that has to be done, in order to convert one string from the other. For example, if the levenshtein distance is 0, it means the two strings are exactly the same. If levenshtein distance is 1, it means there is only 1 character differing the two strings. We can use this function for fuzzy matching using a very small treshold for the distance.
While ORM is great in simplifying our work with the database, it is still very useful to know SQL statements our application is throwing at the database. Specially when we are still developing the application.
One of the convenient features of Hibernate that is inherited by Grails, is that once you have defined your data model, it can create the tables for you automatically on start up of the application. The nice thing about this is since Hibernate/GORM is database agnostic, it means you don't need to know how to define the tables in different relational databases. Be it Oracle, DB2, or MySQL. If you change your connection property, and load the appropriate JDBC driver, it will know how to talk to the database and create the correct tables.