As an introduction, the levenshtein distance is used on how similar two strings are. It computes the minimum number of character substitution that has to be done, in order to convert one string from the other. For example, if the levenshtein distance is 0, it means the two strings are exactly the same. If levenshtein distance is 1, it means there is only 1 character differing the two strings. We can use this function for fuzzy matching using a very small treshold for the distance.

For illustration, assume we have a Person domain:

package example.fuzzy class Person { String firstName String lastName String middleName String nickName }We can use a MySQL stored procedure to compute for Levenshtein distance. It would be much slower if you will fetch all records from the database and do the matching inside groovy/java. It is much more efficient if all computation are done on the database side. I'm using a stored procedure borrowed from here. Create an sql file ( E.g. sp.sql ) with the contents below.

DELIMITER $$ DROP FUNCTION IF EXISTS levenshtein_distance $$ CREATE FUNCTION levenshtein_distance (s1 VARCHAR(255), s2 VARCHAR(255)) RETURNS INT DETERMINISTIC BEGIN DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT; DECLARE s1_char CHAR; DECLARE cv0, cv1 VARBINARY(256); SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = '\0', j = 1, i = 1, c = 0; IF s1 = s2 THEN RETURN 0; ELSEIF s1_len = 0 THEN RETURN s2_len; ELSEIF s2_len = 0 THEN RETURN s1_len; ELSE WHILE j <= s2_len DO SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1; END WHILE; WHILE i <= s1_len DO SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1; WHILE j <= s2_len DO SET c = c + 1; IF s1_char = SUBSTRING(s2, j, 1) THEN SET cost = 0; ELSE SET cost = 1; END IF; SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost; IF c > c_temp THEN SET c = c_temp; END IF; SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1; IF c > c_temp THEN SET c = c_temp; END IF; SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1; END WHILE; SET cv1 = cv0, i = i + 1; END WHILE; END IF; RETURN c; END $$ DELIMITER ;Note that this function seems to be case insensitive.

Create this to your MySQL schema:

$ mysql -u root -ppassword myschema < sp.sql

Now inside your controller or service, you can do the fuzzy matching via HQL:

def firstNameToMatch = 'john' def lastNameToMatch = 'Doe' def peopleWithSimilarName = Person.executeQuery("from Person p where levenshtein_distance (p.lastName,:firstNameToMatch) + levenshtein_distance (p.firstName, :lastNameToMatch) <= 2", [firstNameToMatch:firstNameToMatch, lastNameToMatch:lastNameToMatch])And of course, since Grails is based on Hibernate, you should be able to do the same in straight Hibernate.

Based on my testing, this function is not suitable for large tables. In my i7 Ivy Bridge laptop, it takes 25 seconds to search through 40,000 records. Not that practical unless used on very small table.

Tags: fuzzy, gorm, grails, hql, levenshtein distance, mysql, search, stored function, stored procedure