
Fuzzy string match

john_madden
Level VI

In the Recode pane, there is an option to group strings that allows fuzzy matching.

Is there a JSL function for doing fuzzy string matching? If there is, I'm having trouble finding it. Help!

John

3 replies (2 accepted solutions)
Craige_Hales
Super User


Re: Fuzzy string match

You might be able to use Shortest Edit Script() to build one. The example in the Scripting Index assembles a string of the characters that the two strings share, in order.

Craige
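
For readers who want to see the idea outside JMP: the characters two strings "share in order" form a longest common subsequence (LCS). The Python sketch below is an illustrative analogue, not JMP's Shortest Edit Script() itself. It recovers one LCS by dynamic programming and turns its length into a 0-to-1 score with 2 * |LCS| / (|a| + |b|). That scoring formula is an assumption chosen for illustration (it resembles the ratio Python's difflib.SequenceMatcher reports), and the function names are hypothetical.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (dynamic programming)."""
    n, m = len(a), len(b)
    # dp[i][j] = length of the LCS of the suffixes a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to collect the shared characters in order.
    out = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1  # skipping a[i] keeps an LCS of the same length
        else:
            j += 1
    return "".join(out)


def lcs_similarity(a: str, b: str) -> float:
    """Hypothetical score in [0, 1]: 2 * |LCS| / (|a| + |b|); 1.0 means identical."""
    if not a and not b:
        return 1.0
    return 2 * len(lcs(a, b)) / (len(a) + len(b))


if __name__ == "__main__":
    print(lcs("My string", "My tsring"))             # My tring
    print(lcs_similarity("My string", "My tsring"))  # 16/18, about 0.889
```

More than one LCS can exist ("My sring" is just as long as "My tring"), so which one comes back depends on the tie-breaking in the forward walk; the score depends only on the length.
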
john_madden
Level VI


Re: Fuzzy string match

As a follow-up: I ran across an open-source Python package that provides all kinds of string similarity measures. It seems to be very well done. It's at:

https://github.com/luozhouyang/python-string-similarity

It was straightforward to write a small JSL function that wraps one of the package's Python functions (the package is imported as strsimpy); I chose its Jaro-Winkler implementation. My function looks like this:

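JaroWinkler = Function( {str1, str2},
	{arg, rslt},  // local variables
	// Build the Python source; Eval Insert replaces ^str1^ and ^str2^ with the argument values.
	// Note: a single quote inside str1 or str2 would end the Python string literal early.
	arg = Eval Insert(
		"\[
from strsimpy.jaro_winkler import JaroWinkler;
jarowinkler = JaroWinkler();
rslt = jarowinkler.similarity('^str1^', '^str2^')
]\"
	);
	Python Init();              // start the Python session
	Python Submit( arg );       // run the generated Python code
	rslt = Python Get( rslt );  // copy Python's rslt back into the JSL variable
	Python Term();              // shut the session down again
	rslt;                       // return value of the function
);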

rslt = JaroWinkler( "My string", "My tsring" );
Show( rslt );  // Log displays the following: rslt = 0.974074074074074;

(I'm using Python 3.8 on a Mac. strsimpy depends on numpy, which must be installed in that Python environment.)

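If installing strsimpy and numpy isn't an option, Jaro-Winkler is short enough to write directly. The sketch below follows the commonly published definition: characters match only within a window of floor(max(len1, len2) / 2) - 1 positions, the prefix scale is 0.1, the common prefix is capped at 4 characters, and the prefix boost is applied only when the Jaro score exceeds 0.7. Those defaults are assumptions, and strsimpy's internals may differ in edge cases, although for the example in the post this sketch gives the same 0.974074... value. The function names are hypothetical.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two characters match only if equal and no farther apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Compare matched characters in order; each mismatch is half a transposition.
    half_transpositions = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions / 2
    m = matches
    return (m / len1 + m / len2 + (m - t) / m) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4, boost_threshold: float = 0.7) -> float:
    """Jaro score boosted by the length of the shared prefix (assumed defaults)."""
    j = jaro(s1, s2)
    if j <= boost_threshold:
        return j
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)


if __name__ == "__main__":
    print(round(jaro_winkler("My string", "My tsring"), 6))  # 0.974074
```

For "My string" vs "My tsring" all 9 characters match with one transposition, so Jaro = (1 + 1 + 8/9) / 3 = 26/27; the shared prefix "My " (3 characters) lifts that to 26/27 + 0.3 * (1/27), about 0.974074.
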
john_madden
Level VI


Re: Fuzzy string match

Because the Python code in the function delimits each string with single quotes (whereas JSL uses double quotes), any single quote inside str1 or str2 ends the Python literal early and has to be escaped. Put the following two lines at the beginning of the function:

Substitute Into(str1, "'", "\'");
Substitute Into(str2, "'", "\'");
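
The two Substitute Into() lines handle the single-quote case. The Python sketch below shows the failure from the Python side and flags one assumption: a backslash in the input (a Windows path, say) also changes meaning inside a Python string literal, and a raw newline is not allowed in a single-quoted literal, so this hypothetical helper escapes those as well, backslashes first so that the escapes added afterward are not doubled. In JSL the corresponding fix would presumably be further substitutions applied before the quote substitution (not verified here).

```python
import ast


def broken_literal(s: str) -> str:
    """Naive interpolation, like '^str1^' in the JSL template: no escaping."""
    return "'" + s + "'"


def escaped_literal(s: str) -> str:
    """Hypothetical helper: make s safe to embed between single quotes in Python source."""
    escaped = (
        s.replace("\\", "\\\\")  # backslashes first, or the escapes below get doubled
        .replace("'", "\\'")     # the fix from the post
        .replace("\n", "\\n")    # a raw newline is not allowed in a '...' literal
    )
    return "'" + escaped + "'"


if __name__ == "__main__":
    name = "O'Brien"
    try:
        ast.literal_eval(broken_literal(name))
    except SyntaxError as err:
        print("unescaped:", type(err).__name__)  # unescaped: SyntaxError
    print("escaped:", ast.literal_eval(escaped_literal(name)))  # escaped: O'Brien
```

When the code is generated on the Python side rather than in JSL, repr(s) already produces a valid literal for any string and is the simpler choice.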