john_madden
Level VI

Fuzzy string match

In the Recode pane, there is an option to group strings that allows fuzzy matching.

Is there a JSL function for doing fuzzy string matching? If there is, I'm having trouble finding it. Help!

John

3 REPLIES
Craige_Hales
Super User

Re: Fuzzy string match

You might be able to use Shortest Edit Script to make one. The example in the scripting index assembles a string of the characters the two strings share in order.

Craige
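To illustrate the idea of scoring two strings by the characters they share in order, here is a minimal Python sketch. It is not the JSL Shortest Edit Script function; it computes a longest common subsequence by dynamic programming, and the names shared_in_order and similarity are made up for this example. The ratio used for the score is also just one reasonable choice.

```python
def shared_in_order(a, b):
    """Return one longest common subsequence of a and b (hypothetical helper)."""
    rows, cols = len(a) + 1, len(b) + 1
    # table[i][j] = length of the longest common subsequence of a[:i] and b[:j]
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover the shared characters.
    out = []
    i, j = len(a), len(b)
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


def similarity(a, b):
    """Score in [0, 1]: twice the shared length over the combined length (an assumed formula)."""
    if not a and not b:
        return 1.0
    return 2 * len(shared_in_order(a, b)) / (len(a) + len(b))
```

A score near 1 means almost every character of both strings appears in the shared sequence; a threshold on that score would then decide what counts as a fuzzy match.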
john_madden
Level VI

Re: Fuzzy string match

As a follow-up, I ran across an open-source Python package that provides all kinds of string similarity measures. It seems to be really well done. It's at:

https://github.com/luozhouyang/python-string-similarity

It was straightforward to write a little JSL function that wraps one of the Python functions in this package. I decided to use the Jaro-Winkler algorithm as implemented there. My function looks like this:

 

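// Wraps the Jaro-Winkler similarity from the strsimpy package in a JSL function.
JaroWinkler = Function( {str1, str2},
	{arg, rslt},
	// Build the Python source; Eval Insert replaces ^str1^ and ^str2^ with the argument values.
	arg = Eval Insert(
		"\[
from strsimpy.jaro_winkler import JaroWinkler;
jarowinkler = JaroWinkler();
rslt = jarowinkler.similarity('^str1^', '^str2^')
]\"
	);
	Python Init(); // start a Python session
	Python Submit( arg ); // run the generated Python code
	rslt = Python Get( rslt ); // copy the Python variable rslt back into JSL
	Python Term(); // close the Python session
	rslt; // return the similarity score
);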


rslt = JaroWinkler( "My string", "My tsring" );
Show( rslt );
// Log displays the following: rslt = 0.974074074074074;

(I'm using Python 3.8 on Mac. strsimpy depends on numpy, which must be installed in your Python environment.)
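For anyone who wants to see what the score measures without installing anything, here is a minimal pure-Python sketch of the textbook Jaro-Winkler definition. It is not the strsimpy code, whose details may differ; the function names and the default scaling factor of 0.1 and prefix limit of 4 are assumptions taken from the usual description of the algorithm.

```python
def jaro(s1, s2):
    """Textbook Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only if they are at most `window` positions apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters that appear in a different order count as half-transpositions.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(x != y for x, y in zip(seq1, seq2)) // 2

    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, scaling=0.1, max_prefix=4):
    """Jaro similarity boosted for a shared prefix of up to max_prefix characters."""
    base = jaro(s1, s2)
    prefix = 0
    for x, y in zip(s1, s2):
        if x != y or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * scaling * (1 - base)
```

For "My string" and "My tsring", all nine characters match with one transposition and a shared prefix of three characters, so this sketch gives the same value as the log output above.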


john_madden
Level VI

Re: Fuzzy string match

Because the generated Python code delimits the two strings with single quotes (cf. JSL's double quotes), any single quote inside str1 or str2 needs to be escaped. You should therefore also put the following two lines at the beginning of the function:

Substitute Into(str1, "'", "\'");
Substitute Into(str2, "'", "\'");
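To see why the escaping matters, here is a small Python sketch that builds a line of source text the same way, by pasting values between single quotes, and checks whether the result still parses. The template and the helper names are made up for this illustration, and the call to len stands in for the similarity call. Note that this only handles single quotes; a backslash in the input would need its own treatment.

```python
# Hypothetical stand-in for the generated code: values are pasted between single quotes.
TEMPLATE = "rslt = len('{a}') + len('{b}')"


def build_source(a, b, escape):
    """Paste a and b into the template, optionally escaping single quotes first."""
    if escape:
        a = a.replace("'", "\\'")
        b = b.replace("'", "\\'")
    return TEMPLATE.format(a=a, b=b)


def is_valid_python(source):
    """Return True if the generated source compiles, False on a syntax error."""
    try:
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:
        return False


unescaped = build_source("O'Brien", "OBrien", escape=False)
escaped = build_source("O'Brien", "OBrien", escape=True)
print(is_valid_python(unescaped), is_valid_python(escaped))
```

Without escaping, the quote in O'Brien ends the string literal early and the generated code no longer parses; with escaping, it is read as part of the string.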