In the Recode pane, there is an option to group strings that allows fuzzy matching.
Is there a JSL function for doing fuzzy string matching? If there is, I'm having trouble finding it. Help!
John
You might be able to use Shortest Edit Script to make one. The example in the scripting index assembles a string of the characters the two strings share in order.
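To make the "characters the two strings share in order" idea concrete, here is a minimal sketch in Python rather than JSL. It does not call JSL's Shortest Edit Script. It computes a longest common subsequence with a standard dynamic-programming table. The function names and the 2 * shared / (len(a) + len(b)) ratio are assumptions chosen for illustration, not something from the scripting index.

```python
def shared_in_order(a, b):
    """Return one longest common subsequence of a and b (characters shared in order)."""
    rows, cols = len(a), len(b)
    # table[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the top-left corner to recover one such subsequence.
    out = []
    i = j = 0
    while i < rows and j < cols:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


def similarity(a, b):
    """Hypothetical score in [0, 1]: twice the shared length over the total length."""
    if not a and not b:
        return 1.0
    return 2 * len(shared_in_order(a, b)) / (len(a) + len(b))


print(similarity("My string", "My tsring"))
```

A score like this could then be compared against a threshold of your choosing to decide whether two values belong in the same group.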
Just as a follow-up, I ran across an open-source Python package that provides all kinds of string similarity measures. It seems to be really well done. It's at:
https://github.com/luozhouyang/python-string-similarity
It was straightforward to write a little JSL function that wraps one of the Python functions in this package. I decided to use the Jaro-Winkler algorithm as implemented there. My function looks like this:
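JaroWinkler = Function( {str1, str2},
    {arg, rslt},
    // Build the Python source; Eval Insert replaces ^str1^ and ^str2^ with the JSL values.
    // The Python lines stay at column 0 so Python does not see unexpected indentation.
    arg = Eval Insert(
"\[
from strsimpy.jaro_winkler import JaroWinkler;
jarowinkler = JaroWinkler();
rslt = jarowinkler.similarity('^str1^', '^str2^')
]\"
    );
    Python Init();
    Python Submit( arg );
    // Copy the Python variable rslt back into the local JSL variable
    rslt = Python Get( rslt );
    Python Term();
    rslt;
);
rslt = JaroWinkler( "My string", "My tsring" );
Show( rslt ); // Log displays the following: rslt = 0.974074074074074;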
(I'm using Python 3.8 on Mac. strsimpy does have a dependency on numpy, which must be installed in your Python.)
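For anyone who wants to see what the score measures without installing strsimpy, below is a minimal pure-Python sketch of the textbook Jaro-Winkler formula. It is not strsimpy's code, and the function and parameter names (jaro, jaro_winkler, prefix_scale, max_prefix) are assumptions for illustration. The defaults of 0.1 and 4 are the values commonly quoted for the algorithm.

```python
def jaro(s1, s2):
    """Textbook Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only if they sit within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters that appear in a different order count as half-transpositions.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3.0


def jaro_winkler(s1, s2, prefix_scale=0.1, max_prefix=4):
    """Jaro similarity boosted for a shared prefix of up to max_prefix characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


print(jaro_winkler("My string", "My tsring"))
```

For "My string" and "My tsring" all nine characters match with one transposition, so the Jaro score is (1 + 1 + 8/9) / 3, and the three-character shared prefix "My " raises it to about 0.974074, in line with the log output above.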
Because the function wraps each string in single quotes inside the Python code (JSL itself uses double quotes), any single quote in the inputs needs to be escaped, or it will end the Python string early. You should also put the following two lines at the beginning of the function:
Substitute Into(str1, "'", "\'");
Substitute Into(str2, "'", "\'");
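To see why the escaping matters, here is a small Python-only sketch of the same idea. The helper name as_single_quoted_literal is hypothetical. Unlike the two JSL lines above, it also doubles backslashes, which is an extra assumption about inputs that might contain them. It does not handle newlines or other control characters.

```python
import ast


def as_single_quoted_literal(text):
    """Build the source text of a single-quoted Python literal for text (hypothetical helper)."""
    # Double backslashes first so the backslash added for the quote is not doubled again.
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


literal = as_single_quoted_literal("O'Brien")
print(literal)

# Parsing the literal back gives the original string, apostrophe included.
print(ast.literal_eval(literal) == "O'Brien")
```

Without the escaping, the apostrophe in a value such as O'Brien would close the literal early and the submitted Python code would fail to parse.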