Fuzzy string match
In the Recode pane, there is an option to group strings that allows fuzzy matching.
Is there a JSL function for doing fuzzy string matching? If there is, I'm having trouble finding it. Help!
John
Re: Fuzzy string match
You might be able to build one with the Shortest Edit Script function. The example in the Scripting Index assembles a string from the characters that the two strings share, in order.
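To illustrate the idea of scoring two strings by the characters they share in order, here is a minimal Python sketch based on the longest common subsequence. It is not the Shortest Edit Script implementation, and the names lcs and lcs_similarity are made up for this example; it only shows how a "shared characters" string could be turned into a similarity score between 0 and 1.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (characters shared in order)."""
    rows, cols = len(a) + 1, len(b) + 1
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back through the table to recover the shared characters.
    i, j = len(a), len(b)
    shared = []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            shared.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(shared))


def lcs_similarity(a: str, b: str) -> float:
    """Hypothetical score: twice the shared length over the combined length."""
    if not a and not b:
        return 1.0
    return 2 * len(lcs(a, b)) / (len(a) + len(b))


print(lcs("My string", "My tsring"))
print(lcs_similarity("My string", "My tsring"))
```

A score like this treats a swapped pair of letters as one missing character, so it is cruder than measures such as Jaro-Winkler, but it needs nothing beyond the shared-character string.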
Re: Fuzzy string match
Just to follow up, I ran across an open-source Python package that provides all kinds of string similarity measures. It seems to be really well done. It's at:
https://github.com/luozhouyang/python-string-similarity
It was straightforward to write a little JSL function that wraps one of the Python functions in this package. I decided to use the Jaro-Winkler algorithm as implemented there. My function looks like this:
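// Wraps the Jaro-Winkler similarity from strsimpy; needs a Python install that has strsimpy.
JaroWinkler = Function( {str1, str2},
    {arg, rslt},
    // Build the Python source, inserting the current values of str1 and str2.
    // The Python lines stay flush left because Python is indentation-sensitive.
    arg = Eval Insert(
"\[
from strsimpy.jaro_winkler import JaroWinkler;
jarowinkler = JaroWinkler();
rslt = jarowinkler.similarity('^str1^', '^str2^')
]\"
    );
    Python Init();
    Python Submit( arg );
    rslt = Python Get( rslt ); // copy the Python variable rslt back into JSL
    Python Term();
    rslt;
);
rslt = JaroWinkler( "My string", "My tsring" );
Show( rslt ); // Log displays the following: rslt = 0.974074074074074;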
(I'm using Python 3.8 on a Mac. strsimpy depends on numpy, which must be installed in your Python environment.)
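For anyone curious what the measure actually computes, here is a self-contained Python sketch of the textbook Jaro-Winkler similarity. It is not the strsimpy source and may differ from it in edge cases; the function names and the prefix_scale and max_prefix parameters are my own choices, set to the commonly used values of 0.1 and 4.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in the range [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only if they sit within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Compare the matched characters in order; each out-of-order pair is one transposition.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2

    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)


print(jaro_winkler("My string", "My tsring"))
```

For the two strings above, all nine characters match with one transposition and a three-character shared prefix, which works out to the same 0.974074... value shown in the log.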
Re: Fuzzy string match
Because the Python code wraps each string in single quotes (cf. JSL's double quotes), any single quote inside str1 or str2 needs to be escaped. You should also put the following two lines at the beginning of the function:
Substitute Into(str1, "'", "\'");
Substitute Into(str2, "'", "\'");
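To see why the escaping matters, here is a small Python sketch of what happens on the Python side when text is pasted into a single-quoted literal. TEMPLATE, escape_for_single_quotes, and is_valid_python are hypothetical names for this illustration, and len() stands in for the similarity call. The sketch also escapes backslashes, which I am assuming would cause the same kind of problem if they appear in the input.

```python
# Hypothetical stand-in for the generated Python line; len() replaces the similarity call.
TEMPLATE = "rslt = len('{text}')"


def escape_for_single_quotes(text: str) -> str:
    """Escape text so it can sit inside a single-quoted Python string literal."""
    # Escape backslashes first so the backslashes added for quotes are not doubled.
    return text.replace("\\", "\\\\").replace("'", "\\'")


def is_valid_python(source: str) -> bool:
    """Report whether the generated source compiles, without running it."""
    try:
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:
        return False


raw = "O'Brien"
naive = TEMPLATE.format(text=raw)
escaped = TEMPLATE.format(text=escape_for_single_quotes(raw))

print(naive, is_valid_python(naive))
print(escaped, is_valid_python(escaped))
```

The unescaped version ends the string literal early at the apostrophe, so Python rejects the generated code before the similarity function is ever called.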