Compare values
Hi,
Here is an example of a list:
Here is the final result I would like to achieve:
Thanks,
Re: Compare strings and assign most matched
It isn't obvious from your question and post whether you are looking for the most matching characters or the closest match as a string (two different things with two different solutions).
For matching strings, one of these might help: Shortest Edit Script and Choose Closest (the Scripting Index isn't able to get a link to the help page for this).
Edit:
Link from JMP help Identify Differences Between Strings, Lines, or Sequences (jmp.com)
Re: Compare strings and assign most matched
@jthi I am looking for the closest match as a string from the list. I am aware of Shortest Edit Script in the Scripting Index; however, I am trying to figure out how I can assign the closest matching string from the sample list, which is randomly sorted.
Re: Compare strings and assign most matched
Loop over your list, perform the calculation using Shortest Edit Script (or write your own algorithm) for each of the values, and pick the one with the best match.
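To make the loop concrete, here is a minimal sketch in plain Python rather than JSL. It is not JMP's Shortest Edit Script: it assumes the standard library's `difflib.SequenceMatcher` as a stand-in and counts every character that falls outside a common block. The names `count_differences` and `closest_match` are made up for this illustration, and the sample list is shortened.

```python
from difflib import SequenceMatcher


def count_differences(a: str, b: str) -> int:
    """Count the characters that are not part of a common block of a and b."""
    diffs = 0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Characters removed from a plus characters added from b
            diffs += (i2 - i1) + (j2 - j1)
    return diffs


def closest_match(value: str, samples: list[str]) -> str:
    """Return the sample with the fewest differing characters (first wins ties)."""
    return min(samples, key=lambda s: count_differences(value, s))


samples = ["Port3_Currents_587", "Port1_Volt_8_Off", "PortC_Volt_12"]
print(closest_match("Port3_Currents_5879_On", samples))
```

Swapping `count_differences` for another scoring function changes what "best match" means without touching the loop.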
Re: Compare strings and assign most matched
What criteria can I use to determine the best match among the strings? "Common" in the SES? Can you share an example?
Re: Compare strings and assign most matched
I think this depends on what you consider "most matched". A few examples:
- For PortX_Currents_5879_On, which is closer: PortX_Curre_589_On or Port_Currents_587_On?
- For Port3_Currents_5879_On, which is closer: Port3_Currents_587 or Port_Currents_587_On?
And why?
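One way to put numbers on these two questions is a plain Levenshtein distance (single-character inserts, deletes and substitutions). The sketch below is a textbook implementation in Python, not something JMP provides under this name, and `levenshtein` is just an illustrative helper.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


questions = {
    "PortX_Currents_5879_On": ["PortX_Curre_589_On", "Port_Currents_587_On"],
    "Port3_Currents_5879_On": ["Port3_Currents_587", "Port_Currents_587_On"],
}
for value, candidates in questions.items():
    for candidate in candidates:
        print(value, "->", candidate, levenshtein(value, candidate))
```

Under this particular metric Port_Currents_587_On wins both times (2 edits versus 4), which may or may not agree with what you would call "most matched".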
Re: Compare strings and assign most matched
- Thx
Re: Compare strings and assign most matched
Most likely your algorithm wouldn't consider those equal. If I had to guess, quite many would consider the second option closer, as it requires fewer edits. For a simple check using Shortest Edit Script() you could just count the inserts, the removes and how many letters were edited. Choose Closest() might be a slightly more brute-force method, as you could just allow it to use 9999 edits or something similar.
My fairly quick solutions disagree with the Most Matched string you provided (perhaps you would just wish to check how long the common sequence at the beginning is).
A few links to read regarding edit distance:
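If the length of the common sequence at the beginning is what matters, the check is short. This is only a sketch in Python of that one idea; `common_prefix_length` and `closest_by_prefix` are hypothetical helper names, and the two candidates are taken from the earlier example.

```python
def common_prefix_length(a: str, b: str) -> int:
    """Number of leading characters that a and b share."""
    length = 0
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            break
        length += 1
    return length


def closest_by_prefix(value: str, samples: list[str]) -> str:
    """Return the sample sharing the longest prefix with value (first wins ties)."""
    return max(samples, key=lambda s: common_prefix_length(value, s))


value = "Port3_Currents_5879_On"
candidates = ["Port3_Currents_587", "Port_Currents_587_On"]
for candidate in candidates:
    print(candidate, common_prefix_length(value, candidate))
print(closest_by_prefix(value, candidates))
```

Here Port3_Currents_587 shares 18 leading characters and Port_Currents_587_On only 4, so a prefix rule and an edit-count rule can pick different winners for the same value.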
- https://en.wikipedia.org/wiki/Edit_distance
- Just for fun and not directly related to this https://norvig.com/spell-correct.html
If you have JMP 18, you could also use Python for string distance comparison (but most likely JMP's methods are more than enough). If you allow yourself to use non-standard libraries with Python, you get even more options.
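As a rough idea of what the Python route could look like with the standard library only, `difflib.get_close_matches` ranks candidates by a similarity ratio. Note that this ratio is based on matching blocks and is not an edit distance, so it will not necessarily agree with Shortest Edit Script or Choose Closest. The wrapper `closest_sample` and its default `cutoff=0.0` (accept any similarity) are assumptions made for this sketch, and how you call it from JMP is left out.

```python
from difflib import get_close_matches

samples = [
    "Port3_Currents_587", "Port1_Volt_8_Off", "PortC_Volt_12",
    "PortX_Curre_589_On", "Curren_Port_1280_On", "Currents_5sd_On_Port1",
    "Port_Currents_587_On",
]


def closest_sample(value, samples, cutoff=0.0):
    """Return the most similar sample, or None if nothing reaches the cutoff."""
    matches = get_close_matches(value, samples, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(closest_sample("Port3_Currents_5879_On", samples))
```

Raising `cutoff` (between 0 and 1) leaves values unmatched when nothing in the list is similar enough.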
Re: Compare strings and assign most matched
I will take a look. Thanks Jarmo.
Did you manually create the ShortestEdit/ChooseClosest columns in the above snippet? If not, do you mind sharing your logic?
Thanks,
Re: Compare strings and assign most matched
It would be too much work to calculate the closest matches manually, so I used the functions I mentioned.
Choose Closest is very easy to implement if you check the Scripting Index example which allows edits (the first argument is your value in the table, the second is your sample list; increase the max edit count as necessary).
Edit: Here is the script used to create the table:
// https://norvig.com/spell-correct.html
// https://en.wikipedia.org/wiki/Edit_distance
// https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
// https://en.wikipedia.org/wiki/Levenshtein_distance
Names Default To Here(1);

// Sample list to match against
samples = {"Port3_Currents_587", "Port1_Volt_8_Off", "PortC_Volt_12",
    "PortX_Curre_589_On", "Curren_Port_1280_On", "Currents_5sd_On_Port1", "Port_Currents_587_On"};

dt = Open("$DOWNLOADS/dt_stings.jmp");
dt << clear select << Clear Column Selection;

// Approach 1: count the characters that Shortest Edit Script does not report as "Common"
new_col = dt << New Column("ShortestEdit", Character, Nominal);
For Each Row(dt,
    matches_dif = {};
    For Each({cur_sample}, samples,
        edits = Shortest Edit Script(:Strings[], cur_sample);
        difs = 0;
        For Each({cur_edit}, edits,
            If(cur_edit[1] != "Common",
                difs = difs + Length(cur_edit[2]);
            );
        );
        Insert Into(matches_dif, difs);
    );
    // Pick the sample with the fewest differing characters
    :ShortestEdit = samples[Loc Min(Matrix(matches_dif))];
);

// Approach 2: let Choose Closest pick the sample, allowing a large number of edits
newcol = dt << new column("ChooseClosest", character, nominal, formula(
    Choose Closest(:Strings, samples, Max Edit Count(9999)) // Col Max(Length(:Strings))
));
newcol << delete formula;