Jackie_
Level VI

Compare values

Hi,

 

Here is an example of a list:

 

Here is the final result I would like to achieve:

 

Thanks,

9 REPLIES
jthi
Super User

Re: Compare strings and assign most matched

It isn't obvious from your question and post whether you are looking for the most matching characters or the closest match as a string (two different things with two different solutions).

 

For matching strings, one of these might help: Shortest Edit Script and Choose Closest (the Scripting Index isn't able to get a link to the help page for this).

 

Edit: 

Link from JMP Help: Identify Differences Between Strings, Lines, or Sequences (jmp.com)

-Jarmo
Jackie_
Level VI

Re: Compare strings and assign most matched

@jthi I am looking for the closest match as a string from the list. I am aware of Shortest Edit Script in the Scripting Index; however, I am trying to figure out how I can assign the closest matching string from the sample list, which is randomly sorted.

jthi
Super User

Re: Compare strings and assign most matched

Loop over your list, perform the calculation using Shortest Edit Script (or write your own algorithm) for each of the values, and pick the one with the best match?

-Jarmo
Jackie_
Level VI

Re: Compare strings and assign most matched

What criteria can I use to determine the best match among the strings? "Common" in the SES? Can you share an example? 

jthi
Super User

Re: Compare strings and assign most matched

I think this depends on what you consider "most matched". A few examples:

  • For PortX_Currents_5879_On, which is closer: PortX_Curre_589_On or Port_Currents_587_On?
  • For Port3_Currents_5879_On, which is closer: Port3_Currents_587 or Port_Currents_587_On?

and why?
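
One way to make "closer" concrete is plain Levenshtein distance (the minimum number of single-character inserts, deletes, or substitutions). The sketch below is in Python rather than JSL and is only an illustration of that one definition; the function name `levenshtein` is hypothetical, and other definitions of "most matched" (for example, longest common prefix) can rank the candidates differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # substitute (or keep)
        prev = cur
    return prev[-1]


# First example from the list above
target = "PortX_Currents_5879_On"
candidates = ["PortX_Curre_589_On", "Port_Currents_587_On"]
distances = {c: levenshtein(target, c) for c in candidates}

for candidate, distance in distances.items():
    print(candidate, distance)
```

Under this definition the first candidate needs 4 edits and the second needs 2, so the second one counts as closer.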

-Jarmo
Jackie_
Level VI

Re: Compare strings and assign most matched

  • Thx
jthi
Super User

Re: Compare strings and assign most matched

Most likely your algorithm wouldn't consider those equal. If I had to guess, quite many would consider the second option closer, as it requires fewer edits. For a simple check using Shortest Edit Script() you could just check for inserts, removes, and how many letters were edited. Choose Closest() might be a bit more of a brute-force method, as you could just allow it to use 9999 edits or something similar.

 

My fairly quick solutions disagree with the Most Matched string you provided (maybe you would just wish to check how long the common sequence at the beginning is?)

jthi_0-1716389377400.png

A few links to read regarding edit distance

If you have JMP 18 you could also use Python for string distance comparison (but most likely JMP's methods are more than enough). If you allow yourself to use non-standard libraries with Python, you get even more options.
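
As a rough idea of what the Python route could look like with only the standard library, here is a minimal sketch using `difflib.get_close_matches`. The helper name `closest_sample` is hypothetical, the sample list is the one from this thread, and how you pass table values between JMP and Python is left out. Note that `difflib` ranks by a similarity ratio, which is not the same measure as Shortest Edit Script or Choose Closest, so results can differ from the JSL approach.

```python
import difflib

# Sample list from this thread
samples = [
    "Port3_Currents_587", "Port1_Volt_8_Off", "PortC_Volt_12",
    "PortX_Curre_589_On", "Curren_Port_1280_On",
    "Currents_5sd_On_Port1", "Port_Currents_587_On",
]


def closest_sample(value: str, candidates=samples) -> str:
    """Return the candidate with the highest difflib similarity ratio."""
    # cutoff=0.0 means a best match is always returned, however poor it is
    matches = difflib.get_close_matches(value, candidates, n=1, cutoff=0.0)
    return matches[0]


print(closest_sample("PortX_Currents_5879_On"))
```

Raising `cutoff` (for example to 0.6, the `difflib` default) would let you reject values that have no reasonably similar sample, in which case `matches` can be empty and needs to be checked first.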

-Jarmo
Jackie_
Level VI

Re: Compare strings and assign most matched

I will take a look. Thanks Jarmo.

Did you manually create the ShortestEdit/ChooseClosest columns in the above snippet? If not, do you mind sharing your logic?

 

Thanks,

 

jthi
Super User

Re: Compare strings and assign most matched

It would be too much work to calculate the closest matches manually, so I used the functions I mentioned.

 

Choose Closest is very easy to implement if you check the Scripting Index example that allows edits (the first argument is your value in the table, the second is your sample list; increase the max edit count as necessary).

jthi_0-1716393519623.png

 

Edit: Here is the script used to create the table

// https://norvig.com/spell-correct.html
// https://en.wikipedia.org/wiki/Edit_distance
// https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
// https://en.wikipedia.org/wiki/Levenshtein_distance

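Names Default To Here(1);

// Candidate strings to match against (order does not matter)
samples = {"Port3_Currents_587", "Port1_Volt_8_Off", "PortC_Volt_12",
"PortX_Curre_589_On", "Curren_Port_1280_On", "Currents_5sd_On_Port1", "Port_Currents_587_On"};

// Table with a character column named Strings
dt = Open("$DOWNLOADS/dt_stings.jmp");
dt << Clear Select << Clear Column Selection;

// Method 1: Shortest Edit Script, pick the sample with the fewest non-common characters
new_col = dt << New Column("ShortestEdit", Character, Nominal);

For Each Row(dt,
	matches_dif = {};
	
	For Each({cur_sample}, samples,
		edits = Shortest Edit Script(:Strings[], cur_sample);
		difs = 0;
		// cur_edit[1] is the edit type, cur_edit[2] the affected text;
		// count the characters in every piece that is not "Common"
		For Each({cur_edit}, edits,
			If(cur_edit[1] != "Common",
				difs = difs + Length(cur_edit[2]);
			);
		);
		Insert Into(matches_dif, difs);
	);
	
	// The sample with the smallest difference count is the match
	:ShortestEdit = samples[Loc Min(Matrix(matches_dif))];
);

// Method 2: Choose Closest with a very large edit limit
newcol = dt << New Column("ChooseClosest", Character, Nominal, Formula(
	Choose Closest(:Strings, samples, Max Edit Count(9999)) // alternative limit: Col Max(Length(:Strings))
));
// Make sure the formula has been evaluated before removing it, so the values stay as static data
dt << Run Formulas;
newcol << Delete Formula;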
-Jarmo