cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
Voizingu
Level III

Update a table with another table based on "Nearest Match" columns

Hello,

 

I have 2 tables (Table1 and Table2), both queried from databases.

I need to update Table1 with Table2 by matching columns (:match).

 

The problem is: 

- I get :match content for Table1 with a Regex from the database with (sometimes) this format: “XXXX-YY-ZZZ”.

- I get :match content for Table2 from another database with a different but consistent format: “XXXXYYZZZ”.

 

I can manually use If() and Subsitute() to get rid of “-“ in an extra step but I need to loop over a high number of table updates with the column format usually matching. I would like to avoid that route if possible.

 

I see two possible solutions:

1- Is there an elegant way within Update() that could look for the “nearest match” between columns by ignoring specific characters?

2- Is there a way to match a specific chain of character with Regex like “XXXX-YY-ZZZ” but output only “XXXXYYZZZ"? So that I can fix upstream the format of :match in table1 

 

If you have another approach in mind, I would love to hear it. 

Below an example script with the “extra” step

 

Thanks,

 

-voiz

 

Names Default To Here( 1 );

dt1 = New Table( "Table1",
	Add Rows( 20 ),
	New Column( "match", // format "XXX-YY-ZZZ"
		Character,
		"Nominal",
		set values({"786-79-6822", "413-31-2411", "286-37-3500", "985-96-1683", "641-67-8572","537-38-5509", "136-71-5147", "965-59-8232", "141-62-8885", "212-98-2829","330-28-2219", "938-39-3349", "203-47-1149", "310-16-8252", "273-85-3049","543-87-1778", "784-24-3481", "708-81-6285", "902-33-5331", "763-59-8837"})
	),
	New Column( "Value1",
		Character,
		"Nominal",
		set values ( {"1.7826391133", "3.4239116441", "0.1994181285", "3.5850992007", "1.8049403103", "2.3814269388", "2.3045256995", "3.3546657711", "2.341782487", "1.9017582694", "1.5864833118", "1.3739423649", "2.0963420644", "0.4244481763", "3.6469046306", "3.1553423936", "0.6822735481", "3.9464927437", "0.7582216617", "0.9802403664"})
	)
);


dt2 = New Table( "Table2", 
	Add Rows( 20 ),
	New Column( "match", // format "XXXYYZZZ"
		Character,
		"Nominal",
		set values({"537385509", "708816285", "938393349", "543871778", "286373500", "902335331", "784243481", "273853049", "641678572", "203471149", "985961683", "413312411", "786796822", "965598232", "136715147", "763598837", "330282219", "212982829", "141628885", "310168252"}
	)
	),
	New Column( "Value2",
		Character,
		"Nominal",
		set values ( {"1.7489787", "1.763834", "3.0609319", "0.7657611", "2.0515768", "0.8648471", "1.0966909", "1.5494098", "2.7582114", "1.1778722", "2.8651535", "2.3157859", "4.3617672", "0.7137677", "1.1055379", "1.8654949", "4.0616821", "2.3122891", "1.9496742", "2.3774745"})
	)
);

// I would like to remove this extra step (it doens't apply on all tables I need to loop through...
dt1 << new column ("Column 3",Character,"Nominal",formula(substitute(:match,"-","")));

// ...and make the Update() more robust with a "nearest match", not "exact "match"
dt1 << Update (
	with (dt2),
	match columns (Column("Column 3") == Column("match"))
)
2 ACCEPTED SOLUTIONS

Accepted Solutions
txnelson
Super User

Re: Update a table with another table based on "Nearest Match" columns

It is such a minor step, I don't see the problem.  You can add the column, or change the column in place.  A real easy solution.

Jim

View solution in original post

jthi
Super User

Re: Update a table with another table based on "Nearest Match" columns

If you can just use Substitute() do that. JMP's update/join do not have "nearest" options for joining (and this is more difficult as you are doing it based on strings where the question is "what is nearest"). You can also use Transform Columns if you don't want to create new ones (getting this working can initially be a bit more effort)

// Update data table
Data Table("Big Class") << Update(
	With(Data Table("Untitled")),
	Match Columns(
		Transform Column("Concatenate[name]", Character, Formula(Concat(:name))) =
		:Column 1
	)
);

 

Script the "Group Similar Values" function in the Recode platform with JSL and Need to join two tables based on non-identical strings (Use Regex?) in JSL (your post) can give some ideas how to join by closest string match if you really want to go that route.

-Jarmo

View solution in original post

6 REPLIES 6
txnelson
Super User

Re: Update a table with another table based on "Nearest Match" columns

It appears that your example table joins 100% of the data when using the column you created by removing the "-".  Do you have examples of your real data do not match after removing the "-"?

Jim
Voizingu
Level III

Re: Update a table with another table based on "Nearest Match" columns

Hi Jim,

 

If I add the step of removing the "-" with substitute() as described in my script, that works. 

If I do it with my real data, it works too, but I would like to avoid this step, because not all of my Table1 have the extra "-" in the :match column

txnelson
Super User

Re: Update a table with another table based on "Nearest Match" columns

It is such a minor step, I don't see the problem.  You can add the column, or change the column in place.  A real easy solution.

Jim
Voizingu
Level III

Re: Update a table with another table based on "Nearest Match" columns

Hi Jim,

 

My constraint was that no all of cases need this transformation, so I wanted my script to be more smart. But I found a solution:

I look for "-" with a IF() CONTAINS(), and if it is true, I am doing a double REGEX(): The first one to look for the pattern to match, and the second one to apply the transformation (remove "-").

It is not super elegant but it works and is robust enough.

 

Thanks again, as always

 

-voiz

-voiz

jthi
Super User

Re: Update a table with another table based on "Nearest Match" columns

If you can just use Substitute() do that. JMP's update/join do not have "nearest" options for joining (and this is more difficult as you are doing it based on strings where the question is "what is nearest"). You can also use Transform Columns if you don't want to create new ones (getting this working can initially be a bit more effort)

// Update data table
Data Table("Big Class") << Update(
	With(Data Table("Untitled")),
	Match Columns(
		Transform Column("Concatenate[name]", Character, Formula(Concat(:name))) =
		:Column 1
	)
);

 

Script the "Group Similar Values" function in the Recode platform with JSL and Need to join two tables based on non-identical strings (Use Regex?) in JSL (your post) can give some ideas how to join by closest string match if you really want to go that route.

-Jarmo
Voizingu
Level III

Re: Update a table with another table based on "Nearest Match" columns

Hi Jarmo,

 

I encountered some issues with the approach from last time: Need to join two tables based on non-identical strings (Use Regex?) in JSL 

 

the false positive rate was a bit of a problem, If there is no "nearest" match, I need to leave the cell empty instead of forcing a match, which was kind of the spirit of this approach.

 

Instead I solved this issue as described in the reply to Jim use a double regex with a condition. That way I avoid false positive.

 

But thanks so much for jumping in as always

 

-voiz