cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Browse apps to extend the software in the new JMP Marketplace
Choose Language Hide Translation Bar
ts2
ts2
Level III

Dealing with wildcards for joining tables

I need to join multiple tables, but the column I need to join on has wildcards for one of the tables columns. The tables are not overly large so it doens't need to be very efficient.

 

In the Table1 KEY column below there could be any single alphanmueric character where the question marks are. In the Brackets are OR statements, so it could be 1 or 2 or 3 or 4.

 

So for example it should join onto rows 1&2 of Table2 since the wildcards would match. But row 3 has a 9 where the OR pattern only has 1|2|3|4.

 

I was thinking of duplicating the row for each possible alphanumeric combination where the question marks are, and for each combination in the OR brackets then the join would work - not sure how to do this. 

 

Table1

KEYDATA
ABC?Y?[1|2|3|4]Z99999

 

 

Table2

KEYDATE
ABC1YA1Z2019/08/14 10:10:10 PM
ABCZYB4Z2019/08/14 11:11:11 PM
ABC1YA9Z2019/08/14 12:12:12 PM

 

Result Table:

KEYDATEDATA
ABC1YA1Z2019/08/14 10:10:10 PM99999
ABCZYB4Z2019/08/14 11:11:11 PM99999
ABC1YA9Z2019/08/14 12:12:12PM.

 

2 REPLIES 2
jara95
Level III

Re: Dealing with wildcards for joining tables

You can convert your KEY format to regular expression format.

dt1 is Table1 and dt2 is Table2.

dt1<<new column("KEY_reg",character,formula(substitute(:KEY,"|","","?","[A-Z0-9]")));

I tried to use the update command but not succesful.

dt2<< update(with(dt1),match columns(regex(:KEY,:KEY_reg ))) //[NOT WORKING]

Here are manual way to do the job.

dt1<<new column("KEY_reg",character,formula(substitute(:KEY,"|","","?","[A-Z0-9]")));
dt2<<new column("DATA");
for(r=1,r<=ncol(dt1),r++,
	r1 = dt2<<get rows where(not(is missing(regex(:KEY, dt1[r,"KEY_reg"]))));
	dt2[r1,"DATA"]=dt1[r,"DATA"];
);

Thanks.

ts2
ts2
Level III

Re: Dealing with wildcards for joining tables

Thank you Jara95 - this is very close to the solution. I made some small edits - not sure about how you were referencing column rows using this method:

dt1[r,"DATA"]

I switched to this method and it works:

dt1:DATA[r]

 

I also found that with Regex you can leave the OR | operators in place, no need to Substitute them out.

One final thing I changed is For loop from N Col to N Row.

 

 

So my final script looks something like this:

dt1 << New Column( "KEY_reg", character, formula( Substitute( dt1:KEY, "?", "[A-Z0-9]" ) ) );
dt2 << New Column( "DATA" );
For( r = 1, r <= N Row( dt1 ), r++,
	r1 = dt2 << Get Rows Where( !Is Missing( Regex( dt2:KEY, dt1:KEY_reg[r] ) ) );
	dt2:DATA[r1] = dt1:DATA[r];
);

Thanks again!