cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Browse apps to extend the software in the new JMP Marketplace
Choose Language Hide Translation Bar
Jekko
Level I

How do I join tables based on one fully matching column and one closest matching?

Hi, 

 

I would like to join two data tables via the 'join' function in JMP. I would like to join them based on time and reactor number. However, this is not working as the column 'time' has 8 decimals in one of the data files and the other one zero decimals.

(How) would it be possible to match time on the closest matching point and reactor number exactly?

I've already found some older discussions about this problem. However, these solutions didn't work for me as a second column 'reactor number' could not be matched for.

Capture1.JPG

Here is a screenshot of the two data tables I would like to combine. The output should contain both time, reactor, reflectance and dry weight. 

Thanks in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
jthi
Super User

Re: How do I join tables based on one fully matching column and one closest matching?

Does the example dataset you provided have wrong values matched?

 

Create column dry weight g/l and rename Time h to time h in reflectance data:

jthi_2-1642007496381.png

Concatenate the dry weight data to that:

jthi_1-1642007367227.png

Select reactor and Time h column and Sort by them.

Select missing value in dry weight -> select matching cells -> invert selection

Now you can jump around the table to find closest values:

jthi_3-1642007574151.png

At this point you could create some helping formulas or scripts to calculate the nearest value for example something like this to further reduce the data:

Names Default To Here(1);
dt = Current Data Table();
r = dt << Get Selected Rows; //selecting of wanted rows could be done in script instead of datatable
r_min = r - 1;
r_max = r + 1;
dt << Subset(rows(Sort List(r||r_min||r_max)), Selected Columns(0));

jthi_8-1642008350940.png

At this point you could most likely copy-paste the dry weight time values to new column and create formula to calculate closest row.

jthi_9-1642008571936.png

 

 

This is what join nearest rows would return when Join is used:

jthi_4-1642007727491.png

With Update it wont create as many messy columns:

jthi_10-1642008595468.png

UI of the tools seems to have some problem on table selection... and user has to make sure correct tables or selected or the tool will do self-join.

-Jarmo

View solution in original post

4 REPLIES 4
jthi
Super User

Re: How do I join tables based on one fully matching column and one closest matching?

Could you provide small example datasets which have multiple reactors on both datatables (and in best case also table with correct results)? This might be possible to do by hand if there aren't too many groups and there is no need for automation. You could maybe use the fuzzy_merge addin and use lots of subsets to manage the bioreactor groups.

 

I do also have a script (and tool) which should be able to do this, but I haven't had time to fully test it yet, and I know for sure it still has some slight bugs. Join nearest rows tool should be able to:

  1. Join based on distance
  2. Join tables backwards, forwards and to nearest (with optional tie-breaker options)
  3. Allow grouping
  4. Allow using tolerance
  5. Shouldn't require having sorted datatables

jthi_2-1642000994868.png

 

-Jarmo
Jekko
Level I

Re: How do I join tables based on one fully matching column and one closest matching?

Thanks for your reply! I will upload a small example yes.

I've once tried to do it manually but it took so long as there are up to 24 groups/reactors and you'll need to scroll through a lot of time points. Creating subsets? How would you see that working?

jthi
Super User

Re: How do I join tables based on one fully matching column and one closest matching?

Does the example dataset you provided have wrong values matched?

 

Create column dry weight g/l and rename Time h to time h in reflectance data:

jthi_2-1642007496381.png

Concatenate the dry weight data to that:

jthi_1-1642007367227.png

Select reactor and Time h column and Sort by them.

Select missing value in dry weight -> select matching cells -> invert selection

Now you can jump around the table to find closest values:

jthi_3-1642007574151.png

At this point you could create some helping formulas or scripts to calculate the nearest value for example something like this to further reduce the data:

Names Default To Here(1);
dt = Current Data Table();
r = dt << Get Selected Rows; //selecting of wanted rows could be done in script instead of datatable
r_min = r - 1;
r_max = r + 1;
dt << Subset(rows(Sort List(r||r_min||r_max)), Selected Columns(0));

jthi_8-1642008350940.png

At this point you could most likely copy-paste the dry weight time values to new column and create formula to calculate closest row.

jthi_9-1642008571936.png

 

 

This is what join nearest rows would return when Join is used:

jthi_4-1642007727491.png

With Update it wont create as many messy columns:

jthi_10-1642008595468.png

UI of the tools seems to have some problem on table selection... and user has to make sure correct tables or selected or the tool will do self-join.

-Jarmo
vince_faller
Super User (Alumni)

Re: How do I join tables based on one fully matching column and one closest matching?

Why not just make a new column that round()s the time for the one with 8 digits and join upon that?


**Edit** Nevermind.  I immediately see why.  

 

You could get a subset of only the closest time by doing the following

 

Names default to here(1);
dt = New Table( "Example",
	New Column( "Time",
		Set Values( [14.14, 14.28, 13.56, 13.11, 12.32, .] )
	)
);

dt << New Column("Rounded Time", 
	Formula(Round(:Time, 0))
);

// we're just going to for the row that the value for time - rounded time = the min for that value
dt << New Column("Closest", 
	Formula(
		Abs(:Time - :RoundedTime) == ColMin(Abs(:Time - :RoundedTime), :RoundedTime)
	)
);

Vince Faller - Predictum