Solved: How do I join tables based on one fully matching column and one closest matching?

Jekko · Jun 10, 2023 4:42 PM

Hi,

I would like to join two data tables with the Join function in JMP, matching on time and reactor number. However, this does not work because the 'time' column has 8 decimals in one of the data files and no decimals in the other.

(How) would it be possible to match time on the closest matching point and reactor number exactly?

I have already found some older discussions about this problem. However, those solutions did not work for me because a second column, 'reactor number', also has to be matched.

Here is a screenshot of the two data tables I would like to combine. The output should contain time, reactor, reflectance and dry weight.

Thanks in advance!

jthi · Jan 12, 2022 12:30 PM

Does the example dataset you provided have wrong values matched?

In the reflectance data, create a dry weight g/l column and rename Time h to time h:

Concatenate the dry weight data to that table:

Select the reactor and Time h columns and sort by them.

Select a missing value in dry weight -> Select Matching Cells -> invert the row selection, so that only the dry weight rows are selected.
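The manual steps above could also be scripted. A minimal JSL sketch, assuming two small hypothetical tables named "reflectance" and "dry weight" with the column names used in this thread (all values are made up for illustration). Concatenate matches columns by name, which is why the time column is renamed first.

```jsl
Names Default To Here( 1 );

// Hypothetical example tables (made-up values)
dt_refl = New Table( "reflectance",
	New Column( "reactor", Numeric, "Nominal", Set Values( [1, 1, 1, 2, 2] ) ),
	New Column( "Time h", Numeric, "Continuous", Set Values( [0.12345678, 1.04, 2.51, 0.98, 2.02] ) ),
	New Column( "reflectance", Numeric, "Continuous", Set Values( [0.30, 0.35, 0.41, 0.28, 0.33] ) )
);
dt_dw = New Table( "dry weight",
	New Column( "reactor", Numeric, "Nominal", Set Values( [1, 2] ) ),
	New Column( "time h", Numeric, "Continuous", Set Values( [1, 2] ) ),
	New Column( "dry weight g/l", Numeric, "Continuous", Set Values( [2.1, 3.4] ) )
);

// Step 1: add an empty dry weight column and rename Time h so the names line up
dt_refl << New Column( "dry weight g/l", Numeric, "Continuous" );
Column( dt_refl, "Time h" ) << Set Name( "time h" );

// Step 2: stack the dry weight rows under the reflectance rows (new table)
dt_all = dt_refl << Concatenate( dt_dw );

// Step 3: sort by reactor, then by time, in place
dt_all << Sort(
	By( Column( dt_all, "reactor" ), Column( dt_all, "time h" ) ),
	Order( Ascending, Ascending ),
	Replace Table
);

// Step 4: select the rows that carry a dry weight value
// (same result as selecting a missing cell, Select Matching Cells, then inverting)
dt_all << Select Where( !Is Missing( :Name( "dry weight g/l" ) ) );
```

With the rows sorted this way, each selected dry weight row sits between the reflectance readings just before and just after it in the same reactor.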

Now you can jump between the selected rows in the table to find the closest values:

At this point you could create some helper formulas or scripts to calculate the nearest value. For example, something like this would further reduce the data:

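Names Default To Here(1);
dt = Current Data Table();
// rows currently selected in the table (e.g. the dry weight rows);
// selecting the wanted rows could also be done in the script instead of in the data table
r = dt << Get Selected Rows;
// row numbers directly above and below each selected row
r_min = r - 1;
r_max = r + 1;
// subset the selected rows together with their neighbours; Selected Columns(0) keeps all columns
dt << Subset(rows(Sort List(r||r_min||r_max)), Selected Columns(0));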

From here you could most likely copy and paste the dry weight time values into a new column and create a formula to calculate the closest row.
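Instead of a formula, a short script can look up the closest row directly. A JSL sketch, assuming hypothetical tables like the ones in this thread (made-up values): for each dry weight row it keeps only the reflectance rows of the same reactor (exact match), then picks the one with the smallest absolute time difference (closest match).

```jsl
Names Default To Here( 1 );

// Hypothetical example data: frequent reflectance readings, sparse dry weight samples
dt_refl = New Table( "reflectance",
	New Column( "reactor", Numeric, "Nominal", Set Values( [1, 1, 1, 2, 2] ) ),
	New Column( "Time h", Numeric, "Continuous", Set Values( [0.12345678, 1.04, 2.51, 0.98, 2.02] ) ),
	New Column( "reflectance", Numeric, "Continuous", Set Values( [0.30, 0.35, 0.41, 0.28, 0.33] ) )
);
dt_dw = New Table( "dry weight",
	New Column( "reactor", Numeric, "Nominal", Set Values( [1, 2] ) ),
	New Column( "time h", Numeric, "Continuous", Set Values( [1, 2] ) ),
	New Column( "dry weight g/l", Numeric, "Continuous", Set Values( [2.1, 3.4] ) )
);

// Pull the columns into matrices once
refl_reactor = Column( dt_refl, "reactor" ) << Get Values;
refl_time = Column( dt_refl, "Time h" ) << Get Values;
refl_value = Column( dt_refl, "reflectance" ) << Get Values;
dw_reactor = Column( dt_dw, "reactor" ) << Get Values;
dw_time = Column( dt_dw, "time h" ) << Get Values;

n = N Rows( dt_dw );
nearest_time = J( n, 1, . );
nearest_refl = J( n, 1, . );
For( i = 1, i <= n, i++,
	// reflectance rows of the same reactor (exact match)
	cand = Loc( refl_reactor == dw_reactor[i] );
	If( N Rows( cand ) > 0,
		// among those, the row with the smallest |time difference|
		best = cand[Loc Min( Abs( refl_time[cand] - dw_time[i] ) )];
		nearest_time[i] = refl_time[best];
		nearest_refl[i] = refl_value[best]
	)
);

// Store the matched reflectance time and value next to each dry weight sample
dt_dw << New Column( "nearest Time h", Numeric, "Continuous", Set Values( nearest_time ) );
dt_dw << New Column( "reflectance", Numeric, "Continuous", Set Values( nearest_refl ) );
```

The "nearest Time h" column can then serve as an exact key, together with reactor, for a normal Join or Update. On ties, Loc Min returns a single index (the first minimum), so only one candidate is kept.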

This is what the Join Nearest Rows tool returns when Join is used:

With Update it won't create as many messy columns:

The tool's UI seems to have a problem with table selection, so the user has to make sure the correct tables are selected, or the tool will do a self-join.

-Jarmo

jthi · Jan 12, 2022 10:28 AM

Could you provide small example datasets that have multiple reactors in both data tables (ideally also a table with the correct results)? This might be possible to do by hand if there aren't too many groups and there is no need for automation. You could perhaps use the fuzzy_merge add-in together with lots of subsets to handle the bioreactor groups.

I also have a script (and tool) that should be able to do this, but I haven't had time to fully test it yet, and I know for sure it still has some minor bugs. The Join Nearest Rows tool should be able to:

Join based on distance
Join tables backwards, forwards and to nearest (with optional tie-breaker options)
Allow grouping
Allow a tolerance
Work without sorted data tables
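The backward/forward/nearest and tolerance options in that list can be illustrated with a small helper. This is a hypothetical JSL function (find_match is not part of JMP or of the Join Nearest Rows tool), a sketch of the idea only; grouping would be handled by passing in just the times of one reactor.

```jsl
Names Default To Here( 1 );

// Hypothetical helper (not part of JMP or the Join Nearest Rows tool).
// Returns the index in t that matches target, or missing (.) if nothing qualifies.
//   direction: "backward" = latest time <= target, "forward" = earliest time >= target,
//              anything else = nearest in either direction
//   tol: largest allowed |time - target|
find_match = Function( {target, t, direction, tol},
	{delta, ok},
	delta = t - target;
	// 0/1 masks combined with elementwise multiplication (:*)
	If(
		direction == "backward", ok = Loc( (delta <= 0) :* (Abs( delta ) <= tol) ),
		direction == "forward", ok = Loc( (delta >= 0) :* (Abs( delta ) <= tol) ),
		ok = Loc( Abs( delta ) <= tol )
	);
	// smallest distance among the allowed candidates (Loc Min takes the first on ties)
	If( N Rows( ok ) == 0,
		.,
		ok[Loc Min( Abs( delta[ok] ) )]
	)
);

// Example: candidate times of one reactor
times = [0.5, 1.04, 2.51];
Show( find_match( 1, times, "backward", 1 ) ); // 1 (time 0.5)
Show( find_match( 1, times, "forward", 1 ) ); // 2 (time 1.04)
Show( find_match( 1, times, "nearest", 0.01 ) ); // . (nothing within tolerance)
```

Because the candidates are selected with Loc rather than by walking neighbouring rows, the input does not need to be sorted.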

-Jarmo

Jekko · Jan 12, 2022 11:58 AM

Thanks for your reply! Yes, I will upload a small example.

I once tried to do it manually, but it took very long because there are up to 24 groups/reactors and you have to scroll through a lot of time points. Creating subsets? How would you see that working?

vince_faller · Jan 12, 2022 8:03 AM

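Why not just make a new column that Round()s the time in the table with 8 decimals and join on that?

**Edit** Never mind, I immediately see why: several times round to the same whole hour, so the rounded column alone would match more than one row.

You could flag only the closest time for each rounded value (and then subset those rows) by doing the following: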

Names Default To Here(1);
dt = New Table( "Example",
	New Column( "Time",
		Set Values( [14.14, 14.28, 13.56, 13.11, 12.32, .] )
	)
);

// round the 8-decimal time to whole hours
dt << New Column("Rounded Time", 
	Formula(Round(:Time, 0))
);

// flag the row whose distance |Time - Rounded Time| is the smallest among all rows with the same Rounded Time
dt << New Column("Closest", 
	Formula(
		Abs(:Time - :Rounded Time) == Col Min(Abs(:Time - :Rounded Time), :Rounded Time)
	)
);
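The formula above takes the minimum per Rounded Time only. With several reactors in one table, the grouping presumably also needs the reactor column, and Col Min accepts more than one by-variable. A sketch with a hypothetical reactor column and made-up times, which also subsets the flagged rows:

```jsl
Names Default To Here( 1 );

// Hypothetical reflectance table with two reactors (made-up values)
dt = New Table( "Example",
	New Column( "reactor", Numeric, "Nominal", Set Values( [1, 1, 1, 2, 2] ) ),
	New Column( "Time", Numeric, "Continuous", Set Values( [14.14, 14.28, 13.56, 14.40, 13.90] ) )
);
dt << New Column( "Rounded Time", Numeric, "Continuous", Formula( Round( :Time, 0 ) ) );

// Same idea as above, but the minimum is taken per (Rounded Time, reactor) pair
dt << New Column( "Closest", Numeric, "Nominal",
	Formula(
		Abs( :Time - :Rounded Time ) == Col Min( Abs( :Time - :Rounded Time ), :Rounded Time, :reactor )
	)
);
dt << Run Formulas;

// Keep only the flagged rows, with all columns
keep = dt << Get Rows Where( :Closest == 1 );
dt_closest = dt << Subset( Rows( keep ), Selected Columns( 0 ) );
```

Here both reactors keep one row at hour 14 (14.14 and 13.90); grouping by Rounded Time alone would flag only 13.90. The subset can then be joined exactly on reactor and Rounded Time, assuming the dry weight times are whole hours. A reading more than 0.5 h from a sampling time rounds to a different hour, so this only works when the readings are dense enough.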

Vince Faller - Predictum
