cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
New to using JMP? Hit the ground running with the Early User Edition of Discovery Summit. Register now, free of charge.
Register for our Discovery Summit 2024 conference, Oct. 21-24, where youā€™ll learn, connect, and be inspired.
Choose Language Hide Translation Bar
spinup
Level II

How do I add data from one table to another table based on the distance

I have two data tables for two set of measurement on the wafer. There are four columns in each table, X, Y representing the location of measurements, and two columns of measurement data for each table. the measurements  are not collected from the exact same positions (X,Y) of the wafer.  I would like to update ( add extra columns of the other table) either of the tables based on the proximity of measurement locations. To be more clear, I would like to have a combined table ( total 8 columns ) based on matching of  X1,Y1 to the closest point on X2, and Y2. Please see the picture I uploaded. I know Distance function probably can be utilized to make a matrix of all distances from a given X,Y but not sure how to write the jsl code to compute the updated table for me. 

14 REPLIES 14
spinup
Level II

Re: How do I add data from one table to another table based on the distance

Thanks hogi,

This runs like a champ, however , I had the exact same question as Steve: 

Can you explain the line 

dt1 << new column("row_2",Nominal, set each value(pos = Matrix({:x, :y});(MD2 << K nearest rows( 1, pos))[1][1]));

Additionally, when I run my version ( reading the tables with four columns ( x, y, data 1, data 2) on Table 1 and ( x2, y2, data 3, data 4)) I get the below alert . I think it is upset because the Table 1, and Table 2 have more than two columns (x, y) which are not in the matrix. But if you explain the line, I think I can figure out the rest! Thanks for the help 

spinup_0-1724027255837.png

 

hogi
Level XI

Re: How do I add data from one table to another table based on the distance

The Documentation of VPTree could be optimized.

 

KD Table:

hogi_0-1724065620983.png

KDTable <<  K nearest rows has two modes:

  1. default: take the input positions as starting points.
    -> not suitable for us.

  2. optional: use a single position as starting point.
    -> yes!

I calculate this point as pos = Matrix ({:x,:y})for every row and supply it as second argument for K nearest rows 

The return values are vectors of rows and distances. With the [1][1], I can first get the rows, and then the first (and only) entry.

 

Unfortunatley, there is no mode to provide a set of starting point as input.

 

VPTree:

For K nearest Rows there is no real documentation in the Scripting INdex, but there is an example right in the definition of VPTree:

hogi_2-1724066253901.png

and there is some documentation in the online help:

hogi_1-1724066227445.png

 

hogi_0-1724066636952.png

sounds like the same 2 options as for KDTable.

 

From @jthi 's example you can see that VPTree allows a third mode, where a set of positions is provided as a matrix.

This is a bit surprising, but very convenient

tree = VPTree(m2);
{rows, distances} = tree << K Nearest Rows(1, m1);
dt1 << New Column("Nearest", Numeric, Ordinal, Values(rows));

 

jthi
Super User

Re: How do I add data from one table to another table based on the distance

Something like this is what I usually end up with

Names Default To Here(1);

dt1 = New Table("Table 1",
	Add Rows(20),
	New Column("x", Numeric, "Continuous", Formula(Random Normal())),
	New Column("y", Numeric, "Continuous", Formula(Random Normal()))
);

dt2 = New Table("Table 2",
	Add Rows(20),
	New Column("x", Numeric, "Continuous", Formula(Random Normal())),
	New Column("y", Numeric, "Continuous", Formula(Random Normal())),
	New Column("Val", Numeric, Continuous, Formula(Random Normal(2, 100)))
);

m1 = dt1[0, {"x", "y"}];
m2 = dt2[0, {"x", "y"}];

tree = VPTree(m2);
{rows, distances} = tree << K Nearest Rows(1, m1);

dt1 << New Column("Nearest", Numeric, Ordinal, Values(rows));
dt1 << New Column("Distance", Numeric, Ordinal, Values(distances));
dt2 << New Column("Row", Numeric, Ordinal, Values(1::N Rows(dt2)));


/* Depending on the wanted result, using Update might be better */
dt_result = dt1 << Join(
	With(dt2),
	By Matching Columns(:Nearest = :Row),
	Drop multiples(0, 0),
	Include Nonmatches(0, 0),
	Preserve main table order(1),
	Output Table("Result")
);

/*
Close(dt1, no save);
Close(dt2, no save);
*/

Join/K Nearest Rows could be done to wrong direction as I didn't verify the results

-Jarmo
hogi
Level XI

Re: How do I add data from one table to another table based on the distance

VPtree << K Nearest Rows(1, matrix of coordinates);

nice  : )

hogi
Level XI

Re: How do I add data from one table to another table based on the distance

Further Ideas:

- related wish for 1D:Add fuzzy join / asof merge / join by columns which do not have exact matches 
I just added a comment to consider higher dimensions.


- In real application cases, there are additinal nominal coodinates like lots, wafers, variants. The user has to split the data accordingly and set up separate VPTrees for each "variant". VPTree with groupBy will be a logical next step.

 

- On the other hand, instead of just using a neighboring value, the user might prefer to interpolate the input data to get more meaningful values. Group by - sooo useful!!! how about Interpolate?