cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
The Discovery Summit 2025 Call for Content is open! Submit an abstract today to present at our premier analytics conference.
Choose Language Hide Translation Bar
View Original Published Thread

join tables based on matched columns and closest match column

Ohad_s
Level III

Hi,

I want to join two tables.

I have full match on two column and closest match on two other columns.

Each table with different number of row.

reference sorted= 108 rows

scan sorted= 193 rows

I want the final table to have all the row from the reference table, 108 rows, + the matching data from the second table.

exact match columns:

1. Row Reference= Row Scan

2. Col Reference= Col Scan

closest match columns:

1. X Coordinates Reference= X Coordinates Scan

2. Y Coordinates Reference= Y Coordinates Scan

If possible, I want to control the proximation value, like search area.

I've tried other suggestions in the forum: fuzzy merge and nearest rows but didn't get the results I want.

 

would appreciate any help.

1 ACCEPTED SOLUTION

Accepted Solutions
jthi
Super User


Re: join tables based on matched columns and closest match column

This might give some idea

Names Default To Here(1);

dt_scan = Open("$DOWNLOADS/scan sorted.jmp");
dt_ref = Open("$DOWNLOADS/reference sorted.jmp");

dt_scan << clear select << Clear Column Selection;
dt_ref << clear select << Clear Column Selection;

dt_res = dt_ref << Join(
	With(dt_scan),
	By Matching Columns(:Col_Reference = :Col_Scan, :Row_Reference = :Row_Scan),
	Drop multiples(0, 0),
	Include Nonmatches(1, 0),
	Preserve main table order(1),
	Output Table("Join of reference sorted with scan sorted")
);

dt_res << New Column("Distance", Numeric, Continuous, Formula(
	Sqrt(
		(:X_Coordinates_Scan - :X_Coordinates_Reference)^2
		+
		(:Y_Coordinates_Scan - :Y_Coordinates_Reference)^2
	);
));

dt_res << Select Where(
	:Distance == Col Min(:Distance, :Col_Reference, :Row_Reference, :ID_Reference)
	| Is Missing(:ID_SCAN)
);
dt_res << Invert Row Selection;
dt_res << Delete Rows;
-Jarmo

View solution in original post

5 REPLIES 5
jthi
Super User


Re: join tables based on matched columns and closest match column

Can you provide an example of the results you want?

-Jarmo
Ohad_s
Level III


Re: join tables based on matched columns and closest match column

Hi,

this is example of results i want.

for rows 14, 15 - 

if there is an option to set search area, for example 30, than this rows will be without a match and show only the reference table data.

but if we take the closest result regardless of search area then we will see the results as in the provided table.

 

hope it is clear 

 

jthi
Super User


Re: join tables based on matched columns and closest match column

This might give some idea

Names Default To Here(1);

dt_scan = Open("$DOWNLOADS/scan sorted.jmp");
dt_ref = Open("$DOWNLOADS/reference sorted.jmp");

dt_scan << clear select << Clear Column Selection;
dt_ref << clear select << Clear Column Selection;

dt_res = dt_ref << Join(
	With(dt_scan),
	By Matching Columns(:Col_Reference = :Col_Scan, :Row_Reference = :Row_Scan),
	Drop multiples(0, 0),
	Include Nonmatches(1, 0),
	Preserve main table order(1),
	Output Table("Join of reference sorted with scan sorted")
);

dt_res << New Column("Distance", Numeric, Continuous, Formula(
	Sqrt(
		(:X_Coordinates_Scan - :X_Coordinates_Reference)^2
		+
		(:Y_Coordinates_Scan - :Y_Coordinates_Reference)^2
	);
));

dt_res << Select Where(
	:Distance == Col Min(:Distance, :Col_Reference, :Row_Reference, :ID_Reference)
	| Is Missing(:ID_SCAN)
);
dt_res << Invert Row Selection;
dt_res << Delete Rows;
-Jarmo
Ohad_s
Level III


Re: join tables based on matched columns and closest match column

thanks for the quick replay.

this works great.

 

i tried to add a statement-  if :Distance <30 don't show the results, only the reference.

but ended without the results where distance >30.

 

is there an option to add this condition ?

 

jthi
Super User


Re: join tables based on matched columns and closest match column

How did you try to add it?

 

Script first picks the rows with the minimum distances for each of the groups and rows which couldn't be joined

dt_res << Select Where(
	:Distance == Col Min(:Distance, :Col_Reference, :Row_Reference, :ID_Reference)
	| Is Missing(:ID_SCAN)
);

Then it will invert the selection and remove those inverted rows

dt_res << Invert Row Selection;
dt_res << Delete Rows;

You can either modify the selection condition to work without inverse or think how you would take the inversion into account with your distance limit.

-Jarmo