cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Choose Language Hide Translation Bar
TheriaultAJ
Level I

Joining Two Sets of Data, between Dates, by Identifier.

Hi all, first timer,

 

I am trying to combine and match two different tables/data sets from two different sources, with very few unique identifiers...

 

Process example: A widget is processed (in this case with two different reagents), and the output table (below, one row per event) contains "Operation Start Date Time" and "Operation Completion Date Time". The unique identifier for the widget is "SFC" as well as "Shop Order".

 

Output Table 1

TheriaultAJ_1-1614187262940.png

 

Between the "Operation Start Date Time" and "Operation Completion Date Time" process data is submitted (below, 24 rows per event)...

 

Output Table 2

TheriaultAJ_2-1614187729192.png

I am trying to match the submission data (Output Table 2) to the event data (Output Table 1) , essentially the results from each event. While each "SFC" (widget) is connected to "Shop Order", they have multiple event data sets; ie SN12345 == Shop Order 98765, and may have multiple "DATE" submissions (Output Table 2), but each "DATE" submission carries a unique "GROUP_ID" per submission.

 

Main goal being to match Output Table 2 "NAME" and "VALUE" to "Reagent Type" in Output Table 1, to where I can then visualize comparison of reagents.

 

I hope I explained it well enough (clear as mud), and hope this is how this site works! 

 

Thanks in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
txnelson
Super User

Re: Joining Two Sets of Data, between Dates, by Identifier.

I am not sure if this is exactly what you want.  The sample data does not provide really good matching based upon the Start and Completion time in your sample data.  Please run the below code on more complete data/  I have also attached the 2 example data tables you provided.

Names Default To Here( 1 );
dt1 = Data Table( "Output 1" );
dt2 = Data Table( "Output 2" );

// Loop through the Output 1 data table and set a unique id,
// and assign the same unique id to the Output 2 data rows
// that fall within the start and completion times
dt1 << New Column( "The Row", formula( Row() ) );
// Turn values to static
dt1:The Row << delete formula;

// Add a new column to the Output 2 data table that will be
// filled in below
dt2 << New Column( "The Row" );

For( i = 2, i <= N Rows( dt1 ), i++,
	foundRows = dt2 << get rows where(
		dt2:SFC == dt1:SFC[i] & dt2:SHOP_ORDER == dt1:Shop Order[i] & dt1:Operation Start Date Time[i] <= dt2
		:DATETIME & dt1:Operation Completion Date Time[i] >= dt2:DATETIME
	);
	If( N Rows( foundRows ) > 0,
		dt2:The Row[foundRows] = dt1:The Row[i];
	)
);

// Join the data
dt3 = dt1 << Join(
	With( dt2 ),
	By Matching Columns(
		:SFC = :SFC,
		:Shop Order = :SHOP_ORDER,
		:The Row = :The Row
	),
	Drop multiples( 0, 0 ),
	Include Nonmatches( 1, 0 ),
	Preserve main table order( 1 )
);

// Cleanup the data tables
// uncomment the 2 lines below to automate the cleanup
// dt1 << delete columns(:The Row );
// dt2 << delete columns(:The Row );
Jim

View solution in original post

3 REPLIES 3
jthi
Super User

Re: Joining Two Sets of Data, between Dates, by Identifier.

Could you provide example datasets for testing? If you have to you can Anonymize data by using JMPs Anonymize functionality found under Tables menu.I think you can anonymize everything except for times, SFC because those are needed for matching

-Jarmo
txnelson
Super User

Re: Joining Two Sets of Data, between Dates, by Identifier.

I am not sure if this is exactly what you want.  The sample data does not provide really good matching based upon the Start and Completion time in your sample data.  Please run the below code on more complete data/  I have also attached the 2 example data tables you provided.

Names Default To Here( 1 );
dt1 = Data Table( "Output 1" );
dt2 = Data Table( "Output 2" );

// Loop through the Output 1 data table and set a unique id,
// and assign the same unique id to the Output 2 data rows
// that fall within the start and completion times
dt1 << New Column( "The Row", formula( Row() ) );
// Turn values to static
dt1:The Row << delete formula;

// Add a new column to the Output 2 data table that will be
// filled in below
dt2 << New Column( "The Row" );

For( i = 2, i <= N Rows( dt1 ), i++,
	foundRows = dt2 << get rows where(
		dt2:SFC == dt1:SFC[i] & dt2:SHOP_ORDER == dt1:Shop Order[i] & dt1:Operation Start Date Time[i] <= dt2
		:DATETIME & dt1:Operation Completion Date Time[i] >= dt2:DATETIME
	);
	If( N Rows( foundRows ) > 0,
		dt2:The Row[foundRows] = dt1:The Row[i];
	)
);

// Join the data
dt3 = dt1 << Join(
	With( dt2 ),
	By Matching Columns(
		:SFC = :SFC,
		:Shop Order = :SHOP_ORDER,
		:The Row = :The Row
	),
	Drop multiples( 0, 0 ),
	Include Nonmatches( 1, 0 ),
	Preserve main table order( 1 )
);

// Cleanup the data tables
// uncomment the 2 lines below to automate the cleanup
// dt1 << delete columns(:The Row );
// dt2 << delete columns(:The Row );
Jim
TheriaultAJ
Level I

Re: Joining Two Sets of Data, between Dates, by Identifier.

What an incredible community! Thank you!