Our World Statistics Day conversations have been a great reminder of how much statistics can inform our lives. Do you have an example of how statistics has made a difference in your life? Share your story with the Community!
Choose Language Hide Translation Bar
Highlighted
Level II

## Join Question: timestamp range, kinda sorta?

So I have a question about how to do a join to get rows matched up kinda sorta to their similar timestamps.

I've got two tables, one for defect data, one for wafer count through a tool.  The timestamps don't match up and not every wafer measured in one table is recorded in the other.

Any tips on how to join this table for the ultimate goal of plotting defect count vs wafer count?

========

1.6million rows, ~6months of data

Table 1 - Defect Data

Defect data only measured on a small sample of wafers.

Time stamps only for the in/out, so a single time stamp for the whole lot

Table 2 - Wafer Count

individual wafer time stamps for a larger sampling of wafers.

unique time stamps for each wafer

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted
Super User

## Re: Join Question: timestamp range, kinda sorta?

Graham Blatz actually talked about just this exact thing at the Advanced Scripting Session at the JMP conference this year.  I have his code from a while ago but he actually improved upon it by using KDTable() which makes it insanely fast.

Here's a snippet of the function that does the most important bits.

``````dt2 << New Column( "Loc Min", numeric, continuous, formula( Row() ) ); //indexing, basically, to match/update upon
list_big = Column( dt2, rt2 ) << get values; //creates a vector of values for the following

dt1 << New Column( "Loc Min",
numeric,
continuous,
formula(
Loc Min( Abs( Column( rt1 )[Row()] - list_big ) )
)
);   //simple, but fast.  Very fast.

dt_join = dt1 << Join(
With( dt2 ),
By Matching Columns( :Loc Min = :Loc Min ),
Drop multiples( 0, 1 ),
Name( "Include non-matches" )(0, 0),
Preserve main table order( 1 )
);  // I opted for a join rather than an update, for now. may change in the future.

Here's the actual function as well that you can call and has a UI with it.
nearest_neighbor = Function(
//by Graham Blatz
//2/20/15
//Matches Nearest Neighbor of ostensibly dates, but whatever, man
{dt1 = Current Data Table(),
},
{DEFAULT LOCAL},
If( dt1 == dt2,   //default execution of a prompt, dunno a clean way around this without making some user camp inconvenienced.

list_tables = {};
For( i = 1, i <= N Table(), i++,
Insert Into( list_tables, Data Table( i ) << get name );
);  //sets up names of data tables to pick from
dt1 = Data Table( list_tables[1] );
dt2 = Data Table( list_tables[2] ); //Reasonable Starting point

win_tableselect = New Window( "Select Tables",
<<modal,
lup_box = Lineup Box( N Col( 2 ),
Text Box( "Main Table (i.e. SubID List)", <<set font size( 10 ) ),
Text Box( "Update-With Table (i.e. Tool Data)", <<set font size( 10 ) ),
c1 = Combo Box(
list_tables
,
s1 = c1 << getselected;
dt1 = Data Table( s1 );
vref1[1] << delete;
vref1 << append( //Deletes and redraws
col1 = Col List Box(
Data Table( dt1 ),
all,
maxSelected( 1 ),
rt1 = col1 << get selected
)
)
;
),  //these parts change the following collistboxes and their fantastic syntax based upon the data table combo box selection
c2 = Combo Box(
list_tables,
<<set( 2 ),
s2 = c2 << getselected;
dt2 = Data Table( s2 );
vref2[1] << delete;
vref2 << append(
col2 = Col List Box(
Data Table( dt2 ),
all,
maxSelected( 1 ),
rt2 = col2 << get selected
)
);
),  //see above
vref1 = V List Box(
col1 = Col List Box(
Data Table( dt1 ),//still stoopdi
all,
maxSelected( 1 ),
rt1 = col1 << get selected
)
),
vref2 = V List Box(
col2 = Col List Box(
Data Table( dt2 ),
all,
maxSelected( 1 ),
rt2 = col2 << get selected
)
),    //the point of the vrefs is to have something to delete without messing with the lineupbox's proclivity for left-right ordering
),
Text Box( "" ),
Hlistbox(
Button Box( "Ok",
s1 = c1 << getselected;
s2 = c2 << getselected;
dt1 = Data Table( s1 );
dt2 = Data Table( s2 );
rt1 = (col1 << get selected)[1];
rt2 = (col2 << get selected)[1];
),  //not everyone has the << return result option yet if they're not in 11
Button Box( "Cancel"),
),
);
If(win_tableselect["Button"] == -1, Throw("Canceled by User"))
);

dt2 << New Column( "Loc Min", numeric, continuous, formula( Row() ) ); //indexing, basically, to match/update upon
list_big = Column( dt2, rt2 ) << get values; //creates a vector of values for the following

dt1 << New Column( "Loc Min",
numeric,
continuous,
formula(
Loc Min( Abs( Column( rt1 )[Row()] - list_big ) )
)
);   //simple, but fast.  Very fast.

dt_join = dt1 << Join(
With( dt2 ),
By Matching Columns( :Loc Min = :Loc Min ),
Drop multiples( 0, 1 ),
Name( "Include non-matches" )(0, 0),
Preserve main table order( 1 )
);  // I opted for a join rather than an update, for now. may change in the future.

dt1 << delete columns("Loc Min");
dt2 << delete columns("Loc Min");  //Removes pesky join column
list_columns = dt_join << get column names( string );

For( i = 1, i <= N Items( list_columns ), i++,
If( Contains( list_columns, "Loc Min" ),
Column( dt_join, list_columns ) << set selected( 1 )
);
);  //the joined table will having lingering Loc Min-like columns.  This just hunts them down and removes them.
dt_join << delete columns;
dt_join << set name( Char( dt1 << get name ) || " & " || Char( dt2 << get name ) );  //Good enough
return(dt_join);
);``````

Vince Faller - Predictum
6 REPLIES 6
Highlighted
Staff

## Re: Join Question: timestamp range, kinda sorta?

I assume that lot integrity is always maintained, but are the wafers ever shuffled between slots? Saying it differently, do row 4 of table 1 (L22, W10) and row 18 of table 2 (L22, W10) correspond to the same physical wafer?

Highlighted
Level II

## Re: Join Question: timestamp range, kinda sorta?

Yes, sorry.  Should have used WID.  I can assume that that physical slot value stays with it.  L11 Slot4 in table 1 is the same as L11 Slot4 in table 2.

I have several other matching columns I can use to increase the join accuracy, like operation#, process recipe, product type, etrc.

Highlighted
Super User

## Re: Join Question: timestamp range, kinda sorta?

Graham Blatz actually talked about just this exact thing at the Advanced Scripting Session at the JMP conference this year.  I have his code from a while ago but he actually improved upon it by using KDTable() which makes it insanely fast.

Here's a snippet of the function that does the most important bits.

``````dt2 << New Column( "Loc Min", numeric, continuous, formula( Row() ) ); //indexing, basically, to match/update upon
list_big = Column( dt2, rt2 ) << get values; //creates a vector of values for the following

dt1 << New Column( "Loc Min",
numeric,
continuous,
formula(
Loc Min( Abs( Column( rt1 )[Row()] - list_big ) )
)
);   //simple, but fast.  Very fast.

dt_join = dt1 << Join(
With( dt2 ),
By Matching Columns( :Loc Min = :Loc Min ),
Drop multiples( 0, 1 ),
Name( "Include non-matches" )(0, 0),
Preserve main table order( 1 )
);  // I opted for a join rather than an update, for now. may change in the future.

Here's the actual function as well that you can call and has a UI with it.
nearest_neighbor = Function(
//by Graham Blatz
//2/20/15
//Matches Nearest Neighbor of ostensibly dates, but whatever, man
{dt1 = Current Data Table(),
},
{DEFAULT LOCAL},
If( dt1 == dt2,   //default execution of a prompt, dunno a clean way around this without making some user camp inconvenienced.

list_tables = {};
For( i = 1, i <= N Table(), i++,
Insert Into( list_tables, Data Table( i ) << get name );
);  //sets up names of data tables to pick from
dt1 = Data Table( list_tables[1] );
dt2 = Data Table( list_tables[2] ); //Reasonable Starting point

win_tableselect = New Window( "Select Tables",
<<modal,
lup_box = Lineup Box( N Col( 2 ),
Text Box( "Main Table (i.e. SubID List)", <<set font size( 10 ) ),
Text Box( "Update-With Table (i.e. Tool Data)", <<set font size( 10 ) ),
c1 = Combo Box(
list_tables
,
s1 = c1 << getselected;
dt1 = Data Table( s1 );
vref1[1] << delete;
vref1 << append( //Deletes and redraws
col1 = Col List Box(
Data Table( dt1 ),
all,
maxSelected( 1 ),
rt1 = col1 << get selected
)
)
;
),  //these parts change the following collistboxes and their fantastic syntax based upon the data table combo box selection
c2 = Combo Box(
list_tables,
<<set( 2 ),
s2 = c2 << getselected;
dt2 = Data Table( s2 );
vref2[1] << delete;
vref2 << append(
col2 = Col List Box(
Data Table( dt2 ),
all,
maxSelected( 1 ),
rt2 = col2 << get selected
)
);
),  //see above
vref1 = V List Box(
col1 = Col List Box(
Data Table( dt1 ),//still stoopdi
all,
maxSelected( 1 ),
rt1 = col1 << get selected
)
),
vref2 = V List Box(
col2 = Col List Box(
Data Table( dt2 ),
all,
maxSelected( 1 ),
rt2 = col2 << get selected
)
),    //the point of the vrefs is to have something to delete without messing with the lineupbox's proclivity for left-right ordering
),
Text Box( "" ),
Hlistbox(
Button Box( "Ok",
s1 = c1 << getselected;
s2 = c2 << getselected;
dt1 = Data Table( s1 );
dt2 = Data Table( s2 );
rt1 = (col1 << get selected)[1];
rt2 = (col2 << get selected)[1];
),  //not everyone has the << return result option yet if they're not in 11
Button Box( "Cancel"),
),
);
If(win_tableselect["Button"] == -1, Throw("Canceled by User"))
);

dt2 << New Column( "Loc Min", numeric, continuous, formula( Row() ) ); //indexing, basically, to match/update upon
list_big = Column( dt2, rt2 ) << get values; //creates a vector of values for the following

dt1 << New Column( "Loc Min",
numeric,
continuous,
formula(
Loc Min( Abs( Column( rt1 )[Row()] - list_big ) )
)
);   //simple, but fast.  Very fast.

dt_join = dt1 << Join(
With( dt2 ),
By Matching Columns( :Loc Min = :Loc Min ),
Drop multiples( 0, 1 ),
Name( "Include non-matches" )(0, 0),
Preserve main table order( 1 )
);  // I opted for a join rather than an update, for now. may change in the future.

dt1 << delete columns("Loc Min");
dt2 << delete columns("Loc Min");  //Removes pesky join column
list_columns = dt_join << get column names( string );

For( i = 1, i <= N Items( list_columns ), i++,
If( Contains( list_columns, "Loc Min" ),
Column( dt_join, list_columns ) << set selected( 1 )
);
);  //the joined table will having lingering Loc Min-like columns.  This just hunts them down and removes them.
dt_join << delete columns;
dt_join << set name( Char( dt1 << get name ) || " & " || Char( dt2 << get name ) );  //Good enough
return(dt_join);
);``````

Vince Faller - Predictum
Highlighted
Level II

## Re: Join Question: timestamp range, kinda sorta?

Awesome!  I will need to poke through this...i'm still new to this and trying to limp along.

Are there sample data tables that this applies to that I can setup?

So creates a new column in both DT1 & DT2, "Loc Min"

then joins by matching the two columns in a new table

Highlighted
Super User

## Re: Join Question: timestamp range, kinda sorta?

So you create a column called whatever you want in both tables.

One Table, dt1, you create a row column that is just the row number.  Then you get the values as a matrix for whatever column you're trying to join upon.

``mat = Column(dt1, "Column with the information you want to join") << Get Values;``

Then create a column in the other table (dt2) with the formula:

``````Loc Min( Abs( Column( dt2, "Other Join Column" )[Row()] - mat ) );
``````

What this will do is give you the location (row) where the minimum difference between the current row in dt2 and the entire set of values for dt1.

Then you just join the two columns together.

Vince Faller - Predictum
Highlighted
Level I

## Re: Join Question: timestamp range, kinda sorta?

Thanks Vince for sharing the code! It is very helpful!

I am wondering is there a way to join two tables by nearest date/time and also group by something else, for example, Tool_ch in this case. I have been doing this by subseting the same elements in both tables, then join by nearest date, and merge everything together. I don't think this is an efficient way to do this. :(

Thanks~

Peng

Article Labels

There are no labels assigned to this post.