Subscribe Bookmark RSS Feed

Join Question: timestamp range, kinda sorta?

aliegner1

New Contributor

Joined:

Oct 5, 2016

So I have a question about how to do a join to get rows matched up kinda sorta to their similar timestamps. 

I've got two tables, one for defect data, one for wafer count through a tool.  The timestamps don't match up and not every wafer measured in one table is recorded in the other. 

Any tips on how to join this table for the ultimate goal of plotting defect count vs wafer count?

========

1.6million rows, ~6months of data

Table 1 - Defect Data

Defect data only measured on a small sample of wafers. 

Time stamps only for the in/out, so a single time stamp for the whole lot


12896_pastedImage_3.png

Table 2 - Wafer Count

individual wafer time stamps for a larger sampling of wafers.

unique time stamps for each wafer

12895_pastedImage_2.png

1 ACCEPTED SOLUTION

Accepted Solutions
vince_faller

Super User

Joined:

Mar 17, 2015

Solution

Graham Blatz actually talked about just this exact thing at the Advanced Scripting Session at the JMP conference this year.  I have his code from a while ago but he actually improved upon it by using KDTable() which makes it insanely fast. 

Here's a snippet of the function that does the most important bits.

dt2 << New Column( "Loc Min", numeric, continuous, formula( Row() ) ); //indexing, basically, to match/update upon

       list_big = Column( dt2, rt2 ) << get values; //creates a vector of values for the following

      

       dt1 << New Column( "Loc Min",

              numeric,

              continuous,

              formula(

                     Loc Min( Abs( Column( rt1 )[Row()] - list_big ) )

              )

       );   //simple, but fast.  Very fast.

      

       dt_join = dt1 << Join(

              With( dt2 ),

              By Matching Columns( :Loc Min = :Loc Min ),

              Drop multiples( 0, 1 ),

              Name( "Include non-matches" )(0, 0),

              Preserve main table order( 1 )

       );  // I opted for a join rather than an update, for now. may change in the future.




Here's the actual function as well that you can call and has a UI with it. 

nearest_neighbor = Function(

       //by Graham Blatz

       //2/20/15

       //Matches Nearest Neighbor of ostensibly dates, but whatever, man

       {dt1 = Current Data Table(),

              dt2 = Current Data Table(), rt1 = "ReadTime", rt2 = "ReadTime"   

       },

       {DEFAULT LOCAL},

       If( dt1 == dt2,   //default execution of a prompt, dunno a clean way around this without making some user camp inconvenienced.

             

              list_tables = {};

              For( i = 1, i <= N Table(), i++,

                     Insert Into( list_tables, Data Table( i ) << get name );

              );  //sets up names of data tables to pick from

              dt1 = Data Table( list_tables[1] );

              dt2 = Data Table( list_tables[2] ); //Reasonable Starting point

              win_tableselect = New Window( "Select Tables",

                     <<modal,

                     lup_box = Lineup Box( N Col( 2 ),

                           Text Box( "Main Table (i.e. SubID List)", <<set font size( 10 ) ),

                           Text Box( "Update-With Table (i.e. Tool Data)", <<set font size( 10 ) ),

                           c1 = Combo Box(

                                  list_tables

                                  ,

                                  s1 = c1 << getselected;

                                  dt1 = Data Table( s1 );

                                  vref1[1] << delete;

                                  vref1 << append( //Deletes and redraws

                                         col1 = Col List Box(

                                                Data Table( dt1 ),

                                                all,

                                                maxSelected( 1 ),

                                                rt1 = col1 << get selected

                                         )

                                  )

                                  ;

                           ),  //these parts change the following collistboxes and their fantastic syntax based upon the data table combo box selection

                           c2 = Combo Box(

                                  list_tables,

                                  <<set( 2 ),

                                  s2 = c2 << getselected;

                                  dt2 = Data Table( s2 );

                                  vref2[1] << delete;

                                  vref2 << append(

                                         col2 = Col List Box(

                                                Data Table( dt2 ),

                                                all,

                                                maxSelected( 1 ),

                                                rt2 = col2 << get selected

                                         )

                                  );

                           ),  //see above

                           vref1 = V List Box(

                                  col1 = Col List Box(

                                         Data Table( dt1 ),//still stoopdi

                                         all,

                                         maxSelected( 1 ),

                                         rt1 = col1 << get selected

                                  )

                           ),

                           vref2 = V List Box(

                                  col2 = Col List Box(

                                         Data Table( dt2 ),

                                         all,

                                         maxSelected( 1 ),

                                         rt2 = col2 << get selected

                                  )

                           ),    //the point of the vrefs is to have something to delete without messing with the lineupbox's proclivity for left-right ordering

                     ),

                     Text Box( "" ),

                     Hlistbox(

                                  Button Box( "Ok",

                                  s1 = c1 << getselected;

                                  s2 = c2 << getselected;

                                  dt1 = Data Table( s1 );

                                  dt2 = Data Table( s2 );

                                  rt1 = (col1 << get selected)[1];

                                  rt2 = (col2 << get selected)[1];

                           ),  //not everyone has the << return result option yet if they're not in 11

                           Button Box( "Cancel"),

                     ),

              );

              If(win_tableselect["Button"] == -1, Throw("Canceled by User"))

       );

      

       dt2 << New Column( "Loc Min", numeric, continuous, formula( Row() ) ); //indexing, basically, to match/update upon

       list_big = Column( dt2, rt2 ) << get values; //creates a vector of values for the following

      

       dt1 << New Column( "Loc Min",

              numeric,

              continuous,

              formula(

                     Loc Min( Abs( Column( rt1 )[Row()] - list_big ) )

              )

       );   //simple, but fast.  Very fast.

      

       dt_join = dt1 << Join(

              With( dt2 ),

              By Matching Columns( :Loc Min = :Loc Min ),

              Drop multiples( 0, 1 ),

              Name( "Include non-matches" )(0, 0),

              Preserve main table order( 1 )

       );  // I opted for a join rather than an update, for now. may change in the future.

      

       dt1 << delete columns("Loc Min");

       dt2 << delete columns("Loc Min");  //Removes pesky join column

       list_columns = dt_join << get column names( string );

      

       For( i = 1, i <= N Items( list_columns ), i++,

              If( Contains( list_columns, "Loc Min" ),

                     Column( dt_join, list_columns ) << set selected( 1 )

              );

       );  //the joined table will having lingering Loc Min-like columns.  This just hunts them down and removes them.

       dt_join << delete columns;

       dt_join << set name( Char( dt1 << get name ) || " & " || Char( dt2 << get name ) );  //Good enough

       return(dt_join);

);

5 REPLIES
ian_jmp

Staff

Joined:

Jun 23, 2011

I assume that lot integrity is always maintained, but are the wafers ever shuffled between slots? Saying it differently, do row 4 of table 1 (L22, W10) and row 18 of table 2 (L22, W10) correspond to the same physical wafer?

aliegner1

New Contributor

Joined:

Oct 5, 2016

Yes, sorry.  Should have used WID.  I can assume that that physical slot value stays with it.  L11 Slot4 in table 1 is the same as L11 Slot4 in table 2.

I have several other matching columns I can use to increase the join accuracy, like operation#, process recipe, product type, etrc.

vince_faller

Super User

Joined:

Mar 17, 2015

Solution

Graham Blatz actually talked about just this exact thing at the Advanced Scripting Session at the JMP conference this year.  I have his code from a while ago but he actually improved upon it by using KDTable() which makes it insanely fast. 

Here's a snippet of the function that does the most important bits.

dt2 << New Column( "Loc Min", numeric, continuous, formula( Row() ) ); //indexing, basically, to match/update upon

       list_big = Column( dt2, rt2 ) << get values; //creates a vector of values for the following

      

       dt1 << New Column( "Loc Min",

              numeric,

              continuous,

              formula(

                     Loc Min( Abs( Column( rt1 )[Row()] - list_big ) )

              )

       );   //simple, but fast.  Very fast.

      

       dt_join = dt1 << Join(

              With( dt2 ),

              By Matching Columns( :Loc Min = :Loc Min ),

              Drop multiples( 0, 1 ),

              Name( "Include non-matches" )(0, 0),

              Preserve main table order( 1 )

       );  // I opted for a join rather than an update, for now. may change in the future.




Here's the actual function as well that you can call and has a UI with it. 

nearest_neighbor = Function(

       //by Graham Blatz

       //2/20/15

       //Matches Nearest Neighbor of ostensibly dates, but whatever, man

       {dt1 = Current Data Table(),

              dt2 = Current Data Table(), rt1 = "ReadTime", rt2 = "ReadTime"   

       },

       {DEFAULT LOCAL},

       If( dt1 == dt2,   //default execution of a prompt, dunno a clean way around this without making some user camp inconvenienced.

             

              list_tables = {};

              For( i = 1, i <= N Table(), i++,

                     Insert Into( list_tables, Data Table( i ) << get name );

              );  //sets up names of data tables to pick from

              dt1 = Data Table( list_tables[1] );

              dt2 = Data Table( list_tables[2] ); //Reasonable Starting point

              win_tableselect = New Window( "Select Tables",

                     <<modal,

                     lup_box = Lineup Box( N Col( 2 ),

                           Text Box( "Main Table (i.e. SubID List)", <<set font size( 10 ) ),

                           Text Box( "Update-With Table (i.e. Tool Data)", <<set font size( 10 ) ),

                           c1 = Combo Box(

                                  list_tables

                                  ,

                                  s1 = c1 << getselected;

                                  dt1 = Data Table( s1 );

                                  vref1[1] << delete;

                                  vref1 << append( //Deletes and redraws

                                         col1 = Col List Box(

                                                Data Table( dt1 ),

                                                all,

                                                maxSelected( 1 ),

                                                rt1 = col1 << get selected

                                         )

                                  )

                                  ;

                           ),  //these parts change the following collistboxes and their fantastic syntax based upon the data table combo box selection

                           c2 = Combo Box(

                                  list_tables,

                                  <<set( 2 ),

                                  s2 = c2 << getselected;

                                  dt2 = Data Table( s2 );

                                  vref2[1] << delete;

                                  vref2 << append(

                                         col2 = Col List Box(

                                                Data Table( dt2 ),

                                                all,

                                                maxSelected( 1 ),

                                                rt2 = col2 << get selected

                                         )

                                  );

                           ),  //see above

                           vref1 = V List Box(

                                  col1 = Col List Box(

                                         Data Table( dt1 ),//still stoopdi

                                         all,

                                         maxSelected( 1 ),

                                         rt1 = col1 << get selected

                                  )

                           ),

                           vref2 = V List Box(

                                  col2 = Col List Box(

                                         Data Table( dt2 ),

                                         all,

                                         maxSelected( 1 ),

                                         rt2 = col2 << get selected

                                  )

                           ),    //the point of the vrefs is to have something to delete without messing with the lineupbox's proclivity for left-right ordering

                     ),

                     Text Box( "" ),

                     Hlistbox(

                                  Button Box( "Ok",

                                  s1 = c1 << getselected;

                                  s2 = c2 << getselected;

                                  dt1 = Data Table( s1 );

                                  dt2 = Data Table( s2 );

                                  rt1 = (col1 << get selected)[1];

                                  rt2 = (col2 << get selected)[1];

                           ),  //not everyone has the << return result option yet if they're not in 11

                           Button Box( "Cancel"),

                     ),

              );

              If(win_tableselect["Button"] == -1, Throw("Canceled by User"))

       );

      

       dt2 << New Column( "Loc Min", numeric, continuous, formula( Row() ) ); //indexing, basically, to match/update upon

       list_big = Column( dt2, rt2 ) << get values; //creates a vector of values for the following

      

       dt1 << New Column( "Loc Min",

              numeric,

              continuous,

              formula(

                     Loc Min( Abs( Column( rt1 )[Row()] - list_big ) )

              )

       );   //simple, but fast.  Very fast.

      

       dt_join = dt1 << Join(

              With( dt2 ),

              By Matching Columns( :Loc Min = :Loc Min ),

              Drop multiples( 0, 1 ),

              Name( "Include non-matches" )(0, 0),

              Preserve main table order( 1 )

       );  // I opted for a join rather than an update, for now. may change in the future.

      

       dt1 << delete columns("Loc Min");

       dt2 << delete columns("Loc Min");  //Removes pesky join column

       list_columns = dt_join << get column names( string );

      

       For( i = 1, i <= N Items( list_columns ), i++,

              If( Contains( list_columns, "Loc Min" ),

                     Column( dt_join, list_columns ) << set selected( 1 )

              );

       );  //the joined table will having lingering Loc Min-like columns.  This just hunts them down and removes them.

       dt_join << delete columns;

       dt_join << set name( Char( dt1 << get name ) || " & " || Char( dt2 << get name ) );  //Good enough

       return(dt_join);

);

aliegner1

New Contributor

Joined:

Oct 5, 2016

Awesome!  I will need to poke through this...i'm still new to this and trying to limp along.

Are there sample data tables that this applies to that I can setup?

So creates a new column in both DT1 & DT2, "Loc Min"

then joins by matching the two columns in a new table

vince_faller

Super User

Joined:

Mar 17, 2015

So you create a column called whatever you want in both tables. 

One Table, dt1, you create a row column that is just the row number.  Then you get the values as a matrix for whatever column you're trying to join upon.

mat = Column(dt1, "Column with the information you want to join") << Get Values;

Then create a column in the other table (dt2) with the formula:

Loc Min( Abs( Column( dt2, "Other Join Column" )[Row()] - mat ) );

What this will do is give you the location (row) where the minimum difference between the current row in dt2 and the entire set of values for dt1.


Then you just join the two columns together.