Solved: Re: Join Question: timestamp range, kinda sorta?

aliegner1 · Oct 5, 2016 11:12 AM

So I have a question about how to do a join to get rows matched up kinda sorta to their similar timestamps.

I've got two tables, one for defect data, one for wafer count through a tool. The timestamps don't match up and not every wafer measured in one table is recorded in the other.

Any tips on how to join this table for the ultimate goal of plotting defect count vs wafer count?

========

1.6million rows, ~6months of data

Table 1 - Defect Data

Defect data only measured on a small sample of wafers.

Time stamps only for the in/out, so a single time stamp for the whole lot

Table 2 - Wafer Count

individual wafer time stamps for a larger sampling of wafers.

unique time stamps for each wafer

vince_faller · Oct 5, 2016 11:33 PM

Graham Blatz actually talked about just this exact thing at the Advanced Scripting Session at the JMP conference this year. I have his code from a while ago but he actually improved upon it by using KDTable() which makes it insanely fast.

Here's a snippet of the function that does the most important bits.

dt2 << New Column( "Loc Min", numeric, continuous, formula( Row() ) ); //indexing, basically, to match/update upon
       list_big = Column( dt2, rt2 ) << get values; //creates a vector of values for the following
      
       dt1 << New Column( "Loc Min",
              numeric,
              continuous,
              formula(
                     Loc Min( Abs( Column( rt1 )[Row()] - list_big ) )
              )
       );   //simple, but fast.  Very fast.
      
       dt_join = dt1 << Join(
              With( dt2 ),
              By Matching Columns( :Loc Min = :Loc Min ),
              Drop multiples( 0, 1 ),
              Name( "Include non-matches" )(0, 0),
              Preserve main table order( 1 )
       );  // I opted for a join rather than an update, for now. may change in the future.



Here's the actual function as well that you can call and has a UI with it. 
nearest_neighbor = Function(
       //by Graham Blatz
       //2/20/15
       //Matches Nearest Neighbor of ostensibly dates, but whatever, man
       {dt1 = Current Data Table(),
              dt2 = Current Data Table(), rt1 = "ReadTime", rt2 = "ReadTime"   
       },
       {DEFAULT LOCAL},
       If( dt1 == dt2,   //default execution of a prompt, dunno a clean way around this without making some user camp inconvenienced.
             
              list_tables = {};
              For( i = 1, i <= N Table(), i++,
                     Insert Into( list_tables, Data Table( i ) << get name );
              );  //sets up names of data tables to pick from
              dt1 = Data Table( list_tables[1] );
              dt2 = Data Table( list_tables[2] ); //Reasonable Starting point

              win_tableselect = New Window( "Select Tables",
                     <<modal,
                     lup_box = Lineup Box( N Col( 2 ),
                           Text Box( "Main Table (i.e. SubID List)", <<set font size( 10 ) ),
                           Text Box( "Update-With Table (i.e. Tool Data)", <<set font size( 10 ) ),
                           c1 = Combo Box(
                                  list_tables
                                  ,
                                  s1 = c1 << getselected;
                                  dt1 = Data Table( s1 );
                                  vref1[1] << delete;
                                  vref1 << append( //Deletes and redraws
                                         col1 = Col List Box(
                                                Data Table( dt1 ),
                                                all,
                                                maxSelected( 1 ),
                                                rt1 = col1 << get selected
                                         )
                                  )
                                  ;
                           ),  //these parts change the following collistboxes and their fantastic syntax based upon the data table combo box selection
                           c2 = Combo Box(
                                  list_tables,
                                  <<set( 2 ),
                                  s2 = c2 << getselected;
                                  dt2 = Data Table( s2 );
                                  vref2[1] << delete;
                                  vref2 << append(
                                         col2 = Col List Box(
                                                Data Table( dt2 ),
                                                all,
                                                maxSelected( 1 ),
                                                rt2 = col2 << get selected
                                         )
                                  );
                           ),  //see above
                           vref1 = V List Box(
                                  col1 = Col List Box(
                                         Data Table( dt1 ),//still stoopdi
                                         all,
                                         maxSelected( 1 ),
                                         rt1 = col1 << get selected
                                  )
                           ),
                           vref2 = V List Box(
                                  col2 = Col List Box(
                                         Data Table( dt2 ),
                                         all,
                                         maxSelected( 1 ),
                                         rt2 = col2 << get selected
                                  )
                           ),    //the point of the vrefs is to have something to delete without messing with the lineupbox's proclivity for left-right ordering
                     ),
                     Text Box( "" ),
                     Hlistbox(
                                  Button Box( "Ok",
                                  s1 = c1 << getselected;
                                  s2 = c2 << getselected;
                                  dt1 = Data Table( s1 );
                                  dt2 = Data Table( s2 );
                                  rt1 = (col1 << get selected)[1];
                                  rt2 = (col2 << get selected)[1];
                           ),  //not everyone has the << return result option yet if they're not in 11
                           Button Box( "Cancel"),
                     ),
              );
              If(win_tableselect["Button"] == -1, Throw("Canceled by User"))
       );
      
       dt2 << New Column( "Loc Min", numeric, continuous, formula( Row() ) ); //indexing, basically, to match/update upon
       list_big = Column( dt2, rt2 ) << get values; //creates a vector of values for the following
      
       dt1 << New Column( "Loc Min",
              numeric,
              continuous,
              formula(
                     Loc Min( Abs( Column( rt1 )[Row()] - list_big ) )
              )
       );   //simple, but fast.  Very fast.
      
       dt_join = dt1 << Join(
              With( dt2 ),
              By Matching Columns( :Loc Min = :Loc Min ),
              Drop multiples( 0, 1 ),
              Name( "Include non-matches" )(0, 0),
              Preserve main table order( 1 )
       );  // I opted for a join rather than an update, for now. may change in the future.
      
       dt1 << delete columns("Loc Min");
       dt2 << delete columns("Loc Min");  //Removes pesky join column
       list_columns = dt_join << get column names( string );
      
       For( i = 1, i <= N Items( list_columns ), i++,
              If( Contains( list_columns, "Loc Min" ),
                     Column( dt_join, list_columns ) << set selected( 1 )
              );
       );  //the joined table will having lingering Loc Min-like columns.  This just hunts them down and removes them.
       dt_join << delete columns;
       dt_join << set name( Char( dt1 << get name ) || " & " || Char( dt2 << get name ) );  //Good enough
       return(dt_join);
);

Vince Faller - Predictum

View solution in original post

ian_jmp · Oct 5, 2016 01:57 PM

I assume that lot integrity is always maintained, but are the wafers ever shuffled between slots? Saying it differently, do row 4 of table 1 (L22, W10) and row 18 of table 2 (L22, W10) correspond to the same physical wafer?

aliegner1 · Oct 5, 2016 03:59 PM

Yes, sorry. Should have used WID. I can assume that that physical slot value stays with it. L11 Slot4 in table 1 is the same as L11 Slot4 in table 2.

I have several other matching columns I can use to increase the join accuracy, like operation#, process recipe, product type, etrc.

vince_faller · Oct 5, 2016 11:33 PM

Graham Blatz actually talked about just this exact thing at the Advanced Scripting Session at the JMP conference this year. I have his code from a while ago but he actually improved upon it by using KDTable() which makes it insanely fast.

Here's a snippet of the function that does the most important bits.

dt2 << New Column( "Loc Min", numeric, continuous, formula( Row() ) ); //indexing, basically, to match/update upon
       list_big = Column( dt2, rt2 ) << get values; //creates a vector of values for the following
      
       dt1 << New Column( "Loc Min",
              numeric,
              continuous,
              formula(
                     Loc Min( Abs( Column( rt1 )[Row()] - list_big ) )
              )
       );   //simple, but fast.  Very fast.
      
       dt_join = dt1 << Join(
              With( dt2 ),
              By Matching Columns( :Loc Min = :Loc Min ),
              Drop multiples( 0, 1 ),
              Name( "Include non-matches" )(0, 0),
              Preserve main table order( 1 )
       );  // I opted for a join rather than an update, for now. may change in the future.



Here's the actual function as well that you can call and has a UI with it. 
nearest_neighbor = Function(
       //by Graham Blatz
       //2/20/15
       //Matches Nearest Neighbor of ostensibly dates, but whatever, man
       {dt1 = Current Data Table(),
              dt2 = Current Data Table(), rt1 = "ReadTime", rt2 = "ReadTime"   
       },
       {DEFAULT LOCAL},
       If( dt1 == dt2,   //default execution of a prompt, dunno a clean way around this without making some user camp inconvenienced.
             
              list_tables = {};
              For( i = 1, i <= N Table(), i++,
                     Insert Into( list_tables, Data Table( i ) << get name );
              );  //sets up names of data tables to pick from
              dt1 = Data Table( list_tables[1] );
              dt2 = Data Table( list_tables[2] ); //Reasonable Starting point

              win_tableselect = New Window( "Select Tables",
                     <<modal,
                     lup_box = Lineup Box( N Col( 2 ),
                           Text Box( "Main Table (i.e. SubID List)", <<set font size( 10 ) ),
                           Text Box( "Update-With Table (i.e. Tool Data)", <<set font size( 10 ) ),
                           c1 = Combo Box(
                                  list_tables
                                  ,
                                  s1 = c1 << getselected;
                                  dt1 = Data Table( s1 );
                                  vref1[1] << delete;
                                  vref1 << append( //Deletes and redraws
                                         col1 = Col List Box(
                                                Data Table( dt1 ),
                                                all,
                                                maxSelected( 1 ),
                                                rt1 = col1 << get selected
                                         )
                                  )
                                  ;
                           ),  //these parts change the following collistboxes and their fantastic syntax based upon the data table combo box selection
                           c2 = Combo Box(
                                  list_tables,
                                  <<set( 2 ),
                                  s2 = c2 << getselected;
                                  dt2 = Data Table( s2 );
                                  vref2[1] << delete;
                                  vref2 << append(
                                         col2 = Col List Box(
                                                Data Table( dt2 ),
                                                all,
                                                maxSelected( 1 ),
                                                rt2 = col2 << get selected
                                         )
                                  );
                           ),  //see above
                           vref1 = V List Box(
                                  col1 = Col List Box(
                                         Data Table( dt1 ),//still stoopdi
                                         all,
                                         maxSelected( 1 ),
                                         rt1 = col1 << get selected
                                  )
                           ),
                           vref2 = V List Box(
                                  col2 = Col List Box(
                                         Data Table( dt2 ),
                                         all,
                                         maxSelected( 1 ),
                                         rt2 = col2 << get selected
                                  )
                           ),    //the point of the vrefs is to have something to delete without messing with the lineupbox's proclivity for left-right ordering
                     ),
                     Text Box( "" ),
                     Hlistbox(
                                  Button Box( "Ok",
                                  s1 = c1 << getselected;
                                  s2 = c2 << getselected;
                                  dt1 = Data Table( s1 );
                                  dt2 = Data Table( s2 );
                                  rt1 = (col1 << get selected)[1];
                                  rt2 = (col2 << get selected)[1];
                           ),  //not everyone has the << return result option yet if they're not in 11
                           Button Box( "Cancel"),
                     ),
              );
              If(win_tableselect["Button"] == -1, Throw("Canceled by User"))
       );
      
       dt2 << New Column( "Loc Min", numeric, continuous, formula( Row() ) ); //indexing, basically, to match/update upon
       list_big = Column( dt2, rt2 ) << get values; //creates a vector of values for the following
      
       dt1 << New Column( "Loc Min",
              numeric,
              continuous,
              formula(
                     Loc Min( Abs( Column( rt1 )[Row()] - list_big ) )
              )
       );   //simple, but fast.  Very fast.
      
       dt_join = dt1 << Join(
              With( dt2 ),
              By Matching Columns( :Loc Min = :Loc Min ),
              Drop multiples( 0, 1 ),
              Name( "Include non-matches" )(0, 0),
              Preserve main table order( 1 )
       );  // I opted for a join rather than an update, for now. may change in the future.
      
       dt1 << delete columns("Loc Min");
       dt2 << delete columns("Loc Min");  //Removes pesky join column
       list_columns = dt_join << get column names( string );
      
       For( i = 1, i <= N Items( list_columns ), i++,
              If( Contains( list_columns, "Loc Min" ),
                     Column( dt_join, list_columns ) << set selected( 1 )
              );
       );  //the joined table will having lingering Loc Min-like columns.  This just hunts them down and removes them.
       dt_join << delete columns;
       dt_join << set name( Char( dt1 << get name ) || " & " || Char( dt2 << get name ) );  //Good enough
       return(dt_join);
);

Vince Faller - Predictum

aliegner1 · Oct 6, 2016 12:07 PM

Awesome! I will need to poke through this...i'm still new to this and trying to limp along.

Are there sample data tables that this applies to that I can setup?

So creates a new column in both DT1 & DT2, "Loc Min"

then joins by matching the two columns in a new table

vince_faller · Oct 6, 2016 03:27 PM

So you create a column called whatever you want in both tables.

One Table, dt1, you create a row column that is just the row number. Then you get the values as a matrix for whatever column you're trying to join upon.

mat = Column(dt1, "Column with the information you want to join") << Get Values;

Then create a column in the other table (dt2) with the formula:

Loc Min( Abs( Column( dt2, "Other Join Column" )[Row()] - mat ) );

What this will do is give you the location (row) where the minimum difference between the current row in dt2 and the entire set of values for dt1.

Then you just join the two columns together.

Vince Faller - Predictum

Peng · Nov 27, 2018 08:21 PM

Thanks Vince for sharing the code! It is very helpful!

I am wondering is there a way to join two tables by nearest date/time and also group by something else, for example, Tool_ch in this case. I have been doing this by subseting the same elements in both tables, then join by nearest date, and merge everything together. I don't think this is an efficient way to do this. :(

Thanks~

Peng

UersK · Feb 8, 2023 12:35 AM

Thanks!