Solved: Re: Most efficient way to add information to rows by matching the time stamp fro...

Report Inappropriate Content · Jun 8, 2023 9:32 AM

I extracted Historian data at 5 second intervals over a long time period, ending up with a data table of many many rows. Initially in this data table I had two columns (call this table "2205 - 2209 PD3102" and I referenced to it as dtPDTag):

1. Date/time stamp

2. Data value (for the Historian tag)

In a different data table (dt in the below JSL syntax) I had the start and end time of productions for which I had the PO identifier and recipe symbol. (So four columns: 1. Start date/time, 2. End date/time, 3. PO number, 4. Recipe symbol.)

In dtPDTag I added two columns - PO and Symbol - into which I wanted to add the PO number and recipe symbol (based on the date/time stamp). The JSL code I used to fulfil this task is below, but I'm assuming this is a pretty inefficient way to manage this task. (My computer didn't seem to respond too healthily to the demand placed upon it!) Would be great to hear of better / more effective ways to approach such a task in the future.

By "most efficient" I refer to a more effective scripting method than the combination of two for loops and the one "if" condition seen in my JSL below.

Names Default To Here(1);

dt = Current Data Table();
nr = N Rows(dt); // show(nr);

lstPO = dt:PO << Get Values;
lstSymbol = dt:Symbol << Get Values;
lstStartTime = dt:StartTime << Get Values;
lstEndTime = dt:EndTime << Get Values; // show(lstPO,lstSymbol,lstStartTime,lstEndTime);

dtPDTag = data Table( "2205 - 2209 PD3102" );
nrPD = N Rows(dtPDTag);  //  show(nrPD);

For(j=1,j<=nr,j++,

 _start = lstStartTime[j]; _end = lstEndTime[j];  // show(_start,_end);
 _PO =lstPO[j]; _symbol = lstSymbol[j];

	For(i=1,i<=nrPD,i++,

		If( ( dtPDTag:DateTime[i] > lstStartTime[j] ) & ( dtPDTag:DateTime[i] <  lstEndTime[j] ),
			
			dtPDTag:PO[i] = lstPO[j];
			dtPDTag:Symbol[i] = lstSymbol[j];

		,
			

		);
		
	);

);

txnelson · Feb 28, 2023 07:59 AM

Here is my approach. If the data are sorted, then they can be passed through one time and the values needed can be found. Take a look at the JSL below for the general approach to take. The code is not tested, I do not have data tables to test it against. However, I have used this approach many times before, and it is an efficient approach. But be prepared to adjust it a bit.

Names Default To Here( 1 );

dt = Current Data Table();
nr = N Rows( dt );

dtPDTag = Data Table( "2205 - 2209 PD3102" );
nrPD = N Rows( dtPDTag );

// Sort both data tables by the timestamps to allow the tables to only 
// have to be read once

// Before sorting the data, if the tables need to be returned to the 
// original order then create a new column in each table that represents
// the current Row number, which then will allow the table to be sorted
// back to the original order

// dt << New Column( "_row_", set each value( Row() ));
// dtPDTag << New Column( "_row_", set each value( Row() ));

dt << sort( By( :StartTime, :EndTime ), order( ascending, ascending ), replace table( 1 ) );
dtPDTag << sort( By( :DateTime ), order( ascending ), replace table( 1 ) );

PDStart = 1; // Variable that keeps track of where in the dtPDTag table to start to find values

For( dtI = 1, dtI <= nr, dtI++,
	For( dtPDI = PDStart, dtPDI <= nrPD, dtPDI++,
		If( ( dtPDTag:DateTime[dtPDI] > dt:StartTime[dtI] ) & ( dtPDTag:DateTime[dtPDI] <  dt:EndTime ),
			dtPDTag:PO[dtPDI] = dt:PO[dtI];
			dtPDTag:Symbol[dtPDI] = dt:Symbol[dtI];
			PDStart = dtPDI;
			Break();
	)
);

// Resort to original if required and delete the _row_ column
// dt << sort( By( _row_ ), order( ascending ), replace table( 1 ) );
// dt << delete columns( :_row_ );
// dtPDTag << sort( By( _row_ ), order( ascending ), replace table( 1 ) );
// dtPDTag << delete columns( :_row_ );

Jim

View solution in original post

jthi · Feb 28, 2023 06:13 AM

There are quite a few of options to do this (especially with scripting). You might be able to build an SQL Query using JMP Query Builder under Tables menu (you can use data tables with it) in which you can define the ranges and how rows should be joined.

-Jarmo

txnelson · Feb 28, 2023 07:59 AM

Here is my approach. If the data are sorted, then they can be passed through one time and the values needed can be found. Take a look at the JSL below for the general approach to take. The code is not tested, I do not have data tables to test it against. However, I have used this approach many times before, and it is an efficient approach. But be prepared to adjust it a bit.

Names Default To Here( 1 );

dt = Current Data Table();
nr = N Rows( dt );

dtPDTag = Data Table( "2205 - 2209 PD3102" );
nrPD = N Rows( dtPDTag );

// Sort both data tables by the timestamps to allow the tables to only 
// have to be read once

// Before sorting the data, if the tables need to be returned to the 
// original order then create a new column in each table that represents
// the current Row number, which then will allow the table to be sorted
// back to the original order

// dt << New Column( "_row_", set each value( Row() ));
// dtPDTag << New Column( "_row_", set each value( Row() ));

dt << sort( By( :StartTime, :EndTime ), order( ascending, ascending ), replace table( 1 ) );
dtPDTag << sort( By( :DateTime ), order( ascending ), replace table( 1 ) );

PDStart = 1; // Variable that keeps track of where in the dtPDTag table to start to find values

For( dtI = 1, dtI <= nr, dtI++,
	For( dtPDI = PDStart, dtPDI <= nrPD, dtPDI++,
		If( ( dtPDTag:DateTime[dtPDI] > dt:StartTime[dtI] ) & ( dtPDTag:DateTime[dtPDI] <  dt:EndTime ),
			dtPDTag:PO[dtPDI] = dt:PO[dtI];
			dtPDTag:Symbol[dtPDI] = dt:Symbol[dtI];
			PDStart = dtPDI;
			Break();
	)
);

// Resort to original if required and delete the _row_ column
// dt << sort( By( _row_ ), order( ascending ), replace table( 1 ) );
// dt << delete columns( :_row_ );
// dtPDTag << sort( By( _row_ ), order( ascending ), replace table( 1 ) );
// dtPDTag << delete columns( :_row_ );

Jim

scottahindle · Mar 1, 2023 01:52 AM

Thanks Jim. To check on interpretation versus my not so efficient code:
- You introduce PDStart as a counter to avoid going through rows time and time again that have already been already been responded to
- You include Break(); to leave the loop as soon as possible to do so
In short, with these two additions you eliminate the redundancy that was inherent to my code?

txnelson · Mar 1, 2023 02:29 AM

Correct!

Jim

brady_brady · Mar 1, 2023 11:42 PM

Hi,

I realize I am late to this. I decided to respond because I had a similar need back when I worked in semi, and my data table was so large that even a single loop with a break did not work for me. If your tables are that large, then see if the following works for you. It is what I had to do. In the common case, when you have many timestamps during a single process step, it will save MUCH time. This is for 2 reasons: first, there are no comparisons. Second, it only iterates once for each process step, setting table rows to values in bulk. It does this by concatenating the recipe timestamp data directly onto the original table, working with the two together, and then removing the recipe data.

Names Default To Here( 1 );

//sample data table
dt1 = As Table( (1 :: 30)` || J( 30, 1, Random Uniform() ), <<column names( {"Time", "value"} ) );

//sample recipe lookup table
dt2 = As Table( (1 :: 26 :: 5)` || (5 :: 30 :: 5)`, <<column names( {"Begin", "End"} ) );
dt2 << New Column( "PO", character, <<set values( Words( "abcdef", "" ) ) );
dt2 << New Column( "recipe", character, <<set values( Words( "lmnopq", "" ) ) );


/////////////////////////////processing begins here

//rename the beginning timestamp to match the timestamp name in the other table
dt2:begin << set name( "Time" );
dt1 << concatenate( dt2, appendToFirstTable );
dt1 << sort( by( :time, :end ), order( ascending, descending ), replace table );

// get the rows where you change from timestamp band to timestamp band
indices = Loc( dt1:end << get values ) |/ (N Row( dt1 ) + 1);

// for each segment, set the values for OP and recipe to the value of the first row in the segment
For Each( { { beg, end }, i },
	across( indices[1 :: (N Row( indices ) - 1)], indices[2 :: N Row( indices )] - 1 ),
	dt1:PO[ beg :: end ] = dt1:PO[ beg ];
	dt1:recipe[ beg :: end ] = dt1:recipe[ beg ];
);

// clean up
dt1 << select where( !Is Missing( :end ) );
dt1 << delete rows;
dt1 << delete columns( "end" );

If you happen to see this, let me know how it goes.

Cheers,

Brady

scottahindle · Mar 24, 2023 10:43 AM

Hi Brady, many thanks for the code.
The code misses one part to do the job fully. It allocates the right PO number and recipe ID from the start of the run, but it doesn't stop when the PO is over ... meaning the recipe ID and PO continue be put in rows until the start of the next PO, hence a kind of "over-run".
I have thousands of rows in my data table, but say there were 10 time periods from the beginning of one PO until the next, but the first PO went from time period 1 to 7, I'd want periods 8 and 9 to be "empty", hence deleted because there was no production at that time, and then continue from time period 10 with the next PO.
Hope the explanation makes sense.

Most efficient way to add information to rows by matching the time stamp from two data tables?

Re: Most efficient way to add information to rows by matching the time stamp from two data tables?

Re: Most efficient way to add information to rows by matching the time stamp from two data tables

Re: Most efficient way to add information to rows by matching the time stamp from two data tables?

Re: Most efficient way to add information to rows by matching the time stamp from two data tables?

Re: Most efficient way to add information to rows by matching the time stamp from two data tables?

Re: Most efficient way to add information to rows by matching the time stamp from two data tables?

Re: Most efficient way to add information to rows by matching the time stamp from two data tables?