cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Choose Language Hide Translation Bar

Option or function to evaluate a custom column formula at once, or cache values between rows

Some built in column formulas like Col Mean() are very fast because they only calculate expensive values once and return or modify them for each row.  With custom formulas this is harder to achieve because each row is evaluated separately.  Sometimes using a separate column to store intermediate values can speed things up (for example storing the row indexes that meet a certain criteria, instead of re-calculating them every row), but that gets messy and hard to follow. If there were some concept of a column context such that the evaluation of one row could access intermediate values calculated in the previous row, then a script could calculate results for all rows at once and then quickly return them and pass the intermediate results to the next row. The formula would also need access to the whole column instead of just the current row, so the column reference should be passed unevaluated.

16 Comments
SamGardner
Level VII
Status changed to: Needs Info

@ih would you be able to share an example of a data table that illustrates the problem you describe?  

jthi
Super User

I think this post might demonstrate the problemCreating statistical custom (formula) function which has byVar 

ih
Super User (Alumni)
Super User (Alumni)

Here are a couple of potential use cases, I know there are others that I'm forgetting right now.  Many of these can be solved easily for small tables, or by using multiple columns or table variables, but providing a column context variable would make them easier.

  • Use a random value in the column formula so it can be swapped out in a simulation
    // this value should be cached and re-used for all rows,
    // but it should change each time the column is recalculated
    startofweek = Random Uniform(
    	Col Min( :Timestamp ),
    	Col Max( :Timestamp ) - In Days( 7 )
    );
    If( :Timestamp > startofweek & :Timestamp < startofweek + In Days( 7 ),
    	1,
    	0
    );
  • The number of rows since the start of the last out of spec sample.
  • The time until this campaign ends
  • Moving average of the last 10 samples for this product code
mia_stephens
Staff
Status changed to: Investigating
 
mia_stephens
Staff

Thank you for the example and additional information @ih. Changed the status to investigating.

XanGregg
Staff

Have you seen the As Constant(x) function? That should handle the only-once part of the request.

 

Local(
	{startofweek = As Constant(
		Random Uniform(
			Col Min( :Timestamp ),
			Col Max( :Timestamp ) - In Days( 7 )
		)
	)},
	If( :timestamp > startofweek & :timestamp < startofweek + In Days( 7 ),
		1,
		0
	)
)
ih
Super User (Alumni)
Super User (Alumni)

Hi @XanGregg,

 

I could be missing something but I could not seem to get As Constant to re-evaluate when the table formulas are ran again.  It seems to (probably correctly) cache its value as long as the table is open. Here is a more complete examples which solves this problem using a global variable.

View more...
here:dt = New Table( "Single Evaluation in Column Formula Example",
	Add Rows( 1000 ),
	New Column( "Timestamp",
		Numeric,
		"Continuous",
		Format( "ddMonyyyy h:m:s", 22, 0 ),
		Input Format( "ddMonyyyy h:m:s", 0 ),
		Formula( Today() - (1000 + In Days( Row() )) )
	),
	New Column( "Local With As Constant",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Formula(
			Local( {startofweek},
				startofweek = As Constant(
					Random Uniform(
						Col Min( :Timestamp ),
						Col Max( :Timestamp ) - In Days( 7 )
					)
				);
				If(
					:Timestamp > startofweek & :Timestamp < startofweek
					+In Days( 7 ),
					1,
					0
				);
			)
		)
	),
	New Column( "Global With Evaluation in Row 1",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Formula(
			If( Row() == 1,
				::: startofweek = Random Uniform(
					Col Min( :Timestamp ),
					Col Max( :Timestamp ) - In Days( 7 )
				)
			);
			If(
				:Timestamp > ::: startofweek & :Timestamp < ::: startofweek
				+In Days( 7 ),
				1,
				0
			);
		)
	)
);

here:dt << Graph Builder(
	Size( 833, 456 ),
	Show Control Panel( 0 ),
	Variables(
		X( :Timestamp ),
		Y( :Local With As Constant ),
		Y( :Global With Evaluation in Row 1 )
	),
	Elements( Position( 1, 1 ), Line( X, Y, Legend( 7 ) ) ),
	Elements( Position( 1, 2 ), Line( X, Y, Legend( 6 ) ) )
);

for each( {i}, 1::10, 
	wait(1);
	here:dt << Rerun Formulas;
);
XanGregg
Staff

Yes, I noticed that, too, but sure if you were using it that way. It seems to be taking the "constant" part too seriously for what I would want. We'll have to investigate, but it likely needs new functionality for the dependency/rerun awareness.

For cases like yours, I normally make the formula depend on a table variable, and update the table variable each time through the loop to have a new random value.

ih
Super User (Alumni)
Super User (Alumni)

Another example: writing a custom smoother function like a Savitzky–Golay filter that ignores excluded rows.  This can be efficiently calculated for the whole column at once when the first row evaluates, I would like to cache the values for the whole column on the first evaluation and then just return the pre-calculated result in each row instead of re-calculating it every time.

SamGardner
Level VII

Investigation of this issue is on-going, and will update you if we arrive at a solution.