How to optimize formula columns for faster recalculation speed?

Chris_Rodrigues · Nov 5, 2019 03:32 PM

I have a fairly large JMP data table (132k rows, 2860 columns) which also has a large number of formula columns (200+). I regularly import new data into this file, so I want to keep all of the formula columns "live" so they will auto-populate when new data is added. I've started to run into issues with the speed of formula recalculation.

At first the formulas were mostly simple arithmetic (addition/subtraction/multiplication of other column values) and speed was not an issue. However, I've been starting to add a few formulas with greater complexity and this has drastically increased the time it takes to recalculate all of the formulas in this table.

I use an OnOpen script to automatically suppress formula eval (since the Preferences option "Suppress Formula Eval on Open" does not work) and I use a Data Import script to bring in the new data from Excel, evaluate formulas, and then re-suppress formula eval. It has gotten to the point where it takes 15 minutes to complete the import script, and JMP is completely locked up and unresponsive that entire time. I can pinpoint a few specific formula columns that take the majority of that 15 minutes.

I was wondering if anyone has ideas on how to optimize the calculation speed. For example:

Maybe my JSL code is written in an inefficient manner and can be improved?
The formulas only really need to be evaluated for the new rows being added (typically no more than 50 rows at a time), but the recalculation time leads me to believe the entire 132k rows are being evaluated every time. Is there a way to do a partial eval on specific rows?

Here is an example of one of the slower formulas. This calculates a corrected product viscosity based on the raw viscosity value in each row of the table. The calculation depends on product grade, so a lookup is performed to obtain the grade-specific information. GradeSpecs:find is an associative array containing product data and it is defined in the OnOpen script for the table.

Names Default To Here( 1 );
myGrade = :Name( "##run_grade_jmp" );
Try(
	If( Namespace Exists( "GradeSpecs" ),
		Visc_Property = GradeSpecs:find[myGrade]["Prod Visc"]["Property"];
		If( Visc_Property == "xyzTestMethod",
			xyz_Target = GradeSpecs:find[myGrade]["Prod Visc"]["Target"];
			xyz_corrected_value = :Name( "xyz1_Value" ) * xyz_Target / :Name( "xyz2_Value" )) ^ 1.333;
		,
			xyz_corrected_value = Empty()
		);
	,
		xyz_corrected_value = Empty()
	)
,
	xyz_corrected_value = Empty()
);
xyz_corrected_value;

Here's another example. The function ThermoCH:RxEffDenZ is a custom function that is defined in the OnOpen script. Many parameters are also defined in the ThermoCH namespace as associative arrays.

ThermoCH:RxEffDenZ(
	:Pressure,
	:Temp,
	:z1conc,
	(z2conc+z3conc),
	(z4conc+z4conc),
	0,
	0,
	0,
	0
)

And here is the definition of RxEffDenZ:

ThermoCH:RxEffDenZ = Function( 
//parameters
	{dPressure_PSIG, dTemp_DEGF, dz4_wf, dz1_wf, dz2_wf, dz5_wf, dz6_wf, dz7_wf, dz3_wf}, 
//local variables
	{, T1, T2, P3, TOTWF, iz4SF, z1SF, z2SF, z5SF, z7SF, z3SF, TOTMOL, Iz4, z1, z2, z5, z7, z3, Mw, ac_z1, ac_z2, ac_z4,
	ac_z5, ac_z7, ac_z3, tr_z4, tr_z1, tr_z2, tr_z5, tr_z7, tr_z3, psat_z4, psat_z1, psat_z2, psat_z5, psat_z7, psat_z3, k_z4,
	k_z1, k_z2, k_z5, k_z7, k_z3, alpha_z4, alpha_z1, alpha_z2, alpha_z5, alpha_z7, alpha_z3, a_z4, a_z1, a_z2, a_z5, a_z7, a_z3,
	delta_1, delta_2, delta_3, delta_4, delta_5, delta_6, delta_7, delta_8, delta_9, delta_10, delta_11, delta_12, delta_13,
	delta_14, delta_15, am, bm, pr_c1, pr_z1, pr_z2, q, r, D, radian, theta, v, i, S_1A, s_1, T_1A, T_1, den,
	x1, x2, x3, S, T, Sprime, Tprime}, 

	T1 = dTemp_DEGF + 459.67;  //Convert F to R
									  
	T2 = T1 * (5 / 9);  //Convert R to K
								   
	P3 = (dPressure_PSIG + 14.696) * 6894.757293178; //Convert psig to Pa
									 
	TOTWF = dz4_wf + dz1_wf + dz2_wf + dz5_wf + dz7_wf + dz3_wf;
	iz4SF = dz4_wf / TOTWF;
	z1SF = dz1_wf / TOTWF;
	z2SF = dz2_wf / TOTWF;
	z5SF = dz5_wf / TOTWF;
	z7SF = dz7_wf / TOTWF;
	z3SF = dz3_wf / TOTWF;
									
	// Determine mole fraction of solvent components
	TOTMOL = iz4SF / ThermoCH:MW["z4"] + z1SF / ThermoCH:MW["z1"] + z2SF / ThermoCH:MW["z2"] + z5SF / ThermoCH:MW["z5"] + z7SF
	 / ThermoCH:MW["z7"] + z3SF / ThermoCH:MW["z3"];
	Iz4 = iz4SF / ThermoCH:MW["z4"] / TOTMOL;
	z1 = z1SF / ThermoCH:MW["z1"] / TOTMOL;
	z2 = z2SF / ThermoCH:MW["z2"] / TOTMOL;
	z5 = z5SF / ThermoCH:MW["z5"] / TOTMOL;
	z7 = z7SF / ThermoCH:MW["z7"] / TOTMOL;
	z3 = z3SF / ThermoCH:MW["z3"] / TOTMOL;
									
									 
	Mw = Iz4 * ThermoCH:MW["z4"] + z1 * ThermoCH:MW["z1"] + z2 * ThermoCH:MW["z2"] + z5 * ThermoCH:MW["z5"] + z7 * ThermoCH:MW[
	"z7"] + z3 * ThermoCH:MW["z3"];
									
									
	//a factors
	ac_z4 = 0.45724 * 8.3143 ^ 2 * ThermoCH:Tc["z4"] ^ 2 / (ThermoCH:Pc["z4"] * 10 ^ 5);
	ac_z1 = 0.45724 * 8.3143 ^ 2 * ThermoCH:Tc["z1"] ^ 2 / (ThermoCH:Pc["z1"] * 10 ^ 5);
	ac_z2 = 0.45724 * 8.3143 ^ 2 * ThermoCH:Tc["z2"] ^ 2 / (ThermoCH:Pc["z2"] * 10 ^ 5);
	ac_z5 = 0.45724 * 8.3143 ^ 2 * ThermoCH:Tc["z5"] ^ 2 / (ThermoCH:Pc["z5"] * 10 ^ 5);
	ac_z7 = 0.45724 * 8.3143 ^ 2 * ThermoCH:Tc["z7"] ^ 2 / (ThermoCH:Pc["z7"] * 10 ^ 5);
	ac_z3 = 0.45724 * 8.3143 ^ 2 * ThermoCH:Tc["z3"] ^ 2 / (ThermoCH:Pc["z3"] * 10 ^ 5);
									
									
									
	tr_z4 = T2 / ThermoCH:Tc["z4"];
	tr_z1 = T2 / ThermoCH:Tc["z1"];
	tr_z2 = T2 / ThermoCH:Tc["z2"];
	tr_z5 = T2 / ThermoCH:Tc["z5"];
	tr_z7 = T2 / ThermoCH:Tc["z7"];
	tr_z3 = T2 / ThermoCH:Tc["z3"];
									
	//psat in bar
	psat_z4 = 10 ^ (6.9895 - 1216.92 / (227.451 + (dTemp_DEGF - 32) * (5 / 9))) * 0.019336721 * 0.068947573;
	psat_z1 = 10 ^ (6.96636 - 649.806 / (262.73 + (dTemp_DEGF - 32) * (5 / 9))) * 0.019336721 * 0.068947573;
	psat_z2 = 10 ^ (7.01612 - 860.992 / (255.895 + (dTemp_DEGF - 32) * (5 / 9))) * 0.019336721 * 0.068947573;
	psat_z5 = 10 ^ (10.413 - 4466.47 / (447.523 + (dTemp_DEGF - 32) * (5 / 9))) * 0.019336721 * 0.068947573;
	psat_z7 = 10 ^ (4.05752 - 1353.486 / (T2 + -60.386));
	psat_z3 = 10 ^ (4.24696 - 1099.207 / (T2 + -8.256));
									
									
	k_z4 = 0.37464 + 1.54226 * ThermoCH:w["z4"] - 0.26992 * ThermoCH:w["z4"] ^ 2;
	k_z1 = 0.37464 + 1.54226 * ThermoCH:w["z1"] - 0.26992 * ThermoCH:w["z1"] ^ 2;
	k_z2 = 0.37464 + 1.54226 * ThermoCH:w["z2"] - 0.26992 * ThermoCH:w["z2"] ^ 2;
	k_z5 = 0.37464 + 1.54226 * ThermoCH:w["z5"] - 0.26992 * ThermoCH:w["z5"] ^ 2;
	k_z7 = 0.37464 + 1.54226 * ThermoCH:w["z7"] - 0.26992 * ThermoCH:w["z7"] ^ 2;
	k_z3 = 0.37464 + 1.54226 * ThermoCH:w["z3"] - 0.26992 * ThermoCH:w["z3"] ^ 2;
									
	alpha_z4 = (1 + k_z4 * (1 - tr_z4 ^ (1 / 2))) ^ 2;
	alpha_z1 = (1 + k_z1 * (1 - tr_z1 ^ (1 / 2))) ^ 2;
	alpha_z2 = (1 + k_z2 * (1 - tr_z2 ^ (1 / 2))) ^ 2;
	alpha_z5 = (1 + k_z5 * (1 - tr_z5 ^ (1 / 2))) ^ 2;
	alpha_z7 = (1 + k_z7 * (1 - tr_z7 ^ (1 / 2))) ^ 2;
	alpha_z3 = (1 + k_z3 * (1 - tr_z3 ^ (1 / 2))) ^ 2;
									
	a_z4 = alpha_z4 * ac_z4;
	a_z1 = alpha_z1 * ac_z1;
	a_z2 = alpha_z2 * ac_z2;
	a_z5 = alpha_z5 * ac_z5;
	a_z7 = alpha_z7 * ac_z7;
	a_z3 = alpha_z3 * ac_z3;
									
	delta_1 = 0;
	delta_2 = 0;
	delta_3 = 0;
	delta_4 = 0;
	delta_5 = 0;
	delta_6 = 0;
	delta_7 = 0;
	delta_8 = 0;
	delta_9 = 0;
	delta_10 = 0;
	delta_11 = 0;
	delta_12 = 0;
	delta_13 = 0;
	delta_14 = 0;
	delta_15 = 0;
									 
	am = Iz4 ^ 2 * a_z4 + z1 ^ 2 * a_z1 + z2 ^ 2 * a_z2 + z5 ^ 2 * a_z5 + z7 ^ 2 * a_z7 + z3 ^ 2 * a_z3 + 2 * Iz4 * z1 * (1
	-delta_1) * Sqrt( a_z4 * a_z1 ) + 2 * Iz4 * z2 * (1 - delta_2) * Sqrt( a_z4 * a_z2 ) + 2 * Iz4 * z5 * (1 - delta_3) *
	Sqrt( a_z4 * a_z5 ) + 2 * Iz4 * z7 * (1 - delta_4) * Sqrt( a_z4 * a_z7 ) + 2 * Iz4 * z3 * (1 - delta_5) * Sqrt( a_z4 * a_z3 )
	 + 2 * z1 * z2 * (1 - delta_6) * Sqrt( a_z1 * a_z2 ) + 2 * z1 * z5 * (1 - delta_7) * Sqrt( a_z1 * a_z5 ) + 2 * z1 * z7 *
	(1 - delta_8) * Sqrt( a_z1 * a_z7 ) + 2 * z1 * z3 * (1 - delta_9) * Sqrt( a_z1 * a_z3 ) + 2 * z2 * z5 * (1 - delta_10) *
	Sqrt( a_z2 * a_z5 ) + 2 * z2 * z7 * (1 - delta_11) * Sqrt( a_z2 * a_z7 ) + 2 * z2 * z3 * (1 - delta_12) *
	Sqrt( a_z2 * a_z3 ) + 2 * z5 * z7 * (1 - delta_13) * Sqrt( a_z5 * a_z7 ) + 2 * z5 * z3 * (1 - delta_14) *
	Sqrt( a_z5 * a_z3 ) + 2 * z7 * z3 * (1 - delta_15) * Sqrt( a_z7 * a_z3 );
									
	bm = Iz4 * (0.0778 * 8.3143 * ThermoCH:Tc["z4"] / (ThermoCH:Pc["z4"] * 10 ^ 5)) + z1 * (0.0778 * 8.3143 * ThermoCH:Tc["z1"] / (
	ThermoCH:Pc["z1"] * 10 ^ 5)) + z2 * (0.0778 * 8.3143 * ThermoCH:Tc["z2"] / (ThermoCH:Pc["z2"] * 10 ^ 5)) + z5 * (0.0778 *
	8.3143 * ThermoCH:Tc["z5"] / (ThermoCH:Pc["z5"] * 10 ^ 5)) + z7 * (0.0778 * 8.3143 * ThermoCH:Tc["z7"] / (ThermoCH:Pc["z7"] * 10
	 ^ 5)) + z3 * (0.0778 * 8.3143 * ThermoCH:Tc["z3"] / (ThermoCH:Pc["z3"] * 10 ^ 5));

									
	pr_c1 = bm - 8.3143 * T2 / P3;
	pr_z1 = am / P3 - 3 * bm ^ 2 - 2 * bm * 8.3143 * T2 / P3;
	pr_z2 = bm ^ 3 + bm ^ 2 * 8.3143 * T2 / P3 - am * bm / P3;
	q = -1 * (3 * pr_z1 - pr_c1 ^ 2) / 9;
	r = -1 * (9 * pr_c1 * pr_z1 - 27 * pr_z2 - 2 * pr_c1 ^ 3) / 54;
	D = r ^ 2 - q ^ 3;
									
	//Discriminant: D < 0, all 3 roots are real and unequal; D = 0, all roots are real and at least two are equal
	//D > 0, only one root is real and two are imaginary
									
	// Calculate the first root
	If( D < 0, 
		radian = ArcCosine( r / Sqrt( q ^ 3 ) );
		x1 = -1 * (2 * Sqrt( q ) * Cos( (1 / 3) * radian )) - (1 / 3) * pr_c1;
		x2 = -1 * (2 * Sqrt( q ) * Cos( (radian + 2 * Pi()) / 3 )) - (1 / 3) * pr_c1;
		x3 = -1 * (2 * Sqrt( q ) * Cos( (radian - 2 * Pi()) / 3 )) - (1 / 3) * pr_c1;
										
		v = x2;
	, 
		Sprime = -1 * (r / Abs( r )) * (Abs( r ) + Sqrt( D )) ^ (1 / 3);
		Tprime = q / Sprime;
										
		x1 = Sprime + Tprime - (1 / 3) * pr_c1;
		v = x1;
										
	);

	den = (1 / v) * Mw;
	Return( den );
);

In the past, these calculations were being done by Excel using VBA UDFs. As I learn more about JMP I am starting to re-implement them using JSL code to reduce my dependence on Excel. However, I feel like I'm starting to reach the limits of JMP's capability here and maybe I will not be able to do these kinds of things in JMP. Or at the very least I will be forced to work with smaller data sets when using array lookups and custom functions.

ms · Nov 5, 2019 05:46 PM

The JSL looks good. However, I rarely use such complex calculations with many local variables and look-up functions in a column formula. It is probably more efficient to fill the (new) rows by running the code outside of the table, which also would make it easier "parallelize" the computations using matrix operations.

If the column formulas are required, one way to avoid recalculation of the entire table is to add the new data to a 1-row subset of the main table, eval formulas, suppress formula eval, delete first row (to avoid duplicate), and concatenate the updated subset with the main table (with formula eval supressed).

The associative array may be a bottleneck here. I think formulas would run faster if these metadata could be retrieved, for each row, from other columns in the datatable instead of looking through an array.

Chris_Rodrigues · Nov 6, 2019 11:45 AM

Thanks for the suggestions. I agree it seems like the array lookup is a bottleneck because the viscosity correction formula is relatively simple but still has a big impact on the recalculation speed. These values could be stored in columns but I initially did not want to do that because it would add dozens of extra columns with repeated values and it seemed inefficient from a storage perspective. I had not considered the impact on computation speed with an array lookup versus a column value lookup.

These calculations probably don't need to be column formulas but I did that because that's what I already knew how to do. I will try to make a separate script to calculate those column values.

How to optimize formula columns for faster recalculation speed?

Re: How to optimize formula columns for faster recalculation speed?

Re: How to optimize formula columns for faster recalculation speed?