Formula Parallelization of new columns

Robbb — Fri, 10 Apr 2026 05:59:00 GMT

Hello JMP community,

in my JSL scripts I often generate new columns one after another like this:

...
DT << New Column( "Some Name1", Numeric, "Continuous", Formula( Some Formula1 ) );
DT << New Column( "Some Name2", Numeric, "Continuous", Formula( Some Formula2 ) );
DT << New Column( "Some Name3", Numeric, "Continuous", Formula( Some Formula3 ) );
...

Usually my data tables are pretty large and can reach 100 million rows. My formulas often include statistical functions like Col Mean() and Col Maximum(), which get slower and slower the more rows I have. A few of the new columns might depend on each other, but please assume for the moment that they do not.

When my script runs, I see JMP 19.1 using just 3-4% of my CPU (16 cores, 32 threads).
Is it possible to accelerate the computation, e.g. by parallelization, but without building subsets and having to merge the subset results back? Can I work in parallel on one and the same data table using JSL in JMP?
If not, do you think the "subset-formula computation-merge back" approach is worth a try?

Cheers
Rob

Re: Formula Parallelization of new columns

jthi — Fri, 10 Apr 2026 07:54:05 GMT

Depending on your calculation (and whether you need formulas or not), you could create a Summary table, do the calculations there, and update the results back to your table.
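A minimal sketch of that idea, assuming a grouped statistic is what you need. The sample table Big Class.jmp and the columns :sex and :height are stand-ins chosen for illustration, not taken from the original question:

```jsl
Names Default To Here( 1 );

// Stand-in table; replace with your own large table
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );

// Compute the statistic once per group in a small summary table
dtSum = dt << Summary(
	Group( :sex ),
	Mean( :height ),
	Freq( "None" ),
	Weight( "None" ),
	Link to original data table( 0 )
);

// Join the per-group results back onto the original rows.
// This adds static (non-formula) columns from the summary table,
// here "N Rows" and "Mean(height)".
dt << Update( With( dtSum ), Match Columns( :sex = :sex ) );

Close( dtSum, NoSave );
```

The resulting columns hold plain values rather than formulas, so they will not refresh if the data changes later. For a statistic over the whole column with no grouping, a single scalar such as Col Mean( dt:height ) computed once in the script may already be enough.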

Re: Formula Parallelization of new columns

ih — Fri, 10 Apr 2026 17:29:32 GMT

There is a tool to manage parallelization for matrices: Parallel Assign.

I am not confident this will actually speed things up unless you are making a lot of new columns based on a small number of input columns, and you are very limited in which functions you can use.

Here is an example:

Names default to here(1);

dt = Open("$Sample_data/Pollen.jmp");

//first a simple demo of parallel assign to calculate column means
moutmeans = J(n col(dt), 1, .);         //blank matrix that will be populated
mdata = dt << get as matrix;            //input data as a matrix
Parallel Assign(
	{ mdata = mdata },                  //variables needed in the 'worker thread'
	moutmeans[a, b] = mean(mdata[0,a]); //this fills in one cell
);
show(moutmeans);                        //col means are here

// Now for new columns
//mdata = mdata[1::2,0];                //a way to start small if you add show()
nrows = nrow(mdata);
cnames = dt << get column names();      //used to find position of columns in matrix

// Write a formula for each new column using column names as placeholders
// to be replaced by matrix values later; this would work for pretty simple logic
formulas = Eval List({
	Expr(edge + nub + 10),
	Eval Expr(nub - Expr( col mean(dt:nub) ) ) //note this col mean evaluates now
});
newcnames = {"a","b"};                  //new column names used later

mout = J(nrows, length(formulas), .);   //empty matrix to populate

Parallel Assign(
	{
		mdata = mdata,
		formulas = formulas,
		cnames = cnames
	}, 
	mout[a, b] = (
		f = formulas[b];
		
		//assign values to all potential variables used in matrices
		//(bad idea if lots of unused columns)
		for(i=1, i<= n items(cnames), i++,
		
			//Make a list with the variable to assign and value to assign it to,
			// then replace the list with assign
			Eval( Substitute(
				Eval List( {as name( name expr( cnames[i] ) ), mdata[a,i]} ),
				Expr({}),
				Expr( assign() )
			) );
		);
		eval(f);
	);
);

//put new columns in table
for each({c,i}, newcnames, dt << New Column(c,Numeric,Continuous, << Set Values(mout[0,i])));

//add columns to double check it worked
dt << New Column("a-check", Numeric, "Continuous", Format("Best", 12), Formula(:edge + :nub + 10));
dt << New Column("b-check", Numeric, "Continuous", Format("Best", 12), Formula(:nub - Col Mean(:nub)));

Re: Formula Parallelization of new columns

Ryan_Gilmore — Fri, 10 Apr 2026 17:55:37 GMT

You could try suppressing formula evaluation until all columns have been created, then use Run Formulas.
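A rough sketch of what that could look like, assuming the table-level Suppress Formula Eval and Run Formulas messages behave as described in the Scripting Index for your JMP version. The sample table and the two formulas are placeholders for illustration only:

```jsl
Names Default To Here( 1 );

// Stand-in table; replace with your own large table
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );

// Hold off evaluation while the formula columns are being added
dt << Suppress Formula Eval( 1 );

dt << New Column( "Height Centered", Numeric, "Continuous",
	Formula( :height - Col Mean( :height ) )
);
dt << New Column( "Weight Ratio", Numeric, "Continuous",
	Formula( :weight / Col Maximum( :weight ) )
);

// Re-enable evaluation, then force all pending formulas to run now
dt << Suppress Formula Eval( 0 );
dt << Run Formulas;
```

Whether this helps on a 100-million-row table is something to measure rather than assume; the point is only that the formulas are evaluated in one explicit step instead of trickling in as each column is created.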

Re: Formula Parallelization of new columns

Craige_Hales — Sat, 11 Apr 2026 17:12:57 GMT

( @Ryan_Gilmore , @Robbb ) dt << Run Formulas is the answer. Without it, you'll see JMP using some of the background CPU cycles to do the evaluation and leaving most of them available for your interactive foreground use; you can continue scrolling the table, for example. It is a bit of a pain to run Run Formulas interactively, and you might not have to: JMP will almost always do it for you when you try to use the table with a platform (make a graph, analyze data). If you see a platform working with the table before it is completely evaluated, that's a bug and should be reported. But if you are watching the slow evaluation, scrolling the table and waiting for the rows to fill in, the behavior is not ideal.

Here's a wish list item to vote for