Save trees formula when using bootstrap forest

What inspired this wish list request? 

I wanted to compute prediction uncertainty quickly and easily for a predictive model without using bagging, because bagging is quite slow to compute with the profiler.

 

What is the improvement you would like to see? 

 

I would like to save all the individual tree formulas when I use the Bootstrap Forest platform, or at least the standard deviation of the distribution of the individual tree predictions.

 

 

Why is this idea important? 

 

This is very important for me in order to compute an acquisition function when I do surrogate model optimization (Bayesian optimization). I tried to use bagging with the Partition platform, but the result is not satisfactory. The Gaussian Process platform does not handle categorical variables.

 

 

3 Comments
SamGardner
Staff
Status changed to: New

@Florent_M I am curious to find out more about why you want to do this.  Can you describe the problem you are solving in more detail?

 

If I understand the feature you are requesting, with a little bit of JSL you can split the bootstrap forest prediction formula into separate columns, one per tree.

Here is an example:

names default to here(1);

dt=open("$sample_data/bands data.jmp");

bf=dt << Bootstrap Forest(
	Y( :Banding? ),
	X( Column Group( "Predictors" ) ),
	Missing Value Order(
		Low(
			:proof cut, :caliper, :humidity, :roughness, :solvent pct, :ESA Amperage,
			:wax, :hardener, :current density, :anode space ratio, :chrome content
		),
		High(
			:viscosity, :ink temperature, :blade pressure, :varnish pct,
			:press speed, :ink pct, :ESA Voltage
		)
	),
	Method( "Bootstrap Forest" ),
	Portion Bootstrap( 1 ),
	Number Terms( 25 ),
	Number Trees( 100 ),
	Go
);
bf << Save Prediction Formula;

bf << Close Window;
c=column(dt, "Prob(Banding?==band)");

/* 
because a bootstrap forest model has the structure
( tree1 + tree2 + ... + treeN)/N

--get the prediction formula expression
--get the argument inside the main parentheses (the sum of the trees)
--convert that expression to character and use the Words() function to get each tree formula
as a character string into a list
--then iterate through the list and create a new formula column for each tree
*/


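formula_expr = c << Get Formula;
// The first argument of the top-level division is the sum of the trees.
// Splitting on "+" assumes that no individual tree formula contains a "+" character.
trees_list = Words( Char( Arg( formula_expr, 1 ) ), "+" );

For Each( {tree, ii}, trees_list,
	// Create one column per tree in dt, then assign that tree's formula to it.
	ctree = dt << New Column( "TREE" || Char( ii ), Numeric, Continuous );
	Eval( Parse( Substitute( "ctree << Set Formula(fff)", "fff", tree ) ) );
);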
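
Once the per-tree columns exist, a row-wise summary of their spread could be added in the same way. This is only a minimal sketch: it assumes the columns TREE1 ... TREEn created by the script above are present in dt and that trees_list is still in scope, and the column names "Tree Mean" and "Tree Std Dev" are made up for illustration. The standard deviation across trees is a heuristic indicator of prediction uncertainty, not a formal confidence interval.

```jsl
// Assumption: TREE1 ... TREEn were created by the loop above.
// Build the argument string ":TREE1, :TREE2, ..., :TREEn".
col_refs = {};
For( ii = 1, ii <= N Items( trees_list ), ii++,
	Insert Into( col_refs, ":TREE" || Char( ii ) )
);
arg_str = Concat Items( col_refs, ", " );

// Row-wise mean across the tree columns (hypothetical column name).
c_mean = dt << New Column( "Tree Mean", Numeric, Continuous );
Eval( Parse( "c_mean << Set Formula( Mean( " || arg_str || " ) )" ) );

// Row-wise standard deviation across the tree columns (hypothetical column name).
c_sd = dt << New Column( "Tree Std Dev", Numeric, Continuous );
Eval( Parse( "c_sd << Set Formula( Std Dev( " || arg_str || " ) )" ) );
```

The "Tree Mean" column should match the saved forest prediction, which is a quick check that the split into trees worked as intended.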


Florent_M
Level III

Thank you @SamGardner!

You are right, JSL does the job in this case, and I warmly thank you for your script. It would still help me to have this option in the save column formula menu (along with other summaries such as the median, quantiles, and so on).

I will use it to look at the confidence interval and possibly to apply an acquisition function, such as expected improvement (EI) or probability of improvement (PI), for iterative surrogate modelling with categorical variables (see https://community.jmp.com/t5/JMP-Add-Ins/Bayesian-optimization-add-in/ta-p/496785).

Best regards,

Florent

SamGardner
Staff
Status changed to: Investigating

Thanks for the idea.  We will investigate this further.  Any additional details you can provide about why and how you would use the individual tree formulas and their predictions and prediction summaries would be helpful as we evaluate this.