
Want to run the same tuning design table for multiple y-variables

I'm running Bootstrap Forest as a predictive model. I have hyperparameters I like, but I want to run the model multiple times (e.g., 500 replicates) to find the random forest that gives me the best validation R-square. I have the tuning design table built, but every time it runs for one predicted Y variable (there are 11 in my dataset), I have to manually choose the same design table. It's time consuming to sit by the computer, wait 10-15 minutes for each run, and then click through to the next one. I'd much rather set it and forget it. Is there a setting so that it runs the same tuning table for each Y variable automatically?

 

For reference, this is the box that I have to click after each Bootstrap Forest run completes:

[Screenshot: dialog prompting for the tuning design table after each Bootstrap Forest run]

 

2 REPLIES
txnelson
Super User

Re: Want to run the same tuning design table for multiple y-variables

According to the documentation on Bootstrap Forest, you can specify the tuning design table to use directly in the JSL for the platform, for example:

// Example call; substitute your own Y column, X columns, and tuning design table name
Bootstrap Forest(
	Y( :Percent body fat ),
	X(
		:"Age (years)"n, :"Weight (lbs)"n, :"Height (inches)"n,
		:"Neck circumference (cm)"n, :"Chest circumference (cm)"n,
		:"Abdomen circumference (cm)"n, :"Hip circumference (cm)"n,
		:"Thigh circumference (cm)"n, :"Knee circumference (cm)"n,
		:"Ankle circumference (cm)"n, :"Biceps (extended) circumference (cm)"n,
		:"Forearm circumference (cm)"n, :"Wrist circumference (cm)"n
	),
	Tuning Design Table("place your design table name here"),
	Go
)
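
To run all 11 responses unattended with the same tuning table, one option is to loop that call over a list of Y columns. Here is a minimal sketch; the data table reference, column names, and placeholder list are assumptions to replace with your own:

Names Default To Here( 1 );
dt = Current Data Table();
// Placeholder list of response column names; replace with your 11 Y variables
yCols = {"Y1", "Y2", "Y3"};
For( i = 1, i <= N Items( yCols ), i++,
	Bootstrap Forest(
		Y( Column( dt, yCols[i] ) ),
		X( :X1, :X2, :X3 ),	// your predictor columns
		Tuning Design Table( "place your design table name here" ),
		Go
	)
);

Each pass launches its own Bootstrap Forest report, so the runs can proceed without further clicks.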
Jim
Victor_G
Super User

Re: Want to run the same tuning design table for multiple y-variables

Hi @ClassCanary405,

 

Welcome to the Community!

Some further questions and remarks, not directly linked to your question, but perhaps worth considering:

  • As you're using Bootstrap Forest as a predictive model, how did you partition your data into training/validation/test sets (k-fold or "fixed" cross-validation?), and how did you ensure the sets are representative of future samples?
    If using "fixed" cross-validation, have you assessed the robustness of your model by varying the training, validation, and test sets? For example, you can use a validation formula column to simulate different model results, with the same grouping or stratification if needed (https://community.jmp.com/t5/Discussions/Boosted-Tree-Tuning-TABLE-DESIGN/m-p/609591/highlight/true#...); a minimal sketch of such a formula column is shown after this list.
  • Why did you choose Bootstrap Forest?
    Other tree-based (or other types of) models exist for tabular data that may be better suited to providing precise predictions, such as Boosted Tree (jmp.com) or the XGBoost Add-In for JMP Pro. Before any analysis, it's also necessary to check for missing values, outliers, balance in the data, etc. Even if tree-based models are fairly robust in these situations, such issues may sometimes explain why certain predictions are "off" compared to measurements, due to a lack of data or representativeness.
    You may also consider creating an ensemble of models, combining different types of models that perform well in different regions of your experimental space or in different scenarios (missing values, outliers, etc.).
  • Why is your validation metric R² for a predictive purpose?
    Perhaps other metrics, such as RASE, AAE, or Misclassification Rate (in the case of a categorical response, such as classes), would be better suited: The Model Comparison Report (jmp.com)
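
To illustrate the validation formula column mentioned in the first bullet, here is a minimal sketch, assuming a 60/20/20 random split and the usual JMP Pro coding of 0 = Training, 1 = Validation, 2 = Test (the proportions and column name are placeholders to adapt to your data):

Names Default To Here( 1 );
dt = Current Data Table();
// Hypothetical 60/20/20 random split; 0 = Training, 1 = Validation, 2 = Test
dt << New Column( "Validation",
	Numeric, Nominal,
	Formula(
		r = Random Uniform();
		If( r < 0.6, 0, r < 0.8, 1, 2 )
	)
);

Re-evaluating the formula (or changing the random seed) produces a different split, so you can re-fit the model on several splits and check how stable its validation performance is.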

 

Finally, Random Forest (called Bootstrap Forest in JMP) is one of the most robust supervised learning models available in Machine Learning. Unlike many other algorithms, it's one of the few models that is relatively insensitive to hyperparameter tuning, and you can obtain good performance with the basic JMP Pro recommended/default settings.

 

For more info about the "tunability" of Machine Learning models (the impact of hyperparameter tuning on performance), you can check this paper: [1802.09596] Tunability: Importance of Hyperparameters of Machine Learning Algorithms (arxiv.org)

 

I hope this complementary answer will help you,

Victor GUILLER
Scientific Expertise Engineer
L'Oréal - Data & Analytics