Hi @dlehman1
The Bootstrap Forest platform fits an ensemble model by averaging many decision trees, each of which is fit to a bootstrap sample (drawn with replacement) of the training data. Each split in each tree considers a random subset of the predictors (from the Bootstrap Forest documentation).
This randomisation in training-sample selection and feature-subset selection reduces the risk of overfitting (by creating multiple independent trees based on slightly different training data), improves accuracy and robustness to noise, and helps handle missing values, outliers, and correlated features/collinearity (thanks to the random feature subset considered at each split of each tree).
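If it helps to see the two randomisation mechanisms concretely, here is a small Python sketch of the general bagging idea. This is only an illustration, not JMP's implementation (JMP is driven through its dialogs or JSL): it draws bootstrap samples with replacement, fits one tree per sample, and uses max_features="sqrt" so each split considers a random subset of the predictors.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

n_trees = 100
trees = []
for _ in range(n_trees):
    # Bootstrap sample: draw n rows with replacement from the training data.
    idx = rng.integers(0, len(X), size=len(X))
    # Each tree considers a random subset of predictors at every split,
    # which mirrors the "random subset of terms per split" idea.
    tree = DecisionTreeClassifier(max_features="sqrt")
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# The ensemble prediction averages the per-tree probabilities (bagging).
proba = np.mean([t.predict_proba(X) for t in trees], axis=0)
y_hat = proba.argmax(axis=1)
print("training accuracy of the ensemble:", (y_hat == y).mean())
```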
Concerning your original question, as I no longer have JMP 17 on my computer, I won't be able to compare the outcomes. Comparing the JMP 17 and JMP 18 documentation on the default hyperparameters used when launching the Bootstrap Forest, there doesn't appear to be any difference in the default settings.
If the individual trees seem too complex, you can modify some of the default hyperparameters (see the sketch after this list):
- Minimum Splits per Tree: The default is 10, but on small datasets I tend to reduce this value to 2. A large minimum forces each individual tree to be complex, which may not be beneficial and may be prone to overfitting.
- Maximum Splits per Tree: The default is 2000, which is a lot for any individual tree (and may lead to the same problems mentioned above)! I tend to reduce this value to around 100-1000, depending on the size of the dataset and the complexity of the task.
- Minimum Size Split: The default is 5 (the minimum number of observations a node must contain to be a candidate for splitting). On small datasets you might reduce this value (even if this carries a higher risk of overfitting), while on larger datasets you might want to increase it to improve the robustness of each individual tree.
- Early Stopping: Make sure this option is checked if you have supplied a validation set, so that tree building stops when the validation statistic no longer improves as new trees are added.
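For reference, here is roughly how these knobs map onto scikit-learn's RandomForestClassifier. The mapping is my own rough analogy, not anything JMP documents: max_leaf_nodes caps tree size much like Maximum Splits per Tree (a binary tree with k splits has k + 1 leaves), min_samples_split plays the role of Minimum Size Split, and there is no direct scikit-learn equivalent of Minimum Splits per Tree or of JMP's validation-based Early Stopping.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,     # number of trees in the forest
    max_leaf_nodes=101,   # caps tree size; ~ "Maximum Splits per Tree" = 100
    min_samples_split=5,  # ~ "Minimum Size Split": a node needs >= 5 rows to split
    max_features="sqrt",  # random predictor subset considered at each split
    random_state=0,
)
forest.fit(X_tr, y_tr)
print("validation accuracy:", forest.score(X_val, y_val))
```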
Sorry for not having more precise/definitive answers to your questions. Do you have screenshots, datasets, or anything else that shows the differences? It may help in debugging the situation.
Best,
Victor GUILLER
"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)