Hi, I want to do one hold-out cross-validation for the random forest, but I don't know if JMP Pro has this option or not. If it does, where can I find it?
When you say '...hold-out cross-validation...', do you mean what is more commonly known as 'leave-one-out'?
Yes, exactly: leave-one-out.
Have you read the documentation for Bootstrap Forest models?
Yes, I did. The problem is this: I have a small dataset, so I need to use leave-one-out.
Hi @tuo88138,
As @Mark_Bailey suggests, you can have a look at the documentation and info behind the Random Forest models, as this model is already robust and doesn't specifically need cross-validation, thanks to the bootstrapping process used.
However, if you are interested in relaunching the Random Forest platform several times with various training and validation samples, it is possible to do so by using a validation column and relaunching the platform for each hold-out split (see the JSL sketch after the link below).
The whole process can be seen in this presentation from Chris Gotwalt: Different goals, different models: How to use models to sharpen up your questio... - JMP User Commun... (around the 17-minute mark).
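As an illustration, here is a minimal JSL sketch of that relaunch loop. The response :y, the predictors :x1 and :x2, and the column name "LOO Validation" are all placeholders, and this is only a sketch of the idea, not the exact script from the presentation:

```jsl
// Leave-one-out sketch: relaunch Bootstrap Forest once per row, holding out
// a single observation each time via a 0/1 validation column.
// :y, :x1, :x2 and "LOO Validation" are placeholder names for your own data.
dt = Current Data Table();
n = N Rows( dt );
valCol = dt << New Column( "LOO Validation", Numeric, Set Each Value( 0 ) );

For( i = 1, i <= n, i++,
	valCol[i] = 1; // row i becomes the single validation (hold-out) row
	bf = dt << Bootstrap Forest(
		Y( :y ),
		X( :x1, :x2 ),
		Validation( :LOO Validation )
		// Depending on your JMP version, you may also need the forest
		// specification options (number of trees, etc.) copied from a
		// script saved from an interactive launch, so the platform runs
		// without showing the specification dialog.
	);
	// ...collect the validation statistics you need from Report( bf ) here...
	Report( bf ) << Close Window;
	valCol[i] = 0; // return row i to the training set
);
```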
Cross-validation of Random Forests is indeed possible in JMP, but Random Forests are robust to messy/noisy data and accurate on small datasets thanks to the bootstrapping, so unless you have a specific reason to use this cross-validation, you can get a good estimate of accuracy (from the out-of-bag (OOB) samples) and reliable results with the model "as it is".
I hope this answer will help you,
The Model Screening platform actually makes this pretty simple: just set K in the Folded Crossvalidation section equal to the number of active rows.
Find the details for each fold in the Details section of the report. If you want to know which row goes with which fold, you can open the Training section, select a row corresponding to that fold, and press 'Save Script Selected' under the table; a new column indicating the validation rows will be added to the table.
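For completeness, a hedged JSL sketch of that launch is below. The Y/X columns are placeholders, and the argument names are my best guess at matching the launch dialog; saving a script from an interactive launch (red triangle > Save Script) is the reliable way to get the exact names:

```jsl
// Sketch: Model Screening (JMP Pro) with folded cross-validation where the
// number of folds equals the number of rows, i.e. leave-one-out.
// :y, :x1, :x2 are placeholders; the argument names below are assumptions
// and should be checked against a script saved from the launch dialog.
dt = Current Data Table();
k = N Rows( dt ); // assumes every row is active (none excluded)

Model Screening(
	Y( :y ),
	X( :x1, :x2 ),
	K Fold Crossvalidation( 1 ), // enable folded cross-validation
	K( k )                       // one fold per row = leave-one-out
);
```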
Hi @ih,
On my side, using "Model Screening" with the setup you proposed does work for K-fold cross-validation, but not always for the Leave-One-Out method (it depends on the dataset).
If I specify K = number of observations, an error message appears: "The validation sets inside each of the folds are too small to support some methods", even if only Bootstrap Forest is checked in the "Method" panel. So I also thought about using the Model Screening platform, but it may not always be possible, depending on the dataset (on the "Boston housing prices" dataset with this method I get no summary of the folds and missing values in each fold's details). It does work for the "Big Class" dataset.
Hi @tuo88138
You can follow the method described by Chris Gotwalt; I just tested it and it worked perfectly. You might have to uncheck "Early Stopping" in the Bootstrap Forest analysis panel in order to avoid blank values for the different metrics in the output (Rsquare, RASE, etc.).
This technique can be interesting for comparing the contribution importance of variables across several simulations (see the attached capture "Contribution-importance_simulations"), or for providing confidence intervals on metrics like Rsquare (see the capture "Confidence_Intervals_Rsquare" for the Training Rsquare on the same dataset with the same number of simulations).
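Regarding the "Early Stopping" setting mentioned above: if you script the Bootstrap Forest launches, the same setting can presumably be passed as an option. A small hedged fragment follows (placeholder column names; the option name is taken from the checkbox label, so verify it against a saved script):

```jsl
// Turn off early stopping so validation metrics are reported even when the
// validation set is a single row. "Early Stopping" mirrors the checkbox label
// and should be confirmed against a script saved from the platform.
Bootstrap Forest(
	Y( :y ),
	X( :x1, :x2 ),
	Validation( :LOO Validation ),
	Early Stopping( 0 )
);
```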
Hope this answer will help you,
Hi @Victor_G, I don't believe that is an error, just a warning message. I do not think the unavailable methods will affect this analysis.
Hi @ih,
Yes, no problem with the message in itself (as it doesn't stop the analysis), but depending on which data table you use, you might get blank spaces everywhere instead of the expected results (see the screenshot "Error_RandomForest_Model-screening_LOOcrossval", taken on the housing prices dataset with leave-one-out cross-validation).
As I mentioned before, I have the same problem when using the "regular" Bootstrap Forest platform with a validation formula column (with a single observation in validation to do leave-one-out cross-validation) and the option "Early Stopping" checked, so this might be the same issue with the Model Screening platform, where the option "Early Stopping" is probably checked by default.
You can have a look at the housing prices table attached, with several scripts to illustrate the problems mentioned.
Hope the problem I mentioned is now clearer.
Depending on @tuo88138's dataset, there are two methods available for performing leave-one-out cross-validation.