Re: about the way of deciding parameters in Random Forest
Jun 16, 2015 12:05 PM
I'm not sure I'm interpreting your question correctly. Perhaps you're asking how the default values in the Bootstrap Forest dialog are chosen.
Number of trees in the forest: The default is 100. In practice, you may want more trees than the default (hundreds or even thousands).
Number of terms sampled per split: The default is ¼ of the total number of predictor columns specified in the launch dialog.
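As a rough illustration of that ¼ rule, here is a small Python sketch. The function name is made up, and the rounding (round down, never below 1) is my assumption; JMP may handle fractional results differently.

```python
def default_terms_per_split(n_predictors: int) -> int:
    """Return one quarter of the predictor count.

    Assumption: round down, but always sample at least one term.
    JMP's actual rounding rule is not stated in this thread.
    """
    return max(1, n_predictors // 4)


print(default_terms_per_split(40))  # 10
```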
Bootstrap sample rate: The default is 1, which means the bootstrap sample will have the same number of rows as the original data table. The bootstrap sampling happens automatically and you don’t actually ever see the separate bootstrap samples. If you choose a value that is less than 1, then the bootstrap sample will have fewer rows than the original table. It’s best to use the default here.
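To make the sample rate concrete, here is a minimal Python sketch of drawing one bootstrap sample. It only illustrates the general idea of sampling rows with replacement; it is not JMP's internal code, and the function name, the rounding, and the seed argument are my own assumptions.

```python
import random


def bootstrap_rows(n_rows, sample_rate=1.0, seed=None):
    """Draw row indices with replacement for one tree.

    Assumption: the number of rows drawn is sample_rate * n_rows,
    rounded, with at least one row.
    """
    rng = random.Random(seed)
    n_draw = max(1, round(sample_rate * n_rows))
    return [rng.randrange(n_rows) for _ in range(n_draw)]


rows = bootstrap_rows(1000, sample_rate=1.0, seed=0)
# Same number of rows as the table, but some rows repeat and others are left out.
print(len(rows), len(set(rows)))
```

With a rate of 1, a sample like this typically contains roughly 63% of the distinct original rows, because the draws are made with replacement.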
Minimum and Maximum splits per tree: The default minimum size is 10 splits per tree. The maximum helps to control the computational complexity of the model, and the default maximum size is 2000. If cross-validation is used, an individual tree may stop earlier than the maximum number of splits if adding more splits to the tree is no longer improving its validation RSquare. Also, the next setting, minimum split size, will tend to keep the size of the individual trees smaller than the maximum.
Minimum split size: The default is the larger of 5 and 1% of the number of rows in the data table. This also helps keep the individual trees from overfitting.
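The minimum split size default can be written out the same way. Again this is only a sketch with a made-up function name, and rounding the 1% down to a whole number of rows is my assumption.

```python
def default_min_split_size(n_rows: int) -> int:
    """Larger of 5 and 1% of the row count.

    Assumption: the 1% figure is rounded down to a whole number of rows.
    """
    return max(5, n_rows // 100)


print(default_min_split_size(200))    # 5   (1% would be only 2 rows)
print(default_min_split_size(10000))  # 100
```

So for small tables the floor of 5 applies, and for tables with more than 500 rows the 1% rule takes over.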
Early Stopping: If hold-out cross-validation is used, then a checkbox is displayed and selected by default. Early stopping will cause JMP Pro to stop adding trees to the bootstrap forest if the validation performance of the model stops improving.
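For intuition only, early stopping can be pictured as below. This is a generic patience-style rule, not the rule JMP Pro actually uses (I have not described that here); the function name and the `patience` parameter are hypothetical.

```python
def trees_to_keep(validation_scores, patience=5):
    """Return how many trees to keep under a simple patience rule.

    validation_scores[i] is the validation score (for example RSquare)
    of the forest after i + 1 trees. Assumption: stop once `patience`
    trees in a row have failed to beat the best score so far.
    """
    best_score = float("-inf")
    best_n = 0
    for n_trees, score in enumerate(validation_scores, start=1):
        if score > best_score:
            best_score, best_n = score, n_trees
        elif n_trees - best_n >= patience:
            break  # no improvement for `patience` trees
    return best_n


scores = [0.50, 0.60, 0.65, 0.66, 0.66, 0.65, 0.66, 0.65, 0.66, 0.70]
print(trees_to_keep(scores, patience=3))  # 4
```

In this made-up example the forest stops at 4 trees and never sees the later improvement to 0.70, which is the usual trade-off with early stopping: less computation in exchange for possibly stopping a little too soon.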