y121002
Level I

About the way of deciding parameters in Random Forest

In JMP, how are the option parameters in Random Forest calculated (for example, the maximum number of branches in a tree)?

Also, can the initial parameter settings be chosen based on a model result (for example, the AUC of the ROC curve)?

Jeff_Perkinson
Community Manager

Re: About the way of deciding parameters in Random Forest

I'm not sure I'm interpreting your question correctly. Perhaps you're asking how the default values in the Bootstrap Forest dialog are chosen.

[Image: Bootstrap Forest dialog]

  • Number of trees in the forest: the default is 100. In practice, you may want to use even more than the default number of trees (hundreds or even thousands).
  • Number of terms sampled per split: The default is ¼ of the total number of predictor columns specified in the launch dialog.
  • Bootstrap sample rate: The default is 1, which means the bootstrap sample will have the same number of rows as the original data table. The bootstrap sampling happens automatically and you don’t actually ever see the separate bootstrap samples. If you choose a value that is less than 1, then the bootstrap sample will have fewer rows than the original table. It’s best to use the default here.
  • Minimum and Maximum splits per tree: The default minimum size is 10 splits per tree. The maximum helps to control the computational complexity of the model, and the default maximum size is 2000. If cross-validation is used, an individual tree may stop earlier than the maximum number of splits if adding more splits to the tree is no longer improving its validation RSquare. Also, the next setting, minimum split size, will tend to keep the size of the individual trees smaller than the maximum.
  • Minimum split size: The default is the larger of 5 and 1% of the number of rows in the data table. This also keeps the individual trees from becoming overfit.
  • Early Stopping: If hold-out cross-validation is used, then a checkbox is displayed and selected by default. Early stopping will cause JMP Pro to stop adding trees to the bootstrap forest if the validation performance of the model stops improving.
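Several of the defaults above are simple arithmetic on the size of the data. The following is a minimal Python sketch of that arithmetic, not JMP's implementation. The function names are hypothetical, and the rounding (integer division, rounding down) is an assumption because the exact rounding JMP uses is not stated here. The bootstrap helper draws row indices with replacement to show what a sample rate of 1 versus 0.5 means for the sample size.

```python
import random


def default_terms_per_split(n_predictors: int) -> int:
    # 1/4 of the predictor columns in the launch dialog.
    # ASSUMPTION: round down, with at least 1 term; JMP's exact rounding is not stated.
    return max(1, n_predictors // 4)


def default_min_split_size(n_rows: int) -> int:
    # The larger of 5 and 1% of the rows in the data table.
    # ASSUMPTION: 1% is rounded down here; JMP may round differently.
    return max(5, n_rows // 100)


def bootstrap_rows(n_rows: int, sample_rate: float = 1.0, seed: int = 0) -> list[int]:
    # Draw row indices WITH replacement (hypothetical helper for illustration).
    # A sample rate of 1 yields as many rows as the original table;
    # a rate below 1 yields proportionally fewer rows.
    rng = random.Random(seed)
    k = round(sample_rate * n_rows)
    return [rng.randrange(n_rows) for _ in range(k)]


if __name__ == "__main__":
    print(default_terms_per_split(20))      # 5
    print(default_min_split_size(300))      # 5 (1% of 300 is 3, so 5 wins)
    print(default_min_split_size(2000))     # 20
    print(len(bootstrap_rows(1000)))        # 1000
    print(len(bootstrap_rows(1000, 0.5)))   # 500
```

Because the sampling is with replacement, a full-size bootstrap sample typically repeats some rows and omits others, which is what gives each tree in the forest a slightly different view of the data.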

Thanks to jgrayson, sjgardner1 and mia.stephens for the information above. Look for their forthcoming book, Building Better Models with JMP Pro.

-Jeff