frankderuyck
Level VI

Predictor screening

For predictor screening, JMP uses Bootstrap Forest. This works very well, but why was this method chosen rather than Boosted Tree, which is also a very powerful method? Does JMP use XGBoost?

4 REPLIES
SDF1
Super User

Re: Predictor screening

Hi @frankderuyck ,

 

  I'm not an expert in how JMP actually performs things on the back end (the precise code that is executed), but I believe bootstrap forest is used because it has two layers of randomization. The first is randomly selecting a subset of the factors: for example, if you have a column you want to predict and 10 candidate factors, the bootstrap forest approach will randomly select a subset, say 6 of the 10, grow a decision tree to predict the outcome, and then rank the factors by their contribution. The second is that bootstrap forest randomly selects a set of rows (with replacement) to fit each tree and uses the remaining rows to validate it, repeating this random sampling at each iteration. It's my understanding that boosted tree by default does not do this.
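For intuition, here is a minimal sketch outside JMP, using scikit-learn's RandomForestRegressor on synthetic data to mimic those two layers of randomization. The parameter names and defaults are scikit-learn's, not JMP's internal code.

```python
# Minimal sketch (outside JMP) of the two randomization layers in a
# bootstrap forest, using scikit-learn on synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                         # 10 candidate factors
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(size=200)   # only factors 0 and 3 matter

forest = RandomForestRegressor(
    n_estimators=100,
    max_features=6,   # layer 1: each split considers a random subset of 6 factors
    bootstrap=True,   # layer 2: each tree is fit on rows sampled with replacement
    oob_score=True,   # the unsampled "out-of-bag" rows serve as validation
    random_state=0,
).fit(X, y)

# Rank factors by contribution, analogous to a Column Contributions report
for idx, imp in sorted(enumerate(forest.feature_importances_), key=lambda t: -t[1]):
    print(f"factor {idx}: importance {imp:.3f}")
```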

 

  As to your last question, you can download the XGBoost add-in for JMP Pro from the JMP website. I believe it works for JMP Pro 14 and up.

 

  On a side note, since it sounds like you're working with JMP Pro, you might also want to run the GenReg platform with its different estimation methods (Lasso, Elastic Net, etc.) to see the model coefficient estimates, and run some bootstrap simulations there as well. Sometimes you get different answers, and that can help determine whether a borderline factor is really important or not. I often use both approaches when I'm tasked with finding the best predictor set for a model.
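Outside JMP, the cross-check idea looks roughly like the following scikit-learn sketch, reusing the synthetic X and y from the earlier example. GenReg is the JMP Pro analogue; this code is only an illustration of checking which coefficients survive regularization, not what GenReg actually runs.

```python
# Fit regularized linear models and see which coefficients remain nonzero.
from sklearn.linear_model import LassoCV, ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

for name, model in [("lasso", LassoCV(cv=5)), ("elastic net", ElasticNetCV(cv=5))]:
    pipe = make_pipeline(StandardScaler(), model).fit(X, y)
    kept = [i for i, c in enumerate(pipe[-1].coef_) if abs(c) > 1e-8]
    print(f"{name}: nonzero coefficients for factors {kept}")
```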

 

Hope this helps!

DS

frankderuyck
Level VI

Re: Predictor screening

Thanks for this extended answer!

Yes, I believe running both bootstrap forest and boosted tree is a good idea for selecting predictors from many candidates.

Am I right in saying that bootstrap forest is preferred for categorical responses?

Is the XGBoost method better than the one used in JMP Pro?

statman
Super User

Re: Predictor screening

Not sure which version of JMP you are using, but if you are on JMP Pro 16, you can compare many different modeling platforms with the Model Screening platform:

 

https://www.jmp.com/support/help/en/16.0/#page/jmp/launch-the-model-screening-platform.shtml?os=mac&...

 

This may be useful to you.

"All models are wrong, some are useful" G.E.P. Box
peng_liu
Staff

Re: Predictor screening

The Predictor Screening platform is the Bootstrap Forest platform with a minimal number of tuning parameters to specify (just one now: Number of Trees), and it provides only the Column Contributions report. The magic is that the Predictor Screening platform has well-chosen default values for the remaining tuning parameters. That setting serves well for finding useful predictors.
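To illustrate the idea only (not JMP's actual internals), a wrapper that exposes a single number-of-trees parameter and fixes everything else might look like this sketch, which continues the synthetic X and y from the first example; the fixed defaults below are assumptions, not JMP's values.

```python
from sklearn.ensemble import RandomForestRegressor

def predictor_screening(X, y, n_trees=100):
    """Hypothetical one-parameter screen: only the number of trees is exposed."""
    forest = RandomForestRegressor(
        n_estimators=n_trees,  # the single exposed tuning parameter
        max_features="sqrt",   # fixed default: factor subset size per split
        bootstrap=True,        # fixed default: resample rows per tree
        random_state=0,
    ).fit(X, y)
    # Return factors sorted by contribution, like the Column Contributions report
    return sorted(enumerate(forest.feature_importances_), key=lambda t: -t[1])

print(predictor_screening(X, y, n_trees=200)[:5])
```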

 

If one wants to apply boosted tree for the same purpose with a minimal number of tuning parameters, say one, that is a very difficult task. For example, boosted tree relies much more heavily than random forest on the choice and use of a validation method, and the choice of a validation method is itself an art.
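A rough scikit-learn sketch of that dependence (continuing the synthetic data above): with early stopping, the held-out validation rows decide how many boosting iterations are kept, so the validation choice directly shapes the fitted model.

```python
from sklearn.ensemble import GradientBoostingRegressor

gbm = GradientBoostingRegressor(
    n_estimators=1000,        # upper bound; early stopping picks the real count
    learning_rate=0.05,
    validation_fraction=0.2,  # which rows are held out is itself a modeling choice
    n_iter_no_change=20,      # stop after 20 rounds without validation improvement
    random_state=0,
).fit(X, y)

print("boosting iterations actually used:", gbm.n_estimators_)
```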

 

The Predictor Screening platform serves the purpose of finding useful predictors. It is not a replacement for the Bootstrap Forest platform. Your final model won't be a random forest model if you just use Predictor Screening; you will probably go on to build other models using the predictors it finds. A benefit of using Predictor Screening to find useful predictors, rather than relying on the built-in variable selection methods of your final model, is that the method does not rely on any parametric assumptions.
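In code terms, that two-stage workflow might look like the following sketch (continuing the synthetic X and y and the numpy import from the first example): screen with a forest's importances, then fit a different final model on the survivors.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Stage 1: screen with a forest and keep the 3 highest-contributing factors
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
top = np.argsort(forest.feature_importances_)[::-1][:3]

# Stage 2: the final model need not be a forest at all
final = LinearRegression().fit(X[:, top], y)
print("final model uses factors", list(top), "R^2 =", final.score(X[:, top], y))
```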

 

Between random forest and boosted tree, I have not come across any studies concluding that one performs better depending on the type of the response variable.

 

Boosted Tree is an implementation of the gradient boosting method. So is XGBoost, and so are other implementations, e.g., LightGBM. Comparisons among them can be rather complicated, and probably subjective as well. But it is always exciting to play with different tools to understand their pros and cons.
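As a hedged illustration of such a comparison (assuming the optional xgboost and lightgbm packages are installed, and continuing the synthetic data above), both expose a scikit-learn-compatible interface, so they can be cross-validated side by side:

```python
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor

for name, model in [("XGBoost", XGBRegressor(n_estimators=200, learning_rate=0.05)),
                    ("LightGBM", LGBMRegressor(n_estimators=200, learning_rate=0.05))]:
    print(f"{name}: mean CV R^2 = {cross_val_score(model, X, y, cv=5).mean():.3f}")
```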