barr26
Level I

Predictor Screening

I am trying to use the Predictor Screening platform, and although I have walked through the example on JMP, I still have questions.

 

I put my response variable in a yes/no format like the example. However, when I look to see which variables are the best predictors of the response, how do I interpret the results? Is the #1 ranked variable the best predictor of a "yes" outcome or of a "no" outcome? Or is the idea that you perform this analysis and then follow up with another analysis?

 

Successful Outcome?

Predictor    Contribution    Portion    Rank
Variable1    4.36861         0.3072     1
Variable4    4.36012         0.3066     2
Variable5    2.29150         0.1612     3
Variable2    1.98570         0.1396     4
Variable3    1.21356         0.0853     5

 

 

I then wanted to extend my analysis to see which variable was the best predictor of a successful outcome given a certain treatment. For this, I filtered the data to only the treatment I wanted, excluded the rest, and reran the analysis. Would this be correct?

 

My final question: the data in the example on JMP was all character nominal. Are my results accurate if my predictor variables are character ordinal, or is this platform only good for character nominal data?

 

 

1 REPLY 1
dlehman1
Level IV

Re: Predictor Screening

I'm not sure what example you are following, so I can't be specific about what you are showing. But I can say that you can certainly mix nominal and continuous predictors and still use Predictor Screening.

Your first question sounds like you are asking whether the ranking applies to predicting "yes" or "no." With a binary response, the ranking will be the same for each outcome since it must be symmetric: if a variable is good for predicting "yes," it must also be good for predicting "no."

If I understand your other question, you have several treatments in your data and you want to apply Predictor Screening to individual treatments rather than pooling them together. If so, you can subset those and use Predictor Screening. Alternatively, you can put treatment in the "By" box and run Predictor Screening for each treatment separately. There may be a statistical issue with interpreting the results if you have many treatments (the multiple comparisons problem), which might make your findings look stronger than they are, but I'll leave that for a statistician to respond to.
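To illustrate the symmetry point, here is a minimal Python sketch. It uses Gini impurity as a stand-in for a tree split criterion (an assumption; JMP's Bootstrap Forest may score splits differently), and the data, threshold, and function names are made up for illustration. Swapping the "yes" and "no" labels leaves the split score unchanged, which is why a ranking built from such scores does not favor one outcome level over the other.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def impurity_decrease(x, y, threshold):
    """Drop in Gini impurity from splitting y on x <= threshold."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    n = len(y)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(y) - weighted

# Hypothetical data: one predictor and a yes/no response.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = ["no", "no", "no", "yes", "no", "yes", "yes", "yes"]

# Same data with the two response levels swapped.
flipped = ["yes" if label == "no" else "no" for label in y]

original_score = impurity_decrease(x, y, threshold=4)
flipped_score = impurity_decrease(x, flipped, threshold=4)

print(original_score, flipped_score)  # the two scores are identical
```

The score depends only on how well the split separates the two classes, not on which class is called "yes."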

 

You had one other question, about whether you use Predictor Screening and then follow it with another analysis. I think the answer is yes. Predictor Screening uses a Bootstrap Forest to identify the most important predictors. There are other predictive models that may perform better once you know which predictors you want to use. Even running Bootstrap Forest directly will give you different results from Predictor Screening, since that platform exposes more tuning parameters (which you can further adjust). Predictor Screening provides a quick first step to narrow down the important predictors, but Model Screening would be the next step for actually building better models with those predictors.
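As a small aid to reading the table in the question, the Portion column appears to be each Contribution divided by the sum of all Contributions; the arithmetic below reproduces the posted values to four decimals. This is a sketch in Python using only the numbers from the post, not JMP's own computation, and the variable names are illustrative.

```python
# Contribution values copied from the table in the question.
contributions = {
    "Variable1": 4.36861,
    "Variable4": 4.36012,
    "Variable5": 2.29150,
    "Variable2": 1.98570,
    "Variable3": 1.21356,
}

# Normalize so the portions sum to 1.
total = sum(contributions.values())
portions = {name: value / total for name, value in contributions.items()}

# Rank predictors from largest to smallest portion.
ranked = sorted(portions, key=portions.get, reverse=True)

for rank, name in enumerate(ranked, start=1):
    print(f"{rank}  {name}  {portions[name]:.4f}")
```

Read this way, Variable1 and Variable4 are nearly tied, so the difference between rank 1 and rank 2 in this table should not be over-interpreted.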