
## Discriminant Analysis vs. Predictor Screening: Different hits from same data set; why?

Hi JMP Community,

First, let me apologize for not sharing the actual data and results: I'm currently working with sensitive data sets that I have not yet had time to anonymize.

I would like to better understand the differences between the Discriminant Analysis platform and the Predictor Screening platform. I understand that these represent different approaches with different assumptions, but if a combination of continuous variables scores high in the Predictor Screening platform, would it be reasonable to expect that at least some of the same variables would be picked by the Discriminant Analysis platform (Stepwise Variable Selection)?

In other words, if the top hits from the Discriminant Analysis and the Predictor Screening are mostly different, does that strongly suggest that none of the variables entered in these models are actually associated with the outcome?

Sincerely,

TS

Thierry R. Sornasse
Accepted Solutions

## Re: Discriminant Analysis vs. Predictor Screening: Different hits from same data set; why?

These techniques are very different with different assumptions.

Predictor Screening is based on a random forest, which is an ensemble of decision trees, each grown on a bootstrap sample of the data. It makes no distributional assumptions.

Discriminant analysis is a multivariate technique that is fairly sensitive to the normality assumption. In addition, some discriminant models (quadratic, regularized, etc.) can require large sample sizes to estimate.
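To illustrate how two reasonable screens can disagree on the same data, here is a minimal Python sketch on hypothetical simulated data. `make_data`, `anova_f`, and `stump_forest` are illustrative names, not JMP functions. `anova_f` scores each variable by the one-way ANOVA F of its class means, a rough stand-in for a linear, mean-based discriminant criterion; `stump_forest` is a stripped-down random forest whose trees are single splits (stumps) on bootstrap samples, scored by Gini decrease. Neither reproduces JMP's Discriminant Analysis or Predictor Screening platforms.

```python
import random
from collections import Counter
from statistics import fmean


def make_data(n=200, seed=1):
    """Hypothetical data: class depends on |x1|; x2 has a weak mean shift; x3 is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        x1 = -2 + 4 * (i + 0.5) / n          # evenly spaced, symmetric about 0
        label = int(abs(x1) > 1)             # class 1 sits in BOTH tails of x1
        X.append([x1, 0.5 * label + rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(label)
    return X, y


def anova_f(xs, ys):
    """One-way ANOVA F: between-class vs. within-class variation of one predictor."""
    groups = {}
    for x, g in zip(xs, ys):
        groups.setdefault(g, []).append(x)
    means = {g: fmean(v) for g, v in groups.items()}
    grand, k, n = fmean(xs), len(groups), len(xs)
    ss_between = sum(len(v) * (means[g] - grand) ** 2 for g, v in groups.items())
    ss_within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def gini(counts, n):
    """Gini impurity of a node with the given class counts and size n."""
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split_gain(xs, ys):
    """Largest Gini decrease from a single threshold split on one predictor."""
    pairs = sorted(zip(xs, ys))
    n, total, left = len(pairs), Counter(ys), Counter()
    parent, best = gini(total, n), 0.0
    for i in range(1, n):
        left[pairs[i - 1][1]] += 1
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between tied values
        right = total - left
        child = (i * gini(left, i) + (n - i) * gini(right, n - i)) / n
        best = max(best, parent - child)
    return best


def stump_forest(X, y, n_trees=200, seed=0):
    """Toy random forest: each tree is one split on one randomly chosen predictor,
    fit to a bootstrap resample; importance = average Gini decrease per tree."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    importance = [0.0] * p
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]   # bootstrap resample of rows
        j = rng.randrange(p)                          # random predictor subset of size 1
        importance[j] += best_split_gain([X[r][j] for r in rows], [y[r] for r in rows])
    return [v / n_trees for v in importance]


X, y = make_data()
names = ["x1", "x2", "x3"]
f_stats = [anova_f([row[j] for row in X], y) for j in range(len(names))]
forest = stump_forest(X, y)
for name, f, imp in zip(names, f_stats, forest):
    print(f"{name}: ANOVA F = {f:8.3f}   forest importance = {imp:.4f}")
```

Because class 1 sits in both tails of x1, the two class means of x1 are identical and its F statistic is essentially zero, while a tree split on x1 separates the classes well. The class-1 distribution of x1 is also bimodal, so the normality assumption fails for exactly the variable that matters; a quadratic discriminant model, which lets each class have its own covariance, could pick up the difference in spread.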

Which is best and "correct" for you? Who knows? Check the assumptions closely. If the data set is large, could you perform the analysis on multiple subsets (which is what Predictor Screening does automatically)?
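The subset idea can be sketched as a simple stability check: rerun the same selection rule on many random subsets and count how often each variable is picked. This is a minimal Python sketch under stated assumptions: the data come from a hypothetical `make_data`, the selection rule (`select`, keeping predictors whose pooled two-sample |t| exceeds 2) is hypothetical, and the subsets are half-samples drawn without replacement, whereas a random forest resamples with replacement.

```python
import random
from collections import Counter
from statistics import fmean, variance


def make_data(n=200, seed=2):
    """Hypothetical data: strong shift on x1, weak shift on x2, x3 is pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # balanced two-class outcome
        X.append([1.0 * label + rng.gauss(0, 1),
                  0.25 * label + rng.gauss(0, 1),
                  rng.gauss(0, 1)])
        y.append(label)
    return X, y


def pooled_t(xs, ys):
    """Two-sample pooled-variance t statistic for one predictor."""
    a = [x for x, g in zip(xs, ys) if g == 0]
    b = [x for x, g in zip(xs, ys) if g == 1]
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (fmean(b) - fmean(a)) / (sp2 * (1 / na + 1 / nb)) ** 0.5


def select(X, y, cutoff=2.0):
    """Hypothetical selection rule: predictors whose |t| exceeds `cutoff`."""
    return {j for j in range(len(X[0]))
            if abs(pooled_t([row[j] for row in X], y)) > cutoff}


def selection_frequency(X, y, n_subsets=200, frac=0.5, seed=0):
    """Share of random subsets in which each predictor is selected."""
    rng = random.Random(seed)
    n, m = len(y), int(frac * len(y))
    counts = Counter()
    for _ in range(n_subsets):
        rows = rng.sample(range(n), m)  # subsample without replacement
        counts.update(select([X[r] for r in rows], [y[r] for r in rows]))
    return [counts[j] / n_subsets for j in range(len(X[0]))]


X, y = make_data()
freq = selection_frequency(X, y)
for name, f in zip(["x1", "x2", "x3"], freq):
    print(f"{name}: selected in {f:.0%} of subsets")
```

A variable that is selected in nearly every subset deserves more trust than one that shows up only occasionally; the same tally can be run with any selection rule, including the two platforms being compared.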

To quote George Box, "All models are wrong, but some are useful." Look for something that is useful.
Dan Obermiller