
How to easily screen out non-significant factors (~200 columns) before performing a multiple regression

I have many factors (~200 columns), one response, and many measurements.

Most of these factors are collinear with one another. Is there a simple way to run a first filter that removes the irrelevant factors?

I will then perform a multiple regression on the remaining ones.

3 REPLIES
hcarr01
Level VI

Re: How to easily screen out non-significant factors (~200 columns) before performing a multiple regression

Hello,

To rank parameters for a response variable, you can use several tools:

- Predictor Screening
- Response Screening

The two tools may not give the same results. If so, that may mean there are "hidden" terms that are not explicit in the model: quadratic terms, cross (interaction) terms.

Alternatively, you can use least squares regression (linear model) if you want a prediction formula. With this method, you can build your model step by step, removing the parameters that have no impact on your model.
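For readers who want to see the step-by-step idea outside JMP, here is a minimal NumPy sketch of backward elimination on a least squares fit. It is an illustration only, not JMP's stepwise procedure: the helper name `backward_eliminate`, the |t| cutoff of 2.0, and the simulated data are all assumptions made for the example.

```python
import numpy as np

def backward_eliminate(X, y, names, t_cutoff=2.0):
    """Repeatedly drop the predictor with the smallest |t| until all exceed t_cutoff.

    Hypothetical helper for illustration; the cutoff of 2.0 is an assumed rule of thumb.
    """
    keep = list(range(X.shape[1]))
    while keep:
        # Design matrix: intercept column plus the predictors still in the model
        A = np.column_stack([np.ones(len(y)), X[:, keep]])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        dof = len(y) - A.shape[1]
        sigma2 = resid @ resid / dof
        # Standard errors from the diagonal of sigma^2 * (A'A)^-1
        se = np.sqrt(sigma2 * np.diag(np.linalg.pinv(A.T @ A)))
        t_abs = np.abs(beta[1:] / se[1:])  # skip the intercept
        worst = int(np.argmin(t_abs))
        if t_abs[worst] >= t_cutoff:
            break  # every remaining predictor clears the cutoff
        keep.pop(worst)
    return [names[i] for i in keep]

# Simulated data (assumption): only x0 and x3 actually drive the response
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.5 * rng.normal(size=200)
names = [f"x{i}" for i in range(8)]

selected = backward_eliminate(X, y, names)
print(selected)
```

With strongly collinear factors the t-statistics become unstable, so a screening step beforehand is probably safer than running this directly on all ~200 columns.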

Victor_G
Super User

Re: How to easily screen out non-significant factors (~200 columns) before performing a multiple regression

Hi @Perpignan_italy,

If you want to highlight the factors that contribute most to your response, the Predictor Screening (jmp.com) platform is a good first exploration step.
Because it is based on Random Forest, each factor/feature gets a chance to be tested "equally", even in the presence of multicollinearity.
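To make the idea concrete outside JMP, here is a rough Python analogue that ranks columns by random forest importance. It uses scikit-learn's `RandomForestRegressor` and is not JMP's Predictor Screening implementation; the simulated data, the column indices, and the settings (`n_estimators=200`) are assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Simulated data (assumption): 18 factors, only columns 0 and 1 drive the response
rng = np.random.default_rng(0)
n, p = 300, 18
X = rng.normal(size=(n, p))
X[:, 5] = X[:, 0] + 0.05 * rng.normal(size=n)  # column 5 is nearly a copy of column 0
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=n)

forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X, y)

# Rank columns from most to least important (importances sum to 1)
order = np.argsort(forest.feature_importances_)[::-1]
top3 = [int(i) for i in order[:3]]
for i in order[:5]:
    print(f"column {i}: importance {forest.feature_importances_[i]:.3f}")
```

Note that two nearly identical columns tend to share importance between them, so it is worth reading the ranking together with the correlation structure rather than applying a hard cutoff.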

 

If you want to build a regression model on the most important predictors, you may need either a model that handles multicollinearity (a penalized regression such as Lasso, Ridge or Elastic Net, or another regression method such as Partial Least Squares, ...) or some pre-processing (Principal Component Analysis followed by regression on the principal components) so that multicollinearity is accounted for correctly.
Once your predictors are selected, you may also use simpler models that cope with multicollinearity and make fewer assumptions (like a Decision Tree from the Partition Models (jmp.com) platform).
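As one illustration of the pre-processing route, here is a minimal NumPy sketch of principal component regression: standardize the factors, keep the components that explain most of the variance, and regress the response on those component scores. The simulated data, the 95% variance threshold, and all variable names are assumptions for the example, not a recommendation from JMP.

```python
import numpy as np

# Simulated data (assumption): 10 collinear factors driven by 2 latent variables
rng = np.random.default_rng(0)
n, p = 200, 10
latent = rng.normal(size=(n, 2))
angles = np.linspace(0.0, np.pi, p, endpoint=False)
loadings = np.vstack([np.cos(angles), np.sin(angles)])  # shape (2, p)
X = latent @ loadings + 0.05 * rng.normal(size=(n, p))
y = 3.0 * latent[:, 0] - 2.0 * latent[:, 1] + 0.1 * rng.normal(size=n)

# PCA via SVD of the standardized factors
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
explained = S**2 / np.sum(S**2)

# Keep the smallest number of components reaching 95% of the variance (assumed threshold)
k = int(np.searchsorted(np.cumsum(explained), 0.95)) + 1
scores = Xs @ Vt[:k].T  # uncorrelated component scores

# Ordinary least squares on the scores instead of the collinear factors
A = np.column_stack([np.ones(n), scores])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ coef
r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"components kept: {k}, R^2: {r2:.3f}")
```

The component scores are uncorrelated by construction, which is what removes the collinearity problem; the trade-off is that the coefficients apply to components rather than to the original factors.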

I hope this answer will help you,

Victor GUILLER
Scientific Expertise Engineer
L'Oréal - Data & Analytics

Re: How to easily screen out non-significant factors (~200 columns) before performing a multiple regression

Thank you very much, I will try your suggestion.