Level II

## Model: Are high VIF an issue in all cases?

Here's the situation: I am trying to build a model to express a particular Y using 25 parameters in X. All of my X data come from a single technology that analyzes one sample at 25 frequencies (my X's), a frequency scan meant to gain maximum information. In other words, for each Y value, I have 25 X values.

But the issue is that when I build models, I obtain huge VIFs because my parameters X are inherently collinear as they are just frequencies of the same technology so they evolve the same way ...

And if I do as I learned, meaning I remove parameters one by one starting with the highest VIF, I end up with a single frequency left, which defeats the purpose of the scanning technology.

In that particular case, is the model really biased by these collinearities, or can we trust it?

@alexbeck maybe?

Level VII

## Re: Model: Are high VIF an issue in all cases?

I don't think I understand the issue completely.  It sounds like you have 25 levels of one X (frequency)....so you are creating a model with only one X in the model? You would have many degrees of freedom to estimate a rather complex non-linear model, but there is only one predictor in the equation.

High VIFs (>10 is excessive, >5 should be evaluated) are only an indication that your model should be re-evaluated (the model's usefulness may be overstated). Yes, VIF is a measure of multicollinearity, but the high values don't necessarily indicate which terms are collinear. This usually requires subject matter expertise to understand and appropriately compensate for.
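For reference, the VIF of predictor j is 1 / (1 - R_j²), where R_j² comes from regressing that predictor on all the others, so the thresholds of 5 and 10 correspond to R_j² of 0.8 and 0.9. Below is a minimal NumPy sketch of that calculation, not JMP's implementation. The simulated "25 frequencies" data (one shared signal, per-frequency gains, a small noise level) are assumptions made purely for illustration.

```python
import numpy as np

def vif(X):
    """Variance inflation factors for the columns of X.

    Uses the identity that VIF_j is the j-th diagonal element of the
    inverse of the predictors' correlation matrix.
    """
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Hypothetical data: 25 "frequencies" that all track one underlying signal.
rng = np.random.default_rng(1)
n, p = 200, 25
signal = rng.normal(size=(n, 1))
gains = np.linspace(0.8, 1.2, p)          # assumed per-frequency sensitivity
X = signal * gains + 0.05 * rng.normal(size=(n, p))

vifs = vif(X)
print(f"smallest VIF: {vifs.min():.0f}, largest VIF: {vifs.max():.0f}")
```

With predictors this strongly related, every column gets a VIF far beyond the usual thresholds, which is the pattern described in the question.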

Level II

## Re: Model: Are high VIF an issue in all cases?

I'm sorry if it wasn't clear enough, let's try again:
It is not exactly 25 levels of one X, but 25 different X's coming from the same technology. The technology scans various frequencies, and for each one it gives one value, which is my X.
And when I mention high VIFs, I am talking about values in the hundreds and thousands ...
Staff

## Re: Model: Are high VIF an issue in all cases?

This sounds like you might actually want to use the Functional Data Explorer. For each Y you have a range of frequencies that defines a profile. That is exactly what functional data explorer is meant to analyze.

However, if you want to use regression, high VIFs are not really much of an issue if you do NOT remove any terms from the full model (the model with all of the X's in the model) and you are interested in just using the model for prediction.

Since it sounds like you are more interested in which frequencies are most important (by removing non-significant terms with high VIFs), you will still be running the risk of the high VIFs causing issues. Your approach is probably the best if you stick with regression, but remember that the high VIFs mean that the variance on the parameter estimates is inflated. That also means that the parameter estimates are terribly unstable. Removing one X from the model could dramatically alter the parameter estimates and significance testing of the other parameters. The model you end up with may not be the best.
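To make the instability concrete, here is a small simulated sketch, not a JMP analysis. Two hypothetical predictors `x1` and `x2` are nearly identical, and the standard error of the `x1` coefficient is compared with and without `x2` in the model. All names and noise levels are assumptions for illustration.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS with an intercept; returns slope estimates and their standard errors."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])           # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - A.shape[1])      # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)          # covariance of the estimates
    return beta[1:], np.sqrt(np.diag(cov)[1:])

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)                # almost a copy of x1
y = x1 + x2 + rng.normal(size=n)

coef_full, se_full = ols_with_se(np.column_stack([x1, x2]), y)
coef_red, se_red = ols_with_se(x1[:, None], y)

print("both terms:  coef", coef_full, "se", se_full)
print("x1 only:     coef", coef_red, "se", se_red)
```

In the two-term model the standard errors are many times larger than in the one-term model, so the individual estimates (and their significance tests) can swing widely when a term is added or removed, even though the fitted values barely change.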

If you really need to identify the important features of these profiles, please look at Functional Data Explorer. If that is not an option, you should consider a modeling technique that is designed to handle multicollinearity. PLS and PCR are just two such techniques.

Dan Obermiller
Level II

## Re: Model: Are high VIF an issue in all cases?

I have no idea what the Functional Data Explorer tool is. I'll look into it, thanks a lot for the tip!

Indeed, of my 25 frequencies, only a few are really relevant to explain my Y, so during modeling I used stepwise to remove the non-significant ones. If I don't, I obtain a model with something like 18 non-significant parameters and only 6 significant ones ... Does that really make sense?

I'll also check PLS, that's a good idea. Thank you for your help!
Staff

## Re: Model: Are high VIF an issue in all cases?

I have seen a common "rule of thumb" for VIF: below 10 is good, above 10 is bad. This rule is unfortunately useless and should not be followed.

In general, a tolerable VIF depends on the relative size of the parameter estimate and the response variance. If your response changes are huge and your variance is small, you can tolerate very large VIF. On the other hand, if your response changes are small and your variance is large, then a small VIF might be too much.
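One way to see this, written for ordinary least squares with an intercept (the symbols are introduced here for illustration: $s_{x_j}^2$ is the sample variance of predictor $x_j$, $\sigma^2$ the error variance, $n$ the number of rows):

```latex
\operatorname{Var}(\hat\beta_j)
  = \frac{\sigma^2}{(n-1)\, s_{x_j}^2}\,\mathrm{VIF}_j ,
\qquad
t_j = \frac{\hat\beta_j}{\widehat{\mathrm{SE}}(\hat\beta_j)}
    = \frac{\hat\beta_j \, s_{x_j} \sqrt{n-1}}{\hat\sigma \,\sqrt{\mathrm{VIF}_j}} .
```

The VIF enters the t ratio only through $\sqrt{\mathrm{VIF}_j}$ in the denominator, so a large effect $\hat\beta_j s_{x_j}$ relative to $\hat\sigma$ can remain clearly detectable even with a very large VIF, while a small effect can be masked by a modest one.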

I wonder if you could use another regression method. For example, PLS is very successful where the X are the domain of some spectra, like wavelength or molecular weight. These X are highly correlated. The PLS regression model exploits this information instead of penalizing you with collinearity.
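As a rough illustration of the idea outside JMP, here is a minimal NumPy sketch of single-response PLS (the PLS1 form of the NIPALS algorithm). The function name `pls1_fit` and the simulated spectra-like data are hypothetical, and choosing the number of components (normally done by cross-validation) is left out.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS; returns (coefficients, intercept) on the original X scale."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk = X - x_mean                      # X is deflated after each component
    yc = y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yc                    # direction of maximal covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                 # nothing left in X that relates to y
            break
        w /= norm
        t = Xk @ w                       # scores for this component
        tt = t @ t
        p = Xk.T @ t / tt                # X loadings
        W.append(w)
        P.append(p)
        q.append(yc @ t / tt)            # y loading
        Xk = Xk - np.outer(t, p)         # remove what this component explained
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, y_mean - x_mean @ coef

# Hypothetical data: 25 highly correlated "frequencies" driven by one signal.
rng = np.random.default_rng(3)
n, p = 200, 25
signal = rng.normal(size=n)
X = np.outer(signal, np.linspace(0.8, 1.2, p)) + 0.05 * rng.normal(size=(n, p))
y = 2.0 * signal + 0.3 * rng.normal(size=n)

coef, intercept = pls1_fit(X, y, n_components=2)
pred = X @ coef + intercept
r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"in-sample R-square with 2 PLS components: {r2:.3f}")
```

All 25 columns stay in the model; the correlation among them is absorbed into a couple of latent components instead of being fought one term at a time.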

If you have JMP Pro, then you could also treat the X as functional data and save the functional principal components. These FPCs would be used in place of X in your regression model of Y.

What do you think?

Learn it once, use it forever!
Level VII

## Re: Model: Are high VIF an issue in all cases?

As Mark indicates, "rules of thumb" should always be used with caution. In building an appropriate model, a number of techniques are used (e.g., assessing the assumptions (NID(0, σ²)), comparing R-square with adjusted R-square, testing for outliers, residual analysis, etc.), of which VIF is only one.

Level II

## Re: Model: Are high VIF an issue in all cases?

Absolutely. Everything else in the model is quite OK: outliers were checked with studentized residuals, the adjusted R² is very good, etc. ... Only the VIFs are crazy.
Level II

## Re: Model: Are high VIF an issue in all cases?

I forgot to mention that when I speak about high VIFs, I mean values in the hundreds and thousands ... If my variance is still not that high, can I tolerate such values?
PLS was indeed mentioned previously, I'll try that!
Unfortunately I don't have JMP Pro ...
Super User

## Re: Model: Are high VIF an issue in all cases?

There are a lot of techniques you could use in this situation. Regularized models like ridge regression, or neural nets, can handle collinearity pretty well. You can also use dimensionality reduction techniques like principal components and do regression on the component scores.
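For anyone who wants to try two of these outside JMP, here is a minimal NumPy sketch of ridge regression (closed form) and principal components regression. The function names, the penalty `lam`, and the simulated data are assumptions for illustration. In practice the penalty and the number of components would be tuned by cross-validation, and the X's are usually standardized first.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression on centered data; returns (coefficients, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    gram = Xc.T @ Xc + lam * np.eye(X.shape[1])    # penalty stabilizes the inverse
    coef = np.linalg.solve(gram, Xc.T @ (y - y_mean))
    return coef, y_mean - x_mean @ coef

def pcr_fit(X, y, n_components):
    """Regress y on the first principal component scores of X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    U, s, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    k = n_components
    # Scores are U[:, :k] * s[:k]; map the score coefficients back to the X scale.
    coef = Vt[:k].T @ ((U[:, :k].T @ (y - y_mean)) / s[:k])
    return coef, y_mean - x_mean @ coef

# Hypothetical data: 25 collinear predictors driven by one shared signal.
rng = np.random.default_rng(6)
n, p = 200, 25
signal = rng.normal(size=n)
X = np.outer(signal, np.linspace(0.8, 1.2, p)) + 0.05 * rng.normal(size=(n, p))
y = 2.0 * signal + 0.3 * rng.normal(size=n)

ols_coef, _ = ridge_fit(X, y, lam=0.0)             # lam = 0 is ordinary least squares
ridge_coef, _ = ridge_fit(X, y, lam=1.0)
pcr_coef, _ = pcr_fit(X, y, n_components=2)

for name, c in [("OLS", ols_coef), ("ridge", ridge_coef), ("PCR", pcr_coef)]:
    print(f"{name:>5}: coefficient norm = {np.linalg.norm(c):.3f}")
```

Comparing the coefficient norms shows how both methods rein in the inflated estimates that plain least squares produces on collinear data.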

-- Cameron Willden