Here's the situation: I am trying to build a model to express a particular Y using 25 parameters in X. All my X data come from the same single technology, which analyzes one sample at 25 frequencies (my X), scanning across frequencies to gain maximum information. In other words, for each single Y value, I have 25 X values.
But the issue is that when I build models, I obtain huge VIFs: my X parameters are inherently collinear because they are just frequencies from the same technology, so they evolve the same way.
And if I do as I was taught, removing parameters one by one based on the highest VIF, I end up with a single frequency left, which defeats the whole point of the scanning technology.
In that particular case, is the model really biased by these collinearities, or can we trust it?
I don't think I understand the issue completely. It sounds like you have 25 levels of one X (frequency), so you are creating a model with only one X in it? You would have many degrees of freedom to estimate a rather complex non-linear model, but there is only one predictor in the equation.
High VIFs (>10 excessive, >5 should be evaluated) are only an indication that your model should be re-evaluated (the model's usefulness may be overstated). Yes, it is a measure of multicollinearity, but the high values don't necessarily indicate what is collinear. Understanding and appropriately compensating for them usually requires subject matter expertise.
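To make the diagnostic concrete, here is a minimal sketch of how a VIF is computed, using synthetic data and NumPy rather than JMP: each VIF is 1/(1 − R²), where R² comes from regressing one X column on all the others.

```python
import numpy as np

def vifs(X):
    """Return the VIF for each column of the design matrix X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress column j on the remaining columns plus an intercept.
        others = np.column_stack([np.delete(X, j, axis=1), np.ones(n)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

# Two nearly collinear predictors give large VIFs;
# an independent one stays near 1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + 0.01 * rng.normal(size=100)   # almost a copy of x1
x3 = rng.normal(size=100)               # independent
v = vifs(np.column_stack([x1, x2, x3]))
```

Note that the VIF says nothing about which pair of columns is collinear, only that each flagged column is well explained by the rest.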
This sounds like you might actually want to use the Functional Data Explorer. For each Y you have a range of frequencies that defines a profile, which is exactly what the Functional Data Explorer is meant to analyze.
However, if you want to use regression, high VIFs are not really much of an issue as long as you do NOT remove any terms from the full model (the model with all of the Xs included) and you are only interested in using the model for prediction.
Since it sounds like you are more interested in which frequencies are most important (removing non-significant terms despite the high VIFs), you will still run the risk of the high VIFs causing issues. Your approach is probably the best if you stick with regression, but remember that high VIFs mean the variance of the parameter estimates is inflated, which makes those estimates terribly unstable. Removing one X from the model can dramatically alter the parameter estimates and significance tests of the other parameters, so the model you end up with may not be the best.
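That instability is easy to demonstrate on synthetic data (the numbers and variable names below are invented for illustration): with two nearly collinear predictors, dropping one of them swings the remaining coefficient dramatically.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)       # nearly a copy of x1
y = x1 + x2 + 0.1 * rng.normal(size=n)    # true effect split across both

def ols(X, y):
    """Ordinary least squares; returns slopes (intercept dropped)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

# Full model: the two slopes individually are unstable, but their
# sum is well determined (close to the true total effect of 2).
b_full = ols(np.column_stack([x1, x2]), y)
# Drop x2: x1 now absorbs x2's share, so its slope roughly doubles.
b_drop = ols(x1[:, None], y)
```

The sum of the full-model slopes and the single remaining slope both land near 2, but the individual full-model slopes can wander far from their true values of 1.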
If you really need to identify the important features of these profiles, please look at the Functional Data Explorer. If that is not an option, consider a modeling technique designed to handle multicollinearity; PLS and PCR are two such techniques.
I have seen a common "rule of thumb" for VIF: below 10 is good, above 10 is bad. This rule is unfortunately useless and should not be followed.
In general, a tolerable VIF depends on the relative size of the parameter estimate and the response variance. If your response changes are huge and your variance is small, you can tolerate very large VIF. On the other hand, if your response changes are small and your variance is large, then a small VIF might be too much.
I wonder if you could use another regression method. For example, PLS is very successful where the X are the domain of some spectra, like wavelength or molecular weight. These X are highly correlated. The PLS regression model exploits this information instead of penalizing you with collinearity.
If you have JMP Pro, then you could also treat the X as functional data and save the functional principal components. These FPCs would be used in place of X in your regression model of Y.
What do you think?
As Mark indicates, "rules of thumb" should always be used with caution. In building an appropriate model, a number of techniques are used (e.g., assessing the assumptions (NID(0, σ²)), comparing R-square to R-square Adj, testing for outliers, residuals analysis, etc.), of which VIFs are only one.
There are a lot of techniques you could use in this situation. Regularized models like ridge regression, or neural nets, can overcome collinearity pretty well. You can also use dimensionality reduction techniques like principal components and run the regression on the component scores.
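As a sketch of that second idea, here is principal components regression (PCR) with scikit-learn on synthetic data (a hypothetical example, not anything from this thread): reduce the collinear Xs to a few orthogonal component scores, then regress y on those scores.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)
n, p = 80, 25
factor = rng.normal(size=(n, 1))   # one underlying driver
# All 25 predictors are noisy multiples of the same factor.
X = factor @ rng.normal(size=(1, p)) + 0.05 * rng.normal(size=(n, p))
y = 3.0 * factor[:, 0] + 0.1 * rng.normal(size=n)

# The component scores are orthogonal by construction, so the
# downstream regression sees no multicollinearity at all.
pcr = make_pipeline(PCA(n_components=3), LinearRegression()).fit(X, y)
r2 = pcr.score(X, y)
```

Unlike PLS, the components here are chosen to explain variance in X only, so you may need to keep a few extra components to capture the ones that actually matter for y.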