I wonder if it is statistically OK to repeat the "Make model using VIP" step when building a PLS model?
Say I have 30 variables in my original model, and 15 of them turn out to be above the threshold line, so I select "Make model using VIP" and get a new model with the 15 that were above the line. Although, in my new model, there are 12 that are above the line and 3 that are below. Can I repeat the "Make model..." step until I have a model with only VIP values above the threshold, or can I only do it one time?
Hoping for a fast reply,
I don't know about the statistical validity. The threshold of 0.8 is really just a "rule of thumb" as I understand it. You certainly can repeat this procedure. The question is how it affects the predictive power of the model. If the model with only 12 variables has similar predictive power, then why not use it?
As always, it really depends on what the objective is. Prediction or explanation? Is identifying the most important variables the aim? Do you need to find the simplest model?
And remember: "all models are wrong..."
It might also be worth seeing if Generalised Regression gets you to a useful model, with less work.
My intuition tells me that repeatedly fitting a model with a smaller collection of predictor variables at each iteration somewhat defeats the general idea of PLS, which is to use ALL the variables present in a system to create a smaller set of latent variables: each iteration you are presenting a 'new' set of original variables, and in turn a new correlation structure from which the latent variables are constructed. So I guess it's no surprise you'll get different VIP results. I agree with all that Phil Kay contributed. One thing I'll add is: how do your residuals change from iteration to iteration? If there's no practical change in the residuals, then I guess I'd be inclined to stop the iterative process.