
High VIF in constrained design space

mdeij

Community Trekker

Joined:

Nov 21, 2011

Hi there,

I'm working on analyzing a design space that is constrained. There are 3 factors, all between 0 and 1, whose sum should not exceed 1. Note that this is not a mixture design, as the sum may be less than 1 but not above.

I've created a design space with 20 experiments. As the outcome of experiments comes from a lengthy CFD computation, there are no replicates (replicates would give exactly the same answer).

I am able to fit a nice quadratic model to the data, but the Variance Inflation Factor (VIF) values are very high, ranging from 7 to the low hundreds. My textbooks advise that these should never exceed 5-10, so I'm a bit worried about this.

I suspect that the high VIFs are caused by the constraint on the design space, because in unconstrained design spaces for similar problems I don't find such high VIF values.

Any thoughts on why the VIFs are so high? Is it indeed due to the constraint? I've attached some example data FYI.

1 ACCEPTED SOLUTION

David_Burnham

Super User

Joined:

Jul 13, 2011

Solution

Certainly applying constraints may induce some collinearity which could manifest itself as high VIF values.  However, when I took a look at your data and fitted a response surface the VIF values were all below 10 - see screen shot below:

1312_Capture.png

So the question is why the difference? The data table from the DOE will have column properties, e.g. Coding. I am wondering whether somewhere along the line these have been mis-specified, causing an artificial collinearity due to strange ranges in the coding property.

I've just changed the coding of the A, B, C columns to have ranges 0 to 1 and now I see the following much higher values of VIF:

1314_Capture.jpg

My interpretation is that this is being artificially induced by the coding property.  If all my factors are in the same numerical range then I am happy to just remove the coding. Alternatively you could specify the coding to be based on the actual data range 0 to 0.667 which brings the values below the "10 threshold".
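
As a rough illustration of the coding effect (not a reproduction of the attached data), here is a minimal Python sketch. It assumes a hypothetical 20-run lattice design satisfying A+B+C <= 1. It also assumes that coding maps each factor's low/high values to -1/+1 before the squared and cross terms are formed, and that with coding switched off the factors are centered at their column means instead. The helper names (`lattice_design`, `code`, `quadratic_terms`, `vif`) are made up for this example.

```python
import itertools
import numpy as np

TERMS = ["A", "B", "C", "A*B", "A*C", "B*C", "A*A", "B*B", "C*C"]

def lattice_design(steps=3):
    """Hypothetical design: every grid point (step 1/steps) with A + B + C <= 1."""
    pts = [p for p in itertools.product(range(steps + 1), repeat=3)
           if sum(p) <= steps]
    return np.array(pts, dtype=float) / steps

def code(X, low, high):
    """Map [low, high] linearly onto [-1, +1] (assumed coding convention)."""
    return (X - (low + high) / 2.0) / ((high - low) / 2.0)

def quadratic_terms(Z):
    """Columns of a full quadratic model, without the intercept."""
    a, b, c = Z.T
    return np.column_stack([a, b, c, a * b, a * c, b * c, a * a, b * b, c * c])

def vif(M):
    """VIF of each column: diagonal of the inverse correlation matrix."""
    return np.diag(np.linalg.inv(np.corrcoef(M, rowvar=False)))

X = lattice_design()                                  # 20 runs, 3 factors
vif_mean = vif(quadratic_terms(X - X.mean(axis=0)))   # centered at column means
vif_01 = vif(quadratic_terms(code(X, 0.0, 1.0)))      # coded with range 0 to 1

for name, v_mean, v_01 in zip(TERMS, vif_mean, vif_01):
    print(f"{name:4s} mean-centered: {v_mean:8.2f}   coded 0..1: {v_01:8.2f}")
```

Rescaling a factor on its own leaves the VIFs unchanged. What moves them is the point about which the factors are centered before the squared and cross terms are built, so a coding range that does not match the data can shift the numbers. Which choice gives the smaller values depends on the design, so it is worth running this kind of comparison on the real table.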

To understand the nature of the collinearity you can look at the correlations of A,B,C under Multivariate Methods>Multivariate and also within Fit Model: Estimates>Correlation of Estimates.
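
For anyone who wants to see the same two diagnostics outside JMP, here is a small Python sketch under stated assumptions. It uses a hypothetical 4-level grid rather than the attached data, compares the full grid with the subset satisfying A+B+C <= 1, and computes the correlation of estimates for a main-effects-only model to keep it short. The function names are illustrative.

```python
import itertools
import numpy as np

# Hypothetical 4-level grid on [0, 1]^3; the constraint is applied on the
# integer grid indices to avoid floating-point round-off at the boundary.
idx = np.array(list(itertools.product(range(4), repeat=3)))
full_grid = idx / 3.0                          # 64 unconstrained runs
constrained = idx[idx.sum(axis=1) <= 3] / 3.0  # 20 runs with A + B + C <= 1

def factor_correlations(X):
    """Pairwise correlations between the factor columns A, B, C."""
    return np.corrcoef(X, rowvar=False)

def correlation_of_estimates(X):
    """Correlation matrix of the least-squares estimates for an
    intercept + main-effects model (illustrative choice of model)."""
    M = np.column_stack([np.ones(len(X)), X])
    cov = np.linalg.inv(M.T @ M)               # proportional to Cov(beta_hat)
    d = np.sqrt(np.diag(cov))
    return cov / np.outer(d, d)

print("Factor correlations, full grid:\n", factor_correlations(full_grid).round(3))
print("Factor correlations, constrained:\n", factor_correlations(constrained).round(3))
print("Correlation of estimates, constrained:\n",
      correlation_of_estimates(constrained).round(3))
```

In this toy grid the factors are uncorrelated without the constraint and negatively correlated with it, which is the collinearity the constraint itself introduces, separate from anything the coding property adds.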

-Dave
3 REPLIES
mdeij

Community Trekker

Joined:

Nov 21, 2011

It figures that adding a constraint like 0 <= A+B+C <= 1 makes the design much less orthogonal and therefore increases the VIFs.

Just one thing: you say that the data range is 0 to 0.667, but it clearly is 0 to 1. Entries 2-4 have a value of 1 for each of the three factors.

David_Burnham

Super User

Joined:

Jul 13, 2011

Yes, sorry, my mistake about the ranges.

-Dave