Discussions

konstantin · Mar 31, 2020 04:55 PM

I am fitting a regression model without an intercept because all of my predictors are components of a chemical mixture and they sum to one. I get good R2 and validation R2, but the VIFs for the main terms and cross-terms are far higher than 5.

What is the best way to deal with this?

I am using this model to predict a property of the mixture. Should I provide additional constraints on my predictors, such as ratios (e.g., x1/x2 within a given range)?

Dan_Obermiller · Mar 31, 2020 09:46 PM

First, a point of clarification. For Scheffé mixture models there IS an intercept. It is not displayed because it is combined with the main effect parameter estimates. This is NOT a no-intercept model. You can confirm that you are fitting a Scheffé model by looking at the Analysis of Variance table. It should say "Testing Against Reduced Model: Y = Mean". If it does not, you are not fitting a Scheffé mixture model as you think.
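
One way to see why the intercept seems to vanish, written for a mixture of q components. The symbols β0, βi and βi* are notation assumed here for illustration; they are not taken from the JMP report.

```latex
\begin{align*}
y &= \beta_0 + \sum_{i=1}^{q} \beta_i x_i + \varepsilon,
     \qquad \sum_{i=1}^{q} x_i = 1 \\
  &= \beta_0 \sum_{i=1}^{q} x_i + \sum_{i=1}^{q} \beta_i x_i + \varepsilon \\
  &= \sum_{i=1}^{q} (\beta_0 + \beta_i)\, x_i + \varepsilon
   = \sum_{i=1}^{q} \beta_i^{*} x_i + \varepsilon
\end{align*}
```

Each reported main-effect estimate therefore plays the role of βi* = β0 + βi, which is why no separate intercept row appears.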

Now for the high VIFs. This is VERY common for a mixture model. There is little that can be done for it. First, are you using pseudocomponent coded values? If the design was created in JMP, this should be used automatically. You can see if the pseudocomponent coding is used by looking at the parameter estimates. You should see something like (X1-c)/d where c and d are constants. If you do not see this, you can turn on the pseudocomponent coding column property, then fit your model.
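
As a rough sketch of what the (X1-c)/d form means, the textbook L-pseudocomponent transform subtracts each component's lower bound and divides by one minus the sum of all lower bounds. The function name, the lower bounds, and the blend below are made up for illustration, and it is an assumption that this matches JMP's coding option exactly.

```python
def l_pseudocomponents(x, lower):
    """Convert one mixture composition to L-pseudocomponents.

    x     -- component proportions that sum to 1
    lower -- lower bound of each component (hypothetical values below)
    """
    total_lower = sum(lower)
    if total_lower >= 1:
        raise ValueError("lower bounds must sum to less than 1")
    scale = 1.0 - total_lower  # the constant d in (X - c)/d
    # c is the component's own lower bound
    return [(xi - li) / scale for xi, li in zip(x, lower)]


# Hypothetical three-component example
lower = [0.60, 0.10, 0.05]
blend = [0.70, 0.20, 0.10]
coded = l_pseudocomponents(blend, lower)
print(coded)
```

The coded values still sum to one, so the result is still a valid mixture, only stretched to fill more of the 0 to 1 range.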

Even with pseudocomponent coding, the VIFs can be very high. The more constrained your mixture components are, the higher the VIFs will be. Although high VIFs are not desirable, the goal of a mixture model is prediction, and you can still obtain good predictions from the model. Remember that this is the ultimate goal. The high VIFs inflate the standard errors of the parameter estimates, which affects the significance tests. That makes reducing the model more difficult, but you can still use the model for prediction.
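
To illustrate how tight constraints inflate VIFs, here is a small sketch that compares raw and pseudocomponent coding for a linear-only model on a made-up three-component design. It uses the uncentered VIF, (X'X)^-1[j][j] * sum(x_j^2), which is a stand-in chosen here and may not be the definition JMP reports. The lower bounds and design points are hypothetical.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def uncentered_vifs(X):
    """Uncentered VIF per column of X (model fitted without a separate intercept)."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           for i in range(k)]
    inv = invert(xtx)
    return [inv[j][j] * xtx[j][j] for j in range(k)]


# Hypothetical lower bounds; the design is a simplex-centroid layout
# in pseudocomponent space, mapped back to raw proportions.
lower = [0.60, 0.10, 0.05]
scale = 1.0 - sum(lower)
pseudo_design = [
    [1, 0, 0], [0, 1, 0], [0, 0, 1],
    [0.5, 0.5, 0], [0.5, 0, 0.5], [0, 0.5, 0.5],
    [1 / 3, 1 / 3, 1 / 3],
]
raw_design = [[l + scale * p for l, p in zip(lower, row)]
              for row in pseudo_design]

vif_raw = uncentered_vifs(raw_design)
vif_coded = uncentered_vifs(pseudo_design)
print("raw:  ", [round(v, 2) for v in vif_raw])
print("coded:", [round(v, 2) for v in vif_coded])
```

For this design the raw-coded values come out larger than the pseudocomponent ones. Adding cross-term columns to the design rows before calling `uncentered_vifs` would extend the comparison to a model with interactions.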

Dan Obermiller

konstantin · Apr 1, 2020 11:11 AM

Dan, thanks for the response! Since my design was not created in JMP, there is no pseudocomponent coding, but I marked all of my predictors as Mixture through the column properties dropdown.
I also see checkboxes for L Pseudo Component Coding and U Pseudo Component Coding. Should I check them? I tried to standardize all of my predictors manually, but then they are no longer a mixture (the sum is not equal to 1) and JMP does not report R2.
Also, after I build the mixture model with high VIFs, do you think it makes sense to report the ranges of my predictor ratios together with the model? Usually, I only provide the min-max of each predictor to avoid extrapolation.

Dan_Obermiller · Apr 1, 2020 9:04 AM

I would check the boxes for L pseudocomponent coding. You could check both, but you run the risk of different types of coding being used for different factors. I find it easier to just stick to one type, and L pseudocomponent coding is usually the best.

Note that this standardization differs from the coding traditionally used in designed experiments with independent factors.

As for your last question, you get to decide what makes the most sense for the people who will be using this model. Personally, I would just want the component ranges. I would find ratios awkward since the model is not in terms of ratios. Finally, if you are using JMP and have the components set up properly (an advantage of using JMP to create your design), the profiler will automatically set the ranges for the components to avoid extrapolation. Further, you have the mixture profiler that will show the valid ranges for your components.
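
Outside JMP, a simple guard against extrapolation along these lines could check each new composition against the component ranges before predicting. This is only a sketch: the function name, the component names, and the ranges are hypothetical, and it checks the sum and the per-component ranges only.

```python
def check_composition(blend, ranges, tol=1e-6):
    """Return a list of problems; an empty list means the blend is in range.

    blend  -- dict of component name -> proportion
    ranges -- dict of component name -> (min, max) seen in the modeling data
    """
    problems = []
    total = sum(blend.values())
    if abs(total - 1.0) > tol:
        problems.append(f"components sum to {total:.4f}, not 1")
    for name, value in blend.items():
        low, high = ranges[name]
        if not (low - tol <= value <= high + tol):
            problems.append(f"{name}={value} is outside [{low}, {high}]")
    return problems


# Hypothetical component ranges and candidate blends
ranges = {"x1": (0.60, 0.85), "x2": (0.10, 0.35), "x3": (0.05, 0.30)}
ok_blend = {"x1": 0.70, "x2": 0.20, "x3": 0.10}
bad_blend = {"x1": 0.90, "x2": 0.05, "x3": 0.05}

print(check_composition(ok_blend, ranges))   # no problems expected
print(check_composition(bad_blend, ranges))  # x1 and x2 fall outside their ranges
```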

Dan Obermiller

statman · Apr 1, 2020 12:25 PM

Konstantin,

My $.02, take it or leave it.

You are in the world of optimization, which calls for a different use and interpretation of statistical analyses. For example, you are less interested in statistical significance, because that has already been determined. While R-square (and adjusted R-square) can be quite useful at certain stages of model building, they are less useful during optimization. (BTW, R-square has little to do with extrapolation of results; it pertains only to the data in hand.) My advice would be to use the contour maps provided by JMP, actually run those mixtures, and compare the actual results to the predictions from the contour maps. Do a residual analysis and modify the maps as appropriate using the empirical data. I'd stay away from over-complicating things with ratios or other transforms. A further word of caution: if you are running optimization designs without truly understanding noise and its effect over time (e.g., measurement error, lot-to-lot variation of input chemicals, ambient conditions), the contour maps will not be very good predictors of actual results.
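
A minimal sketch of the actual-versus-predicted comparison for a handful of confirmation runs follows. The numbers are invented purely for illustration, and the function name is hypothetical.

```python
import math


def residual_summary(actual, predicted):
    """Return residuals, mean residual (bias), and RMSE for confirmation runs."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    bias = sum(residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    return residuals, bias, rmse


# Invented measured values and model predictions for four confirmation blends
actual = [10.2, 11.8, 9.5, 12.1]
predicted = [10.0, 12.0, 9.9, 12.0]

residuals, bias, rmse = residual_summary(actual, predicted)
print("bias:", round(bias, 3), "rmse:", round(rmse, 3))
```

A bias well away from zero, or an RMSE much larger than the known measurement noise, would suggest the predictions need revisiting in that region.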

"All models are wrong, some are useful" G.E.P. Box

konstantin · Apr 1, 2020 12:49 PM

Statman, thanks! I am not into optimization yet, though it may come later. For now, I just need to predict a certain property of glass from its composition. I am modeling with very "expensive" historical data that was not designed well (or at all), and some of my components have correlations greater than 0.7.
So I am thinking about how to select significant cross-terms for my model, as well as how to define the proper glass constituent ranges for prediction. I will check the contour maps as you suggested; it sounds very promising!

statman · Apr 2, 2020 12:00 PM

Perhaps I am not communicating very well. If you are doing mixture designs, you are, in fact, in the world of optimization. This means you have already determined the significant factors to model (you have at least a first-order model) and you understand noise. Your latest post suggests you are looking at a historical data set from a poorly designed study. My analogy for using mixture designs in this application: "you are mapping the base of the mountain". I am not sure of your background knowledge of mixture designs, but I suggest you get Cornell's book "Experiments with Mixtures".

Obviously you can look at historical data to get clues, but usually this is some sort of regression procedure. Then you will iterate until you get to a point where you want to experiment on the components of a mixture to optimize.

"All models are wrong, some are useful" G.E.P. Box

konstantin · Apr 2, 2020 12:40 PM

Already got the book, thanks!

Mark_Bailey · Apr 2, 2020 12:49 PM

I also recommend the new book by Snee and Hoerl about mixture experiments for formulations using JMP. This is a good time to get a copy with a promotion from SAS Press for eBooks, too!

How to deal with multicollinearity when doing multiple mixture model regression
