I created a predictive model, and my colleagues in QC recently had to change one of their ingredients, which is one of the many variables in the model, in order to continue batching their products. I want to evaluate the existing model instead of fine-tuning it, because there is not enough data yet and changing it would lower the accuracy. They have new measured responses, using the new ingredient, which I can compare against the model predictions. That's one way to evaluate my model. But I am wondering whether JMP already has any tools I can use to do so, and perhaps to dig deeper. Anything from the Statistical Process Control (SPC) charts?
Any suggestion is highly appreciated.
Some thoughts and questions:
1. How well was your predictive model working before the change in ingredients?
2. Does the current model use a completely different ingredient or is it "related" from a scientific theoretical perspective?
3. I'm not sure what you mean by "fine tuning" the model. If you have removed a variable from the model, the model could be completely different in terms of the beta coefficients of the other terms in the model. What happens to the precision of the model on historical data when that term is removed from the model?
4. I think your idea of having a column with the current predicted value (from your formula), a column of the measured response variable and a column that is the delta of the 2 would be a simple way to compare. How much difference would matter to you?
Control charts are a diagnostic. They are meant to compare different sources of variation and determine where the greatest leverage is. They do this by first assessing the stability of the basis for comparison (the range chart, which assesses the stability of the within-subgroup sources of variation as a function of the variables changing at that frequency), and then comparing the effect those sources have on the variability in the response to the other sources in your study (the x-bar chart, which compares the within-subgroup variation, i.e. the control limits, to the between-subgroup sources of variation).
Now, you could sample the process as it worked before the ingredient was changed (if you are just taking a quick look and not looking for diagnostics, you could use the X, MR chart of the data in time series) and after the ingredient was changed, to see if there was an appreciable change in the response. You could also look at distributional summaries to get a graphical view.
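JMP's Control Chart Builder will compute these limits for you, but as a sanity check the X (individuals) and MR chart limits described above can be sketched in a few lines. This is a minimal illustration with made-up residual values (measured minus predicted); the constants d2 = 1.128 and D4 = 3.267 are the standard ones for moving ranges of size 2.

```python
import numpy as np

# Hypothetical residuals (measured - predicted); values are made up
# purely to illustrate the limit calculations.
residuals = np.array([0.2, -0.1, 0.3, 0.05, -0.25, 0.15, -0.05, 0.1])

# Moving ranges of consecutive points
mr = np.abs(np.diff(residuals))
mr_bar = mr.mean()

# Individuals (X) chart limits, estimating sigma from the average moving range
center = residuals.mean()
sigma_hat = mr_bar / 1.128            # d2 = 1.128 for moving ranges of size 2
ucl = center + 3 * sigma_hat
lcl = center - 3 * sigma_hat

print(f"X chart:  CL={center:.3f}, UCL={ucl:.3f}, LCL={lcl:.3f}")
print(f"MR chart: CL={mr_bar:.3f}, UCL={3.267 * mr_bar:.3f}")  # D4 = 3.267
```

Points from the new-ingredient runs that fall outside these limits (computed from the old-ingredient runs) would suggest the model no longer describes the process the same way.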
1. I think the current model (Boosted Tree) is working well. R2 is 92%, and I checked its predictions with new measured responses and QC is very happy. The responses I am referring to are from production runs with the same 30 ingredients. Only one ingredient changed just recently, because they cannot import it and they are using an alternative one.
2. I think I answered this in #1. The only model I have created uses 30 variables that we have been working with for several years. One of them had to be changed recently; the replacement is the same general type but has a slightly different chemical composition.
3. Oh sorry. Nothing has been removed from the model. I want to check if the only model I had created still predicts the responses acceptably.
4. I'd like to have the same accuracy as the previous model. That 92% R2 was for the training, validation, and test sets. It's a great model. If I wanted to create a new model using the 29 previous variables and the new ingredient (30 total), I wouldn't have enough data to train a good model. So I have to wait at least a year to produce enough data. Regarding the delta, how do I calculate it to check that the same accuracy is maintained? Can I use Model Comparison from the menu?
So I think your last paragraph is very interesting. Are there any examples of how to apply the prediction formula in those charts?
I am a newbie to these charts. Also, what do you mean by distributional summaries? Sorry I am a little confused.
1. How significant was the previous chemical in the previous model? How different is the new chemical? Can you experiment with this variable to see its effect?
1. R^2 has nothing to do with the accuracy of your model. R^2 is the amount of variation IN THE DATA SET that is explained by the model. R^2 should not be evaluated on its own. You should compare R^2 with R^2 adjusted. R^2 will always increase as you increase the degrees of freedom in the model (and you apparently have a lot of DF in the model), but the point is to include only significant variables in the model. R^2 adjusted penalizes insignificant terms in the model, so you want the delta between R^2 and R^2 adjusted to be minimal.
2. To create a column of deltas between previous model (column 1) and new model (column 2), just create a column with a formula subtracting the 2 columns.
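The delta column is just an element-wise subtraction, mirroring a JMP formula column that subtracts the two columns. The values below are made up for illustration.

```python
# Hypothetical measured responses and model predictions
measured  = [10.2, 9.8, 10.5, 10.1]
predicted = [10.0, 10.0, 10.3, 10.2]

# Delta column: measured minus model prediction for each row
delta = [m - p for m, p in zip(measured, predicted)]
print([round(d, 2) for d in delta])
```

How large a delta is tolerable is a practical (subject-matter) decision, not a statistical one.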
For insight into using control charts you should read:
Wheeler, Donald, and Chambers, David (1992) “Understanding Statistical Process Control” SPC Press (ISBN 0-945320-13-2)
@statman thanks a lot for the link, the book, and your time again.
1. It's significant according to the screening factors and factor contribution columns. The new ingredient is from a different supplier. It's still within the ASTM limits for the same type. Yes, I can experiment; I will do that in the near future.
1. Yes, I am aware of that. Boosted Tree probably doesn't give me adjusted R2.
2. Yes, obviously a formula for the delta. I meant which size of delta to use as the cutoff. But it seems that's my own decision to make... thanks again.
Adding to @statman, you want to demonstrate equivalence after the change in the ingredient to validate that the current model is still accurate. That purpose reverses the null and alternative hypotheses. You can perform the equivalence test with the Distribution or Oneway platforms. You will need to state the interval in which the results are considered practically the same.
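JMP's equivalence test does this for you, but the underlying two one-sided tests (TOST) logic can be sketched outside JMP. This sketch assumes SciPy is available and uses made-up deltas (measured minus predicted) with a purely illustrative equivalence margin of ±0.5 units; in practice the margin comes from your practical-significance decision.

```python
import numpy as np
from scipy import stats

# Hypothetical deltas (measured - predicted) from new-ingredient runs
deltas = np.array([0.1, -0.2, 0.3, 0.0, -0.1, 0.2, 0.1, -0.3])
margin = 0.5  # illustrative practical-equivalence margin

n = len(deltas)
mean = deltas.mean()
se = deltas.std(ddof=1) / np.sqrt(n)

# Two one-sided tests: reject both H0 (mean <= -margin) and
# H0 (mean >= +margin) to conclude practical equivalence.
t_lower = (mean + margin) / se
t_upper = (mean - margin) / se
p_lower = stats.t.sf(t_lower, df=n - 1)   # P(T > t_lower)
p_upper = stats.t.cdf(t_upper, df=n - 1)  # P(T < t_upper)
p_tost = max(p_lower, p_upper)

print(f"TOST p-value: {p_tost:.4f}")  # small p => practically equivalent
```

Note the reversal mentioned above: here a *small* p-value supports equivalence, the opposite of an ordinary difference test.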