
Solution in RS (D-optimal)

Hello everybody,

 

I've successfully run my first response surface (RS) design, but I struggle to understand the solution provided after fitting the model. My design includes 5 continuous factors with 4 axial points and 2 center points. I did 1 replicate and fitted the model. There is no significant lack of fit.

 

The goal is to minimize the response, but the solution is higher than all of my measured values. I included a range check for my factors, but in the solution 3 of the 5 factors are negative, which is impossible in reality. How do I get a solution that only considers positive factor values?

 

Thanks!

Best regards,

FactorAlligator


Re: Solution in RS (D-optimal)

I'm not quite sure what you are seeing or asking. It is impossible for a model to always predict values that are higher than all of your data. What do you mean by a positive factor? Are your factors coded between -1 and +1 or did you leave them in their natural units? Would it be possible for you to share your JMP data table? You could anonymize the data, if necessary.

Dan Obermiller
statman
Super User

Re: Solution in RS (D-optimal)

I agree with Dan.  You haven't provided enough information for an answer to your query.  In your analysis, did you look at the surface?

 

Models can easily be extrapolated beyond reasonable conditions (they are simply a model of the data).  It is you that must interpret the model.

"All models are wrong, some are useful" G.E.P. Box

Re: Solution in RS (D-optimal)

Thank you for your response. The data table is attached. In the meantime, I reanalyzed two outliers, and this solved the issue with negative values. By "negative" I meant that my factor is coded between -1 and +1, and in reality it can only go down to -2.5 (at this point the concentration of the factor is 0), but my solution was at -3, which is impossible. I would still be interested in how to handle this case if it happens again.
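One way to keep a solution physically meaningful is to translate between coded and natural units and treat the coded value where the concentration reaches 0 as a hard lower bound. The sketch below assumes a single factor with a hypothetical center of 5.0 and half-range of 2.0 (natural units), chosen only so that coded -2.5 maps to a concentration of 0 as described in the post; the real values would come from the data table.

```python
# Hypothetical natural-unit settings for ONE factor (not taken from the attached table).
# They are chosen so that coded -2.5 corresponds to a concentration of 0.
CENTER = 5.0       # natural-unit value at coded 0 (assumed)
HALF_RANGE = 2.0   # natural-unit distance from coded 0 to coded +1 (assumed)


def to_natural(coded: float) -> float:
    """Convert a coded level (-1 = low, +1 = high) to natural units."""
    return CENTER + coded * HALF_RANGE


def to_coded(natural: float) -> float:
    """Convert a natural-unit level to coded units."""
    return (natural - CENTER) / HALF_RANGE


# Lowest physically possible setting (concentration 0), expressed in coded units.
LOWER_FEASIBLE = to_coded(0.0)


def is_feasible(coded: float) -> bool:
    """A coded setting is feasible only if its concentration is not negative."""
    return to_natural(coded) >= 0.0


if __name__ == "__main__":
    print(LOWER_FEASIBLE)      # -2.5
    print(to_natural(-3.0))    # -1.0, i.e. a negative concentration
    print(is_feasible(-3.0))   # False
```

Any optimizer or grid search over the fitted model would then use LOWER_FEASIBLE as the lower bound for that factor instead of letting the solution drift past it.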

 

As you can see from my model, the best values within the design space yield a minimum Y response of 162. The lowest response I measured was 213, for the pattern 00A00. But the calculated solution is at Y = 296. If my goal is to minimize the response, I would expect the solution to lie on a boundary of the design space (as it does when I maximize the desirability).
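For a second-order model, the "solution" of a response surface analysis is typically the stationary point, where all partial derivatives are zero. It is a minimum only if every eigenvalue of the matrix of quadratic coefficients is positive; otherwise it is a maximum or a saddle point, and its predicted value can be higher than the best response inside the design region. The sketch below uses a made-up two-factor model (not the poster's) to find the stationary point, classify it from the eigenvalues, and then search for the minimum on a grid restricted to hypothetical coded bounds.

```python
import itertools

import numpy as np

# Hypothetical second-order model in coded units (NOT the poster's fitted model):
# y = 200 + 10*x1 - 8*x2 + 5*x1^2 - 6*x2^2 + 4*x1*x2
b0 = 200.0
b = np.array([10.0, -8.0])      # linear coefficients
B = np.array([[5.0, 2.0],       # diagonal: pure quadratic coefficients
              [2.0, -6.0]])     # off-diagonal: half of each interaction coefficient


def predict(x):
    """Evaluate y = b0 + x.b + x'Bx at a point given in coded units."""
    x = np.asarray(x, dtype=float)
    return float(b0 + x @ b + x @ B @ x)


# Stationary point: gradient b + 2Bx = 0, so x_s = -0.5 * B^{-1} b.
x_s = np.linalg.solve(B, -0.5 * b)

# The signs of the eigenvalues of B classify the stationary point.
eigvals = np.linalg.eigvalsh(B)
if np.all(eigvals > 0):
    kind = "minimum"
elif np.all(eigvals < 0):
    kind = "maximum"
else:
    kind = "saddle point"

# Bounded search: evaluate the model only inside the allowed region.
# Hypothetical coded bounds per factor; a factor limited by concentration 0
# could use e.g. (-2.5, upper) instead.
bounds = [(-1.0, 1.0), (-1.0, 1.0)]
axes = [np.linspace(lo, hi, 41) for lo, hi in bounds]
best_x, best_y = None, float("inf")
for point in itertools.product(*axes):
    y = predict(point)
    if y < best_y:
        best_x, best_y = np.array(point), y

print("stationary point:", x_s, "predicted y:", predict(x_s), "->", kind)
print("bounded minimum:", best_x, "predicted y:", best_y)
```

With these coefficients the stationary point is a saddle point near (-0.65, -0.88) with a predicted y of about 200.3, while the bounded minimum is y = 177 at the corner (-1, 1). A grid search is used only for transparency; with five factors a constrained optimizer would be more practical.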

 

@statman Thank you for your response as well. I also checked the surfaces in the model view, but unfortunately I could not learn much from them, since some surfaces show a maximum and others a minimum.

statman
Super User

Re: Solution in RS (D-optimal)

I ran the model you had saved to the data table. I am not sure if this is the model you are evaluating; if not, some of my comments may be inappropriate.

The model needs work. The residuals suggest non-constant variance, and you have insignificant terms in the model. The statistical significance seems to be based on the center point replicates and higher-order quadratic interactions (your error estimate is not based on randomized replicates). I am not sure what you mean by your comment about response surfaces. You should get something like the following (though this is using your model):

[Screenshot (2022-05-02): response surfaces for the saved model]
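The non-constant variance mentioned above can also be checked numerically: sort the runs by predicted value and compare the spread of the residuals in the lower and upper halves. This is a crude sketch with made-up residuals, not a formal test such as Breusch-Pagan, and the helper name spread_ratio is hypothetical.

```python
import statistics


def spread_ratio(predicted, residuals):
    """Crude non-constant-variance check (hypothetical helper).

    Sorts runs by predicted value and returns the residual standard deviation
    in the lower half divided by that in the upper half. A ratio far from 1
    suggests the residual spread changes with the predicted value.
    """
    pairs = sorted(zip(predicted, residuals))
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    return statistics.stdev(low) / statistics.stdev(high)


# Made-up residuals that shrink as the prediction grows, similar to the
# Residual by Predicted pattern discussed in this thread.
predicted = [170, 180, 190, 200, 260, 270, 280, 290]
residuals = [-20, 18, -15, 22, -3, 2, -4, 3]
print(round(spread_ratio(predicted, residuals), 2))
```

Here the ratio is roughly 6, far from 1, which is the numerical counterpart of a funnel shape in the residual plot.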

"All models are wrong, some are useful" G.E.P. Box

Re: Solution in RS (D-optimal)

Thank you for your input. I am trying to understand your suggestions and hope you could give me further feedback on my thoughts.

 

The residuals are suggesting non-constant variance and you have insignificant terms in the model.

Do you say that because of the "Residual by Predicted Plot", which shows the Y residuals decreasing as Y Predicted increases? Do I understand correctly that I should reduce the model by removing all insignificant terms? If so, how should I treat main effects that are insignificant, like factor X1, while the two-way interaction X1*X5 is significant?

 

The statistical significance seems to be based on the center point replicates and higher order quadratic interactions (your error is not based on randomized replicates).

Is this seen in the "Analysis of Variance" table, where the majority of my variance comes from the model and only a small amount from random error? I thought having little unexplained error is desirable.

statman
Super User

Re: Solution in RS (D-optimal)

Regarding your first question, yes. There may be other ways to see this, but that one is obvious. Your second question is one of hierarchy. From a practical standpoint, if the main effect is "involved" in a significant interaction but is not significant itself, the advice is often to include the main effect. This tends to increase the delta between R-square and R-square adjusted (as the model will appear over-specified), but it may make sense, because you may have to manipulate the insignificant main effect to take advantage of the interaction. To reduce models, you use:

  • subject matter knowledge,
  • the delta between R-square and R-square adjusted (over-specification),
  • p-values (significance),
  • RMSE (usefulness of the model),
  • residual plots (testing assumptions),
  • others (dependent on the situation)
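The R-square/R-square adjusted delta and RMSE criteria in the list above can be computed from the sums of squares in the ANOVA table. The sketch below uses hypothetical values (30 runs, total sum of squares 10000) to compare a model with 20 terms against a reduced model with 10 terms; p counts model terms excluding the intercept.

```python
import math


def fit_summary(sse, sst, n, p):
    """R-square, adjusted R-square, their delta, and RMSE.

    sse: error (residual) sum of squares
    sst: corrected total sum of squares
    n:   number of runs
    p:   number of model terms, excluding the intercept
    """
    df_error = n - p - 1
    r2 = 1.0 - sse / sst
    r2_adj = 1.0 - (sse / df_error) / (sst / (n - 1))
    rmse = math.sqrt(sse / df_error)
    return {"r2": r2, "r2_adj": r2_adj, "delta": r2 - r2_adj, "rmse": rmse}


# Hypothetical comparison: full model (20 terms) vs. reduced model (10 terms).
full = fit_summary(sse=600.0, sst=10000.0, n=30, p=20)
reduced = fit_summary(sse=900.0, sst=10000.0, n=30, p=10)
for name, s in (("full", full), ("reduced", reduced)):
    print(name, {k: round(v, 3) for k, v in s.items()})
```

In this made-up case the full model has the higher R-square (0.94 vs. 0.91), but the reduced model has the higher adjusted R-square, a much smaller delta, and a smaller RMSE, which is the over-specification signal described above.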

Regarding statistical significance from the ANOVA perspective: the F-test compares the MS (mean square) of each term in the model with an estimate of the random errors (MSerror). How you estimate the MSerror is an important decision. If your estimate of the random errors is small compared to the actual variance in the process, then statistically significant effects may have little to no effect in reality.

Analytically speaking, you want the estimates of random errors in your experiment to be representative of the true variation in the system in the future. This can be challenging to accomplish, as experiments typically have a restricted inference space (they are done over a short period of time on a relatively small scale). You may have to exaggerate the effects of noise during the experiment to represent future conditions appropriately. If you have not devoted much planning to identifying and understanding noise in the system, you are left with running randomized replicates to get unbiased estimates of that variation. In unreplicated designs, you typically use higher-order terms to estimate the MSerror (although I prefer Daniel's advice for the analysis of unreplicated designs, to prevent the MSerror bias). If you have identified the noise, there are many options to help increase the inference space and the design precision simultaneously (e.g., repeats, blocks, split-plots).
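Both ideas can be sketched in a few lines. f_ratio shows how the same effect looks far more significant against a small MSerror (for example, one estimated from center-point repeats) than against a larger one from randomized replicates; the numbers are hypothetical. half_normal_quantiles gives plotting positions commonly used for a Daniel (half-normal) plot of absolute effects, using the (i - 0.5)/m convention; other plotting-position conventions exist, and both function names are illustrative.

```python
from statistics import NormalDist


def f_ratio(ss_term, df_term, ms_error):
    """F statistic for one model term: MS_term / MS_error."""
    return (ss_term / df_term) / ms_error


def half_normal_quantiles(m):
    """Plotting positions for a Daniel (half-normal) plot of m absolute effects,
    using the (i - 0.5) / m convention (one of several in use)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(0.5 + 0.5 * (i - 0.5) / m) for i in range(1, m + 1)]


# Hypothetical: the same term (SS = 120, 1 df) judged against two error estimates.
print(f_ratio(120.0, 1, ms_error=4.0))    # small MSerror, e.g. from center-point repeats
print(f_ratio(120.0, 1, ms_error=40.0))   # larger MSerror, e.g. from randomized replicates

# Pair sorted absolute effects with these quantiles to draw a Daniel plot.
effects = [12.1, -0.8, 3.4, -1.5, 0.6, 9.7, -0.3]   # made-up effect estimates
for q, e in zip(half_normal_quantiles(len(effects)), sorted(abs(x) for x in effects)):
    print(f"{q:.3f}  {e:.1f}")
```

The first two lines print 30.0 and 3.0: a tenfold difference in the error estimate changes the F ratio tenfold. In the Daniel plot, effects that fall well off the line through the small effects are the candidates for real effects, without relying on an MSerror pooled from higher-order terms.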

"All models are wrong, some are useful" G.E.P. Box

Re: Solution in RS (D-optimal)

We further recommend deleting terms only in a way that does not break the model hierarchy (e.g., do not remove the X1 term if you still keep the X1*X2 term), because if you change or transform the factors afterward, some terms in the model might disappear and others might reappear. JMP automatically follows the best practice of coding the factor levels for the regression analysis, which provides several benefits. You might decide to back-transform the model to use the actual factor levels after selecting the best model. You might be surprised by the result.
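A minimal sketch of both points, assuming a second-order model whose terms are written as tuples of factor names (a representation chosen here for illustration, with a quadratic term written as ("X2", "X2")). The hypothetical enforce_hierarchy adds missing parent main effects for any interaction or quadratic term, and back_transform_quadratic expands a one-factor coded model y = a0 + a1*z + a2*z^2, with z = (x - center)/half_range, into actual units.

```python
def parents(term):
    """Lower-order parents of a second-order term (assumes terms of length <= 2)."""
    if len(term) == 1:
        return set()
    return {(factor,) for factor in term}


def enforce_hierarchy(terms):
    """Return the term list with any missing parent main effects added."""
    full = set(terms)
    for term in terms:
        full |= parents(term)
    return sorted(full, key=lambda t: (len(t), t))


def back_transform_quadratic(a0, a1, a2, center, half_range):
    """Expand y = a0 + a1*z + a2*z**2, z = (x - center) / half_range,
    into y = c0 + c1*x + c2*x**2 in actual units."""
    c, h = center, half_range
    c0 = a0 - a1 * c / h + a2 * c**2 / h**2
    c1 = a1 / h - 2 * a2 * c / h**2
    c2 = a2 / h**2
    return c0, c1, c2


# Keeping X1*X5 forces X1 (and X5) back into the model; X2*X2 forces X2.
print(enforce_hierarchy([("X1", "X5"), ("X5",), ("X2", "X2")]))

# Hypothetical coded model with NO linear term: y = 200 + 10*z^2, center 5, half-range 2.
print(back_transform_quadratic(200.0, 0.0, 10.0, center=5.0, half_range=2.0))
```

In this hypothetical example the linear term that is zero in coded units becomes -25 in actual units (the model is 262.5 - 25x + 2.5x^2), which is the kind of surprise mentioned above: "dropping" a term in one coding does not remove it in another.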