HoaLe99
Level II

Problem with RSM analysis

Dear JMP staffs and users!

I'm an undergraduate student working on a research project, and I have some problems with analyzing my results. I used RSM (a classical design) to optimize my response, with a matrix of 3 independent variables. To increase the confidence level, I added 1 replicate to my design, so the matrix included 40 runs. When I analyzed the results of the experimental design, my R² was 96% and the p-value was <0.001, but the lack-of-fit p-value was significant. Alternatively, I analyzed the design without entering the replicates separately: each run was still replicated once, but I used the mean value of the replicates. Then my R² and p-value were unchanged, and the lack-of-fit p-value was not significant. So I want to know: what is the right way to analyze the results?

I am looking forward to your responses.

Best regards!

Phil_Kay
Staff

Re: Problem with RSM analysis

Hi @HoaLe99 ,

 

Just to clarify: by saying that you added 1 replicate, do you mean that you had a design with 20 runs (probably the central composite with 6 centre points) and you duplicated all runs, for 40 runs in total? And can I check that they are true replicates? That is, all 40 runs were carried out independently, rather than just replicating the measurement of the response.

 

Assuming that is all true, then the first analysis that you did with 40 runs is appropriate. This analysis properly reflects the noise of the experiment, and therefore you could expect it to detect significant lack of fit.

 

Significant lack of fit is not necessarily a problem. It is an indication that the model might not be completely adequate.

 

You should consider how big the lack of fit is. Because you have so many replicates (I think you have many repeated centre points as well), the test will be sensitive enough to flag even quite small lack of fit as significant.
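For illustration only, here is a minimal sketch (in Python, outside JMP) of the lack-of-fit F-test that sits behind this: the residual error is split into pure error (replicate runs varying around the mean of their own factor setting) and lack of fit (everything left over). The file name and the column names X1, X2, X3 and Y are assumptions, not taken from the attached table.

```python
# Illustrative lack-of-fit F-test for a replicated response-surface design.
# Assumes a CSV with factor columns X1, X2, X3 and a response column Y
# (hypothetical names), where replicate runs share identical factor settings.
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

df = pd.read_csv("rsm_runs.csv")  # hypothetical file name

# Full quadratic (RSM) model: main effects, two-factor interactions, squares.
model = smf.ols(
    "Y ~ (X1 + X2 + X3) ** 2 + I(X1**2) + I(X2**2) + I(X3**2)", data=df
).fit()

# Pure error: spread of replicate runs around the mean of their own setting.
groups = df.groupby(["X1", "X2", "X3"])["Y"]
ss_pe = ((df["Y"] - groups.transform("mean")) ** 2).sum()
df_pe = len(df) - groups.ngroups

# Lack of fit: whatever is left of the residual error after pure error.
ss_lof = model.ssr - ss_pe
df_lof = model.df_resid - df_pe

F = (ss_lof / df_lof) / (ss_pe / df_pe)
p = stats.f.sf(F, df_lof, df_pe)
print(f"Lack of fit: F = {F:.2f}, p = {p:.4f}")
```

The more replicate degrees of freedom there are, the more precisely pure error is estimated, and the smaller the lack of fit that this test can declare significant.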

 

Also remember that significance is based on a fairly arbitrary choice of p-value threshold (usually 0.05). You could reasonably choose another threshold (say 0.01) and the lack of fit would no longer be "significant".

 

You should also look at other parts of the report, including the residual plots, to judge whether there really is a problem with the model. How does the model help with your overall objectives? The high R² indicates that the model is capturing almost all of the variation in the response data, which is positive.

 

I hope this helps,

Phil

HoaLe99
Level II

Re: Problem with RSM analysis

Thank you so much for your help, I really appreciate it! 

 

HoaLe99
Level II

Re: Problem with RSM analysis

Dear Mr. Phil_Kay!

I understood what you said, but I do not know how to set a different p-value threshold in JMP. Can you show me how to set it up? I have attached a file and a picture of my model below. Could you help me check them?

Many thanks

 

(Attached image: rsm.PNG)

Phil_Kay
Staff

Re: Problem with RSM analysis

Hello again, @HoaLe99 !

 

This is a really interesting example! I've just spent the last 2 hours exploring this when I really should be doing other things. But it was so much fun. Thank you for sharing! 

 

First of all, about the threshold: I wasn't recommending that you change the p-value threshold in the analysis. I think you can do that, but it wouldn't change anything. You would still get exactly the same analysis and p-values. I was really just saying that we declare "significance" when a p-value is below the threshold of 0.05. And that threshold is only a convention or a "rule of thumb". We could easily use a different threshold and that would change what we say is "significant."

 

Anyway, it is largely irrelevant in this case because the p-value for lack of fit is a long way below 0.05. It really does look like you have fairly serious lack of fit, which you can see from some of the plots:

 

(Image: Phil_Kay_0-1667556128310.png, actual vs. predicted plot)

Notice how the points are not just randomly distributed around the line in the actual vs. predicted plot. There are groupings above and below the line at different points.

 

This suggests that the model is inadequate in some way, so it might require higher order terms. You only considered first and second order terms (the RSM model). I looked at adding 3rd order terms. 

 

This is complicated because there are more 3rd order terms than you can fit. I actually used some of the tools in JMP Pro to help with this, including Best Subset and SVEM Lasso in Gen Reg in JMP Pro 17. However, you can use Stepwise in standard JMP. I locked in all the second order terms and then did forward stepwise to see which 3rd order terms were most important. (There is a script for Stepwise in the attached version of the table)
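As a rough illustration of the same idea outside JMP (this is not the Stepwise platform, just a greedy forward selection in Python with the RSM terms locked in and AIC as the entry criterion; the file and column names are assumptions):

```python
# Forward selection over third-order candidate terms with the full RSM
# (second-order) model locked in. Hypothetical column names X1, X2, X3, Y.
import itertools
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("rsm_runs.csv")  # hypothetical file name

base_terms = [
    "X1", "X2", "X3",
    "X1:X2", "X1:X3", "X2:X3",
    "I(X1**2)", "I(X2**2)", "I(X3**2)",
]
# All distinct third-order products of the three factors, e.g. I(X1*X1*X3).
candidates = sorted(
    {"I(" + "*".join(c) + ")"
     for c in itertools.combinations_with_replacement(["X1", "X2", "X3"], 3)}
)

def fit(terms):
    return smf.ols("Y ~ " + " + ".join(terms), data=df).fit()

selected, current = list(base_terms), fit(base_terms)
improved = True
while improved and candidates:
    improved = False
    best_aic, best_term = min((fit(selected + [c]).aic, c) for c in candidates)
    if best_aic < current.aic:       # add the term only if AIC improves
        selected.append(best_term)
        candidates.remove(best_term)
        current = fit(selected)
        improved = True

print("Selected model:", " + ".join(selected))
```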

 

It also seemed to me that your response should not be less than zero, and possibly not greater than 1, so I added a Logit transform to the response for the "best" model (script in attached table). Without this transform you can get response predictions below 0, which I suspect is not possible here.
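If it helps to see the transform written out, here is a tiny sketch of the logit and its inverse, assuming the response is a proportion bounded between 0 and 1 (this shows the general idea, not JMP's built-in transform):

```python
# Logit transform for a response bounded in (0, 1): fit on the logit scale,
# then back-transform predictions so they always stay inside the bounds.
import numpy as np

def to_logit(y, eps=1e-6):
    y = np.clip(y, eps, 1 - eps)   # guard against values of exactly 0 or 1
    return np.log(y / (1 - y))

def from_logit(z):
    return 1 / (1 + np.exp(-z))    # inverse logit (logistic)

# e.g. fit the model to to_logit(df["Y"]) and report from_logit(model.predict(...))
```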

 

I found that the best model was the RSM with the cubic term for X2 (X2*X2*X2) and the interaction of X3 with the quadratic of X1 (X1*X1*X3). This is a complicated effect. It means that the curvilinear nature of X1 is affected by X3: the curve of X1 is more pronounced when X3 is higher. Maybe that makes sense scientifically? It is unusual to fit these terms in models of experiments, but my feeling is that they are probably quite important effects in many cases.
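Purely as a sketch of what that "best" model looks like written as a formula (reusing the hypothetical to_logit helper and column names from the sketches above):

```python
# Full RSM plus X2*X2*X2 and X1*X1*X3, fitted to the logit of the response.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("rsm_runs.csv")          # hypothetical file name
df["logitY"] = to_logit(df["Y"])          # helper defined in the earlier sketch

best = smf.ols(
    "logitY ~ X1 + X2 + X3 + X1:X2 + X1:X3 + X2:X3"
    " + I(X1**2) + I(X2**2) + I(X3**2)"
    " + I(X2**3) + I(X1**2 * X3)",
    data=df,
).fit()
print(best.summary())
```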

 

(Image: Phil_Kay_8-1667556858004.png)

 

The model still has a significant lack of fit, but it is much less (p = 0.009 versus 0.0004). And the residuals all look better to me. To improve the model further you would probably need to augment with more runs to test the 3rd order terms. I am not sure that this would be worthwhile but that depends on what you need the model for.

 

This is a really interesting example, so please let me know if this will ever be published or if it would be okay to use this in anonymised form.

 

I hope this was helpful for you. I suspect the model that you already had would have been useful for your objectives. The improved model looks better statistically but the difference might not be that important practically speaking.

 

Phil

 

HoaLe99
Level II

Re: Problem with RSM analysis

Dear Mr. Phil_Kay!

This result was used in my undergraduate thesis. Many thanks for spending your precious time solving my problem; I really appreciate it!

Best regards!

Phil_Kay
Staff

Re: Problem with RSM analysis

It was a pleasure to help. I would be interested to see your thesis, if you are able to share it. This is really interesting data so I hope that you will publish it and make the data available for public use.

Phil_Kay
Staff

Re: Problem with RSM analysis

Sorry, you might also consider a transformation of the response, particularly if you have reason to believe that a normal distribution is not appropriate; for example, if the response is a % yield.

David_Burnham
Super User (Alumni)

Re: Problem with RSM analysis

Here is what I do when I see lack of fit in a DOE.

 

1. Save the residuals to the data table. Then plot them against each of the factors that were in the design. Look for patterns which might be indicative of the cause of lack of fit (there is a rough sketch of these plots after this list).

 

2. The lack-of-fit ANOVA decomposes the error variation into a lack-of-fit component and pure error. I often see people underestimate the pure error component; this occurs either because there are few degrees of freedom contributing to the estimate, or because the experimental procedure didn't fully capture the variability that we are looking for in replication. Find the rows in your data table that correspond to these replicates and change their colour. Now look at the graphs in step (1). Most likely the variation you see in the replicated points is much smaller than the variation elsewhere in the graphs; why is that? Either the lack of fit is too big (i.e. you need to improve the model) or the pure error is too small (your experimental procedure underestimated the variation).

 

3. Plot your residuals against the predicted values. The "envelope" that contains the residuals should be uniform over the range of the predicted values. If, for example, it diverges as the predicted values increase, this might suggest that the model would improve with a log transform. You can use the Box-Cox transformation option to investigate transformations.
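For anyone wanting to reproduce these checks outside JMP, here is a rough Python/matplotlib sketch of steps 1 to 3. It assumes the residuals and predicted values have already been saved to the table as columns named resid and pred, and that there is a column flagging the replicate rows; all of these names are hypothetical.

```python
# Minimal residual diagnostics: residuals vs. each factor (step 1), with the
# replicate rows coloured (step 2), and residuals vs. predicted (step 3).
import matplotlib.pyplot as plt
import pandas as pd

def residual_checks(df, factors=("X1", "X2", "X3"), is_replicate=None):
    n = len(factors) + 1
    fig, axes = plt.subplots(1, n, figsize=(4 * n, 3))
    colors = None
    if is_replicate is not None:
        # Colour replicate rows so their spread can be compared with the rest.
        colors = ["red" if r else "grey" for r in df[is_replicate]]
    for ax, f in zip(axes, factors):
        ax.scatter(df[f], df["resid"], c=colors)   # residual vs. factor
        ax.axhline(0, lw=0.5)
        ax.set_xlabel(f)
        ax.set_ylabel("residual")
    axes[-1].scatter(df["pred"], df["resid"], c=colors)  # residual vs. predicted
    axes[-1].axhline(0, lw=0.5)
    axes[-1].set_xlabel("predicted")
    axes[-1].set_ylabel("residual")
    plt.tight_layout()
    plt.show()

# (For the Box-Cox suggestion, scipy.stats.boxcox(y) can estimate a transform
#  lambda, provided all response values are positive.)
```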

-Dave
HoaLe99
Level II

Re: Problem with RSM analysis

I am so grateful for your help!