
Need help with 3x3x3 Full Factorial DoE Data Analysis

Hello!

I am new to the community, and I would like some advice on the results of a DoE (3x3x3 full factorial) experiment I just completed.

 

I have three factors (sodium chloride, sodium citrate, and glycine), each tested at 3 levels (0, 100, 200 mM sodium chloride; 0, 50, 100 mM sodium citrate; and 0, 50, 100 mM glycine). I tested all 27 runs. The response is % monomer intensity. Ideally, I want to get close to 100% monomer intensity by adjusting these three factors.
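For reference, the 27 runs of a 3x3x3 full factorial are every combination of the three levels of each factor. A minimal Python sketch that enumerates them with itertools.product (the constant and function names are illustrative, not taken from the JMP file):

```python
from itertools import product

# Factor levels as stated in the post (mM).
NACL_MM = (0, 100, 200)
CITRATE_MM = (0, 50, 100)
GLYCINE_MM = (0, 50, 100)

def full_factorial(*levels):
    """Return every combination of the given factor levels as a list of tuples."""
    return list(product(*levels))

runs = full_factorial(NACL_MM, CITRATE_MM, GLYCINE_MM)
print(len(runs))          # 27
print(runs[0], runs[-1])  # (0, 0, 0) (200, 100, 100)
```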

 

At the end of the experiment, however, the data do not show a clear trend: the lowest p-value was 0.27 for sodium chloride, and the second lowest was 0.28 for sodium chloride*sodium citrate. Everything else (glycine and the other two-way interactions) has a p-value of at least 0.5 (glycine at 0.9).
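The p-values above belong to the terms of a model with main effects and two-way interactions. As an illustrative sketch (not a description of how JMP computes anything internally), here is how those terms can be built for one run after coding each factor onto a -1/0/+1 scale; the helpers code_level and model_row are hypothetical:

```python
from itertools import combinations

def code_level(x, low, high):
    """Map an actual level onto a coded scale: low -> -1, center -> 0, high -> +1."""
    center = (low + high) / 2
    half_range = (high - low) / 2
    return (x - center) / half_range

def model_row(coded, names):
    """One row of a main-effects + two-way-interaction model (hypothetical helper).

    Returns a dict mapping term name -> value, including the intercept.
    """
    row = {"Intercept": 1.0}
    for name, value in zip(names, coded):
        row[name] = value
    # Each two-way interaction column is the product of two coded factors.
    for (i, a), (j, b) in combinations(enumerate(names), 2):
        row[f"{a}*{b}"] = coded[i] * coded[j]
    return row

names = ("NaCl", "Citrate", "Glycine")
# Run at 200 mM NaCl, 50 mM citrate, 0 mM glycine.
coded = (code_level(200, 0, 200), code_level(50, 0, 100), code_level(0, 0, 100))
row = model_row(coded, names)
print(coded)     # (1.0, 0.0, -1.0)
print(len(row))  # 7
```

If the model contains only these 7 terms and is fit to 27 single responses, 27 - 7 = 20 degrees of freedom remain for estimating error.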

 

 


 

How should I interpret these data, given that the outcome was not close to 100% monomer? I suspect that the highest level of each factor was not high enough to generate useful data. Or perhaps none of these factors has a large impact on % monomer.

 

Please let me know how to interpret these results (JMP file attached). What conclusions can I draw from them?


 

Thank you!

Accepted Solution

Re: Need help with 3x3x3 Full Factorial DoE Data Analysis

What would you expect of the variation in the response if nothing changed? That is to say, if all factors were held constant, how much would the response vary over repeated runs? The Summary of Fit table estimates the standard deviation of the response to be 12%. Does that result make sense, or does it surprise you?
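The Summary of Fit figure referred to here is the root mean square error, the usual estimate of the response's standard deviation: the square root of the error sum of squares divided by its degrees of freedom (observations minus estimated parameters). A minimal sketch with made-up residuals:

```python
import math

def rmse(residuals, n_params):
    """Root mean square error: sqrt(SSE / (n - p)), the usual estimate of the
    response's standard deviation after fitting a model with n_params terms."""
    n = len(residuals)
    if n <= n_params:
        raise ValueError("need more observations than parameters")
    sse = sum(r * r for r in residuals)
    return math.sqrt(sse / (n - n_params))

# Hypothetical residuals (percentage points), for illustration only.
example = [3.0, -4.0, 3.0, -4.0, 0.0]
print(rmse(example, 1))  # sqrt(50 / 4), about 3.54
```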

 

[Image: summary.PNG]

 

A plot of the residuals from the full model suggests that the variance of the response is not constant over its full range. The variance appears to be proportional to the response. Is that possible?
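One informal way to check whether the residual spread grows with the response is to sort the observations by fitted value, split them into groups, and compare the standard deviation of the residuals in each group. A sketch under that assumption, with hypothetical numbers (the function name spread_by_fitted is illustrative):

```python
import statistics

def spread_by_fitted(fitted, residuals, n_bins=3):
    """Sort observations by fitted value, split into n_bins groups of (nearly)
    equal size, and return (mean fitted value, SD of residuals) per group.
    An SD that rises steadily across groups suggests variance growing with the response."""
    pairs = sorted(zip(fitted, residuals))
    if len(pairs) < 2 * n_bins:
        raise ValueError("need at least two observations per group")
    size = len(pairs) // n_bins
    out = []
    for b in range(n_bins):
        start = b * size
        stop = len(pairs) if b == n_bins - 1 else start + size
        chunk = pairs[start:stop]
        fits = [f for f, _ in chunk]
        res = [r for _, r in chunk]
        out.append((statistics.mean(fits), statistics.stdev(res)))
    return out

# Hypothetical fitted values and residuals (percentage points).
fitted = [20, 21, 22, 30, 31, 32, 40, 41, 42]
residuals = [1, -1, 0, 2, -2, 0, 4, -4, 0]
for mean_fit, sd in spread_by_fitted(fitted, residuals):
    print(mean_fit, sd)
```

In this made-up example the residual SD doubles from group to group, the kind of pattern a residual-by-predicted plot shows as a widening funnel.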

 

[Image: resid.PNG]

 

I tried a Box-Cox transform of the response with lambda = -1 (the reciprocal). It did not really help, but the function is very flat, so I did not expect much change.
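For reference, the Box-Cox family is (y^λ - 1)/λ for λ ≠ 0 and log(y) for λ = 0, defined for positive y. At λ = -1 it becomes 1 - 1/y, a reciprocal that is shifted and sign-flipped so the ordering of the responses is preserved. A minimal sketch; fitting tools commonly also scale the transform by the geometric mean of y so that error sums of squares are comparable across λ, and that scaling is omitted here:

```python
import math

def box_cox(y, lam):
    """Unscaled Box-Cox transform of a positive value y.

    lam == 0 gives log(y); otherwise (y**lam - 1) / lam.
    lam == -1 gives 1 - 1/y, which preserves the order of the responses.
    """
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam

print(box_cox(2, -1))   # 0.5, the same as 1 - 1/2
print(box_cox(4, 0.5))  # 2.0
```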

 

[Image: box-cox.PNG]

 

A plot of the studentized residuals over time (assuming row order is run order) does not suggest a strong outlier, but it does suggest a potential cycle. Could there be a lurking variable that was not controlled or measured, but could have varied from run to run and affected the response?
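A plot is the main tool here, but a single number can back it up: the lag-1 autocorrelation of the residuals taken in run order. Values near zero are consistent with independent errors, while a clearly nonzero value hints at drift or cycling. A minimal sketch, assuming the residuals are already in run order (the example sequences are invented):

```python
def lag1_autocorrelation(values):
    """Lag-1 sample autocorrelation of a sequence taken in run order."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("values are constant")
    num = sum(dev[i] * dev[i + 1] for i in range(n - 1))
    return num / denom

print(lag1_autocorrelation([1, -1, 1, -1]))   # -0.75: sign flips every run
print(lag1_autocorrelation([1, 2, 3, 4, 5]))  # 0.4: steady drift
```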

 

[Image: studentized.PNG]

 

I also thought that strong nonlinear effects might be present, but adding quadratic terms did not improve the model.
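A squared term can only be estimated for a factor with at least three distinct levels, which each factor here has. A sketch listing the terms of the full second-order model for three factors (the function name is illustrative):

```python
from itertools import combinations

def second_order_terms(names):
    """Terms of a full second-order model: intercept, main effects,
    two-way interactions, and one squared term per factor."""
    terms = ["Intercept"] + list(names)
    terms += [f"{a}*{b}" for a, b in combinations(names, 2)]
    terms += [f"{a}^2" for a in names]
    return terms

terms = second_order_terms(("NaCl", "Citrate", "Glycine"))
print(len(terms))  # 10
# With coded levels -1, 0, +1, a squared column takes the values 1, 0, 1,
# so it contrasts the center level with the two ends (curvature).
print([x * x for x in (-1, 0, 1)])  # [1, 0, 1]
print(27 - len(terms))  # 17 residual df if fit to 27 single responses
```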

 

Is it possible that the data for the response were entered in the wrong rows?


Re: Need help with 3x3x3 Full Factorial DoE Data Analysis

Hello markbailey,

 

Thank you for your response! This is a very interesting study because we saw % monomer improve when we added sodium chloride to the sample at various concentrations (from 20% monomer without sodium chloride to about 50% with 500 mM). So we were very interested to see whether we could improve % monomer intensity by adding not only sodium chloride but also other types of chemicals (either one factor at a time or as a mix of factors).

 

During the DoE runs, % monomer intensity fluctuated by about 5-10%, because the instrument is very sensitive to even dimers or large particulates. Additionally, the sample we picked for the study is already unstable (17-20% monomer). The instrument takes a total of 10 readings and averages them into one measurement. I took two measurements and averaged the two. If the first measurement gave me 30% and the second 55%, I redid the measurement and compared it with the previous data (including those 10 individual readings). Then, if the repeat measurements showed about 28% and 32%, I simply dropped the 55% value.
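The screening rule described above can be written down explicitly, which makes it easier to apply the same way every time. A minimal sketch, assuming readings are compared with the median of all readings for that sample and using a hypothetical 10-point tolerance (neither is a standard nor the poster's stated criterion):

```python
import statistics

def combine_measurements(readings, tolerance):
    """Keep readings within `tolerance` percentage points of the median of
    all readings, then average what remains (assumed rule, for illustration)."""
    center = statistics.median(readings)
    kept = [r for r in readings if abs(r - center) <= tolerance]
    return statistics.mean(kept), kept

# The example from the post: 30 and 55, then repeats of about 28 and 32.
value, kept = combine_measurements([30, 55, 28, 32], tolerance=10)
print(value, kept)  # 30 [30, 28, 32]
```

As a general caution, dropping readings because they disagree with the others makes the retained values look less variable than the measurement process actually is.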

 

You mentioned that the Summary of Fit table estimates the standard deviation of the response to be 12%. That makes sense to me, given that the sample is already not very stable. Perhaps I should have focused more on much higher salt concentrations (there is a paper suggesting that high salt concentrations may be necessary to optimize % monomer).

 

This is very helpful! Perhaps I should do a quick test with each salt as high as 500mM instead of 100mM.

Re: Need help with 3x3x3 Full Factorial DoE Data Analysis

Hi markbailey,

 

I've added both measurements for each sample to the JMP file and attached the file to this message (the runs were performed in row order).

I used Fit Model on % monomer 1 and % monomer 2. The p-values in the Effect Summary improved somewhat, but I think I need to test levels higher than 200 mM sodium chloride, 100 mM sodium citrate, and 100 mM glycine.


 

Please take a look at this file and let me know if you have any feedback on the results or recommendations for future experiments.

Thank you!

 

 

Re: Need help with 3x3x3 Full Factorial DoE Data Analysis

There seems to be a large range of results for the same conditions:

 

[Image: range.PNG]

 

Are the two results for each condition/treatment from two independent runs, or are they two measurements of one run? I wonder if this is measurement error rather than experimental error. I mean error in the statistical sense (random variation); I do not mean that a mistake was made.
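The spread in that table can be computed directly as the range (maximum minus minimum) of the responses within each treatment. A sketch with invented values, where each treatment is identified by its (NaCl, citrate, glycine) levels in mM:

```python
from collections import defaultdict

def range_by_treatment(rows):
    """rows: iterable of (treatment, response). Returns {treatment: max - min}."""
    groups = defaultdict(list)
    for treatment, y in rows:
        groups[treatment].append(y)
    return {t: max(ys) - min(ys) for t, ys in groups.items()}

# Hypothetical values: two measurements for each of two treatments.
data = [((0, 0, 0), 18.0), ((0, 0, 0), 31.0),
        ((200, 100, 100), 25.0), ((200, 100, 100), 27.0)]
print(range_by_treatment(data))  # {(0, 0, 0): 13.0, (200, 100, 100): 2.0}
```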

 

I also tried the full factorial model, hoping that it might reveal some strong effects. The weaker effects could then be removed and a clearer picture would emerge. But instead I got a significant lack of fit.

 

[Image: lof.PNG]

 

This test compares the estimate of the random variation in the response based on the error sum of squares from the model with the estimate from the replicates. Again, I do not know whether the two responses for each treatment represent replication (a new run) or repeated measurement of the same run. The total error sum of squares is from the model. The pure error is from the 'replicates.' You can see that the difference is large, which suggests that unaccounted-for effects remain in the data. The effect of a lurking variable?
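The arithmetic behind the lack-of-fit test can be sketched as follows: the pure-error sum of squares is the spread of responses around each treatment's own mean, and lack of fit is the remainder of the model's error sum of squares. The ratio of their mean squares is then compared with an F distribution on (lack-of-fit df, pure-error df). A minimal sketch with made-up data and an intercept-only model (the function name is illustrative):

```python
from collections import defaultdict

def lack_of_fit(rows, fitted, n_params):
    """Split the model's error sum of squares into pure error and lack of fit.

    rows: list of (treatment, response); fitted: model prediction for each row.
    Returns (F ratio, lack-of-fit df, pure-error df).
    """
    groups = defaultdict(list)
    for treatment, y in rows:
        groups[treatment].append(y)
    # Pure error: squared deviations from each treatment's own mean.
    ss_pe = 0.0
    for ys in groups.values():
        m = sum(ys) / len(ys)
        ss_pe += sum((y - m) ** 2 for y in ys)
    df_pe = len(rows) - len(groups)
    # Total model error, then the lack-of-fit remainder.
    sse = sum((y - f) ** 2 for (_, y), f in zip(rows, fitted))
    df_err = len(rows) - n_params
    ss_lof = sse - ss_pe
    df_lof = df_err - df_pe
    if df_lof <= 0 or df_pe <= 0 or ss_pe == 0:
        raise ValueError("lack-of-fit test is not defined for these data")
    f_ratio = (ss_lof / df_lof) / (ss_pe / df_pe)
    return f_ratio, df_lof, df_pe

rows = [("A", 10.0), ("A", 12.0), ("B", 20.0), ("B", 22.0), ("C", 30.0), ("C", 32.0)]
grand_mean = sum(y for _, y in rows) / len(rows)
fitted = [grand_mean] * len(rows)
print(lack_of_fit(rows, fitted, n_params=1))  # (100.0, 2, 3)
```

If the two values per treatment are repeat measurements of a single run rather than independent runs, the pure-error term reflects only measurement variation, so it tends to be too small and the F ratio too large.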

Re: Need help with 3x3x3 Full Factorial DoE Data Analysis

Hi @Mark_Bailey, they are repeated measurements of the same run. Each run also measures other useful information besides % monomer, and I am hoping to find some answers/trends once I add all the response data to JMP. But I think this may be the effect of a lurking variable. The sample itself is unstable (as you can see from sample #111 at ~30%; an ideal sample would be 100%).

It could be due to measurement error, or these factors may just not be the right choice for this sample.

Perhaps these factors and levels are not the optimal conditions.

 

I need to do more analysis on this data but thank you for providing me with your feedback!

Re: Need help with 3x3x3 Full Factorial DoE Data Analysis

If the sample is unstable, is the time between the end of the run and the measurement constant?

 

The choice of factors, in terms of finding effects, is obviously critical. Perhaps a 'wider net' following brainstorming or a review of a process map or the chemical reaction could identify other factors. It is too easy (i.e., it happens all the time) to think you know more than you really know because your understanding is based on OFAT (one-factor-at-a-time) data or anecdotal experience. One purpose of the experiment is to realistically explore as many variables in 'the system' as you can to learn how it actually works. I have been teaching and helping users with DOE for decades, and in my experience alone, it is the exception when the results fail to surprise and to challenge what is known.

 

I am not sure what you mean when you say, "", but your DOE space should provoke changes in the response, both for better and for worse. You want to model the range of the response. The experiment is NOT about finding the optimal conditions or 'picking the winner' among the treatments. It is about providing the best data you can afford to support the estimation of the model parameters. The model is what finds the optimal conditions. Again, I might have misinterpreted your meaning.
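To illustrate the point that the model, not the individual treatments, points to the best settings, here is a sketch that evaluates a fitted main-effects plus two-way-interaction model over a grid in coded units and returns the setting with the highest prediction. The coefficient names and values are hypothetical, not estimates from this experiment:

```python
from itertools import product

def predict(coef, x):
    """Main-effects + two-way-interaction model in coded units (-1..+1)."""
    x1, x2, x3 = x
    return (coef["b0"] + coef["b1"] * x1 + coef["b2"] * x2 + coef["b3"] * x3
            + coef["b12"] * x1 * x2 + coef["b13"] * x1 * x3 + coef["b23"] * x2 * x3)

def best_setting(coef, step=0.5):
    """Grid-search the coded cube [-1, 1]^3; return (prediction, setting)."""
    n = int(round(2 / step)) + 1
    grid = [-1 + i * step for i in range(n)]
    return max((predict(coef, x), x) for x in product(grid, repeat=3))

# Hypothetical coefficients, for illustration only.
coef = {"b0": 30.0, "b1": 5.0, "b2": 1.0, "b3": 0.0,
        "b12": 2.0, "b13": 0.0, "b23": 0.0}
pred, setting = best_setting(coef)
print(pred, setting)  # 38.0 (1.0, 1.0, 1.0)
```

The search stays within -1 to +1 on purpose: a model fit to these data says nothing reliable about levels outside the tested region, such as 500 mM.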

P_Bartell
Level VIII

Re: Need help with 3x3x3 Full Factorial DoE Data Analysis

In addition to all of @Mark_Bailey's thoughts, with which I concur, here is one other observation: the design has many treatment combinations where one, and sometimes two, of the factors are set to 'zero', and since it's a full factorial, one combination has everything set to zero. I'm interpreting this as total absence from the trial or reaction (if that's the type of system in play here)? Sometimes in chemical systems, the complete absence of one or more compounds/ingredients/chemicals can induce physical/chemical/biological behavior entirely different from the behavior you are trying to characterize. Could that be what's going on here? With something 'missing' from the reaction, a completely different physical phenomenon may be in play compared to the treatment combinations that include all three ingredients. Process knowledge and understanding are key here.
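The count of treatment combinations with at least one ingredient absent is easy to check. A short sketch that enumerates the 27 runs and tallies how many factors sit at zero in each:

```python
from collections import Counter
from itertools import product

# Levels from the original post, in mM: NaCl, sodium citrate, glycine.
levels = [(0, 100, 200), (0, 50, 100), (0, 50, 100)]
runs = list(product(*levels))

# For each run, count how many of the three ingredients are absent (level 0).
zeros_per_run = Counter(sum(1 for v in run if v == 0) for run in runs)
print(sorted(zeros_per_run.items()))  # [(0, 8), (1, 12), (2, 6), (3, 1)]
```

So 19 of the 27 runs omit at least one ingredient, and only 8 contain all three.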

Re: Need help with 3x3x3 Full Factorial DoE Data Analysis

Hello P_Bartell,

Thank you for your response!

 

Yes, you are correct. In this study, a zero value means total absence from the trial.

The team wanted to see whether we could improve the sample's % monomer by introducing all three factors (at various concentrations), two of them, or just one. In a previous experiment, sodium chloride did improve the % monomer of the sample, but the outcome was 50% monomer at 500 mM NaCl and about 47% at 150 mM NaCl, up from 20% monomer without added salt. Adding 100 mM NaCl that time gave us about 38% monomer (a slight improvement over 20%). The team was also interested in whether there is any relationship between two factors (or all three).

 

I wonder if the team selected the wrong concentrations for each factor in this experiment.

 

Thank you for your feedback!

P_Bartell
Level VIII

Re: Need help with 3x3x3 Full Factorial DoE Data Analysis

I'm not sure I follow the sequence of factor levels in the "...one experiment last time...", but it sure sounds like 'the team?' may have been trying a little OFAT experimentation and drawing conclusions after each run, kind of chasing something that would be pursued much more fruitfully (efficiency chief among 'em) with a rational DOE approach? This OFAT approach can be very misleading, especially in systems where interactions abound.

 

I was never a big fan of chemical-based experiments/systems where a factor was set to 'zero' in the way you describe. Too often this absence of 'something' created large nonlinearities in the response space we were really interested in, making linear modeling methods problematic.