Hi!
I am currently optimising an experimental procedure and have opted for a custom design with 4 continuous variables at 2 levels each and a single 3-level categorical variable. I have included replicated centre points to allow for some analysis of variance, and the model is generated with RSM interactions going no further than 2-factor interactions. The residual plots prove to be very useful, but only in relation to the output. Is there a way to see whether there is an increase in variance (a decrease in repeatability) associated with an extreme end of one of the factors/interactions? i.e., how can I assess variability in relation to the factors in the design space, as opposed to just the response?
Thank you in advance for your help!
Hi @aaidaa,
Happy New Year! And thank you for your response.
Looking at your different points:
All these approaches are quite complementary, and they can be really helpful for focusing the efforts on the most informative experiments to run.
I hope these new comments will help you,
I might not understand your question. I think you are asking about modeling or testing the variance of the response. Your design and model assume that the variance of the response is constant everywhere. The confidence intervals/regions represent uncertainty in the estimate of the response mean or individual predictions, not the inherent variation in the response.
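In standard least-squares notation (generic shorthand, not from the post above), that constant-variance assumption reads:

$$ y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2), $$

with a single $\sigma^2$ at every point in the design space. The intervals widen away from your runs only because the estimate of the mean is less certain there, not because the response itself is assumed to vary more.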
Hi Mark,
Thanks for this! Yes, I'm referring to testing the variance of the response, but in relation to specific regions across an input factor. I'll try to give a more specific example:
One of my input factors is pH, ranging from 4.5 to 7.5, with pH 6 being one of the centre points. There may be more variation in the response at a higher pH than at a lower pH; if this does exist, we would like to estimate it, as it may mean we have to narrow our pH range (a little below 7.5) to generate a more accurately fitted model. However, I assume I would need to run replicates of particular points across the design space (I cannot afford to duplicate every run), which would prove challenging given that it is a custom design and I cannot visualise "the corners" of my design space.
I hope this is a bit clearer. I am new to DoE, so it may be a gap in my knowledge that I'll need to address here. Any suggestions for particular resources would be much appreciated. Thank you!
Thank you for confirming that you are interested in testing and modeling effects on the variance of the response. There is currently no method to design experiments to model the response's variance. Please follow the advice of others regarding the replication of runs. I am not referring to repeatedly measuring the experimental unit at the end of a run but replicating the treatment with a new experimental unit. The cool thing is that JMP has a method of analysis to suit your needs! (It's been around for a long time, too.) See the LogLinear Variance Model documentation available through the Analyze > Fit Model launch dialog.
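For readers unfamiliar with it, the loglinear variance model (written here in generic notation; see the JMP documentation for the platform's exact parameterization) fits a linear model for the mean together with a log-linear model for the variance:

$$ y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i, \qquad \varepsilon_i \sim N\!\left(0, \sigma_i^{2}\right), \qquad \log \sigma_i^{2} = \mathbf{z}_i^{\top}\boldsymbol{\lambda}, $$

where the variance effects $\mathbf{z}_i$ can be a subset of the design factors (e.g., just pH). The log link keeps the fitted variances positive and lets you test directly whether moving a factor to its extreme inflates the variance.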
Here are my thoughts as I understand your question. You want to model variance in addition to the mean. This requires you to collect multiple data points at each treatment combination. How you collect those data points will depend on which x's you think are affecting variation. For example, let's say a sample is taken at each treatment and it is measured at multiple places across the sample (one experimental unit for each treatment). In this case, the variation would be due to x's changing within the sample plus measurement error. If you model the factors in your experiment, you can learn whether factors (or factor interactions) impact those confounded components of variation (likely the within-sample component, as it is unlikely the factors in your experiment influence the measurement errors). Another example: let's say you get multiple samples for each treatment (still one experimental unit) and measure each sample. The variation in this case comes from the x's changing sample-to-sample, within sample, and from measurement error. Again, modeling the factors from your DOE would provide some insight as to whether they affect these components of variation. Of course, you can use multiple "layers" of nested components to determine whether factors influence particular variance components. Make sense?
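In generic variance-component notation (my sketch of the two examples above, not from the original post), the variation among repeated data points within one treatment decomposes roughly as:

$$ \text{Example 1:}\quad \sigma^{2}_{\text{repeat}} = \sigma^{2}_{\text{within sample}} + \sigma^{2}_{\text{measurement}}, $$

$$ \text{Example 2:}\quad \sigma^{2}_{\text{repeat}} = \sigma^{2}_{\text{sample-to-sample}} + \sigma^{2}_{\text{within sample}} + \sigma^{2}_{\text{measurement}}, $$

which is why the components stay confounded unless the data-collection scheme is designed to separate them.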
Hi Statman,
Thank you for your message!
The only issue is that I'm not quite sure which factors (and, most importantly, which ranges within those factors) will indeed cause larger variance, which would therefore require me to repeat all experimental runs, making this a very costly process. This is why my initial design did not include repeat runs; I only replicated centre points. If I were to repeat some of the treatments from the initial design, I could probably afford to re-run 7 experimental variations (out of the 28 runs that my custom DoE generated). This is where I'm looking for suggestions.
That aside, there are certain points in your suggestion that I'm unsure about; could you kindly expand on them? I've broken them down below just to make sure I'm following.
Example 1:
Example 2:
Many thanks once again for your time!
Followup:
Repeats and replicates are 2 completely different strategies for experimentation. Replicates are independent runs of the same experimental treatments (this usually means they are run at different times, often in randomized order; if done in blocks, you can also learn about long-term variation components). Each treatment results in an independent experimental unit.
Repeats are not independent of the treatments. They are, for lack of a universal word, multiple "data points" for the same treatment combination. They are not independent experimental units (EUs) and therefore do not provide additional degrees of freedom. Repeats can be done in multiple situations and for multiple reasons (see the sketch after this list):
1. If the EU is measured multiple times in the exact same location, this estimates measurement error. If you then average those data points, you reduce the measurement error in the study (the variance of the average is the measurement variance divided by n) and increase the precision of the experiment.
2. If each EU is measured multiple times in different locations within the EU, this provides an estimate of both measurement error and within-sample variation (of course, you could measure each location within the EU multiple times to separate measurement error from within-EU variation). Averaging again reduces the variation of the data points within a treatment, but in this case that variation includes within-sample variation as well, so you might be interested in whether the treatments impact that short-term, within-sample variation. You can assess this by adding a response variable in the form of variation (e.g., range, standard deviation, variance). If I were concerned with measurement error, I would measure each location within the sample multiple times and use the averages to estimate the within-sample variance.
3. If the EU consists of multiple samples, each measured once, then the data points reflect measurement error, within-sample, and sample-to-sample variation (just as above, you could separate and assign the multiple variance components depending on how you take the data points). Again, you can use averages to reduce the variation, or, using nested components-of-variation concepts, assign the different variances to each component and create a response variable in the form of variation to model in your experiment.
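As a minimal illustration of the "response variable in the form of variation" idea, here is a short Python/pandas sketch (the column names and data are hypothetical, not from any attached table) that builds a per-treatment variation response from repeated data points:

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: one row per repeated data point,
# with the factor settings and the measured response.
df = pd.DataFrame({
    "pH":   [4.5, 4.5, 4.5, 7.5, 7.5, 7.5, 6.0, 6.0, 6.0],
    "temp": [30, 30, 30, 30, 30, 30, 40, 40, 40],
    "y":    [10.1, 10.3, 9.9, 12.0, 14.5, 9.8, 11.0, 11.2, 10.9],
})

factors = ["pH", "temp"]

# Per-treatment summaries: mean, standard deviation, and count.
summary = (
    df.groupby(factors)["y"]
      .agg(mean="mean", sd="std", n="count")
      .reset_index()
)

# Log variance is a convenient variation response: it is unbounded
# and matches the log link used by loglinear variance models.
summary["log_var"] = np.log(summary["sd"] ** 2)

print(summary)
```

The resulting sd or log_var column can then be modeled against the factors, exactly as described above, to see which settings inflate the variation.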
Regarding your comments:
Example 1:
Example 2:
Start here:
Here is a good paper:
Sanders, D., Sanders, R., and Leitnaker, M. (1994) “The Analytic Examination of Time-Dependent Variance Components”, Quality Engineering
Hi @aaidaa,
You already got excellent answers and questions from other brilliant members of this Community.
When reading your questions, I think about (at least) three sources of variance that may be connected but could be evaluated independently:
A combination of these different techniques (several measurements/runs, use of replicates, and visualization/analysis of the Prediction Variance Profile) might help you distinguish the causes of variability and their "sizes".
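For reference, the Prediction Variance Profile is based on the standard formula (generic notation):

$$ \operatorname{Var}\!\left[\hat{y}(\mathbf{x})\right] = \sigma^{2}\,\mathbf{x}^{\top}\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{x}, $$

where $\mathbf{X}$ is the design matrix and $\mathbf{x}$ is a candidate factor setting expanded to the model terms; JMP plots the relative variance $\mathbf{x}^{\top}(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{x}$. Note that this quantifies uncertainty in the fitted mean, not the inherent response variance distinguished earlier in the thread, so it complements rather than replaces replication.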
Hope this additional answer will help you,
I agree there are very good answers here, but I'll add perhaps one more with a specific approach.
>>i.e. how can I assess variability in relation to the factors in the design space as opposed to just the response?
You can use the EMP platform to see if specific regions of your design space have more variation than others. Credit goes to Bill Kappele for inventing this tactic, which he calls a "Sanity Check." In the example below, one of the areas/trials in the design space has more variation than the others. The trick involves creating a new "check" column that identifies each unique combination of your factor levels. It also assumes that you have repeated runs in enough places to be able to estimate variance. I've attached a sample data table; the EMP script on the table outputs the example below.
If you want a statistical rather than a visual test, the Unequal Variances test in the Fit Y by X platform can give this (also saved to the data table as a Fit Y by X script).
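If you ever want to reproduce that unequal-variances check outside JMP, a minimal sketch using SciPy's Levene/Brown-Forsythe test (the "check" column and data below are hypothetical) could look like:

```python
import pandas as pd
from scipy.stats import levene

# Hypothetical data: 'check' labels each unique combination of factor
# levels; 'y' holds the repeated responses at that combination.
df = pd.DataFrame({
    "check": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "y": [10.1, 10.3, 9.9, 10.2,   # tight group
          12.0, 14.5, 9.8, 13.1,   # wide group
          11.0, 11.2, 10.9, 11.1], # tight group
})

# One sample of responses per 'check' group.
groups = [g["y"].to_numpy() for _, g in df.groupby("check")]

# The Brown-Forsythe variant of Levene's test (center="median") is
# robust to non-normality; H0 is that all groups share one variance.
stat, p = levene(*groups, center="median")
print(f"Levene statistic = {stat:.3f}, p-value = {p:.4f}")
```

A small p-value suggests some region of the design space has more variation than others, in the same spirit as the EMP "Sanity Check" above.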
Hi Jed,
Thank you for this; it's a super useful tool!
I'm assuming the Y here is simply the response and not the variance of the response, and therefore it would be most useful to look at the range of Y against the check column. Given that the assumption here is having conducted enough repeats of the experimental runs, do you know what the minimum number of repeats required is? I could probably repeat 7-10 runs, but I would have to somehow augment the design to include those repeats, since I've only repeated centre points in my original design. Do you have any additional suggestions?
Many thanks for your help!