aaidaa
Level II

How to repeat points across extremities in a custom DoE to better understand variance?

Hi!

I am currently optimising an experimental procedure and have opted for a custom design with 4 continuous variables at 2 levels each and a single 3-level categorical variable. I have included replicated centre points to allow for some analysis of variance, and the model is generated with RSM terms, not going beyond 2-factor interactions. The residual plots prove to be very useful, but only in relation to the output. Is there a way to see whether variance increases (i.e. repeatability decreases) at an extreme end of one of the factors or interactions? In other words, how can I assess variability in relation to the factors in the design space, as opposed to just the response?

 

Thank you in advance for your help!

2 ACCEPTED SOLUTIONS


Re: How to repeat points across extremities in a custom DoE to better understand variance?

I agree there are very good answers here, but I'll add perhaps one more with a specific approach.

 

>>i.e. how can I assess variability in relation to the factors in the design space as opposed to just the response?

 

You can use the EMP platform to see if specific regions of your design space have more variation than others. Credit goes to Bill Kappele for inventing this tactic, which he calls a "Sanity Check." In the example below, one of the areas/trials in the design space has more variation than the others. The trick involves creating a new "check" column, which is an indicator of all your factor levels. It also assumes that you have repeated runs in enough places to be able to estimate variance. I've attached a sample data table for which the EMP script on the table outputs the example below.

Jed_Campbell_0-1671726522255.png

If you want a statistical rather than visual test, the Unequal Variances test in the Fit Y by X platform can give this (also saved to data table as Fit Y by X).

Jed_Campbell_1-1671726977355.png
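For readers who want to see the idea outside JMP, here is a minimal sketch of the "check" column combined with a Brown-Forsythe style unequal-variances statistic. It is not the attached table or the EMP script. The factor names, the data values, and the helper functions `group_by_check` and `brown_forsythe_statistic` are all hypothetical, and only the test statistic is computed (no p-value).

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical replicated runs as (pH, temperature, response); values are made up.
runs = [
    (4.5, 20, 10.1), (4.5, 20, 10.3), (4.5, 20, 9.9),
    (7.5, 20, 12.0), (7.5, 20, 14.5), (7.5, 20, 9.8),
    (4.5, 40, 11.0), (4.5, 40, 11.2), (4.5, 40, 10.9),
    (7.5, 40, 13.1), (7.5, 40, 15.9), (7.5, 40, 10.6),
]


def group_by_check(rows):
    """Group responses by a 'check' label built from all factor levels."""
    groups = defaultdict(list)
    for *factors, response in rows:
        check = "|".join(str(level) for level in factors)
        groups[check].append(response)
    return dict(groups)


def brown_forsythe_statistic(groups):
    """One-way ANOVA F ratio on absolute deviations from each group median."""
    deviations = {
        check: [abs(y - median(values)) for y in values]
        for check, values in groups.items()
    }
    n_total = sum(len(devs) for devs in deviations.values())
    k = len(deviations)
    grand_mean = mean([d for devs in deviations.values() for d in devs])
    between = sum(
        len(devs) * (mean(devs) - grand_mean) ** 2 for devs in deviations.values()
    )
    within = sum(
        (d - mean(devs)) ** 2 for devs in deviations.values() for d in devs
    )
    return (n_total - k) / (k - 1) * between / within


groups = group_by_check(runs)
statistic = brown_forsythe_statistic(groups)
for check, values in groups.items():
    print(check, "range =", round(max(values) - min(values), 2))
print("Brown-Forsythe statistic:", round(statistic, 3))
```

The statistic would be compared against an F distribution with (k - 1, N - k) degrees of freedom. With only a few repeats per check level the test has little power, so the visual comparison of ranges remains useful.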

 

 


Victor_G
Super User

Re: How to repeat points across extremities in a custom DoE to better understand variance?

Hi @aaidaa,

 

Happy New Year! And thank you for your response.

Looking at your different points:

 

  1. Finding an optimum in an area of large(r) variance is an interesting situation. What you have to figure out is how large this prediction variance is compared to your target and expectations. It may be wiser to do some validation runs at these optimum settings, to decrease the variance and assess the real performance/responses at the predicted optimum.

  2. (& 3.) As described in "Optimal Design of Experiments: a case study approach" by Bradley Jones and Peter Goos: "The best way to allocate a new experimental test is at the treatment combination with the highest prediction variance". To make the most of your efforts, you can iteratively create new runs at the locations with the highest variance in your experimental space.
    - For the model variance, run the "Evaluate Design" script and, in the red triangle menu of the "Prediction Variance Profile", click "Maximize Variance". This gives the factor settings where the model variance is highest, and so a good indication of where to add a new experiment to your DoE.
    - For the input and response variance, if you already know the variance of the factors (and/or of the response measurements, thanks to previous MSA studies for example), you can enter this information in the Simulator platform. It creates simulated distributions of your responses at the optimum settings, so you can evaluate the mean and standard deviation of each response given the input and response variances you have entered.
    - If you're looking at the "final/total" variance (which will probably be a mix of model variance, response variance and input variance if you have replicates), one way to continue is to save the "PredSE" column for each of your responses and open the Profiler (from the "Graph" menu, then "Profiler") on the predicted standard error formulas. You can then search for the settings that maximize PredSE (changing the relative importance of your responses if that is relevant for your case) to determine where to focus your efforts and repeat or create new experimental runs. You can also have a look at the Design Space Profiler platform, available from JMP 17, to assess whether you can find optimum points (and how much of the samples would be in spec) given some constraints/specifications on your response targets. There you can also add the PredSE of your responses to put a constraint on the standard deviation of each response, if you have an idea of the precision you would like to have.
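To illustrate the "highest prediction variance" rule outside JMP, here is a minimal sketch that computes the relative prediction variance x'(X'X)^-1 x over a grid of candidate points and picks the largest. The two-factor main-effects model, the small unbalanced design, the grid, and the helper names are assumptions made for illustration. This is not how JMP's "Maximize Variance" command is implemented.

```python
from itertools import product


def model_row(x1, x2):
    """Model terms for an assumed main-effects model: intercept, x1, x2."""
    return [1.0, x1, x2]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [float(i == j) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; the model is not estimable")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def relative_prediction_variance(point, xtx_inv):
    """x'(X'X)^-1 x, i.e. the prediction variance divided by sigma^2."""
    f = model_row(*point)
    size = len(f)
    return sum(f[i] * xtx_inv[i][j] * f[j] for i in range(size) for j in range(size))


# Hypothetical coded design: three corners plus two centre points, (1, 1) missing.
design = [(-1, -1), (1, -1), (-1, 1), (0, 0), (0, 0)]
X = [model_row(*run) for run in design]
p = len(X[0])
xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
xtx_inv = invert(xtx)

levels = [-1.0, -0.5, 0.0, 0.5, 1.0]
candidates = list(product(levels, repeat=2))
best_point = max(candidates, key=lambda pt: relative_prediction_variance(pt, xtx_inv))
best_variance = relative_prediction_variance(best_point, xtx_inv)
print("Add the next run near", best_point, "relative variance", round(best_variance, 3))
```

In this toy design the untested corner has the largest relative prediction variance, which matches the intuition that a new run is most informative where the design currently says the least.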

 

All these approaches are quite complementary, and can be really helpful in focusing your efforts on the most informative experiments to run.

I hope these new comments will help you, 

Victor GUILLER
Scientific Expertise Engineer
L'Oréal - Data & Analytics


12 REPLIES

Re: How to repeat points across extremities in a custom DoE to better understand variance?

I might not understand your question. I think you are asking about modeling or testing the variance of the response. Your design and model assume that the variance of the response is constant everywhere. The confidence intervals/regions represent uncertainty in the estimate of the response mean or individual predictions, not the inherent variation in the response.

aaidaa
Level II

Re: How to repeat points across extremities in a custom DoE to better understand variance?

Hi Mark,

Thanks for this! Yes I'm referring to testing the variance of the response but in relation to specific regions across an input factor. I'll try and give a more specific example: 

 

One of my input factors is pH, ranging from 4.5 to 7.5, with pH 6 being one of the centre points. There may be more variation in the response at a higher pH than at a lower pH; if so, we would like to estimate it, as it may mean we have to narrow our pH range (to a little below 7.5) to generate a more accurately fitted model. However, I assume I would need to run replicates of particular points across the design space (I cannot afford to duplicate the runs), which would prove challenging given that it is a custom design and I cannot visualise "the corners" of my design space.

 

I hope this is a bit clearer, I am new to DoE so it may be a gap in my knowledge that I'll need to address here. Any suggestion for particular resources would be much appreciated. Thank you!

 

Re: How to repeat points across extremities in a custom DoE to better understand variance?

Thank you for confirming that you are interested in testing and modeling effects on the variance of the response. There is currently no method to design experiments to model the response's variance. Please follow the advice of others regarding the replication of runs. I am not referring to repeatedly measuring the experimental unit at the end of a run but replicating the treatment with a new experimental unit. The cool thing is that JMP has a method of analysis to suit your needs! (It's been around for a long time, too.) See the LogLinear Variance Model documentation available through the Analyze > Fit Model launch dialog.
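As a rough illustration of what modelling the variance as a log-linear function of a factor means, here is a two-step sketch: fit the mean by least squares, then regress the log of the squared residuals on the same factor. This is a crude approximation for intuition only and is not the estimation method behind JMP's LogLinear Variance personality. The coded pH values, the responses, and the helper `simple_ols` are hypothetical.

```python
from math import exp, log
from statistics import mean


def simple_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical replicated runs: coded pH (-1, 0, +1), four replicates per level.
ph_coded = [-1] * 4 + [0] * 4 + [1] * 4
response = [7.9, 8.1, 7.8, 8.2, 9.6, 10.4, 9.4, 10.6, 10.5, 13.5, 9.5, 14.5]

# Step 1: model for the mean of the response.
mean_intercept, mean_slope = simple_ols(ph_coded, response)
residuals = [
    y - (mean_intercept + mean_slope * x) for x, y in zip(ph_coded, response)
]

# Step 2: log-linear model for the variance, using log squared residuals.
log_sq_residuals = [log(r ** 2) for r in residuals]
var_intercept, var_slope = simple_ols(ph_coded, log_sq_residuals)

print("mean model slope:", round(mean_slope, 3))
print("log-variance slope:", round(var_slope, 3))
print("variance multiplier per coded unit of pH:", round(exp(var_slope), 1))
```

A clearly positive log-variance slope suggests the spread of the response grows towards the high end of the factor. The sketch needs replicated runs spread across the factor range, which is the same requirement stated in the reply above.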

statman
Super User

Re: How to repeat points across extremities in a custom DoE to better understand variance?

Here are my thoughts as I understand your question. You want to model variance in addition to the mean. This requires you to collect multiple data points at each treatment combination. How you collect those data points will depend on what x's you think would be affecting variation.

For example, let's say you have a sample taken at each treatment and it is measured at multiple places across the sample (one experimental unit for each treatment). In this case, the variation would be due to x's changing within sample and measurement error. If you model the factors in your experiment, you can learn if factors (or factor interactions) impact those confounded components of variation (likely within sample, as it is unlikely the factors in your experiment influence the measurement errors).

As another example, let's say you get multiple samples for each treatment (still one experimental unit) and measure each sample. The reason for variation in this case is the x's changing sample-to-sample, within sample and measurement error. Again, modeling the factors from your DOE would provide some insight as to whether the factors affect the variability of these components of variation. Of course you can do multiple "layers" of nested components to determine if factors influence variance components. Make sense?

"All models are wrong, some are useful" G.E.P. Box
aaidaa
Level II

Re: How to repeat points across extremities in a custom DoE to better understand variance?

Hi Statman,

 

Thank you for your message! 

 

The only issue is that I'm not quite sure which factors (and most importantly, the ranges within those factors) will indeed cause larger variance, which will therefore require me to repeat all experimental runs, making this a very costly process. This is why in my initial design I did not include repeat runs, I only replicated centre points. If I were to repeat some of the treatments from the initial design I could probably afford to re-run 7 experimental variations (out of the 28 runs that my custom DoE generated). This is where I'm looking for suggestions.

 

That aside, there are certain points that I'm unsure about in your suggestion if you can kindly expand on. I've broken it down just to make sure I'm following.

 

Example 1:

  1. "For example, let's say you have a sample taken at each treatment and it is measured at multiple places across the sample (one experimental unit for each treatment). In this case, the variation would be due to x's changing within sample and measurement error." I believe you are referring to re-measuring the samples of an experimental run to understand measurement error?
  2.   "If you model the factors in your experiment, you can learn if factors (or factor interactions) impact those confounded components of variation (likely within sample as it is unlikely the factors in your experiment influence the measurement errors)." By modelling the factors in my experiment do you mean model them with the variance of the response being the response here?

 

Example 2

  1. "Another example, Let's say you get multiple samples for each treatment (still one experimental unit) and measure each sample.  The reason for variation in this case is the x's changing sample-to-sample, within sample and measurement error."  When you refer to the factors changing sample-to-sample - are you referring to experimental error, e.g. pipetting error, causing variation between factors? 
  2. "Again modeling the factors from your DOE would provide some insight as to whether the factors affect the variability of these components of variation."  - I've taken this to be a reiteration of the suggestion from the prior example you've provided, but was unsure what you mean by "components of variation". Do you mean the measurement and experimental errors here?
  3. "Of course you can do multiple "layers" of nested components to determine if factors influence variance components." - how would one go about this? Do you have an example that I could perhaps refer to?

Many thanks once again, for your time!

statman
Super User

Re: How to repeat points across extremities in a custom DoE to better understand variance?

Followup:

Repeats and replicates are 2 completely different strategies for experimentation.  Replicates are independent runs of the same experimental treatments (this usually means they are run at different times, often randomized. If done in blocks, then you can learn about long term variation components).  Each treatment will result in an independent experimental unit.  

Repeats are not independent of the treatments.  They are, for lack of a universal word, multiple "data points" for the same treatment combination.  They are not independent experimental units (EU) and therefore are not additional degrees of freedom.  Repeats can be done for multiple reasons:

1. If the EU is measured multiple times in the exact same location, this would estimate measurement error.  If you then average those data points, you would reduce the measurement error in the study (variance/n) and increase the precision of the experiment.

2. If each EU is measured multiple times in different locations within the EU, this would provide an estimate of both measurement error and within sample variation (of course you could measure each location within EU multiple times to separate measurement from within EU variation).  Averaging again will reduce the variation of the data points within treatment, but in this case that variation includes within sample variation as well, so you might be interested in whether the treatments impact that short-term variation within sample.  You can do this by adding a response variable in the form of variation (e.g., range, standard deviation, variance).  If I were concerned with measurement error, I would measure each location within sample multiple times and use the averages to estimate the within sample variance.

3. If the EU was multiple samples each measured once, then the data points would reflect measurement error, within sample and sample-to-sample variation (of course just as above, you could separate and assign the multiplicity of variance components depending on how you take the data points).  Again, you can use averages to reduce the variation or using nested components of variation concepts, assign the different variances for each component and create a response variable in the form of variation to model in your experiment.
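To make the idea of separating components of variation concrete, here is a minimal sketch for one treatment where several samples are each measured more than once. It uses the textbook balanced one-way random-effects ANOVA estimates: the within-sample mean square estimates the measurement (within) variance, and the sample-to-sample variance is estimated from the difference between the two mean squares. The data and the helper name `variance_components` are hypothetical.

```python
from statistics import mean

# Hypothetical data for ONE treatment: three samples, each measured twice.
measurements = {
    "sample_A": [10.0, 10.2],
    "sample_B": [11.0, 11.4],
    "sample_C": [9.0, 9.2],
}


def variance_components(groups):
    """Balanced one-way random-effects ANOVA estimates (between, within)."""
    k = len(groups)
    n = len(next(iter(groups.values())))
    if any(len(values) != n for values in groups.values()):
        raise ValueError("this sketch assumes the same number of measurements per sample")
    group_means = {name: mean(values) for name, values in groups.items()}
    grand_mean = mean(list(group_means.values()))
    ms_between = n * sum((m - grand_mean) ** 2 for m in group_means.values()) / (k - 1)
    ms_within = sum(
        (y - group_means[name]) ** 2 for name, values in groups.items() for y in values
    ) / (k * (n - 1))
    # A negative estimate is conventionally truncated at zero.
    between = max(0.0, (ms_between - ms_within) / n)
    return between, ms_within


sample_to_sample, within_sample = variance_components(measurements)
print("sample-to-sample variance:", round(sample_to_sample, 4))
print("within-sample (measurement) variance:", round(within_sample, 4))
```

Repeating this per treatment gives one variance estimate per component and per treatment, and each of those can then be used as a response in the DOE model (often after a log transform).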

 

Regarding your comments:

Example 1:

  1. "For example, let's say you have a sample taken at each treatment and it is measured at multiple places across the sample (one experimental unit for each treatment). In this case, the variation would be due to x's changing within sample and measurement error." I believe you are referring to re-measuring the samples of experimental run to understand measurement error? In this case the variation is due to BOTH measurement error and within-sample variation, not just measurement error.
  2.   "If you model the factors in your experiment, you can learn if factors (or factor interactions) impact those confounded components of variation (likely within sample as it is unlikely the factors in your experiment influence the measurement errors)." By modelling the factors in my experiment do you mean model them with the variance of the response being the response here?  Yes, although if I were concerned about the measurement system, I would have measured each location twice and use the averages of those since this would reduce the measurement error.

Example 2

  1. "Another example, Let's say you get multiple samples for each treatment (still one experimental unit) and measure each sample.  The reason for variation in this case is the x's changing sample-to-sample, within sample and measurement error."  When you refer to the factors changing sample-to-sample - are you referring to experimental error, e.g. pipetting error, causing variation between factors? Not an easy answer via this forum... If the variation can be assigned, I wouldn't necessarily call it experimental error.  If you randomize, then the error cannot be assigned and it would be experimental error. I am suggesting there may be multiple x's changing between samples. It will be a function of the process making the samples.  What x's change in the process every time you make a sample? Usually x's that change at a higher frequency, since this is short-term variation.  I would map the process to help identify those variables.
  2. "Again modeling the factors from your DOE would provide some insight as to whether the factors affect the variability of these components of variation."  - I've taken this to be a reiteration of the suggestion from the prior example you've provided, but was unsure what you mean by "components of variation". Do you mean the measurement and experimental errors here?  No...sorry, I am unable to elaborate on this.  You need to understand components of variation studies (see Nested or Hierarchical studies).
  3. "Of course you can do multiple "layers" of nested components to determine if factors influence variance components." - how would one go about this? Do you have an example that I could perhaps refer to?  Again, you need to understand CoV studies as mentioned above.

Start here:

https://www.jmp.com/support/help/en/17.0/?os=mac&source=application#page/jmp/statistical-details-for...

 

Here is a good paper:

Sanders, D., Sanders, R., and Leitnaker, M. (1994) “The Analytic Examination of Time-Dependent Variance Components”, Quality Engineering

 

"All models are wrong, some are useful" G.E.P. Box
Victor_G
Super User

Re: How to repeat points across extremities in a custom DoE to better understand variance?

Hi @aaidaa,

 

You already got excellent answers and questions from other brilliant members of this Community.

 

When reading your questions, I think about (at least) three sources of variance that may be connected but could be evaluated independently:

  1. Prediction variance of the design created: Depending on how the points are distributed in the design, the precision of your predictions will vary with the design chosen, the optimality criterion, the number of runs, the replicates/centre points, etc. If you want to know the prediction variance attributed only to your design, you can run the "Evaluate Design" script on your DoE data table and look at the "Prediction Variance Profile". This prediction variance is due only to your design, and it will help you figure out where in your design space your predictions have more or less variability.
  2. Response variance: You already have a great answer from @statman. For this part, it might be interesting to measure each run of your design several times, and then calculate the mean and standard deviation of your response for each run. When modeling, you can then have two responses instead of one (mean and Std), giving you the predicted value across your experimental space (from the calculated means) and the variability of your response (from the calculated Std values). It might help you figure out whether your response is heteroskedastic (its standard deviation is not constant over the measurement range).
  3. Input variance: In addition to these two sources of variance, you might also have to consider variance in your inputs (for example, the influence of the raw material batch for a factor). For this part, only replicates (individual runs done with the same treatment) can provide this variance information, and it will be mixed with the measurement variance of the response and the prediction variance (depending on where the replicate points are located in your experimental space).
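The "two responses instead of one" suggestion from point 2 can be sketched as a simple summary step: collapse the repeated measurements of each run into a mean and a standard deviation, and model both. The treatment labels, the values, and the helper name `summarise_runs` below are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical repeated measurements, keyed by the treatment of each run.
repeats = {
    ("pH 4.5", "level A"): [10.1, 10.3, 9.9],
    ("pH 7.5", "level A"): [12.0, 14.5, 9.8],
}


def summarise_runs(runs):
    """Return one row per run with the mean and sample standard deviation."""
    return [
        {"treatment": treatment, "mean": mean(values), "std": stdev(values)}
        for treatment, values in runs.items()
    ]


summary = summarise_runs(repeats)
for row in summary:
    print(row["treatment"], "mean =", round(row["mean"], 2), "std =", round(row["std"], 2))
```

Each summary row becomes one line of the analysis table, with "mean" and "std" as separate response columns. A standard deviation computed from only a few repeats is itself quite noisy, so large differences between runs are more trustworthy than small ones.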

 

A combination of these different techniques (several measurements/runs, use of replicates, and visualization/analysis of the Prediction variance profile) might help you distinguish the causes of variability and their "sizes". 

 

Hope this additional answer will help you,

Victor GUILLER
Scientific Expertise Engineer
L'Oréal - Data & Analytics

aaidaa
Level II

Re: How to repeat points across extremities in a custom DoE to better understand variance?

Hi Jed,

 

Thank you for this, this is a super useful tool!

 

I'm assuming the Y here is simply the response and not the variance of the response, and therefore it would be most useful to look at the range of Y against the check. Given that the approach assumes enough repeats of experimental runs have been conducted, do you know what the minimum number of repeats required is? I could probably repeat 7-10 runs, but I would have to somehow augment the design to include those repeats, since I've only repeated centre points in my original design. Do you have any additional suggestions?

 

Many thanks for your help!