przem88
Level I

Determining number of samples per run in fractional factorial design

Hello everybody,

In an upcoming experiment, we are going to measure 3D-printed samples to improve print quality (geometry). From a baseline run (10 samples distributed over the printing bed and printed under typical operating conditions), I have a standard deviation of 0.006 mm, which I take as the standard deviation of the measured geometric distortion. We want to minimize geometric distortion. We thought a fractional factorial design with 5 factors would be appropriate, and I would also like to use 4 center points. JMP gave me a table with 20 runs (the minimum). I am focusing on main effects and 2-factor interactions. Without replicate runs, the power analysis gives a power of approximately 0.84. But how do I determine the number of samples per run so that the run average is as close as possible to the true run mean? What needs to be considered?
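The 20 runs JMP reported are consistent with a 16-run half fraction of a 2^5 design (resolution V, so main effects and 2-factor interactions are not aliased with each other) plus 4 center points. A minimal sketch in coded units, assuming the generator E = ABCD; JMP may choose a different but equivalent fraction, and the function name is hypothetical:

```python
from itertools import combinations, product

def half_fraction_with_centers(n_center=4):
    """Coded runs (-1/+1, 0 for center) of a 2^(5-1) design plus center points.

    Assumes the generator E = ABCD, i.e. defining relation I = ABCDE (resolution V).
    """
    runs = [(a, b, c, d, a * b * c * d) for a, b, c, d in product((-1, 1), repeat=4)]
    runs += [(0, 0, 0, 0, 0)] * n_center  # center points: pure error and curvature check
    return runs

design = half_fraction_with_centers()
factorial = [r for r in design if 0 not in r]
print(len(design), len(factorial))  # 20 runs in total, 16 factorial runs

# In a resolution V fraction every main-effect column is orthogonal to every
# 2-factor-interaction column, so those estimates do not alias each other.
for i in range(5):
    for j, k in combinations(range(5), 2):
        assert sum(r[i] * r[j] * r[k] for r in factorial) == 0
```

The 4 center points add 3 degrees of freedom for pure error and a check for curvature; because their coded values are all 0, they do not change the factorial effect estimates.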

I would appreciate any hints. If something is unclear, I will try to explain.

Thank you

statman
Super User

Re: Determining number of samples per run in fractional factorial design

"Data points" added to each treatment combination while the treatment combination is held constant are called repeats. They are not independent events and do not add degrees of freedom (DF) to the experiment, but they can be quite useful. The variation among repeated data points cannot be due to the treatments, since the treatments are constant while the repeated measures are taken, so that variation is likely due to short-term noise. Which noise those data points capture depends on how they are acquired. For example, if the multiple measures are of the same geometric dimension at the same location on the printed part, the variation is attributable to measurement error (i.e., the measurement component of variation). If the measures are of the same geometric dimension at different locations on the same printed part, the variation is indicative of both measurement error and within-part variation. If the data points are collected from various locations on the printing bed, you can add the within-bed component of variation to the potential sources of that variation.

How many data points you need to estimate each component of variation you want to include in your study is not a statistical question. It depends on how many measures you need to represent the sources of variation you want to estimate, and on how likely the potential x's are to vary as you collect the data. For example, if you are interested in understanding which factors affect within-bed variation, how many locations would you want in the study to expose those effects? The more specific your hypotheses, the more efficient your sampling. Say, for example, you hypothesize that how level the bed is will affect the dimensions. Then you choose locations on the bed that will expose that potential effect (probably locations at the extremes of the printed part).
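To make the repeat idea concrete: if each printed part is measured k times at the same location, the spread of the repeats estimates the measurement component, and the spread of the part averages beyond what measurement error explains estimates part-to-part variation. A minimal sketch of the standard one-way random-effects ANOVA (method-of-moments) estimates, assuming a balanced layout (the same k for every part); the function name and the toy numbers are hypothetical:

```python
from statistics import mean

def variance_components(groups):
    """Balanced one-way random-effects ANOVA estimates.

    groups: one inner list of k repeat measurements per part.
    Returns (sigma2_within, sigma2_between): repeat (e.g. measurement)
    variance and part-to-part variance.
    """
    k = len(groups[0])
    if any(len(g) != k for g in groups):
        raise ValueError("balanced data required: same number of repeats per part")
    if k < 2 or len(groups) < 2:
        raise ValueError("need at least 2 parts and 2 repeats per part")
    g = len(groups)
    means = [mean(x) for x in groups]
    grand = mean(means)
    ss_within = sum((v - m) ** 2 for x, m in zip(groups, means) for v in x)
    ss_between = k * sum((m - grand) ** 2 for m in means)
    ms_within = ss_within / (g * (k - 1))
    ms_between = ss_between / (g - 1)
    # A negative between-part estimate is conventionally truncated to zero.
    return ms_within, max(0.0, (ms_between - ms_within) / k)

# Toy data: two parts, each measured twice.
print(variance_components([[1.0, 3.0], [5.0, 7.0]]))  # (2.0, 7.0)
```

Measuring at several locations on the same part would instead fold within-part variation into the "within" estimate, exactly as described above.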

"All models are wrong, some are useful" G.E.P. Box
przem88
Level I

Re: Determining number of samples per run in fractional factorial design

Thank you for your feedback!

 

Here is what I understood and how it would fit my experiment (I will try to describe it in my own words; please correct me if I misunderstood something):

1. I initially wanted to distribute the samples from each run (treatment) over the printing bed, including extreme locations such as the corners. Now I see that this would introduce additional error into the model, since the bed leveling is surely not perfect. If I want to consider only printing parameters such as nozzle and bed temperature, horizontal expansion, etc., I should select only one location on the bed, for example the middle. Is that correct? I am interested only in geometry measurements.

2. With only one location on the bed, would it then be enough to print only one sample (data point) for each run, or should I print several samples very close to each other and average the measurements (there might always be some differences in material flow or quality, air flow, etc.)?
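For reference, if the repeated samples within a run were independent and each had the 0.006 mm standard deviation from the baseline run, the standard deviation of their average would shrink as 1/√n. A minimal sketch under exactly that assumption (the names are hypothetical); it only covers noise sources that actually vary between the repeats, so samples printed side by side at one location will not average out bed-location effects:

```python
import math

SIGMA_MM = 0.006  # baseline standard deviation quoted in the question

def sd_of_run_mean(sigma, n):
    """Standard deviation of the mean of n independent repeats with common sigma."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sigma / math.sqrt(n)

for n in (1, 2, 4, 8):
    print(f"n={n}: {sd_of_run_mean(SIGMA_MM, n):.4f} mm")
```

Quadrupling the number of repeats only halves the noise in the run mean, so each extra sample buys less than the one before it.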

Thank you

statman
Super User

Re: Determining number of samples per run in fractional factorial design

Here are my thoughts:

 

1. I initially wanted to distribute the samples from each run (treatment) over the printing bed, including extreme locations such as the corners. Now I see that this would introduce additional error into the model, since the bed leveling is surely not perfect. If I want to consider only printing parameters such as nozzle and bed temperature, horizontal expansion, etc., I should select only one location on the bed, for example the middle. Is that correct? I am interested only in geometry measurements.

 

If you "randomly" distributed the treatments over different locations on the bed, you would indeed confound within-bed variation with treatment variation. This will decrease the precision of the experiment. However, holding the location constant is also NOT a good idea, as the results of your experiment would then be limited to that location. This is an inference-space issue. So you have options:

1. Confound the location with blocks and run a randomized complete block design (RCBD) or a balanced incomplete block (BIB) design (replicate strategy).

2. Collect data from different locations while using the same treatment combinations (repeat strategy). With the data, you do two things (first, of course, plot the within-treatment data to look for outliers, etc.):

  • Average the data, which will reduce the effect of within-bed variation and increase the precision, AND
  • Calculate the variance of the data and use it as an additional response variable (do the average and variance for each Y). You will model both the mean and the variance to determine whether factors affect the mean of the dimensions or the variance of the dimensions. Recognize that the within-bed component may also be confounded with the within-part and measurement components of variation.
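The repeat strategy can be sketched as follows: collapse each run's repeats into two responses, the run mean and the run variance, and model each one against the factors. Modeling ln(s²) rather than s² is a common convention for variance responses (an addition here, not something stated above); the run labels and readings are hypothetical:

```python
import math
from statistics import mean, variance

def run_summaries(repeats_by_run):
    """Map each run to (mean, sample variance, ln variance) of its repeats.

    Each run needs at least two repeats; a zero variance has no logarithm.
    """
    out = {}
    for run, values in repeats_by_run.items():
        s2 = variance(values)  # sample variance (n - 1 denominator)
        if s2 == 0:
            raise ValueError(f"run {run!r}: zero variance, ln(s^2) undefined")
        out[run] = (mean(values), s2, math.log(s2))
    return out

# Hypothetical distortion readings (mm) from several bed locations per run.
data = {
    "run 1": [0.021, 0.027, 0.018, 0.030],
    "run 2": [0.012, 0.013, 0.011, 0.014],
}
for run, (m, s2, ln_s2) in run_summaries(data).items():
    print(run, round(m, 4), f"{s2:.2e}", round(ln_s2, 2))
```

The mean column and the ln-variance column then become two separate Y's in the model fit.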

2. With only one location on the bed, would it then be enough to print only one sample (data point) for each run, or should I print several samples very close to each other and average the measurements (there might always be some differences in material flow or quality, air flow, etc.)?

 

This is not a statistical question (and a statistician can't answer it).  If you are concerned with variables not explicitly being varied in the experiment (e.g., noise), you need strategies to handle the noise.  Holding the noise constant is the wrong strategy!

 

The exact standardization of experimental conditions, which is often thoughtlessly advocated as a panacea, always carries with it the real disadvantage that a highly standardized experiment supplies direct information only in respect to the narrow range of conditions achieved by the standardization.  Standardization, therefore, weakens rather than strengthens our ground for inferring a like result, when, as is invariably the case in practice, these conditions are somewhat varied.

R. A. Fisher (1935), The Design of Experiments, pp. 99-100

 

So the question is: how do you run an experiment that is representative of future conditions without reducing its precision so much that it provides no useful information?

 

 “Unfortunately, future experiments (future trials, tomorrow’s production) will be affected by environmental conditions (temperature, materials, people) different from those that affect this experiment…It is only by knowledge of the subject matter, possibly aided by further experiments to cover a wider range of conditions, that one may decide, with a risk of being wrong, whether the environmental conditions of the future will be near enough the same as those of today to permit use of results in hand.”

Dr. W. Edwards Deming

 

"Block what you can, randomize what you cannot"

Dr. G.E.P. Box
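"Block what you can, randomize what you cannot" is what option 1 (the RCBD) puts into practice: treat each bed location as a block, run every treatment combination once in each block, and randomize the order within each block. A minimal sketch of that randomization; the location names, the seed, and the function name are hypothetical:

```python
import random

def rcbd_order(treatments, blocks, seed=None):
    """Randomized complete block design: every treatment once per block,
    in an independently shuffled order within each block."""
    rng = random.Random(seed)
    plan = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)  # randomize what you cannot block
        plan[block] = order
    return plan

treatments = [f"T{i}" for i in range(1, 21)]  # e.g. the 20 runs of the design
blocks = ["front-left", "center", "back-right"]  # hypothetical bed locations
for block, order in rcbd_order(treatments, blocks, seed=2024).items():
    print(block, order[:5], "...")
```

Fixing the seed makes the plan reproducible, so the same randomized order can be documented and followed.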

przem88
Level I

Re: Determining number of samples per run in fractional factorial design

Hello, sorry for the late reply.

Thank you for your thoughts; I will try to plan the experiment that way.

 

I have some additional questions:

What if one of the treatments fails, for example because of insufficient adhesion to the bed, so that the samples can't be printed properly? Should I start the next run and then analyze whatever data I can gather? Is there a way to take such runs into account in the final model/evaluation?

I imagine that when I start the next batch of runs (with parameters that should give me the best possible quality) to confirm the final model's behavior, I will not consider the parameter regions that failed previously.

Thank you

statman
Super User

Re: Determining number of samples per run in fractional factorial design

Just curious, have you had any training in DOE or do you have an experienced mentor?

 

Here are my thoughts. Always predict the outputs (Y's) for each treatment combination. Remember, experimentation is part of an iterative learning process (i.e., the scientific method). The first experiment will help you design a better experiment. If you predict that one of the treatments will fail, is it because the level settings are too bold? Perhaps this is what you will learn from the first experiment.

 

There are a number of ways to handle missing, lost (failed), or special-cause treatments. The effectiveness of the methods does depend on how many treatments the experiment had and how many were "lost" (e.g., losing one treatment in a 16-run design is manageable).

Dr. Taguchi would suggest that the lost treatment may be the most informative. What did you learn?

When you say "fail", does that mean you can't measure the intended Y? Perhaps you could add additional Y's that might quantify the phenomenon?

Here are some options for replacement:

  1. Use the grand average of the existing data.
  2. Use your predicted value (or modified prediction based on the other results).
  3. Use regression to approximate the lost treatment by leaving the highest order (or least likely to be active) term out of the model. Model the remaining data and use the value from the prediction equation (saved prediction formula).
  4. Do all of the above and compare the results. If they agree, then the missing treatment did not appear to have much effect on the analysis. If the results disagree, then:
       1. Rerun the missing treatment combination (and perhaps some others?). Be aware of potential blocking effects and changes in the noise.
       2. Rerun the entire DOE (not sure this is efficient, as it will be in the same space and you still have potential block effects).
       3. If you anticipate (predict) this and run repeats, you could use the other data points for that treatment.
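Options 1 and 3 can be sketched for the 16-run half fraction discussed in this thread. With the intercept, 5 main effects, and 10 two-factor interactions, that design is saturated, so dropping one term and fitting the 15 remaining runs reproduces them exactly; the resulting prediction for the lost run is the value that makes the dropped term's contrast zero, which lets this sketch skip the matrix solve. That shortcut holds only for a saturated orthogonal ±1 design. The generator E = ABCD, the dropped term DE, the lost run, and the response values are all hypothetical:

```python
from itertools import product
from math import prod
from statistics import mean

FACTORS = "ABCDE"

def half_fraction():
    """16 coded runs of a 2^(5-1) design, assuming the generator E = ABCD."""
    return [(a, b, c, d, a * b * c * d) for a, b, c, d in product((-1, 1), repeat=4)]

def column(run, term):
    """Coded value of an effect column for one run, e.g. term 'DE' -> x_D * x_E."""
    return prod(run[FACTORS.index(f)] for f in term)

def estimate_missing(runs, y, missing, term):
    """Value for y[missing] that makes the contrast of `term` exactly zero.

    In a saturated orthogonal +/-1 design this equals the prediction from the
    model without `term`, fitted to the remaining runs.
    """
    s = sum(column(r, term) * v for i, (r, v) in enumerate(zip(runs, y)) if i != missing)
    # Solve x_T(missing) * y_missing + s = 0; since x_T = +/-1, 1 / x_T = x_T.
    return -column(runs[missing], term) * s

runs = half_fraction()
# Hypothetical responses; run 5 was lost (its value is never read).
y = [0.030, 0.024, 0.027, 0.021, 0.018, None, 0.016, 0.012,
     0.029, 0.025, 0.026, 0.020, 0.019, 0.015, 0.017, 0.011]
print("option 1, grand average:", round(mean(v for v in y if v is not None), 4))
print("option 3, drop DE:", round(estimate_missing(runs, y, 5, "DE"), 4))
```

Printing both replacement values side by side is the comparison step of option 4: if the analyses agree whichever value is used, the lost run did not matter much.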