Solved: Response for DOE run with repetition & non normal distributed

Justin_Bui · Jun 8, 2023 2:11 PM

Hi all,

I'm a newbie to DoE and am trying to use a 2^3 full factorial design.

Some details of my test:

- All 3 factors are categorical. I have 8 runs covering every combination of them.

- My response is continuous and the target is to minimize it. I measure the result while changing the factor conditions.

- I use repetition: for each run I built 20 pcs at the same factor settings.

When putting the data into the data table, I have some big questions.

1) There is only 1 box for the response result in the data table. Normally I would put in the mean of the 20 pcs of repetition. But the distribution of these 20 pcs is not normal. It's lognormal and extremely right-skewed. So what data should I put into the run's result (mean, std deviation, or something else)?

2) I tried putting in the mean or the std deviation and then fitting the regression model to the data. It shows a very bad fit.

Is that normal if the factors are all categorical?

Can I still use the factor profiler with my test to make some judgment about main effects and interaction effects?

Thanks so much for helping.

statman · Sep 17, 2022 09:59 AM

As I thought, you have repeats. I would do the following:

1. First, set your data table up so that the treatment combinations are grouped (there should be 20 rows for each treatment combination, 160 rows in total) with 1 column for the response variable (the 20 repeats correspond to the 20 rows for each treatment combination). If you have the 20 data points for each treatment as separate columns in your data table, then simply stack those columns.

2. Graph the within-treatment data. You can use the Variability Chart (Analyze>Quality and Process>Variability/Attribute Gauge Chart) or Graph Builder. If you use the Variability Chart, put run order or treatment combination in X, Grouping. You can also plot the distributions within treatment (Analyze>Distribution, By treatment). Look at the data. Are there any unusual data points? If not, you will need to determine which enumerative statistics best describe the central tendency and variation of those distributions.

3. Summarize the within-treatment variation (due to repeats: pc-to-pc, within-pc and measurement error components). Tables>Summary> highlight the Y and select the appropriate statistics from the Statistics drop-down menu (you will likely select multiple statistics)> put the factor columns in the Group window. This will get you back to the 8 treatments, with summary statistics for each treatment.

4. Perform your analysis of the experiment, always in the order Practical>Graphical>Quantitative. Do the data make sense? How do they compare to your predictions, your hypotheses? Did the Y change by an amount of any practical significance? How you analyze the data is a personal choice. I suggest saturating the model and using Normal, Pareto and Bayes plots, which are meant for analyzing un-replicated designs (e.g., Analyze>Fit Model). Identify the insignificant effects and remove them from the model, then re-run the analysis to get residual plots.
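
For readers working outside JMP, steps 1 and 3 above (stack the repeats, then summarize by treatment) can be sketched in plain Python. This is only an illustrative sketch: the factor names `A`, `B`, `C`, the column name `y`, and all data values are made up, and only 2 treatments with 4 repeats each are shown to keep it short.

```python
from statistics import mean, median, stdev

# Hypothetical "wide" layout: one row per treatment, repeats held together.
# All names and numbers here are invented for illustration.
wide = [
    {"A": "a1", "B": "b1", "C": "c1", "y": [1.2, 1.5, 1.1, 4.0]},
    {"A": "a2", "B": "b1", "C": "c1", "y": [2.3, 2.1, 2.8, 9.5]},
]


def stack(rows):
    """Turn one-row-per-treatment data into one row per measured piece."""
    return [
        {"A": r["A"], "B": r["B"], "C": r["C"], "rep": i + 1, "y": y}
        for r in rows
        for i, y in enumerate(r["y"])
    ]


def summarize(long_rows):
    """Group by the factor columns and compute several summary statistics."""
    groups = {}
    for r in long_rows:
        groups.setdefault((r["A"], r["B"], r["C"]), []).append(r["y"])
    return {
        key: {
            "n": len(ys),
            "mean": mean(ys),
            "median": median(ys),
            "std_dev": stdev(ys),
            "range": max(ys) - min(ys),
        }
        for key, ys in groups.items()
    }


long_rows = stack(wide)          # one row per piece
summary = summarize(long_rows)   # back to one row per treatment
for key, stats in summary.items():
    print(key, stats)
```

Each entry of `summary` then plays the role of one row in the 8-treatment table, with several candidate responses (location and spread) to analyze.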

"All models are wrong, some are useful" G.E.P. Box


ian_jmp · Sep 16, 2022 06:45 AM

The attached table might help to get you started. Open it up and click on the '+' sign of the Y column to see the formula that is generating the response values. Then run each of the saved scripts in turn. Note that (in the first script) if you have measured 20 pieces for each factor combination, you need to define the number of replicates as 19.
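
The arithmetic behind the "19 replicates" note can be sketched in Python: each treatment appears once as the original run plus once per replicate. This is a rough illustration only; the function `full_factorial` and the factor and level names are hypothetical, not anything taken from the attached table.

```python
from itertools import product


def full_factorial(levels, n_replicates=0):
    """Return run rows; each treatment appears 1 + n_replicates times."""
    names = list(levels)
    treatments = [dict(zip(names, combo)) for combo in product(*levels.values())]
    return [dict(t) for t in treatments for _ in range(1 + n_replicates)]


# Hypothetical two-level categorical factors.
levels = {
    "A": ["a1", "a2"],
    "B": ["b1", "b2"],
    "C": ["c1", "c2"],
}

base = full_factorial(levels)                    # 2^3 = 8 treatments
runs = full_factorial(levels, n_replicates=19)   # 8 * (1 + 19) rows
print(len(base), len(runs))
```

With 19 replicates the table has 8 x 20 = 160 rows, one per measured piece.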

Justin_Bui · Sep 16, 2022 11:48 PM

Thanks for your answer.
But is that really replication?
What I do is, for each run, build a batch of 20 pcs with no changing or resetting of the factors.
After finishing this batch I move to the next run with a new batch of 20 pcs.

For each run I observe that the sample distribution is right-skewed, but the mean and std deviation change between runs. I guess that significant factors cause this. But the model shows that no factor is significant (by p-value).

On the other hand, when I look at the factor profiler, I see the slopes, and they match my guess about which factors are important.

So I don't know how to draw a conclusion.

Victor_G · Sep 16, 2022 09:12 AM

Hello @Justin_Bui,

Welcome to the Community!

For information about full factorial designs, you can check the JMP help: Full Factorial Designs (jmp.com)

Just to add some information and explanations to the great work provided by @ian_jmp:

1) There are several options for dealing with "repetitions" (depending on whether they are replicate runs or repeats):

Repetition is making multiple response measurements on the same experimental run, while replication is making multiple experimental runs for each treatment combination.
In your case, you have run each treatment from the design several times (20 times in total), so you are in the replicates situation. You specify 19 as your number of replicates: you ran the original treatment combination from the design and then replicated it 19 times, so you end up with 20 pieces/experiments per treatment. So here you just add one row, with its response, for each experimental run you have done, since each is done on a different piece for each treatment of the design. There is no need to aggregate the results into a mean or std deviation here.

2) For the second part, your model certainly needs some "refining"/improvement.

For example, your last interaction "Materials*Assy side" doesn't seem significant (p-value 0.659 > 0.05), so it can be removed from the model. For an example of how to remove non-significant terms from your model, please check Reduce the Model (jmp.com)

Once you have entered all your experimental runs (20 experimental runs per treatment in the DOE) as rows in your data table and removed the non-significant terms, it will be a lot easier for you to assess the DoE model:

Do you have a significant model? Does it make sense given your domain expertise?
How much of the variability in your response measurements do the factors and model explain?
Are there interactions?
What are the effect sizes of the factors?
...

...and for us to help you further in the interpretation if needed.
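
As a rough illustration of the "effect sizes" question above, here is a minimal Python sketch that estimates main effects and interactions for a two-level full factorial from one summary value per treatment. It assumes each categorical factor has two levels coded -1/+1; the factor names `A`, `B`, `C` and the response values are made up, and this is the classical contrast calculation rather than the output of any particular JMP report.

```python
from itertools import combinations, product
from math import prod


def effect_estimates(response, names):
    """Estimate every effect of a two-level full factorial.

    response maps a tuple of -1/+1 settings to one summary value per
    treatment. Each effect is the average response where the contrast
    column is +1 minus the average where it is -1.
    """
    n = len(response)
    effects = {}
    for order in range(1, len(names) + 1):
        for idx in combinations(range(len(names)), order):
            contrast = sum(
                prod(run[i] for i in idx) * y for run, y in response.items()
            )
            effects["*".join(names[i] for i in idx)] = contrast / (n / 2)
    return effects


# Invented example: A and B matter, A*B is a small interaction, C does nothing.
runs = list(product([-1, 1], repeat=3))
response = {
    run: 10 + 2 * run[0] - 1.5 * run[1] + 0.5 * run[0] * run[1] for run in runs
}

effects = effect_estimates(response, ["A", "B", "C"])
# Print largest effects first, similar in spirit to a Pareto plot.
for name, value in sorted(effects.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:6s} {value:+.2f}")
```

With 8 treatment values and 7 effects plus the mean, this model is saturated, so the ranking of effect sizes is informative but there are no degrees of freedom left for p-values.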

Hope this answer will help you,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

Justin_Bui · Sep 16, 2022 11:30 PM

Thanks for your answer!
First, I think what I did is repeats rather than replicates. The 20 pcs were made without changing the factors (same run, no reset).
Think of it this way: for each run, I build a batch of 20 pcs, and these 20 pcs have a right-skewed distribution. So how should I enter the result in my data table?

About the second answer, I understand that we should remove non-significant factors or interactions from the model. But of all the factors I have, none is significant.

But with a descriptive approach I clearly see some changes in the test.

Victor_G · Sep 17, 2022 01:47 AM

OK, then you're right: these are repeats, not replicates, since there is no randomization.

Your idea of summarizing the results through the mean and std dev seems good. Depending on your response distribution (and on what makes sense for you on a practical level), you can replace the mean with the median (or mode), and the std dev with the variance or the range (max - min). You would then have two responses to optimize:
With the mean/median/mode, you can aim for your target, minimum or maximum (depending on your objective).
For the dispersion of the results (std dev, variance or range), your aim will be to minimize it (but perhaps with a lower importance), in order to find an optimized response with the best certainty about its repeatability.

Other options would be to transform your response (from lognormal to normal), to use Generalized Linear Regression with the right type of distribution, or to bootstrap the mean and std dev of each treatment using your 20 pieces per treatment combination.
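
Two of these options, the log transform and the bootstrap, can be sketched in plain Python for a single treatment. This is a minimal sketch under assumptions: the 20 values are simulated from a lognormal distribution rather than real data, and the helper names `log_summary` and `bootstrap_ci` are hypothetical. The bootstrap shown is a simple percentile interval.

```python
import math
import random
from statistics import mean, stdev


def log_summary(values):
    """Mean and std dev on the log scale (roughly normal if y is lognormal)."""
    logs = [math.log(v) for v in values]
    return mean(logs), stdev(logs)


def bootstrap_ci(values, stat=mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a statistic of one treatment's pieces."""
    rng = random.Random(seed)
    n = len(values)
    # Resample the pieces with replacement and recompute the statistic.
    stats = sorted(stat(rng.choices(values, k=n)) for _ in range(n_boot))
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Simulated stand-in for the 20 pieces of one treatment (not real data).
rng = random.Random(1)
pieces = [rng.lognormvariate(0.0, 1.0) for _ in range(20)]

log_mean, log_sd = log_summary(pieces)
ci_mean = bootstrap_ci(pieces, stat=mean)
ci_sd = bootstrap_ci(pieces, stat=stdev)
print("log-scale mean, sd:", log_mean, log_sd)
print("bootstrap 95% interval for mean:", ci_mean)
print("bootstrap 95% interval for std dev:", ci_sd)
```

The log-scale mean and std dev could then be used as the two responses per treatment, and the bootstrap intervals give a feel for how uncertain each summary is with only 20 pieces.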

If you don't have statistical significance but your model still has practical relevance, that is already interesting and a good sign. Even if the model here is not significant, the model and factors used explain 88% of the variability in your system.

Hope that it will help you,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

statman · Sep 16, 2022 10:54 AM

I'm not sure I understand exactly how the 20 pcs were obtained. It seems Ian and Victor think the 20 pcs are independent experimental units. That independence would come from changing the treatment combination between each pc obtained for a given treatment combination. This is considered replication. If, however, you set up a treatment and get 20 pcs without making any changes to the treatment combination, then you really have only one experimental unit that consists of 20 pcs. This is considered repeats. In the second case, the pcs are not considered independent units and therefore do not increase the DF of the experiment. In order to provide advice, this needs to be clarified. Based on your explanation, it looks like you have repeats?
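
The point about degrees of freedom can be made concrete with a small Python sketch. It simply counts independent experimental units against estimated model terms; the function name `error_df` is hypothetical, and the term counts assume two-level factors.

```python
def error_df(n_experimental_units, n_model_terms):
    """Residual degrees of freedom: independent units minus estimated terms."""
    return n_experimental_units - n_model_terms


n_treatments = 2 ** 3        # 8 treatment combinations
pieces_per_treatment = 20

# Term counts for two-level factors, including the intercept.
saturated_terms = 8          # 1 + 3 main effects + 3 two-way + 1 three-way
main_effects_terms = 4       # 1 + 3 main effects

# Repeats: each treatment is ONE experimental unit, whatever the piece count.
print(error_df(n_treatments, saturated_terms))       # 0
print(error_df(n_treatments, main_effects_terms))    # 4

# Replicates: every piece would count as its own experimental unit.
print(error_df(n_treatments * pieces_per_treatment, saturated_terms))  # 152
```

Treating 160 repeated pieces as replicates inflates the error degrees of freedom from at most a handful to 152, which is why the distinction has to be settled before the model is fitted.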

One other observation from your attached picture: you are missing an effect (the third-order interaction) in your effect summary.

"All models are wrong, some are useful" G.E.P. Box

Justin_Bui · Sep 16, 2022 11:24 PM

Yes. I don't change the factor settings, and I take 20 pcs for each run, then move on to the next run, so in my opinion these are repeats.
