Re: DOE Question - Too Few Degrees of Freedom

Jun 8, 2023 2:09 PM

I am very new to the JMP community - just got my account up and running the other day.

I am running an experiment and I wish to see the effects of the main factors and all two-way interactions. I am using an L27 Taguchi array to account for four 3-level factors with three responses per row/per trial. I left some columns blank to make room for the two-way interactions. However, what I don't understand is that when I run Fit Model and look at the Effect Tests, it appears that I do not have enough degrees of freedom to see whether or not my main factors and my two-way interactions are significant. (I plan on using an F-test to determine significance.)

From my understanding, the total degrees of freedom for a given design/system is N-1, where N is the number of responses, or data points. I should have (27 runs * 3 responses per run) - 1 = 80 degrees of freedom. Even with subtracting off the #levels - 1 degrees of freedom for each main factor (80 less 8 = 72 left now) and the degrees of freedom for each two-way interaction, I don't see how I could run out of degrees of freedom.

Maybe I input something wrong in the table in JMP? Note: I did change some of the combinations of the numbers/levels for each run in my Taguchi array according to the columns that I left blank; see my attachments for my JMP file and my raw data.

Thank you in advance for your help and guidance!

Georg · Apr 11, 2022 2:05 AM

Welcome to the Community.

I think you need to specify your two-way interactions in the model dialog (or already in the DOE generation dialog, where it affects power and the number of runs). The script saved in your table contains only main effects.

Georg

Dan_Obermiller · Apr 11, 2022 07:56 AM

You have all of your factors set up as nominal. With three levels, that means that each main effect will require two degrees of freedom. The interactions will require even more.

If you convert your columns to be numeric, continuous, you can estimate the model that you want. I have included that revised version of your table.

Dan Obermiller

CentroidRabbit1 · Apr 11, 2022 10:28 AM

Hi @Dan_Obermiller . That worked; I can now see the effects of the main factors and interactions. Can you explain/elaborate a bit more on what you did?

Here are my questions:

1) How did you convert the columns to numeric, continuous? What guided you to make this decision? What did you mean when you said that my original model was "nominal"?

2) How did you control the degrees of freedom for each interaction/factor by making this change to numeric, continuous?

3) What are the differences between my original model (without the numeric, continuous option selected) and the revised model (including the numeric, continuous option)? Why is the revised model a better option?

Thank you for your help!

Dan_Obermiller · Apr 11, 2022 8:56 AM

Before I explain what I did, I must point out that I am not certain that you SHOULD do it. Please see the post by @statman for some very good questions.

To answer your questions 2 and 3: I only took a very cursory look at your table and saw the factors were nominal. I also saw that the factors were "left-aligned" within the column indicating that they were text fields even though the values were numbers. Nominal factors will always take more degrees of freedom because the idea of interpolation does not exist. For example, -1, 0, +1 only uses one degree of freedom if treated as continuous. Conceptually, you can safely ignore the 0 since you can estimate the slope using only -1 and +1 (again, this is the concept, not what actually happens mathematically). But for those same three values, if they are nominal, I could call them A, B, and C. Now I need to see the effect of A as well as the effect from B. (C can be obtained by difference). That is two degrees of freedom as I cannot use interpolation to help me.
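To make the counting concrete, here is a minimal Python sketch (not JMP output). The helper names `continuous_columns`, `nominal_columns`, and `interaction_columns` are made up for illustration, and the sketch assumes indicator coding for nominal factors with the last level obtained by difference. It builds the model-matrix columns for two three-level factors coded -1, 0, +1 on a full 3x3 grid and counts how many columns (degrees of freedom) each treatment of the factors needs.

```python
import numpy as np

levels = np.array([-1, 0, 1])

def continuous_columns(x):
    """One column holding the numeric value: a single slope is estimated."""
    return x.reshape(-1, 1).astype(float)

def nominal_columns(x):
    """Indicator columns for all levels except the last (obtained by difference)."""
    cats = np.unique(x)
    return np.column_stack([(x == c).astype(float) for c in cats[:-1]])

def interaction_columns(a, b):
    """All pairwise products between the columns of a and the columns of b."""
    return np.column_stack(
        [a[:, i] * b[:, j] for i in range(a.shape[1]) for j in range(b.shape[1])]
    )

# Full 3x3 grid of two hypothetical factors A and B (9 treatment combinations).
A, B = [g.ravel() for g in np.meshgrid(levels, levels, indexing="ij")]

A_cont, B_cont = continuous_columns(A), continuous_columns(B)
A_nom, B_nom = nominal_columns(A), nominal_columns(B)

df_main_cont = A_cont.shape[1]                            # slope only
df_main_nom = A_nom.shape[1]                              # levels - 1
df_int_cont = interaction_columns(A_cont, B_cont).shape[1]
df_int_nom = interaction_columns(A_nom, B_nom).shape[1]   # (levels - 1)^2

# Rank of the full model matrix: intercept + main effects + interaction.
ones = np.ones((A.size, 1))
rank_cont = np.linalg.matrix_rank(
    np.hstack([ones, A_cont, B_cont, interaction_columns(A_cont, B_cont)])
)
rank_nom = np.linalg.matrix_rank(
    np.hstack([ones, A_nom, B_nom, interaction_columns(A_nom, B_nom)])
)

print("main effect DF   (continuous, nominal):", df_main_cont, df_main_nom)
print("interaction DF   (continuous, nominal):", df_int_cont, df_int_nom)
print("model matrix rank (continuous, nominal):", rank_cont, rank_nom)
```

This only illustrates the bookkeeping. It says nothing about whether treating the factors as continuous is appropriate for the data.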

Note that I am not saying the model with continuous factors is the better model. That is a different question. I am only saying this is a way for you to get a model with interactions. Does it make sense? I don't know. You know your data. Does it make sense to think of any of these as continuous variables? Based on their names, I am not sure it does. Again, refer to some of @statman's questions. Taguchi designs are often resolution III designs which means they are not supposed to estimate interactions.

As for your first question, here is how I made the changes in JMP. I selected your factor columns. I went to Cols > Standardize Attributes. I clicked on Attributes and chose Data Type. I changed the Data Type to Numeric.

I went back to Attributes and selected Modeling Type. I changed the Modeling Type field to Continuous. Clicked OK. Fit the model.

Dan Obermiller

CentroidRabbit1 · Apr 11, 2022 1:56 PM

You bring up a good point about whether the factors are continuous or discrete. Based on this post, "Continuous Vs Discrete Numeric Factor", it seems like my factor levels are discrete. What do I mean by this? I have chosen discrete/specific levels for each factor that are convenient for my experiment. For example, I could have chosen any value for each level of the factor that was experimentally possible, such as 50 inches for ball position 1, 0.5 inches for ball position 2, 1000 inches for ball position 3, etc. However, when reflecting this information in the Taguchi array, I decided to leave the labels 1, 2, 3 instead of 50 inches, 0.5 inches, 1000 inches to reflect the levels of each factor, so I could clearly see the levels in my design table/array.

I don't think that using labels such as 1, 2, 3 for each factor instead of the actual values of 50 inches, 0.5 inches, and 1000 inches would throw off my analysis?

statman · Apr 11, 2022 11:08 AM

I have the following thoughts:

An important question regarding your "runs", are these repeats or are they replicates? If the runs are done without any change to the treatment combinations, they are repeats and are not considered independent events (so they do not contribute to degrees of freedom). I don't see run order in your table (always good to keep track of run order). It appears they are repeats as you have one SN ratio for each treatment. Therefore you have 26 degrees of freedom total; 8 for your main effects (4 linear, 4 quadratic) and 18 for higher order effects. 2-factor interactions "require" 4 degrees of freedom each, so you do not have enough DF's.
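The degrees-of-freedom budget described above can be tallied in a few lines. This is only an arithmetic sketch in Python (variable names are illustrative), assuming the three values per treatment are repeats and therefore add no degrees of freedom, and that all four factors are treated as 3-level nominal factors.

```python
from math import comb

n_treatments = 27   # rows of the L27 array
n_factors = 4
n_levels = 3

# Repeats within a treatment are not independent, so only treatments count.
total_df = n_treatments - 1                      # 26

# Each 3-level main effect needs (levels - 1) DF (linear + quadratic).
main_df = n_factors * (n_levels - 1)             # 8

# DF left over after fitting the main effects.
remaining_df = total_df - main_df                # 18

# Each two-factor interaction needs (levels - 1)^2 DF.
n_two_factor = comb(n_factors, 2)                # 6 pairs of factors
interaction_df = n_two_factor * (n_levels - 1) ** 2

required_df = main_df + interaction_df
shortfall = required_df - total_df

print(f"total DF available:        {total_df}")
print(f"main effects:              {main_df}")
print(f"left for higher order:     {remaining_df}")
print(f"all 2-factor interactions: {interaction_df}")
print(f"shortfall:                 {shortfall}")
```

Under these assumptions the full model with every two-factor interaction asks for more degrees of freedom than the 27 treatments can supply, which matches the message in Effect Tests.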

A couple of questions about the data table:

1. The mean of the three runs does not correspond to the mean column in your data table? What data was used to calculate the means? Your formula is just calculating the mean of run 1 and 3?

2. Your equation for SN ratio appears to be the standard deviation of runs 1&3? This is not the correct formula for SN ratio (per Taguchi)? If the target value of the response variable is Larger or Smaller is better, then just use the Standard deviation. If you are trying to hit a target nominal, then there is a different equation. What is the target for the response?

https://www.jmp.com/support/help/en/16.2/?os=mac&source=application&utm_source=helpmenu&utm_medium=a...

3. You have an unusual range of data from the 3 runs for treatment (3 2 3 3, levels corresponding to factors in order of your table)?

4. If you take your design and go to DOE>Design Diagnostics>Evaluate Design, you may be able to figure out what the appropriate model is? Here is a color map of correlations:

[Image: color map of correlations, Screen Shot 2022-04-11 at 9.07.00 AM.jpg]

5. I don't see the outer array? One of the most significant contributions of Taguchi was to experiment on controllable factors over a factorial of noise factors. The SN ratio was meant to be calculated over the data collected from the repeated runs over the factorial of noise (outer array).
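For reference on question 2, here is a hedged Python sketch of the three SN ratio forms commonly attributed to Taguchi, as I understand them. The function names and the example data are hypothetical, and the sketch is not taken from the attached table. Which form applies depends on the goal for the response.

```python
from math import log10
from statistics import mean, variance

def sn_larger_is_better(y):
    """SN = -10 * log10( mean(1 / y_i^2) ); assumes all y_i are nonzero."""
    return -10 * log10(mean(1 / (v * v) for v in y))

def sn_smaller_is_better(y):
    """SN = -10 * log10( mean(y_i^2) )."""
    return -10 * log10(mean(v * v for v in y))

def sn_nominal_is_best(y):
    """SN = 10 * log10( ybar^2 / s^2 ), with s^2 the sample variance."""
    ybar = mean(y)
    return 10 * log10(ybar * ybar / variance(y))

# Hypothetical repeats for a single treatment (all three values are used).
repeats = [9.8, 10.1, 10.3]

print("larger is better: ", sn_larger_is_better(repeats))
print("smaller is better:", sn_smaller_is_better(repeats))
print("nominal is best:  ", sn_nominal_is_best(repeats))
```

Note that each function uses every repeat in the treatment, so a formula that picks up only runs 1 and 3 would give a different value.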

I've attached a data table with what I think are the correct means and standard deviations for each treatment. I saved the Fit Model platform, though I did not completely diagnose the singularity.

"All models are wrong, some are useful" G.E.P. Box

CentroidRabbit1 · Apr 11, 2022 02:25 PM

Hi @statman, thanks for your explanation. I appreciate your patience and detailed response.

Here are the answers to your questions, along with follow-ups that I have:

I have the following thoughts:

Q1) An important question regarding your "runs", are these repeats or are they replicates?

A1) Based on your explanation, they are repeats. I did not change any of the treatments for each trial within each run/treatment.

Q2) It appears they are repeats as you have one SN ratio for each treatment. Therefore you have 26 degrees of freedom total; 8 for your main effects (4 linear, 4 quadratic) and 18 for higher order effects. 2-factor interactions "require" 4 degrees of freedom each, so you do not have enough DF's.

A2.1) Ok, this is where I made a mistake. I assumed that each trial within each run contributed to the DF (27 runs * 3 repeats per run - 1 = 80 degrees of freedom). However, I'm not too familiar with the SN ratio; this was something that was auto-generated when I created the Taguchi design in JMP.

A2.2) If my understanding from my last statement is correct, I only have 26 degrees of freedom and need 6 more. Would one way to see these other effects be to complete an F-test on the original data set in Effect Tests, see which factors are significant, and then create another Taguchi design comparing only the factors/interactions that were significant in the original data set, plus the effects that I couldn't see in the original because I didn't have enough DFs? For example, say factor B is insignificant. Then, you know that factor B cross factor C would also be insignificant, so you would want to get rid of it in a new design as it is eating up precious DFs?

A couple of questions about the data table:

Q1) The mean of the three runs does not correspond to the mean column in your data table? What data was used to calculate the means? Your formula is just calculating the mean of run 1 and 3?

A1) This is an error on my end; it stemmed from the fact that I inserted a column for run 2 without also updating the mean column. I assumed that the mean column would automatically update to take the average of runs 1, 2, 3 without me changing that column's settings.

Q2) Your equation for SN ratio appears to be the standard deviation of runs 1&3? This is not the correct formula for SN ratio (per Taguchi)? If the target value of the response variable is Larger or Smaller is better, then just use the Standard deviation. If you are trying to hit a target nominal, then there is a different equation. What is the target for the response?

A2) This is probably a carry-over error from not getting the correct mean above.

Q3) You have an unusual range of data from the 3 runs for treatment (3 2 3 3, levels corresponding to factors in order of your table)?

A3) I'm not sure what you meant by this. Can you rephrase?

Q4) If you take your design and go to DOE>Design Diagnostics>Evaluate Design, you may be able to figure out what the appropriate model is? Here is a color map of correlations.

A4) By this, I'm guessing you're talking about whether to use discrete or continuous variables, and what type of regression to use?

Q5) I don't see the outer array? One of the most significant contributions of Taguchi was to experiment on controllable factors over a factorial of noise factors. The SN ratio was meant to be calculated over the data collected from the repeated runs over the factorial of noise (outer array).

A5) Based on this link - https://www.jmp.com/support/help/en/16.2/index.shtml#page/jmp/factors-8.shtml# - I'm pretty sure I only have signal factors. I am able to control all of the inputs. Therefore, would I still need an outer array?

Thank you for your help!

statman · Apr 11, 2022 06:15 PM

Here are my responses for clarification. I also agree with Pete: if you are new to experimentation, some fundamental understanding, and especially an understanding of Taguchi, is suggested.

1. "However, I'm not too familiar with the SN ratio; this was something that was auto-generated when I created the Taguchi design in JMP."

Again, you need to understand the methodology you are using, and you should determine if this is the approach you want to take. I don't see any outer array, so SN ratios aren't that useful. You should investigate cross product arrays (Cox) and inner/outer arrays (Taguchi).

2. Regarding your point A2.2: First look at the data for practical significance. Did the response change by an amount of any practical value? You don't have randomized replicates, so the F-test might be meaningless. I would saturate the model and use Normal and Pareto plots to determine significance. Then iterate with another experiment.

3. A2 data table. There is a different equation based on what target you want for the response variable. In any case the one you're using is incorrect.

4. Look at the results for the 3 runs for the specified treatment. The variation in that treatment is extraordinary. This is not likely due to factor effects, so it is likely due to noise. You might want to evaluate the integrity of those data points before summarizing and using that summary data for analysis.

5. A4, NO. This has nothing to do with data type. It is evaluating the aliasing of your design. You should have written the model you were investigating BEFORE running the experiment.

To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.

Sir Ronald Fisher

"All models are wrong, some are useful" G.E.P. Box

P_Bartell · Apr 11, 2022 01:25 PM

I'm coming to this thread with a different line of questioning from @Dan_Obermiller and @statman, although I concur with all their thoughts. I'm going to ask one question: What is your level of expertise with respect to the Design of Experiments and modeling process? If it's novice, I would cease and desist any further experimentation or analysis and run to the SAS "Statistical Thinking for Industrial Problem Solving" course and complete the entire course end to end. Here is a link to the course landing page:

Statistical Thinking for Industrial Problem Solving

The course is free, all you need is a web browser and curiosity. You don't even need JMP...the course design is such that as you complete the course and exercises you will be running JMP Pro in a virtual machine environment.
