Discussions

SDF1 · Jun 8, 2023 5:31 PM

Dear JMP Community,

Hoping to hear some thoughts regarding a DOE design targeted at examining standard deviations of a test method.

My colleagues have a test that they perform on a product we make, and they want to see whether they can change parts of the test methodology in order to reduce standard deviations. I'm not aware of a specific DOE platform in JMP that is designed to do this, so we ended up using the custom platform.

After some discussions, we settled on four factors that could be changed: X1 -- continuous factor (-1,1), X2 -- continuous factor (-1,1), X3 -- 2-level categorical (L1, L2), and X4 -- 6-level categorical (L1-L6). We were really interested in each of the four main effects and their first-order interactions: X1, X2, X3, X4, X1*X2, X1*X3, X1*X4, X2*X3, X2*X4, and X3*X4 (10 effects in all). We went with 36 as the total number of runs. But, in order to get an estimate of the standard deviation at each setting, we replicated the DOE a total of 4 times -- for a total of 144 tests. That was about all we could afford.

We can of course run the DOE evaluation on all 144 tests to estimate the mean response (and noise of the experimentation) with the model factors, which we did. But what we really want to look at is whether or not a treatment setting changed the test method standard deviation. To do this, we calculated a standard deviation from the replicated runs at each setting (reducing the data back to 36 points) and ran those results through the model.
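A minimal sketch of that collapse step, in Python rather than JMP, may make the bookkeeping clearer. The run IDs and measurement values below are made up for illustration, and the `log_sd` column is an assumption: taking the log of the standard deviation is a common choice when modeling dispersion, but the thread does not say it was done here.

```python
import math
import statistics
from collections import defaultdict

# Hypothetical long-format data: (run_id, measurement). Values are invented;
# the real table would hold 36 runs x 4 measurements = 144 rows.
measurements = [
    (1, 10.2), (1, 10.8), (1, 9.9), (1, 10.5),
    (2, 12.1), (2, 12.0), (2, 12.3), (2, 11.9),
    (3, 8.4), (3, 9.6), (3, 7.9), (3, 9.1),
]

def summarize_by_run(rows):
    """Collapse repeated measurements to one summary row per run."""
    groups = defaultdict(list)
    for run_id, value in rows:
        groups[run_id].append(value)
    summary = {}
    for run_id, values in sorted(groups.items()):
        sd = statistics.stdev(values)  # sample std dev (n - 1 denominator)
        summary[run_id] = {
            "n": len(values),
            "mean": statistics.mean(values),
            "sd": sd,
            # Assumed transform; fails if all values in a run are identical (sd == 0).
            "log_sd": math.log(sd),
        }
    return summary

summary = summarize_by_run(measurements)
for run_id, stats in summary.items():
    print(run_id, round(stats["mean"], 3), round(stats["sd"], 3), round(stats["log_sd"], 3))
```

Each summary row, joined back to its factor settings, would then be one observation in the std dev model, so that model is fit to 36 rows, not 144.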

To me, this all seems like a reasonable way to go about performing a DOE where the goal is to evaluate changes in standard deviations and not means. However, if there is a better approach (maybe from experience), or a DOE specifically designed to do this, we'd be glad to learn about it and perhaps adopt it for the next attempts.

Thoughts and feedback much appreciated.

Thanks!,

DS

P.S. In case you're interested, after evaluating all the individual measurements and the standard deviations from the replicates, we found that factor X4 had the biggest impact on the standard deviations. Unfortunately, when we later ran some verification runs -- now just comparing 32 runs (total) across two of the levels in X4 -- the results did not support the original DOE, and we could not see the improvement in standard deviation we were hoping for (on the plus side, it wasn't any worse!).

statman · Mar 26, 2021 03:48 PM

Here are some of my thoughts, though your description of the situation is not sufficient to provide specific advice.

1. To clarify, you are interested in understanding what factors may influence the variation of the test system (precision). Are there any hypotheses about whether the precision is influenced by different instruments or operators? If so, this is reproducibility precision. If it is just due to instrument variation, this is repeatability precision. This is important because you want to know how to acquire measurements so the proper summary statistics are used. Is there interest in stability? How about discrimination?

2. You need to consider response variables and how they will be obtained. I don't know what you are testing. I don't see any information about how many samples will be measured. You should be doing repeated measures of the same sample to get measurement errors (perhaps in the same location). Whether you include op-to-op or just repeatability precision should be a function of your hypotheses.

3. If you really did replicates, you confounded treatment-to-treatment variation with measurement errors. This is not how I would approach the problem. I would be using repeats, not replicates to get the proper summary statistics. Think about what data is being used to estimate the summary statistic (e.g., standard deviation). What is changing between those data points? You want to just have measurement components captured in the data points (minimize the other components within subgroup) you are using to estimate standard deviation.

4. I don't know where you are in the knowledge continuum, but you biased your study toward factor X4 by testing it at so many levels (think degrees of freedom). I would have started with all factors at 2 levels (pick the extremes of the factor X4 categories) and proceeded as if it were a screening design.
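To put rough numbers on the degrees-of-freedom remark in point 4, here is a small Python sketch. It assumes the model from the original post (four main effects plus all two-factor interactions), treats each continuous factor as a single linear term, and takes the 36 per-treatment summary rows as the data for the std dev model. The variable names are illustrative only.

```python
from itertools import combinations

# Levels per factor as described in the original post.
# Continuous factors are treated as two-level (one linear term each).
levels = {"X1": 2, "X2": 2, "X3": 2, "X4": 6}

def effect_df(factors):
    """Degrees of freedom for an effect: product of (levels - 1) over its factors."""
    df = 1
    for name in factors:
        df *= levels[name] - 1
    return df

main_effects = [(name,) for name in levels]
two_factor = list(combinations(levels, 2))
df_table = {"*".join(term): effect_df(term) for term in main_effects + two_factor}

model_df = sum(df_table.values())
x4_df = sum(df for term, df in df_table.items() if "X4" in term.split("*"))
runs = 36
error_df = runs - 1 - model_df  # subtract 1 for the intercept

print(df_table)
print("model df:", model_df, "| df involving X4:", x4_df, "| left for error:", error_df)
# → model df: 26 | df involving X4: 20 | left for error: 9
```

Under these assumptions, 20 of the 26 model degrees of freedom involve X4, which is one way to read the comment that the study leans toward that factor.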

"All models are wrong, some are useful" G.E.P. Box

SDF1 · Mar 29, 2021 11:10 AM

Hi @statman ,

Thanks for your thoughts and feedback. To address your comments/questions:

1. Yes, the people running the DOE were interested in modifying different factors they thought were most influential on the noise of the test. Each of the factors tested is known to impact the mean response of the measurements, but we wanted to see if the factors and their interactions could also lead to an improvement in the noise. Prior tests have shown that the instruments involved and operators are not as noisy as test methodology. One of the big problems with this test is that we are dealing with small-grained powder with very small amounts -- less than 1/4 of a gram. So, my guess is that particle size-induced segregation has a big role. One of the factors tested was the amount, but we couldn't find any improvement in standard deviations with using more material.

2/3. As I mentioned in the first post, based on the 10 effects (main effects and interactions), JMP recommended 36 runs. Each run (treatment) was measured four times in order to calculate a std dev for each treatment, so a total of 144 individual measurements were performed. So, to address your concern, we did do repeated measures of the "same" sample. I put that in quotes because the test sample is consumed in the process of the test. But the source material was generated for each run, and samples from this were then measured multiple times (4) for each treatment. Replicate might not be the right term -- but it's what we did in JMP to augment the DOE and keep track of each run that we needed to do.

4. X4 has to do with the fact that we're dealing with powders and need to look at different PSD ranges, hence the many levels of X4. We know from previous tests that the low and high end of the PSD has a big impact on the mean of the response, and there was indication std dev could be reduced. The extra levels were added to see if we can use narrower PSD sizes to reduce the noise. This DOE ended up giving an indication that is possible, but verification runs did not support that outcome. The other factors are all high/low settings, only two of which are actually continuous.

Unfortunately, our material is a powder and powders are not "nicely" behaved like semiconductor units, or machined items for example. I think that the methodology is just inherently noisy because of this and is about as good as it's going to get.

Thanks again for your thoughts, some good points to consider for my colleagues when they want to do something similar in the future.

Thanks!,

DS

statman · Mar 29, 2021 02:16 PM

OK, thanks for the additional information you have provided on the situation. I still do not know what characteristics/properties of the powders you are trying to quantify.

If I understand correctly, this measurement is altering or destructive (the identical sample cannot be measured twice). Since this is the case, it is impossible to separate the measurement repeatability from the sample-to-sample variation. In these cases, there are some options to "bias" the effect to the measurement component. I have attached my notes on how to handle this.

You should not analyze the repeats (which are nested within treatment) as replicates. You can look at those 4 data points on a range control chart and then summarize those 4 data points for each treatment (mean and std. dev., for example) if appropriate. The model you are most interested in is the one for the response variable of std dev. (or an equivalent summary statistic).
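The range-chart check on the repeats could be sketched as follows in Python. The treatment labels and measurements are invented, and only six treatments are shown instead of 36. The constants D3 = 0 and D4 = 2.282 are the standard Shewhart range-chart factors for subgroups of size 4.

```python
# Range-chart constants for subgroups of size 4 (standard Shewhart table values).
D3_N4 = 0.0
D4_N4 = 2.282

# Hypothetical repeats: treatment id -> four measurements (made-up numbers).
repeats = {
    "T01": [10.2, 10.8, 9.9, 10.5],
    "T02": [12.1, 12.0, 12.3, 11.9],
    "T03": [8.4, 9.6, 7.9, 9.1],
    "T04": [11.0, 11.2, 14.9, 10.9],  # one unusual repeat
    "T05": [9.5, 9.9, 9.7, 10.1],
    "T06": [10.4, 10.0, 10.9, 10.6],
}

def range_chart(subgroups, d3=D3_N4, d4=D4_N4):
    """Return per-subgroup ranges, the average range, control limits, and flags."""
    ranges = {key: max(vals) - min(vals) for key, vals in subgroups.items()}
    r_bar = sum(ranges.values()) / len(ranges)
    lcl, ucl = d3 * r_bar, d4 * r_bar
    flagged = [key for key, r in ranges.items() if r > ucl or r < lcl]
    return ranges, r_bar, lcl, ucl, flagged

ranges, r_bar, lcl, ucl, flagged = range_chart(repeats)
print("average range:", round(r_bar, 3), "| UCL:", round(ucl, 3))
print("treatments with out-of-limit ranges:", flagged)
```

A flagged treatment suggests one of its repeats is unusual and worth investigating before its std dev is used as a response in the model.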

Replicates- treatment combinations change between each "data point" (we usually call these experimental units...they are considered independent and can be treated as such).

Repeats - the treatment combinations do NOT change between "data points". These are not to be considered independent events (therefore not increasing degrees of freedom) and this variation is nested in treatment.

Just my own biased commentary... when experimenting with more than two levels of a factor, I am suspicious of a "pick the winner" mentality. What I want to do first is a fair comparison to see what factors/interactions have an effect on variation, and then in a subsequent iteration fine-tune the level settings for the factors that are significant.

I would say every situation has its own inherent challenges. Powders no more than others. I've used particle size analyzers to help understand the distribution of particle sizes, SEM for chemical composition and topography and even an OJAY for nano measurements.

"All models are wrong, some are useful" G.E.P. Box

SDF1 · Mar 29, 2021 03:56 PM

Hi @statman ,

Thanks again for continuing the conversation on this. Unfortunately, I am not at liberty to discuss the exact type of test we're doing or what the material is that we are testing. But, yes, the exact same sample cannot be tested twice. Each test uses up material, which makes analyzing things a bit more complicated.

The analysis you describe is essentially what they did, consistent with the attached PDF. They didn't analyze the data as replicates; they only used that feature in JMP to make the data table for the lab techs to follow. They settled on those factors (among others) because they were easy to change and had been shown in previous studies to impact the mean. Basically, Option 1 is the only option we have because of the specifics of our case, and this is true for the many other tests we run to characterize our material.

Unfortunately, because of limitations regarding industry standards and testing, there are also some things we cannot change, which might have more of an impact on our results than the factors do. We simply can't change them, either because of customer requirements or because of what is considered acceptable in the industry. In many ways, it's like trying to do something that needs both hands when one is tied behind your back.

We too have PSD analyzers, some 2D and some 3D, but so far, all we can really nail down is that the factors that have been examined definitely modify the mean response we're testing, but they don't have much impact on the variation in the test method. It would be very cool if there was a specific kind of DOE structure that was specifically geared toward testing variance. Maybe this will be the next development in DOEs?

Thanks again for the input/feedback!,

DS

DOE design to specifically study variance (or std dev)
