How to correctly interpret a definitive screening design (DSD)

Wardiam · Jun 8, 2023 2:01 PM

I am using JMP 14.3 to optimize a workflow with 6 input variables, each at 3 levels (the data table is attached). My idea is to use JMP to find the combination of the 6 variables that gives the best output while performing the fewest experiments.

It seems to me that the best option is to run a definitive screening design (DSD) to see the main effects. I defined my factor table with the extreme values (minimum and maximum for each variable) and created the table of experiments to be performed. The generated table proposes 17 experiments.

Over several weeks I carried out the 17 experiments and added the results (Y) to the attached table. With these data I fitted the screening model and found that 4 of the variables are statistically significant. In the Prediction Profiler I selected Optimization and Desirability > Maximize Desirability to obtain the optimal combination of factors.

At this point I have several questions:

1. What values/levels should I choose for the non-significant factors? Could I choose any of them?

2. I would like to create a model to predict or estimate the output (Y) based on the selected values of the factors. How could I do this? Could I simulate a larger number of values?

3. Could I get an equation (or formula) from the model?

Thank you very much.

Best regards,

Wardiam

Mark_Bailey · Nov 24, 2020 11:09 AM

Did you see the Fit Definitive Screening command in the DOE menu? This platform performs a fully automated analysis and selects the best model for you. You can then jump to the Fit Least Squares platform by clicking the Make Model or Run Model button. You can use the Prediction Profiler to explore or optimize the response. You can also save the fitted model as a new column formula.

Did you read the JMP > Help > Documentation > Fitting Linear Models guide?
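
As a rough illustration of what "save the fitted model as a column formula, then optimize it" amounts to, here is a minimal Python sketch outside JMP. The coefficients, the factor names A to D, and the helper names `predict` and `to_coded` are all hypothetical; nothing here is taken from the attached data table, and the search only visits the three tested levels of each factor, whereas the Prediction Profiler can search between them.

```python
from itertools import product

# Hypothetical coefficients in coded units (-1 = low, 0 = middle, +1 = high).
# They are invented for illustration and are NOT fitted to the thread's data.
COEF = {
    "intercept": 50.0,
    "A": 4.0,
    "B": -2.5,
    "C": 1.5,
    "D": 3.0,
    "A*D": 1.0,   # two-factor interaction
    "B*B": -2.0,  # quadratic effect
}


def to_coded(actual: float, low: float, high: float) -> float:
    """Map an actual factor setting onto the coded -1..+1 scale."""
    return (actual - (low + high) / 2.0) / ((high - low) / 2.0)


def predict(x: dict) -> float:
    """Evaluate the prediction formula at coded factor settings x."""
    y = COEF["intercept"]
    for term, b in COEF.items():
        if term == "intercept":
            continue
        value = 1.0
        for name in term.split("*"):
            value *= x[name]
        y += b * value
    return y


# "Simulate" the response at every combination of the three levels
# (3^4 = 81 settings) and keep the one with the largest prediction.
factors = ["A", "B", "C", "D"]
levels = (-1.0, 0.0, 1.0)
grid = [dict(zip(factors, combo)) for combo in product(levels, repeat=len(factors))]
best = max(grid, key=predict)
print("best coded settings:", best, "predicted Y:", predict(best))
```

Any prediction from such a formula is only as trustworthy as the model behind it, and only inside the factor ranges that were actually tested.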

Mark_Bailey · Nov 24, 2020 11:10 AM

Also, you did not specify if your situation is a screening study. The DSD, like any screening design, assumes the key screening principles.

Wardiam · Nov 24, 2020 12:47 PM

Dear Mark,

Thank you for answering. Indeed, I think my study is a screening situation, since I have multiple variables to test and I am looking for the combination of those variables that gives the best output. This is the main objective.

At the same time, I wanted to use my question in the forum to find out whether any additional analysis of the same data would give me a pattern or a formula that could predict the expected output if I change the values of the variables in the future, without performing the experiment. This second question is more out of curiosity and is secondary.

Could you help me, please?

Thank you for your help.

Wardiam

Mark_Bailey · Nov 24, 2020 01:07 PM

One of the most important screening principles is sparsity of effects. It means that you expect to eliminate at least half of the factors through screening. Screening designs are economical through this principle. They do not provide the data necessary to fit the model with all the terms.

I explained the process of model selection and exploitation already. Please specify which step is unclear.

Have you seen the documentation for DSD? It starts here.
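
To put a number on why a screening design cannot support the model with all the terms, here is a small Python count of the terms in a full second-order model. The factor count (6) and run count (17) come from the original question; the function name `full_quadratic_terms` is just for this sketch.

```python
from math import comb


def full_quadratic_terms(k: int) -> dict:
    """Count the terms in a full second-order model for k continuous factors."""
    return {
        "intercept": 1,
        "main effects": k,
        "two-factor interactions": comb(k, 2),
        "quadratic effects": k,
    }


k, runs = 6, 17  # six factors and 17 runs, as in the original question
terms = full_quadratic_terms(k)
total = sum(terms.values())

for name, count in terms.items():
    print(f"{name:>24}: {count}")
print(f"{'total':>24}: {total} (runs available: {runs})")
```

With more candidate terms than runs, the full model cannot be estimated, so the analysis has to rely on only a few effects being active.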

statman · Nov 24, 2020 04:14 PM

Mark has already given you appropriate direction. I just want to comment on your objective: "I think my study is a screening situation since I have multiple variables to test and I look for the optimal combination of all those variables to get the best output. This is the main objective."

This is not the intent of a screening design. To reiterate Mark's comments, the purpose of a screening design is to determine which subset of the entire set is worthy of further investigation. This is not a test where you are trying to "pick the winner", but an experiment to provide insight into the causal relationships between the predictor variables and the response variables. Iteration is almost always required. Carry on.

"All models are wrong, some are useful" G.E.P. Box

Wardiam · Nov 25, 2020 11:47 AM

Dear Mark and statman,

Thank you very much for your answers. I have carefully reviewed all the help available at your DSD link and now understand much better many of the questions I had.

There are still some questions I would like to discuss with you and see if you can clarify them for me.

After running the DSD and fitting the model, only 4 of the 6 factors are considered significant ("active"). I know that the effects of the other two factors (voltage and temperature) are not relevant, but in my experimental protocol those 2 factors have to be present because they are necessary elements (what I did not know was their optimal level). Following the summary of main effects, I could then choose any of the tested levels for those factors because it would not affect the output. Is this correct?

Based on the Fit platform, I obtained RSq = 0.94, but when I save the predicted values in a new column I see that some are far from the observed values. How could I fit the model better? Should I use more runs?
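
One way to see why a high RSq on a small design deserves caution is to compute RSquare Adj for the same RSq under different model sizes. The sketch below uses the RSq = 0.94 and the 17 runs quoted above; the parameter counts p are hypothetical, since the number of terms in the fitted model is not stated here.

```python
def rsquare_adj(r2: float, n: int, p: int) -> float:
    """Adjusted R-square for n runs and p estimated parameters (intercept included)."""
    if n - p <= 0:
        raise ValueError("need more runs than estimated parameters")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p)


r2, n = 0.94, 17  # values quoted in the post

# Hypothetical model sizes: intercept plus 4, 7, or 10 other terms.
for p in (5, 8, 11):
    print(f"p = {p:2d} parameters -> RSquare Adj = {rsquare_adj(r2, n, p):.3f}")
```

The more terms the model spends to reach the same RSq, the lower the adjusted value, which is worth checking before concluding that more runs are needed.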

I had thought of replicating the runs in the generated table. In that case, should I activate the blocking option in the design? If so, should I add a categorical factor that identifies each replicate?

Finally, I have already seen that by using the fit definitive screening platform I get a formula of the model that is what I wanted and I can add to the table a column with the predicted values. It is great.

Thank you very much for your help and forgive me for asking these rookie questions.

Wardiam

statman · Nov 25, 2020 12:38 PM

Some comments:

1. First, allow me to make the point that there is a difference between statistical significance (which you control by how you run the experiment and the model you use) and practical significance. Without knowing how much of a change in Y is of practical significance, the analysis will be suboptimal.

2. Regarding factors that are determined to be statistically insignificant, you may interpret this as: given the levels you tested, those factors do not have as much influence as the other factors in the model. Whether you can infer or extrapolate those findings is more an engineering/scientific question than a statistical one.

3. Always use RSquare Adj as the default. RSquare will always increase with the addition of degrees of freedom to the model, but the point is to determine the RSquare with terms that are considered significant. That is what RSquare Adj accounts for. If there are differences between RSquare and RSquare Adj, it is an indication of an overspecified model. Oh, and by the way, this has nothing to do with whether the model will be useful in the future or under a different inference space.

4. When choosing the "best" model, a number of considerations must be taken into account: RSquare, RMSE, residuals, practical assessment, etc.

5. Running replicates is an excellent technique to test model adequacy. Using a blocking method can be quite useful in exploring the effect of "noise" and, in some cases, in determining whether the effects estimated in the first replicate (Block = -1) repeat over different noise conditions (Block = 1). This is one of the intended purposes: to see if factor effects are robust to noise. IMHO, blocking should be well thought out prior to experimentation, but you may be able to salvage some useful information in an a posteriori attempt.

6. Looks like you could have run a Res. III fractional factorial with factors at 2 levels in 8 treatments (vs. the DSD) and gotten the same results. You could have saved runs and used them for additional replicates... hindsight is always 20/20.
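
For reference, the 8-treatment design mentioned in point 6 can be written down in a few lines of Python. This is a minimal sketch assuming the generators D = AB, E = AC, F = BC, which is one common choice for a six-factor Resolution III fraction; the letters A to F are placeholders, not the factors from the attached table.

```python
from itertools import product

# Start from the full 2^3 factorial in A, B, C (coded -1/+1) and
# generate the remaining three columns from the assumed generators.
runs = []
for a, b, c in product((-1, 1), repeat=3):
    d, e, f = a * b, a * c, b * c  # D = AB, E = AC, F = BC
    runs.append((a, b, c, d, e, f))


def dot(i: int, j: int) -> int:
    """Inner product of design columns i and j."""
    return sum(run[i] * run[j] for run in runs)


# Main-effect columns are mutually orthogonal, so the six main effects can be
# estimated independently of one another in only 8 runs.
orthogonal = all(dot(i, j) == 0 for i in range(6) for j in range(i + 1, 6))

print(" A  B  C  D  E  F")
for run in runs:
    print(" ".join(f"{v:+d}" for v in run))
print("main-effect columns orthogonal:", orthogonal)
```

The price of the small run count is that each main effect is aliased with two-factor interactions (for example, D with AB), and with only two levels no curvature can be estimated.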

"All models are wrong, some are useful" G.E.P. Box

Wardiam · Nov 25, 2020 01:42 PM

Thank you very much for your comments statman ;)

Wardiam

CanonicalHazard · Apr 12, 2023 04:24 PM

Using JMP 17.0 I get this after turning off the two heredity defaults. Assuming the math was all done properly (no bugs or coding errors), I must conclude that the non-hereditary Temperature*Temperature term was not statistically significant. I am a little surprised. I can't see a way to get this dialog to show me that; it is a more general question about the DSD special analysis method that I'll post elsewhere. If I add the term manually via "Make Model", it sure looks statistically significant to me. It is mysterious, and it makes me question my assumption that there isn't a bug or a math or coding error (but hey, I can't do all the ANOVA and such math myself yet). Any chance of an explanation?
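
For readers unfamiliar with the heredity options, the idea can be sketched in Python as a filter on which second-order terms are even considered. This only illustrates the concept and is not JMP's actual algorithm; the function name `candidate_terms` and the factor "Time" are hypothetical, while Temperature and Voltage are the two factors reported as inactive earlier in the thread.

```python
from itertools import combinations_with_replacement


def candidate_terms(factors, active, heredity="strong"):
    """List second-order terms allowed under a heredity rule.

    strong: every parent main effect must be active
    weak:   at least one parent main effect must be active
    none:   no restriction
    """
    allowed = []
    for a, b in combinations_with_replacement(factors, 2):
        parents_active = [a in active, b in active]
        if heredity == "strong":
            ok = all(parents_active)
        elif heredity == "weak":
            ok = any(parents_active)
        elif heredity == "none":
            ok = True
        else:
            raise ValueError(f"unknown heredity rule: {heredity}")
        if ok:
            allowed.append(f"{a}*{b}")
    return allowed


factors = ["Temperature", "Voltage", "Time"]
active = {"Time"}  # hypothetical: only Time has an active main effect

for rule in ("strong", "weak", "none"):
    print(f"{rule:>6}: {candidate_terms(factors, active, rule)}")
```

With heredity enforced, Temperature*Temperature is never a candidate unless the Temperature main effect is active; with it turned off, the term is considered but still has to earn its place against the other candidates.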
