Hello to the Community,
I'm new here, so I'll get right to it:
I created the attached experimental design with JMP 12 and performed my experiments according to it.
It has since turned out, however, that evaluating all 90 samples would be too expensive and time-consuming, so I will now evaluate only a subset of 30 samples. As a result, I will probably have to drop the third-degree interactions.
To choose 30 out of the 90 samples: is there a way to pick out the most relevant data points/specimens for such a reduced evaluation?
Working with a newly created, smaller design does not work, since the combinations would be different. The 90 specimens already exist, and there is no possibility of redoing the whole experiment with a new plan :/
Thank you very much in advance!
If you load in your factors as covariates, it will choose from the rows in your original experiment. Under the "Add Factor" button, choose Covariate with your data table open (to get a rough idea, I used covariates in this example).
When you get to the run size, change it from 90 to 30.
You'll have to make a decision on the model: even for an RSM you need 35 runs (and covariates are not playing nicely with If Possible terms; I'll have to look into that).
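The run-count arithmetic can be checked by counting model terms. Here is a minimal sketch in Python (not JMP); `count_terms` and its arguments are hypothetical names, and it assumes every factor contributes one parameter per term. The exact count for the attached design depends on the factor types, which are not shown in this thread: 35 would correspond to six squared terms, for example if one of the seven factors has only two levels, but that is a guess.

```python
from math import comb

def count_terms(n_factors, n_quadratic=0, interactions=True):
    """Intercept + main effects (+ two-factor interactions) (+ squared terms)."""
    terms = 1 + n_factors                # intercept and main effects
    if interactions:
        terms += comb(n_factors, 2)      # all two-factor interactions
    terms += n_quadratic                 # one squared term per factor that gets one
    return terms

models = [
    ("main effects only", count_terms(7, interactions=False)),
    ("main + quadratic effects", count_terms(7, 7, interactions=False)),
    ("main effects + 2-way interactions", count_terms(7)),
    ("RSM with 6 squared terms (assumed)", count_terms(7, 6)),
    ("RSM with 7 squared terms", count_terms(7, 7)),
]

for label, p in models:
    # A negative difference means the model cannot be estimated from 30 runs.
    print(f"{label}: {p} parameters, 30 runs minus parameters = {30 - p}")
```

Any model whose parameter count exceeds the run count cannot be fit as is, which is why the choice of model comes before the choice of subset.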
I'm not sure how you're planning to model the results (since you have more terms in an RSM than you do runs), but for the design you could try removing different sets of terms, creating a few different designs, and comparing them in design evaluation. A choice that might end up being desirable and worth comparing is to have the main effects and quadratic effects in the model, and choose an alias optimal design.
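To give a rough idea of what "choosing from the rows of the original experiment" means, here is a small Python sketch of a row-exchange search for a main effects plus quadratic model. It is not JMP's algorithm and it uses a plain D-criterion (maximize det(X'X)) rather than alias optimality. The candidate set is synthetic (random levels standing in for the 90 existing runs, with three levels assumed for the last two factors), and `model_matrix`, `log_det` and `select_subset` are hypothetical helper names.

```python
import numpy as np

def model_matrix(F):
    """Intercept, main effects and squared terms for coded factors F (n x k)."""
    return np.column_stack([np.ones(len(F)), F, F ** 2])

def log_det(X):
    """log det(X'X); -inf when the model is not estimable from these rows."""
    sign, value = np.linalg.slogdet(X.T @ X)
    return value if sign > 0 else -np.inf

def select_subset(F, n_runs, n_starts=3, seed=0):
    """Swap rows in and out of the subset while det(X'X) keeps increasing."""
    rng = np.random.default_rng(seed)
    X = model_matrix(F)
    best_rows, best_val = None, -np.inf
    for _ in range(n_starts):
        rows = rng.choice(len(F), size=n_runs, replace=False).tolist()
        current = log_det(X[rows])
        improved = True
        while improved:
            improved = False
            for i in range(n_runs):
                for cand in range(len(F)):
                    if cand in rows:
                        continue          # each existing run can be used only once
                    trial = rows.copy()
                    trial[i] = cand
                    val = log_det(X[trial])
                    if val > current + 1e-9:
                        rows, current, improved = trial, val, True
        if current > best_val:
            best_rows, best_val = sorted(rows), current
    return best_rows, best_val

# Synthetic stand-in for the 90 existing runs (NOT the data from this thread).
rng = np.random.default_rng(1)
candidates = np.column_stack([
    rng.choice([-1, -0.5, 0, 0.5, 1], size=(90, 5)),   # five 5-level factors
    rng.choice([-1, 0, 1], size=(90, 2)),              # two 3-level factors (assumed)
])

rows, value = select_subset(candidates, n_runs=30)
print("selected candidate rows:", rows)
print("log det(X'X):", round(float(value), 3))
```

Rerunning with a different `model_matrix` (for example, with some terms removed) gives different subsets that can then be compared, which mirrors the "create a few designs and compare" idea.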
Another thought just crossed my mind. Rather than analyze all 30 of your treatment combinations from a modeling perspective, you may want to consider holding out a small number of combinations to act as a surrogate cross-validation data set for the model. Then compare your predicted values for the holdout set to the actual values to see how well the model generalizes to the held-out runs. That might give you some insight into whether you should really try to lobby for including more of the original design's treatment combinations in your modeling work.
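A minimal sketch of that holdout check in Python, just to show the mechanics. Everything here is an assumption for illustration: the 30 runs and the response are simulated, the model is main effects only, and five runs are held out at random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for 30 measured runs with 7 coded factors (not real data).
F = rng.uniform(-1, 1, size=(30, 7))
true_effects = np.array([1.5, -1.0, 0.5, 0.0, 0.0, 0.8, -0.3])
y = 2.0 + F @ true_effects + rng.normal(0, 0.2, size=30)

def design(F):
    """Main-effects model matrix: intercept plus one column per factor."""
    return np.column_stack([np.ones(len(F)), F])

holdout = rng.choice(30, size=5, replace=False)     # runs kept out of the fit
train = np.setdiff1d(np.arange(30), holdout)        # runs used to fit the model

# Ordinary least squares on the training runs only.
beta, *_ = np.linalg.lstsq(design(F[train]), y[train], rcond=None)

fit_resid = y[train] - design(F[train]) @ beta
holdout_resid = y[holdout] - design(F[holdout]) @ beta

rmse_train = np.sqrt(np.mean(fit_resid ** 2))
rmse_holdout = np.sqrt(np.mean(holdout_resid ** 2))
print(f"RMSE on fitted runs: {rmse_train:.3f}")
print(f"RMSE on held-out runs: {rmse_holdout:.3f}")
```

A held-out RMSE that is much larger than the RMSE on the fitted runs would be a hint that the reduced model does not generalize well.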
I took a quick peek at your design. What you want to do (a subset of 30) will require a simpler model than you get by just sacrificing the 3rd-order terms (3-way interactions and cubic terms). Just estimating a main effects (1st-order) and 2-way interaction model would consume all available degrees of freedom (i.e., the design would be saturated) in a 30-run subset. In that case you would have no df left over to estimate error. You received great advice from Ryan. It is possible to bring in your existing design by identifying all your factors as covariates. Your existing 90-run design then becomes the candidate set from which the algorithm can select the most balanced 30-run subset for a much simpler model of your choice. The question is whether any 30-run subset is "rich enough" in information to estimate your chosen model. One complication is that your candidate set has more levels for most factors (5 levels for the first 5 factors) than are really needed for an estimable simplified model. So although one might normally think of a 90-run candidate set as "rich enough" to select a 1/3 subset, your 90 runs may not cover the 7-dimensional space as thoroughly as a full factorial candidate set would. Although I think you can likely come up with a reasonable approach for those 30 runs, you should realistically expect some collinearity for almost any simplified model. Of course, the real beauty of JMP's DOE capability is that you can easily see this in the "color map on correlations" under "evaluate design", or when you algorithmically select the subset. You can also see the "correlation of estimates" in the "fit model" report by doing a dummy analysis on a Y with random data before actually committing to measuring the subset. Best of luck.
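For anyone who wants to see what such a correlation check computes, here is a rough Python sketch (outside JMP) that builds the main-effect and 2-way interaction columns for a 30-run subset and lists the most strongly correlated pairs of terms. The subset is random synthetic data, not the attached design, and `term_columns` is a hypothetical helper.

```python
import numpy as np
from itertools import combinations

def term_columns(F):
    """Main-effect and two-factor-interaction columns with readable labels."""
    k = F.shape[1]
    labels = [f"X{i + 1}" for i in range(k)]
    cols = [F[:, i] for i in range(k)]
    for i, j in combinations(range(k), 2):
        labels.append(f"X{i + 1}*X{j + 1}")
        cols.append(F[:, i] * F[:, j])
    return labels, np.column_stack(cols)

# Synthetic 30-run subset with five-level coded factors (illustration only).
rng = np.random.default_rng(2)
subset = rng.choice([-1, -0.5, 0, 0.5, 1], size=(30, 7))

labels, X = term_columns(subset)
corr = np.corrcoef(X, rowvar=False)      # term-by-term correlation matrix

# Rank all pairs of terms by absolute correlation and show the worst five.
pairs = [
    (abs(float(corr[a, b])), labels[a], labels[b])
    for a, b in combinations(range(len(labels)), 2)
]
for r, a, b in sorted(pairs, reverse=True)[:5]:
    print(f"|r| = {r:.2f} between {a} and {b}")
```

Absolute correlations close to 1 flag terms that the subset cannot separate well, which is the same information the color map shows graphically.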
Maybe to add just a bit to Ryan's comment with respect to analysis: from an analysis point of view, you could try partial least squares, stepwise (watch your VIFs), or, if you have JMP Pro, the penalized regression methods within the Generalized Regression platform. Then, again if you've got JMP Pro, use the Model Comparison platform as a one-stop shopping hub for comparing the models from these disparate modeling procedures. And of course I'd precede any of this modeling work with simple factor vs. response plots. Pay extra close attention to any residual plots you get out of these models, since you'll in all likelihood be leaving 'information' on the table by subsetting the original experiment.
lingo: Not to worry that you don't have JMP Pro. Good old-fashioned OLS (the Standard Least Squares personality in Fit Model) is a viable option to start your modeling work (I'm presuming OLS will work: continuous response, assumed normal distribution of errors, etc.). Maybe supplement it with stepwise to aid in model selection? There are lots of options in JMP to choose from. I'm not sure I'd go the partial least squares route, since in JMP you can only fit main effects with it.
Lingo had asked me separately about the 35 run design that can fit the RSM model, so I'll describe step-by-step details here in case anyone else comes across this thread.
With the original data table open:
Select DOE->Custom Design.
From the Add Factor button, select Covariate.
Choose the first 7 columns (which were the factors), and click OK.
Click Continue to go to the next screen.
Under Model, click the RSM button.
At the bottom, change the Number of Runs to 35 (from 90).
Click Make Design.
You should now get a 35-run design that can fit the full RSM model. Note that there are no degrees of freedom left to estimate the error, so you'll probably be doing some type of model reduction, but at least you know everything in the model can be estimated.
Yes, Ryan, this is quite likely Lingo's best approach if he can justify including 35 (vs. the 30 originally stated) in his measurement plan. Lingo, a simple model reduction method you might consider is to iteratively eliminate the smallest (closest to 0) effects or parameter estimates when the model is fit to all factors on a coded scale. But only do this for the 2nd-order effects (2-way interactions and quadratic effects). Your data and model might suggest that you can easily eliminate 4 or 5 of those effects, which would give you enough df for a fairly stable error estimate. Then you would have some significance tests but, perhaps more importantly, the ability to generate predictions for optima with realistic prediction confidence limits. It would be great to hear about the outcome of your study, so please come back and report how it went.
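The elimination idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the analysis of the actual study: it uses 4 simulated coded factors and 15 runs (so the 15-term RSM is saturated, mirroring the 35-run situation on a smaller scale), a made-up response, and a hypothetical helper `rsm_terms`. Only 2nd-order terms are candidates for removal.

```python
import numpy as np
from itertools import combinations

def rsm_terms(F):
    """Intercept, main effects, 2-way interactions and squared terms, with names."""
    k = F.shape[1]
    names, cols = ["Intercept"], [np.ones(len(F))]
    for i in range(k):
        names.append(f"X{i + 1}")
        cols.append(F[:, i])
    for i, j in combinations(range(k), 2):
        names.append(f"X{i + 1}*X{j + 1}")
        cols.append(F[:, i] * F[:, j])
    for i in range(k):
        names.append(f"X{i + 1}^2")
        cols.append(F[:, i] ** 2)
    return names, np.column_stack(cols)

# Simulated coded factors and response (illustration only).
rng = np.random.default_rng(3)
F = rng.uniform(-1, 1, size=(15, 4))
names, X = rsm_terms(F)
y = (5 + 2 * F[:, 0] - 1.5 * F[:, 1] + 1.2 * F[:, 0] * F[:, 1]
     + 0.9 * F[:, 2] ** 2 + rng.normal(0, 0.1, size=15))

first_order = 1 + F.shape[1]        # intercept and main effects are never dropped
keep = list(range(X.shape[1]))      # column indices currently in the model

for _ in range(5):                  # remove five 2nd-order terms, one at a time
    beta, *_ = np.linalg.lstsq(X[:, keep], y, rcond=None)
    second_order = [pos for pos, col in enumerate(keep) if col >= first_order]
    drop = min(second_order, key=lambda pos: abs(beta[pos]))
    print(f"dropping {names[keep[drop]]} (estimate {beta[drop]:+.3f})")
    del keep[drop]                  # refit without it on the next pass

df_error = len(y) - len(keep)
print("terms kept:", [names[c] for c in keep])
print("error df after reduction:", df_error)
```

Refitting after each removal matters because the remaining estimates change once a correlated term leaves the model.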