Re: DoE: off-center focus point in a range of a factor and "time"-factor

Roman-V · Jul 13, 2017 05:37 AM

Hi!

I have two questions regarding design of experiment (custom design) using JMP 12.1.0:

1. I have a factor (discrete numeric) with 5 levels: 0, 20, 50, 100, 150. When I try to create the DoE table, JMP treats level 20 as off-center and generates fewer runs at this level. Is it possible to tell JMP explicitly that the center of interest must be at level 50, or even at some virtual point in between (e.g. 35)? How can I do this in the Custom Design platform?
2. I am planning to produce a set of samples based on the DoE. The same set will then be tested repeatedly by a non-destructive method, so a response will be generated at several time points. Can I just add a "time point" column and append a copy of the original DoE table for every time point? Will it be possible to add a "time" factor later, during evaluation of the results?

Thank you in advance!

Mark_Bailey · Jul 13, 2017 06:51 AM

You can define the levels that you want to include in the experiment using a Discrete Numeric factor, but custom design then determines the model terms for that factor and the optimal distribution of those levels in the design for a given number of runs.
1. How did you come to choose the factor levels 0, 20, 50, 100, and 150 for this experiment?
2. What do you mean by 'center of interest'?
3. You seem troubled by the lack of balance in the resulting design. You intuitively expected the same amount of replication for each factor level. Balance is not a design criterion, though. Minimal variance is the criterion. D-optimal designs minimize the variance of the parameter estimates, which is good for testing hypotheses. I-optimal designs minimize the variance of the model predictions, which is good for response optimization. The dependence of the variance on the design is a non-linear function of the design, so sometimes the minimum variance design is balanced but often it is not.
4. You can ask for a number of replicate treatments but you cannot choose which treatments are replicated. The algorithm ensures the desired level of replication but chooses the runs to provide the most information. It might exclude some levels in order to satisfy your constraint of replication, which might result in a design that is even less intuitive.
5. You can always modify the design afterward if you deem it necessary. That is, you save the optimal custom design as a data table and then manually change the levels as you intended. It won't break anything, although it will weaken the support that your data provide for your model.
6. The main point is that the treatments are determined for the sake of the model. The model is used to find the optimum factor levels, not the design. The design is not supposed to include the optimal levels, but only those levels that best support the estimation of the model parameters.
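To illustrate point 3 with a rough sketch (this is not JMP's algorithm): the Python snippet below assumes a simple quadratic model in the one factor, codes the range 0 to 150 to [-1, 1], and compares det(X'X), the quantity that a D-optimal design maximizes, for two hypothetical 10-run allocations. The function name and the run counts are made up for illustration.

```python
def information_determinant(levels, counts, lo=0.0, hi=150.0):
    """Return det(X'X) for the model 1 + x + x^2, with x coded to [-1, 1]."""
    mid, half = (lo + hi) / 2, (hi - lo) / 2
    # s[k] is the sum of x**k over all runs; X'X has entries s[i + j].
    s = [0.0] * 5
    for level, n in zip(levels, counts):
        x = (level - mid) / half
        for k in range(5):
            s[k] += n * x ** k
    m = [[s[i + j] for j in range(3)] for i in range(3)]
    # Cofactor expansion of the 3x3 determinant along the first row.
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


levels = [0, 20, 50, 100, 150]
balanced = information_determinant(levels, [2, 2, 2, 2, 2])    # 2 runs per level
unbalanced = information_determinant(levels, [3, 0, 2, 2, 3])  # no runs at 20

print(f"balanced:   {balanced:.1f}")
print(f"unbalanced: {unbalanced:.1f}")
```

Under these assumptions the unbalanced allocation, which skips level 20 entirely, gives the larger determinant (roughly 122 versus 84), so balance is not what the criterion rewards.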
Of course you can include repeated measures in your experiment, but what is your ultimate response?
1. Is it one of the time points, such as the end point? If so, then I would probably use multiple columns to capture these observations over time.
2. Is it a function of more than one time point, such as the slope, inflection point, or asymptote? If so, then I would stack the responses and add a new column to represent the time point as you suggest. You could then include time effects in the model.
3. There are other ways, such as using multiple columns and deriving a response. I have optimized assays by computing parameters of the dose-response curve separately and using them as responses. The same technique could be applied to the time course of the response.
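The stacked layout in point 2 could be built along these lines. This is only a sketch in Python with a made-up two-run design; the factor names, the time points, and the empty response column are placeholders, and in JMP you would do the equivalent with table operations rather than code.

```python
# Hypothetical design table: one dict per run (factor names are placeholders).
design = [
    {"run": 1, "concentration": 20, "additive": "A"},
    {"run": 2, "concentration": 100, "additive": "B"},
]
time_points = [1, 7, 30]  # hypothetical inspection times, e.g. days

# Append one copy of the design per time point and add the time column.
stacked = [
    {**row, "time": t, "response": None}  # response is filled in after testing
    for t in time_points
    for row in design
]

for row in stacked:
    print(row)
```

Keeping the run identifier in every copy makes it possible to see later which rows were measured on the same sample.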

So, in conclusion, please keep in mind that the experimental design is the result that answers the question, "What is the best data to fit my model?"

Roman-V · Jul 13, 2017 07:39 AM

Thank you for the comprehensive answer. I will try to follow your recommendations.
Regarding the levels: based on previous knowledge, I expect the most interesting correlations in the range 20-100. 150 was added as an extreme value and 0 as a negative control.
It was just counterintuitive that I have to perform many runs at level 0 (where I don't expect any informative response) while, at the same time, there are very few runs at level 20. Perhaps it was a bad idea to include a negative control in the design space?

Peter_Bartell · Jul 13, 2017 09:10 AM

Mark offers some great advice and counsel. Based on your second reply: if you really expect no valuable information from any treatment combination at the zero level, then why include that level at all? Generally speaking, one shouldn't include treatment combinations in an experiment that, according to prior knowledge or domain expertise, would be abject failures and contribute no information toward addressing the practical problem. Since the factor is discrete numeric, I suggest picking a low level that will provide valuable empirical information.

Mark_Bailey · Jul 13, 2017 10:14 AM

So you expect "the most interesting correlations in the range 20-100" based on experience. Good.

You extended this range down to 0 and up to 150. That change might be good, too.

It is OK to include 'control' runs but you might want to add them after the custom design is made and exclude them from the analysis (fitting the model). This way you have them for comparison but they don't have to meet the needs of or detract from the modeling.

By 'negative control' and 'don't expect any informative response' do you mean that you won't get a response at all or that the nature of what you are studying will fundamentally change from the nature obtained with non-zero levels and won't be relevant or useful? Sorry I am not clear about your point.

Again, the point of the factor range and design levels in your experiment is to support fitting the model. For example, the most informative runs (highest leverage) for estimating the linear parameter are at the extremes of the range and nowhere in between.
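As a small numeric illustration of leverage (a sketch, assuming a straight-line model and one run at each of the five original levels), the hat-matrix diagonal for simple linear regression is h_i = 1/n + (x_i - mean)^2 / Sxx. The helper name below is made up.

```python
def leverages(x):
    """Hat-matrix diagonal for a straight-line fit y = b0 + b1*x."""
    n = len(x)
    mean = sum(x) / n
    sxx = sum((xi - mean) ** 2 for xi in x)
    return [1 / n + (xi - mean) ** 2 / sxx for xi in x]


levels = [0, 20, 50, 100, 150]
h = leverages(levels)

for level, value in zip(levels, h):
    print(f"level {level:>3}: leverage {value:.3f}")
```

The largest values fall at 150 and 0, and the smallest at 50, which is the level closest to the mean of the five levels.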

You will use the model to find the most satisfactory factor level for the desired response. (This prediction will be confirmed empirically with more tests.)

Roman-V · Jul 14, 2017 04:45 AM

Mark, Peter, thank you for your comments and suggestions. Much appreciated!

I am going to exclude the zero level from the DoE. The response is Boolean (object survived/failed), and I will monitor the samples over time. The zero level would bring no benefit: from the nature of the failure, it is known that failure is impossible at zero concentration.

I have several more factors and want to check whether they interact with the main factor (i.e., whether failure happens later or earlier).

Regards,

Roman

Mark_Bailey · Jul 14, 2017 06:26 AM

Your decision to omit a zero level makes perfect sense.

So the outcome is binary (survived, failed)? What is the response to be modeled?

Will you model the outcome directly with logistic regression?
Is the response life time?
Is the response something else like failure rate versus time?
How do you intend to model the response? (What kind of analysis will you use?)

Roman-V · Jul 15, 2017 07:02 AM

The goal of the experiment is to identify active factors apart from the main one.

The response will be the lifetime for particular factor combinations.

Currently, with the binary outcome, the model will not be of much use (in the sense of prediction).

To answer all the other questions, I will need to dive into some extensive reading :)

Mark_Bailey · Jul 15, 2017 07:18 AM

You can use Life Distribution to explore candidate distribution models for your life data.

You can use Fit Model > Fit Parametric Survival to model the data with a selected distribution for errors.

You might need some help finding the best layout for your data table. This kind of analysis is a bit different from the usual regression modeling.
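For example, if each sample is inspected at fixed time points and recorded as survived or failed, the lifetime is only known to lie between two inspections (interval censoring), or beyond the last one if the sample never failed (right censoring). Here is a rough Python sketch of that conversion, with made-up sample IDs and inspection times, assuming every sample is intact at time 0:

```python
def to_interval(inspections):
    """Convert (time, survived) inspection records for one sample to bounds.

    Returns (lower, upper): the failure happened after `lower` and at or
    before `upper`. `upper` is None if the sample survived every inspection,
    i.e. the lifetime is right-censored at `lower`.
    """
    lower = 0  # assumption: the sample is intact at time 0
    for time, survived in sorted(inspections):
        if not survived:
            return lower, time
        lower = time
    return lower, None


# Hypothetical inspection records, e.g. times in days.
records = {
    "S1": [(7, True), (14, True), (21, False)],
    "S2": [(7, True), (14, True), (21, True)],
    "S3": [(7, False)],
}

intervals = {sample: to_interval(obs) for sample, obs in records.items()}
for sample, (lower, upper) in intervals.items():
    print(sample, lower, upper)
```

Each sample ends up as one row with a lower and an upper time bound. Whether and how a given analysis platform accepts such a pair of columns is something to check in its documentation.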

Mark_Bailey · Jul 15, 2017 07:19 AM

I am not sure what you mean when you say that the model will not be of much use with a binary outcome. How will you analyze the data?