Regarding the "Rep" factor:
How should I integrate it into my modeling? I usually select my three factors, click the RSM macro in JMP, and proceed. Should I include the "rep" factor in the RSM macro or leave it out (and thus have no interactions with the other factors)?
Furthermore, if the "rep" factor is significant and I have to include it in the model, it also appears in the profilers. This is confusing for me. On the one hand, I understand that this factor is significant and thus my results depend on the batch. That makes perfect sense: I have a biological component in there, and batches are never the same. On the other hand, if I now optimize my results, the batch is also included, and, for example, batch #1 does not exist anymore....
Am I missing something? Is there a way to include the batch variance without including a batch variable?
After you create your RSM with your design variables, add the Rep variable. At this point it is considered a fixed effect and will appear in the model equation. Go ahead and run the regression this way first, and check whether Rep is significant and how large its effect is. If it is significant, change the Rep variable to a 'Random Effect' (click the Attributes red triangle) and run the regression again. This time it will not appear in the model equation; instead, it is treated as a random effect. "A random effect is a factor whose levels are considered a random sample from some population. Often,...variance components). However, there are also situations where you want to predict the response for a given level of the random effect. Technically, a random effect is considered to have a normal distribution with mean zero and nonzero variance."
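To make the fixed-versus-random distinction concrete, here is a minimal numpy sketch (not JMP) under loudly stated assumptions: a hypothetical three-factor Box-Behnken design run identically in every batch, made-up coefficients and batch offsets, and a balanced layout. It fits Rep as a sum-to-zero coded fixed effect, then turns the Rep sum of squares into a batch variance component with the classic ANOVA (expected-mean-squares) estimator. JMP's Random Effect attribute uses REML by default, as far as I know; for balanced data like this the two should agree when the estimate is positive, but treat this as an illustration, not a reproduction of JMP's report. The point for optimization is predict_new_batch: a future batch is unknown, so its random effect is set to its mean of zero and Rep drops out of the prediction equation.

```python
import numpy as np

# Hypothetical example: 3-factor Box-Behnken design (12 edge runs + 3 centre
# runs), repeated once per batch ("Rep"). All numbers are made up.
BBD = np.array(
    [[a, b, 0] for a in (-1, 1) for b in (-1, 1)]
    + [[a, 0, c] for a in (-1, 1) for c in (-1, 1)]
    + [[0, b, c] for b in (-1, 1) for c in (-1, 1)]
    + [[0, 0, 0]] * 3,
    dtype=float,
)


def quad_terms(x):
    """Full quadratic RSM terms: intercept, mains, 2-way interactions, squares."""
    x1, x2, x3 = x[:, 0], x[:, 1], x[:, 2]
    return np.column_stack([np.ones(len(x)), x1, x2, x3,
                            x1 * x2, x1 * x3, x2 * x3,
                            x1 ** 2, x2 ** 2, x3 ** 2])


def rep_effect_coding(rep, n_reps):
    """Sum-to-zero (effect) coding of a nominal Rep factor: n_reps - 1 columns."""
    cols = np.zeros((len(rep), n_reps - 1))
    for j in range(n_reps - 1):
        cols[rep == j, j] = 1.0
        cols[rep == n_reps - 1, j] = -1.0
    return cols


def simulate(batch_offsets, noise_sd=0.0, seed=0):
    """Same design in every batch; each batch shifts the response by its offset."""
    rng = np.random.default_rng(seed)
    x = np.tile(BBD, (len(batch_offsets), 1))
    rep = np.repeat(np.arange(len(batch_offsets)), len(BBD))
    y = (10 + 2 * x[:, 0] - 1.5 * x[:, 1] + 0.8 * x[:, 2] - 1.2 * x[:, 2] ** 2
         + np.asarray(batch_offsets, dtype=float)[rep]
         + rng.normal(0.0, noise_sd, len(x)))
    return x, rep, y


def ols(X, y):
    """Least-squares coefficients and residual sum of squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return beta, float(np.sum((y - X @ beta) ** 2))


def fit_with_batch(x, rep, y, n_reps):
    """Fit RSM + fixed Rep, then estimate the batch variance (balanced data only)."""
    X_design = quad_terms(x)
    X_full = np.column_stack([X_design, rep_effect_coding(rep, n_reps)])
    beta_full, rss_full = ols(X_full, y)
    _, rss_reduced = ols(X_design, y)
    m = len(y) // n_reps                         # runs per batch
    ms_rep = (rss_reduced - rss_full) / (n_reps - 1)
    ms_err = rss_full / (len(y) - X_full.shape[1])
    sigma_b2 = max(0.0, (ms_rep - ms_err) / m)   # ANOVA estimate, truncated at 0
    return beta_full[:X_design.shape[1]], sigma_b2, ms_err


def predict_new_batch(beta_design, x):
    """Prediction for an unseen batch: its random effect is set to its mean, 0."""
    return quad_terms(np.atleast_2d(np.asarray(x, dtype=float))) @ beta_design


if __name__ == "__main__":
    x, rep, y = simulate([-2.0, 0.5, 1.5], noise_sd=0.3, seed=1)
    beta, sigma_b2, ms_err = fit_with_batch(x, rep, y, n_reps=3)
    print("batch variance component:", round(sigma_b2, 3))
    print("residual variance:", round(ms_err, 3))
    print("prediction for a new batch at the centre:", predict_new_batch(beta, [0, 0, 0]))
```

With noise_sd=0 the estimate recovers the spread of the made-up offsets exactly: (4 + 0.25 + 2.25) / (3 - 1) = 3.25. The ANOVA shortcut relies on every batch containing the same design points; for unbalanced data, REML (what JMP does) is the appropriate tool.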
Thank you very much for your answer!
Out of curiosity, why is the "rep" factor not included in the RSM (or is it not allowed to be)? I just compared the results of two approaches:
Following your guidelines, and
Including it in the RSM.
As far as I can see, the first model includes fewer terms, namely X1, X2, X3, X3*X3 and Xrep, whereas the second model additionally uses X1*Xrep, X2*Xrep and X3*Xrep. It therefore has more terms and might overfit. However, all criteria (adjusted Rsq, BIC and AICc) are also better for the second model.
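All three criteria trade goodness of fit against the number of terms, which is why a model with extra interaction terms can still come out ahead. A minimal pure-Python sketch of the textbook formulas for a least-squares fit is below. Here k counts the regression coefficients plus the error variance, which is one common convention; JMP's exact parameter count is not assumed, so absolute values may not match the JMP report. Only differences between models fitted to the same data are meaningful.

```python
import math


def gaussian_neg2_loglik(rss, n):
    """-2 log-likelihood of an OLS fit, with sigma^2 at its ML estimate rss / n."""
    return n * (math.log(2 * math.pi) + math.log(rss / n) + 1)


def aicc(rss, n, k):
    """Small-sample corrected AIC; k counts every estimated parameter."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    return gaussian_neg2_loglik(rss, n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)


def bic(rss, n, k):
    """Bayesian information criterion; penalty grows with log(n)."""
    return gaussian_neg2_loglik(rss, n) + k * math.log(n)


def adj_r2(rss, tss, n, p):
    """Adjusted R^2 with p regression coefficients (including the intercept)."""
    return 1 - (rss / (n - p)) / (tss / (n - 1))
```

For a fixed RSS, adding parameters always raises AICc and BIC, so the larger model only wins when its RSS drops by enough to pay for the extra terms.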
The results of the optimization are very similar.
Using simple common sense, I would imagine that the input factors varied during each replicate run, which these terms would take into account.
Or am I totally wrong and this is simply forbidden?
I like to look at the higher-order XRep terms as a diagnostic, for example to help figure out what may be happening and perhaps to reduce those effects in the future. Based on the factors you listed, it looks like something may have occurred over time. But I would not include anything except XRep, as a random effect, in the final model. As you know, XRep is not something that can be assigned in future simulations.
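Using the X*XRep terms as a diagnostic amounts to asking whether that group of interactions explains more than noise. One standard way to test the whole group at once is an extra-sum-of-squares (partial) F test between the models with and without them. A minimal pure-Python sketch is below; it returns only the F statistic, which you would compare with the F(df_num, df_full) critical value or pass to a statistics package for a p-value. JMP's Effect Tests table reports F ratios for the individual effects.

```python
def extra_ss_f(rss_reduced, rss_full, df_reduced, df_full):
    """Partial F statistic for the terms in the full model but not the reduced one.

    df_reduced and df_full are residual degrees of freedom (n minus the number
    of estimated coefficients) of the reduced and full models.
    """
    df_num = df_reduced - df_full
    if df_num <= 0 or df_full <= 0:
        raise ValueError("full model needs extra terms and positive residual df")
    return ((rss_reduced - rss_full) / df_num) / (rss_full / df_full)


# Hypothetical numbers, assuming 45 runs and three Reps, so each X*XRep term
# adds 2 df: reduced model (intercept, X1, X2, X3, X3*X3, XRep) has 7
# coefficients -> 38 residual df; adding X1*, X2*, X3*XRep gives 13 -> 32 df.
f_stat = extra_ss_f(rss_reduced=120.0, rss_full=80.0, df_reduced=38, df_full=32)
print(round(f_stat, 3))
```

A large F here says the batches respond differently to the factors, which is useful for troubleshooting, even if the final predictive model keeps only XRep as a random effect.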
Hope this helps,
Mark
I would first add a new nominal variable called 'Rep' or 'Run' and tabulate which set of replicates were run together, for example 1, 2, 3, etc. Now include this in your model analyses. Start by adding this new variable as a random or fixed effect. This will account for some of the rep-to-rep variation, but does little to diagnose it. You can add more complex effects, such as interactions with Rep, to explore what may be causing the variation.
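A minimal pure-Python sketch of that bookkeeping, assuming each run carries some label for the set it was run with (a hypothetical day or batch label here). tabulate_reps numbers the groups 1, 2, 3 in order of first appearance and counts the runs in each, a quick balance check. factor_by_rep_columns builds the columns that a Factor*Rep interaction adds to the model, using sum-to-zero coding for the nominal Rep; as far as I know that matches JMP's default coding for nominal effects, but in JMP you would simply add the crossed term rather than build columns by hand.

```python
from collections import Counter


def tabulate_reps(run_groups):
    """Map each run's group label to Rep 1, 2, 3, ... and count runs per Rep."""
    order = []
    for g in run_groups:
        if g not in order:
            order.append(g)          # first appearance defines the Rep number
    rep = [order.index(g) + 1 for g in run_groups]
    return rep, Counter(rep)


def factor_by_rep_columns(x, rep):
    """Columns for x*Rep with sum-to-zero coding of Rep (levels - 1 columns)."""
    levels = sorted(set(rep))
    last = levels[-1]
    cols = []
    for lev in levels[:-1]:
        # +1 for this level, -1 for the last level, 0 otherwise
        code = [1 if r == lev else (-1 if r == last else 0) for r in rep]
        cols.append([xi * ci for xi, ci in zip(x, code)])
    return cols
```

Unequal counts from tabulate_reps mean the Reps are unbalanced, which is worth knowing before interpreting the Rep effect or its interactions.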
Thank you all very much for your help and detailed explanation! Especially cipollone.mg/Mark!