Discussions

rcast15

Hi,

Curious to get any thoughts/discussion from the community on the following.

I have a 48-run response surface DOE with 5 continuous factors and two responses measured by two independent assays:

X, call it yield, which I want to maximize
Y, call it total stuff, where X is a component of Y

The derived quantity Z = X/Y, call it purity, is also of interest and should be maximized.

Historically, the data collected was used to optimize X and Z, but I have concerns over the mathematical implications of maximizing 2 responses where 1 response is a function of the other response. I haven't dug into it too much yet, but my intuition tells me that the optimization of the desirability function when your responses are functions of each other could be weird.
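To make the concern concrete, here is a sketch of how an overall desirability is commonly assembled. The one-sided Derringer-Suich form below is an assumption; the desirability functions in your software may be shaped differently. The symbols L, T, s, the weights w_X and w_Z, and the factor vector f are introduced only for illustration.

```latex
% Assumed one-sided "larger is better" desirability with lower limit L, target T, shape s
d(y) =
\begin{cases}
0, & y \le L \\
\left( \dfrac{y - L}{T - L} \right)^{s}, & L < y < T \\
1, & y \ge T
\end{cases}

% Overall desirability when maximizing both X and Z, with weights w_X + w_Z = 1
D(\mathbf{f}) = d_X\!\big(\hat{X}(\mathbf{f})\big)^{w_X} \, d_Z\!\big(\hat{Z}(\mathbf{f})\big)^{w_Z},
\qquad \hat{Z} \text{ fitted to } Z_i = X_i / Y_i
```

Since every observed Z_i contains X_i, the same X measurements (and their errors) feed both fitted surfaces, so the two terms of D are not separate pieces of evidence.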

I am considering the following 4 options. Open to other suggestions if people have them.

1. Model X and Z, maximize both. The surfaces share information about X, so residuals aren't independent across responses.
2. Model X and Y, maximize X and minimize Y. Independent assay errors, but "minimize Y" seems weird since Y is bounded below by X.
3. Model log(Z) alone. Assay errors are multiplicative, so the log stabilizes the variance, but this discards absolute X information.
4. Model X and Y with multivariate methods. These can assume correlation between the responses.
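A short derivation of the log(Z) trade-off, assuming purely multiplicative and independent assay errors (the symbols \mu, \varepsilon, \sigma and the constant c are introduced here for illustration only):

```latex
\begin{align*}
X &= \mu_X \, e^{\varepsilon_X}, \qquad Y = \mu_Y \, e^{\varepsilon_Y},
  \qquad \operatorname{Var}(\varepsilon_X) = \sigma_X^2,\ \operatorname{Var}(\varepsilon_Y) = \sigma_Y^2 \\
\log Z &= \log X - \log Y = (\log \mu_X - \log \mu_Y) + (\varepsilon_X - \varepsilon_Y) \\
\operatorname{Var}(\log Z) &= \sigma_X^2 + \sigma_Y^2 \quad \text{(constant, whatever the means are)} \\
\log \frac{cX}{cY} &= \log \frac{X}{Y} \quad \text{for any } c > 0
\end{align*}
```

The third line is the variance stabilization; the last line is the lost information: scaling X and Y by the same factor leaves log(Z) unchanged, so a low-yield setting and a high-yield setting with the same purity look identical.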

I would appreciate any thoughts on this topic, and perhaps any relevant literature I could look over.

Thanks

Victor_G · Jun 8, 2026 8:12 AM

Hello,

Using Z as a response seems hazardous, because this ratio carries a structural constraint (X ≤ Y always, since X is a component of Y), which may create several issues:

Collinearity of surfaces: Any model for Z = X/Y is implicitly a function of both X and Y, so the response surfaces are "mixed" (and it may be harder to get optimal or satisfactory solutions from the models).
Non-independence of residuals: If your assay for Y includes the measurement of X (i.e., Y is measured partly via X), then the errors are correlated. If Y is measured by a completely independent assay, the measurement errors may be independent, but the dependence between the responses still remains by "structure" (Y = X + other).

Option 2 is one of the "cleanest/safest" options, since it fits models to the raw measurements, each with its own independent errors. The tricky part may come in the optimization: maximizing X and minimizing Y may lead to sub-optimal solutions (depending on the importance given to each response), because a point with X (yield) = 50% and Y = 55% could have similar desirability to a point with X (yield) = 70% and Y = 90%.
So maybe modeling the two raw measurements, then using the predicted X (yield) and a "Y-X" formula (measuring the impurity/by-product quantity) built from the models' predictions, could help optimize both goals: maximizing the yield and minimizing the impurity/by-product quantity.
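A small numeric sketch of that comparison. Everything here beyond the two (X, Y) points is an assumption chosen for illustration: the linear desirability shape, the 0 to 100 limits, and the importance weights (0.82, 0.18).

```python
def d_max(value, low, high):
    """Linear 'larger is better' desirability: 0 at or below low, 1 at or above high."""
    return min(1.0, max(0.0, (value - low) / (high - low)))


def d_min(value, low, high):
    """Linear 'smaller is better' desirability: 1 at or below low, 0 at or above high."""
    return 1.0 - d_max(value, low, high)


def overall(d_values, weights):
    """Weighted geometric mean of the individual desirabilities."""
    total = sum(weights)
    result = 1.0
    for d, w in zip(d_values, weights):
        result *= d ** (w / total)
    return result


# The two candidate points from the post, as (X, Y) in percent.
points = {"A": (50.0, 55.0), "B": (70.0, 90.0)}

# Hypothetical importance weights: yield first, second response much lower.
weights = (0.82, 0.18)

results = {}
for name, (x, y) in points.items():
    # Parameterization 1: maximize X, minimize Y.
    d_x_and_y = overall([d_max(x, 0, 100), d_min(y, 0, 100)], weights)
    # Parameterization 2: maximize X, minimize impurity Y - X.
    d_x_and_imp = overall([d_max(x, 0, 100), d_min(y - x, 0, 100)], weights)
    results[name] = (d_x_and_y, d_x_and_imp)
    print(f"{name}: X={x}, Y={y}, Y-X={y - x}, "
          f"D(X, Y)={d_x_and_y:.3f}, D(X, Y-X)={d_x_and_imp:.3f}")
```

With these assumed settings the two points come out nearly tied under (max X, min Y), while (max X, min Y-X) separates them. The outcome depends heavily on the limits and weights, so treat it as an illustration of the sensitivity rather than a recommendation.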

Hope this answer may help you,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

rcast15 · Posted in reply to message from Victor_G · 06-08-2026

@Victor_G Thank you for your response.

I had thought about using X and Y-X as my responses (maximizing X and minimizing Y-X). Are you saying this would be your suggestion?

Also to clarify, Y is measured by a completely independent assay.

Victor_G · Jun 8, 2026 01:29 PM

Yes, here is the workflow I was thinking of:

1. Model your X and Y responses, since they are independent measurements.
2. Save your models' prediction formulas for X and Y.
3. Create a formula Ypred - Xpred, then optimize Xpred (maximize) and Ypred - Xpred (minimize).
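Outside of the software, the three steps above could be sketched roughly as follows. This is only an illustration under loud assumptions: the data are synthetic, a random uniform design stands in for the real 48-run response surface design, the desirability shape and limits are hypothetical, and a random candidate search replaces the software's optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)


def quadratic_terms(F):
    """Full quadratic expansion: intercept, main effects, squares, two-way interactions."""
    F = np.atleast_2d(F)
    n, k = F.shape
    cols = [np.ones(n)]
    cols += [F[:, i] for i in range(k)]
    cols += [F[:, i] ** 2 for i in range(k)]
    cols += [F[:, i] * F[:, j] for i in range(k) for j in range(i + 1, k)]
    return np.column_stack(cols)


# Synthetic stand-in data: 48 runs, 5 factors in coded units [-1, 1].
F = rng.uniform(-1, 1, size=(48, 5))
true_x = 60 + 8 * F[:, 0] - 5 * F[:, 1] ** 2 + 3 * F[:, 0] * F[:, 2]
true_imp = 15 + 6 * F[:, 0] + 4 * F[:, 3]
X_obs = true_x + rng.normal(0, 1.0, 48)             # assay 1
Y_obs = true_x + true_imp + rng.normal(0, 1.5, 48)  # independent assay 2

# Step 1: model X and Y separately by ordinary least squares.
M = quadratic_terms(F)
beta_x, *_ = np.linalg.lstsq(M, X_obs, rcond=None)
beta_y, *_ = np.linalg.lstsq(M, Y_obs, rcond=None)


# Step 2: keep the prediction formulas, plus the derived Ypred - Xpred.
def predict(points):
    T = quadratic_terms(points)
    x_pred = T @ beta_x
    y_pred = T @ beta_y
    return x_pred, y_pred, y_pred - x_pred


# Step 3: maximize Xpred and minimize Ypred - Xpred via an overall desirability.
def d_max(v, low, high):
    return np.clip((v - low) / (high - low), 0.0, 1.0)


candidates = rng.uniform(-1, 1, size=(20000, 5))
x_pred, y_pred, imp_pred = predict(candidates)
score = np.sqrt(d_max(x_pred, 40, 80) * (1.0 - d_max(imp_pred, 0, 40)))
best = candidates[np.argmax(score)]
print("best coded settings:", np.round(best, 2))
```

Both surfaces are fit to raw measurements only; the impurity appears solely as a function of the two predictions, which is the point of the workflow.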

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

MRB3855 · Posted in reply to message from rcast15 · 06-08-2026

Hi @rcast15: I am admittedly ignorant of the process, so I may be misunderstanding something, but isn't it enough to maximize Z, since X is bounded above by Y? Z is a proportion (it can't be greater than 1), so wouldn't maximizing Z also maximize the relevant X?

rcast15 · Posted in reply to message from MRB3855 · 06-08-2026

Hi @MRB3855,

There are diminishing returns to maximizing Z. You are correct that maximizing Z would maximize X, but only up to a point: because Y and X are positively correlated, too many impurities (defined as Y-X) would be introduced beyond it.

My other concern if we only maximized Z is that I would then get factor settings that favor very tiny amounts of my denominator, Y, thus producing smaller yields, X.

statman · Posted in reply to message from rcast15 · 06-08-2026

It is, of course, hard to give specific advice without proper context. I tend to agree with Victor on the options you listed. I might suggest you investigate other response variables. It really helps to know what mechanisms you are investigating.

Since X is a component of Y, and Z = X/Y is derived from both, I would be hesitant to optimize all three directly. I’d first ask whether there is a more fundamental response that represents the actual objective. For example, is the goal to increase the amount of desired component, reduce the undesired component, improve selectivity, or improve conversion efficiency?

I’d also be careful with the ratio. Ratios can become unstable, especially if Y varies substantially or gets small. The ratio may exaggerate noise in either X or Y. A graph of predicted X versus predicted Z, or X versus Y with purity contours, may be more informative than simply optimizing a desirability function.
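A quick way to see the instability is the first-order (delta-method) approximation for the standard deviation of a ratio of two independently measured quantities. The assay SDs below are hypothetical, and the approximation itself becomes rough once the SD of Y is no longer small relative to Y.

```python
import math


def ratio_sd(x, y, sd_x, sd_y):
    """First-order (delta-method) SD of Z = X / Y, assuming independent errors in X and Y."""
    z = x / y
    return abs(z) * math.sqrt((sd_x / x) ** 2 + (sd_y / y) ** 2)


# Hypothetical assay SDs, held constant in absolute units.
sd_x, sd_y = 1.0, 1.5

# Same purity (Z = 0.8) at three different levels of total protein Y.
for y in (100.0, 20.0, 5.0):
    x = 0.8 * y
    print(f"Y = {y:5.1f}  Z = {x / y:.2f}  approx SD(Z) = {ratio_sd(x, y, sd_x, sd_y):.3f}")
```

The purity is identical in all three rows, but its approximate uncertainty grows as Y shrinks, so an optimizer chasing high Z in a low-Y region is partly chasing noise.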

"All models are wrong, some are useful" G.E.P. Box

rcast15 · Posted in reply to message from statman · 06-08-2026

@statman Thank you for your response.

While unable to give very specific information, I can provide a bit more context.

X is a measurement of a target protein, which we call yield, and we want to maximize this. Y is a completely independent measurement of total protein. The goal is twofold, in order of importance:

1. Increase the desired component X, or target protein
2. Reduce the undesired component (Y-X), which represents all other protein we do not want

Goal 2 has historically been tackled by maximizing X/Y, which is just a proportion of the total protein that is the target protein. My initial concern started with the fact that choosing factor settings that maximize X and Z=X/Y essentially "double count" X.

Agree with inspecting graphs in addition to optimizing to a desirability function.

statman · Posted in reply to message from rcast15 · 06-08-2026

I would avoid mixing the nomenclature. I’d define the responses as:

Y1 = target protein
Y2 = total protein − target protein

Then I would model both Y’s directly. This separates the desired material from the non-target material rather than creating a derived ratio response that may be difficult to interpret.

One additional thought: depending on the process, you may also be interested in factor effects on the variability of Y1 and Y2, not just their means. For example, if this is a batch process, you could estimate within-batch variation by taking repeated measurements from different locations within the batch. The average of those measurements could be used to improve precision for modeling the mean response surface, while the variance of those measurements could be used to model variation as a response.
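As a minimal sketch of that idea, each run's repeated measurements can be collapsed into two responses: a mean and a dispersion summary. The run names and values below are made up, and using the log of the sample variance as the dispersion response is my assumption (a common choice, not something stated above).

```python
import math
import statistics

# Hypothetical repeated measurements of Y1 (target protein) within each run/batch.
replicates = {
    "run_01": [61.2, 59.8, 60.5],
    "run_02": [70.1, 73.4, 68.0],
    "run_03": [55.0, 55.3, 54.6],
}

summary = {}
for run, values in replicates.items():
    mean = statistics.mean(values)     # response for the mean surface
    var = statistics.variance(values)  # sample variance (n - 1 denominator)
    summary[run] = {"mean": mean, "log_var": math.log(var)}
    print(f"{run}: mean = {mean:.2f}, log(variance) = {math.log(var):.2f}")
```

Each summary column would then be modeled against the factors like any other response.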

"All models are wrong, some are useful" G.E.P. Box


Best Response Parameterization for Optimization with DOE
