OrthogonalNoise · Aug 16, 2024 04:40 PM

I have collected data from a chromatography response surface DOE whose goal is to maximize the resolution of 4 peak pairs throughout a chromatogram. For one of these peak pairs, the resolution under some conditions is so poor that it cannot be calculated. If I enter 0 for those runs, I'm worried the DOE analysis will artificially skew the model towards 0. If I leave the values empty, the analysis will exclude those runs, so the model cannot tell me that those factor level combinations are outside my design space. What is the best way to include information about unresolved peaks in the data table?

statman · Aug 17, 2024 7:14 AM

Unfortunately, I am not an SME for your situation, so I can only offer some general advice. What you can do depends on how many of the total DFs are "missing". When you encounter missing, "special", or otherwise unusual data in an experiment, here are things you can try:

1. Use the mean of the remaining data (this will tend to nullify any effect possible during that treatment),

2. Use the values you predicted before you ran your experiment (yes, you should always do predictions),

3. Use regression to estimate what the missing Y would be: run the model with the missing data point empty (a dot), leave out the highest-order effect (or the effect least likely to be significant), save the prediction formula, and use the predicted value,

4. Do all of the above and see how well the analyses agree. If there is relative agreement, then proceed; if not, consider running that treatment again (along with some of the treatments you already have data for, to account for a potential block effect).

5. Improve your measurement system and re-measure (Have you evaluated the measurement system a priori, before running the experiment?)

6. Develop another measure of the phenomenon (an alternate Y)
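A minimal sketch of how item 4 might be carried out, comparing the model coefficients obtained under several ways of handling the missing responses. Everything here is an assumption for illustration: the two-factor design, the made-up resolution values, and the helper names `design_matrix`, `fit`, and `impute` are hypothetical, and the reduced model simply drops the interaction term as a stand-in for "the highest-order effect".

```python
import numpy as np

# Hypothetical 2-factor face-centered design in coded units (not the poster's data).
x1 = np.array([-1, 1, -1, 1, -1, 1, 0, 0, 0, 0, 0], dtype=float)
x2 = np.array([-1, -1, 1, 1, 0, 0, -1, 1, 0, 0, 0], dtype=float)
# Made-up resolution values; NaN marks runs where no resolution could be calculated.
y = np.array([np.nan, 1.4, 0.9, 2.3, 0.7, 1.9, np.nan, 1.6, 1.3, 1.2, 1.3])

def design_matrix(x1, x2, full=True):
    """Full quadratic model, or a reduced model without the interaction term."""
    cols = [np.ones_like(x1), x1, x2, x1**2, x2**2]
    if full:
        cols.append(x1 * x2)
    return np.column_stack(cols)

def fit(X, y):
    """Ordinary least squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

missing = np.isnan(y)
X_full = design_matrix(x1, x2, full=True)

def impute(strategy):
    """Return a copy of y with the missing runs filled in by the given strategy."""
    filled = y.copy()
    if strategy == "zero":
        filled[missing] = 0.0
    elif strategy == "mean":
        filled[missing] = np.nanmean(y)
    elif strategy == "regression":
        # Fit the reduced model on the observed runs, then predict the missing ones.
        X_red = design_matrix(x1, x2, full=False)
        coef = fit(X_red[~missing], y[~missing])
        filled[missing] = X_red[missing] @ coef
    return filled

# Fit the full quadratic model under each strategy and print the coefficients side by side.
results = {"dropped": fit(X_full[~missing], y[~missing])}
for s in ("zero", "mean", "regression"):
    results[s] = fit(X_full, impute(s))

for name, coef in results.items():
    print(f"{name:>10}: " + " ".join(f"{c:7.3f}" for c in coef))
```

If the coefficient rows broadly agree, the choice of strategy matters little; if they diverge, the missing runs are driving the model and deserve a closer look.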

Editorial: BTW, if you are running response surface designs, you should not be in this situation. Response surface designs are optimization designs, best run AFTER you understand the inference/design space and the noise (e.g., measurement error).

"All models are wrong, some are useful" G.E.P. Box

OrthogonalNoise · Aug 19, 2024 12:00 PM

Thanks for your input statman! I will address your points as you wrote them:

1) This wouldn't work for the reason you stated. These "missing" values basically mean the response is on the low end of the response distribution but is so low that a numerical value can't be calculated. Using the mean would definitely nullify the effect.

2) I'm not sure my manager would agree with using those values to build the model (the predictions would be based on previous DOEs that aren't identical), but this is definitely something to keep in mind just in case.

3) This would be easier to convince my manager to use, but still wouldn't be the preferred approach since we are adding hypothetical data to the model.

4) I had already been using this strategy: I entered zeroes for the missing data, or handled it in other ways, to see the effect on the model. This is usually how I defend or dismiss an approach in our group meetings. (If both approaches generally agree, we choose the one we think is best and move forward. If not, we look more closely at the data to understand why they disagree.)

5) This measurement system is pretty robust, so I'm not too worried about this point. Most of the analysis is automated and we are using pre-labeled standards straight from the vendor on a well-maintained chromatographic system that we routinely use to measure accuracy, precision, linearity, etc for multiple types of assays and multiple molecules for each assay without issue.

6) This was the solution we decided on. We found another parameter called "Start p/v" that measures the ratio of the peak to the valley at the beginning of a peak. We found that this response is approximately linear in the resolution and still gives a value when the resolution cannot be calculated. This gives us a response from the DOE that we can use to build the model, without artificially driving the data towards 0 as entering zeroes for the missing values would have done.
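A minimal sketch of how one might check that an alternate response tracks resolution, using only the runs where both are available. The numbers are made up for illustration and are not the data from this DOE; the variable names `resolution` and `start_pv` are hypothetical.

```python
import numpy as np

# Hypothetical values for illustration only (not the poster's data).
# NaN marks runs where the software could not calculate a resolution.
resolution = np.array([np.nan, 1.4, 0.9, 2.3, 0.7, 1.9, np.nan, 1.6])
start_pv = np.array([1.1, 3.9, 2.8, 6.1, 2.3, 5.2, 1.3, 4.4])

both = ~np.isnan(resolution)

# Pearson correlation on the runs where both responses exist.
r = np.corrcoef(start_pv[both], resolution[both])[0, 1]

# Straight-line fit: resolution ~ slope * start_pv + intercept.
slope, intercept = np.polyfit(start_pv[both], resolution[both], deg=1)
residuals = resolution[both] - (slope * start_pv[both] + intercept)

print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"largest residual = {np.abs(residuals).max():.3f}")
print(f"runs with resolution: {both.sum()}, runs with Start p/v: {len(start_pv)}")
```

Note that a check like this only supports linearity over the range where both responses exist. Whether the relationship still holds for the unresolved runs is an assumption that cannot be verified from the data.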

Editorial comment: We started with a response surface design because we already had a lot of data on most of the factors from work with a different fluorescent tag, and we didn't expect much difference. The particular peak pair we had trouble with didn't show this issue in those previous DOEs, but in this DOE we did change one factor that we had no prior data on. Our other 3 responses worked well, but this one tripped us up. In hindsight, we should probably have screened the new factor before going into the response surface DOE, but I think we will be able to make it work using the alternate Y response.

MRB3855 · Aug 19, 2024 04:53 AM

Hi @OrthogonalNoise: This may or may not be helpful, and you've probably already considered it. But clearly the range(s) you explored in this DOE were too bold (review @statman's Editorial comment). I would expect the method to have some minimum criterion for resolution. And for a robust method I'd expect the peak order to be consistent between runs as well.

OrthogonalNoise · Aug 19, 2024 12:03 PM

Yeah, peak resolution gets tricky at the low end: when the peaks aren't resolved at all, they are too close to each other for a value to be calculated. We are using this DOE to set our resolution criteria, so we don't have any yet, other than that the resolution must be calculable. We found a workaround though, using an alternative response that is linear in the resolution.
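For reference, one common definition of resolution uses the retention times and baseline peak widths of the two peaks. This is an assumption about the calculation; the chromatography data system used here may apply a different variant (for example, one based on half-height widths).

```latex
R_s = \frac{2\,(t_{R,2} - t_{R,1})}{w_1 + w_2}
```

Here $t_{R,1}$ and $t_{R,2}$ are the retention times of the two peaks and $w_1$ and $w_2$ are their baseline widths. When two peaks merge, the integration cannot assign a separate retention time and width to each peak, so no value of $R_s$ can be reported.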

JMP DOE chromatography data table: How do I enter values for unresolved peaks?
