Hi Guys,
We have a process that is fairly well characterized at this point, but we are now de-risking potential process variability from the starting material. The goal is to maximize both purity and recovery for this unit operation.
Two parameters (B and C) are known to be significant factors; they have been modeled in previous DoEs and are well understood. A third parameter (A) is under investigation because, in theory, it could be impactful: the starting purity of the material, which is variable and which we cannot fully control.
From happenstance experiments, I have shown successful purification over a wide range of A (~30-50% starting purity). I wanted to investigate where the point of failure is (i.e., at what starting purity the target final purity can no longer be reached). I also wanted to fully characterize our design space around this, so we can learn whether the optimal conditions for B and C move with a change in A.
I ran an RSM study in which B and C spanned their normally tested ranges and A ranged from ~5% to ~85% starting purity. The model did identify A as a significant parameter, but overall it is shifted down from historical data. For example, after reducing the model, it predicts that 30% starting material would be ~60% pure by the end of the operation, yet we have multiple data points showing this is underestimated and should be closer to 85%.
One issue occurred in the execution of this study. The target A conditions were 5%, 40%, and 85%, but the intermediate condition actually tested turned out to be closer to 55%. Could this execution error have caused the issue?
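One quick check on that question: regression is unbiased with respect to where the runs landed only if the analysis uses the *actual* factor settings, not the planned ones. If the model was fit with A coded at the planned 40% while the run was really at 55%, the curvature estimate is distorted. A minimal sketch, with entirely made-up purity numbers and a hypothetical quadratic A-effect, just to show the mechanism:

```python
import numpy as np

# Planned vs. actual A levels (actual middle run drifted from 40% to 55%)
actual_A = np.array([5.0, 55.0, 85.0])    # what was really tested
planned_A = np.array([5.0, 40.0, 85.0])   # what the design called for

# Hypothetical "true" relationship between A and final purity (illustrative only)
def true_purity(a):
    return 40 + 1.2 * a - 0.007 * a**2

y = true_purity(actual_A)  # responses generated at the ACTUAL settings

def fit_quadratic(a, y):
    # Ordinary least squares for intercept + linear + quadratic terms
    X = np.column_stack([np.ones_like(a), a, a**2])
    return np.linalg.lstsq(X, y, rcond=None)[0]

def predict(coef, a):
    return coef[0] + coef[1] * a + coef[2] * a**2

wrong = fit_quadratic(planned_A, y)   # analysis mislabels the middle run as 40%
right = fit_quadratic(actual_A, y)    # analysis uses the true settings

# The two fits disagree noticeably at, e.g., A = 30%
print(predict(wrong, 30.0), predict(right, 30.0))
```

So if the dataset was analyzed against the planned 40% level, simply refitting with 55% as the middle level may recover accuracy; if it was already fit at 55%, the execution error per se is not biasing the model (though the middle level now sits closer to the high level, weakening coverage of the low-A region).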
Overall, the model fit was very good, with a low RMSE. The low-purity conditions were definitely a stress test, not something we would ever expect to see in production, but it seemed beneficial to explore a wide space for process understanding. My theories are that either the execution error has caused the issue, or the range was too wide and the low-A condition, which repeatedly struggled to purify, pulled the model down.
1.) How can I repair this DoE to improve its accuracy?
I could augment the runs and focus the dataset on the "MFG range" we would expect.
I could add a third-order term for A, which would require evaluating more levels of A than just three.
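One constraint on the second option: a cubic term in A is only estimable if A was run at four or more distinct levels, so augmentation runs at new A levels would have to come first. A small rank check illustrates this (the A levels below mirror the three actually tested, plus hypothetical augmentation points):

```python
import numpy as np

# With only 3 distinct A levels, the cubic design matrix has rank 3,
# but a cubic model needs 4 estimable parameters (1, A, A^2, A^3).
A3 = np.array([5.0, 55.0, 85.0])
X3 = np.column_stack([np.ones_like(A3), A3, A3**2, A3**3])
print(np.linalg.matrix_rank(X3))  # rank 3 < 4 parameters -> cubic not estimable

# Hypothetical augmentation runs at 25% and 40% restore estimability
A5 = np.array([5.0, 25.0, 40.0, 55.0, 85.0])
X5 = np.column_stack([np.ones_like(A5), A5, A5**2, A5**3])
print(np.linalg.matrix_rank(X5))  # rank 4 -> cubic coefficient can be fit
```

Conveniently, augmentation runs placed in the MFG-relevant range (option one) would also supply the extra A levels needed for option two, so the two repairs are compatible.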
2.) How can I avoid an inaccurate model in the future?
If I did not have historical data, I would not have questioned this dataset, since at face value it seems correct.
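That observation suggests one safeguard worth formalizing: score every new model against held-out historical runs as a routine acceptance step, rather than relying on the model's own training RMSE. A sketch, with hypothetical historical points and a stand-in prediction function (all numbers invented for illustration):

```python
import numpy as np

# Hypothetical historical runs: starting purity (A) and observed final purity
historical_A = np.array([30.0, 35.0, 45.0, 50.0])
historical_purity = np.array([85.0, 87.0, 90.0, 91.0])

def model_prediction(a):
    # Stand-in for the reduced RSM's prediction at nominal B and C;
    # deliberately "shifted down" to mimic the problem described above
    return 1.1 * a + 27.0

pred = model_prediction(historical_A)
bias = np.mean(pred - historical_purity)              # systematic offset
rmse = np.sqrt(np.mean((pred - historical_purity)**2))  # overall error

# A large negative bias flags the "model shifted down" failure mode
# even when the model's internal fit statistics look good
print(bias, rmse)
```

A training-set RMSE can be excellent while the model is still systematically offset in the region that matters, which is exactly the trap described here; an external-validation bias check catches that.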