Simulation Study of Process History Impact on Intensified Design of Experiment Regression Models
My research aims to enhance the efficiency of early-stage process development with mammalian cells in the biopharmaceutical industry by applying an intensified design of experiment (iDoE) approach. Unlike classical design of experiment, iDoE involves intra-experimental variations of critical process parameters (CPPs). This approach not only increases data-generation efficiency but also enables the consideration of temporal process dynamics through stage-wise optimization. However, a potential limitation is that previous CPP settings may (irreversibly) impact the cells and affect their response behavior to subsequent CPP set points.
To address this issue, my research focuses on developing guidelines for planning and evaluating iDoEs robustly, considering the impact of process history. The focus of the presented simulation study is to investigate the impact that different effect sizes of interaction terms associated with the process history have on our regression models. Subsequently, the beta estimates and variance components of these models are compared to evaluate the impact of not explicitly considering the process history. This research has the potential to significantly impact the biopharmaceutical industry by innovating the way process optimization in early-stage development is performed, considering the dynamic nature of these processes.
Hello and welcome, everyone, to my short talk on the simulation study that I performed using JMP, in which I simulated the impact of the process history on our iDoE regression models. Let me start with a really brief introduction.
We are working with bioprocesses, and those are usually performed within the bioreactors. Within these bioreactors, cells are grown, and these cells then produce our final product, which is usually an antibody. These processes depend on many different process parameters such as temperature, dissolved oxygen, and many others.
Usually, we would use design of experiment to identify ideal process parameter settings to maximize our response, for example, the yield of our antibody or the viable cell density that we obtain within our process. These processes can be divided into the classical stages that are typical for such a cell growth process, namely the growth, the transition, and the production phase.
Here in this next slide, I show two different bioreactor growth curves, the green one and the blue one. As you can see, the blue bioreactor, in this case, has three times the titer of the green bioreactor. How do they differ? They differ based on the temperature profile that was executed in each respective process.
As you can see, the blue growth curve is a lot higher. The viable cell density is a lot higher than in the green process, which leads to this increased titer. This just highlights that by understanding the process dynamics, we can potentially vastly improve the performance of our bioprocesses.
Exactly this is the idea behind intensified design of experiment. In intensified design of experiment, we would divide our process into separate stages. Here's stage one, stage two, and stage three, and perform intra-experimental parameter shifts over time.
We would change the process parameters from the growth to the transition to the production phase, and thereby we are able to optimize the parameter settings within each of these separate stages instead of having one set of parameters for the whole process. In this way, we can optimize for these process dynamics.
A certain challenge in this approach, though, is how to properly consider the effect of the process history on our regression models. For example, how do the process parameters of stage 1 influence the response behavior of our cells to our process parameter shifts in stage 2 or in stage 3? Exactly this is the aim of this simulation study to investigate the impact of process history on our regression models.
How did I simulate this? I simulated the data for an intensified design of experiment with two process parameters, in this case, temperature and dissolved oxygen, and three stages, stage 1, stage 2, and stage 3. The effect of each process parameter within each stage is described explicitly, as you can see here, temperature 1 and dissolved oxygen for the first stage, then for the second stage, and for the third stage. All of these process parameters are modeled as hard-to-change factors, and at the same time, we're also investigating the culture duration as an easy-to-change factor, which is the temporal component.
We also have the bioreactor, which is modeled as a whole plot to accommodate random offsets between the bioreactors. Overall, we model this as a linear mixed model.
Where can we find the process history within this? The process history can be found within these across-stage interactions, meaning what is the impact of the process parameter settings of the first stage on the response behavior of our cells through the process parameter changes in the second or the third stage, here highlighted for the temperature in stage 1. These across-stage interactions can be used to get an idea of the influence of the process history.
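To make the structure of these across-stage interactions concrete, here is a small sketch that enumerates the history-carrying terms for two CPPs over three stages. The labels ("Temp1", "DO2", and so on) and the term list are my own illustrative construction, not the exact column names or model terms used in JMP.

```python
from itertools import combinations

# Hypothetical enumeration of the fixed-effect terms of a three-stage iDoE
# model with two CPPs; labels are illustrative, not the actual JMP names.
stages = (1, 2, 3)
cpps = [f"{pp}{s}" for s in stages for pp in ("Temp", "DO")]

main_effects = cpps + ["CultureDuration"]

# Across-stage interactions carry the process history: a setting of one stage
# interacting with a parameter of a *different* stage (last char = stage index).
across_stage = [f"{a}*{b}" for a, b in combinations(cpps, 2) if a[-1] != b[-1]]

print(len(across_stage))  # → 12 history-carrying terms
```

With two CPPs and three stages, twelve of the fifteen two-factor interactions cross a stage boundary; only those twelve encode process history.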
This brings me to the setup of my simulation. I used the Simulate Responses platform in JMP. I used exactly the design that I just showed you on the last slides to create a model and to get an idea of what the beta coefficients in this model could look like.
I used knowledge from historic regression models to create this base model. Within this base model, highlighted here, I simulated the across-stage interactions, which I used to model the impact of the process history. These across-stage interactions are what give us an idea of the impact of the process history on the regression models.
To also get a visual idea of what the data looks like, I plotted the viable cell density over the culture duration for this simulated base model, colored by the effect of temperature in stage 1. But since we are interested in the impact of process history, I derived alternative models from this base model in which I varied the magnitude of these across-stage interactions. As you can see here, I doubled the coefficients of the across-stage interactions, then I halved them, and I quartered them to simulate different scenarios with different impacts of the process history.
As you can see in those plots, these also result in perfectly valid-looking viable cell density curves. I used these four different models that I just showed you as the fixed effects in my linear mixed models. I also introduced a random whole-plot error and a random residual error. What I did then was to fit these models.
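The coefficient-scaling mechanism can be sketched as follows. All coefficient values, the coded -1/+1 settings, and the noise level here are assumptions of mine for illustration; the actual base model coefficients came from the historic regression models mentioned above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sketch of scaling the history coefficients; all numbers are
# assumptions, not the actual JMP base model. Settings are coded -1/+1.
n_runs = 24
X_main = rng.choice([-1.0, 1.0], size=(n_runs, 6))     # Temp1, DO1, ..., DO3
X_cross = X_main[:, [0]] * X_main[:, 2:]               # Temp1 x stage-2/3 terms

beta_main = np.array([0.8, 0.5, 0.6, 0.4, 0.3, 0.2])   # assumed main effects
beta_cross = np.array([0.4, 0.3, 0.2, 0.1])            # assumed history effects

def simulate(scale, noise_sd=0.25):
    """Response with the across-stage coefficients multiplied by `scale`."""
    mu = X_main @ beta_main + X_cross @ (scale * beta_cross)
    return mu + rng.normal(0.0, noise_sd, n_runs)

# The four scenarios from the slides: doubled, base, halved, quartered.
scenarios = {s: simulate(s) for s in (2.0, 1.0, 0.5, 0.25)}
```

Only the across-stage coefficients are rescaled between scenarios; the main effects stay fixed, which isolates the impact of the process history.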
First, here you can see all the eligible terms that I had in my model: all the terms that I simulated, including these across-stage interactions, which I use as a proxy for process history. The second model that I fit did not contain these across-stage interactions, so it had no way of accommodating the process history.
I then compared those two models based on their beta estimates, that is, how well they are able to recover the beta coefficients that I knew from my simulation, as well as on their variance components. I repeated this fitting 1,000 times and afterwards looked at the respective distributions. This is what I would like to show you next in the results section.
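The mechanics of this repeated-fit comparison can be sketched as below. This is a deliberately simplified stand-in: ordinary least squares replaces the linear mixed model, a single across-stage interaction (Temp1 x Temp2) replaces the full set, and all numbers are assumptions; with independently randomized settings the factors are near-orthogonal, so the split-plot structure of the real design is what drives the larger differences reported in the results.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simplified sketch: fit a full model (with the history interaction) and a
# reduced model (without it) to the same simulated data, 1,000 times, and
# record the difference in the Temp1 estimate. OLS stands in for the LMM.
n_runs, n_rep = 24, 1000
beta_true = np.array([0.8, 0.5, 0.4])          # Temp1, Temp2, Temp1*Temp2

diff_temp1 = np.empty(n_rep)
for i in range(n_rep):
    x1 = rng.choice([-1.0, 1.0], n_runs)
    x2 = rng.choice([-1.0, 1.0], n_runs)
    X_full = np.column_stack([x1, x2, x1 * x2])
    y = X_full @ beta_true + rng.normal(0.0, 0.25, n_runs)

    b_full = np.linalg.lstsq(X_full, y, rcond=None)[0]
    b_red = np.linalg.lstsq(X_full[:, :2], y, rcond=None)[0]
    diff_temp1[i] = b_full[0] - b_red[0]       # Temp1 estimate, full - reduced

# The distribution of diff_temp1 is what the result slides summarize.
```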
To give you an idea of what the data actually looks like, here, for one beta, I visualized the beta estimate distribution for the full model with across-stage interactions, considering the process history, and for the reduced model without across-stage interactions, not considering the process history.
As you can see, for big across-stage interactions, here times two, we have a rather big difference between the means of the two beta estimate distributions. Whereas for really small across-stage interaction effects, we have a really small difference between the mean estimates of both beta estimate distributions across these 1,000 fittings that I performed. If the difference between the mean estimates of both distributions were zero, it would mean that both models estimate the same mean.
On the next slide, I calculated the mean difference between the beta estimates of the full model, considering the process history, and the reduced model without across-stage interactions, which doesn't consider the process history.
As we can see, for big effect sizes of these across-stage interactions, there's quite a big difference in the mean beta estimates between the full model and the model without across-stage interactions. Whereas for small effect sizes of these across-stage interactions, there's only a really marginal difference.
What I would like to show you on the next slide are the variance components. Again, this is the comparison with the full model, which considers the process history via these across-stage interactions, and we can see that its variance components are estimated correctly at 0.25, independent of the effect size of these across-stage interactions. Whereas for the reduced model without across-stage interactions, we can see that the whole-plot variance is very inflated compared to the residual variance.
The smaller the effect size of these across-stage interactions, the closer these variance components get to the actual simulated values of the variance components. This inflated whole-plot variance is probably due to the model compensating for the missing interaction terms by absorbing part of their unexplained variance into the whole-plot variance term.
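This absorption mechanism can be illustrated with a minimal sketch: an effect that is constant within a bioreactor (because its factor is hard to change) and is omitted from the model inflates the apparent between-reactor variance. The reactor counts, effect sizes, and the method-of-moments comparison below are all my own illustrative assumptions, not the study's actual REML fits.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hedged sketch of the inflation mechanism: a term constant within each whole
# plot, when omitted, is absorbed into the between-reactor variance estimate.
n_wp, n_obs = 40, 10                        # bioreactors x observations each
wp_sd, res_sd = 0.5, 0.5                    # true variance components = 0.25
wp_effect = rng.normal(0.0, wp_sd, n_wp)
x_hist = rng.choice([-1.0, 1.0], n_wp)      # hard-to-change, fixed per reactor
beta_hist = 0.6                             # assumed across-stage effect

y = (wp_effect + beta_hist * x_hist)[:, None] \
    + rng.normal(0.0, res_sd, (n_wp, n_obs))

# Method-of-moments style look at the between-reactor variability:
wp_means = y.mean(axis=1)
var_with_term = np.var(wp_means - beta_hist * x_hist, ddof=1)  # term included
var_without = np.var(wp_means, ddof=1)                         # term omitted

print(var_without > var_with_term)  # omitting the term inflates the estimate
```

The larger `beta_hist` is relative to the true whole-plot standard deviation, the stronger the inflation, which mirrors the effect-size dependence seen in the results.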
This brings me to the conclusion, the final part of my presentation. What we saw was that for very small across-stage interactions, meaning a small effect of process history, the model without across-stage interactions closely approximates the full model, which accommodates the process history. This means that these across-stage interactions could potentially be neglected in modeling.
Whereas when we have big across-stage interactions, that is, a high impact of process history, we see quite a big impact on the beta estimates, visible in the difference between the mean beta estimates of the full model and the model without across-stage interactions. This is partly compensated for in the model without across-stage interactions by an inflated whole-plot variance.
To give a really brief outlook, we would like to generalize these results, perhaps by quantifying the magnitude of across-stage interactions relative to non-across-stage interactions, to obtain a way to compare them to the actual experimental results that we will hopefully obtain in the future.
Furthermore, we would like to extend this benchmarking to further modeling approaches such as stage-wise regression models, as well as using the Functional Data Explorer for analyzing iDoE data.
That's the end of my presentation. I want to thank you all very much for listening, and I want to especially thank Verena and Beate for the supervision and the guidance during my PhD project.