Hi, my name is Tannaz Tajsoleiman. Today, I'm going to talk about an application of the JMP data analysis platform in the pharmaceutical industry. The focus of this presentation is on how we can get more cross-level insight using smarter Design of Experiments (DoE), and afterwards how we can analyze the results better in order to find optimum process settings, specifically for, as an example, the pharmaceutical industry.
But what do I mean by cross-level insight? To get to that, let me explain a case story from a project with one of our customers. They were in the process-development phase of a vaccine production and wanted to characterize this process. If we divide vaccine production into its main phases, we can split it into two: Upstream and Downstream.
The Upstream phase is where they infect the host cells with a limited number of virus particles and then keep them under a controlled environment and specific process conditions for the virus to grow until the population reaches the target level. After that, they move the material to the next unit, which we call the Downstream unit, for purification and formulation to make the vaccine ready for injection and for going out to the market.
The customer was focusing on the process development of these two phases, meaning they needed to characterize the process. To do that, they needed to identify the most significant parameters for each of these units, both Upstream and Downstream, using Design of Experiments. Then, after collecting the data, they could model the process for each individual unit and use the model to find the optimum, robust setting for each of them.
It's a very typical task within the bio-industries to split the process into main units and then characterize each unit individually. In most cases, these characterizations are treated as if they don't influence each other, so they end up as completely independent tasks and independent Designs of Experiments.
Keeping this in mind, the agenda of this talk is how we can make these DoEs more descriptive and more comprehensive, so that we get the most information about the different levels of the process, and then how we can use the JMP data analysis platform to first model the data and then use the model to optimize the processes.
Let's start with the Design of Experiments for this specific case study. We started by running several workshops with the teams from both Upstream and Downstream to figure out the most important factors, or parameters, in their processes.
We could easily see, at first glance, that we were ending up with such a high number of factors that it was practically impossible, or out of budget, to run such a big Design of Experiments. We had to narrow it down by scoring each of these parameters based on its importance, how much it varies within normal production, and how easy it is to change. We scored them first and then narrowed the number of parameters down to eight for each of the two units.
After that, within those workshops, we could also easily see the strong influence of, for example, the Upstream outcome on the Downstream unit, just because these two units are highly connected and can easily influence each other. Meaning that if they characterized the Downstream separately and found its optimum setting, it most probably would not work as optimally as they expected, because it can be highly influenced by the outcome of the Upstream unit.
So instead of two individual, independent Designs of Experiments, it's better to have one joint DoE that covers both sides together and also covers the interactions. That gives us a very good cross-factor overview within our Design of Experiments, and at the same time it helps minimize the number of runs. It also gives a much better randomization of our runs, of course under practical supervision.
How did we do this in practice? Before I jump to the demo I want to show you, let me give you some information about the limitations we had in this study. First of all, each Upstream batch could go directly to one Downstream batch. But in the Downstream, they had the possibility to run two reactors in parallel to increase the Downstream capacity.
The biggest limitation they had in the Downstream, though, was that when they were running those parallel reactors, two of the design factors had to be set the same for both. In this case, they had to keep the Enzyme time and the Hold-up time in the Downstream constant across the two parallel reactors.
It was also very important to have proper time planning between Upstream and Downstream, meaning they couldn't have a big lag from the finish of an Upstream batch until it reached the Downstream batch.
Also, between these two phases, we could identify a factor called initial cell density, mostly coming from the Upstream, that was a common factor between the two units. We call it a pair factor, used to join these two DoEs and connect them into one complex design.
To start, we also needed to understand the minimum number of Upstream and Downstream batches we would need. As you can see here, both units had eight design factors, but since the Downstream process had the extra limitation on our design, that would be the one controlling the minimum number of experiments. That's why I started the DoE by looking at the Downstream process.
Let's have a look at the demo. As you can see here, these are the design factors we had for the Downstream unit. I want to know the minimum number of experiments, or batches, I need from the Upstream to be able to cover all the required batches, or experiments, I run in my DoE.
Starting with Custom Design, I can load my factors. These are all the factors I have for the Downstream. Plus, I need to add my initial cell density, the pair factor that comes from the Upstream batches. As you can see here, almost all of these factors have Changes set to Easy, except the Hold-up time and the Enzyme time, which are the two factors I want to keep the same for the two batches running in parallel.
With this setting, I can be sure those batches are set to have the same Hold-up time and Enzyme time. Moreover, I'm interested in the main effects and the interactions in my model. Let's have a look here. As you can see, some of the interactions are set to If Possible, and I want to have them as Necessary. Now we are good here.
As you can see here, JMP suggests that I need a minimum of 19 experiments in the Downstream to cover these interactions. To be safe, we always want to have at least one extra on top. That gives me 20 experiments, or 20 Upstream batches, as the minimum starting point for the Upstream.
Okay, then let's have a look at my Upstream. These are my Upstream factors, and I can load them into my Custom Design. Here I am also interested in the two-factor interactions. Since we said we want 20 Upstream experiments, I can specify as a user how many experiments I want. Then I can make the design by pressing this button.
To save time, I already saved the design, so let me bring it up here. This is the experimental design for my Upstream, the first level. Now I want to use this design to design my Downstream experiment. To do that, I go back to my design factors for the Downstream. Again, Custom Design, and I load my factors as I want them to be.
Now I want to include this designed Upstream experiment in my Downstream experiment. The way I do that is to first select that design window, and then I can call it, select the coherent factors, and import that experiment into my design. As you can see here, these are the experiments that were designed for the Upstream, and they are set as covariates.
One more step is that I want to have all my Downstream parameters, plus my initial cell density, included with their interactions. But the factors that come from the Upstream, I keep as main effects only. As you can see here, some are still set to If Possible; I need to make sure they are set as Necessary.
Then, since I already included the designed experiment, I can exclude the factors I have from the Upstream. Maybe I can just remove them all here and add them again: the main factors, the second-order terms, and then the factors from the Upstream only as main effects.
Then I can choose them. If I go up here: perfusion rate, this one, this one, and this one. These are the factors coming from my Upstream, and I can set them as If Possible, just to be sure I have everything included. This one has to be If Possible as well; I don't need to force it in this design anymore, because it's already included there. Now I can set it to have 10 whole plots and 20 runs and make the design.
To save time again, I have already made the design, and this is the outcome. Now I have a combined DoE for both Upstream and Downstream. The nice part about it is that I now have a column called Whole Plots, which says that within the same whole-plot number, meaning within each pair of parallel units, I have a fixed Hold-up time and Enzyme time.
As you can see here, they stay constant within each pair and change between pairs. I'm now combining both designs, Upstream and Downstream, into one unit. I can also prioritize which Upstream unit has to run first, to keep the time balance between the Upstream and the Downstream, and I still have a lot of flexibility in keeping the conditions fixed between the parallel experiments. This gives me a very good overview of both levels, both phases, together.
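As a conceptual sketch of what that Whole Plots column encodes, here is a toy version in Python. This is illustrative only: the factor names and levels are placeholders, and JMP's Custom Design optimizes the run allocation rather than drawing levels at random. The point is simply that the two hard-to-change factors stay fixed within each pair of parallel reactors, while the easy-to-change factors are re-randomized for every run.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

n_pairs = 10          # whole plots: each is a pair of parallel Downstream reactors
runs = []
for wp in range(1, n_pairs + 1):
    # hard-to-change factors: one setting shared by both parallel reactors
    enzyme_time = rng.choice([-1, 0, 1])
    hold_up_time = rng.choice([-1, 0, 1])
    for reactor in (1, 2):
        runs.append({
            "WholePlot": wp,
            "Reactor": reactor,
            "EnzymeTime": enzyme_time,
            "HoldUpTime": hold_up_time,
            # easy-to-change factors (placeholder names): re-randomized per run
            "Flow": rng.choice([-1, 1]),
            "pH": rng.choice([-1, 1]),
        })

design = pd.DataFrame(runs)
print(design.head(4))
```

Reading the table, each WholePlot value appears on exactly two rows, and those two rows agree on EnzymeTime and HoldUpTime, which is exactly the constraint the parallel reactors imposed.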
Let's go back to the slides. Now we have a good Design of Experiments covering both phases. Let's have a look at how we can do the analysis. To save time, I'm going to look only at the Upstream process and its data. The aim now is to model the growth phase of the virus in the Upstream using kinetic modeling.
Once I find that model, I can use it to find the optimal process conditions for the Upstream. So how can we use the functionality built into JMP? This is my data set at the moment; it's a normalized data set. As you can see here, this is the data we collected during this experiment, from 30 different batches.
We ran different processes and monitored the growth phase of the virus over different days and different conditions, and we could see how the virus population builds up. Under some conditions, after a while, the virus population starts to degrade, or go into the death phase.
Now we want to characterize, or model, this profile, this growth phase of the virus, to be able to predict what the best combination of cells, virus, and environmental conditions would be, and also how many days we need to run the Upstream.
To do that, JMP has a very nice feature under Specialized Modeling called Fit Curve. I can set Y, the yield, which is the virus concentration or population over time, as Y; X as my day; and then the batch number and the factors controlling my process, which are the incubator, the temperature, and the initial virus concentration on the logarithmic scale.
Fit Curve gives me a very nice initial overview of how each of these batches builds up over time, individually. But now I want to model these batches' characteristics with a kinetic model. We have a very nice option here in JMP called Exponential Growth and Decay, and then Fit Cell Growth. It's a built-in library that you can easily choose, and then you can fit a logistic growth model to your profile.
As you can see here, this cell growth model tries to fit this function to each of my curves, covering Y0, which is the initial virus concentration; YMax, the maximum virus concentration reached; and then the division rate and the mortality rate. It gives you a nice summary of each of these batches, and this is how each of these parameters develops across the batches.
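To make the idea concrete, here is a minimal sketch of fitting one such kinetic curve outside JMP. The parameterization below, logistic growth multiplied by first-order decay, is an assumption for illustration; JMP's built-in Fit Cell Growth model may use a different but related form, and the data here are synthetic.

```python
import numpy as np
from scipy.optimize import curve_fit

def cell_growth(t, y0, ymax, g, d):
    """Logistic growth toward ymax from y0, times exponential die-off.
    An assumed kinetic form, not necessarily JMP's exact formula."""
    return ymax / (1.0 + (ymax / y0 - 1.0) * np.exp(-g * t)) * np.exp(-d * t)

# synthetic batch: days 0..6, true params y0=0.05, ymax=1.0, g=2.0, d=0.1
t = np.linspace(0.0, 6.0, 13)
rng = np.random.default_rng(0)
y = cell_growth(t, 0.05, 1.0, 2.0, 0.1) + rng.normal(0.0, 0.01, t.size)

p0 = [0.1, 1.0, 1.0, 0.05]                       # rough starting guesses
popt, _ = curve_fit(cell_growth, t, y, p0=p0, maxfev=10000)
y0_hat, ymax_hat, g_hat, d_hat = popt
print(f"Y0={y0_hat:.3f}  YMax={ymax_hat:.3f}  growth={g_hat:.2f}  death={d_hat:.3f}")
```

This is essentially what JMP does batch by batch: each curve is reduced to a small set of kinetic parameters, which then become the responses for the next modeling step.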
But then I want to go one step further and figure out how the process parameters affect each of my fitted parameters here. In the Pro version of JMP, there is a very nice option called Curve DOE Analysis that lets you analyze which factors impact each of my model parameters: YMax, for example, the division rate, the mortality rate, and so on.
As you can see here, it gives you a combined window where, for each of these four parameters, you can use Generalized Regression to analyze or model each of them individually. As an example, you can see that for YMax, the temperature and the initial virus concentration have a significant effect, and the incubators do not behave the same. You really get a good overview of the different parameters.
Or if I go to mortality and look at the Profiler, I can also see the effect of these two factors in each of the incubators, for example. It has a very good user interface that combines all these analyses and gives you a very nice overview of the different factors and their effect on my process.
It also gives you a nice Profiler at the bottom here with good information about, for example, if I change my incubator, my temperature, and my initial virus concentration, how does that affect my yield? Then if I want to reach my target within half a day, what should my process conditions be? I can easily use the Desirability Function here and maximize it.
What I can see here is that if I want to reach my target and get the maximum virus concentration within half a day, I have to start with the highest initial virus concentration and the highest temperature. While if I give it more time and say, for example, that it's fine to run my Upstream for two and a half days, then the situation is different.
Then it says that to get the higher yield, you need to start with a lower cell density, or virus density, and the viruses don't like that high temperature, so you should go with a lower temperature. It's a very nice, intuitive way to play around with the different factors, and then, based on the feasibility of your system and the practicalities of your process, you can fine-tune and find the optimum, robust process condition.
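The Profiler's maximize-desirability step can be pictured as a search over the fitted prediction formula. The sketch below uses a made-up response function and a brute-force grid search, both pure illustration rather than the real fitted model; with these assumed coefficients it reproduces the qualitative pattern above, where the best temperature drops when the run time is extended.

```python
import numpy as np

def predicted_yield(day, temp, v0):
    """Hypothetical fitted response, a stand-in for the Profiler's
    prediction formula (illustrative assumption, not the real model)."""
    g = 1.0 + 0.2 * (temp - 36.0)              # warmer -> faster growth...
    d = 0.02 + 0.10 * max(temp - 36.0, 0.0)    # ...but also faster die-off
    x = v0 * np.exp(g * day)
    return (x / (1.0 + x)) * np.exp(-d * day)  # saturating growth, then decay

temps = np.linspace(34.0, 38.0, 9)   # assumed feasible temperature range
v0s = np.linspace(0.01, 0.10, 10)    # assumed initial virus concentration range

# brute-force "desirability": maximize predicted yield at a fixed run time
best_half_day = max((predicted_yield(0.5, T, v), T, v) for T in temps for v in v0s)
best_long_run = max((predicted_yield(2.5, T, v), T, v) for T in temps for v in v0s)
print("0.5 days -> yield %.3f at T=%.1f, v0=%.2f" % best_half_day)
print("2.5 days -> yield %.3f at T=%.1f, v0=%.2f" % best_long_run)
```

With this assumed model, the short run favors the hottest setting, while the longer run favors a cooler one because the die-off term has more time to act, which mirrors the trade-off the Profiler shows.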
This is fantastic. But the biggest problem I have with this Curve DOE Analysis is that I'm missing some very nice functionality from the standard modeling platforms. For example, if I go to the diagnostic plots, I'm missing the studentized residual plot that helps me find outliers, so I cannot tell whether I have an outlier here or not.
Or whether I need to apply some transformation, like a Box-Cox transformation. I really miss that functionality in this part, but I still have an option to compensate for what's missing: extracting the summary and then modeling each of these parameters individually. Let's do that.
I extracted those parameters, as you can see here, the group summary parameters, and added them to my data table. Now I can have a look at each of them individually and do the analysis to figure things out.
For example, I'll go for Y0. Then you have the option, if I go here, to apply your own modeling routine: include all the factors, use Generalized Regression, and then figure out whether you need, for example, to apply a transformation or remove any outliers. If I look at the YMax model and model it in the normal routine, I can see that, for example, I do have one outlier that I have to remove.
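The studentized residual diagnostic that the Curve DOE report lacks can be computed by hand for an ordinary least squares fit. Here is a minimal sketch on synthetic data; the planted outlier, the variable names, and the cutoff of 3 are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20
x = rng.uniform(-1.0, 1.0, n)
y = 2.0 + 1.5 * x + rng.normal(0.0, 0.1, n)
y[7] += 1.5                                    # plant one gross outlier

X = np.column_stack([np.ones(n), x])           # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta                               # raw residuals
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)  # leverages (hat-matrix diagonal)
p = X.shape[1]
s2 = (e @ e) / (n - p)                         # residual variance estimate
r = e / np.sqrt(s2 * (1.0 - h))                # internally studentized residuals
t_ext = r * np.sqrt((n - p - 1) / (n - p - r**2))  # externally studentized

outliers = np.where(np.abs(t_ext) > 3)[0]
print("flagged rows:", outliers)
```

The planted point stands far outside the cutoff while the rest stay well inside it, which is exactly the visual check the studentized residual plot gives you in the standard platforms.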
Or, for example, I may need to apply some transformation to my data first, to clean it up and make it more normal, before including it in the rest of the analysis. But let's compare: now I want to remove the outlier here.
Once I remove the outlier, I see that I don't need the Box-Cox transformation anymore. What I see here is the Profiler from the normal, standard approach of modeling each of the parameters. While if I go back to the Curve DOE section, I was missing that diagnostic, and therefore what I got out of it was a different Profiler, because I still had my outlier in the data set and I couldn't see it. It was giving me something else.
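A Box-Cox style transformation can also be sketched outside JMP with `scipy.stats.boxcox`, which picks the power lambda by maximum likelihood. The right-skewed synthetic response below is an assumption standing in for, say, raw YMax estimates before cleaning.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
# right-skewed positive response (synthetic stand-in for a raw parameter)
y = rng.lognormal(mean=0.0, sigma=0.8, size=40)

y_t, lam = stats.boxcox(y)      # ML estimate of the Box-Cox power lambda
print(f"estimated lambda: {lam:.2f}")
print(f"skewness before: {stats.skew(y):.2f}, after: {stats.skew(y_t):.2f}")
```

For lognormal-like data the estimated lambda lands near zero, meaning the log transform, and the skewness drops sharply; that is the same decision the diagnostic in the standard platform guides you toward.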
If I do this first, my initial inspection, then I can come back here to my picture and, for example, apply... I first need to remove this again, Remove Fit. Then, for example, I can apply my Local Data Filter, exclude my outlier, do the modeling part again, and move on with the rest.
So I have this option. And of course, I also have the option to force factors in the Model Launch, using the control for forced terms, to force in the factors that I saw in my previous analysis that I have to have. So I have these two options to compensate for the missing diagnostic part in this analysis.
But in general, it's a perfect tool for getting a very good overview of the potential factors affecting my process.
Okay. To summarize how we can use this functionality: as you can see here, we started by fitting a curve, then we extracted the model parameters, and then we could investigate each of those parameters individually. We can either force the significant factors into the fit curve or exclude the outliers, and beforehand apply any transformation that is needed, and then get a combined overview like this in the fit model. This is a very nice analysis and a very useful package of functionality for our case.
To summarize what we have done: first of all, I really wanted to emphasize that it's very important to know the complexity of the system, especially when we have different levels of processes that partly interact with each other, before moving on to several independent DoEs that are used to characterize the system separately.
Try to combine these DoEs into one coherent one, because it really gives you a good overview of the multi-layer interactions and also gives you a lot of information that you would miss in practice if you ran several independent DoEs.
JMP also gives you a lot of flexibility in building up these DoEs and in implementing a lot of practical information in them. At the end, we could also see how the different data analysis functionalities and built-in libraries can help characterize the different processes, in this case the cell-growth process, and how we can use some very powerful tools, especially in JMP Pro, and also how we can use the other standard platforms in JMP to compensate for some missing features in the JMP Pro advanced analysis toolbox. With that, I want to thank you so much for your attention, and I hope you liked the presentation.