Cell culture plays a crucial role in the production of biologics. When introducing process changes as part of a design of experiments (DOE), accurately modeling the behavior of the cell culture process is challenging, as the process involves multiple interdependent growth and production phases, only some of which may be impacted by process changes. Traditional parametric non-linear models struggle to effectively capture this complexity, while non-parametric models alone can be disjointed and difficult to correlate with DOE parameters.

To address this issue, functional DOE simplifies the complexity into principal components and correlates the changes with DOE parameters. This approach enables the creation of a prediction profiler, which can optimize cell culture parameters from small-scale data and use them to predict behavior during larger-scale production. The entire process can be performed within the Functional Data Explorer platform in JMP Pro and can provide a more efficient approach for optimizing cell culture processes.

Today, I'll be going over a use case for functional DOE: modeling some complex nonlinear cell culture processes. Initially, I'll go over what cell culture process we were trying to model, which is continuous production, and the two goals of this modeling: maintaining steady state, and seeing if we can predict some large-scale behavior based on small-scale data.

Then I'll move on to non-parametric splines, which are the basis functions for the modeling I did, and give an introduction to Functional Data Explorer, the platform that, in this case, uses splines as well as principal components to model.

I'll go over the two different responses I had that address the two goals, as well as some different types of spline fits and how to incorporate DOE conditions. Finally, I'll go over how to generate some joint profilers, along with a summary and discussion of findings and any challenges I had.

The goal of what we're trying to model is called continuous production. The standard within biologics is to do batch production. For that, you will be growing up cells, maximizing their output, and then they will get metabolically exhausted at a certain point and go into a death phase. Then you would harvest everything in that bioreactor, which should include your dead cells as well as your drug or molecule of interest, and you'll take that and process that to whatever extent you need to.

What we're trying to look at, and a lot of people in the industry are looking at, is instead of growing and dying and doing these one-off batches, can we be more like other industries, say the car industry or something like that, where we're just continuously producing material?

To do that, we want to, of course, still grow up the cells, but then we want to maintain them in that stationary phase, which would be number 5 in this growth curve, and prevent this death phase at number 6. To do that, you would potentially have a smaller bioreactor, but you'd be constantly feeding in new media that should have fresh nutrients and everything, and then at the same time removing the same amount of liquid. In theory, that liquid should have your drug of interest in it. The goal is to maintain a steady state of new feed coming in and then hopefully permeating off whatever molecule we're looking to produce.

The second goal, other than trying to maintain that steady production, is that we want to be able to predict how things are going to behave at large scale based on small scale. At small scale, we can run lots of different reactors and introduce different conditions, but we can't really do that at large scale because of logistics, time, and money constraints. We want to see whether we can both predict from small to large scale and also see how adding different factors and conditions might have an impact at the large scale.

Another interesting thing about this is with production, because you have these different phases, you might be introducing conditions at different time points. The growth phase might always be at the same control, maybe the same control as large scale, but after that, you're then going to add these different DOE factors, which are going to have a time element to them. We'll go over how we were able to do that with Functional Data Explorer.

I'm first going to introduce splines. This is the basis of the modeling we were doing. Initially, we wanted to see if we could do some parametric modeling, which maybe would have worked for some of the phases; we could perhaps have used a logistic or exponential function for the growth phase. But when we introduce the effects of scale and DOE factors that come in at different times, we weren't necessarily capturing all of that, because some productions might grow and then die, and some might grow and be steady. We found we couldn't really capture that with a parametric model.

We wanted to use something more flexible, which in this case is splines. What a spline essentially is, is fitting a simplified function between knots in the data. When you're fitting a spline, there are a couple of choices you need to make. First, how many knots do you want? Generally, the more knots you have, the more closely the model is going to fit the data, which could be good or bad; it might lead to overfitting. Second, your degree of fit: between each knot, do you want a straight line, some curvature, or just a step function?
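None of this is specific to JMP. As a rough illustration of those two choices, knot count and degree, here's a minimal sketch in Python with SciPy, using a made-up grow/plateau/decline curve rather than any real batch data:

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

# Made-up curve with the usual phases: growth, plateau, then decline over 30 days
days = np.linspace(0, 30, 61)
signal = 10 / (1 + np.exp(-(days - 5))) - 0.15 * np.clip(days - 20, 0, None) ** 1.5
y = signal + np.random.default_rng(1).normal(0, 0.3, days.size)

# The two choices: where the interior knots go, and the degree between knots
interior_knots = [5, 10, 15, 20, 25]   # more knots -> closer (possibly over-) fit
for degree in (1, 3):                  # 1 = straight lines, 3 = cubic curvature
    spl = LSQUnivariateSpline(days, y, t=interior_knots, k=degree)
    print(f"degree {degree}: residual sum of squares = {spl.get_residual():.2f}")
```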

Then with certain types of splines, which Functional Data Explorer calls P Splines, you can add a smoothing penalty. At each knot, you're going to smooth between the spline segments so that if you're using linear fits, for example, you don't get these abrupt, disjointed lines.

The two main spline options you have in Functional Data Explorer are B Splines and P Splines. There are other options in Functional Data Explorer, things like wavelets, but those didn't really apply in this case. As a general rule of thumb, I found that B Splines, which are the simpler regression splines, are good for modeling smooth functions, whereas P Splines can be better for noisy, wiggly functions because the penalty smooths out some of that noise. That's a general statement; there are certainly ways to overfit with a P Spline and ways to underfit with a B Spline, but those are just general rules of thumb.
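In the literature (Eilers and Marx), a P Spline is a B-spline fit with a penalty on the differences between adjacent coefficients, which is what produces that smoothing. A minimal sketch of that idea, not of JMP's implementation, continuing the SciPy example above:

```python
import numpy as np
from scipy.interpolate import BSpline

def p_spline(x, y, n_knots=30, degree=1, lam=10.0):
    """Penalized B-spline: least squares plus a penalty on coefficient differences."""
    # Knot vector: evenly spaced knots with the boundary knots repeated `degree` times
    t = np.r_[[x[0]] * degree, np.linspace(x[0], x[-1], n_knots), [x[-1]] * degree]
    B = BSpline.design_matrix(x, t, degree).toarray()
    D = np.diff(np.eye(B.shape[1]), n=2, axis=0)   # second-difference penalty matrix
    coef = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return BSpline(t, coef, degree)

fit = p_spline(days, y)   # linear pieces with many knots, smoothed by the penalty
```

Increasing `lam` trades fidelity for smoothness, which is the knob that lets a linear fit with a knot at every day still look curved rather than jagged.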

In this case, the platform we're using is Functional Data Explorer, and I have some images of what the platform actually looks like. Generally, functional data is just data recorded over a continuous domain. In this case, that's time: the batch is being produced over time. It can also be other things; I've seen it used for spectral data, for example. Then you identify a set of measurements, in this case the different batches, so there will be an individual fit for each batch.

The other nice part, which I'll go into in more detail later, is that you can add supplementary factors. These can be your DOE factors, the things that are coming in and potentially changing the shapes that you want to capture. I'll explain how that works later in the presentation.

Initially, I wasn't sure what type of spline would be best, because there's not necessarily a hard and fast rule that I found, so I did the modeling with both. When you're modeling these in Functional Data Explorer, there are a lot of ways to find an optimal model. In this case, I used BIC minimization. You can also do things like cross-validation, but because I was introducing those supplemental factors, it was a little harder to define what a testing versus training data set would be, so we just went with BIC minimization.
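To make the selection idea concrete: for plain regression splines, BIC can be computed from the residual sum of squares and the number of spline coefficients, then minimized over candidate knot counts and degrees. JMP's bookkeeping for penalized fits differs (it works with effective degrees of freedom), so treat this only as a sketch of the principle, reusing `days` and `y` from the sketch above:

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

def bic_for_fit(x, y, n_interior, degree):
    """BIC = n*ln(RSS/n) + p*ln(n), with p = number of spline coefficients."""
    t = np.linspace(x[0], x[-1], n_interior + 2)[1:-1]   # evenly spaced interior knots
    spl = LSQUnivariateSpline(x, y, t=t, k=degree)
    n, p = len(y), n_interior + degree + 1
    return n * np.log(spl.get_residual() / n) + p * np.log(n)

# Sweep candidate knot counts and degrees; keep the BIC-minimizing combination
candidates = [(m, d) for m in range(2, 16) for d in (1, 2, 3)]
best = min(candidates, key=lambda md: bic_for_fit(days, y, *md))
print("BIC-optimal (interior knots, degree):", best)
```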

We have results from the two different spline types here. The red line is the mean fit of each of them. You can see they're pretty similar, but they get there in different ways. The B Spline recommended fewer knots, in this case 13, and it recommended a cubic function, adding a little bit of that curvature that we do see here.

The P Spline recommended having a knot at essentially every single day: there are 30 days and 29 knots, one between each day. Then it was smoothing at every single knot, so it was able to use a linear fit but still get that curvature by introducing the smoothing. We see they're quite similar at the beginning here. There are maybe some differences at the end, where we were potentially missing some data from some batches.

Next, we'll look at the individual fits to see what was going on where we're seeing those differences at the end. Here we're showing, for the same set of batches, the actual spline fits per batch. Again, we're seeing the same thing we saw with the mean function: the fits are similar to each other at the beginning and in the middle, but at the end of production, the B Spline fits sometimes show these rapid changes in direction, which is not what we would expect to see in real life.

Generally, once the cells are dying, they don't magically resurrect and start growing again. This seems to be an artifact of the fact that we didn't necessarily have Day 30 data for all batches. Because the B Spline was using that cubic function, it was predicting things that didn't necessarily make sense, whereas with the P Spline, which was using that smoothed linear function, we're getting a little more of what we would expect at the end of the batch, even where we're missing data. This was suggesting that the P Spline might be a better fit, but I still wanted to continue with both to confirm that.

With Functional Data Explorer, once you get all of the fits of all the batches, it will then generate a mean function. That's the mean of all the fits for all the batches. Then it will look and say, "Where are we getting big sources of variability away from that mean function?"
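What Functional Data Explorer is doing here is functional principal components analysis. Stripped of the platform, the mechanics look roughly like the sketch below: evaluate each batch's fitted spline on a shared grid, subtract the mean function, and take a singular value decomposition; the right singular vectors are the shape functions and the projections are the scores. `batch_fits` is a hypothetical list of per-batch fits, like the `p_spline` output above:

```python
import numpy as np

grid = np.arange(31)                                    # shared daily grid, Day 0-30
curves = np.vstack([fit(grid) for fit in batch_fits])   # rows = batches, cols = days

mean_fn = curves.mean(axis=0)                 # the mean function
U, s, Vt = np.linalg.svd(curves - mean_fn, full_matrices=False)

shape_fns = Vt                                # rows = shape functions (eigenfunctions)
scores = U * s                                # each batch's score on each component
explained = s**2 / np.sum(s**2)
print(f"shape function 1 explains {explained[0]:.0%} of the variability")
```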

If you remember, a couple of slides ago we saw a big fanning out; I'll go back to show it. Things are pretty tight here, and then we see this big fanning out at the end, which is what we're trying to explore. Looking at this first shape function, which describes between 70% and 80% of the variability, we see it's pretty flat at the beginning, during the growth phase, but then we see this variation at the end, which is what we would expect. This is probably capturing the stationary phase that we're trying to model: it's steady and then it goes up, which counters the drop-down we're seeing in the mean function.

We see they're both capturing that here in the shape function, but for the P Spline it goes up and stays steady, whereas for the B Spline it goes up and again has that little drop at the end that we didn't necessarily want.

The additional shape functions are capturing things that are maybe less of interest but might still be interesting to model: things like little jumps at the transition between the growth and stationary phases, or some drop-offs at the end. But this initial shape function was definitely the one we were going to be most interested in.

The shape functions, I find, can sometimes be a little hard to interpret because you then have to compare them to the mean function. One thing that's nice in Functional Data Explorer, like any other principal components analysis, is that it provides score plots. It'll fit each different batch and give it a score for how high or low it was on each principal component, so you can see which batch is high on shape function one and which is low.

From there, you can look at examples, which helps you define what the shape change happening here actually is. What we see here confirms that this particular batch, which was high on the initial shape function, had this nice steady state, whereas for the one that's low, we're seeing this increase and then this drop-off.

Out of curiosity, I also looked at what these other ones are doing. They seem to be capturing things like the bump we sometimes get between the growth and stationary phases, because we're changing a condition and that can make the production a little disjointed; not a huge deal, but maybe something we want to minimize. Or these drop-offs at the end, which may or may not be of significance; they might just be an artifact of ending that particular batch.

From this, we defined our functional principal components and identified where those shape changes were occurring. But then we wanted to see: what actually changes those shape functions? Can we actually control the shape by manipulating conditions? That's where the functional DOE comes in.

For that, what essentially happens is a regression of each individual principal component against your supplemental DOE conditions. There are a lot of different options for how you can do that modeling. In this particular case, I just did a best subset with AIC minimization, because we were only introducing three factors: there wasn't a huge need to reduce the number of factors and reduce complexity, and it wasn't too computationally intense to do a best subset. That's the method I used for all of these FPCs. What's nice is that it outputs a prediction expression, so you can get an idea of how each factor is impacting each shape.
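A best-subset search with AIC over three factors is small enough to spell out directly. The sketch below assumes a hypothetical n-batches-by-3 matrix `factors` of DOE settings and a vector `score1` of FPC 1 scores per batch (as computed in the FPCA sketch); it tries every subset of factors and keeps the lowest-AIC ordinary least squares fit:

```python
import itertools
import numpy as np

def aic_ols(X, y):
    """AIC for ordinary least squares with Gaussian errors."""
    X1 = np.column_stack([np.ones(len(y)), X])       # add an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss = np.sum((y - X1 @ beta) ** 2)
    return len(y) * np.log(rss / len(y)) + 2 * X1.shape[1], beta

names = ["Factor 1", "Factor 2", "Factor 3"]
results = []
for r in range(1, len(names) + 1):
    for subset in itertools.combinations(range(len(names)), r):
        aic, beta = aic_ols(factors[:, list(subset)], score1)
        results.append((aic, subset, beta))
best_aic, best_subset, _ = min(results, key=lambda t: t[0])
print("best subset:", [names[i] for i in best_subset], "AIC:", round(best_aic, 1))
```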

But again, that's not always the most ideal way to interpret something. From it, we might see that Factor 3 has the biggest coefficient, so maybe Factor 3 is most influential. But one of my favorite things in JMP is profilers; they're really helpful for visualizing what is actually going on.

In this case, looking at the profiler, I used maximizing Day 30 as a proxy for an extended stationary phase. It's not a perfect solution, but we figured that if production is going to be steady, it will still be high at Day 30. Indeed, when we do that optimization, we see that minimizing Factor 3, which makes sense because it has a negative coefficient, gives us more steady-state production than when Factor 3 is really high. That was really helpful for our customers. It basically gave them a leverage point: "Here's a condition where we're going to get more steady production."
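Under the hood, the profiler's prediction at a given factor setting is just the mean function plus each predicted FPC score times its shape function. A sketch of that, reusing `mean_fn`, `shape_fns`, and `grid` from the FPCA sketch and assuming hypothetical `score_models` (one fitted score regression per FPC, e.g. built from the best-subset coefficients above) and hypothetical factor-range endpoints `low_f3`/`high_f3`:

```python
import numpy as np

def predict_curve(factor_setting, score_models):
    """score_models: list of functions, each mapping a factor vector to one FPC score."""
    scores_hat = np.array([m(factor_setting) for m in score_models])
    return mean_fn + scores_hat @ shape_fns[: len(scores_hat)]

# Compare the Day 30 value at low vs. high Factor 3, other factors held at center
for f3 in (low_f3, high_f3):
    curve = predict_curve(np.array([0.0, 0.0, f3]), score_models)
    print(f"Factor 3 = {f3}: predicted Day 30 value = {curve[30]:.2f}")
```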

The second response, since we had identified the steady state, was about whether we can predict large scale. The data set we had for this response contained large-scale runs at control conditions and small-scale runs at control conditions, as well as small-scale runs introducing those different DOE conditions after growth.

The reason we wanted to capture this is that it's not a perfect translation from small scale to large scale, especially for the growth phase. The cells might have more room to grow at the large scale, there might be better or worse mixing, and they might have more access to nutrients. We tend to see more rapid growth at the large scale than at the small scale. We wanted to see whether we could generate a prediction profiler that predicts that increase in growth while also applying those impacts from the DOE.

I won't go into detail again on this, but a very similar fitting was done for Response 2. The P Spline was again chosen because of that weirdness at the end that we saw with the B Spline for Response 1. Looking at the shape functions, we see this one again; this is the steady-state one. But in this case, for scale, it's not what we're most interested in.

For scale, we want to see whether it's impacting this portion. We do have these additional shape functions, and this one appears to be capturing changes at the beginning of the production, which is what we care about. Indeed, Shape Function 3 seems to capture the most dramatic change to that growth phase.

Again, looking at profilers helps confirm that. If we look at these slopes here, comparing high FPC3 versus low FPC3, we see that when it's high, the curve is a little more shallow; when it's low, we're seeing this more intense peak.

Then we wanted to move on to the DOE: can we tie the factors and scale to these principal components? The nice thing is that for Principal Component 3, the only factor that showed up as significant was scale, so we were able to isolate the impact of scale to that particular shape change. Not completely, because scale did impact the other principal components as well, but we did have this isolated shape.

Again, looking at the profiler helps the most with interpretation. We see here that at Day 30, scale is not really having much of an impact; it's pretty flat. But if we look at Day 7, during the growth phase, we do see this difference, this effect where large scale is going to be higher than small scale. So we seem to be capturing that difference between scales.

We were able to capture those two goals in the models. Another nice aspect in Functional Data Explorer is you can export the prediction formulas to a column in your data table, and then just using the regular profiler platform, you can create a profiler that you can manipulate concurrently between the two responses since they share some factors here.

This is nice for a couple of reasons. One, it's nice to see everything together, but also you may have customers or model users who don't have JMP Pro, they might not have access to Functional Data Explorer. This allows you to share what the model is with somebody who maybe just has regular JMP, and then they can play around with it from there.

Overall, to wrap things up, we were able to achieve our two different modeling goals with functional DOE. It was able to simplify a lot of the complexity while still capturing the areas of interest where things were changing from batch to batch under different conditions. We were able to isolate certain shape functions to specific factors. Then at the end, we were able to create this profiler, which makes it really nice for sharing with the customer or model user, because at the end, they're the ones who are going to use it.

Some limitations. I think it gets discussed at every JMP conference that there's no prediction error provided from functional DOE or Functional Data Explorer. I tried playing around a little with whether, if we output our prediction residuals, we can get somewhat of an idea by trying to model around those residuals. I had pretty limited success with that, but I'm definitely open to ideas on how we might be able to do that in the future.
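For what it's worth, a crude version of that residual idea looks something like the sketch below: take pointwise quantiles of the observed-minus-fitted residuals across batches and add them to a predicted curve. This assumes the residual behavior is similar across batches and conditions, which is exactly where it tends to fall apart; `residuals` is a hypothetical n-batches-by-n-days matrix, and the other names are carried over from the earlier sketches:

```python
import numpy as np

# Pointwise 5th/95th percentiles of observed-minus-fitted residuals across batches
lo, hi = np.nanpercentile(residuals, [5, 95], axis=0)

pred = predict_curve(np.array([0.0, 0.0, low_f3]), score_models)
band_lower, band_upper = pred + lo, pred + hi   # rough band, not a true prediction interval
```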

Another thing that would be nice, and this may actually be possible and I just couldn't figure out how to do it, is optimizing against a specific shape function. In my case, I was optimizing to maximize Day 30 as a proxy: if Day 30 is high, we have a fairly steady-state production. That's not a perfect proxy. If we were able to put in an ideal function, where we're having this flat production, and optimize against that, that would be useful. Again, that might be something that's already possible that I'm just not aware of.
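Outside of JMP, one way to pose that is as a direct optimization: define the ideal target curve, measure the squared distance between it and the predicted curve, and minimize over the factor settings. A sketch using the hypothetical pieces from the earlier sketches, with the growth phase following the mean function and a flat plateau after Day 7:

```python
import numpy as np
from scipy.optimize import minimize

target = np.where(grid < 7, mean_fn, mean_fn[7])   # grow as usual, then stay flat

def shape_loss(factor_setting):
    """Squared L2 distance between the predicted curve and the ideal flat profile."""
    curve = predict_curve(factor_setting, score_models)
    return np.sum((curve - target) ** 2)

result = minimize(shape_loss, x0=np.zeros(3), bounds=[(-1, 1)] * 3)
print("factor settings closest to the ideal shape:", result.x)
```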

The other risk, of course, with any flexible modeling is that you may overfit really noisy data. The data was actually pretty smooth in this case, but that's not always going to be the case. Generally, when we want to balance against that overfitting, things like cross-validation are really useful. But because I wasn't running everything at the same conditions, since we were introducing these additional factors, defining a testing versus training data set becomes a little more complicated. Definitely, in the future, I'm open to any suggestions around that.

But overall, we were able to do what we needed to do. This was, of course, very much not just me working on this. There was a lot of data provided by our preclinical team. We've got some names here as well, some of my co-authors, who helped with a lot of the background on continuous processing, and some help from other statisticians on the research side of things. I just wanted to acknowledge them at the end. There are some references available here as well. That wraps up my presentation.

Presented At Discovery Summit 2024

Skill level: Intermediate




