If a pharmaceutical product is likely to degrade after 12 months, how can we get an early insight into that risk? The answer is to perform an accelerated stability study that collects data over a period of a few weeks under high-stress conditions, such as elevated temperature, pH extremes for liquids, and moisture for solids.
Once a model has been selected, it can be extrapolated over time to assess the overall level of stability. Multiple impurities contributing to degradation are modelled individually and evaluated collectively to identify the highest risk factors.
For moisture-driven impurities, packaging can be used to control degradation rates. A packaging model can be constructed that accounts for moisture permeation and adsorption (in both product and desiccants). A composite model (kinetic plus packaging) evaluates overall stability.
This JSL application is an interesting combination of physics, chemistry, model fitting, and statistical estimation. The live demonstration covers workflow navigation, modelling of moisture vapour transmission rates, kinetic model fitting and ranking, the use of diagnostic plots and prediction profilers, and the generation of dashboards to present final results.
Special emphasis is given to the methodology of model selection and the challenges that are unique to accelerated stability studies.

My name is David Burnham, and what I'd like to do in this talk is to look at how we can estimate the shelf-life of a product by modeling the degradation that occurs over time. One way we can make that assessment is by taking the product and putting it in a climate-controlled chamber. If we do that, we may need to wait months or even years to accumulate sufficient data to build our models. But what if we are introducing a new product to market which is intrinsically unstable? That's information we want to have sooner rather than later.
It's the role of an accelerated stability study to give us this data in just a period of a few weeks. We achieve this by stressing the product, for example, by storing it at high temperatures. I'm the author of a JSL application designed to model the data that comes out of these accelerated stability studies. The work has been a collaboration between Pega Analytics and a couple of pharmaceutical companies. Pega Analytics is the company I formed 15 years ago, primarily to deliver JMP training, but also JSL services. What I want to do today in this talk is to look at the process of modeling degradation within pharmaceutical tablets.
Now, this degradation occurs because these tablets are not totally stable. We have impurities that evolve over time, and I can use a kinetic model as a scientific description of that evolution. The impurities are driven by thermal energy and by the presence of moisture, both within the tablet and in the environment. Let's take a look at the data that comes from this type of study. We have high storage temperatures and high relative humidity. Because of this acceleration, because we're stressing the product, in a period of just two weeks we have measurable quantities of impurity, and sufficient variation in those numbers that I can build a model.
Let me show you what a model may look like. This model has three components: the temperature and the relative humidity, which are the storage conditions, and a time component, which is the evolution of the impurity over time. Once these data allow me to build the model, I can do two things. First, I want to dial down the storage conditions to the normal, unstressed condition. Then I want to extrapolate and try to anticipate what this product is going to do in the future. Let me just rescale the Y axis so you can see this. In this example, you can see that the time profile is nice and flat, so we have a nice stable product, and it's below that purple line, the 1% specification threshold for this product. Everything looks good for this kinetic model.
There is another type of model we can build, which is a packaging model. Let me explain why we might want one. For this I want a product which is unstable, so to illustrate it, I'm going to push the temperature back up to 50 degrees. What you see now is that after about 50 or 60 days this impurity goes through the specification limit, so we no longer have a viable product. I know that in this instance I pushed the temperature to 50, but imagine this were the normal storage condition. We then have to think about what we can do. Well, we can introduce packaging. The role of packaging is to drive down the humidity.
As I push the humidity down, you see the time profile becomes flat. In fact, as I push it down, I also reduce the sensitivity to temperature. Here I've got a relative humidity of 20%, which I can achieve in a controlled environment when I do my packaging: I put the tablets into the bottle and seal the bottle at 20%. Now, I don't need a packaging model to do that, I just put the tablets in the bottle. But what happens over time is that the humidity in the bottle increases. This is fascinating. You'd think the bottle was pretty solid, but moisture is permeating through the side of that bottle, and we can model that permeation. It's a really interesting piece of physics.
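To give a flavour of that physics, here is a deliberately simplified sketch (an illustration, not the application's exact model): if the rate of moisture ingress is roughly proportional to the humidity gradient across the bottle wall, the internal humidity relaxes towards the external one,

$$
\frac{d\,RH_{\text{in}}}{dt} = k_{\text{perm}}\,\bigl(RH_{\text{out}} - RH_{\text{in}}(t)\bigr),
$$

where the permeation constant $k_{\text{perm}}$ is related to the bottle's moisture vapour transmission rate (MVTR).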
We can slow that down by adding desiccant, and we can model the effects of the desiccant. With the models we can do what-if analysis based on the quantity of desiccant that we put in. That's why we're interested in the packaging model. But it's very niche, very specialized; it's great fun, with interesting physics and maths. To keep this talk accessible to a broad audience, though, I'm going to focus on the kinetic models. Even if we're doing the packaging, the packaging model is no use unless we understand this dependency on the humidity. So I'm going to focus my attention on these kinetic models. One of the things I also want to do is address the challenges that occur when we build these models. Not so much the building, but the model selection.
I think the model selection challenge is unique to accelerated stability studies. The JSL application is going to help me with that. But before I look at the application itself, I want you to understand some of the mechanics of how we build these models natively using JMP functionality. Let me introduce what I think is probably the simplest model I could build for these data. I have a formula here for the model, and it's based on an Arrhenius equation. Effectively, we have an exponential term for temperature, a term for time, and also a component for relative humidity. We've extended the Arrhenius equation to include moisture. Now I can show you how this formula performs.
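In symbols, a common humidity-corrected Arrhenius form looks like this (the exact expression used in the demo may differ slightly):

$$
\text{impurity}(t) = t \cdot A \,\exp\!\left(-\frac{E_a}{R\,T}\right)\exp\!\left(B \cdot RH\right),
$$

where $T$ is the absolute storage temperature, $R$ is the gas constant, and the three parameters to estimate are the pre-exponential factor $A$, the activation energy $E_a$, and the moisture sensitivity $B$.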
If I unhide this column, then this is what the formula is predicting for my impurity, and it's not doing a good job: it predicts 10% where the actual value is 2.5%. What's going on there? Well, if I look at the formula again, you'll be familiar with the idea that when you create a formula, it's an expression based on different column variables. But you may or may not be familiar with the idea that you can add parameters. I can add a parameter, give it a name, and give it a value. This formula has three parameters, and each of those parameters has a value which has been arbitrarily assigned. That's not what I want. What I want is to choose optimal values for those parameters.
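As a minimal JSL sketch of such a formula column (the column names `Time`, `Temperature`, and `RH`, and the starting values, are assumptions for illustration):

```jsl
// Hypothetical prediction column for use with the Nonlinear platform.
// Parameter() declares the fitting parameters with initial values;
// everything else references columns in the current data table.
New Column( "Pred Impurity",
	Numeric,
	Formula(
		Parameter(
			{A = 1, Ea = 50, B = 0.01},  // arbitrary starting values
			:Time * A
			* Exp( -Ea / (0.008314 * (:Temperature + 273.15)) )  // Ea in kJ/mol
			* Exp( B * :RH )
		)
	)
);
```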
I choose those optimal values by fitting the model to my data. That process is achieved using the Nonlinear platform in JMP. Let's take a closer look at it. First I assign the model as my prediction formula and the impurity as my response. This gives me a window that says, "Click Go to start." Before I click the Go button, let me just explain what's going on. We have these parameters, and I've given them initial values. What I want to do is optimize those values. When I click Go, it applies a Gauss-Newton algorithm, which seeks to choose values that minimize an error sum of squares, a similar concept to least squares regression. If you prefer, you can convert this into a root-mean-square error of about 2. That's my final model.
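In JSL, driving that fit might look like the sketch below. The `Y` and `X` launch arguments are standard; whether the fit is kicked off with a `Go` message is an assumption on my part, so check the Scripting Index for the exact spelling.

```jsl
// Launch the Nonlinear platform against the formula column.
obj = Nonlinear(
	Y( :Impurity ),      // measured response
	X( :Pred Impurity )  // formula column containing the Parameter() terms
);
// For the more complex models shown later, the defaults may need relaxing,
// e.g. an iteration limit of 6000 and a convergence criterion of 1e-7.
obj << Go;  // assumption: the scripted equivalent of clicking Go
```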
Next, I can ask: is that the best I can do? It would be nice to have a second model to compare against, and we can build much more complex models than the one I've shown you. Let me illustrate a more complex model; it looks like this. It's more complex in the sense that it has more parameters, and those are associated with a more complex expression. I'm not going to go into the details of the expression; I want to focus on the mechanics of fitting this model. Because what you're going to find is that if you click Go, it fails. I want to look at some of the reasons, some of the challenges, of building these models. It's telling me to click Go again. I click Go again, and it still fails. It's not really helping me.
What's happening? Well, the maximum number of iterations has been exceeded. The maximum is 60, so what does this mean? It means that we are running a gradient descent algorithm. I'm sitting on a hill, I'm walking down the hill, and I want to get to the bottom, but I'm only allowed to take 60 steps. I'm not quite sure why we're limited to that; with modern computers, there's no problem doing more than 60 steps. Let's set the limit to 6,000. I want to reset these parameters, so let me revert them back to their original values and then click Go again. It still fails, but we get a different error message. The message is saying that the convergence criterion is too strict and we should relax it. Well, the convergence criterion is a very small number, and I don't think I want 10 to the minus 15.
I want a small number, but I think 10 to the minus 7 is small. Let's try this: I'm going to revert back again to my initial parameters and then click Go. Finally it converges, so you can see it's sometimes a lot more work with a complex model. Now let's compare the two. I have an error sum of squares of 128, or a root-mean-square error of roughly 2.5, versus an error sum of squares of 91 and a root-mean-square error of 2. The model on the right-hand side is doing better than the one on the left-hand side. But at least we can compare different models. We do want to be careful when doing this, though, because the model on the left-hand side is still not the best fit of that model. Let me show you something else that we can do.
I'm going to relaunch this analysis for the second model, and I'm not going to click the Go button. I said that this is a gradient descent algorithm, so we're walking down a hill, and we talked about the number of steps we can take. But it also depends where you start from: if you start in a different location, you end up walking to a different place. Where you start from is based on this value here. What I want to do is explore different starting conditions. JMP has a really nice feature which allows me to do a grid search over these starting conditions. I need to define this grid, and I'm not very good at typing and talking at the same time, so I'm going to type. You might hear me talking, but I'm just mumbling to myself while I set this up.
It's going to be 1 to 10 in steps of 3. This is the large one: 1 to 100,000, with 100 steps. And 0 to 10, with ten steps. All right, that's the end of my mumbling; I'm talking to you again now. I've got this set up, and if I click Go, JMP will evaluate all of those starting conditions. I've got 27,000 of them. Let's take a look. Well, not at all of them, but there you are: 27,000. For any row, I've got a combination of starting conditions for these parameters, and it's calculated the error sum of squares. Let's now sort this into ascending order. You see the best error sum of squares is 145, which still doesn't compare well. But remember, we've not run the algorithm yet; this is just finding the starting condition.
These are the parameters I want to start at. I'm just going to copy those, and I'm finished with that table, so I close it down. I can paste those values in here as my starting condition. Now I'm going to click Go. Now it's converged, so let's see what this looks like. You see, we've got an error sum of squares of 1. Remember, this was 128 last time, and now it's down to 1, which compares incredibly favorably with 91. The root-mean-square error is 0.2 versus 2, so 10 times smaller. What I've tried to illustrate there are some of the challenges of fitting these models, but there's an important lesson: for a complex model, if you don't do a grid search for the initial conditions, you are very likely to end up with a suboptimal solution.
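If you prefer scripting to the control panel, the same idea can be expressed as a do-it-yourself grid evaluation in plain JSL: compute the error sum of squares at each grid point and keep the best point as the starting condition. This is a sketch; the column names and the model expression (here the simple three-parameter model) are illustrative assumptions.

```jsl
dt  = Current Data Table();
t   = Column( dt, "Time" ) << Get Values;         // column vectors as matrices
tmp = Column( dt, "Temperature" ) << Get Values;
rh  = Column( dt, "RH" ) << Get Values;
y   = Column( dt, "Impurity" ) << Get Values;

bestSSE = 1e99;
best = {};
For( a = 1, a <= 10, a += 3,                      // coarse grid over A
	For( ea = 1, ea <= 100000, ea += 1000,        // grid over Ea
		For( b = 0, b <= 10, b += 1,              // grid over B
			pred = t :* a :* Exp( (-ea) :/ (8.314 * (tmp + 273.15)) ) :* Exp( b :* rh );
			resid = y - pred;
			sse = Sum( resid :* resid );          // error sum of squares
			If( sse < bestSSE,
				bestSSE = sse;
				best = Eval List( {a, ea, b} );
			);
		)
	)
);
Show( bestSSE, best );  // paste these back as the starting condition
```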
I can evaluate the different parameters for different models, and I can compare the different models. I might use the root-mean-square error, but I also want to penalize complex models, so I can calculate, in the software at least, the AIC statistic, which is a model comparison statistic. I'm now going to switch to the software, the JSL application that facilitates this model building and model comparison. Let me launch the software; it looks like this. The first thing I need to do is provide a new project name. Let me call it Berlin. I've already got a Berlin, so let's call it Berlin 2, and click OK. These icons here represent a workflow. But first of all, let's just talk about the application more generally.
Think about what JMP does. JMP gives us the ability to fit models, to perform statistical analysis, to do visualization. When we're writing JSL scripts, all of that type of functionality is relatively easy because JMP is doing all the hard work for us. What this application seeks to do, in addition, is to provide a sense of workflow. We have some steps to do. These can't be done yet; they're locked. These two are available, and this is the recommended step. We provide a workflow, a sequence of tasks to perform, but we also persist information. Once something is set up, you don't have to do it a second time. If you've fitted the models, you never have to fit them again.
You can come back to this tomorrow, next month, next year. That's the idea of the project: it persists all the information over time. Let's kick this off. The first step, the recommended step, is to select a product. Really what I'm doing here is saying whether I've got a liquid or a solid product. In the lingo, a liquid product is a large-molecule product and a tablet would be a small-molecule product. I think of it slightly differently, in modeling terms. A liquid, which might be a vaccine for example, is sensitive to pH, whereas a tablet is potentially going to be sensitive to moisture. I'm going to take a tablet. I wish I hadn't said "I'm going to take a tablet"; I'm going to start laughing at that.
The next thing I want to do is assign some data, and I need some data. This is a data table, and it's got four impurities associated with it. Coming back to the application, I can just assign these data, and it's going to ask me to confirm that I want to use them. Yes, I do. It then tells me it has copied these data to the project, and I don't actually need to keep the data table open. These data are persisted as part of the project, so we don't physically have to see the table. We can still do column mapping, so I can take these four impurities. These are my responses, the things I'm going to be fitting models to. For each of these impurities, I will fit a number of models, and then I want to choose the best model for each.
Let me just finish off the column mapping. Now, for each of those responses I can put in specifications. I'm going to put in 1%. What we're saying is that if the impurity level is below 1%, then we have a viable product. If we look at the evolution of impurity over time, the point at which we cross the 1% level is the effective lifetime of the product. Now, the final step is to provide model options. There are eight different models we can fit to these data, and it's worth explaining that. If you think about the time evolution of the impurities, the impurity increases over time; that could happen in a linear way, or the rate of increase could slow down, or it could speed up. We have this idea of decelerating and accelerating models.
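As a simple illustration of that idea (a generic family for intuition, not necessarily the application's exact model forms), a power law in time covers all three behaviours:

$$
y(t) = k\,t^{\,n}, \qquad n < 1 \;\text{(decelerating)}, \quad n = 1 \;\text{(linear)}, \quad n > 1 \;\text{(accelerating)}.
$$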
There are different ways we can express those models. I've just lost my mouse. There we go. Let's come back to the model options. These are the eight model types. These models contain expressions or terms for temperature and water activity, and water activity can be articulated as either relative humidity or absolute humidity, so we end up with a total of 16 different model variants. For demo purposes, I'm just going to have eight model variants and only look at the ones that involve relative humidity. I'm almost done. There is one other thing: the data themselves are the stressed storage conditions, so I have to provide the standard storage conditions, 25 degrees centigrade and 65% humidity.
That completes the tasks for setting up the project. Now I can go ahead and fit the kinetic models. By popular demand, I can do an auto-fit and just click OK; it will fit all of the models, and it will do all of the model selection. But I want you to really understand the challenge of the model selection, and doing it automatically isn't going to give you any insight. So let's focus on doing it individually, for one of these at least. For each of the impurities, we go through a three-step process of finding the right model, though I'll only do it for impurity 1. The first step is just to fit the models. This will go through and fit each of these models in turn. When it finishes, it presents a dashboard which is simply a confirmation that these models have been fitted.
The final step is going to be to choose one of those eight models. But it's convenient to have an intermediate step where we create some type of shortlist, so the second step is creating that shortlist. The software facilitates this by giving you a traffic-light system. The models in the green zone are the most probable models, the models in the yellow zone are less probable, and the ones in the red zone you can throw away. We can do a sanity check on this. If you look, the best model is a decelerating model. Well, if the impurities are decelerating over time, they're not accelerating, so it makes sense that we can throw away the accelerating models.
The software is not using that logic, though maybe it should; maybe there should be that sort of intelligence. It's just using cold statistics. It's calculating an AIC statistic, a model comparison statistic which penalizes model complexity. JMP uses it in many different places; it's something that's used throughout JMP. It's not actually used in the Nonlinear platform, but we can calculate it within the software. I'm going to focus on the two models in the green zone, so I'm going to remove these. My shortlist is going to be these top two models, and now I can select my preferred model, or at least the software is going to give me more information to make that final selection.
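For reference, here is a minimal sketch of how such an AIC value can be computed from the Nonlinear platform's output. For a least-squares fit, a standard corrected form is AICc = n ln(SSE/n) + 2k + 2k(k+1)/(n - k - 1), where n is the number of observations and k counts the fitted parameters plus one for the error variance. The observation and parameter counts in the example calls are assumptions.

```jsl
// Corrected AIC from a least-squares fit.
aicc = Function( {sse, n, p},
	{k},
	k = p + 1;  // +1 for the estimated error variance
	n * Log( sse / n ) + 2 * k + (2 * k * (k + 1)) / (n - k - 1)
);
// Example with the error sums of squares seen earlier
// (24 observations and the complex model's parameter count are assumed):
Show( aicc( 91, 24, 3 ) );  // the simple three-parameter model
Show( aicc( 1, 24, 6 ) );   // the complex model after the grid search
```

Smaller is better, and the complexity penalty grows with k, which is why a near-tie in root-mean-square error can still separate models of different complexity.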
I'm now just focusing on these two models. The statistics themselves don't help me at this stage because I can't really differentiate between the models in the green zone. If you look at something like an R squared or a root-mean-square error, those numbers are almost identical. What I can do is close those down so we're not distracted, and focus on some diagnostic plots. I can look at different diagnostic plots, and I can look at them for the two models that I've selected. One of the things I've tried to do is put quality-of-life functionality into the software, and one thing I find useful is being able to look at both of these side by side. Now I've got the first model in this list, the best model if you like, on the left-hand side and the other model on the right-hand side.
Let's just be clear about what we're looking at: actual versus predicted, the actual data points versus the predicted data points. If we have a perfect model, then all of my data are going to be on that 45-degree red line. This gives you a sense of how well the models are predicting the data, but it also allows us to compare the two models. I hope the thing that strikes you here is how similar they are. Every time I look at this I think I've introduced a bug in the software and I'm not actually showing you different graphs. But they are different: this data point here is on the red line in one plot and off the red line in the other. They are different graphs, but you can see how similar they are. This is the challenge of model selection, and it's very well illustrated here.
Let's try a different graph: a normal probability plot of the residuals. We hope the residuals will form a straight line. In this instance, the graph on the left is definitely better than the one on the right-hand side, and the graph on the left is the one with the best statistics as well. At this stage I'm thinking, take the model with the best statistics, right? You're probably not going to lose your job if you take the model with the best statistics and the best diagnostics, so that's fine. But I also need to think about how I'm going to use these models. I can look at a prediction profiler. We know the predictions are the same; we've already seen that. These profilers are going to look the same, or at least they're going to look the same over the range of the data that we have. But that's not how we use these models.
Let me first explain the prediction profiler, because you're probably expecting to see the temperature and the relative humidity. Those have been delegated to drop-down lists; we're just focusing on the time evolution. But what do I want to do? I want to dial back the temperature to the normal storage conditions, 25 degrees centigrade and 65% humidity. Then I want to extend the time component. Again, another quality-of-life feature: I can just extrapolate these graphs. Now you see a dramatic difference. When we extrapolate these models, both models describe a deceleration, but the deceleration is much more aggressive on the left-hand side.
Let's look at the consequences of this, taking the model on the right-hand side first. This is the point at which the model crosses the line, the line being the specification threshold. But that's not the point of interest. When we model lifetime, we actually look at the upper confidence bound and where that crosses. That's the point we're interested in: just over 300 days. On the left-hand side, it's more like just under 600 days. Now, those numbers are really important, and they are calculated in the software; they're here. But I wanted you to understand where they come from and how absolutely important they are in terms of model selection, because often we find that these are the only meaningful statistics that differ between the different models. So what am I going to do? Well, let's think about why we're doing the modeling.
I'm not trying to tell you what the shelf-life is for this product. If you want to know that, stick it on a shelf in a climate-controlled chamber for two years and see what happens. What we're trying to do is some type of risk assessment, to look at the potential for this product to fail earlier rather than later. And this is telling me there is the potential for it to fail after 300 days. It might not; the model on the left-hand side might be correct, and it might have a lifetime of 600 days. But I want to take a risk-averse approach to this. What I'm probably going to do is select the second model, and I can say why: I'm taking a risk-based approach based on the models in the green zone, and it gives me some text.
If you're not happy with that, you could say it was based on scientific judgment or something else. You could just say, "I like it." But we can put some type of explanation in, or maybe we want to take the model on the left-hand side because it's got the best statistics. Ultimately we can choose, but I think this is the rational choice: a risk-based assessment. I'm going to take that, and that has completed my model selection for this impurity. We can perform this sequence for all of the impurities, and if I show you a summary, you can see we've completed those three steps for impurity 1. I can do the same for the other three, either manually or by coming to auto-fit and auto-fitting those three.
If I do this, we're going to be sitting here staring at the screen for a few minutes. Instead, let me show you how I can just switch to a different project. I choose another project, and this is the same data; the only difference is that all four impurities have been auto-fitted. I just want you to see what the output looks like. If you had clicked the auto-fit button, you'd be presented with output like this. What it shows is that, first of all, impurity 1 has got an amber button. There are a lot of traffic-light systems in the software; all the others have got green. What is the amber telling me? Well, it's telling me that there was some sort of ambiguity in the model selection, because I had those two models I could choose between, whereas for the others there's no ambiguity.
For the others, I've just chosen the model with the smallest statistic, the best statistic, whereas for this one I've done it based on that CI breach duration. These durations are being shown; this is the 311, the number we saw earlier. If I just hover over the icon, it tells me that the estimate was somewhere from 311 to 569 days. If I click on it, it shows me the traffic-light system and shows me it has chosen the second model because it had the worst-case estimate for this number. That's not always the case. Here, the shelf-life is really poor, yet we've got a green icon, and the green icon is telling me that we have a very consistent range of estimates. I've got four models in the green zone, but they all predict pretty much the same number, and because of that we can just choose the best model based on the statistics.
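Those breach durations are simply crossing times: the first time at which a model's upper confidence bound exceeds the specification. As a minimal sketch of how such a number can be found, here is a bisection over an assumed, illustrative bound (in practice `upperCL` would come from the fitted model and its confidence interval):

```jsl
// Find where a monotonically increasing upper confidence bound
// first crosses the specification limit.
upperCL = Function( {t}, 0.045 * t ^ 0.54 );  // assumed illustrative curve
spec = 1;                                     // 1% specification threshold
lo = 0;                                       // bracket in days
hi = 1000;
While( hi - lo > 0.5,
	mid = (lo + hi) / 2;
	If( upperCL( mid ) < spec, lo = mid, hi = mid );
);
Show( Round( (lo + hi) / 2 ) );  // estimated shelf-life, about 312 days here
```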
So there are different decisions based on how consistently the models behave compared to each other. Ultimately we've gone through the process of fitting all the models and choosing the appropriate model for each of these impurities. These are impurities for a single product, and we can then do analysis. I could, for example, summarize those models in dashboard format, and we could look at the different models that we have and test out different storage conditions. That's one type of analysis. The other thing we could do is drive this kinetic model through a packaging model and look at the different types of containers that we could put the product in.
Let me try to summarize what I've covered. We started off with data from an accelerated stability study, which stresses the product; the product I've been talking about is a pharmaceutical product, pharmaceutical tablets. I can model the evolution of impurities; I can build these kinetic models. We've looked at the mechanics of how we generate these models using the Nonlinear platform, and I've tried to show you some of the challenges of using that platform. Then I've worked through how we can use the JSL application to automate the model fitting and to facilitate the model selection process. And you can go beyond that and think about packaging as well. That wraps up my presentation for today. Thank you very much.