Mixed model stability analysis

Introduction

Stability analysis has turned into a hot topic recently. I’m honestly not sure why, but the number of emails I’ve seen come through my inbox is undeniable. A common thread in those emails is random batch stability analysis, and the questions tend to go something like this: “Can we study a product’s stability (or shelf life) in JMP and consider the effect of multiple, randomly selected batches in the study?”

 

The short answer is yes, we can, with a little help from…math.

 


 

But fear not! The math in question might look scary, but it’s not. And I was nice and made it into an add-in to help. So, if you want, you can just skip to the end where I show you the add-in and continue with the knowledge that the math we are using is documented here. Now that’s a happy thought, isn’t it? 

So let’s give our math anxiety a cookie and shoo it off to bed while we have a look at how to deal with this very interesting problem.

 

Why is this necessary?

Let’s start out by considering what we’re saying. Stability analysis, or shelf-life analysis, is basically a hedge against the tendency of organic molecules (like medicines or foods) to degrade over time. It deals with the question: “How much degradation can I accept before the product loses the desired level of efficacy or becomes unsellable?” (Side note: I’m going to frame this in the context of pharma development, but the concepts apply to many different domains.) The challenge comes from the idea that we generally think we must approach this problem differently in different phases of a product’s development. (The truth is a little more nuanced than that, but let’s use it as a working assumption for the moment.)

When a product is new, we may not have many lots or batches of material, so we must take several samples from the same lot and track them over time to determine whether the initial guess at the product’s stability is valid. As we move a product from development to production, we have more opportunities to sample from different batches. This is great because it gives a more realistic picture of the material’s behavior, but it introduces complexity in the form of lot-to-lot or batch-to-batch variation. As a result, we need different methods for handling the problem at different phases of the product life cycle.

For the first case, we have a tool (currently residing in the Degradation platform) that follows the ICH guidelines (link) for a single-batch stability study.

For the second case, that batch-to-batch variation means we must use a special type of model called a mixed model in JMP. If that sounds scary, take a deep breath; it’ll be okay. Mixed models are probably the most conceptually intuitive model type you’ll run into. The math is a little hairy, but humans are hardwired to approach the world in a way that’s close to mixed models.

 

The mixed model for batch stability

Let’s start with a simple example: If we had 10 bags of cookies from the same production facility, but from different points in time, we would expect the cookies to have consistent properties, within some allowance for variation. That last bit is where the mixed model stuff comes in. If we didn’t care about the causes of that variation around the general trend, we could just lump them together and expect that some cookies are going to be a little better than others. However, if we wanted to study those sources of variation (e.g., which oven was used, ingredient vendors, etc.), we could include them in the model as fixed effects. In the case of the bags of cookies themselves, we don't care about the 10 specific bags we pulled, but we do want to use them to estimate the variability of ALL bags in production, so we employ a model component for this random effect.

Let's switch to a biotech example. If we have some material or drug made in batches, and the feedstocks for those batches vary, we expect that there is going to be some small batch-to-batch variation to contend with.

In traditional models, we lump that variation into the error term for the model (ϵ):

 

$$y = \beta_0 + \beta_1\,t + \epsilon$$

where $y$ is the measured response at time $t$.

 

With mixed models, we try to figure out the contributions of these random effects to better understand where the variability comes from. You may have seen these ideas before: they show up in designed experiments as random factors that are fit with the REML (restricted maximum likelihood) method. Defining the variability incorrectly can lead to inaccurate statistical tests and, as a result, poor business decisions.

$$y = \beta_0 + \beta_1\,t + \underbrace{a_i + b_i\,t}_{\text{random effects}} + \epsilon$$
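If it helps to see that structure outside of JMP, here's a minimal sketch in Python's statsmodels (not the method the add-in uses, just an illustration), assuming a hypothetical data table with columns Y, Time, and Batch:

```python
import statsmodels.formula.api as smf

# df is a hypothetical pandas DataFrame with columns Y, Time, and Batch.

# Traditional model: batch-to-batch variation is lumped into the residual.
ols_fit = smf.ols("Y ~ Time", data=df).fit()

# Mixed model: each batch gets its own random intercept and random slope,
# fit by REML, so batch-to-batch variability is estimated on its own.
mixed_fit = smf.mixedlm("Y ~ Time", data=df,
                        groups="Batch", re_formula="~Time").fit(reml=True)
print(mixed_fit.summary())  # fixed effects plus variance components
```

The only difference between the two calls is the random part of the model; that's the whole trick.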

 

 

Shelf life and mixed models

So what does this all have to do with a shelf-life calculation? Well, to determine a shelf life, we collect data on either the degradation of the active ingredient in a sample or the increase in impurities over time, and we use those data to develop a simple linear model. (Side note: there is another set of problems in this space involving nonlinear models, but we’re not going there this time.) We then use the linear model to extrapolate when the parameter of interest crosses a predetermined spec limit, e.g., when we have too little active ingredient or too much impurity.
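Just to make the extrapolation concrete, here's a toy example with invented numbers. If potency starts at 102% of label claim, the fitted line loses 0.4% per month, and the lower spec is 95%, the mean line crosses the spec at

$$102 - 0.4\,t = 95 \quad\Longrightarrow\quad t = \frac{102 - 95}{0.4} = 17.5 \text{ months.}$$

The shelf life gets set where the confidence bound crosses the spec, not the mean line, so it lands somewhat earlier than 17.5 months.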

For the single-batch case where we monitor an active ingredient, we would just pick the worst-performing sample from the batch and use the lower confidence interval on its line of fit to set the shelf life. (For those not familiar with the ICH guidelines, there are some additional statistical checks built into the protocol around similarity of samples that I’m not discussing here.)

For the multiple-batch case, we are going to do something similar. But we’re going to use a mixed model to track the variation between batches and work with a single model for the whole data set. We’re going to assume batch-to-batch variability is roughly normally distributed around some average behavior. With that assumption, we can use some offsets to get to the equation we need to calculate the shelf life while accounting for the batch-to-batch variation. Let’s get to it (or, if you want to skip the math, you can just move on past this next bit – your call).

 

WARNING! HERE THERE BE...MATH!!

Building the decision line

We’re going to build the decision line by first running a mixed model in JMP Pro. Mixed models are one of the available personalities in Fit Model (Analyze > Fit Model).

[Screenshot: the Fit Model launch window with the Mixed Model personality selected]

 

Once we have that set, the model effects area will change to have three tabs. We’re only concerned with the first two. We’ll start the setup by adding the variable of interest to the Y field. Next, we’ll add the time variable to the Fixed Effects tab, just like in traditional models you've probably built before.

 

[Screenshot: the response added to Y and the time variable added on the Fixed Effects tab]

 

Last, we’ll go to the Random Effects tab and add the batch or lot variable into the random effects. By doing so, we're carving off that batch-to-batch variability by introducing a random intercept. We’re also going to consider the possibility that the batches may degrade at different rates over time (i.e., have different slopes for the time measure), so we'll allow for random slopes. To account for this, we select the time and batch variables in the Columns List and then click Cross in the Random Effects tab. The last thing we want to do is set the alpha level for the model, which is found under the red triangle menu. If you wanted the 75th percentile, you’d set an alpha of 0.25. When you’re done (if you’re following along), it should look like this:

[Screenshot: the completed Fit Model dialog, with the batch variable and the time-by-batch cross on the Random Effects tab]

Then we click Run. The last thing we’ll need to do before we switch into “plug-and-chug” mode is go to the red triangle menu and select LRT > Covariance and Correlation Matrices > Covariances for all parameters.
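If you're following the statsmodels sketch from earlier rather than clicking through JMP, the quantities we're about to read off the report can be pulled straight from the fitted model object:

```python
import numpy as np

# Fixed-effect estimates: the intercept and slope of the mean line
beta0, beta1 = mixed_fit.fe_params

# Random-effects covariance: variances of the batch intercepts and slopes,
# plus their covariance, as a 2x2 matrix
G = mixed_fit.cov_re.values
var_int, cov_int_slope, var_slope = G[0, 0], G[0, 1], G[1, 1]

# Residual variance (what's left in the error term)
var_resid = mixed_fit.scale
```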

The good news is that we are now done with the “heavy calculations,” and all the values we need are in the mixed model report. We can extract them to create the decision line. The decision line is just a confidence interval for the n-quantile line of the mean model (i.e., the 75th percentile line, based on the alpha level set in the model launch dialog). So, we start with the mean model from the fit:

 

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1\,t$$

 

The β estimates are found in the Estimates column here in the report:

[Screenshot: the fixed effects parameter estimates table in the mixed model report]

 

 

We then modify that line with a shift to account for the batch-to-batch variability represented by the random effects, as well as the quantile we want to base the shelf life on. We’ll do this in two steps so you can see where everything comes from:

First, let’s take the mean line and expand the error term (ϵ):

 

$$y = \beta_0 + \beta_1\,t + a_i + b_i\,t + \epsilon$$

 

Note that all we’ve done is pull a couple of terms ($a_i + b_i\,t$) out of $\epsilon$ in that very first equation. The first new term ($a_i$) is the random intercept for batch $i$, and the second term ($b_i\,t$) accounts for the random slopes. A little note on the notation I’m using here: fixed effects (things we can control) use Greek letters (e.g., $\beta_0$, $\beta_1$), and random effects use Latin letters (e.g., $a_i$, $b_i$).

Step 2 is to offset the mean line by the contributions of each source of variability (the $\sigma^2$ terms in the equation below) and scale that by the z-score for the desired quantile ($z_q$). It makes the equation look a little more complex, but it’s all just estimates that you look up in the report (locations can be seen in the screenshot below) or that JMP can calculate for you:

 

$$Q(t) = \hat{\beta}_0 + \hat{\beta}_1\,t + z_q\sqrt{\sigma_a^2 + 2\sigma_{ab}\,t + \sigma_b^2\,t^2 + \sigma_\epsilon^2}$$

Here $\sigma_a^2$ and $\sigma_b^2$ are the variances of the random intercepts and slopes, $\sigma_{ab}$ is their covariance, and $\sigma_\epsilon^2$ is the residual variance.

 

We find those sigma-squared terms ($\sigma_a^2$, $\sigma_{ab}$, $\sigma_b^2$, $\sigma_\epsilon^2$) in the Estimate column of the variance components section of the report:

 

[Screenshot: the variance component estimates in the mixed model report]
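In the sketch, the quantile line above is a one-liner once those estimates are in hand:

```python
from scipy.stats import norm

z_q = norm.ppf(0.75)  # z for the 75th percentile (alpha = 0.25);
                      # use the lower tail, norm.ppf(0.25), for a response
                      # that degrades downward, like potency

def quantile_line(t, beta0, beta1, var_int, cov_int_slope, var_slope,
                  var_resid, z):
    """The quantile line Q(t): the mean line offset by the pooled spread."""
    spread = np.sqrt(var_int + 2 * cov_int_slope * t
                     + var_slope * t**2 + var_resid)
    return beta0 + beta1 * t + z * spread
```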

 

Last, we need to develop a confidence interval on that line. It looks a little hairy when it’s laid out in notation:

$$\text{CI}(t) = Q(t)\ \pm\ z_{1-\alpha}\,\sqrt{\operatorname{Var}\!\left[Q(t)\right]}$$

 

 

 

with variance terms coming from the partial derivatives of $Q$ with respect to each estimated parameter, weighted by the variances and covariances of those estimates (the delta method):

$$\operatorname{Var}\!\left[Q(t)\right]\ \approx\ \sum_{i}\sum_{j}\frac{\partial Q}{\partial \theta_i}\,\frac{\partial Q}{\partial \theta_j}\,\operatorname{Cov}\!\left(\hat{\theta}_i,\hat{\theta}_j\right),\qquad \theta = \left(\beta_0,\,\beta_1,\,\sigma_a^2,\,\sigma_{ab},\,\sigma_b^2,\,\sigma_\epsilon^2\right)$$

 

 

 


But all those partial derivatives break down into either constants or simple linear terms, and each sigma ($\sigma$) or psi ($\psi$) is just a variance or covariance found in the covariance matrix we turned on in the report. By substituting and consolidating terms, the variance terms look something like this:

 

$$\frac{\partial Q}{\partial \beta_0} = 1, \qquad \frac{\partial Q}{\partial \beta_1} = t, \qquad \frac{\partial Q}{\partial \sigma_a^2} = \frac{\partial Q}{\partial \sigma_\epsilon^2} = \frac{z_q}{2R(t)}, \qquad \frac{\partial Q}{\partial \sigma_{ab}} = \frac{z_q\,t}{R(t)}, \qquad \frac{\partial Q}{\partial \sigma_b^2} = \frac{z_q\,t^2}{2R(t)}$$

where $R(t) = \sqrt{\sigma_a^2 + 2\sigma_{ab}\,t + \sigma_b^2\,t^2 + \sigma_\epsilon^2}$ is the square-root term from the quantile line.
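In code, you don't even have to write those partial derivatives out; a numerical gradient gives the same delta-method variance. Continuing the sketch, Sigma stands for the 6×6 covariance matrix of the parameter estimates, i.e., the matrix we turned on with Covariances for all parameters (assumed here to be loaded into a NumPy array):

```python
def delta_var(theta, Sigma, t, z, h=1e-6):
    """Delta-method variance of Q(t) using a central-difference gradient.
    theta = (beta0, beta1, var_int, cov_int_slope, var_slope, var_resid);
    Sigma is the 6x6 covariance matrix of those six estimates."""
    theta = np.asarray(theta, dtype=float)
    q = lambda th: quantile_line(t, *th, z)
    grad = np.array([(q(theta + h * e) - q(theta - h * e)) / (2 * h)
                     for e in np.eye(len(theta))])
    return grad @ Sigma @ grad
```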

 

 

 

So, when you add all that up and put it back into $\operatorname{Var}\!\left[Q(t)\right]$, you get what basically amounts to a simple polynomial with an obnoxious number of constants.

 

That’s irritating if you want to try to do this longhand, but handy if you want to, say, code up the analysis. All of this can be done piecewise and just added up at the end, as the sketch below shows.
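Here's what "piecewise and added up at the end" looks like in the sketch: push the quantile line out by the confidence multiplier and solve for where it crosses the spec. The spec limit, the one-sided 95% multiplier, and the 60-month search window are all invented for illustration:

```python
from scipy.optimize import brentq

theta = (beta0, beta1, var_int, cov_int_slope, var_slope, var_resid)
z_conf = norm.ppf(0.95)  # one-sided 95% confidence multiplier (an assumption)

def decision_line(t):
    # lower confidence bound on the quantile line (for a degrading response)
    return (quantile_line(t, *theta, z_q)
            - z_conf * np.sqrt(delta_var(theta, Sigma, t, z_q)))

spec_limit = 95.0  # hypothetical lower spec
# brentq needs the decision line to cross the spec inside the search window
shelf_life = brentq(lambda t: decision_line(t) - spec_limit, 0.0, 60.0)
```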

 

Building it all in JMP

If you went through the math, welcome back! 

The good news, as I’ve said a few times now, is that every number needed for making the decision line for a given study is available in the JMP mixed model report. It’s just a matter of pulling them out, plugging them in, and looking at where the decision line crosses the spec limit of interest. You could do that longhand, but as I said way back at the beginning, there’s an add-in for it now. You can find it here: Random Batch Shelf Life / Stability Analysis. I’ve left the code unencrypted so that you can poke around and see what I’m doing in there.

[Screenshot: the Random Batch Shelf Life / Stability Analysis add-in report]
