
Reduce Cost and Avoid Nonconformities Doing Smart Shelf Life Calculations in JMP

Companies in the pharmaceutical industry must demonstrate shelf life by measuring product performance over time at the storage temperature. To accelerate the test, it is often also done at elevated temperatures. Although Arrhenius demonstrated in 1889 how to combine results at different temperatures into one model, many companies still analyze each temperature separately. It is not cost-efficient to stratify data into different models.

In addition, ongoing verification of shelf life must be performed, where it is often enforced that all individual observations are inside specifications. Due to measurement noise, this criterion is often not met. Instead, it should be enforced that measurements are inside prediction intervals from the initial shelf life study, which is a weaker requirement.

JMP has the Arrhenius Equation in the Degradation/Non-linear Path/Constant Rate platform. However, this platform lacks some of the excellent features of the Fit Model platform, such as studentized residual plots, Box-Cox transformations, random factors, and prediction intervals.

This presentation demonstrates how the Arrhenius Equation can be entered into the Fit Least Squares Platform by making a Taylor expansion with only four terms, as well as how a JMP workflow can ease the calculations.

 


Thank you for giving me this opportunity to talk about how we, as JMP partners at NNE, help our clients make better shelf-life studies and especially minimize the number of non-conformities during verification.

First, I will describe some of the issues we see with shelf-life studies and especially with the ongoing verification, where you have to show at regular intervals, by testing some batches, that the stated shelf-life is still valid. If you look at the first study, the shelf-life estimation where you set your shelf-life, you typically take a set of batches, three to four, and let them decay over time. You measure these batches over time, see how much they decay, and then you can calculate how long a shelf-life you have.

But the level you reach over time depends not only on the slope, the decay, which is the main purpose of a shelf-life study; it also depends on where you start. Since you only have three or four batches in your shelf-life estimation study, and you are going to verify it on other batches going forward, if those batches start lower than the batches in your shelf-life study, you might have a problem in the verification, even though they don't decay any faster, just because they start lower.

Our solution is to convert the shelf-life requirement into a requirement on the start value, simply a release limit, which of course should be tighter than what you require at shelf-life, so there is room for the change. Then you are not sensitive to future batches, because you ensure they start high enough, so to say. We also often see that people do the regression over time on an absolute scale, but degradation is relative. We strongly recommend taking the natural logarithm (Ln) of your data before you do the regression.

We also see many companies having big issues with measurement reproducibility, and shelf-life is the most difficult measurement situation you have, because for obvious reasons you have to measure the batches at very different time points, typically years apart. Of course, everything changes over such a span, and if you don't have a very stable measurement system, you get a lot of noise on your regression curve. You can actually reduce that by entering the measurement time point as a random factor in the model.

We also very often see companies doing shelf-life at many different temperatures, which is a good idea because then you can accelerate the test. But for some reason the temperatures are typically modeled separately; it is very rare that we see people modeling across temperatures to get one model describing all temperatures. We strongly recommend modeling across temperatures, because then you have more degrees of freedom to estimate your residual error.

In the same area, we also see that when people model each temperature on its own, they of course need time-0 measurements at all the different temperatures. But it is actually the same measurement being reused, because at time 0 it is just the start value. You have to be careful when you then build the model across temperatures: you should not have the same observation in there four times if you have four temperatures.

When modeling across temperatures, it is very important that this time-0 value is only registered at one temperature. It doesn't really matter which one, because it is time 0. These are the issues we see when people set shelf-life, and these are, in short, the solutions we recommend.
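If the time-0 result has been copied to every temperature in the data table, a small clean-up step keeps it at only one temperature before modeling across temperatures. Below is a minimal pandas sketch of that idea; the file name and column names are invented for illustration.

```python
# Sketch: keep each time-0 registration at only one temperature before
# modeling across temperatures (file and column names are hypothetical).
import pandas as pd

df = pd.read_csv("stability_data.csv")        # columns: batch, temp_c, month, result, ...

is_t0 = df["month"] == 0
keep_temp = df.loc[is_t0, "temp_c"].min()     # arbitrary choice; any single temperature works
df = df[~is_t0 | (df["temp_c"] == keep_temp)] # drop time-0 rows duplicated at other temperatures
```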

Then there is the shelf-life verification: now that you have stated a certain shelf-life, you have to prove at regular intervals that it is still valid, and there is an ICH guidance for that. It is also included in JMP; you will see that in a minute. What you do there is run the typical three to four batches, take the confidence limit on the slope for the worst batch, and use that to state your shelf-life.

But this is a bit challenging, because you are then assuming that you have seen the worst batch among the first three, which is typically not the case. This is actually the reason why we do not recommend the ICH method: it often leads to problems in verification.

We also often see that people get a too optimistic estimate of the slope's standard error because they assume independent observations, so the number of degrees of freedom is just n-1. However, very often many measurements are done in the same analytical run. If you have run-to-run differences, these are not independent measurements, and you need to correct for that by using the effective degrees of freedom. Simply putting the measurement day in as a random factor will give you that.

Then you will typically get a slightly bigger standard error, which might be seen as a problem. But what actually happens is that it increases the requirement on the start value, and thereby you minimize the risk of failing verification.

Last but definitely not least, we still see many companies saying that at verification, all measurements should be inside specification. But if you have measurement noise, and you always have some measurement noise, then you can fall outside specification at shelf-life purely due to the measurement. What we recommend instead is to build a proper model with the right degrees of freedom on the same data you used to set the shelf-life, and based on this, make a prediction interval where you can expect future observations to fall. This interval will typically be slightly wider than your specification interval, thereby minimizing the risk of failing.

I will now briefly describe the formulas and the platforms and where to find them in JMP, but I will do this quite rapidly, because I'm sure you can get access to the formulas in the presentation material afterwards, and I think it's more interesting to demonstrate it in JMP. But let's first start with this release limit.

How do you convert your slope and the standard error on the slope into a release limit at start? Let's say we have something that drops over time, and we have a Lower Specification Limit. We take the approach from the WHO guidance on stability evaluation of vaccines: you take the Lower Specification Limit and subtract the estimated slope times the shelf-life, that is, how much it drops over the shelf-life. Then you also need to add some uncertainty, which comes from the standard error on the slope and from the measurement standard deviation on your starting value.

We are actually just using this formula, except that we convert the normal quantile to a t-quantile because the standard deviation is estimated, not known. As an estimate of the measurement repeatability, we use the RMSE from the model. But besides that, it is exactly as described in the WHO guidance. When you build your model, you get the slope from your model in JMP, you get the effective degrees of freedom based on the equation down here, you get the standard error on the slope, and you get the residual error.

Based on that, you can just feed these into the formula, and you have your Lower Release Limit. If it turns out that one of the bottlenecks is the measurement noise on the start value, you can make several measurements at release and take the average of these, and thereby suppress this contribution by the square root of N. But of course, these measurements have to be taken in different analytical runs.
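As a worked illustration of this calculation, here is a minimal Python sketch. The formula shape follows the description above (lower spec limit, minus slope times shelf-life, plus a one-sided t-quantile times the combined uncertainty), with averaging of n release measurements giving the square-root-of-N suppression; the function name, the numbers, and the use of the model RMSE as the repeatability estimate are illustrative assumptions, not the guidance text itself.

```python
# Sketch: lower release limit (LRL) for a decreasing response with a lower spec limit,
# assuming the formula described above:
#   LRL = LSL - slope*T + t(1-alpha, df_eff) * sqrt(T^2 * SE_slope^2 + s_meas^2 / n)
from math import sqrt
from scipy.stats import t

def lower_release_limit(lsl, slope, se_slope, shelf_life,
                        s_meas, df_eff, n_release=1, alpha=0.05):
    """Minimum acceptable value at batch release (time 0), on the modeling (Ln) scale.
    slope: estimated degradation rate per month (negative for a decreasing response)
    s_meas: measurement standard deviation, e.g. the model RMSE
    df_eff: effective degrees of freedom; n_release: averaged release measurements"""
    t_q = t.ppf(1 - alpha, df_eff)                                  # one-sided t-quantile
    u = sqrt(shelf_life**2 * se_slope**2 + s_meas**2 / n_release)   # combined uncertainty
    return lsl - slope * shelf_life + t_q * u

# Hypothetical numbers on the Ln scale: 24-month shelf-life, 0.4 %/month relative loss
print(lower_release_limit(lsl=2.20, slope=-0.004, se_slope=0.0008,
                          shelf_life=24, s_meas=0.02, df_eff=18))
```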

Next, when you are going to model the result versus time, you can just go to Fit Model and make your regression. There we strongly recommend that you take the log of your result, because then, with a constant-rate reaction, it will be linear in time. If the decay is small, you can also just do it without the log. But why bother about whether the decay is small or large? Just take the log and it works no matter the size of the decay.
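To make the Ln recommendation concrete, here is a small hedged Python sketch (the simulated data and column names are invented): a constant relative loss becomes a straight line once the result is log-transformed, and the fitted slope is then the relative degradation rate.

```python
# Sketch: take Ln of the result before the regression so a constant relative
# degradation rate is linear in time. Data below are simulated for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({"month": np.tile([0, 3, 6, 12, 24, 36], 3)})
df["result"] = 100 * np.exp(-0.05 / 12 * df["month"]) * rng.normal(1.0, 0.01, len(df))

df["ln_result"] = np.log(df["result"])             # the "take Ln first" step
fit = smf.ols("ln_result ~ month", data=df).fit()
print(fit.params["month"], fit.bse["month"])       # relative slope per month and its SE
```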

If you want to model across temperatures, you have to use the Arrhenius equation, where you can describe the decay at different temperatures using an activation energy. This can nicely be described in JMP if you go to the Degradation platform with a nonlinear path, and there you can actually build one model across all temperatures. That is pretty easy to do if you can find this platform, and I will demonstrate it in a minute in JMP.

Based on that, you get your model coefficients: an intercept, a slope, and an activation energy, and they even come with standard errors and covariances. If you put these numbers together, you can calculate the slope at each temperature and the standard error on the slope at each temperature. Then you can feed these into the Lower Release Limit formula, and you will know what it should be.

All these model parameters, standard errors, and correlations you simply get from JMP. But you have to put them into the equation shown up here, and it is actually not as straightforward as it might look.
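For illustration, here is a hedged Python sketch of that combination using the delta method. The parameterization slope(T) = b1·exp(-Ea·x(T)), with x(T) = 11605/(T°C + 273.15) as the Arrhenius transform, is an assumption chosen to match the idea on the slide; check it against the parameterization your own degradation report actually uses. The numbers below are invented.

```python
# Sketch: slope and its standard error at one temperature from Arrhenius model
# coefficients, via the delta method. Assumed parameterization (verify against
# your own report): slope(T) = b1 * exp(-Ea * x(T)), x(T) = 11605 / (T_C + 273.15)
import numpy as np

def slope_at_temp(b1, ea, cov, temp_c):
    """cov: 2x2 covariance matrix of (b1, Ea) taken from the model report."""
    x = 11605.0 / (temp_c + 273.15)                    # Arrhenius-transformed temperature
    slope = b1 * np.exp(-ea * x)
    grad = np.array([np.exp(-ea * x),                  # d slope / d b1
                     -b1 * x * np.exp(-ea * x)])       # d slope / d Ea
    se = float(np.sqrt(grad @ cov @ grad))             # delta-method standard error
    return slope, se

# Hypothetical coefficients and covariance, for illustration only
cov = np.array([[1.0e-4, -2.0e-6],
                [-2.0e-6, 1.0e-7]])
print(slope_at_temp(b1=-5.0, ea=0.12, cov=cov, temp_c=15.0))
```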

For that reason, we actually recommend making a Taylor expansion of the Arrhenius equation, because then you can fit it with a polynomial. Normally we see that up to third order is sufficient, and that requires four different temperatures. The great thing about going into Fit Model, which you can do once you have made the Taylor expansion, is that you can put in random factors. As I mentioned previously, you really need to put in measurement time as a random factor, but often you would also like to have batch as a random factor, because you would like to predict what goes on in future batches, not just those used for the study. Of course, you also get better model diagnostics.

If you scale your Arrhenius temperature properly, so it is 0 at the temperature of interest, all these terms here drop out because they are 0. Then it is very easy to get the slope and its standard error, because they are just the coefficient and standard error of the time term in your model. It's much easier than what I showed on the previous slide.

On top of the third-order Taylor expansion, you can also put in batch multiplied by time, the interaction between batch and time, to see if there is a batch-dependent slope. Hopefully that is not the case, but it can be. Or, even worse, you can also put in Arrhenius temperature times time times batch to see if the activation energy is batch dependent. That rarely happens, and of course shouldn't happen, but I think it's nice to check before making the assumptions.
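Outside JMP, the same idea can be sketched in Python with statsmodels (the data file and column names are invented): center the Arrhenius-transformed temperature at the storage temperature of interest, build the time-by-centered-temperature terms up to third order, add measurement day as a random factor, and read the slope at the temperature of interest directly off the plain time coefficient. Batch-by-time terms could be added to the formula in the same way if you want to test batch-dependent slopes.

```python
# Sketch: third-order Taylor expansion of the Arrhenius model as a linear mixed
# model, with the Arrhenius temperature centered at 15 C so the `month` coefficient
# is the slope (and SE) at 15 C. File and column names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def arrhenius_x(temp_c):
    return 11605.0 / (temp_c + 273.15)           # Arrhenius transform of temperature

df = pd.read_csv("stability_data.csv")           # batch, temp_c, month, meas_day, result
df["ln_result"] = np.log(df["result"])
df["xc"] = arrhenius_x(df["temp_c"]) - arrhenius_x(15.0)   # centered at 15 C

# Taylor terms of time * exp(-Ea * x) around the centering point
for p in (1, 2, 3):
    df[f"t_xc{p}"] = df["month"] * df["xc"] ** p

model = smf.mixedlm("ln_result ~ C(batch) + month + t_xc1 + t_xc2 + t_xc3",
                    data=df, groups="meas_day")  # measurement day as a random factor
fit = model.fit()
print(fit.params["month"], fit.bse["month"])     # slope and SE at 15 C
```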

Now, let's get into JMP to see how this works. Here I have a case where I have studied some batches at different temperatures and at different times. Let's first look at the result. Here you see the result, of course with Ln applied so it is supposed to be linear, at four different temperatures, 15, 25, 30, and 40 degrees, from 0 to 36 months.

As you can see, and as expected, the higher the temperature, the higher the slope. This can easily be described by the Arrhenius equation. Each batch is measured in duplicate at each time point. You can also see the typical case that we have some measurement variation from day to day. For example, all the measurements made at three months are above the regression line, indicating that we measured too high that day, and at month nine they are typically too low. Clearly these numbers are contaminated with measurement noise. This is quite typical, because the measurements are made at very different time points, and it can easily happen that you measure higher on some days than others. You will see what influence that has in a minute.

Let's start doing some modeling. You can go into the Degradation analysis in JMP. There you cannot combine the temperatures, but you can do it temperature by temperature, following the ICH guidance. I have just opened it here for 15 degrees. It works this way: with a significance level of 0.25, you test whether you can assume a common slope and a common intercept. If the p-value is below 0.25, you have to use separate intercepts and separate slopes. For the 15 degrees data, following the ICH guidance with this significance criterion, you can assume a common slope, but you will have different intercepts.

With these different intercepts and a common slope, you make a confidence interval on each regression line, and then you take the worst batch, which in this case is batch B. Where this crosses the lower spec limit, on the Ln scale, that is your shelf-life, in this case 55 months.

It's very easy to do, but there are some problems with this method. First, the significance level of 0.25 of course gives a high risk of falsely concluding that you need different slopes, even when you don't. Even worse, you are only looking at the worst of the first three batches, and it's not very probable that the worst batch you're ever going to make is among the first three. This can give you some serious issues later on in the ongoing verification. Even though it's easy, it's not what we recommend.

Of course, you can make exactly the same models in Fit Model, which is what I'm showing here, again first by temperature, and later we will combine them. There you put in time, batch, and time times batch, in this case for 15 degrees. Time times batch is the term that allows the slope to be batch dependent, so no common slope. You can see it has a high p-value, but I don't like to use p-values because they are so sensitive to sample size, signal-to-noise ratio, and so on. I prefer information criteria, which are more robust across different sample sizes and noise levels. I prefer the corrected Akaike information criterion, AICc, which is based on the minus log-likelihood, so the lower the better.

If I take time times batch out, it drops from -188 to -193, meaning it's a better model. Now I have justified that I can use a common slope. Then I can go to batch number. It has a more borderline p-value, still in the high end, but I'm not using the p-value, I'm using the information criterion. If I take it out, it drops from -193 to -195, still dropping. In this way I have also justified that I can use a common intercept.

However, as I mentioned before, be careful here, because these numbers are not independent, and I'm not telling JMP that yet. You can do better by going to Fit Model and fitting the same model, with the only difference that I now put in measurement day as a random factor. Now I can correct for the fact that these measurements are grouped, that numbers from the same day come from the same analytical run. You can see that you get different p-values. Again looking at the information criterion, taking out batch number times time drops it from -207 to -239: a better model, so a common slope is justified. But see what happens when I take out batch number. It's the same as before, just with measurement day added, at -239. Now it actually increases to -236, so I should not take that one out, meaning I cannot assume a common intercept.
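A hedged Python sketch of the same kind of AICc-based term selection is shown below, for one temperature with ordinary least squares (the data file and column names are invented; statsmodels reports AIC, so the small-sample correction is added by hand, counting the residual variance as a parameter). The measurement-day random factor would require a mixed model, as in the earlier Taylor-expansion sketch.

```python
# Sketch: compare models for one temperature by AICc (lower is better).
# statsmodels gives AIC; AICc adds the small-sample correction below.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def aicc(fit):
    k = fit.df_model + 2          # regressors + intercept + residual variance
    n = fit.nobs
    return fit.aic + 2 * k * (k + 1) / (n - k - 1)

df = pd.read_csv("stability_15C.csv")                  # hypothetical: batch, month, result
df["ln_result"] = np.log(df["result"])

separate = smf.ols("ln_result ~ C(batch) * month", data=df).fit()  # batch-dependent slopes
common   = smf.ols("ln_result ~ C(batch) + month", data=df).fit()  # common slope
pooled   = smf.ols("ln_result ~ month", data=df).fit()             # common intercept too

for name, f in [("separate slopes", separate), ("common slope", common), ("pooled", pooled)]:
    print(name, round(aicc(f), 1))
```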

It's a good example of why you need to put in measurement day; otherwise you could be fooled because the numbers are not independent. If you want to model across temperatures, as I mentioned previously, it's fairly easy: just go to the Degradation Data Analysis and choose the nonlinear path. The Arrhenius equation is built in, and across the four temperatures you get a common intercept, a common slope, and a common activation energy.

However, I have just shown that the common slope is fine, but the common intercept is questionable. We actually need separate intercepts with a common slope, and you cannot really do that here. What you can do is say that you want separate parameters for all batches, but then you get separate intercepts, separate slopes, and in this case a common activation energy, which I think makes sense.

But you cannot get the combination of a common slope and separate intercepts here. There is a solution in JMP, though: if you go to the Nonlinear platform, you can build your own fitting equation. There you can, as you see here, have a common activation energy and a common slope but separate intercepts in the Arrhenius model. So it can be done.

The challenge is that you cannot put in random factors there, so you have a hard time correcting for the fact that the measurements are not independent. You cannot put batch in as a random factor either, so you have a hard time making a model that describes batches in general, which is typically what you need.
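For completeness, the same structure (separate intercepts, common rate, common activation energy) can be sketched outside JMP with a general nonlinear least-squares routine, as below; it shares the Nonlinear platform's limitation of having no random factors. The file, column names, parameterization, and starting values are assumptions for illustration only.

```python
# Sketch: Arrhenius model with separate intercepts per batch but a common rate and
# activation energy, fitted by nonlinear least squares. Everything here is illustrative.
import numpy as np
import pandas as pd
from scipy.optimize import least_squares

df = pd.read_csv("stability_data.csv")               # batch, temp_c, month, result
y = np.log(df["result"].to_numpy())
t_m = df["month"].to_numpy()
x = 11605.0 / (df["temp_c"].to_numpy() + 273.15)     # Arrhenius-transformed temperature

batches = sorted(df["batch"].unique())
b_idx = df["batch"].map({b: i for i, b in enumerate(batches)}).to_numpy()

def residuals(theta):
    intercepts = theta[:len(batches)]                # separate intercept per batch
    log_rate, ea = theta[len(batches):]              # common rate (log scale) and Ea
    pred = intercepts[b_idx] - np.exp(log_rate - ea * x) * t_m
    return pred - y

theta0 = np.r_[np.full(len(batches), y.mean()), -5.0, 0.1]   # crude starting values
fit = least_squares(residuals, theta0)
print(dict(zip([f"int_{b}" for b in batches] + ["log_rate", "Ea"], fit.x)))
```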

For that, we like to go to Fit Model and put in this Taylor expansion of the Arrhenius equation, as you see here, to first, second, and third order, plus batch number for the separate intercepts, plus batch number times time to handle a possibly batch-dependent slope (hopefully not needed), and, even worse, Arrhenius temperature times time times batch number to allow for a batch-dependent activation energy, which would be strange.

Looking at the AICc, I take this term out first; it starts at about -730 and drops when the term is removed, so it's a better model, and I have now justified that I can use a common activation energy. The same with batch number times time: it has a borderline p-value, but again the AICc keeps dropping when it is removed, so I can also justify a common slope, which is of course a great thing. This you can do in this model.

As you can see down here, I have put in Measurement Day as a random effect, because that is easy to do in Fit Model. You cannot do that in the Degradation platform.

Hopefully you have seen that there are many different ways of calculating the slopes. I have tried here to see what difference it makes for your Lower Release Limit. If you run this small script, you can see the different approaches: by temperature, by temperature with random time, the Degradation platform with common everything or with individual everything, the nonlinear fit, and the Taylor expansion without and with random time.

Over here, I have typed in the slopes and the standard errors of the slopes we get from these models. Here you can see the Lower Release Limit if you make only one measurement at batch release. The method we recommend, the Taylor expansion with random time, gives one of the highest release limits. Of course, with a higher release limit, there is a lower risk of issues later in the ongoing verification. Here we would require that all batches start above 10.09; otherwise we cannot be sure they still conform at shelf-life.

You can see that if we take the random-time model for 15 degrees alone, where the requirement applies, without combining temperatures, you get an even higher release limit. But that is because you get fewer degrees of freedom with a separate model. We can actually allow a lower release limit, which you can still rely on, by building the model across temperatures. As you can see to the right, if you take two measurements at batch release to suppress measurement noise, you can reduce it further, and with 10 you can go even further down.

How many measurements you take at batch release of course depends on your measurement noise and how much it dominates the bottleneck. But it really makes a difference which method you use. If you do not correct for the measurements not being independent, you can easily get too low a release limit, which will give you issues later in ongoing verification.

If you want to describe batches in general, for example for setting the shelf-life, then you also need to add batch as a random factor. Now I put in both batch and Measurement Day as random factors, so I am no longer only describing the three batches used in the study, but batches in general. Then you can go down to the Profiler. I set the Arrhenius temperature to 40.272, which corresponds to 15 degrees Celsius. You can see the stated shelf-life here: they would like to prove that they have 24 months. If you look at the general line for all batches, with its confidence interval, it stays nicely inside the limit. This company has no problem at all proving a shelf-life of at least 24 months at 15 degrees.

You can see I am running a 5% one-sided alpha, because it is only a problem to be out on one side. However, if I want to predict where individual measurements could be, because this only shows where the true line is, then I can run exactly the same model again and just change my alpha to 0.135% one-sided, corresponding to everything inside plus/minus 3 sigma. Down here you still see the same picture; of course it gets a little wider with that alpha. Then I would like to show the prediction limits down here.

Unfortunately, you cannot show prediction limits on a profiler in version 17, so I will briefly switch to the JMP 18 early adopter version. It is not released yet, but it will come. Running the same model there, the great thing in version 18 is that you can show both the confidence limits on the profiler, the darker curves, and the prediction limits, or individual confidence limits. This is where you can expect individual observations to be, given the shelf-life behavior of these batches. You can see that you can expect values slightly below the lower specification.

For ongoing verification, we therefore recommend setting the requirement that results should be inside the prediction interval, which is slightly wider than the specification interval. This way you also avoid non-conformities due to measurement issues.
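As a hedged illustration of that acceptance criterion, the sketch below computes prediction limits for individual future measurements from a simple least-squares fit of the original shelf-life data (file and column names are invented; a two-sided alpha of 0.0027 corresponds roughly to the 0.135% one-sided, plus/minus 3 sigma limits mentioned above). With random factors in the model, the variance components would have to be combined, which is what the JMP 18 profiler does for you.

```python
# Sketch: prediction limits for individual future measurements, to be used as the
# acceptance range in ongoing verification (simple OLS case; names are hypothetical).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("stability_15C.csv")                   # data from the shelf-life study
df["ln_result"] = np.log(df["result"])
fit = smf.ols("ln_result ~ month", data=df).fit()

new = pd.DataFrame({"month": [0, 12, 24]})              # verification time points
pred = fit.get_prediction(new).summary_frame(alpha=0.0027)   # ~ +/- 3 sigma, two-sided
print(pred[["mean", "obs_ci_lower", "obs_ci_upper"]])   # individual (prediction) limits
```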

Hopefully you have seen that there are a lot of pitfalls in shelf-life estimation and verification, but JMP has a good toolbox to work around these pitfalls and do it right. To conclude, I will go back to my presentation and show the issues I started with once more, now that we have been through the material. When we do shelf-life studies with clients, we strongly recommend converting the shelf-life to a release limit, because then you are sure that all batches you make in the future will live up to the shelf-life, since you put a requirement on where they start.

Of course, do all models on Ln data; it is just a matter of putting a log transform on in the model dialog window, and you get a regression that is supposed to be linear. Enter the measurement time as a random factor in the model; then you correct for probably not having the same measurement level on all days. Build the model across temperatures with the Taylor expansion of the Arrhenius equation, which is easy to do. And remember not to have multiple registrations: the time-0 point should only be entered at one temperature.

Then, for the ongoing shelf-life verification, we do not recommend the ICH method, because it assumes that you have seen the worst batch among the first three, which is probably not the case. It is also very important, when you calculate the release limit, that you get the right standard error on your slope and the right degrees of freedom. When you do not have independent measurements, it is not just n-1; you really need to put in, typically, the analytical run as a random factor.

Last, but definitely not least, set the acceptance criterion for your ongoing verification so that results should conform with the prediction interval made on the batches used to set the shelf-life. Thank you for your attention. Hopefully you got inspired to do shelf-life in a very good way using JMP. Thank you very much.