
Predicting Molecule Stability in Biopharmaceutical Products with a Rapid Short-Term Study Using JMP (2022-EU-30MP-1005)

Paul Taylor, Senior Scientist, GSK
David Hilton, Investigator, GSK


In the journey of delivering new medicines to patients, the new molecular entity must demonstrate that it is safe, efficacious, and stable over prolonged periods of time under storage. Long-term stability studies are designed to gather data (potentially up to 60 months) to accurately predict molecule stability.

Experimentally, long-term stability studies are time consuming and resource-intensive, affecting the timelines of new medicines progressing through the development pipeline. In the small molecule space, high-temperature accelerated stability studies have been designed to accurately predict long-term stability within a shorter time frame. This approach has gained popularity in the industry but its adoption within the large molecule space, such as monoclonal antibodies (mAbs), remains in its infancy.

To enable scientists to design, plan, and execute accelerated stability models (ASM) employing design of experiments for biopharmaceutical products, a JMP add-in has been developed. It permits users to predict mAb stability in shorter experimental studies (in two to four weeks) using the prediction profiler and bootstrapping techniques based on improved kinetic models. At GSK, Biopharm Process Research (BPR) deploys ASM for its early development formulation and purification stability studies.



Hi, there. I'm Paul Taylor.

I'm David Hilton.

Yeah, we'll be talking to you about predicting molecule stability

in biopharmaceutical products using JMP.

We'll dive straight into it.

Oh, yes.

One thing to mention is just a little shout-out

to this paper that's been published by Don Clancy of GSK,

which most of the work has been based on.

Just to introduce what we do.

We're part of biopharm process research,

and we're based in Stevenage in the UK, at our R&D headquarters.

We're the bridge between

discovery and the CMC phase, so the chemistry, manufacturing, and controls.

That would be the pathway

into the clinical trials and also the actual release of the medicine.

We have three main aspects.

We look at the cells.

We are looking towards developing a commercial cell line,

the molecule itself by expressing developable, innovative molecules

and the process in terms of de-risking the manufacturing and processing

purification aspects for our manufacturing facility.

Just an overview is to see

why do we need to study the stability of antibody formulations?

How do we assess the product formulation stability?

The overview of the ASM JMP add-in that we've got,

and the value of using such modeling approaches with the case studies.

Just to reiterate about biotherapeutics,

so these are all drug molecules that are protein- based.

The most common you'll probably know are all these vaccines from COVID,

so they're all based on antibodies and other proteins.

These are very fragile molecules.

The stability can be influenced by a variety of factors

during the manufacture, transportation, and storage of them.

Factors such as the temperature,

the pH could cause degradation of the protein.

The concentration of the protein itself can have a significant impact.

The salts to use, the salt type, even the salt concentration,

and even subject to light and a little bit of shear

can have an effect.

What that can cause is aggregation or fragmentation of the protein.

It can cause changes in the charge profiles,

which can then affect the binding and potency of the molecule.

That could be caused by isomerization, oxidation, and a lot more.

They're fragile little things, but also we need to keep an eye

on the stability to make sure they are safe and efficacious.

One way of looking at the stability is by subjecting to a number

of accelerated temperatures and taking various time points.

These long-term stability studies can go up to five years.

It can take up to 60 months.

These will be more real time data.

The procedure is extremely resource intensive.

At each time point,

we can use a variety of analytics such as HPLC, mass spectrometry,

so essentially separating the impurities away from the main product itself,

the ones that could cause the problems

and then quantifying through mass spectrometry,

light scattering, or just the UV profiles themselves.

We can separate by charge, size, or [inaudible 00:03:21].

Within that five years,

when we're gathering that data, a lot can happen in those five years.

We could have better developments in the manufacturing or the formulation.

Does that mean that we have to repeat

that five- year cycle again to get the stability data?

Short answer is no.

What we can do as an alternative is look at accelerated stability studies.

These are more short term studies, so we can apply that more exaggerated

accelerated degradation temperatures and we can use shorter time periods.

Instead of a matter of months and years, we can now look at a matter of days,

so we can go from seven to 14 to 28 days.

This technique is commonly applied in the small molecule space,

but not so much in the large molecule biopharmaceutical space,

because in the small molecule space

they involve a lot of tablets and solid formulations,

and it's only starting to trend

in the biopharma industry with more liquid formulations.

In terms of the stability modeling,

we base our models on Arrhenius kinetic equations

that can be both linear and exponential with respect to time.

These are semi-empirical equations

based on the physical nature.

For example, an accelerating model applies

if there's a nucleation point for aggregation,

which could cause exponential growth.

Conversely, with a decelerating model, where there could be a rate-limiting step,

it could cause slow growth in the degradation, too.
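
The talk does not spell out the add-in's equations, so the following is only a minimal sketch of the idea, assuming a standard Arrhenius rate constant and a simple power-law time dependence. The function names, the `shape` exponent, and the parameter values are hypothetical and are not taken from the add-in.

```python
import math

R = 8.314  # gas constant in J/(mol*K)

def arrhenius_rate(ln_a, ea, temp_c):
    """Rate constant k(T) = A * exp(-Ea / (R*T)), with T converted to kelvin."""
    temp_k = temp_c + 273.15
    return math.exp(ln_a - ea / (R * temp_k))

def degradation(t_days, temp_c, ln_a, ea, shape=1.0):
    """Illustrative (assumed) time dependence: (k * t) ** shape.

    shape == 1 gives a linear model, shape > 1 an accelerating one,
    and shape < 1 a decelerating one.
    """
    k = arrhenius_rate(ln_a, ea, temp_c)
    return (k * t_days) ** shape

# Hypothetical parameters, chosen only to give plausible-looking numbers.
LN_A, EA = 30.0, 90_000.0
for temp in (5, 25, 40):
    print(temp, degradation(28, temp, LN_A, EA))
```

With these made-up parameters, the same 28 days produce far more degradation at 40 degrees C than at 5 degrees C, which is what makes a short high-temperature study informative.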

All of these models are fitted and put through a fit quality assessment

using the Bayesian information criterion, but we can also establish

confidence intervals using bootstrapping techniques.
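
How the add-in computes its BIC score is not described in the talk. As a rough illustration, the usual least-squares form (assuming Gaussian errors and dropping constant terms) can be sketched like this; the function name and the residual values are hypothetical.

```python
import math

def bic_least_squares(residuals, n_params):
    """BIC for a least-squares fit under Gaussian errors.

    Uses n * ln(RSS / n) + k * ln(n), where n is the number of
    observations and k the number of fitted parameters.
    Lower values indicate a better trade-off of fit and complexity.
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + n_params * math.log(n)

# Made-up residuals from two candidate models fitted to the same data.
simple_model = [0.10, -0.08, 0.12, -0.11, 0.09, -0.10]
richer_model = [0.09, -0.08, 0.11, -0.10, 0.09, -0.10]

print(bic_least_squares(simple_model, n_params=2))
print(bic_least_squares(richer_model, n_params=4))
```

The penalty term `k * ln(n)` is why a model with extra parameters has to reduce the residuals noticeably before it is preferred.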

This is where David Burnham at Pega Analytics comes in.

He's worked closely with Don Clancy

on developing a JSL script or a JMP add-in for us scientists to use in a lab.

Just going to give you a quick demonstration of the JMP add-in itself.

If I can exit.

During your ASM study, you collect your data,

and obviously you put it into a JMP table.

In this instance, we're looking at the size exclusion data,

so we're looking at the monomer percentage, the aggregate, and fragment.

What we've got is a JMP add-in that will give us the fitting.

If I just quickly open it up.

You go through a series of steps, so you can select the type of product.

In terms of the small molecule space, where they're dealing

with solid tablet formulation,

you could use a model based on Aspirin, which is more of a generic approach,

or a generic tablet where it's more novel and different to Aspirin.

But for us in biopharma,

because it's all liquid formulations, we look at the generic liquid.

We're good. Okay.

When importing the data,

so just that data table I showed you earlier,

you can also do a quick QC check.

It's just a fancy check that everything matches up,

and ensure that everything is hunky-dory,

or you can just remove it and replace it with a new data table.

The most strenuous part of this JMP add-in is to actually match the columns up.

In terms of the monomer, and aggregate, and fragments,

those are the impurities that we're interested to model.

We'll put those in the impurity columns, as well as matching up

all the other important aspects such as the time, temperature, pH,

and also batch which is something that could be of importance.

If you have a molecule that has different lot releases

and you want to see if all your lots are consistent,

this is one tool that you could possibly use

to ensure that your lot-to-lot variability is consistent.

It also asks if you have specifications,

so say we have a target of having no more than a 5 percent drop

in monomer,

so we're doing 100 minus the monomer in this case.

But in terms of aggregate, we have no more than three percent,

and for fragment, no more than two percent.
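
The specification limits just described can be written down as a simple check. This is only a sketch using the limits quoted in the talk (5 percent monomer loss, 3 percent aggregate, 2 percent fragment); the dictionary keys and function name are hypothetical, and the add-in's own handling of specifications is not shown here.

```python
# Limits quoted in the talk, in percent.
SPEC_LIMITS = {"monomer_loss": 5.0, "aggregate": 3.0, "fragment": 2.0}

def check_specs(monomer_pct, aggregate_pct, fragment_pct, limits=SPEC_LIMITS):
    """Return a pass/fail flag per attribute.

    Monomer is checked as a loss, i.e. 100 minus the monomer percentage,
    so that every attribute is of the 'no more than' type.
    """
    values = {
        "monomer_loss": 100.0 - monomer_pct,
        "aggregate": aggregate_pct,
        "fragment": fragment_pct,
    }
    return {name: values[name] <= limits[name] for name in limits}

# Made-up size exclusion results for one time point.
print(check_specs(monomer_pct=96.0, aggregate_pct=2.5, fragment_pct=1.5))
```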

In terms of the model options, we can select all the models

that we want to fit and evaluate and also select just at a generic

temperature and pH you would like to look at.

You can choose that later on as well.

That can be flexible,

and also the different variants like the temperature only,

and temperature, and pH.

When it comes to fitting the data itself,

you can either go for a quick mode, which could be a two-minute quick fit,

which won't be as accurate as maybe the long mode.

But to save you all from looking at the spinning wheel of death,

I've already fit the data and then we can go straight into it.

We can fit all the models.

Once it loads up, eventually, you can have an overarching view

of the prediction profilers of each type of model that's been fitted and evaluated.

You can see that some have confidence intervals

that are a little bit broader and some are a little bit tighter,

so it can be that either

the model doesn't reflect the data very well or it could be overfitted.

For scientists, we can then delve into selecting the candidate model.

This is where it's based on the Bayesian information criterion.


You can also look at those two criteria and see which one is more appropriate.

Then you can use the drop-down to see

how the model fits in with the actual predicted values.

Then last but not least, if we look at

when you select the preferred model, this can give you the override in...

Here you go.

You can manually override which model you'd like to choose.

But also, here in the prediction profiler,

you can select the conditions you would like and extrapolate from them.

Beyond the one month period,

you can extrapolate all the way up to two years,

and you can see how it fares in terms of its stability.

One last thing to add is bootstrap techniques.

If you want finer control of how the bootstrapping is working

and to get more accurate modeling of the confidence intervals,

you can simply do that to each of your impurities.
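
The bootstrap settings in the add-in are not detailed in the talk. To illustrate the general idea, here is a minimal sketch of a percentile bootstrap for a degradation rate, assuming simple case resampling and a straight line through the origin. The data values and function names are hypothetical.

```python
import random

def slope_through_origin(times, values):
    """Least-squares slope of a line forced through the origin."""
    return sum(t * v for t, v in zip(times, values)) / sum(t * t for t in times)

def bootstrap_slope_ci(times, values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile confidence interval for the slope via case resampling."""
    rng = random.Random(seed)
    n = len(times)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ts = [times[i] for i in idx]
        vs = [values[i] for i in idx]
        if sum(t * t for t in ts) == 0:
            continue  # resample contained only time-zero points
        slopes.append(slope_through_origin(ts, vs))
    slopes.sort()
    low = slopes[int((alpha / 2) * len(slopes))]
    high = slopes[int((1 - alpha / 2) * len(slopes)) - 1]
    return low, high

# Made-up aggregate increase (percent) over a 28-day study.
days = [0, 7, 14, 21, 28]
increase = [0.0, 0.21, 0.39, 0.66, 0.83]

ci_low, ci_high = bootstrap_slope_ci(days, increase)
print(slope_through_origin(days, increase), ci_low, ci_high)
```

Raising `n_boot` gives a smoother estimate of the interval at the cost of run time, which mirrors the quick versus long fitting trade-off mentioned in the demo.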

Trust it not to work.

Okay, time to completion. It's done.


You can see that you can look into it in great detail.


We go back to the presentation.

We'll swap these.

Okay. We'll be going into our first case study,

which is looking at the stability of our formulations.

They play an important role in drugs in general.

They not only help in biopharma

in terms of stabilizing the protein during storage or manufacture,

but can also aid the drug delivery when administered to a patient.

Formulations contain many components, called excipients.

These are generally inactive components within the drug product itself

but they act as a stabilizer, so they could be the buffers,

some amino acids, stabilizers, surfactants, preservatives, metal ions, and salts.

But in terms of the formulation development,

you can screen many of the excipients to find the ideal formulation

and you can use design of experiments, but that's going into a different topic.

But one way to test and prove that your final formulation

is fit for purpose is by doing stability testing.

In our case study, we've looked at three different formulation pH values

for this monoclonal antibody,

and we stressed them at elevated temperatures.

We looked at our ASM study from time zero all the way up to 28 days

and analyzed them through size exclusion.

As you can see, it's the same snippet of the JMP add-in,

so you can see how the models are being fitted,

but also the extrapolation from the prediction profiler

to see how the monomer stability fares.

When we look at the monomer and aggregate, you can see we can take predictions

from that prediction profiler at 5, 25, 35, 40

or even other temperatures.

But within that model, we have an N1 value,

and that can reflect on how fast the degradation is,

either in acidic or basic pH.

What we found is a negative value,

which means faster degradation in acidic pH.

We found that there was a higher risk at low pH than at a higher one.

Our next case study, which is on a similar trend,

is looking at a different monoclonal antibody,

where we ran an ASM stability study up to 28 days.

With that same molecule, we had some historical data,

which had five years worth of stability data.

What we did was take the data from both studies

and put them into the JMP add-in to see how they compare.

What you can see highlighted in green

is the model prediction.

At the bottom is the long-term study, the real-time study.

In blue, you can see that the values fare pretty well.

Then in red is the confidence intervals.

They match up quite nicely, which is good.

But one of the downsides to ASM is that, because it's a short-term study,

if you look at the graphs themselves, they seem to be quite linear,

whereas in the real-time data, they seem to be a bit more curved and exponential.

But in terms of getting that

data back and the actual prediction, it's quite good.

That could help with some immediate formulation

development work that you need to do

rather than wait for long-term stability data.

I'll pass it on to David.

Thanks a lot, Paul.

Paul's given an example of how we could be making long-term

stability predictions based on a month's worth of data.

What you tend to do there, though,

is design your formulation

in order to hit a certain minimum threshold for stability time span.

In this case, what we have is a fixed formulation,

and we're then trying to use this technique to find out

for what period of time the product stays within a defined threshold,

and therefore what our limit is in terms of how long we can hold this material.

The material in question is material that's been generated

during the manufacture of the biotherapeutic.

Essentially with this, you've got different unit operations,

which are linked in series.

What happens is you complete one unit operation

and then, depending on shift patterns or utilization of your facility,

it may be the case that you want to have holds in between different unit operations

in order to regulate the timings of your process.

One of the key things that we need to know here

is that if we're holding our material

in between the unit operations,

what's the maximum period of time we can hold that material for

[inaudible 00:14:57]?

The way that we normally look to do this is we have a plot, something like that,

on the lower right hand side,

where we just

hold the material for a month in a small-scale study

and just do repeated analytical measurements

of our product quality attributes of interest to see how they change

and whether they fall within tolerances.

In this case, you can see that we're looking for total 100 percent

over time on the X-axis.

In all cases, it's falling within those red bands,

so we can say it's [inaudible 00:15:32] criteria that we are after.

What we're looking to do in this study,

rather than looking

exclusively at the standard conditions

that the material could be held at, so that would be 4 degrees C

if it was refrigerated, or approximately 25 degrees C,

so room temperature,

is to have parallel studies performed

at 30, 35, and 40 degrees C, but only for a week,

and then see if that high temperature data could be used to predict

the low temperature data.

What we've got in this slide is we've got

a few snapshots from the data that we collected,

which has been visualized within the graph builder.

In the panel box on the left-hand side, you've got the data collected

at five degrees C, and the columns in there represent

material coming from different unit operations.

Each row corresponds

to a particular form of analytics that's being conducted.

For example, if you measure the concentration

of the level of [inaudible 00:16:38] species.

On the right hand side, we've then got the equivalent data

for equivalent unit operations and analytics,

but just at a higher temperature.

Just doing a basic plot like this,

the first thing we can see is that the general trends

seem to be consistent.

If we were to look at the purple plots,

in this case, you can see that the first column,

so the first unit operation, we've got a descending straight line,

whereas the second and third unit operations, you have a slight increase.

Qualitatively, it looks quite promising in that an increase in temperature

isn't causing any changes to the general trends that we're observing.

In terms of a more quantitative prediction,

this is where we then began to use the ASM add-in.

In this case, what we've done is we've taken

the 30, 35, and 40 degrees C data and we've then used that to fit a model.

In terms of the model fit quality,

you can see from the predicted versus actual plot

in the center that it appears to fit quite well,

so that's reassuring.

If you look at the model fits in the table on the top left hand side,

we can see that the model fit with the lowest BIC score

was the [inaudible 00:17:57] model.

In this study, we didn't have pH as a variable.

That's why the BIC score for both of those is the same,

because we're essentially removing that parameter.

The linear kinetic model

and the external linear model are essentially equivalent.

What we've then done is use this high temperature data

to fit this kinetic model and determine what these kinetic parameters would be,

so that's K1 and K2

in the kinetic equation shown on the bottom left of the slide.

We've then changed the temperature value in that to five and 25 degrees C

and tried to predict what level of degradation we'd expect

at that temperature and over a longer period of time.
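
The kinetic equation on the slide is not reproduced in the transcript, so the sketch below only illustrates the workflow just described under stated assumptions: a linear (zero-order) change with time, an Arrhenius temperature dependence, and synthetic noise-free data at 30, 35, and 40 degrees C. The parameters `ln_a` and `ea` are stand-ins chosen for illustration and are not claimed to be the K1 and K2 of the add-in.

```python
import math

R = 8.314  # gas constant in J/(mol*K)

def rate_constant(ln_a, ea, temp_c):
    """Arrhenius rate constant at a temperature given in degrees C."""
    return math.exp(ln_a - ea / (R * (temp_c + 273.15)))

def fit_rate(times, values):
    """Zero-order rate: least-squares slope through the origin."""
    return sum(t * v for t, v in zip(times, values)) / sum(t * t for t in times)

def fit_arrhenius(temps_c, rates):
    """Ordinary least squares of ln(k) on 1/T; returns (ln_a, ea)."""
    xs = [1.0 / (t + 273.15) for t in temps_c]
    ys = [math.log(k) for k in rates]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return y_mean - slope * x_mean, -slope * R

# Synthetic one-week data generated from made-up "true" parameters.
TRUE_LN_A, TRUE_EA = 30.0, 90_000.0
days = [0, 1, 3, 5, 7]
high_temps = [30, 35, 40]
rates = []
for temp in high_temps:
    observed = [rate_constant(TRUE_LN_A, TRUE_EA, temp) * d for d in days]
    rates.append(fit_rate(days, observed))

ln_a_fit, ea_fit = fit_arrhenius(high_temps, rates)

# Extrapolate to the storage temperatures over a longer hold.
for temp in (5, 25):
    predicted = rate_constant(ln_a_fit, ea_fit, temp) * 28
    print(f"{temp} C, 28 days: predicted change {predicted:.4f} %")
```

With real, noisy data the fitted parameters carry uncertainty, and the extrapolation to 5 degrees C is only as trustworthy as the assumption that the same mechanism applies across the whole temperature range.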

This is what's shown on the right hand side.

We have the red lines corresponding to predictions from this equation

based on the high temperature data,

and that's then overlaid on the experimental data

at lower temperatures just to see how good the prediction is.

In this case, you can see that actually the predictions

appear to be quite good.

It gives you cause for comfort in this case.

Sometimes, however,

we noticed that this wasn't always the case.

In this example, again,

with the high temperatures, we've been able to get

good model fits, where the predicted versus actual fit

is quite strong.

In this case, however, the model that has been

identified as the best is the accelerating kinetic model,

so indicating that the reaction rate's getting faster over time.

We then apply the same procedure to this set of data and we start trying

to model what would be happening at lower temperatures.

We can begin to see that the prediction is a little bit erratic.

In reality, the increase in

the level of this particular impurity was fairly linear.

The model was beginning to overestimate it

quite drastically at high time points.

I guess one thing that's important to bear in mind with this

is that you also need to incorporate a level of subject matter

knowledge when applying these kinds of techniques.

You need to have the balance

between what's the best statistical model in terms of fit,

but also what's the most physically representative of the type of system

that we're dealing with.

In terms of subject matter knowledge,

another thing that's an area where it's important within this technique

is the selection of the temperature that you can use in this study

and the temperature range that you're going to look at.

There's two reasons for this, and you've got competing forces.

It's preferable to use as high a temperature as possible

because that means that the reactions proceed at a faster rate.

One of the issues that we often encounter with this type of study

is that at low temperatures,

fortunately for us, we're often dealing with products which are quite stable.

But the inherent problem with that is when you're trying to use

quite short, quite narrow time series data

in order to measure these changes, you often end up getting caught

in the noise and your signal to noise ratio ends up being quite low.
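
A back-of-the-envelope way to see the signal-to-noise problem is to compare the change expected over the study with the assay's measurement noise. The rates and the assay standard deviation below are made up purely for illustration; they are not values from the study.

```python
def signal_to_noise(rate_per_day, days, assay_sd):
    """Expected change over the study divided by the assay standard deviation."""
    return (rate_per_day * days) / assay_sd

# Hypothetical degradation rates (percent per day) and assay noise (percent).
ASSAY_SD = 0.05
snr_5c = signal_to_noise(rate_per_day=0.0001, days=7, assay_sd=ASSAY_SD)
snr_40c = signal_to_noise(rate_per_day=0.01, days=7, assay_sd=ASSAY_SD)

print(f"5 C:  {snr_5c:.3f}")
print(f"40 C: {snr_40c:.3f}")
```

When the ratio is well below one, as in the low-temperature case here, a one-week series cannot distinguish a real trend from assay scatter, which is the situation described above.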

That's what's being demonstrated within these plots.

The plots on the left-hand side are kinetic rate fits, or Arrhenius plots.

What you'd typically expect is that, for entirely temperature-driven behavior,

you'd have a straight line.

If you look at the top left plot, you can see that that's the case.

We've got the first four blue points, which are forming a straight line.

That corresponds to 40 down to 25 degrees C.

Across all of those temperatures,

we've got entirely temperature-dependent behavior.

But then the five degrees C point

on the right hand side of that plot appears to be off.

But when you dive into the data to find out whether this was

because of a genuine deviation in that temperature dependence,

if you look at the equivalent plot on the right hand panel,

you then begin to see that actually, because there's so much noise in the data,

it's more of a fitting issue rather than a mismatch issue.

If you were to look at the gradient of, say, the red line or the blue line,

despite the fact that the intercepts can be different,

either can easily fit that data because there's just too much noise

in the data to really be able to fully understand which one should be applied.

In terms of a general conclusion, though, I think what

we've been able to demonstrate with this project is that JMP itself

has a number of powerful built-in tools,

and with lots of knowledge of JMP Scripting Language,

or someone who can do this for you, those can be compiled

into some form of user-friendly package, which can then be used

for quite complex analysis, which makes it accessible to most users.

It's also demonstrated that by performing statistical fits to semi-empirical models,

we've actually got a lot of tangible benefits from that

and that we're able to make predictions about the future

which in the past, we've not been able to do,

and potentially, significantly reduce our timelines

in terms of identifying liabilities with particular drug products.

Finally, this also demonstrates

that in areas such as this,

you can't rely exclusively on statistical models.

You also have to incorporate your own subject matter knowledge.

Try and work out which statistical model

or kinetic model, whatever it might be, is the most appropriate

to the situation you've got, and then which of those is the best fit.

In terms of acknowledgements, as already mentioned, a lot of this work

has been based on an original paper,

which came out of GSK by Don Clancy, Neil, Rachel, Martin, John.

This has been extended to a biotherapeutic setting.

We also thank George and Ana

for supplying the data which has been used to build this project,

and Ricky and Gary for the project endorsement

[inaudible 00:24:20].


It would be great if you could share the JMP add-in that you demonstrated.


Dear Paul and David,


thanks a lot for the very interesting presentation. 

I would like to know if the add-in that you presented is the same as the one mentioned in the book "Accelerated Predictive Stability (APS) - Fundamentals and Pharmaceutical Industry Practices", by Fenghe Qiu and Garry Scrivens, where it is called "GSK ASM software"?


Thanks a lot for your help


Tatjana Stieler 
