at the Center for Mathematical Sciences at Merck.
Today I'll be going over simulating sterility breaches
with non-parametric data.
At Merck, we often deliver our liquid formulated drugs in prefilled syringes.
A group at Merck that specialized
in that came to me asking whether I could simulate
the risk of sterility breaches in them, depending on historical data
and some different scenarios they wanted to look at.
There were two interesting parts of this
that I wanted to go over today in my poster and discuss a little further.
The first was some of the historical data,
specifically, the fill weight was non-normally distributed.
When filling the syringes, it's not necessarily processing to a target.
It's able to move within a range
and even drift outside of that range for a bit before being corrected.
That often results in some heavy tailing of the data,
which you can see in the bottom left here.
That's an example of that.
We wanted to make sure that we were capturing that heavy tailing,
because obviously that's where the highest risk is going to be.
The other interesting part, which goes specifically into some JSL scripting,
is that I was dealing with a large number of iterations asked for by the customer.
They were looking for 10 million per scenario because that's the order
of magnitude of syringes they were expecting to produce.
During the project, I was able to discover some techniques
to reduce the processing load on JMP
that significantly reduced the process time when I was running
the simulations and prevented any crashing from memory issues.
I'll touch on both of those things.
But first, I wanted to go into a little more background
on the prefilled syringes and what we were looking at.
As I mentioned, we have the fill weight data.
That's the amount of liquid that's filled into the syringe.
That again, I wanted to look at non-parametrically
using a density function.
I was able to find that that was very easy to do in JSL.
I'll show how I did that.
Then the other aspect was the plunger insertion depth.
How deep is the plunger being inserted and how close is that to the liquid fill?
Then the dimensions of the prefilled syringe.
There is some variability from the manufacturer,
and I wanted to make sure that was being captured.
There were two key outputs, and they were a yes or no output for each.
The first was, we want to make sure that we were maintaining a gap
between the liquid fill and the plunger.
Because if we don't, then we're going to be getting
liquid up on the plunger, and that could be a sterility risk.
We wanted to make sure that the air gap length was always greater than zero.
The other one was we also don't want that air gap to be too big
because when we're shipping the syringes, say, on an airplane,
they might be exposed to lower atmospheric pressures,
which can cause the plunger to move up.
If it moved up too much, it could go beyond a sterile barrier
that was created when the plunger was inserted.
We don't want the air gap to be too small, and we don't want it to be too big.
But there's a lot that goes into the plunger movement,
not only the air gap, which is a function of the dimensions
of the plunger and how deep the plunger was inserted
and how close it is to the fill.
But again, also different atmospheric pressures
and the cross sectional area, so the dimensions of the syringe.
There's a lot of different inputs
and different sources of variability potentially
to that plunger movement.
I wanted to be able to simulate all of those.
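To make the two outputs concrete, here is a minimal sketch of how an air gap and a pressure-driven plunger movement could be computed for one syringe. This is not the model used in the project: the geometry, the function names, and every number are hypothetical, and the movement uses a friction-free Boyle's law approximation, which is a simplifying assumption.

```python
import math

def air_gap_mm(fill_weight_g, density_g_per_ml, barrel_diameter_mm,
               barrel_length_mm, plunger_depth_mm):
    """Gap between the liquid surface and the plunger face (hypothetical geometry).

    Heights are measured from the closed end of the barrel.
    """
    area_mm2 = math.pi * (barrel_diameter_mm / 2) ** 2
    fill_volume_mm3 = fill_weight_g / density_g_per_ml * 1000.0  # 1 mL = 1000 mm^3
    liquid_height_mm = fill_volume_mm3 / area_mm2
    plunger_face_mm = barrel_length_mm - plunger_depth_mm
    return plunger_face_mm - liquid_height_mm

def plunger_movement_mm(gap_mm, fill_pressure_kpa, outside_pressure_kpa):
    """Friction-free Boyle's law estimate (an assumption): P1 * L1 = P2 * L2."""
    expanded_gap_mm = gap_mm * fill_pressure_kpa / outside_pressure_kpa
    return expanded_gap_mm - gap_mm

# Illustrative values only, not real product data
gap = air_gap_mm(fill_weight_g=1.0, density_g_per_ml=1.0,
                 barrel_diameter_mm=6.35, barrel_length_mm=54.0,
                 plunger_depth_mm=18.0)
movement = plunger_movement_mm(gap, fill_pressure_kpa=101.3,
                               outside_pressure_kpa=75.0)

gap_ok = gap > 0  # first yes/no output: is an air gap maintained?
print(f"air gap = {gap:.2f} mm, movement = {movement:.2f} mm, gap ok = {gap_ok}")
```

In a simulation, each input would be drawn per iteration from its own distribution, and the second yes/no output would compare the movement against the sterile barrier distance.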
That meant that I knew that my data table in JMP
that I wanted to simulate into was going to be very big.
The first change that I was able to make, to make these simulations
a lot more efficient was actually just opening up the historical data
that I was going to use, the data table I was going to use
as being invisible.
This made it so JMP didn't have to render the table,
this potentially massive table I was going to create,
and it really reduced process time
and also prevented JMP from crashing. At times,
JMP had said the memory of my laptop was exceeded.
Once I opened up the historical data as invisible,
I then would add enough blank rows to that to get me to 10 million,
because obviously my historical data wasn't that big.
But I wanted to make sure that the data table had 10 million rows,
so then I could go ahead and simulate 10 million iterations.
Specifically, what I did for the non-parametric aspect
of the data was I fit the data in the distribution platform in JMP,
and then I was able to just very easily use the fit smooth curve function
to save simulations from that non-parametric data
to 10 million iterations.
That's a super simple and easy way to get simulated values
from what is essentially a kernel density function.
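For readers outside JMP, the idea of simulating from a kernel density estimate can be sketched in plain Python. This is an illustration of the general technique, not what JMP's smooth curve fit does internally: the Gaussian kernel, the Silverman rule-of-thumb bandwidth, the function names, and the made-up data are all assumptions.

```python
import random
import statistics

def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth for a Gaussian kernel (an assumed choice)."""
    n = len(data)
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * n ** (-0.2)

def sample_from_kde(data, n_samples, rng):
    """Draw values from a Gaussian kernel density estimate of `data`.

    Each draw picks one historical value at random and adds kernel noise.
    """
    h = silverman_bandwidth(data)
    return [rng.choice(data) + rng.gauss(0.0, h) for _ in range(n_samples)]

# Made-up, heavy-tailed "fill weights" (illustrative only, not real data)
rng = random.Random(42)
historical = [1.00 + rng.gauss(0, 0.01) for _ in range(450)]
historical += [1.00 + rng.gauss(0, 0.04) for _ in range(50)]

simulated = sample_from_kde(historical, 100_000, rng)
print(len(simulated), round(statistics.mean(simulated), 4))
```

Because every draw starts from an observed value, the heavy tails in the historical data carry through to the simulated values instead of being smoothed away by a normal fit.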
The other two things that really improved my simulation
relate to the fact that, as I mentioned, there were a lot of different calculations
that I was doing within a data table and different scenarios: over 20 different
plunger depth targets, for example, that we wanted to look at.
As part of my JSL script, I wanted to be looping over different scenarios.
But if I was just going to create a column that then referenced previous columns
in a loop, that could cause reference issues
for each iteration of the loop, because I would end up with essentially
all of the new columns having the same formula
because they'd all just end up referencing whatever the last
iteration of the loop was.
To prevent that, if I wanted to use a formula
for the column, I would then need to delete the formula after each iteration.
Again, very inefficient.
One very simple and easy way that I could get around this
was instead of saving a formula for a new column,
just use Set Each Value.
This means that JMP didn't need to save the formula at all.
It eliminated that issue with the looping reference
and then also, again, reduced process time.
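The looping reference issue has a close analogue in Python, which may help show why storing values sidesteps it. This is an analogy only, not JSL: the stored functions below stand in for column formulas, and the targets and depths are hypothetical.

```python
targets = [10.0, 11.0, 12.0]       # hypothetical plunger depth targets
depths = [9.5, 10.5, 11.5, 12.5]   # hypothetical simulated depths

# Formula-style: each "column" keeps a rule that looks up `target` later,
# so after the loop every rule sees the last target.
formula_columns = []
for target in targets:
    formula_columns.append(lambda d: d - target)

# Value-style: evaluate immediately and keep only the numbers.
value_columns = []
for target in targets:
    value_columns.append([d - target for d in depths])

print([f(10.0) for f in formula_columns])   # all three rules use target 12.0
print([col[0] for col in value_columns])    # one distinct result per target
```

Storing values also means nothing needs to be re-evaluated later, which is in line with the reduced process time described above.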
The final improvement that I made was by really working with my customer
in this case, and really figuring out what exactly they needed,
I was able to streamline things a lot.
Because initially, I was just giving them the kitchen sink.
Giving them distributions and histograms of every single parameter and output,
which they thought was interesting but was not really worth the effort
and worth the process time.
What they really just wanted was the percent failure rate for these two outputs.
I was able to make delivering that a lot more efficient
by eliminating the need to open up, say,
a distribution platform and try to fit 10 million rows.
Instead, I just created a column where, if a sterility breach occurred,
it was a one, and if it didn't, it was a zero.
Then it was very easy to just calculate the column mean to give
the percentage of failure for any scenario and directly output that to a journal.
That way, the journal also didn't have to be massive
from saving so much information from the data table to create graphs.
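The indicator-column trick works because the mean of a column of ones and zeros is the fraction of ones. Here is a minimal sketch in Python, with a made-up air gap distribution standing in for the simulated output; the distribution and its parameters are assumptions for illustration only.

```python
import random

rng = random.Random(0)

# Made-up simulated air gaps in mm (illustrative only, not real data)
air_gaps = [rng.gauss(4.0, 1.5) for _ in range(200_000)]

# 1 if the breach condition is met (no air gap left), otherwise 0
breach = [1 if gap <= 0 else 0 for gap in air_gaps]

# The mean of the 0/1 column is the failure rate
failure_rate = sum(breach) / len(breach)
print(f"failure rate: {100 * failure_rate:.3f}%")
```

Only one number per scenario and output has to be reported, so nothing needs to be plotted or fitted across all the rows.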
Overall, initially in this project,
I was able to deliver it, but by using the platform outputs,
visible tables, and saved formulas, it was taking at least three hours.
Often, I was letting it run overnight,
so I don't know the exact timing, but at least three hours.
By simplifying the output alone, so going directly to the journal instead
of saving from, say, the distribution platform in JMP,
I was able to get this down to an hour and 49 minutes.
Then just those two simple changes
of making sure that the data table was invisible
and saving values instead of saving the formula got me down
to 52 minutes despite the volume of calculations
that needed to be made.
Overall, it can be very simple
and easy to simulate non-parametric data within JMP
using these data tables and using the fit smooth curve function.
Then also, if you are simulating really big data sets in JMP,
if you are simplifying the output, if you're making sure
that JMP isn't rendering things it doesn't need to or calculating
and saving things it doesn't need to, it can actually be very efficient
in creating the simulations and giving you the outputs.
In this particular case, using those techniques,
I was able to reduce my simulation time by over three-fold.
That's all I have. Thanks for listening.