and applying it to a real-world problem in our analytical sciences department
at Amyris, which is a synthetic biology company.
Before we jump into that, I wanted to introduce myself as well as
Scott, who helped me along this journey.
I'm Stefan, I'm an associate director of R&D data analytics at Amyris.
I have twelve years of industry experience
and a fairly diverse background; I've worked in various labs,
from analytical chemistry to fermentation science, and in more recent years
I've focused more on the quality and data science side of things.
Scott has helped me with a lot of the content here,
has been working with Amyris for a number of years,
and is one of the JMP pros working at JMP.
I'd like to start off by just saying thank you to Scott for helping us out here.
We're going to split the talk today into three parts.
I'm going to give a bit of background and context both on synthetic biology,
if you haven't heard of that before, and analytical chemistry.
The main part of the talk is really going to be focused on then applying JMP
to a specific question we had and then finally we'll wrap it up
briefly touching on automation and then the impact
of the analysis and this case study we'll look at together.
Some of you may not be familiar with synthetic biology
or analytical chemistry, and I really like to establish
context and background; it's going to be fairly important
for the case study we'll look at today, so we'll start there.
Synthetic biology really leverages microorganisms,
as we like to call them, as living factories.
We use mainly yeast in the case of Amyris that we precision engineer,
and we use the process of fermentation, which is not a new thing; it's something
people have been using for thousands of years, mainly to make alcohol
and bread in a lot of cases.
In our case, we're using the yeast in fermentation,
feeding it sugar and converting that sugar into a variety of target
ingredients and chemicals.
We can then make those ingredients and chemicals
at higher purity, so they may be higher performing and lower cost,
and produced in a more sustainable fashion.
To give an example, this isn't just a fairy tale.
This is reality, it's not an idea, we have
18 molecules today that we manufacture at scale and I'm showing
a subset of those here.
There's an example on the top left you have Artemisinin.
That's an antimalarial drug, it was our first molecule
and that's how our company was founded.
In the top middle, we have Biofene, which is actually a powerful
building block that we then convert into other chemicals and applications.
One example being squalane, which is a
very popular emollient used in the cosmetics industry
and is traditionally sourced from shark livers.
and one that might be familiar in the bottom middle, we have patchouli.
Some people associate that with the hippie smell, it's a fragrance,
but it's actually really ubiquitous in the fragrance industry
as a base note, so it goes into thousands of products.
Things like Tide detergent have patchouli in them, and we can manufacture this,
which is traditionally extracted from plants
with our synthetic biology platform.
I work in the R&D function, and so our goal is really to identify
the best yeast strains that we can then use at manufacturing scale,
and that requires research at scale.
We run a lot of highly automated, high-throughput workflows
at Amyris in Emeryville. Starting from the left there, we screen
our yeast strains
at a capacity of about 600,000 strains per month.
We take those top performers and we promote them
to what we call our bench- scale bioreactor fermentations,
which you can see pictured on the right there.
Throughout all of this, we're creating a lot of strains,
which means we also need to understand what's happening in those strains,
what are they producing, how much, and that's really where analytics come in.
Those analytics need to be run at a scale to match that so we can really
get the data to understand what's happening.
With this scale of research, there's a lot of opportunities,
and a lot of those opportunities come from looking at conventional
approaches and reconsidering how to do those.
I will talk a little bit about analytical chemistry.
Again, that's not anything that's unique to synthetic biology.
It's pervasive in a lot of industries, petroleum industry,
environmental sciences, pharma, very common way just to measure things.
I'll talk here really about chromatography,
and as an example, I'll take a fermentation that we do at the bioreactor scale.
From this fermentation, we're going to sample that while it's running.
We're going to get a dirty sample from that which we then can further
prepare and dilute.
We have this mixture of components in this final form.
We'll then take this mixture of components,
we'll run it across some separation techniques.
That's chromatography.
What that's going to do is based on the property of those components,
might be size, it might be polarity, it'll allow us to separate those out.
We then feed that into some detection mechanism.
There's a variety that you can use and what that gives you is a separation
of these components over time and then some intensity of response.
The last piece, and where we're going to focus today,
is that intensity isn't really a useful thing for you or me to make decisions on.
We need to translate that into something useful like a concentration.
The calibration curve allows us to translate
that intensity into a concentration,
and of course, you can imagine if you get that translation wrong,
your data is going to be wrong and it's going to mislead you.
Calibration curves are where we'll focus today,
and that's the heart of the question.
A calibration curve is created by running standards with
varying levels of your known component.
In the example I'm showing here,
we have a low, mid, and high, so a three-level calibration.
We know what the concentration is in those because we prepared them,
and we measure the response on these instruments.
From there, we can fit some calibration curve.
In this example, I'm showing just a simple linear fit,
and then we can run unknown samples,
read the response off our instrument and do an inverse prediction.
We're taking our response from the Y and predicting what the quantity is
in that sample.
It's a very common way to be able to quantify things in unknown samples.
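To make the mechanics concrete, here is a minimal sketch in Python of a linear calibration fit and the inverse prediction step. This is just the idea, not the software we actually use, and the concentrations and responses are made-up numbers.

```python
import numpy as np

# Hypothetical three-level calibration: known concentrations (X)
# and the instrument's measured responses (Y). Values are illustrative.
conc = np.array([1.0, 5.0, 10.0])            # e.g. mg/mL, prepared by hand
response = np.array([102.0, 498.0, 1001.0])  # raw detector intensity

# Ordinary least-squares linear fit: response = slope * conc + intercept
slope, intercept = np.polyfit(conc, response, deg=1)

def inverse_predict(y):
    """Translate a raw response from an unknown sample into a concentration."""
    return (y - intercept) / slope

# An unknown sample that reads 750 intensity units:
print(round(inverse_predict(750.0), 2))
```

The inverse prediction is just the fitted line solved for X, which is exactly the "read the response off the Y axis and predict the quantity" step described above.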
That's our background.
We're going to jump into the case study looking at this key question
we had around optimizing a part of our process in our analytics.
A bit more background here is that when we do calibration in our labs,
there's a cost associated maintenance of these calibrations
and calibration curves and calibration standards is expensive,
both due to people's time, but also materials.
These materials can often cost thousands, even tens of thousands
of dollars per gram.
With the scale that we're doing our research at, it really pushes us
again to reconsider those conventional approaches.
We're running millions of samples per year, and we have a really diverse
set of analytical methods; we currently have,
in our lab in Emeryville,
over 100 different analytical methods measuring all sorts of components.
One place we looked is at convention.
We see this with most people we hire; this is where people start.
Conventionally, calibration curves often have five to seven levels,
whether they're linear or not.
People think, okay, five to seven levels, linear fit.
In theory, the minimum you might need is two,
and there's a cost to each additional level,
both in materials and in preparation and maintenance.
This is where we wanted to look and ask the question:
can we actually reduce this number for an existing method
without significant impact on our actual data quality,
that is, the way we quantify our unknown samples?
This is where JMP comes in; we're going to use JMP here to simulate
some alternative calibration schemes, in this case
reducing the number of calibration levels.
To reiterate what we've walked through, our problem ultimately is that
calibration maintenance is costly.
That's exacerbated by the scale we do it at.
Our general approach is really going to be to look at how can we optimize this.
Let's look at reducing the levels of those calibrations,
and then our specific solution is using JMP here
to ask the question, look, if we went back in time
and if theoretically, we had run two calibrators or three calibrators
instead of six or seven, how would that have impacted our data?
Our case here, we're going to focus on a single method today.
This is a real method we've been running for about six months.
We have 22 batches of samples we've run on this method,
so it's about 1000 samples.
Our existing calibration, shown here on the right, is a linear calibration.
It has six levels, and we've estimated that if we can reduce this
to the minimum of two levels, we could save about $15,000 a year.
There's a real measurable motivation to understand if we can pursue this.
I'm showing here the general workflow that we came up with.
I'm going to go through this really quickly right now, but no worries,
we're going to walk through it step by step together.
We're really going to just pull the historical data.
We're going to recreate our historical calibration in JMP to validate what
we're doing in JMP matches what we've done historically,
and then we're going to say, okay, let's eliminate some of these levels,
recreate the calibration with those reduced levels,
and then evaluate what impact that has on our targets.
Now, I think in this case it's also really important to emphasize that, as you see,
we have two pass-fail forks in the road.
Often when we're doing analysis
on data in hand, we're looking for statistical significance;
with studies like this, it's really important to determine what your practical
requirements are.
In this case, what does that mean?
We're talking about impact on the measurement of unknown samples.
Ultimately, we want to make sure that reducing the calibration is not going
to bias the measurement in one way or the other.
We want the measurement to be the same.
As many people will tell you, "the same" is not really a quantifiable thing;
it depends on your sample size and the noise in your process.
We need to define what "no different," "the same," or "no impact" means.
Here we're going to set our acceptance criteria ahead of time:
for this first step, accuracy within half a percent,
and for the second step, accuracy within 1%.
We'll see these come back as we walk through this.
On every page here, I'm going to show in the top right
what step in the process we're on,
as well as highlight which JMP platforms we're using.
For our first step, we're going to be pulling our historical data
from a database; in our case, we have a
LIMS that already has the data in a structured format.
You could also import this from CSV, however you can access the data.
We're pulling it, in our case, using raw SQL and JSL,
and it pulls in a structured format showing a subset of the columns we have,
but what you'll notice is in this case we have our six calibrators
as well as a number of unknown samples.
We're pulling in the historical data as the core data set we're working with.
The next step is recreating and validating the same calibration curve,
that same six-point calibration, in JMP.
Now, you might ask why we have to do this.
There are two main reasons.
One is calibration curves can have a lot of caveats.
They can have weighting, they can have anchor points,
they could be forced through zero, they could be nonlinear.
This is a good way to validate that you're using the right
parameters in JMP to recreate this.
The other reason is that we don't expect these values to be exactly the same,
because a lot of this analytical software uses some proprietary
regression that is not exactly like, say, ordinary least squares regression.
To do this, we're going to use the Fit Curve platform under Specialized Modeling,
really just recreating our calibration curve.
Just like I showed earlier,
where we have our known quantity of our six standards on the X
and our raw intensity or signal response on the Y.
In our case, we have 22 batches; I'm not showing all of them here,
but we're reproducing this for 22 different sequences in essentially one
click, with what I call the power of the Control key, if you don't know this trick.
It will save you a ton of time: if you hold down the Control key and
click on the red triangle menu, whatever you do is going to apply to every
analysis in that window.
I recently learned that's apparently called broadcasting,
so you could use that term as well.
We're recreating a calibration curve for each of our batches,
and then in the same Specialized Modeling platform, we're
saving the inverse prediction formula,
because we're predicting from Y to X,
if you remember back to our calibration intro,
to be able to save the predicted values back to our data table.
This then looks like this, where in our data table
we have first our historical quantity, which we pulled from the database,
and now we have our raw quantity that we generated from
these newly created calibration curves in JMP.
We have a multiplier we have to apply, due to the sample prep we do,
which we pull from the database, so that's already there,
and it's going to stay constant.
We simply need to apply a calculated column here
to have a value comparable to our historical data.
If you look at this first row, our value is very close to
but not exactly the same as our historical data.
Next up, we're going to visually do a comparison, plotting our historical
against the JMP recreation of that calibration,
and this is a good check again, to look through your data.
What you would expect or hope for is
a line that essentially looks like Y equals X.
Now we don't want to stop at a visual analysis.
We of course, want to bring some statistics into it.
This is where we introduce the Passing-Bablok regression.
It's actually something that was just added into the base JMP functionality,
I think with JMP 17; it used to be an add-in for a long time.
I'm glad it's there now.
This is a specialized regression that's non-parametric and robust to outliers,
designed specifically for comparing analytical methods.
For many of you, it's probably irrelevant and you're never going to have to use it,
but we need to use it in the world we're working in.
What this regression does, it gives you two hypothesis tests
to test for constant bias as well as proportional bias.
Starting with constant bias, where we're seeing if there's bias
that is the same across the range; imagine the line moving up and down.
We're evaluating whether the confidence interval of our intercept
does or does not include zero.
For proportional bias, where the bias would change based on the response,
we're evaluating whether the confidence interval of our slope
does or does not include one.
Now in our case, we reject the null hypotheses in both of these cases,
which tells us that we do have statistically significant bias,
both constant and proportional in our data set.
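Passing-Bablok itself has a specific rank-based construction; as a hedged illustration of the idea only (not JMP's implementation, and substituting a Theil-Sen-style median of pairwise slopes with bootstrap confidence intervals for the exact Passing-Bablok procedure), the two bias checks might be sketched like this, with all numbers simulated:

```python
import numpy as np

rng = np.random.default_rng(0)

def median_slope_fit(x, y):
    """Median of all pairwise slopes, with the intercept taken as the
    median residual -- a Theil-Sen-style stand-in for Passing-Bablok."""
    n = len(x)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(n) for j in range(i + 1, n) if x[i] != x[j]]
    b = float(np.median(slopes))
    a = float(np.median(y - b * x))
    return b, a

# Simulated method comparison with a small proportional bias (slope 1.02)
# and a small constant bias (intercept 0.5); purely hypothetical data.
x = rng.uniform(10, 100, size=100)
y = 1.02 * x + 0.5 + rng.normal(0, 0.5, size=100)

# Bootstrap confidence intervals for the slope and intercept
boot_b, boot_a = [], []
for _ in range(200):
    idx = rng.integers(0, len(x), len(x))
    b, a = median_slope_fit(x[idx], y[idx])
    boot_b.append(b)
    boot_a.append(a)
slope_ci = np.percentile(boot_b, [2.5, 97.5])
int_ci = np.percentile(boot_a, [2.5, 97.5])

# Proportional bias if the slope CI excludes 1;
# constant bias if the intercept CI excludes 0.
print("proportional bias:", not (slope_ci[0] <= 1.0 <= slope_ci[1]))
print("constant bias:", not (int_ci[0] <= 0.0 <= int_ci[1]))
```

With 1,000 real samples instead of 100 simulated ones, those intervals get very tight, which is exactly why even tiny biases come out statistically significant.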
From here you might say, okay, we're done, there's bias, we can't move on.
But thinking back, this is why it's really important to define
what the practical significance is, because,
as any statistician will tell you, when
you have 1,000 samples you're going to be looking at
very tight confidence intervals.
You're going to be able to detect very small differences.
We have a statistically significant difference, but does it matter?
That brings us to our last step: we're going to calculate,
again using a column formula,
the relative difference between the two methods.
I'm showing a distribution of that below,
and that distribution then gives us access to this equivalence test.
This allows you to test a distribution of values against a constant
that you define, within some confidence.
Here in this window, we'll enter our target mean of zero,
because we hypothesize that they're going to be the same, so no difference.
Now we enter our acceptance criterion, which was 0.5%.
This gives us this very nice output with our final two hypothesis
tests, where if we reject these, we can determine essentially that the mean
of this data set is equivalent to zero within plus or minus 0.5%.
Now, you might say, hey Stefan, this is doing a t-test,
your distribution is not exactly normal, and I think you'd be right;
if I went back, I might actually use the Test Mean platform,
because that gives you access to non-parametric equivalence tests.
Regardless, this is a really useful and direct way to test
for practical significance.
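This equivalence test is essentially a two one-sided tests (TOST) procedure against the ±0.5% bounds. A hedged sketch of the idea, using a normal approximation to the t distribution (reasonable at roughly 1,000 samples) and made-up relative differences:

```python
import math
import random
from statistics import NormalDist, mean, stdev

random.seed(1)
# Hypothetical relative differences (%) between the JMP-recreated and
# historical quantities; centered near zero with a small spread.
diffs = [random.gauss(0.05, 0.2) for _ in range(1000)]

def tost_equivalent(data, low, high, alpha=0.05):
    """Two one-sided tests: reject both 'mean <= low' and 'mean >= high'.
    Normal approximation to the t distribution (fine for large n)."""
    n = len(data)
    se = stdev(data) / math.sqrt(n)
    m = mean(data)
    p_low = 1 - NormalDist().cdf((m - low) / se)   # H0: mean <= low
    p_high = NormalDist().cdf((m - high) / se)     # H0: mean >= high
    return max(p_low, p_high) < alpha

# Equivalent to zero within plus or minus 0.5%?
print(tost_equivalent(diffs, -0.5, 0.5))
```

Rejecting both one-sided nulls is what licenses the "equivalent within ±0.5%" conclusion; tighten the bounds enough and the same data fails.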
We've pulled our historical data from the database,
we've recreated and evaluated the calibration curve
and we've established that it passes our acceptance criteria.
If it had failed, it could be an issue with the data set.
You might not be using the right calibration parameters.
There are a number of reasons; we generally would
pretty much always expect this to pass.
It usually just requires some investigation into what's going on
in the way you recreated the calibration.
Our next step is down- sampling or reducing the number of levels
of our calibration.
Now, if we tried to do this without JMP, we would have to go into every single
sequence in our analytical software, manually
remove calibrators, and recalculate things.
That would be a really long and tedious process.
In JMP, this is as easy as just using the Data Filter.
In our case, with this six-point calibration, we have a linear fit.
We know that the minimum number of points we need for a linear fit is two.
We're picking the highest and the lowest calibrators
and just filtering down to those.
From here I'm going to go pretty quickly, but really all we're going to do
is recreate this calibration with two points in JMP.
Again, we're using the Specialized Modeling platform, doing a linear fit.
The only difference now is we have two points instead of six
and we're applying that inverse prediction formula back to the data table,
which again is going to give us our inverse prediction
and then we apply the multiplier,
and because I know I'm going to test the practical significance,
I'm just going to preemptively calculate the relative difference between
the two-point calibration and the historical quantity.
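Conceptually, the down-sampling plus refit step reduces to: keep the lowest and highest calibrators, refit the line through just those two points, and compare the inverse predictions. A toy Python sketch with made-up numbers (not our real method data):

```python
import numpy as np

# Hypothetical six-level calibration (known conc vs. measured response)
conc = np.array([1.0, 2.0, 4.0, 6.0, 8.0, 10.0])
resp = np.array([98.0, 201.0, 395.0, 607.0, 802.0, 995.0])

def fit(c, r):
    """Fit a line and return an inverse-prediction function (Y -> X)."""
    slope, intercept = np.polyfit(c, r, 1)
    return lambda y: (y - intercept) / slope

full = fit(conc, resp)                    # six-point calibration
two = fit(conc[[0, -1]], resp[[0, -1]])   # keep only lowest and highest

# Relative difference (%) on a few unknown-sample responses
unknowns = np.array([150.0, 450.0, 750.0])
rel_diff = 100 * (two(unknowns) - full(unknowns)) / full(unknowns)
print(np.round(rel_diff, 2))
```

The distribution of `rel_diff` across all unknown samples is exactly what gets fed into the equivalence test against the 1% criterion.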
Again, we go through the Passing-Bablok, and not so surprisingly,
again considering the size of our data set,
we're going to reject the null hypotheses here and establish that we have
statistically significant bias, both proportional and constant.
We move on to test our acceptance criteria.
Remember, now our threshold is 1% instead of 0.5%,
and that comes from working with the stakeholders of the data
to establish what an acceptable equivalency is.
That's always important pre- work to do
and we're going to test that equivalency.
Here we find that these two methods are equivalent within plus or minus 1%
on the unknown samples, and that's really important.
We're using those historical real- world samples
to really ask the question what if we went back in time
and reran all these calibrations with two points
and reported the data of these unknown samples.
How would those values change?
On average, we see that they change very little,
and so it gives a lot of credence to considering
reducing those calibration levels.
We've essentially demonstrated this now,
and so we're saying the calibration on the left and the calibration on the right
are going to provide equivalent quantitation
within 1%, and so we have essentially the evidence we need to push for this change.
We passed our first check, we reran the evaluation with the two-point
calibration, we passed that, and now we're at our final step
of implementing those changes in our process.
Now, this is arguably the most important part:
if you do an analysis and just leave it sitting there, it doesn't do much good.
This can sometimes be the hardest part.
You have to go out, you have to convince people, especially in cases like this,
and you have to consider whether there are additional
things that this analysis didn't account for.
I'm happy to talk to anyone about that,
but we're not going to go in depth on the other considerations we have
to think about before putting this into action.
With this example, we did
actually end up reducing calibration levels from six to two,
and that reduced the annual cost of running that method by about $15,000.
From there we might say, okay, what now?
Are we done?
Of course not.
Right now, we've done this for one method;
we have a suite of another 100-plus methods that may also have these
many-level calibrations that might be overkill for what we need.
We want to look at repeating the analysis for other methods.
That's where I think automation comes in.
It is a really great way to scale these one-off analyses, for ourselves
but also for others.
My rule of thumb is, if I find myself doing an analysis
more than two or three times,
let's build out the automation:
spend a little time now to save future me a lot of time.
I'll just touch on this very briefly, and I want to shout out Scott here
for helping me with a lot of the Workflow Builder work and the scripting.
These native automation tools in JMP are really powerful
and very user-friendly; there are a lot of code-free options,
and there are really different ways you can do this.
You can do it, on the left side there, in the classic way, doing all the scripting,
which even allows you to save global variables,
so it gives you a place for users to enter their
acceptance criteria, which might change.
Or you can leverage the Workflow Builder, which is a bit of a newer feature
but really lets you build out this automation.
Even if you just want to script it raw,
it can build the framework that you can then flesh out.
The two things I will say about this are that
how much you can automate, or maybe how much effort you have to put into it,
is going to be limited to some extent by how rigid that workflow is,
If users need it to be really flexible, need to interact with it,
it could become very challenging to automate,
and of course the data consistency is key as well.
This is really a great tool to help others reproduce the analysis,
but you really do have to also train them and document the work,
make sure they know what it's actually doing.
As we all know, every analysis has its caveats.
You need people not just to click and have a report,
but also to understand a little bit what some
potential issues are that could come up, especially if you're trying
to future-proof the work.
I'd like to bring it back together and wrap it up there.
I hope today I've shown you that
you don't have to do crazy complex or sophisticated things in JMP;
you can piece together a lot of simple functionality
to create really impactful workflows.
Whether you're working in a lab at your organization, wherever it is,
look to identify these improvements in existing workflows.
I imagine you all are in the situation
most of us are in: there's more data than we know what to do with.
Look at the data that no one is looking at
and then challenge the conventional thinking.
The way we're working is always changing; ask, why do we do it this way?
In our case, for a long time, this was just the way we did it:
a five- or six-point calibration.
Ask why. What if we didn't? What would the impact be?
Of course, I don't have to tell anyone listening here:
use JMP for the scalable analysis, and then use automation to make it easy,
and it really doesn't have to be fancy.
It just has to work for what you need it to do.
Finally, you can use that to implement impactful change
and use data to drive those decisions.
It's probably one of the most convincing tools that we have today.
If you're talking to management,
do it in units of dollars because they love that.
I'll wrap it up there
The last thing I'd like to say is just a thank you to the
JMP Discovery Summit committee
and all the people organizing, a special thank you to Scott for all the help
he's given me in the past with Amyris, but also with this talk and this analysis,
and to a number of people at Amyris who were involved in this.
And with that, I will wrap it up.
Thank you. Bye.