How to make more from your online and offline fermentation data

and how to speed up your bio process development with statistical modeling.

I am Benjamin Fürst from Clariant

and today I will show you how to do this in JMP.

I will lead you through a hands-on presentation,

how to combine different modeling techniques

so that in the end you have a combined profiler

where you can look at all the responses online, offline, at one time

and see what impact your process parameters have.

I am a biochemical engineer by training.

I work for Clariant at the Group Biotechnology

and I'm group leader of Bioprocess design.

The agenda for today will be

an introduction to Clariant and to the sunliquid process technology.

And then I will dive into the topic of my talk

and go to the statistical analysis of fermentation data.

For modeling the offline responses, which I call standard statistical modeling,

I will use the Fit Model platform

and will especially focus here on the extrapolation control

which came in with JMP Pro 16.

Next level analysis will be modeling the online data

and I will use the same data set

and for this I will use the Functional Data platform,

also a JMP Pro feature.

Let me share some key figures about Clariant.

Clariant is a global leader in specialty chemicals.

To the right you can see the Clariant business numbers.

We had sales of 3.9 billion Swiss francs in 2020

with a 15% EBITDA margin

and over 13,000 employees worldwide at 85 production sites.

Clariant consists of three business units:

care chemicals, catalysis, and natural resources.

Care chemicals.

They produce, for example, ingredients for shower gels and shampoos,

which you use in daily life.

Natural resources.

They produce, for instance, bentonites.

40 percent of all vegetable oils are clarified with bentonites from Clariant.

Catalysis.

Of course, these are all kinds of catalysts

and they also contain the business line biofuels and derivatives.

This business line sells sunliquid.

What is sunliquid?

Sunliquid is a technology which came from the Group Bio R&D Center, where I work.

Basically, it's a biotechnological process for producing bioethanol

from non-edible biomass.

How does it work?

Just to stress this again,

this is second-generation bioethanol production.

So we use non-edible biomass feedstock.

So basically agricultural residues.

This can be wheat straw, bagasse, corn stover,

municipal waste or forestry residues.

We put that into the sunliquid process

and we first turn them into cellulosic sugars

and then to cellulosic ethanol.

The nice thing about this process is

that it can be used as a platform process not only for bioethanol,

but for other sustainable fuels or even bio-based chemicals.

The development of this process started in the Group Biotechnology where I work.

You can see if I'm not in home office due to corona,

I'm working here at that corner of the R&D Center in Planegg.

That's located just a few kilometers outside of Munich in Bavaria in Germany.

The center was inaugurated in 2006

and we have over 100 scientists and technicians working there.

The competence fields of Group Bio are biofuels and derivatives,

sunliquid, industrial enzymes, and biobased chemicals.

Within the Biotech R&D Center,

we have all the expertise to develop new products and technologies under one roof,

starting from tiny microtiter plates, through shake flasks,

up to 100-liter bioreactors.

Here in the picture, you have a small peek at our technical center fermentation.

To the right here, you can see

the sunliquid pre-commercial plant in Straubing,

which is just one and a half hours from here.

It was built in 2012 and can produce about 1000 tons of ethanol per year,

and we were able to test a wide range of different biomass feedstocks there.

Maybe some of you are aware that Clariant is commissioning

a commercial-size bioethanol plant in Podari, Romania at this time.

I have to say, as a biochemical engineer,

seeing this plant is a truly impressive thing.

The size of the buildings and the equipment there

are, by biotechnology standards, really tremendous.

And I see this as a real flagship for biomass conversion here in Europe.

And in my opinion,

that is one of the major biotechnology projects we have in Europe at this time

and it really makes me proud to see

that we are turning our sunliquid technology

from bench scale to production scale.

The plant transforms a quarter of a million tons of wheat straw

into 50,000 tons of bioethanol per year.

And to give a more tangible number, that means one huge bale of straw,

about 500 kilos, goes into that plant per minute.
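
The per-minute figure can be checked with a short calculation. This is only a sketch: it assumes the plant runs continuously all year, which the talk does not state.

```python
# Plant figures quoted in the talk (per year).
straw_tons_per_year = 250_000
ethanol_tons_per_year = 50_000

# Assumption (not from the talk): continuous operation, 365 days a year.
minutes_per_year = 365 * 24 * 60

straw_kg_per_minute = straw_tons_per_year * 1000 / minutes_per_year
ethanol_per_ton_straw = ethanol_tons_per_year / straw_tons_per_year

print(f"straw feed: {straw_kg_per_minute:.0f} kg per minute")
print(f"ethanol per ton of straw: {ethanol_per_ton_straw:.2f} t")
```

Under that assumption the feed rate comes out a little below 500 kg per minute, which matches the "one bale per minute" picture.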

The sunliquid technology enables you to produce bioethanol

in a very integrated way.

Side products like lignin are reused in the CHP plant for energy supply.

So generating steam and power which goes back to the plant, of course.

If you're curious to learn more about that technology,

please look at the links

or feel free to contact me or reach out in the contacts

which are given on the home page.

Now, why am I showing you all of this?

What do all the stages have to do with each other?

At all stages of the development of our biotech process,

you have data about the process.

And it doesn't matter if it's a microtiter plate,

a one-liter bioreactor, or a multi-cubic-meter production-scale fermenter,

you always have offline data like titers you measure,

and you have online data coming from sensors, pressure, temperatures.

And you always want to know:

how can you achieve the most efficient process?

How can you achieve high yields?

What are the influential process parameters?

And what sensor do you have to pay attention to

to get the most out of your process?

All these points can be addressed with statistical modeling of your process,

and JMP really gives you the opportunity to make more

from your online and offline fermentation data

and speed up your biotech process.

Let's go.

Since my talk is going to revolve around data coming from a fermentation process,

I want to put everybody on the same page concerning

what a fermentation process looks like.

This is a basic overview of the fermentation process with DSP.

I will focus on the main points here.

So you come in with a test tube that's a few milliliters.

You propagate your organisms in a so-called seed fermenter.

And then you do, in the next scale, the fermentation,

the main fermenter where the product is going to be produced

and then you go to a downstream.

The setup here really depends on what the specs of your product are.

All along the process, you have different steps

and you have factors, which are set for this step.

On the top here, I show just some examples.

So seed inoculum amount, how much biomass you put in here,

pH, temperature setting of the fermenter, or even other unit operations,

and the raw materials you put into the process:

which type, at what concentration.

And all of them here are constant, they have a fixed number.

So I depicted them here with dots.

And of course, you have responses.

The responses are the outputs of your process.

This can be a harvest yield.

So how much product you are going to get in your fermenter

or how good does your DSP perform

in terms of some specific response you want to look at.

Those are constant, again, because you just have a single point

and you have online sensor data.

For fermentation, you usually look at something like

dissolved oxygen, called pO2, temperature, and pressures.

And the important point about that sensor data, it is functional.

So you have your sensor data over duration.

You have a curve of data.

And I can really tell you

if you're interested in knowing, how is everything connected there?

How can you make the best of all the data you have?

It normally starts with wanting to know, from your fermentation parameters,

what is your response you're going to have,

your offline response

something like a yield or a product characteristic.

And maybe you want to know from a fermentation parameter

what a typical sensor curve looks like

so that you know how to go with your process in a good way.

And last but not least, wouldn't it be great, here for the last point,

knowing at which time point

an online sensor has a critical impact on your harvest yield?

Using standard software you will really not get far.

You're probably going to lose your focus looking at all the points

and JMP can really help you in getting those things done.

In the following I want to show you, with a live demonstration then,

how you can do all those things in JMP.

There are several videos in the community

on how to use the Fit Model platform or the FDE platform,

but I'd like to combine these platforms to put all the results together.

So standard statistical modeling

with the Fit Model platform and with a focus on the extrapolation control.

I want to use the fermentation parameters and model the harvest yield.

That is where we're going to go for the first point.

Normally the goal of doing this is,

you want to know which process parameters give a high harvest yield.

You want to find an optimum in the design space.

Is that optimum stable?

And what parameters have interactions?

And what parameters are sensitive?

And the way you do this, as I already told you,

you're going to use the Fit Model platform.

One important point here, I call it here

detailed evaluation by and with subject matter experts.

I believe that the real speed for process development

comes from a mutual understanding among the experts you have.

It doesn't matter if it's a fermentation engineer,

data analyst, or a manager.

Every expert must understand a little bit of the language of the other experts.

So it's really crucial that the subject matter experts

share a common language.

And I believe using JMP enables you to speak that common language

using the profilers JMP provides and the reports JMP has.

Let's get back to the topic.

This was what I'd call a real-life experiment:

the data came from a planned experiment

that was not intended for statistical modeling.

That means the design space was not optimal.

I put a plot of three parameters here.

As you can see, of course, there was some kind of structured approach,

but not in all dimensions, not in all parameters.

You can see on this axis, we don't have so many points.

So one of the challenges here was

the limited design space. Here especially, the extrapolation control

in JMP Pro comes in very handy.

Let's get into JMP.

I'm going to load up JMP now.

So that is my data table, and of course I already scripted the report.

This is the profiler for the model I chose.

At this point, I will not go into the details of the modeling.

I assume you already looked at your plots.

You've chosen your model and its parameters correctly.

So how can you use this for your development?

I think that's pretty obvious.

You have here the harvest yield,

which you're interested in for your fermenter,

in dependency on all of the parameters which I found significant for this model.

And one way you can use this,

I want to show you a very nice thing you can see with this model.

If you look at the Seed Inoculum,

that is basically the amount of biomass you put in the process.

If you have a low biomass you put in the seed,

it means you're not going to have so much biomass during your whole process.

And you will see that this parameter alone

doesn't have a huge impact on the harvest yield.

But now if you see that moving, if you move that around,

there is a strong interaction with component A in the main fermenter.

So

with less biomass in here,

the more component A you add to the main fermenter,

the more product you will generate.

You can basically say that's a limitation on component A, which makes total sense.

Now the interesting thing is the interaction.

If you put more biomass into the process, that turns around.
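
A minimal sketch of what such an interaction looks like in a model. The coefficients below are made up for illustration and are not taken from the model in the talk; both factors are assumed to be scaled from 0 to 1.

```python
def harvest_yield(seed, comp_a, b0=0.5, b_seed=0.02, b_a=0.30, b_int=-0.60):
    """Two-factor linear model with an interaction term (hypothetical coefficients)."""
    return b0 + b_seed * seed + b_a * comp_a + b_int * seed * comp_a


def slope_of_component_a(seed, b_a=0.30, b_int=-0.60):
    """Effect of component A at a fixed seed inoculum: d(yield)/d(comp_a)."""
    return b_a + b_int * seed


# The sign of the component A effect depends on the seed inoculum level.
for seed in (0.0, 0.5, 1.0):
    print(f"seed={seed:.1f}: slope of component A = {slope_of_component_a(seed):+.2f}")
```

With these illustrative numbers, component A helps at low seed inoculum and hurts at high seed inoculum, which is the "turning around" seen when dragging the profiler.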

So what happens here?

I have to say you cannot see here what happens there

because you have a different limit.

Now it gets to the point where it's interesting.

You have to put more data into your model, for instance, online data.

You're going to see that in a different sensor.

That is something I will show you then if we go into the online data analysis.

You can exactly see this behavior there.

And just seeing this was quite obvious for me; an explanation of that was clear.

But just the other day I talked to a monitoring and fermentation engineer

and he just said,

"Having that behavior for that other parameter and temperature,

that's totally clear and we have to look at that."

That's what I meant by that common language:

I am able, as a data analyst, to understand some things,

but not to the same extent as the fermentation engineer does.

So it's very crucial that everybody has access to this kind of view of the data.

I wanted to talk about Extrapolation Control.

Extrapolation Control comes with JMP Pro 16 and it can be found here.

There are different criteria you can set.

I set it here to the first one, which basically tells

JMP to stay within the context of the data you have.

If you want to learn more about that,

there's a talk about control extrapolation by Laura Lancaster and Jeremy Ash.

The video is in the community.

They talk a lot about how you can use that

and about the statistical details behind it.

As you already probably noticed, moving around the parameters,

the traces here change, which they didn't do before,

and that really gives you an advantage in using the model

because you will not go outside the model, outside of your design space.

You're not going to make a wrong conclusion from your data.

And you can see that JMP even splits the trace here

if it thinks there is not sufficient data there

to predict without extrapolating.

So for a limited design space,

Extrapolation Control is a very neat way to go.
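
To give a feeling for how a profiler can decide that a point is "outside the data", here is a minimal sketch using leverage, a standard extrapolation measure for least squares models. It is an illustration of the idea only: the design points are made up, and whether JMP uses exactly this metric and threshold for a given model is an assumption here.

```python
def solve(a, b):
    """Solve a @ x = b for a small square system by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def leverage(point, design):
    """h(x) = x' (X'X)^-1 x for a model with intercept and main effects."""
    rows = [[1.0] + list(run) for run in design]
    x = [1.0] + list(point)
    p = len(x)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    z = solve(xtx, x)  # z = (X'X)^-1 x
    return sum(xi * zi for xi, zi in zip(x, z))


# Hypothetical design: two scaled factors, four corner runs plus a center run.
design = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]

# Assumed criterion: flag anything with more leverage than any observed run.
threshold = max(leverage(run, design) for run in design)


def is_extrapolation(point):
    return leverage(point, design) > threshold + 1e-12


print(is_extrapolation((0.5, 0.5)), is_extrapolation((1.5, 1.5)))
```

A point in the middle of the runs has low leverage and passes; a point far outside the corner runs has high leverage and would be blocked.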

We're going to use that later.

So we're going to need to save our

prediction formula.

I'm going to save it here: Save Columns, then Prediction Formula,

and then it's going to turn up in your data table down here

because we're going to need that.

Going to get back to my presentation.

So just to make a short summary.

The model could show expected behavior

and, most importantly, unexpected behavior, which we can use for the next runs.

The extrapolation control especially on limited design space comes in very handy,

and we were able to find optimized parameters of the model

in the design space.

So we identified potential parameters for a higher yield for the next runs.

Just a nice side fact.

How to interpret stability.

I normally prefer to use the simulator here.

You can just add that to the profiler and also I like to use the contour plot

especially when you have multiple responses.

So not only the titer

but imagine you have some DSP performance you want to put into your model as well

and then you put them all in a contour plot

and limit the areas where you want to have the process

and then you have the target area where you need to run your process

to be within all the specs.
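
The "target area" idea behind the contour plot can be sketched as a grid search over the factor settings, keeping only the settings where every response meets its spec. Both response models and both spec limits below are hypothetical.

```python
def harvest_yield(seed, comp_a):
    """Hypothetical prediction formula for the harvest yield."""
    return 0.5 + 0.02 * seed + 0.30 * comp_a - 0.60 * seed * comp_a


def dsp_recovery(seed, comp_a):
    """Hypothetical prediction formula for a DSP performance response."""
    return 0.9 - 0.25 * comp_a + 0.05 * seed


# Assumed lower spec limits for the two responses.
MIN_YIELD = 0.60
MIN_RECOVERY = 0.70

grid = [i / 10 for i in range(11)]  # scaled factor settings 0.0 .. 1.0
feasible = [
    (seed, comp_a)
    for seed in grid
    for comp_a in grid
    if harvest_yield(seed, comp_a) >= MIN_YIELD
    and dsp_recovery(seed, comp_a) >= MIN_RECOVERY
]

print(f"{len(feasible)} of {len(grid) ** 2} grid points meet all specs")
```

The list `feasible` is the numerical counterpart of the unshaded region in a contour plot with several responses.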

Especially here because of this limited design space,

I have to emphasize that you have to verify the results

with additional experimental runs because you are on the limit.

The optima we chose with the profiler are on the limit of the model.

So you have to be sure that those are correct.

But here in this case,

we just use them to guide us in the correct directions for parameter setting.

At this point, we have the profiler

with the response of the harvest yield over the fermentation parameters.

I already hinted at it.

Wouldn't it be great to have now all your online sensors on there too

so you can see other effects as well?

Let's go there.

So the next step is online data analysis with the Functional Data Explorer platform.

And there you are able to use the same set of fermentation parameters

and model the functional data, the online response data

and combine them with your model.

We just use that.

To put that a little bit in graphics,

we have the fermentation parameters, we have a whole bunch of sensor data.

I depicted all the batches here.

And from that data, we want to go to a profiler

where the lower row is the model I just showed you

and you also have other responses there.

We can see what is the typical response of that set of parameters you've chosen.

I drafted a short workflow here of what we're going to do.

I'm going to show you that in JMP, of course.

And we are basically going to use what we did already:

the Fit Model platform, save the prediction formula.

And now we're going to use the FDE platform, fit splines,

do the functional DOE and save that again,

and then put everything together in one place.

Let's get back to JMP.

Here's a set of data.

I normalized all the data from zero to one.
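
Scaling to the range zero to one is usually done with min-max normalization. A minimal sketch follows; the readings are made up, and whether the scaling in the talk was applied per sensor across all batches or per batch is not stated.

```python
def min_max_normalize(values):
    """Scale a list of numbers linearly so that min -> 0.0 and max -> 1.0."""
    low, high = min(values), max(values)
    if high == low:
        # A constant signal carries no range information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


raw_po2 = [82.0, 64.5, 41.0, 23.5, 30.0, 55.0]  # made-up readings in % saturation
scaled_po2 = min_max_normalize(raw_po2)
print([round(v, 3) for v in scaled_po2])
```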

I want to show you the example of the pO2,

that is dissolved oxygen.

This is an important parameter in fermentation

because the microorganism, in this case, needs oxygen.

And so having sufficient oxygen all the time is important.

Graph Builder: especially if you want to get people to use JMP,

I always like to stress using Graph Builder,

because that's just the easiest way to put data together and make nice graphs.

With other software, this would probably have taken me a lot longer

than 30 seconds.

And this is a very nice way to have all the overviews.

And where do you want to go?

We want to have another parameter here, the pO2.

Also, all the information we have in all those batches

we want to collect them here

and see what the typical curve is for the set of fermentation parameters we chose.

The functional data platform is here under specialized modeling.

You need your outputs, which are functional.

I'm just going to take two here, pO2 and CO2.

On the X, you're going to put the duration,

and you have to tell JMP how to discriminate between the batches.

So you're going to put your batch ID here.

Now, it's really important

to supply as supplementary variables everything you will need later,

so the fermentation parameters,

or even other things like the harvest yield.

Whatever you want to do analysis on,

you have to put in as supplementary here at this point.

If you forget something here, you have to redo a lot of stuff again.

And then you have the first report on the Functional Data Explorer.

JMP gives you here

everything together in

one graph. That's not so nice,

but here you can see all the data by Batch ID.

Just want to point out, there are nice clean- up tools here.

Have a look at them, use them.

You don't have to clean up your data before,

you can even filter your data here very easily.

Now we have the data here, and now we have to build a functional model of the data.

And JMP does that by fitting splines.

There are different kinds of splines you can use.

I'm first going to go to the easiest one, B-Spline.

That is pretty fast, and you can see what it does there.

First, you're going to see here, in red, the splines,

which were fitted over all the data.

And the second, it does a functional PCA down here.

So PCA basically is about reducing the dimensionality.

And JMP produces a set of eigenfunctions

and a set of FPCs, functional principal components.

And if you multiply each eigenfunction with the corresponding FPC and add them up,

you will end up with a functional model.

So you can use that to individually model each batch of fermenter you have.

So the FPCs are the individual characteristic of a batch

and the eigenfunctions are valid for all of them.

In multiplying them, you get each individual curve.

That's important to understand that those FPCs are the characteristic

for the functional data you have.
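
A minimal sketch of how a batch curve is rebuilt from its FPC scores. In the usual functional PCA formulation, the weighted eigenfunctions are added to a mean curve shared by all batches. The eigenfunction shapes, the mean curve, and the scores below are made up for illustration; in practice they come out of the FDE fit.

```python
import math

times = [i / 10 for i in range(11)]  # normalized process time 0.0 .. 1.0

# Shared by all batches (hypothetical shapes).
mean_curve = [0.6 for _ in times]
eigenfunctions = [
    [math.sqrt(2) * math.cos(math.pi * t) for t in times],
    [math.sqrt(2) * math.cos(2 * math.pi * t) for t in times],
]

# One score per eigenfunction and batch (hypothetical values).
fpc_scores = {
    "batch_01": [0.20, -0.05],
    "batch_02": [-0.10, 0.08],
}


def reconstruct(scores):
    """curve(t) = mean(t) + sum_k score_k * eigenfunction_k(t)"""
    return [
        mean_curve[i] + sum(s * ef[i] for s, ef in zip(scores, eigenfunctions))
        for i in range(len(times))
    ]


curves = {batch: reconstruct(scores) for batch, scores in fpc_scores.items()}
print([round(v, 3) for v in curves["batch_01"]])
```

The eigenfunctions and the mean curve are the same for every batch; only the handful of scores differs, which is why the scores can stand in for the whole curve in later modeling steps.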

Then you have to design now.

Now, looking into this.

Basically you see that this spline doesn't capture the data very well.

I know that the ups and downs here,

those are important and that doesn't suit me very well.

So I'm going to remove this fit and I will do...

There are other models.

A P-Spline is a penalized B-Spline model and that will do the thing here.

Just want to point out there are other ways as well.

The direct functional PCA does it without fitting a basis function.

If you have a huge data set, that works faster.

For me, I already know from my data set that the P-Spline is

what I'm going to need.

You can already see here this takes some time.

So think about maybe some data reduction

that you can get faster through that process if it's important.

But be aware that if you reduce your data,

you will be losing information about your process

and maybe those will be the important ones.

So maybe just take your time and just wait for some minutes,

grab a coffee.

So

you can see here already that those lines fit very well.

So this is something I am content to use.

Of course, you can look at the diagnostic plots

and see if everything suits you well.

I will focus on that where I wanted to go originally.

Keep in mind, where did you want to go?

Wanted to add a pO2 curve here depending on the fermentation parameters.

JMP is going to give the option to do a functional DOE analysis.

That is exactly where we want to go.

Basically in the background, it does a generalized

regression with a two-degree factorial model.

And the estimation method,

it usually depends on the number of parameters you have.

It's either best subset or I think it's a forward selection.

Let me make that smaller.

Don't need this.

But we do need this. That's where we wanted to go.

So exactly what we need.

We have the pO2, the online response, in dependency on our parameters.
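
Conceptually, a functional DOE models each FPC score as a function of the process factors and then rebuilds the curve from the predicted scores. A minimal sketch with one eigenfunction follows; the score model, the eigenfunction shape, and the factor names are hypothetical and only illustrate the mechanism.

```python
import math

times = [i / 10 for i in range(11)]  # normalized process time
mean_po2 = [0.6 for _ in times]      # hypothetical mean pO2 curve
# Hypothetical first eigenfunction: largest in the middle of the run.
eigenfunction_1 = [math.sqrt(2) * math.sin(math.pi * t) for t in times]


def predicted_fpc1(seed, comp_a):
    """Hypothetical fitted model for the first FPC score (factors scaled 0..1)."""
    return 0.10 - 0.25 * seed * comp_a


def predicted_po2_curve(seed, comp_a):
    """pO2(t) = mean(t) + predicted score * eigenfunction(t)."""
    score = predicted_fpc1(seed, comp_a)
    return [m + score * e for m, e in zip(mean_po2, eigenfunction_1)]


low_biomass = predicted_po2_curve(seed=0.0, comp_a=1.0)
high_biomass = predicted_po2_curve(seed=1.0, comp_a=1.0)
print(f"lowest pO2, low seed:  {min(low_biomass):.2f}")
print(f"lowest pO2, high seed: {min(high_biomass):.2f}")
```

Moving a factor changes the predicted score, and the score in turn bends the whole curve, which is what the functional DOE profiler shows when a slider is dragged.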

Just as a side point,

if you don't find this model fitting well,

you may have to do that in more detail using the Fit Model platform.

But if it's fine for you, you can go with that.

Again, you have to save your prediction formula

as you want to put that together.

Be aware that I'm now in a different data table than I used before.

So you can hit save prediction formula, click,

and then you're going to end up with your prediction formula here in that table.

And my original model was in another data table.

So you have to just combine them whatever you like, however you like.

I just basically copied the formula and that was it.

Then, let me show you if you want to put them together.

I put them together in my original table. I have them both here now:

the prediction formula of the harvest yield

and the prediction formula of the pO2.

And how to put that together?

Very easy.

Use the Profiler, put in the prediction formulas,

and, very important, tick Expand Intermediate Formulas

because you have those eigenfunctions in there,

so you need to expand the intermediate formulas.

Click okay.

I, again, prepared a script because the original evaluation,

that's not so colorful.

So I prefer

coloring at some point.

So I just put those two in here.

So this is where we wanted to go.

We have our harvest here and we have our pO2,

and we have all our parameters in here for the dependencies.

Now I promised you that we're going to see more.

That is the seed inoculum and the component A we looked at before,

if you remember,

and we saw that for low biomass we have a positive impact of component A

in the main fermenter.

And having more biomass, that behavior turns

and now we are able to see what's happening.

You see here that is the typical pO2 response to that set of parameters.

And already going down here, you can see what happens.

I'm going to go here a little bit more down.

The pO2 goes down.

So in the fermenter, we don't have a limitation

of component A now; we have a limitation of oxygen.

That is just the reason why that is not good for the process.

Now with modeling your online data and your offline data,

you can go one step deeper to understand what exactly causes all your behavior.

That is a really great example

how you can approach the understanding,

especially the speed-up of your process development.

Going back to my presentation.

Basically, that is just what I showed you,

just with more parameters.

I added more online responses and also some DSP responses in here.

You can extend that to whatever extent you like.

But be aware, there is no extrapolation control here.

There are different models behind each response.

So JMP cannot put that together into extrapolation control.

So here you may want to limit your factors via the factor settings of the profiler

so that you stay within your design space.

You can use JMP Standard to look at the profilers.
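
Limiting the factor settings amounts to clamping each factor to the range that was actually covered by the runs. A minimal sketch, with hypothetical factor names and ranges:

```python
# Hypothetical ranges covered by the experimental runs (scaled units).
design_ranges = {
    "seed_inoculum": (0.2, 0.8),
    "component_a": (0.1, 0.9),
}


def clamp_to_design_space(settings):
    """Pull each factor setting back into its observed range."""
    clamped = {}
    for name, value in settings.items():
        low, high = design_ranges[name]
        clamped[name] = min(max(value, low), high)
    return clamped


print(clamp_to_design_space({"seed_inoculum": 1.0, "component_a": 0.5}))
```

Note that a per-factor limit like this only guards the individual ranges. It does not detect factor combinations that were never run together, which is what a true extrapolation control adds.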

You just have to do the analysis with JMP Pro, the FDE analysis.

Okay, one more thing.

Wouldn't it be great now to know at which time points an online sensor

has a critical influence on your yield?

So that you basically have the sensor

as an input parameter and your yield on this side.

So you can exactly say, "Okay, at this point, it's very critical

to have my pO2 up or down to have a good harvest yield."

Let's go. Next level.

Now, I want to use the online data to model the harvest yield.

I wanted to, before we start, put that graphically again

and on this side, we have all the pO2.

The graph I already showed you before. That's the pO2 of all the batches.

And you can see that in the end here, we have this one going up;

the pO2 sensor here, the pO2 is very low;

and here it's somewhere in the lower region of the sensor.

And in each of these cases, the harvest yield has different behavior.

So somehow the individual curves at this time point

have influence on the harvest yield.

That is exactly what I want to model

so you can say which sensor profile really leads to a good yield.

This is where I want to go.

So I can have all the responses here

and my harvest yield, of course, depicted and modeled over, in this case, the FPCs.

Remember that I told that the FPCs are

the individual characteristic of the response.

That is exactly what you do.

So I'm going to show that to you in JMP as well.

We can start working through that workflow

and basically, it's the same but you just use the data in a different way.

You use your online data, summarized by the FPCs,

as inputs for a generalized regression model.

Then you put everything again together,

save the two prediction formulas in one data table,

and there you go.

Okay, where did we leave off?

So this is where we stopped the last time.

We had our pO2 over the fermentation parameters.

We already have

the model of pO2 over the FPCs.

Now what we need is the individual

function summaries.

So we need to model the FPCs we have for that response

with the harvest yield because that is what I want to know.

How do the FPCs contribute to the harvest yield?

Now you can set here what you want to export.

Basically, tick all of this.

You need to save formulas because you want to have the formula of the pO2

dependent on the FPCs,

and you want to have all the FPCs.

Now you have to extract that data because you want to model that,

and either save data here

or, if you have modeled more than one response,

you can do that up here in save data or save the summaries.

Then you get everything neatly arranged in one data table.

Click that and then you will end up with a table like this.

So you basically have all your FPCs in here

and the formulas you will need here are there.

Now we have to model the FPCs to get dependency to the harvest yield.

You just go to the fit model platform,

then you will choose what you want to model

and that is your...

It's supplemented.

That is why I hinted that you really have to be aware

of what you supplement in the very first step of the FDE.

Now we're going to need the harvest yield here.

It has to be supplemented.

And then,

depending on how you want to approach it, I normally take the FPCs.

I use the response surface model.

That might be sufficient for you.

If you think you need another model,

you're free to use whatever model you deem necessary.

In this case, I've put them individually.

Sometimes, even maybe just modeling the mean would be enough.

The mean here, maybe just modeling the mean is enough.

I know that in my case it isn't.

By the way, at this point,

I would like to say thank you to Imanuel Julio, JMP engineer,

who helped me out in going through that procedure as well.

Thanks, Imanuel, for giving me a heads up for FDE.

We need a generalized regression here.

Hit run.

And then, that dialog opens where you can choose different estimation methods.

Basically, you could just try them.

Then, the model is going to be done.

The nice thing is that up here you're going to have a model comparison.

So if you put in more than one, you will directly see the comparison of them.

So how they behave and which has the best information criterion.
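
A minimal sketch of such a comparison: the harvest yield is fitted once with an intercept-only model and once with a straight line in the first FPC score, and the two fits are compared by AICc (smaller is better). The data are made up, and the AICc here is the Gaussian least squares form up to an additive constant, so only differences between models are meaningful; the exact value reported by JMP may differ.

```python
import math

# Made-up data: first FPC score of the pO2 curve and harvest yield per batch.
fpc1 = [-0.30, -0.20, -0.10, 0.00, 0.10, 0.20, 0.30, 0.40]
harvest_yield = [0.42, 0.47, 0.55, 0.58, 0.66, 0.69, 0.77, 0.80]


def fit_mean(y):
    """Intercept-only model: predict the overall mean for every batch."""
    mean = sum(y) / len(y)
    return [mean] * len(y)


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [intercept + slope * a for a in x]


def aicc(y, fitted, n_coefficients):
    """Corrected AIC for a Gaussian model, up to an additive constant."""
    n = len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, fitted))
    k = n_coefficients + 1  # regression coefficients plus the error variance
    return n * math.log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)


aicc_mean = aicc(harvest_yield, fit_mean(harvest_yield), n_coefficients=1)
aicc_line = aicc(harvest_yield, fit_line(fpc1, harvest_yield), n_coefficients=2)
print(f"AICc mean-only: {aicc_mean:.1f}   AICc linear in FPC 1: {aicc_line:.1f}")
```

The penalty terms make the larger model pay for its extra coefficient, so it only wins when the fit improves enough, which is the point of comparing by an information criterion rather than by the fit alone.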

Then, you have to choose whatever you want to go with.

Be aware that something like best subset can take some computational time.

So maybe consider how far you want to go with that.

You then choose your model.

You then save the prediction formula of the estimation method you chose.

You hit that and then JMP will put that here.

Then you have everything together that you need, this time in one place.

You're going to take the profiler,

put in your prediction formula for the harvest yield,

and of course, of your online responses as well,

add them here

depending on how many you have modeled.

Tick Expand Intermediate Formulas, as well, and click okay.

I, again, scripted that,

having it a little bit nicer.

And then this comes up.

You have your prediction formulas for pO2, your online responses

and your harvest yield, in dependency on the FPCs down here.

I will just show you a short...

If you move FPC 1, you can see what happens.

So if you have a lower harvest yield,

then it shows you that here, you are basically on a rather low level.

We already learned that

low levels of oxygen are not so good for the fermentation process.

So that can be seen here as well.

And if you

have a high yield, so more harvest,

higher titer, more product in your fermenter,

you're going to see basically, this goes up.

That's good. Same behavior here.

And you can already see that this goes up here.

So it's positive for your process that the pO2 goes up in the end.

Now this is the point where, at least, I have to go to the subject matter expert

and give him that data because

he is the one who understands that. This level of analysis

may not be suitable for a manager.

But here you can really go into detail with all the subject matter experts

on the, let's say, fermentation level.

And having this option,

this really speeds up your process understanding very much.

And you can see the impact with the plots JMP gives you very easily.

And sharing that profiler with a process engineer

gives you a real head start

in what is important during the course of the process,

and gives you really the opportunity to save a lot of time

in finding that out.

And if you do that by chance, find it out by chance,

that's going to take so much longer, so don't do that.

I'm heading towards the end of my talk.

And I want to say that

for the statistical analysis of fermentation data,

JMP really gives you the power to

explore and visualize those complex processes very easily.

You can deepen your process understanding

and learn which process parameters are important, which interact,

and which you have to look at, at which time points of the process.

And with the profilers and the different setups,

JMP really gives you the possibility to speak that one mutual language

to all levels from technician to manager so that really everybody

can make more from the online and offline fermentation data

and really speed up your biotech process development.

Thanks for listening.

I'm Benjamin Fürst from Clariant

and feel free to comment and ask questions over the beta channels.