I'm Bill Worley,
and along with our special guest, Nick Strasser, from BASF Enzymes,
we're going to be talking to you about some important JMP tools
for fermentation processes,
more specifically, ethanol production in a batch system.
I'm going to turn it over to Nick for a quick introduction,
and then I'll give a little more introduction on myself as well.
Yes, hello. Thank you, Bill.
Everyone, my name is Nick Strasser.
Like Bill said, I am a technical support specialist for BASF Enzymes.
I've been supporting the fuel ethanol industry in North America
for about 10 years and have found JMP to be an extremely valuable tool
for analyzing production and fermentation processes
happening at ethanol producers' facilities.
I'm excited to be here today with Bill and presenting this material.
Hopefully, if you are an ethanol producer, this material will be directly applicable.
If you are not, some of the things we'll talk about
can definitely be applied across multiple industries.
I'm just excited to be here and look forward to it.
Thank you, Nick.
A little bit more background on me.
My name is Bill Worley.
I'm a principal systems engineer with JMP.
I've been with JMP almost 10 years.
Prior to that, I worked for Procter and Gamble,
and that's where I say I grew up on JMP.
I've got a background in chemistry, with a master's degree in chemistry,
but I've been around.
I've worked for a few other companies over the years.
We'll just leave it at that and we'll go from there.
All right, Nick.
All right, to get us started,
I'd like to talk about some general data management fundamentals,
things that I have found to be valuable in the ethanol production industry.
First and foremost,
for the things that you see us go through today,
the screenshots we took
and the demonstrations you'll see Bill go through,
we will be using the latest version of JMP, JMP 17.1.
You may be using a previous version of JMP,
and that's perfectly fine.
Some of what you see here might be standard functions in 17,
but you might have to poke around a little
to find them somewhere else in 16.
If they're available, they might just be in a different spot.
Then some of the version 17 functions are completely new
and might not be available in older versions.
I also want to quickly mention here the compatibility with Excel.
Microsoft Excel does have a JMP add-on toolbar.
If you know how to activate a toolbar in Excel,
you can go find it there and have that added to your Excel.
For ethanol producers,
Excel is oftentimes the intermediary
between their laboratory management systems
and their data analysis with JMP.
Just some regular best management practices here.
In the ethanol production industry, if you are a batch fermentation facility,
keep a master spreadsheet of all of your batches in one place.
It's important that it's all in one place,
so that you can leverage the different things that are going on,
either process-wise, or chemical-wise, or ingredient-wise.
Also understand which data are important to you.
Sometimes I talk to a customer and they're only backwards-looking.
They're collecting the data of things that happened
and not necessarily the inputs or the changes that got them there.
Make sure that you have a spot
to collect almost any information you can think about
that might become useful looking backwards later on.
I highly, highly, highly recommend, and most producers do this now,
add a trial column to your data.
As you collect your information,
as your batches mature and drop and your production goes along,
make sure you add a column
where you can denote whether you are trying something new,
whether it's an enzyme, or a chemical, or even a process.
Maybe it's even a new crew.
Maybe you're switching up your crew leads or something like that.
Make it a trial and put it in a column where you can go back and reference that.
In a similar vein, have an upset column.
Upsets are like trials that we didn't want to have happen.
But it's good to note that there was an upset condition,
if for no other reason than that, a week or two or a month or two later,
you'll know this was an out-of-the-ordinary thing
and that you can maybe disregard some of the numbers.
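As a rough illustration of why the trial and upset columns pay off later, here is a minimal Python sketch. The field names (`batch_number`, `ethanol`, `trial`, `upset`) and all of the values are made up for the example; they are not taken from the presenters' spreadsheet.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Batch:
    batch_number: int
    ethanol: float                 # hypothetical response, e.g. ethanol at drop
    trial: Optional[str] = None    # e.g. "new yeast"; None means baseline
    upset: Optional[str] = None    # e.g. "chiller failure"; None means no upset


def mean_by_trial(batches):
    """Average the response per trial label, skipping batches flagged as upsets."""
    groups = {}
    for b in batches:
        if b.upset is not None:
            continue  # an upset batch should not be counted for or against a trial
        groups.setdefault(b.trial or "baseline", []).append(b.ethanol)
    return {label: mean(values) for label, values in groups.items()}


# Made-up illustration values.
batches = [
    Batch(101, 14.8),
    Batch(102, 15.0),
    Batch(103, 11.2, upset="chiller failure"),
    Batch(104, 15.4, trial="new yeast"),
    Batch(105, 15.6, trial="new yeast"),
]
print(mean_by_trial(batches))
```

Because the upset is recorded next to the batch, it can be excluded weeks later without anyone having to remember what happened that day.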
All right.
Like upsets, data entry errors are going to happen,
so make sure that you spend some time cleaning your data.
This is actually a very important step and it can be rather time-consuming.
You have multiple options.
My biggest point of emphasis would be, clean the data at the source if you can.
Go all the way back to where maybe a misentry was entered
and correct it there.
That way, if you ever have to go back,
you're not constantly cleaning the same errors over and over again.
I would say, clean up the source if you can.
Clean in JMP if you have to.
You have options in JMP that might help you identify things
that need to be cleaned and options for cleaning them up.
You can use the option to recode.
You can explore outliers or missing values.
Oftentimes, I see data entry errors when I create a control chart.
Maybe a number that's normally, say, between 15 and 20 shows up as 170.
Well, to me that's a misplaced decimal.
It's probably supposed to be a 17 and somebody misplaced a decimal
and we ended up with 170.
You can easily see that in a control chart.
Find it and correct it.
Then I would also say, always make sure you check your data type in JMP.
If you have a column that's supposed to have a numerical entry
and somebody enters a character, that's going to become a problem later on
when you want to do some statistical analysis.
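The two checks described above (a value far outside its usual range that looks like a slipped decimal, and a character typed into a numeric column) can be sketched outside of JMP as well. This is only an illustration in Python: the function name, the expected range of 15 to 20, and the readings are assumptions for the example.

```python
def flag_suspect_entries(values, low, high):
    """Return (index, raw_value, issue, suggestion) for entries that need review."""
    flagged = []
    for i, raw in enumerate(values):
        try:
            x = float(raw)
        except (TypeError, ValueError):
            # A character in a numeric column cannot be analyzed as a number.
            flagged.append((i, raw, "not numeric", None))
            continue
        if low <= x <= high:
            continue
        # A value ten times too large or too small often means a slipped decimal.
        suggestion = None
        for candidate in (x / 10, x * 10):
            if low <= candidate <= high:
                suggestion = candidate
                break
        flagged.append((i, raw, "out of range", suggestion))
    return flagged


# Made-up readings for a measurement that normally sits between 15 and 20.
readings = [16.2, 17.4, 170, "n/a", 18.1, 1.65]
for row in flag_suspect_entries(readings, low=15, high=20):
    print(row)
```

The suggestion is only a hint for a person to confirm; the correction itself still belongs at the source.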
All right, one of my favorite topics here.
I really want people to use these terms: common cause and special cause.
If you have variation in your data,
it's for one of two reasons or maybe both put together.
It's either a common cause reason or a special cause reason.
Common cause reasons are reasons that occur naturally
because your equipment, your people,
and your processes have certain limitations.
This is what we call your noise in your data.
This is just normal, everyday little variations in your data
because of the imperfect nature of everything that you're using.
There's also special cause variation.
Special cause variation happens when, for example, things malfunction.
Maybe an environmental condition changes,
or maybe you have a process or an input change.
A process or input change can be a special cause
that maybe you weren't expecting to happen.
But it could also be a trial.
Like we said before, keep track of that.
Special cause variation,
a process input change.
That's going to be very valuable information down the road.
Okay, next we're going to talk a little bit about the data types
that JMP will recognize.
Continuous, nominal, and ordinal.
Continuous data types are your number data types.
Numbers anywhere from negative infinity to infinity
that can contain a decimal place.
In ethanol production, a great example of this would be an HPLC result,
a temperature, a pH, a fermentation time.
Nominal data is a type of data that's typically a character.
It's the name of something.
Sometimes we use numbers to name things, and in ethanol plants we do that.
But it's like a recipe input,
for example, a yeast or GA, or a fermenter number.
If you have a trial condition or an upset condition,
that's probably also going to be a character expression,
and it will become a character type of data or nominal data.
Then we also have ordinal data.
Ordinal data shows a certain order or progression.
A great example of this in batch fermentation
would be a batch number.
We could have named our batches any names we want,
but it really makes sense to our human brains to use a number,
make it sequential so we know which batch happens first,
which batch happens later,
so on and so forth.
Why does all this matter?
Why does the type of data matter?
Well, depending on what you want to accomplish with JMP,
the program is going to look for different types of data to match up.
For example, if you want to create a bivariate analysis,
you will have to have continuous data plotted against other continuous data.
If you want to do a one-way ANOVA test, you will have to have continuous data
plotted against nominal or ordinal types of data.
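The pairing of data types to analyses can be written down as a small lookup. The Python sketch below mirrors how JMP's Fit Y by X platform picks an analysis from the modeling types of Y and X; the function name is made up for illustration.

```python
VALID_TYPES = {"continuous", "nominal", "ordinal"}


def fit_y_by_x_analysis(y_type, x_type):
    """Name the analysis that matches a pair of modeling types."""
    for t in (y_type, x_type):
        if t not in VALID_TYPES:
            raise ValueError(f"unknown modeling type: {t!r}")
    y_continuous = y_type == "continuous"
    x_continuous = x_type == "continuous"
    if y_continuous and x_continuous:
        return "Bivariate"      # continuous against continuous
    if y_continuous:
        return "Oneway"         # continuous response against nominal or ordinal groups
    if x_continuous:
        return "Logistic"       # categorical response against a continuous factor
    return "Contingency"        # categorical against categorical


print(fit_y_by_x_analysis("continuous", "continuous"))
print(fit_y_by_x_analysis("continuous", "nominal"))
```

This is why a column that was accidentally imported as character data can quietly change which analysis is offered.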
Let's dig in a little bit deeper.
Bill and I are going to assume that there's a certain background
level of knowledge here
on importing data into JMP, but I just wanted to point
to some of these things so you know that you have options.
Most often what I'm doing is I'm opening JMP,
and then I go to Open, and I'm looking for a file.
Oftentimes, that's an Excel file.
Make sure you select all file types in the drop-down menu,
and then all of your file types will show up.
You can find your Excel file, open it, and go through the Import wizard.
If you've already done that,
or if you like to maybe just update a table that already exists,
you might have Excel open on one half of your screen
and JMP open on another.
You might just be copying and pasting maybe new material into an old file,
just updating it.
In addition, if you're like me
and you're constantly going back to the same Excel files
where I'm keeping all of my data organized and in one place,
JMP will have a source script.
As long as I haven't moved or renamed that source file,
that Excel source file, I can easily open my JMP file,
click Source Script, and it will bring in all the information
that's in that file, whether it be new or old.
You have all sorts of other methods.
There's the JMP feature in the Excel toolbar.
You can just open a new table and start dropping data in.
And new for JMP 17, there's something called Workflow Builder.
We're not going to get into all these.
I just want you to know you have options.
To quickly point out here,
we're not going to spend a lot of time on this.
What you'll see Bill and I using here
for cursors are the arrow, the selection, the grabber, and the lasso.
Those are the tools that we're going to use most often.
Other tools are available,
and if you have questions about these, there are a lot of good resources out there
to do some training, with JMP directly or on their website.
You can check those out.
But these are the cursor controls that we'll be using.
Getting into the nitty-gritty here, I'm going to start analyzing my data.
But before I do that, one really important question.
Because many of the analyses in JMP assume that your data are normally distributed,
it is good practice
for you to know ahead of time whether a certain type of data
is or isn't expected to be normally distributed.
On the left side of the screen here,
you'll see your typical normal bell curve distribution.
These data are basically normally distributed.
Many JMP analyses are going to assume your data look like this.
On the right-hand side of my screen,
I see some data that's not normally distributed,
it's skewed a little bit.
Just from knowing the ethanol fermentation process,
I would never expect my total sugar at drop
to be normally distributed.
When there are upset conditions, I would expect
that upset to only drive my sugars in one direction and never the other.
This skew is perfectly expected.
But if I'm going to get into some more advanced data analysis,
that's going to be something good for me to keep in mind.
Once you've checked and you know, you probably don't have to check again.
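One quick numeric companion to eyeballing the histogram is the sample skewness: close to zero for a symmetric bell curve, clearly positive when the tail runs to the right, as it would for sugars that upsets only push upward. The Python sketch below uses the simple moment-based formula and made-up values; it is not the specific normality check JMP performs.

```python
from statistics import fmean


def skewness(values):
    """Moment-based sample skewness: third central moment over variance**1.5."""
    n = len(values)
    m = fmean(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are identical")
    return m3 / m2 ** 1.5


# Made-up total sugar at drop: mostly low, one upset batch with high residual sugar.
sugar_at_drop = [0.2, 0.3, 0.3, 0.4, 0.5, 2.5]
print(round(skewness(sugar_at_drop), 2))
```

A strongly skewed column is a cue to be more careful with any analysis that leans on the normal assumption.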
All right, let's get into what I would consider
the bread and butter.
The main feature in JMP that I use when I'm looking at my process
and want to know if it's in control, is a control chart.
Process control charts, as we talked about before,
can show you whether you're in control, can show possible outliers,
and they can be used to very quickly get a visual
and mathematical representation of means
for different phases of something like perhaps a trial
or maybe a month-to-month comparison on how my operation is going.
The basic control chart output is going to look like this.
What you see here, I would say,
is mostly the type of data
that would be considered within my normal noise.
I don't see anything here that really screams at me like,
"This is an outlier, this is a data entry error,"
or anything like that.
Your basic control chart output will look like this.
If you want to dress it up a little bit on the left side there,
you can select for zones and zone shades.
This is going to shade your zones
that are 1, 2, and 3 standard deviations from your average.
Again, this was developed by, or is used a lot in, the automotive industry
where you're doing some Six Sigma improvement projects.
This type of visual is really useful
for any kind of Six Sigma projects you have going on.
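To make the zones concrete, here is a minimal Python sketch of an individuals control chart. It assumes sigma is estimated from the average moving range divided by 1.128, the usual convention for individuals charts, and labels zones C, B, and A for within 1, 2, and 3 sigma of the center line. The readings are made up, and the settings of the chart shown in the talk are not stated, so treat this as an approximation of the idea.

```python
from statistics import fmean

D2_SPAN_2 = 1.128  # standard constant for moving ranges of two consecutive points


def individuals_chart(values):
    """Center line, sigma estimate, and 3-sigma limits for an individuals chart."""
    if len(values) < 2:
        raise ValueError("need at least two points to form a moving range")
    center = fmean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    sigma = fmean(moving_ranges) / D2_SPAN_2
    return {
        "center": center,
        "sigma": sigma,
        "lcl": center - 3 * sigma,
        "ucl": center + 3 * sigma,
    }


def zone(value, chart):
    """'C' within 1 sigma, 'B' within 2, 'A' within 3, else 'beyond limits'."""
    distance = abs(value - chart["center"]) / chart["sigma"]
    if distance <= 1:
        return "C"
    if distance <= 2:
        return "B"
    if distance <= 3:
        return "A"
    return "beyond limits"


# Made-up readings for illustration.
readings = [16.2, 17.4, 16.8, 17.9, 16.5, 17.1]
chart = individuals_chart(readings)
print(chart)
print([zone(x, chart) for x in readings])
```

Points that land beyond the limits are candidates for special cause variation or a data entry error; points in zones C and B are the everyday noise.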
In addition to the zones and shadings, we can also use some rules
that have been developed by industry to say,
"Is my process in control or out of control?"
Here we have an example of a warning rule being violated.
I believe, Bill, when you and I talked about this,
it had something to do with the two data points
being too far apart, too many standard deviations.
I can't recall if it was 2 or 3.
But here we have one data point that was quite a bit higher
and quite a bit different than the previous data point,
and it threw up a warning.
Aside from control charts, comparing means is another very useful tool.
In the ethanol production industry, it is used all of the time.
When you have a trial going on,
you want to know, "Is this process any different?
Is it statistically significant, and to what degree?"
We're going to take a look at comparing means,
and it's going to be used to look at a continuous set of data
against ordinal or nominal types of data.
Again, we talked about that process upset condition
or a trial condition being a nominal type of data.
Comparing means analysis.
First and foremost, take a look.
Is this right for me?
It is appropriate to use when you have large data sets.
My rule of thumb is 30 data points or more.
It's appropriate if you have few outliers, because outliers can really pull your average.
It's appropriate if the comparison is restricted to a local
or small amount of time,
and in that time, all other conditions
have been controlled to the best of your ability.
It would not be appropriate with small data sets,
where outliers can skew the mean calculation pretty drastically.
I wouldn't use this over a broad range of time,
say, comparing now to three years ago,
where conditions, an unknown number of conditions,
would probably be different between now and three years ago.
If you're continuously improving,
you're going to want to keep this kind of analysis
to a certain time restriction as well.
How do we get into this?
We want to do a Fit Y by X.
In the example you're going to see here, we're going to use phase,
which would be a trial phase, as our X factor, and ethanol to liq solids,
or how much ethanol we are getting
for the amount of corn we're putting in, as the Y response.
In the output that is kicked out to us for this particular example,
we're looking at the diamonds and circles
and then looking down at the connecting letters report.
First of all, by default,
JMP is going to use a 95% confidence level.
Here we see that the means shown in the connecting letters report
have two different letters.
For each phase of this trial, there are no overlapping letters.
We've got a baseline and then we've got a trial.
What JMP is telling us here is that
these averages are statistically different with 95% confidence.
Looking at a slightly more complicated situation,
you might find something like this.
I love this example.
It already violates what I said about large data sets.
We've got like 4 or 5 data points in the baseline
and then just a few points for a yeast trial,
and then a couple more baseline points.
What JMP is telling us here,
if we look at the connecting letters report,
is that the baseline and the yeast trial are significantly different,
but it also says that baseline two
is connected to both the yeast trial and the baseline,
with no statistically significant difference.
Maybe.
I'm going to eyeball that. Does this pass the eyeball check?
Boy, baseline and baseline two look really similar.
I'm going to dig a little bit deeper.
If I look down at my Ordered Differences report,
I can see that my yeast and baseline did pass a 95% confidence level
for being statistically different.
But if I look at yeast and baseline two, it just barely, barely missed the cutoff.
If I were to go back and assign a 90% confidence,
then I would get a result that I would expect.
That my baseline and baseline two are basically the same
and that the yeast trial was different
with a 90% confidence.
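The effect of moving from 95% to 90% confidence, and the danger of very small groups, can be illustrated with a different and much simpler technique than the one JMP uses for its connecting letters report: an exact permutation test on the difference in means. The Python sketch below is a stand-in for illustration only, and the values are made up.

```python
from itertools import combinations
from statistics import fmean


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(fmean(group_a) - fmean(group_b))
    extreme = 0
    count = 0
    # Try every way of relabeling the pooled points into groups of the same sizes.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        count += 1
        if diff >= observed - 1e-12:  # small tolerance for floating point ties
            extreme += 1
    return extreme / count


# Made-up illustration values.
baseline = [14.6, 14.8, 14.7, 14.9, 14.5]
yeast_trial = [15.1, 15.3, 15.2, 15.0]
p = permutation_p_value(baseline, yeast_trial)
for confidence in (0.95, 0.90):
    alpha = 1 - confidence
    verdict = "different" if p <= alpha else "not shown to be different"
    print(f"{confidence:.0%} confidence: {verdict} (p = {p:.4f})")
```

With only three points per group there are just 20 possible relabelings, so the smallest achievable two-sided p-value is 0.1 and no difference could ever pass at 95% confidence, which is one way to see why the 30-point rule of thumb matters.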
Okay. Thanks, Nick.
I'm going to dive in a little deeper.
Actually, we'll show you a few things, few slides, and then we'll get into JMP.
But we're going to talk about bivariate and multivariate analysis first.
This is for use with continuous data against other sets of continuous data.
This is a bivariate example where we're looking at ethanol to liq solids
versus another term called liq solids.
We can see from this line here that there's really no strong correlation,
especially if you look at the R squared down here.
This has a very small R squared,
which indicates that there's very little linear correlation between these two variables.
But it does point out that there is
a general decline in the fermentation yield
as solids are increased,
and that the widest variation was noticed around this 33% line.
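For readers who want to see where the line and the R squared come from, here is a minimal least-squares sketch in Python. The numbers are made up and are not the presenters' data; the point is only that a line can slope downward while R squared stays small because the scatter around it is large.

```python
from statistics import fmean


def linear_fit(x, y):
    """Ordinary least-squares slope, intercept, and R squared for y against x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Made-up values: liq solids (%) and ethanol per unit of solids.
liq_solids = [31.5, 32.0, 32.5, 33.0, 33.0, 33.5, 34.0]
ethanol_per_solids = [0.452, 0.447, 0.455, 0.440, 0.458, 0.446, 0.444]
slope, intercept, r2 = linear_fit(liq_solids, ethanol_per_solids)
print(f"slope = {slope:.4f}, R squared = {r2:.2f}")
```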
We can further play around with that and add some colors
and see where we might see some differences there as well.
The next step in this would be to say,
"Okay, we're looking at a multivariate example."
We're looking at a bunch of different process parameters and saying,
"Okay, are they correlated in any way?"
We've got these numbers that tell us
that we can look at the lines and see which way they slope.
We can look over here where we've got the correlation coefficients
matched up with the particular squares.
Those numbers are telling us
whether something might be correlated or not.
The redder it is,
the higher the correlation in a positive direction,
and the bluer it is,
the more it's correlated in a negative direction.
One important note here is that correlation does not imply causation.
That's a very important thing to remember.
Also, we're going to look at Graph Builder.
This is maybe more of an artistic than analytical type of analysis,
but it allows you to look at all kinds of different properties
and you can play around with it.
You can remove and hide items with Column Switcher and a data filter.
You can change axis label orientation.
You can modify spacing.
There's all kinds of different things you can do.
This is an example of how you can do that.
This is a kinetics graph for showing by phase
what the ethanol and sugar are doing over time.
We've got those that we can play with
and I'll show you how to make one of those in a minute or so.
Then here's another instance where we're looking at
by batch and by day.
We've got ethanol and sugar.
We've got our sample age up on the upper grouping axis here,
and then we have batch number down here on the lower x-axis.
One other tool that I want to talk about
before I get into JMP itself is the advanced control charting,
which we call Model Driven Multivariate Control Charts.
This is especially useful when you're looking at processes
that appear to be in control,
but they still have batches that are failing.
All of your individual control charts might be
showing that everything is in control,
but you're still getting batches that fail.
This model driven multivariate control chart
will help you better understand that.
You can see this is just some of the output
that you'll get there.
With that, I am actually going to step out of PowerPoint and do this.
All right.
I should still be sharing.
I've got my JMP window up.
Let's go ahead and start off with making these graphs.
I've got the data in a stacked format for right now.
This is that graph that I showed you before.
This is one of them and I'm going to show you how to make it.
It's real quick.
Let's just go to Graph Builder
and I'm going to pull this one out of the way
so it doesn't get in our vision here.
We're going to pull in phase and you can see things light up,
these drop zones light up as I pulled the data in or the column in,
but I want to group that.
Then I want to go ethanol here and sugar here.
We've got the two and that's actually not the way I want that to show up,
so we're going to redo that.
I'm going to do an Undo here and start over.
Let's do sugar and ethanol first.
We'll pull those in as one.
There we go.
Then we're going to pull in sample age and now we'll pull in that phase
and now we've got that and we're going to add a line to this.
We're going to pull in the smoother, and that adds those, and we'll hit Done.
That's that graph.
That's basically the kinetics as ethanol and sugar go.
Ethanol goes up and sugar goes down over time.
We can see that there might be some slight differences
between the baseline and trial, but we'd have to dig deeper
to really get more into that.
That's that one graph.
I think I'll just move on
to the model driven multivariate control chart.
That gives you a good idea of how to build that using Graph Builder.
But let's look at the model driven multivariate control chart.
Again, I have these things pulled up.
Let's go to Analyze and let's go to Multivariate Methods,
or Quality and Process, Model Driven Multivariate Control Charts.
Actually, this isn't the right table, so let me pull up my other table.
There we go.
Click this off.
We'll go to Analyze,
Quality and Process, Model Driven Multivariate Control Charts.
We're going to just pull in a group of different processes.
I'm just going to pull all these in.
We're going to try and see where we see differences.
We could put a time ID in here,
but we're just going to go ahead and say Okay.
Now we're seeing that we're getting some batches.
We're looking at all those components together,
all those process steps together.
Now we're showing that we've got some things that are out of control,
so I'm going to highlight these out of control batches.
This is all done with principal components.
It's saying it takes eight principal components
to explain at least 85% of the variation that we're seeing.
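The "how many components to reach 85%" idea can be shown with a few lines of Python. The sketch assumes you already have the eigenvalues of the correlation matrix (the variance carried by each principal component) and simply finds the smallest number of components whose cumulative share reaches the threshold. The eigenvalues and the function name are made up, and JMP's exact selection rule is not described in the talk beyond the 85% figure.

```python
def components_needed(eigenvalues, threshold=0.85):
    """Smallest number of principal components whose variance share reaches threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold - 1e-12:  # tolerance for floating point
            return k
    return len(ordered)


# Made-up eigenvalues for twelve process variables.
eigenvalues = [3.1, 2.2, 1.6, 1.2, 0.9, 0.8, 0.6, 0.5, 0.4, 0.3, 0.25, 0.15]
print(components_needed(eigenvalues))
```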
Let's highlight those, show their contribution plots.
Now we can see that we're getting an idea of what are the issues here.
Let's sort the bars so that we can see where we're getting a lot more variation
in some of the samples versus others.
There are these others, and then we've got one that says it's out of control.
We can look at that individually and say, "Oh, wow, yeah, we've got a point.
At least one point that's out of control
for that individual batch there, or that sample."
Sample 23.
One other thing you can do here, too, is, let's add a monitored process,
let's go to the score plot.
Now we can see that down here,
we've got the batches that we're looking at
and then we've got the individual points.
Again, we can look at their individual plots,
but let's make that group A.
Then we're going to go back up here,
take those off and highlight this other grouping here.
Make that group B.
I don't know if you can see in the background there
that's highlighting in the data table.
That's a nice way of selecting the data for you to show that.
Then we've got the bars here.
Now we've got an idea that this prop send cell count
is one of the bigger drivers as to why we're seeing differences
between batches that are out of control versus in control.
That's just a nice tool to try and work with there.
I'm going to get rid of this for now
and I'm going to go back to the PowerPoint.
Get that right.
Let's step up again.
Another tool within JMP,
and this is getting deeper into the power of JMP,
is something called Functional Data Explorer.
Let me get rid of this.
This allows you to analyze data that is captured over time.
You can compare curves to a standard or golden curve.
With chemometric data, you can preprocess and analyze that data,
and then you can use this Functional Data Explorer
to do a qualitative review of the curve data.
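As a very simple stand-in for the golden-curve comparison, the Python sketch below scores each batch curve by its root-mean-square deviation from a reference curve sampled at the same times. This is not how Functional Data Explorer works internally; the function names and the values are made up for illustration.

```python
import math


def rms_deviation(curve, golden):
    """Root-mean-square difference between two curves sampled at the same times."""
    if len(curve) != len(golden):
        raise ValueError("curves must be sampled at the same time points")
    return math.sqrt(sum((c - g) ** 2 for c, g in zip(curve, golden)) / len(curve))


def rank_batches(batches, golden):
    """Return batch ids ordered from closest to farthest from the golden curve."""
    return sorted(batches, key=lambda batch_id: rms_deviation(batches[batch_id], golden))


# Made-up ethanol readings at the same sample ages for each batch.
golden = [2.0, 6.0, 10.0, 13.0, 14.5]
batches = {
    "batch 201": [2.1, 6.2, 10.1, 12.9, 14.4],
    "batch 202": [1.5, 4.8, 8.6, 11.9, 13.8],
    "batch 203": [2.0, 5.9, 9.8, 13.1, 14.6],
}
print(rank_batches(batches, golden))
```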
One of the tools that might be used in the ethanol industry
is something called near-infrared spectroscopy,
and that's used to measure inbound corn composition.
This Functional Data Explorer will allow you to analyze that kind of data.
Some of the other things you might want to look at
are spectral data, HPLC or chromatographic data, and mass spec data.
But the JMP Pro features in Functional Data Explorer
allow you to do the preprocessing
and analyzing of spectral data or functional data.
It's pretty straightforward.
Then, just so everybody knows what I'm talking about here,
functional data is any data
that unfolds or develops over some sort of continuum.
These continuums are listed down here below,
and time is one of those continuums.
This is an example of visualizing
the functional data that we're talking about.
This is some HPLC data, this is NMR data, and then this is near-IR spectral data.
How this is used in the ethanol industry
is to do something like this.
We looked at that sugar and ethanol data before,
but we really couldn't compare batch to batch, right?
Not that well.
We couldn't get any quantitative view of that.
Well, with Functional Data Explorer, this allows you to look at these by batch,
and you can fit a model to them and see where things fall out.
We'll show you that in just a second.
Then this is just an output from some near-IR data
where we've looked at 60 batches of gasoline looking for the octane rating
and then the output that we get out of that.
Now I'm going to step out here and go back to our data sets here
and get rid of this one and pull back up our stacked data.
Let's go to Analyze.
This is Specialized Modeling.
Functional Data Explorer.
We're going to do this in the stacked format,
and we're going to look at ethanol and sugar.
That's our Y output.
We're going to use time this time, but we're going to use continuous time.
We're not going to use the categorical time.
We need the ID function.
Then we can put Phase and, let's say,
Amylase Type in there as some supplemental variables.
Say Okay.
Now we're going to fit this data.
I'm going to do a JMP tip here.
I'm going to hold down my Ctrl key.
I'm going to go to the red hotspot and go to Models and say Wavelets.
This will fit both models at the same time.
One of the things you're going to find out
is that wavelets need a grid, an evenly spaced grid.
We're going to say Okay there, and say Okay for the next one.
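An evenly spaced grid simply means every curve is expressed at time points that are the same distance apart. JMP takes care of this in the dialog shown above; purely to illustrate the idea, here is a Python sketch that linearly interpolates irregular sample times onto such a grid. The function name and the sample values are made up, and the sketch assumes the sample times are strictly increasing.

```python
def resample_evenly(times, values, n_points):
    """Linearly interpolate (times, values) onto n_points evenly spaced times."""
    if n_points < 2 or len(times) < 2 or len(times) != len(values):
        raise ValueError("need matching inputs with at least two samples and two grid points")
    start, stop = times[0], times[-1]
    step = (stop - start) / (n_points - 1)
    grid, resampled = [], []
    j = 0  # index of the left end of the segment that contains the current grid time
    for k in range(n_points):
        t = stop if k == n_points - 1 else start + k * step
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        v0, v1 = values[j], values[j + 1]
        resampled.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
        grid.append(t)
    return grid, resampled


# Made-up ethanol samples taken at irregular hours into fermentation.
hours = [0, 6, 14, 24, 40, 54]
ethanol = [0.0, 1.8, 5.5, 9.6, 13.4, 14.6]
grid, on_grid = resample_evenly(hours, ethanol, n_points=10)
print(grid)
print([round(v, 2) for v in on_grid])
```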
Now we have models fit for both the ethanol and the sugar.
This is ethanol, this is the sugar data.
If we come down here a little bit,
we can see that each one of the batches that we have
are using these wavelet-type functions.
We can go a little deeper, we can look at the fit here
where we have multiple shape functions and we can say,
"Okay, why are these different?"
If we hover over them, we can look at that and we can plot that out.
But we can also come down here and play around with a function here
where we can look at the functional principal components
and say, "Okay, how did these things change over time?"
This gives us a pretty good indication
that as we increase these functional principal components,
although we don't know what those parameters are,
things are going to change with the output or the consumption of the sugar.
We get a qualitative view of what's going on there.
Then last but not least,
with all these things, you're going to want to share this data.
I'm going to go back to PowerPoint one more time here.
This is an image of a JMP Live output
where we've looked at a couple of the graphs
where we've got ethanol and sugar comparing over time by batch number
and then by sample age here,
comparing both the baseline and trials that way.
That's a nice way to look at that.
Let me show you this in a live JMP Live image.
It'll take me a second to get that.
Pull that in.
This is a live JMP Live image of those graphs.
This allows anyone that has JMP Live to investigate this data.
This is especially important when you're trying to share data
with your management or your team.
If you have upsets in your analyses or in the process,
this is a great way to share that.
You'll get warnings about things that are out of control,
if you're looking at control charts.
I've got other dashboards or other images that I can look at.
We can go over here and look at, let's see, chemical production.
Then I can look at the data this way where it's a model that we can say,
"Okay, how are things reacting over time?"
Then this is something called a prediction profiler where we can say,
"Okay, what happens to the model when I change reactors?"
Or in this case, trials or whatever it happens to be
in the ethanol world?
With that, I believe this is all we've got.
We'll say thank you, and we'll go from there.
Nick, any last words?
Yeah, I will just say again, Bill, thank you to you and the JMP team.
For those users out there, if you're an ethanol producer
or not an ethanol producer,
you probably have your 2 or 3 or 4 analyses,
what I would call the bread and butter of what it is you're doing.
You do it over, and over, and over again.
Don't be afraid to reach out. Learn something new.
Learn a new feature about those things that you use often.
But also, don't be afraid to grow a little bit
on the depth of your understanding of what JMP can do.
We are so data-rich and information-poor.
JMP can help you take that data, turn it into real actionable information,
and it tells a great story,
and it can really help an organization out.
Reach out, find Bill, find somebody at JMP, find me.
Always happy to help.
Thanks again.
Thanks, Nick.
Just one more word.
Nick and I did a presentation earlier this year
based on this same data set.
If you're interested, reach out to us.
We can get those links for you so you can watch those presentations.
We did it over a couple of days.
It was a little more in depth
than what you're seeing today, but thank you.