Hello, everyone.
My name is Bill Worley,
and I am a systems engineer for JMP for the US Chem team.
Today I'm going to be talking to you about
An Analytic Workflow for Data and Chemometric Analysis with JMP.
I have got a few things I want to highlight.
We're going to be talking about getting the data in...
Actually just following the analytic workflow that we share
about getting data in, cleaning and blending,
visualization, exploratory data analysis, building models.
And then ultimately,
what are you going to do with that data and how are you going to share it?
Couple of things that are important
are the new JMP Workflow Builder,
I'm going to highlight that.
This is just a snapshot of what I'll be showing you in a little bit.
And the chemometric part of this is analyzing spectral data,
using Functional Data Explorer for pre-processing.
And these are now built into JMP.
If you can see over here, we've got a tab up here in FDE now
where you can choose from different types of pre-processing:
standard normal variate, multiplicative scattering correction,
Savitzky-Golay filtering, and baseline correction.
And I believe there will be maybe one or two other things added to that.
Just so you know, the data that I'm using is pulled from this paper.
You see it right down here, back from 2002.
Just to let you know, that's where the data is coming from.
All right, I'm going to put that aside for now,
and I'm going to go ahead and get things going here.
So I've got my home window.
I'm going to go and start a File, New, Project.
And the workflow that I'm going to be working with is this one right here.
I'm going to right-click on that.
I'm going to open it.
What I've done is I've taken the data set that I want to work with
and I've built all these steps in the Workflow Builder,
and I'm going to play that for you now.
So it's going to populate our project here.
So I'm going to go ahead and hit play.
As you can see, that's building the workflow, doing some analysis.
We're doing some model screening right now.
And then everything is complete.
And now we have all these tabs across the top
where we've completed that analysis with the workflow.
I've actually included one other table in there.
When we get there, I'll talk more about that.
But we built that table, we pulled the table in
just to show you they're from the source data.
We've actually pulled this data in from an Excel file.
Getting the data into JMP from Excel is fairly easy.
And we built some exploratory data analysis.
So the first step we made is doing a distribution,
and we can interact with it just like anything else.
Everything's interactive from the Workflow Builder.
Did some graphing where I put the column switcher in,
added the local data filter.
Just so you know,
as we build the workflow, all these things are built-in
and the recording helps you keep track of what's going on.
And you can see that we've got full functionality going on there.
Did some more exploratory data analysis looking at Fit Y by X,
doing an [inaudible 00:03:46]
like in this case,
Fit Y by X for mill time versus dissolution,
and blend time versus dissolution.
Just to get back to it,
this is tablet data that's pretty popular within the SCE community within JMP.
And I'm just building on how we would analyze that and build out this workflow.
Next step would be multivariate analysis to see,
for this dissolution, which is our key performance indicator,
what might be any of the factors
or what factors might be highly correlated with dissolution.
Not seeing anything that's jumping out too much.
We do have some partial correlation,
but no one factor is jumping out as
the answer for the data that we're looking at.
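As a rough illustration of the correlation check just described, here is a minimal Python sketch that ranks factors by the strength of their pairwise correlation with dissolution. The factor names are taken from the talk, but the numbers are made up for illustration, and this computes plain Pearson correlations only, not the partial correlations JMP also reports.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical values, not the tablet data from the talk.
factors = {
    "mill_time": [5, 10, 15, 20, 25],
    "blend_time": [12, 9, 15, 11, 14],
}
dissolution = [70.1, 72.0, 74.2, 75.9, 78.1]

# Correlation of each factor with the response, strongest first.
correlations = {name: pearson(values, dissolution) for name, values in factors.items()}
ranked = sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)

for name in ranked:
    print(f"{name}: r = {correlations[name]:.3f}")
```

A ranking like this is only a first look; as noted above, no single factor has to stand out, which is why the next step is predictor screening.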
We can get a better understanding of what factors might be important
if we look at a predictor screening.
We can see here that we have things called screen size, mill time, spray rate.
Those look to be important factors that we could use to build a better model.
Next, we can actually set up a stepwise regression.
I'm going to go here and actually run this model in a second.
And then we've got that output so that data is there
and we could use that as needed.
So we built that model,
and we're out here looking at another type of analysis,
which would be decision tree, a partition analysis.
We can do neural net, build that model
and we can do a [inaudible 00:05:35] squares.
So we've got all those things together.
But when all is said and done,
you could actually use something called Model Screening in JMP Pro
to build these models out and find out which is the best overall model.
And based on that, we can see that
a neural boosted model is probably the best overall model for us to work with.
We can then take all this information,
share it with all our colleagues, co-workers,
anybody who might be interested.
And we can do this in several different ways.
One of the best ways would be to use JMP Live
and put everything out there for folks to look at and share.
That's the first part of the analytic workflow.
And again, if we look back here, that's all set up in this portfolio here.
And as I said before, I had opened another table.
And this is for the chemometric part of the analysis.
This is near-infrared data for finding the active ingredient in tablets.
We built the tablets, we made the tablets.
Now we have to take the finished product and find out what's it all about.
Do we have the right active ingredient?
And can we tell based on this technique called near-infrared analysis?
We're going to step through a few different things,
but I'm going to turn that Workflow Builder back on
to record these steps.
So let's turn that on, let's go back to our data set,
and let's do some analysis now.
So I want to clear out these row states first.
And now I want to go to Analyze, Clustering, Hierarchical Clustering.
So I got that, and I've got all my data groups.
So there's 404 wavelengths that are grouped.
I'm going to pull those in, say okay.
Let's build this out a little bit,
we're going to look at three different clusters,
and let's color those clusters.
All right, so you can kind of see that,
let me pull this down a little bit,
you can see we've got three clusters,
fairly big green cluster and two smaller blue and red.
So we've got that.
And now let's go back to our data set, and let's do a Graph Builder.
So let's go to Graph Builder.
Let's pull our wavelengths in,
here to X,
do a parallel plot,
clean that up a little bit,
and right-click there to choose Combine Scales, Parallel Merged.
I'm doing these steps pretty fast.
This is something you'll want to go back and watch again, if it's of interest.
But the thing I want to show you here is that the data is pretty scattered,
and there's a lot of baseline separation,
maybe some additive and multiplicative scattering that we need to clean up.
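To make the idea of additive and multiplicative scatter concrete, here is a minimal Python sketch of multiplicative scatter correction under the usual textbook definition: each spectrum is regressed on a reference spectrum (here the mean spectrum), then the fitted offset is subtracted and the result is divided by the fitted slope. The function names and toy spectra are illustrative assumptions, not JMP's implementation.

```python
def mean_spectrum(spectra):
    """Pointwise mean across spectra (each spectrum is a list of equal length)."""
    n = len(spectra)
    return [sum(column) / n for column in zip(*spectra)]

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    Fits spectrum = intercept + slope * reference by least squares,
    then returns (spectrum - intercept) / slope for every spectrum.
    """
    ref = reference if reference is not None else mean_spectrum(spectra)
    p = len(ref)
    ref_mean = sum(ref) / p
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for spectrum in spectra:
        s_mean = sum(spectrum) / p
        slope = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, spectrum)) / ref_ss
        intercept = s_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in spectrum])
    return corrected

# Toy spectra: one underlying shape with different offsets and scalings (made up).
shape = [1.0, 2.0, 4.0, 3.0]
spectra = [
    [0.5 + 1.0 * v for v in shape],
    [1.5 + 2.0 * v for v in shape],
    [-0.2 + 0.5 * v for v in shape],
]
corrected = msc(spectra)
reference = mean_spectrum(spectra)
```

In this toy case the three spectra differ only by offset and scale, so after correction they all collapse onto the mean spectrum. Real spectra also carry chemical differences, which is the part the correction is meant to leave in place.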
So let's go back to our data table
and go to another analysis step.
Let's go to a multivariate method.
Let's go to principal components.
Again, we'll pull all our wavelengths in.
Say okay.
And the thing I want you to note here is that
our 404 wavelengths
are all grouped right around this little area right here.
That is highly correlated data.
We could build a model off of that,
but it may not be the best.
Because we're going to be including wavelengths that are not of importance
because of the high correlation.
So we'll clean that up in a little bit.
I'll show you how to clean that up in a little bit.
And as a matter of fact, let's go to that step right now.
Let's go back over here and go to Analyze,
Specialized Modeling.
And we're going to go to Functional Data Explorer.
And let's get this set up first, and I'll tell you more about it.
Let's put our wavelength there,
we have our active ingredient, which is a supplemental variable.
And then our ID function. There we go.
Say okay.
Rows as functions. Let me do that again.
Active ingredient and our wavelengths.
This kind of looks like what we saw before in Graph Builder,
and we want to clean this up.
So we've got these new tabs in JMP Pro 17 for Functional Data Explorer.
Spectral is one of the tabs.
And then, as we talked about before, we have standard normal variate,
multiplicative scattering, Savitzky-Golay, and baseline correction.
I'm going to select the standard normal variate first
to clean that up.
And then you can see the baseline is a little wobbly here.
Let's clean that up, take that next step,
and then go ahead and say okay.
And now we've got that set up.
It looks a lot better, a lot cleaner.
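For readers who want to see what the standard normal variate step does numerically, here is a minimal Python sketch: each spectrum is centered on its own mean and divided by its own standard deviation. The sample standard deviation (n - 1) is an assumption here, the values are made up, and the baseline correction step applied afterward in the talk is not included.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own mean and std."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation (n - 1); an assumption
    return [(v - m) / s for v in spectrum]

# Hypothetical absorbance values for one tablet.
raw = [0.52, 0.55, 0.61, 0.70, 0.66]

# The same spectrum with an added offset and a multiplicative gain.
shifted = [2.0 * v + 0.3 for v in raw]

snv_raw = snv(raw)
snv_shifted = snv(shifted)
```

Because the offset and gain are removed per spectrum, `snv_raw` and `snv_shifted` come out the same, which is why the curves pull together after this step.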
And now the next step would be to model this.
We're going to use another new function in Functional Data Explorer
called wavelets.
It's wavelet modeling here.
And you can see down here that our model has been built,
and we're explaining a lot of the variation with about
five functional principal components.
But if you look at these, we're explaining the shape with our shape functions.
That's where our eigenvalues come in.
This is really just a nice way to look at the data
and make sure that our spectra are being well modeled.
As I said, we've got
five shape functions that are explaining things really well.
So let's clean this up a little bit and pull this back,
make our model a little simpler.
We won't go all the way back to five, but we'll leave it at 10 for now.
You can look at the score plots.
There's still some scattering here in the data,
but we'll clean that up in a second.
And then one of the other steps you want to take is to do a wavelet model.
This is new in JMP 17, this wavelet analysis.
And what this is really all about is looking for
can we find the important wavelengths
that are going to give us a telltale sign of what's going on with the data.
And what I'm looking for, especially with the spectra,
is something where I can see a shift in the baseline.
And I can see that we've got a good shift in the baseline
and a grouping of spectral wavelengths around 8820 to maybe 8850.
So that's the important part here.
So we get an idea of what the important wavelengths are.
All the data that I had done up here before,
let me pull this back,
the pre-processing that I had done before,
I want to save that data out and do some analysis on it.
So I'm going to go here to the Functional Data Explorer,
select Save Data.
This is going to be a new data set,
and now we've got to do some work with this data and clean it up
and make sure we're ready to go.
I want to do a transpose.
Transpose Y. X is our label.
And these two drop into [inaudible 00:13:38] .
See if we got this right, let's hit okay.
And yes.
So we've cleaned that table up, we've taken those 300 spectra,
and then we've transposed them into another data table.
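The reshaping done here can be pictured as going from a stacked table (one row per tablet and wavelength) to a wide table (one row per tablet, one column per wavelength). Below is a minimal Python sketch of that idea. The field names `id`, `wavelength`, and `value` and the numbers are hypothetical stand-ins, not the column names JMP produces when saving data from Functional Data Explorer.

```python
def long_to_wide(records):
    """Pivot stacked records into one row per id with one column per wavelength."""
    wavelengths = sorted({r["wavelength"] for r in records})
    by_id = {}
    for r in records:
        by_id.setdefault(r["id"], {})[r["wavelength"]] = r["value"]
    # Missing (id, wavelength) combinations are filled with None.
    return [
        {"id": sample_id, **{w: row.get(w) for w in wavelengths}}
        for sample_id, row in by_id.items()
    ]

# Hypothetical stacked, pre-processed data for two tablets at two wavelengths.
records = [
    {"id": "tablet_1", "wavelength": 8820, "value": 0.41},
    {"id": "tablet_1", "wavelength": 8850, "value": 0.39},
    {"id": "tablet_2", "wavelength": 8820, "value": 0.47},
    {"id": "tablet_2", "wavelength": 8850, "value": 0.44},
]

table = long_to_wide(records)
for row in table:
    print(row)
```

The wide layout is what the clustering, principal components, and model screening steps expect: each wavelength becomes its own column that can be used as a predictor.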
This is all the pre-processed data,
so we're going to do a few more things to this
to show where that pre-processing has really cleaned up the data
and where we can build some models with it.
So let me get rid of this column.
All right, and ready to go.
So we're going to do the same thing that we did before.
Let's go to Analyze, Hierarchical Clustering.
Actually, let me take a step back here real quick.
Here, I want to group these columns to make things a lot easier.
So group those columns and let's go back to where we were.
Let's go to Analyze, Clustering, Hierarchical Clustering,
pull our columns in, say okay.
We'll do the same thing we did before, we'll look at three different clusters,
and color those clusters.
This will be a quick comparison,
but if you look at what we did before to what we've got now,
we've got a lot tighter clusters.
And these actually are pretty well dispersed.
They're pretty even.
Those clusters are fairly even right now.
Let's go back to our data table.
Let's go back to our Graph Builder
and pull our wavelengths in again, as we did before.
We're going to make a parallel plot out of this again.
It doesn't look great right now, but let's right-click here,
and go to Combine Scales, Parallel Merged.
And now you see that the data is really cleaned up
where we did that pre-processing. It has taken things to...
They look a lot better.
Let's see if we can compare that here.
What we had before, and what we have now.
So we've got that data much cleaner.
Any analysis that we do from here should be much better.
So let's go back to Analyze,
Multivariate Methods, Principal Components.
Pull our wavelengths in.
Say okay.
And now we've taken that data
and we've broken that correlation structure that we had before.
This is currently after pre-processing,
this is what we had before.
Just to show you the difference.
So we've really cleaned things up.
Now we'd want to take maybe one more step in the analysis.
Analyze, let's go to Quality and Process,
Model Driven Multivariate Control Chart.
We're just looking for maybe some unusual behavior in here.
In this case, it's based on the principal components.
Say okay.
And this is looking at two principal components.
You can see that there's some potential outliers here.
But this is spectral data, we're not going to get rid of anything.
We just want to kind of view that.
And one other thing we want to look at is to go to Monitor the Process,
we're going to look at score plots.
And now we can look at our subgrouping down here
and we can actually compare these groups.
I'm going to pull up a tool here, a lasso tool.
I'm going to do my best to group these, a couple of these.
That's going to be my group A.
And I'm going to do another lasso here.
We'll just leave that as is and go there.
We grabbed one of the wrong ones, but I think we'll be okay.
And now we can compare where we're seeing differences in the spectra
for these two subgroups.
And as I was saying before if we looked right in here,
those wavelengths are somewhere in that 8800 range,
and then we can see that there's a real difference there.
One more thing we want to do, and this is the last step.
What I'd shown before, I'd done model screening,
and I want to do model screening again.
I go to Analyze, Predictive Modeling, Model Screening.
We're going to set this up, we're going to do our active.
This is our response that we're trying to model,
and we're going to use our wavelengths to build this model out.
I'm going to clean this up a little bit.
We don't need all these different modeling types.
We're going to pull this out.
But the nice thing about this is I can build all these models at once
and really find out what's the best modeling approach to take with this data.
I don't need that. I don't need that.
Let's add those.
One thing I'm not going to do, for time's sake,
is add any cross-validation.
If we take that into account,
it'll actually run a lot longer.
But as you'll see, this is going to be fairly quick.
I'm going to go ahead and say okay.
And as this is going, just talk a little bit more about what we're seeing.
We're building out these models.
You can see it's stepping through.
And let's see in about another few seconds here it should be done.
There we go.
Taking a little longer.
There we go.
Based on what I said,
I didn't use any validation, but neural requires it.
So that's the validations that you see there.
But overall, we get a really good idea that
XGBoost is going to be the best model to fit this data.
We could use any of these others,
because they're all really good models as well.
But you get to choose, select one, let's say partial least squares.
Because that's the go-to analysis method for spectral data anyway.
But we've got that.
We can say Run Selected,
and fill out that model and find out, can we make it even better?
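The general idea behind screening several models at once, and behind the cross-validation that was skipped here for time, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two candidate models (a mean-only baseline and a straight-line fit), the single predictor, and the numbers are all hypothetical, and JMP's Model Screening platform fits far richer models than these.

```python
from math import sqrt

def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Simple least-squares straight line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def kfold_rmse(fit, xs, ys, k=3):
    """Cross-validated RMSE: every k-th point is held out in turn."""
    n = len(xs)
    squared_error = 0.0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        model = fit(train_x, train_y)
        squared_error += sum((ys[i] - model(xs[i])) ** 2 for i in test_idx)
    return sqrt(squared_error / n)

# Hypothetical data: absorbance at one wavelength versus active ingredient.
absorbance = [0.10, 0.15, 0.21, 0.24, 0.31, 0.35, 0.41, 0.44, 0.50]
active = [1.1, 1.4, 2.2, 2.3, 3.2, 3.4, 4.2, 4.3, 5.1]

candidates = {"mean only": fit_mean, "straight line": fit_line}
scores = {name: kfold_rmse(fit, absorbance, active) for name, fit in candidates.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: cross-validated RMSE = {score:.3f}")
```

Each model is refit once per fold, which is why adding cross-validation makes a screening run take noticeably longer when there are many models and hundreds of wavelength columns.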
Hopefully what I've done and showed you is that we can build these...
Let me pull this back to our beginning here.
Just a few steps.
This is our workflow, and I've added those steps.
So we've got that table that we're working with.
We cleared the row states. We transposed the data.
Anything that we closed out is now part of that workflow.
So we continue to build that workflow.
One thing I'll say is that
I would typically not build a workflow inside a project,
but just showing you that it can be done.
Let me go back to my slide here, and share.
Let's flip this.
One more step here.
I just want to say thank you to a few people.
Jeremy Ash, who's no longer at JMP, but he's a great inspiration for this.
Mark Bailey has been a great help.
Ryan Parker and Clay Barker have done really fantastic things
with genreg and Functional Data Explorer.
Chris Gotwalt has been really helpful in getting things set up.
And then Mia Stevens has been a real supportive person
in helping me build the spectral analysis out within JMP.
So I really appreciate everything, and with that I'll say thank you.
That's it.