Spectris: Spooky Spectroscopy - (2023-US-30MP-1472)

There's a strange place that sits

between the analytical tools that you would use to do

analysis with known physical models and with simple curves

and the analyses that you would do, say, with Functional Data Explorer,

where you have families of curves that have complex shapes,

and you're less interested in the actual physical nature of the shapes themselves

than you are in just relating them back to observed phenomena.

This strange

no man's land of analysis in JMP is where a lot of first-principles techniques sit.

Things like X-ray diffraction, things like HPLC,

where we have known physical methods and known equations that help us describe

very fundamental phenomena of a molecule or crystal or a system.

All we have to do is plug peak positions in

or area under the curve information in

and we can get some very sophisticated analyses out of fairly simple data points

because of these first-principles methods.

At first blush, it would seem like JMP should be able to handle that.

It seems like it's got all the tools, but

when we dig into doing those kinds of analyses, we suddenly realize that

the problem is a bit more complex than what we would expect.

Today what I want to do is focus on some techniques and strategies

to deal with some of those simpler cases

and then introduce some tools that we can use

to streamline those larger more complex problems.

Let's go ahead and let's move into JMP and have a look at that.

To start off, let's go ahead and have a look at a very simple case,

a single peak on a background.

How would we go ahead and pull the information out of this peak?

How would we get its center position?

How would we get its full width at half max or its standard deviation

or even the area under the curve?

How would we get that information?

Well, most of us that have done this for a while would say,

oh, you know what, I'm going to go into Fit Curve

and I'm going to say, here's my y, and I'm going to say, here's my x.

Then I'm going to go ahead and fit a peak model of some kind.

Let's just say the Gaussian peak, and you look at that and you go, hey,

98% R², that's awesome, that's great.

Let's see if we can do a little better.

Just to skip ahead just a little bit here

we could look at the Lorentzian peak shape and the Gaussian peak shape

and we can see that those both give fairly good R²,

they give fairly good peak fits.

We could even come into the values

underneath each and we can pull up the area under the curve for them.

But how good are those fits actually?

Let's take a look at them a different way.

What I want to do is pull up Graph Builder on these,

and we'll look at how the models relate to the residuals for those peaks.

We can see a very different story

than what we saw in Fit Curve with these two peak shapes.

We can see that there's a systematic error built into these peak shapes.

With the Gaussian,

we can see that it's underestimating at the center.

It's doing okay on the shoulders, but out in the tails it's really missing things.

We can see almost the inverse for the Lorentzian.
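
A rough way to reproduce this kind of residual check outside JMP is sketched below in Python, which is not the tool used in the talk. It assumes synthetic data drawn from a pseudo-Voigt shape with an equal Gaussian and Lorentzian mix, then fits a Gaussian by scanning the width and solving for the height in closed form. The function names and every numeric setting are illustrative assumptions, not the data or the fitting method from the demo.

```python
import math

def gaussian(x, height, center, sigma):
    """Gaussian peak parameterized by height, center, and standard deviation."""
    return height * math.exp(-0.5 * ((x - center) / sigma) ** 2)

def pseudo_voigt(x, height, center, fwhm, eta):
    """Weighted sum of a Lorentzian (weight eta) and a Gaussian sharing one FWHM."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    lorentz = 1.0 / (1.0 + ((x - center) / (fwhm / 2.0)) ** 2)
    gauss = math.exp(-0.5 * ((x - center) / sigma) ** 2)
    return height * (eta * lorentz + (1.0 - eta) * gauss)

# Synthetic "measured" peak: x from -10 to 10 in steps of 0.1 (assumed values).
xs = [i / 10.0 - 10.0 for i in range(201)]
ys = [pseudo_voigt(x, 1.0, 0.0, 2.0, 0.5) for x in xs]

def fit_gaussian(xs, ys):
    """Least-squares Gaussian centered at 0: scan sigma, solve height exactly."""
    best = None
    for k in range(30, 301):                      # sigma from 0.30 to 3.00
        sigma = k / 100.0
        shape = [gaussian(x, 1.0, 0.0, sigma) for x in xs]
        height = sum(s * y for s, y in zip(shape, ys)) / sum(s * s for s in shape)
        sse = sum((y - height * s) ** 2 for s, y in zip(shape, ys))
        if best is None or sse < best[0]:
            best = (sse, height, sigma)
    return best

sse, height, sigma = fit_gaussian(xs, ys)

# Residual = data minus model; positive means the model underestimates.
residuals = [y - gaussian(x, height, 0.0, sigma) for x, y in zip(xs, ys)]
center_residual = residuals[100]                  # x = 0
tail_residual = residuals[0]                      # x = -10
```

Plotting `residuals` against `xs` gives the same kind of picture as the Graph Builder view: structure in the residuals rather than flat scatter, even when the overall fit statistic looks good.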

Why is that?

Well, the truth is that in spectroscopy particularly,

there are a lot of different peak shapes.

It's not just Gaussian, it's not just Lorentzian.

There's actually a whole family of peak shapes that are out there

to handle all the different physical phenomena

that result in the peaks that we see in spectroscopy.

How do we deal with those in JMP?

Well, it's actually really quite easy.

Let's start by looking at

what the result of using the correct peak shape is.

Here I've got the Gaussian again, the residuals for the Gaussian peak fit

and the blue line in this case is no longer the Lorentzian.

It's called a pseudo-Voigt,

which is an approximation of a peak shape called a Voigt function.

Notice that the residuals for the Voigt function are dead flat.

We are actually doing much better.
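
For readers who want to see the shape written out, here is a minimal Python sketch of one common pseudo-Voigt form: a weighted sum of a Lorentzian and a Gaussian that share a center and a full width at half maximum. The parameter names (`height`, `center`, `fwhm`, `eta`) are assumptions chosen for illustration; the exact parameterization used in the talk's formula column is not shown in the transcript and may differ.

```python
import math

def gaussian(x, center, fwhm):
    """Unit-height Gaussian with the given full width at half maximum."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

def lorentzian(x, center, fwhm):
    """Unit-height Lorentzian with the given full width at half maximum."""
    hwhm = fwhm / 2.0
    return 1.0 / (1.0 + ((x - center) / hwhm) ** 2)

def pseudo_voigt(x, height, center, fwhm, eta):
    """Pseudo-Voigt: eta = 0 is pure Gaussian, eta = 1 is pure Lorentzian."""
    return height * (eta * lorentzian(x, center, fwhm)
                     + (1.0 - eta) * gaussian(x, center, fwhm))
```

Because both components share the same width, the mixed peak reaches half its height exactly `fwhm / 2` away from the center regardless of `eta`, which keeps the width parameter physically interpretable.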

Before, if we were to try to do

quantification with the Lorentzian or the Gaussian,

we would run into a situation where we might over or underestimate

the quantity of a material in a sample.

With the Voigt in this case, because this is a Voigt peak shape,

we're actually going to get the accurate quantification of that.

That's the important thing.

Now how did I do this?

Well, there's a few ways to do it.

The easiest is to

come into the data table and create a model.

The model is really easy to make.

This is the Voigt peak shape.

Looks a little scary, but that's just the nature of the math.

Here I've got a parameter for the baseline, and this whole mess

here is the Voigt peak shape.

We can come into the parameters settings

and define starting points for each of our values.

Then,

we're not going to use Fit Curve, we're going to come down to Nonlinear.

We can use that as a starting point for an analysis.

I'm going to expand intermediate formulas.

That's actually a good habit to get into in this case.

I did that wrong.

Let's go back and redo that.

Should be the counts.

There we go.

That looks better.

Now if I go ahead and click Go,

it does my peak fitting for me and everything.

That's great.

Can't get the area under the curve here very easily.

But I can get just about every other parameter that I need.

The nice thing about a lot of these peak shapes

is they also have well defined integrals.

Once you have the standard deviation and the mean and that kind of information,

you can usually get the integral fairly easily,

the area under the curve fairly easily.
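
As an illustration of what "well-defined integrals" buys you, the Python sketch below computes the area of a pseudo-Voigt peak directly from its fitted parameters. It assumes the height-and-FWHM parameterization (a Lorentzian and a Gaussian sharing one width, mixed by `eta`), which may not match the parameterization used in the demo; the function names are hypothetical.

```python
import math

def pseudo_voigt_area(height, fwhm, eta):
    """Closed-form area of a pseudo-Voigt given height, FWHM, and mixing eta.

    Gaussian part:   height * fwhm * sqrt(pi / ln 2) / 2
    Lorentzian part: height * fwhm * pi / 2
    """
    gauss_area = height * fwhm * 0.5 * math.sqrt(math.pi / math.log(2.0))
    lorentz_area = height * fwhm * 0.5 * math.pi
    return eta * lorentz_area + (1.0 - eta) * gauss_area

def trapezoid_area(func, lo, hi, steps=20000):
    """Plain trapezoid rule, used here only to sanity-check the closed form."""
    width = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, steps):
        total += func(lo + i * width)
    return total * width

def unit_gaussian(x, fwhm=2.0):
    """Unit-height Gaussian centered at 0, for the numerical check."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return math.exp(-0.5 * (x / sigma) ** 2)

closed_form = pseudo_voigt_area(1.0, 2.0, 0.0)        # pure Gaussian case
numerical = trapezoid_area(unit_gaussian, -20.0, 20.0)
```

The numerical check uses the pure Gaussian case because Lorentzian tails decay slowly, so any finite integration window noticeably undercounts their area. That is one more reason to prefer the closed form when it is available.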

That's one way of handling that.

But it introduces a lot of opportunities for error in this peak shape.

We've given ourselves a lot of potential problems.

What we really would like is something that looks a bit more like this,

where we've got a predefined function called the PseudoVoigt.

We give it all of our fitting parameters

and there's our fitting parameter for our baseline.

It's the same math,

but we've cloaked it in an easy-to-understand function where we are just

providing the parameters that we want to fit.

It works the same in Nonlinear.

How do I do that?

Well, there are a few things that we can do.

We can define it in a script, and there's a lot of code right here.

But the big things that we want to pay attention to

are the fact that we're defining a function,

that we're defining some parameters.

At the very bottom of this, this is a family of parameters.

I am using the Add Custom Functions operator

to put those into the memory of JMP, so that JMP knows that I've got

these custom functions and knows what they look like and knows how they behave.

Doing it that way provides some really powerful tools.

If I come into the Scripting Index,

once I've defined my functions, they show up in the Scripting Index.

I didn't really give a lot of descriptions here,

but you could give quite detailed descriptions and examples

here as you would like.

The other thing that we get, again coming back into our

model, is that when we define these functions,

we get our own entry in the Formula Editor,

which lets us just click on one of these and use them

just like we would any other function in the Formula Editor.

Again, these are actually quite easy to define.

The examples in the Scripting Index make it very easy to do.

Just search for Add Custom Functions

and you can just use the boilerplate there to build off of that.

There's also a great blog post on how to do that.

That's one answer to one question that we have.

Let's continue on and let's look at a different question,

maybe a slightly more complex problem.

What happens if we have two peaks?

So suddenly Fit Curve is no longer on the table.

We're going to have to use Fit Nonlinear,

and that also suggests how we might work with this.

We're going to basically have to break out

our equation, our model that we had before.

I break it out column by column

just to manage all of those bits and pieces that we saw before.

I have one for my baseline,

I have one for each of my peaks, and then I have one for my spectrum.

Let's have a look at what all those look like really quick.

Let's start with the baseline

because it's got a little bit of a gotcha that we have to worry about.

The baseline just has the fit parameter for the baseline,

but it also has this x term times zero.

That's because Nonlinear expects every equation that goes into a formula

to tie back to the x axis that you're providing.

We put x times zero in there just to have it be okay with plotting that.

That's just a little gotcha that you have to deal with.

That's one piece, peak 1 looks just like we would expect with its parameters.

Peak 2 looks just the same,

except it's got different parameter names so we don't have any collisions.

Peak 1 was b1, b2, b3, b4 and peak 2 is b5, b6, b7, b8.

That's the only thing we have to do.

Then the spectrum itself, the thing that we're going to fit,

the thing that we're going to put into the Fit Nonlinear platform,

is just my baseline curve plus my peak 1 plus my peak 2.
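
The column-by-column layout can be pictured with a short Python sketch: one function for the baseline, one reusable peak function called twice with distinct parameter names, and a spectrum that simply sums them. The parameter names (`b0` for the baseline, `b1` to `b8` for the peaks) and the example values are assumptions for illustration; the pseudo-Voigt form is the same assumed height-and-FWHM parameterization, not necessarily the one in the demo table.

```python
import math

def pseudo_voigt(x, height, center, fwhm, eta):
    """Lorentzian/Gaussian mix sharing one FWHM (assumed parameterization)."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    lorentz = 1.0 / (1.0 + ((x - center) / (fwhm / 2.0)) ** 2)
    gauss = math.exp(-0.5 * ((x - center) / sigma) ** 2)
    return height * (eta * lorentz + (1.0 - eta) * gauss)

def baseline(x, p):
    """Constant baseline; x is unused because the offset does not depend on it."""
    return p["b0"]

def peak_1(x, p):
    """First peak, using parameters b1..b4."""
    return pseudo_voigt(x, p["b1"], p["b2"], p["b3"], p["b4"])

def peak_2(x, p):
    """Second peak, using parameters b5..b8 so nothing collides with peak 1."""
    return pseudo_voigt(x, p["b5"], p["b6"], p["b7"], p["b8"])

def spectrum(x, p):
    """The model handed to the fitter: baseline plus every peak."""
    return baseline(x, p) + peak_1(x, p) + peak_2(x, p)

# Hypothetical starting values, playing the role of the parameter settings.
example_params = {
    "b0": 0.5,
    "b1": 5.0, "b2": 10.0, "b3": 1.0, "b4": 0.3,
    "b5": 3.0, "b6": 12.0, "b7": 1.5, "b8": 0.6,
}
value_at_first_peak = spectrum(10.0, example_params)
```

Keeping every parameter name unique is the whole trick: the fitter sees one flat list of parameters, and each peak only reads its own four.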

Just like I showed you before, doing that in Fit Nonlinear,

here's my spectrum that goes into the prediction equation.

I'm going to remember to put my counts in and not my x curve.

Just like I said before, I'm going to expand my intermediate

formulas and that's going to tell JMP to dig back in from that first formula

into all the formulas that are in the individual columns.

We click OK and, hey, we see what we expect to see.

Now we can click Go

and it goes through and fits everything just like we would expect.

We get a nice fit and we have the ability to get confidence

intervals and everything else we'd like off of that.

Two peaks is reasonable and possible.

But the problem that we run into is

what happens when we have something that looks like this.

At a rough count, there's probably a dozen peaks there

plus a complex baseline that's not actually a straight line that's

probably got some parabolic behavior to it.

We've got a complex baseline, we've got multiple peaks.

We're going to have to make one formula for each of those.

There's a lot of legwork to build something like this.

If you get into X-ray diffraction, the problem gets even worse.

There's comfortably 30, 40 peaks in this spectrum right here

that we would have to work with.

The first question that we need to ask is,

can Nonlinear handle that kind of problem?

Well, it turns out that it can.

If we just use Nonlinear, I'm going to do something wild and crazy.

I've got it fitting a Lorentzian peak

and I'm going to come back and I'm going to actually have it fit it in real time.

You can watch that as it goes through.

It nails each peak in near real time as I move through this quite quickly.

It's hitting the big peak in each group.

That says that the fit engine

can probably handle the processing that we're dealing with.

That's fine. This really becomes more of

a problem of logistics than a problem of actual functionality within JMP.

It really is a real problem.

Let's just say we're looking at fitting Voigt peak shapes,

and we could talk about Lorentzian and we could talk about Gaussian,

we could talk about the Pearson VII, all those different types of peak shapes.

But the Voigt peak shape has five parameters:

the x axis and then the four fit parameters.

That roughly equates to about six mouse clicks per peak.

Even if you're doing it in a single formula, it's six mouse clicks per peak.

That says that for a ten-peak formula, for a ten-peak spectrum,

we're going to have to do 88 mouse clicks.

However long that takes you per mouse click is dependent on many, many factors.

But if we were to do something like that X-ray diffraction pattern,

we're talking in the range of 300 mouse clicks.

If it's actually up around 40 peaks, it's around 300 mouse clicks.

That's a lot of clicking around that we don't want to have to do.

We would like our interaction with the spectra

to be something along the lines of one click per peak.

That suggests that we need some automation built in.

Let's have a look at how I've done that.

I've built a tool to handle this.

I've actually taken a number of different solutions here.

First off, let's look at the library of peaks that I've generated.

Spectris,

the title of this talk, includes a number of different peak shapes.

We include a family of Gaussian peaks, including a split Gaussian

that gives you a different standard deviation

for one side of the peak than for the other.

The same with the Lorentzians, the Pearsons, and then the pseudo-Voigts.

These all also have versions that are

tuned to give you the area instead of the intensity as a fit parameter.

That's the area term in all of these.

That's one piece.

When we load the Spectris add-in, we get that for free.

That's automatic.

Now let's look at the other challenge.

Let's take that olive oil spectrum.

What we really want is a tool

where we can come in and say, here's my X-axis, here's my Y-axis.

I just want to be able to do some peak finding.

Here's my four main peaks.

It found them automatically.

Maybe I want to do a first derivative or maybe I want to do a quantile.

I can also remove the background here, so I can click Finished.

It's found those first three peaks for me.
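
The transcript does not say how the add-in locates peaks, so the following Python sketch is only a generic illustration of first-derivative peak finding, not the Spectris algorithm. It flags a point as a peak when the finite-difference slope changes from positive to non-positive and the point clears a minimum height; the function name, the threshold, and the synthetic data are all assumptions.

```python
import math

def find_peaks(xs, ys, min_height):
    """Return (x, y) pairs where the slope flips from rising to falling.

    min_height screens out tiny bumps in the baseline that would otherwise
    register as peaks.
    """
    peaks = []
    for i in range(1, len(ys) - 1):
        rising = ys[i] - ys[i - 1] > 0
        falling = ys[i + 1] - ys[i] <= 0
        if rising and falling and ys[i] >= min_height:
            peaks.append((xs[i], ys[i]))
    return peaks

# Synthetic spectrum: flat baseline of 1 plus two Gaussian peaks at x = 5 and 12.
xs = [i / 10.0 for i in range(201)]
ys = [1.0
      + 4.0 * math.exp(-0.5 * ((x - 5.0) / 0.5) ** 2)
      + 2.0 * math.exp(-0.5 * ((x - 12.0) / 0.8) ** 2)
      for x in xs]

found = find_peaks(xs, ys, min_height=1.5)
peak_positions = [x for x, _ in found]
```

On real, noisy spectra a raw derivative test like this tends to over-report, which is presumably why an interactive tool pairs automatic finding with manual selection and a background step.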

I'm going to go ahead and change my background to a linear one.

Now I can come in also and do some manual peak selection.

Behind the scenes,

it's taking care of writing all of those peak parameters for you

so that everything's nice and tidy.

There's probably one right there.

Probably one right there. There's one right there.

Every time you add a peak, you can come in and select the peak

in the list of peaks, and it'll give you the information calculated at that time.

You can see right here, these peaks are not well defined.

They're not fitting the data very well.

Really we want to go over into Nonlinear.

I've taken and hacked Nonlinear

so that it will run this in real time and look nice and pretty.

You can watch the peak shapes changing.

Realistically, I might have chosen a quadratic instead of a linear for this,

but just for the sake of interest.

Here, I've run out of iterations.

I'll increase the iterations and I'll also back off just a touch

on my gradient so that I can try and get this thing to converge a little quicker.

Okay, we'll take that as good enough for the moment.

We can say that we want to accept the fit,

and there's my fit parameters.

Then I can say done.

It brings it back over into Spectris for me to work with.

I can now say, refine my AUC parameters

and I can come in and get my new approximate area under the curve.

That's great and grand, but what I really want is an output table

that has all those parameters and their information attached to them.

That's Spectris in a nutshell.

The goal with this project, like I said before,

was to have the ability to handle physical peaks, multiple peaks,

with an easy-to-use interface that handles those curves

where we need the area under the curve,

the physical parameters attached to each peak.

But either we really don't have enough data to use in Fit Model

or in Functional Data Explorer, or

it's just not the kind of problem where we want to work with that particular tool.

The tool is up again.

The QR code here will take you to the add-in on the community

where you can work with it.

Spectris is up now and ready to go.