
Spectris: Spooky Spectroscopy - (2023-US-30MP-1472)

There is a no-man's-land in JMP, a place where it is scary to venture. It is inhabited by specters of data sets too complex for simple nonlinear regression but too ephemeral for functional data analysis. It is a strange place. Basic methods appear to be enough to venture here, but those techniques quickly become unmanageable. These spectra can come from many sources, but they all share a common problem: there is too much data for simple nonlinear regression and too little to use functional methods. Join me on a journey across this challenging landscape of classical spectroscopy to learn methods for extracting information from complex spectra and how to automate the process.

 

 

There's a strange place that sits between the analytical tools you would use for analysis with known physical models and simple curves, and the analyses you would do with, say, Functional Data Explorer, where you have families of curves with complex shapes and you're less interested in the physical nature of the shapes themselves than in relating them back to observed phenomena.

This strange no-man's-land of analysis in JMP is where a lot of first-principles techniques sit: things like X-ray diffraction and HPLC, where we have known physical methods and known equations that help us describe very fundamental phenomena of a molecule, a crystal, or a system. All we have to do is plug in peak positions or area-under-the-curve information, and because of these first-principles methods, we can get some very sophisticated analyses out of fairly simple data points.

At first blush, it would seem like JMP should be able to handle that. It seems to have all the tools, but when we dig into doing those kinds of analyses, we suddenly realize that the problem is a bit more complex than we would expect.

Today I want to focus on some techniques and strategies for dealing with the simpler cases, and then introduce some tools we can use to streamline the larger, more complex problems. Let's move into JMP and have a look.

To start off, let's look at a very simple case: a single peak on a background. How would we pull the information out of this peak? How would we get its center position? How would we get its full width at half maximum (FWHM), its standard deviation, or even the area under the curve?
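For a single Gaussian peak, those quantities are linked by closed-form relations, so once a fit gives you the height, center, and standard deviation, the FWHM and area follow directly. The sketch below is a plain Python illustration (not JMP or JSL). It assumes the background has already been removed, and the function names are hypothetical.

```python
import math

# For a Gaussian, FWHM = 2 * sqrt(2 * ln 2) * sigma, roughly 2.3548 * sigma
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))


def gaussian(x, height, center, sigma):
    """Gaussian peak with a given height (maximum value), center, and standard deviation."""
    return height * math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def gaussian_summary(height, center, sigma):
    """Center, standard deviation, FWHM, and area of a background-free Gaussian peak."""
    return {
        "center": center,
        "sigma": sigma,
        "fwhm": FWHM_PER_SIGMA * sigma,
        # Closed-form integral of the Gaussian over the whole real line
        "area": height * sigma * math.sqrt(2.0 * math.pi),
    }


summary = gaussian_summary(height=100.0, center=40.0, sigma=2.0)
print(summary)
# The curve drops to half its height exactly half an FWHM from the center
print(gaussian(40.0 + summary["fwhm"] / 2.0, 100.0, 40.0, 2.0))
```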

Most of us who have done this for a while would say, oh, I'm going to go into Fit Curve, set my Y and my X, and fit a peak model of some kind. Let's say a Gaussian peak. You look at that and go, hey, 98% R², that's awesome. Let's see if we can do a little better.

Skipping ahead a little, we could look at the Lorentzian peak shape and the Gaussian peak shape and see that both give fairly good R² values and fairly good peak fits. We could even open the values underneath each one and pull up the area under the curve.

But how good are those fits, actually? Let's look at them a different way. I'll pull up Graph Builder and look at how the models relate to the residuals for those peaks. Now we see a very different story from what we saw in Fit Curve with these two peak shapes: there's a systematic error built into both of them. With the Gaussian, it's underestimating at the center. It's doing okay on the shoulders, but out in the tails it's really missing things. We see almost the inverse for the Lorentzian.
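The opposite residual patterns follow from the shapes themselves. At the same height and FWHM, a Gaussian sits above a Lorentzian inside the half width and far below it in the tails. The sketch below is an illustration, not a least-squares fit. It compares both shapes, at matched height and FWHM, against a synthetic peak that is a 50/50 mix of the two (an assumption standing in for real data). That shows the shoulder and tail pattern, but not the center behavior you get after an actual fit.

```python
import math

LN2 = math.log(2.0)


def gaussian(x, height, center, fwhm):
    """Gaussian peak parameterized by height and FWHM."""
    return height * math.exp(-4.0 * LN2 * ((x - center) / fwhm) ** 2)


def lorentzian(x, height, center, fwhm):
    """Lorentzian peak parameterized by height and FWHM."""
    return height / (1.0 + 4.0 * ((x - center) / fwhm) ** 2)


def mixed_peak(x, height, center, fwhm, eta=0.5):
    """Synthetic 'measured' peak that is part Lorentzian, part Gaussian."""
    return (eta * lorentzian(x, height, center, fwhm)
            + (1.0 - eta) * gaussian(x, height, center, fwhm))


def residuals(offset_in_fwhm, height=1.0, center=0.0, fwhm=1.0):
    """Data minus model for a Gaussian and a Lorentzian with the same height and FWHM."""
    x = center + offset_in_fwhm * fwhm
    data = mixed_peak(x, height, center, fwhm)
    return data - gaussian(x, height, center, fwhm), data - lorentzian(x, height, center, fwhm)


for d in (0.25, 0.4, 1.0, 2.0):  # inside the half width, then out in the tails
    res_g, res_l = residuals(d)
    print(f"offset {d:4.2f} FWHM: Gaussian {res_g:+.4f}  Lorentzian {res_l:+.4f}")
```

The signs flip between the two shapes. The Gaussian overshoots inside the half width and falls short in the tails, and the Lorentzian does the reverse.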

Why is that? The truth is that, in spectroscopy in particular, there are a lot of different peak shapes. It's not just Gaussian, and it's not just Lorentzian. There's a whole family of peak shapes out there to handle all the different physical phenomena that produce the peaks we see in spectroscopy. How do we deal with those in JMP? It's actually quite easy. Let's start by looking at the results of using the correct peak shape.

Here I've got the Gaussian again, the residuals for the Gaussian peak fit, and the blue line in this case is no longer the Lorentzian. It's called a pseudo-Voigt, which is an approximation of a peak shape called the Voigt function.
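A common way to write a pseudo-Voigt is as a weighted sum of a Lorentzian and a Gaussian that share the same height and FWHM, with a mixing fraction eta. The sketch below also includes the widely used Thompson, Cox, and Hastings approximation, which maps the Gaussian and Lorentzian widths of a true Voigt profile to the pseudo-Voigt FWHM and eta. It is an assumption that the formula shown in the talk uses exactly this parameterization, and the function names are hypothetical.

```python
import math

LN2 = math.log(2.0)


def pseudo_voigt(x, height, center, fwhm, eta):
    """Pseudo-Voigt: linear mix of a Lorentzian and a Gaussian sharing one height and FWHM.

    eta = 0 gives a pure Gaussian; eta = 1 gives a pure Lorentzian.
    """
    u = (x - center) / fwhm
    g = math.exp(-4.0 * LN2 * u * u)
    l = 1.0 / (1.0 + 4.0 * u * u)
    return height * (eta * l + (1.0 - eta) * g)


def tch_fwhm_eta(fwhm_gauss, fwhm_lorentz):
    """Thompson-Cox-Hastings mapping from Voigt component widths to pseudo-Voigt (fwhm, eta)."""
    fg, fl = fwhm_gauss, fwhm_lorentz
    f = (fg**5 + 2.69269 * fg**4 * fl + 2.42843 * fg**3 * fl**2
         + 4.47163 * fg**2 * fl**3 + 0.07842 * fg * fl**4 + fl**5) ** 0.2
    r = fl / f
    eta = 1.36603 * r - 0.47719 * r**2 + 0.11116 * r**3
    return f, eta


fwhm, eta = tch_fwhm_eta(fwhm_gauss=1.0, fwhm_lorentz=0.5)
print(fwhm, eta, pseudo_voigt(0.0, 10.0, 0.0, fwhm, eta))
```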

Notice that the residuals for the Voigt function are dead flat. We're doing much better. Before, if we had tried to do quantification with the Lorentzian or the Gaussian, we could have over- or underestimated the quantity of a material in a sample. With the Voigt, because this actually is a Voigt peak shape, we're going to get an accurate quantification. That's the important thing.

Now, how did I do this? There are a few ways. The easiest is to go into the data table and create a model column. The model is really easy to make. This is the Voigt peak shape. It looks a little scary, but that's just the nature of the math. Here I've got a parameter for the baseline, and this whole mess is the Voigt peak shape. We can go into the parameter settings and define starting points for each of our values.
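Nonlinear fits are sensitive to starting values. The sketch below shows one way to derive them from the data for a baseline-plus-single-peak model. It assumes a roughly flat baseline and one dominant peak, and `starting_values` is a hypothetical helper, not part of JMP.

```python
import math


def starting_values(xs, ys):
    """Heuristic starting guesses for a baseline + single-peak model.

    baseline: smallest observed y
    height:   tallest point above that baseline
    center:   x at the tallest point
    fwhm:     span of the points that are at or above half height
    """
    baseline = min(ys)
    i_max = max(range(len(ys)), key=lambda i: ys[i])
    height = ys[i_max] - baseline
    half = baseline + height / 2.0
    above = [x for x, y in zip(xs, ys) if y >= half]
    return {
        "baseline": baseline,
        "height": height,
        "center": xs[i_max],
        "fwhm": max(above) - min(above),
    }


# Synthetic single peak: baseline 5, height 100, center 40, sigma 2 (true FWHM about 4.71)
xs = [30.0 + 0.1 * i for i in range(201)]
ys = [5.0 + 100.0 * math.exp(-((x - 40.0) ** 2) / 8.0) for x in xs]
print(starting_values(xs, ys))
```

The FWHM guess is only as fine as the x spacing, which is fine for a starting point that the fit will refine.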

Then, instead of using Fit Curve, we go down to Nonlinear and use that column as the starting point for the analysis. I'm going to check Expand Intermediate Formulas; that's a good habit to get into in this case. I did that wrong, so let's go back and redo it: Y should be the counts. There we go, that looks better. Now if I click Go, it does my peak fitting for me. That's great. I can't get the area under the curve here very easily, but I can get just about every other parameter I need.

The nice thing about a lot of these peak shapes is that they also have well-defined integrals. Once you have the standard deviation, the mean, and the other fitted parameters, you can usually get the integral, the area under the curve, fairly easily.
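The sketch below writes out those closed forms, assuming the height-and-FWHM parameterization used in the earlier sketches. Because a pseudo-Voigt is a linear mix of a Lorentzian and a Gaussian, its area is the same mix of their areas. The trapezoid helper is only a numerical cross-check, and all names are hypothetical.

```python
import math

LN2 = math.log(2.0)


def gaussian_area(height, fwhm):
    # sigma = fwhm / (2 sqrt(2 ln 2)), area = height * sigma * sqrt(2 pi)
    return height * fwhm * math.sqrt(math.pi / (4.0 * LN2))


def lorentzian_area(height, fwhm):
    # Integral of height / (1 + 4 (x / fwhm)^2) over the real line
    return height * math.pi * fwhm / 2.0


def pseudo_voigt_area(height, fwhm, eta):
    # A pseudo-Voigt is a linear mix of the two shapes, so its area is the same mix
    return eta * lorentzian_area(height, fwhm) + (1.0 - eta) * gaussian_area(height, fwhm)


def trapezoid(f, a, b, n=200_000):
    """Composite trapezoid rule on [a, b], used here only as a numerical cross-check."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))


print(pseudo_voigt_area(height=10.0, fwhm=0.8, eta=0.3))
# Cross-check the Gaussian formula against direct numerical integration
print(trapezoid(lambda x: math.exp(-4.0 * LN2 * x * x), -10.0, 10.0), gaussian_area(1.0, 1.0))
```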

That's one way of handling it. But writing out the whole peak shape introduces a lot of opportunities for error; we've given ourselves a lot of potential problems. What we'd really like is something that looks more like this, where we've got a predefined function called PseudoVoigt. We give it all of our fitting parameters, and there's our fitting parameter for the baseline. It's the same math, but we've cloaked it in an easy-to-understand function where we just provide the parameters we want to fit. It works the same way in Nonlinear.

How do I do that? There are a few things we can do. We can define them in code, and there's a lot of code right here. But the big things to pay attention to are that we're defining a function and that we're defining some parameters. At the very bottom, for this whole family of functions, I'm using the Add Custom Functions operator to put them into JMP's memory, so that JMP knows I've got these custom functions, what they look like, and how they behave.
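Add Custom Functions is specific to JMP, so the sketch below illustrates only the underlying idea, in Python: write the math once behind a named function, register it with a description, and let model expressions call it by name with just the fit parameters. The registry and decorator are hypothetical. This is not JSL and not JMP's API.

```python
import math
from typing import Callable, Dict

# Hypothetical registry of named peak shapes, standing in for "functions the host knows about"
PEAK_FUNCTIONS: Dict[str, Callable[..., float]] = {}


def peak_function(name: str):
    """Decorator that registers a peak shape under a public name."""
    def register(func: Callable[..., float]) -> Callable[..., float]:
        PEAK_FUNCTIONS[name] = func
        return func
    return register


@peak_function("Gaussian")
def gaussian(x, height, center, fwhm):
    """Gaussian peak with the given height and full width at half maximum."""
    return height * math.exp(-4.0 * math.log(2.0) * ((x - center) / fwhm) ** 2)


@peak_function("Lorentzian")
def lorentzian(x, height, center, fwhm):
    """Lorentzian peak with the given height and full width at half maximum."""
    return height / (1.0 + 4.0 * ((x - center) / fwhm) ** 2)


def describe_functions() -> Dict[str, str]:
    """Rough analogue of browsing an index: each registered name with its description."""
    return {name: func.__doc__ or "" for name, func in PEAK_FUNCTIONS.items()}


def model(x, baseline=2.0):
    """A model expression that calls a shape by name, exposing only fit parameters."""
    return baseline + PEAK_FUNCTIONS["Gaussian"](x, 10.0, 5.0, 1.0)


print(describe_functions())
print(model(5.0))  # baseline plus the peak height at the center
```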

Doing it that way provides some really powerful tools. Once I've defined my functions, they show up in the Scripting Index. I didn't give a lot of description here, but you could give quite detailed descriptions and examples if you like. The other thing we get, coming back to our model formula, is that when we define these functions, we get our own entry in the Formula Editor, which lets us click one of these and use it just like any other function in the Formula Editor. Again, these are quite easy to define. The examples in the Scripting Index make it very easy: just search for Add Custom Functions and use the boilerplate there as a starting point. There's also a great blog post on how to do that.

That's one answer to one question. Let's continue and look at a different question, a slightly more complex problem.

What happens if we have two peaks? Suddenly Fit Curve is no longer on the table. We're going to have to use Fit Nonlinear, and that also suggests how we might work with this: we basically have to break out our equation, the model we had before. I break it out column by column just to manage all of the bits and pieces we saw before. I have one column for my baseline, one for each of my peaks, and one for my spectrum.

Let's have a quick look at what those look like. Let's start with the baseline, because it has a little gotcha we have to worry about. The baseline just has the fit parameter for the baseline, but it also has this x term times zero. That's because Nonlinear expects every formula that goes into the model to tie back to the X axis you're providing. We put x times zero in there just so that it's okay with plotting. That's just a little gotcha you have to deal with.

That's one piece. Peak 1 looks just as we would expect, with its parameters. Peak 2 looks the same, except it has different parameter names so we don't have any collisions: peak 1 uses parameters numbered 1 through 4, and peak 2 uses 5 through 8. That's the only thing we have to do. Then the spectrum itself, the thing we're going to fit and put into the Fit Nonlinear platform, is just my baseline curve plus peak 1 plus peak 2.
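The same column-by-column structure can be mirrored in code: one function per component, each reading only its own uniquely named parameters, plus a spectrum function that adds them up. Below is a minimal Python sketch that uses Gaussian peaks for brevity, with hypothetical parameter names like `p1_center`. The `x * 0` trick is specific to JMP's Nonlinear platform and isn't needed here.

```python
import math

LN2 = math.log(2.0)


def baseline(x, params):
    """Constant baseline; Python doesn't require the x * 0 term JMP Nonlinear expects."""
    return params["baseline"]


def gaussian_peak(prefix):
    """Build a Gaussian component that reads only its own prefixed parameters."""
    def peak(x, params):
        height = params[prefix + "height"]
        center = params[prefix + "center"]
        fwhm = params[prefix + "fwhm"]
        return height * math.exp(-4.0 * LN2 * ((x - center) / fwhm) ** 2)
    return peak


def spectrum(x, params, components):
    """The curve being fitted: baseline plus every peak component."""
    return sum(component(x, params) for component in components)


components = [baseline, gaussian_peak("p1_"), gaussian_peak("p2_")]
params = {
    "baseline": 1.0,
    "p1_height": 10.0, "p1_center": 3.0, "p1_fwhm": 0.5,
    "p2_height": 4.0, "p2_center": 6.0, "p2_fwhm": 0.8,
}
print(spectrum(3.0, params, components))  # about 11: baseline + peak 1 at its center
```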

Just as I showed you before, in Fit Nonlinear my spectrum column goes into the prediction equation, and I'll remember to put my counts in as Y and not my x column. As before, I'm going to check Expand Intermediate Formulas, which tells JMP to dig back from that first formula into all the formulas in the individual columns. We click OK, and we see what we expect to see. Now we can click Go, and it fits everything just as we would expect. We get a nice fit, and we can get confidence intervals and everything else we'd like from it. Two peaks is reasonable and possible.

But the problem we run into is what happens when we have something that looks like this. At a rough count, there are probably a dozen peaks there, plus a complex baseline that's not actually a straight line and probably has some parabolic behavior to it. With a complex baseline and multiple peaks, we're going to have to make one formula for each of those, and there's a lot of legwork to build something like this. If you get into X-ray diffraction, the problem gets even worse: there are comfortably 30 or 40 peaks in this spectrum right here that we would have to work with.

The first question we need to ask is, can Nonlinear handle that kind of problem? It turns out that it can. I'm going to do something wild and crazy: I've got it fitting a Lorentzian peak, and I'm going to have it fit in real time. You can watch as it goes through. It nails each peak in near real time as I move through this quite quickly, hitting the big peak in each group. That says the fitting engine can probably handle the processing we're dealing with. So this really becomes more a problem of logistics than a problem of actual functionality within JMP.

And it really is a real problem. Let's say we're fitting Voigt peak shapes; we could also talk about Lorentzian, Gaussian, Pearson VII, and all those other types of peak shapes. The Voigt peak shape has five parameters: the x axis and then the four fit parameters. That roughly equates to about six mouse clicks per peak, even if you're doing it in a single formula. That says that for a ten-peak spectrum, we're going to have to do 88 mouse clicks. How long that takes you per click depends on many, many factors. But for something like that X-ray diffraction pattern, with around 40 peaks, we're talking about roughly 300 mouse clicks. That's a lot of clicking around that we don't want to have to do.

We would like our interaction with the spectra to be something along the lines of one click per peak. That suggests we need some automation built in.
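One way to get close to one click per peak is to let each click generate that peak's whole parameter block, with unique names and data-driven starting values. The sketch below rests on stated assumptions: `params_from_clicks`, the `p1_`-style names, and the default width and mixing fraction are all hypothetical, and this is not how Spectris is implemented internally.

```python
import math


def params_from_clicks(clicks, xs, ys, default_fwhm=1.0):
    """Turn one click per peak into a full, collision-free pseudo-Voigt parameter set.

    Each clicked x position snaps to the nearest data point, which supplies the
    starting center and height; width and mixing fraction start at defaults.
    """
    params = {"baseline": min(ys)}
    for k, x_click in enumerate(clicks, start=1):
        i = min(range(len(xs)), key=lambda j: abs(xs[j] - x_click))
        params[f"p{k}_center"] = xs[i]
        params[f"p{k}_height"] = ys[i] - params["baseline"]
        params[f"p{k}_fwhm"] = default_fwhm
        params[f"p{k}_eta"] = 0.5  # start halfway between Gaussian and Lorentzian
    return params


xs = [0.5 * i for i in range(21)]
ys = [1.0 + 5.0 * math.exp(-((x - 2.0) ** 2)) + 3.0 * math.exp(-((x - 7.5) ** 2)) for x in xs]
params = params_from_clicks([2.1, 7.4], xs, ys)
print(len(params), params)  # 2 clicks -> 1 baseline + 4 parameters per peak
```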

Let's have a look at how I've done that. I've built a tool to handle this, and I've actually taken a number of different approaches in it.

First off, let's look at the library of peaks I've generated in Spectris. It includes a number of different peak shapes. There's a family of Gaussian peaks, including a split Gaussian that gives you a different standard deviation on one side of the peak than on the other. The same goes for the Lorentzians, the Pearsons, and the pseudo-Voigts. These all also have versions tuned to give you the area instead of the intensity as a fit parameter; that's the area term in all of these.
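Two of the library shapes mentioned here are easy to sketch. A split Gaussian uses a different standard deviation on each side of the center. A Pearson VII interpolates between a Lorentzian (m = 1) and a Gaussian (large m). The area-parameterized variant simply rescales the height so that the area itself becomes the fit parameter. These Python functions are illustrations with hypothetical names, not the add-in's code.

```python
import math


def split_gaussian(x, height, center, sigma_left, sigma_right):
    """Gaussian with a different standard deviation on each side of the center."""
    sigma = sigma_left if x < center else sigma_right
    return height * math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def split_gaussian_area(height, sigma_left, sigma_right):
    # Each side contributes half of a full Gaussian area with its own sigma
    return height * math.sqrt(2.0 * math.pi) * (sigma_left + sigma_right) / 2.0


def pearson7(x, height, center, fwhm, m):
    """Pearson VII: m = 1 is a Lorentzian; as m grows it approaches a Gaussian."""
    k = 4.0 * (2.0 ** (1.0 / m) - 1.0) / fwhm ** 2
    return height * (1.0 + k * (x - center) ** 2) ** (-m)


def pearson7_area_form(x, area, center, fwhm, m):
    """Same shape with the area as the fit parameter instead of the height (requires m > 0.5)."""
    k = 4.0 * (2.0 ** (1.0 / m) - 1.0) / fwhm ** 2
    unit_height_area = math.sqrt(math.pi / k) * math.gamma(m - 0.5) / math.gamma(m)
    return pearson7(x, area / unit_height_area, center, fwhm, m)


print(pearson7(1.0, 8.0, 0.0, 2.0, 3.0))  # half of the height at half an FWHM from center
```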

That's one piece. When we load the Spectris add-in, we get that for free; it's automatic. Now let's look at the other challenge. Let's take that olive oil spectrum.

What we really want is a tool where we can come in and say, here's my X axis, here's my Y axis, and I just want to do some peak finding. Here are my four main peaks; it found them automatically. Maybe I want to use a first derivative, or maybe I want to use a quantile.
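Here is a bare-bones version of derivative-based peak finding with a quantile cutoff. A point counts as a peak where the finite-difference first derivative changes sign from positive to non-positive, and it's kept only if its intensity is at or above a chosen quantile of all intensities. This sketches the general technique under that assumption; it is not the add-in's algorithm. Real spectra usually need smoothing first, so that noise doesn't create spurious sign changes.

```python
import math


def find_peaks(xs, ys, quantile=0.9):
    """x positions of local maxima whose intensity is at or above the given quantile of ys."""
    # Finite-difference first derivative between neighboring points
    dy = [ys[i + 1] - ys[i] for i in range(len(ys) - 1)]
    ranked = sorted(ys)
    threshold = ranked[min(len(ranked) - 1, int(quantile * len(ranked)))]
    peaks = []
    for i in range(1, len(dy)):
        # Rising into point i and not rising out of it: a local maximum
        if dy[i - 1] > 0 and dy[i] <= 0 and ys[i] >= threshold:
            peaks.append(xs[i])
    return peaks


xs = [0.1 * i for i in range(101)]
ys = [math.exp(-((x - 3.0) ** 2) / 0.1) + 0.6 * math.exp(-((x - 7.0) ** 2) / 0.1) for x in xs]
print(find_peaks(xs, ys))  # two peaks, near x = 3 and x = 7
```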

I can also remove the background here, and then click Finished. It's found those first three peaks for me. I'm going to change my background to a linear one.
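The sketch below shows minimal linear background removal. It assumes the first and last few points of the spectrum are peak-free, fits a straight line through those edge points by ordinary least squares, and subtracts it. The helper names and the edge-point choice are assumptions, and Spectris may estimate its background differently. A quadratic background would follow the same pattern with one more term.

```python
import math


def linear_background(xs, ys, n_edge=5):
    """Least-squares line through the first and last n_edge points (assumed peak-free)."""
    idx = list(range(n_edge)) + list(range(len(xs) - n_edge, len(xs)))
    bx = [xs[i] for i in idx]
    by = [ys[i] for i in idx]
    mean_x = sum(bx) / len(bx)
    mean_y = sum(by) / len(by)
    sxx = sum((x - mean_x) ** 2 for x in bx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(bx, by))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def subtract_background(xs, ys, slope, intercept):
    """Data with the fitted straight-line background removed."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


xs = [0.1 * i for i in range(101)]
ys = [2.0 + 0.5 * x + 10.0 * math.exp(-((x - 5.0) ** 2) / 0.2) for x in xs]
slope, intercept = linear_background(xs, ys)
corrected = subtract_background(xs, ys, slope, intercept)
print(slope, intercept, max(corrected))  # about 0.5, 2.0, and a peak height of about 10
```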

Now I can also come in and do some manual peak selection. Behind the scenes, it takes care of writing all of those peak parameters for you so that everything stays nice and tidy. There's probably one right there. Probably one right there. There's one right there. Every time you add a peak, you can select it in the list of peaks, and it gives you the information calculated at that time. You can see right here that these peaks are not well defined; they're not fitting the data very well.

Really, we want to go over into Nonlinear. I've hacked Nonlinear so that it runs this in real time and looks nice and pretty. You can watch the peak shapes changing. Realistically, I might have chosen a quadratic background instead of a linear one for this, but this is just for the sake of interest. Here, I've run out of iterations. I'll increase the iteration limit and also back off just a touch on my gradient so that I can try to get this thing to converge a little quicker. Okay, we'll take that as good enough for the moment. We can say that we want to accept the fit, and there are my fit parameters. Then I can click Done.

It brings the results back into Spectris for me to work with. I can now refine my AUC parameters and get my new approximate area under the curve.
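Once a fit is accepted, you can approximate each fitted component's area by evaluating it on the measured x grid and applying the trapezoid rule. The sketch assumes a dict of named component functions and hypothetical fitted parameters. It is not Spectris's refinement step. Areas computed this way are truncated to the measured range, which can matter for heavy-tailed shapes like the Lorentzian.

```python
import math

LN2 = math.log(2.0)


def trapezoid_area(xs, ys):
    """Trapezoid-rule area under sampled points; xs must be increasing."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0 for i in range(len(xs) - 1))


def gaussian(x, height, center, fwhm):
    """Gaussian peak parameterized by height and FWHM."""
    return height * math.exp(-4.0 * LN2 * ((x - center) / fwhm) ** 2)


def component_areas(xs, params, components):
    """Approximate AUC of each fitted component over the measured x range."""
    return {name: trapezoid_area(xs, [f(x, params) for x in xs])
            for name, f in components.items()}


# Hypothetical fitted (height, center, fwhm) for two peaks, baseline already excluded
fitted = {"p1": (2.0, 3.0, 0.5), "p2": (1.0, 6.5, 1.2)}
components = {
    "p1": lambda x, p: gaussian(x, *p["p1"]),
    "p2": lambda x, p: gaussian(x, *p["p2"]),
}
xs = [0.01 * i for i in range(1001)]
print(component_areas(xs, fitted, components))
```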

That's great and grand, but what I really want is an output table that has all those parameters and their information attached to them. That's Spectris in a nutshell.

The goal with this project was, as I said before, to handle physical peaks and multiple peaks with an easy-to-use interface, for those curves where we need the area under the curve and the physical parameters attached to each peak, but where we either don't have enough data to use Fit Model or Functional Data Explorer, or it's just not the kind of problem we want to tackle with that particular tool.

Again, the tool is available. The QR code here will take you to the add-in on the JMP Community, where you can work with it. Spectris is up now and ready to go.