Building an Analytic Workflow from Visualization to Chemometric Analyses (2022-US-30MP-1087)

Bill Worley, JMP Sr. Systems Engineer, SAS

 

Building an analytic workflow for any manufacturing process can be daunting. This presentation will demonstrate the ease of building an analytic workflow, from preparing the data to analyzing the final product. The workflow demonstration will show steps for data visualization and multivariate analyses, including clustering, predictive modeling, and optimization of the process. Additionally, a chemometric modeling approach to quantify the active ingredient in a finished product will be included.

 

 

Hello,  everyone.

My  name  is  Bill  Worley,

and I am a systems engineer on the US Chem team at JMP.

Today  I'm  going  to  be  talking  to  you  about

An Analytic Workflow for Data and Chemometric Analysis with JMP.

I have got  a  few  things I  want  to  highlight.

We're  going  to  be  talking  about getting  the  data  in...

Actually  just  following  the  analytic workflow  that  we  share

about  getting  data  in,   cleaning  and  blending,

visualization,  exploratory  data  analysis, building  models.

And  then  ultimately,

what  are  you  going  to  do  with  that  data and  how  are  you  going  to  share  it?

Couple  of  things that  are  important

are  the  new  JMP  Workflow  Builder,

I'm  going  to  highlight  that.

This  is  just  a  snapshot  of  what  I'll be  showing  you  in  a  little  bit.

And  the  chemometric  part  of  this  is analyzing  spectral  data,

using Functional Data Explorer for pre-processing.

And  these  are  now  built  into  JMP.

As you can see over here, we've got a tab up here in FDE now

where  you  can  choose from  different  types  of  pre- processing,

standard normal variate, multiplicative scatter correction,

Savitzky-Golay filtering, and baseline correction.

And  I  believe  there  will  be  maybe  one or  two  other  things  added  to  that.
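These corrections are built into JMP's Functional Data Explorer. For readers who want to see the arithmetic behind two of them, here is a minimal NumPy sketch of standard normal variate and multiplicative scatter correction; the function names and toy spectra are illustrative, not JMP code:

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row)."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Multiplicative scatter correction: regress each spectrum on a
    reference (the mean spectrum by default) and remove slope/offset."""
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, row in enumerate(spectra):
        slope, offset = np.polyfit(ref, row, 1)  # row ~ slope*ref + offset
        corrected[i] = (row - offset) / slope
    return corrected

# Toy example: two "spectra" that differ only by offset and scale;
# both corrections map them onto the same curve.
base = np.linspace(0.0, 1.0, 50)
spectra = np.vstack([2.0 * base + 0.5, 0.5 * base - 0.1])
snv_out = snv(spectra)
msc_out = msc(spectra)
```

Both corrections remove the additive and multiplicative scatter differences between the two rows, which is exactly the kind of cleanup applied to the near-infrared spectra later in the talk.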

Just  so  you  know,  the  data  that  I'm  using is  pulled  from  this  paper.

You see it right  down  here, back from  2002.

Just to let you know, that's where the data is coming from.

All  right,   I'm  going  to  put  that  aside for  now,

and  I'm  going  to  go ahead  and  get  things  going  here.

So  I've  got  my  home  window.

I'm  going  to  go  and  start  a File,  New,  Project.

And  the  workflow  that  I'm  going  to  be working  with  is  this  one  right  here.

I'm  going  to  right- click  on  that.

I'm  going  to  open  it.

What  I've  done  is  I've  taken  the  data  set that  I  want  to  work  with

and  I've  built  all  these  steps   in  the  Workflow  Builder,

and  I'm  going  to  play  that  for  you  now.

So  it's  going  to  populate our  project  here.

So  I'm  going  to  go  ahead  and  hit  play.

As  you  can  see,  that's  building the  workflow,  doing  some  analysis.

We're  doing  some  model screening  right  now.

And  then  everything  is  complete.

And  now  we  have  all  these  tabs   across  the  top

where we've  completed that  analysis  with  the  workflow.

I've  actually  included one  other  table  in  there.

When  we  get  there, I'll  talk  more  about  that.

But  we  built that  table, we  pulled  the  table  in

just  to  show  you they're  from the  source  data.

We've  actually  pulled  this data  in  from  an  Excel  file.

Getting  the  data  into  JMP from  Excel  is  fairly  easy.

And  we  built  some exploratory  data  analysis.

So  the  first  step  we  made  is doing  a  distribution,

and  we  can  interact with  it  just  like  anything  else.

Everything's  interactive from  the  Workflow  Builder.

Did  some  graphing   where  I  put  the  column  switcher  in,

added  the  local  data  filter.

Just  so  you  know,

as  we  build  the  workflow, all  these  things  are  built-in

and  the  recording  helps  you keep  track  of  what's  going  on.

And  you  can  see  that  we've  got full  functionality  going  on  there.

Did some more exploratory data analysis looking at Fit Y by X,

doing  an   [inaudible 00:03:46]

like in  this  case,

Fit Y by X for mill time versus dissolution,

and  blend  time  versus  dissolution.

Just  to  get  back  to  it,

this  is  tablet  data  that's  pretty  popular within  the  SCE  community  within  JMP.

And  I'm  just  building  on  how  we  would analyze  that  and  build  out  this  workflow.

Next step would be multivariate analysis to see,

for  this  dissolution, which  is  our  key  performance  indicator,

what  might  be  any  of  the  factors

or  what  factors  might  be highly  correlated  with  dissolution.

Not  seeing  anything  that's jumping  out  too  much.

We  do  have  some  partial  correlation,

but  no  one  factor  is  jumping  out  as

the  answer  for  the  data that  we're  looking  at.

We  can  get  a  better  understanding  of  what factors  might  be  important

if  we  look  at  a  predictor  screening.

We can see here that we have things like screen size, mill time, spray rate.

Those  look  to  be  important  factors  that  we  could  use  to  build  a  better  model.

Next,  we  can  actually  set up  a  stepwise  regression.

I'm  going  to   go  here and  actually  run  this  model  in  a  second.

And  then  we've  got  that  output   so  that  data  is  there

and  we  could  use  that  as  needed.
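JMP's stepwise platform handles this interactively; as a rough illustration of the idea behind forward stepwise regression (greedily adding the predictor that most improves the fit), here is a NumPy sketch that makes no assumptions about JMP's internals:

```python
import numpy as np

def forward_stepwise(X, y, names, max_terms=3):
    """Greedy forward selection: at each step, add the predictor that
    most reduces the residual sum of squares of an OLS fit."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(max_terms):
        best_rss, best_j = None, None
        for j in remaining:
            A = np.column_stack([np.ones(len(y)), X[:, selected + [j]]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ beta) ** 2))
            if best_rss is None or rss < best_rss:
                best_rss, best_j = rss, j
        selected.append(best_j)
        remaining.remove(best_j)
    return [names[j] for j in selected]

# Toy data: y depends strongly on x2 and weakly on x4.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = 3.0 * X[:, 2] + 0.5 * X[:, 4] + rng.normal(scale=0.1, size=60)
order = forward_stepwise(X, y, ["x0", "x1", "x2", "x3", "x4"])
```

The selection order recovers the truly influential predictors first, which is the same intuition the predictor screening and stepwise steps use on the tablet data.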

So we  built  that  model,

and  we're  out  here looking  at  another  type  of  analysis,

which would be a decision tree, a partition analysis.

We  can  do  neural  net,  build  that  model

and we can do a partial least squares.

So  we've  got all  those  things  together.

But  once  all  said  and  done,

you  could  actually  use  something  called Model  Screening  in  JMP  Pro

to  build  these  models  out  and  find out  which  is  the  best  overall  model.

And  based  on  that,  we  can  see  that

a neural boosted model is probably the best overall model for us to work with.

We  can  then  take  all  this  information,

share  it  with  all  our  colleagues, co-workers,

anybody  who  might  be  interested.

And  we  can  do  this in  several  different  ways.

One of the best ways would be to use JMP Live

and  put  everything  out  there for  folks  to  look  at  and  share.

That's  the  first  part  of  the analytic  workflow.

And  again,  if  we  look  back  here, that's  all  set  up  in  this  portfolio  here.

And  as  I  said  before, I  had  opened  another  table.

And  this  is  for  the  chemometric part of  the  analysis.

This  is  near- infrared  data  for  finding the  active  ingredient  in  tablets.

We  built  the  tablets,  we  made  the  tablets.

Now  we  have  to  take  the  finished  product and  find  out  what's  it  all  about.

Do  we  have  the  right  active  ingredient?

And  can  we  tell  based  on  this  technique called  near- infrared  analysis?

We're  going  to  step  through  a  few different  things,

but  I'm  going  to  turn  that   Workflow  Builder  back  on

to  record  these  steps.

So  let's  turn  that  on, let's  go  back  to  our  data  set,

and  let's  do  some  analysis  now.

So I want to clear out these row states first.

And  now  I  want  to  go  to  Analyze Clustering,  Hierarchical  Clustering.

So  I  got  that,  and  I've got  all  my  data  groups.

So there are 404 wavelengths that are grouped.

I'm  going  to  pull  those  in,  say,  okay.

Let's  build  this  out  a  little  bit,

we're  going  to  look  at  three  different clusters,

and  let's  color  those  clusters.

All  right,  so  you  can  kind  of  see  that,

let  me  pull  this  down  a  little  bit,

you can see we've  got  three  clusters,

fairly  big  green  cluster and  two  smaller  blue  and  red.

So  we've  got  that.
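Under the hood, hierarchical clustering repeatedly merges the two closest clusters until the requested number remains. A minimal average-linkage sketch in NumPy (toy rows standing in for spectra, not the actual tablet data):

```python
import numpy as np

def hierarchical_clusters(X, k):
    """Agglomerative (average-linkage) clustering down to k clusters.
    Returns a list of row-index lists, one per cluster."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average pairwise distance between clusters a and b
                d = np.mean([np.linalg.norm(X[i] - X[j])
                             for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Toy data: three well-separated groups of rows ("spectra").
X = np.vstack([np.zeros((3, 4)), np.full((3, 4), 5.0), np.full((3, 4), 10.0)])
X = X + 0.01 * np.arange(9)[:, None]  # small jitter so rows are distinct
groups = hierarchical_clusters(X, 3)
```

Cutting the dendrogram at three clusters, as done in the demo, corresponds to stopping the merging loop when three clusters remain.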

And  now  let's  go  back  to  our  data  set, and  let's  do  a  Graph  Builder.

So  let's  go  to  Graph  Builder.

Let's  pull  our  wavelengths  in,

here  to  X,

do  a  parallel  plot,

clean  that  up  a  little  bit,

and  right- click  there to  combine  scales  and  parallel  merged.

I'm  doing  these  steps  pretty  fast.

This  is  something  you'll  want  to  go back  and  watch  again,  if  it's  of  interest.

But  the  thing  I  want  to  show  you  here  is that  the  data  is  pretty  scattered,

and  there's  a  lot  of  baseline  separation,

maybe  some  additive  and  multiplicative scattering  that  we  need  to  clean  up.

So  let's  go  back  to  our  data  table

and  go  to  another  analysis  step.

Let's  go  to  a  multivariate  method.

Let's  go  to  principal  components.

Again,  we'll  pull  all  our  wavelengths  in.

Say okay.

And  the  thing  I  want  you  to  note   here  is  that

we have some 404 wavelengths

all grouped right around this little area right here.

That  is  highly  correlated  data.

We  could  build  a  model  off  of  that,

but  it  may  not  be  the  best.

Because  we're  going  to  be  including wavelengths  that  are  not  of  importance

because  of  the  high  correlation.

So  we'll  clean  that  up  in  a  little  bit.

I'll  show  you  how  to  clean that  up  in  a  little  bit.
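The reason all 404 wavelengths pile into one spot on the loading plot is that highly correlated columns collapse onto a single principal component. A small NumPy sketch of PCA via the SVD makes this visible (toy data, illustrative function name):

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Principal components via SVD of the column-centered data matrix.
    Returns the scores and the fraction of variance per component."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = Xc @ Vt[:n_components].T
    return scores, explained

# Toy "wavelengths": every column is a scaled copy of one underlying
# signal, so a single principal component captures nearly everything.
rng = np.random.default_rng(1)
signal = rng.normal(size=100)
X = np.outer(signal, np.linspace(0.5, 2.0, 30)) + 0.01 * rng.normal(size=(100, 30))
scores, explained = pca_scores(X)
```

With that correlation structure, the first component explains essentially all of the variance, which is why a model built directly on the raw wavelengths carries a lot of redundant information.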

And  as  a  matter  of  fact, let's  go  to  that  step  right  now.

Let's  go  back  over  here  and  go  to  Analyze,

Specialized  Modeling.

And we're going to go to Functional Data Explorer.

And  let's  get  this  set  up  first, and  I'll  tell  you  more  about  it.

Let's put our  wavelength  there,

we  have  our  active  ingredient,   which  is  a  supplemental  variable.

And  then  our   ID  function. There  we  go.

Say okay.

Rows as Functions. Let me do that again.

Active  ingredient  and  our  wavelengths.

This  kind  of  looks  like what  we  saw  before  in  Graph  Builder,

and we  want  to  clean  this  up.

So we've got these new tabs in JMP Pro 17 for Functional Data Explorer.

Spectral  is  one  of  the  tabs.

And then, as we talked about before, we have standard normal variate,

multiplicative scatter correction, Savitzky-Golay, and baseline correction.

I'm going to select the standard normal variate first

to clean that up.

And  then  you  can  see  the  baseline is  a  little  wobbly  here.

Let's  clean  that  up, take  that  next  step,

and  then  go  ahead  and  say  okay.

And  now  we've  got  that  set  up.

It  looks  a  lot  better,  a  lot  cleaner.
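One simple form of the baseline correction applied here is to fit a low-order polynomial to each spectrum and subtract it, which removes the slow wobble while leaving sharp peaks intact. A hedged NumPy sketch (toy spectrum, not the NIR data):

```python
import numpy as np

def baseline_correct(spectrum, degree=2):
    """Fit a low-order polynomial to the spectrum and subtract it,
    removing slow baseline drift while keeping sharp features."""
    x = np.arange(len(spectrum))
    coeffs = np.polyfit(x, spectrum, degree)
    return spectrum - np.polyval(coeffs, x)

# Toy spectrum: a narrow peak riding on a quadratic baseline.
x = np.linspace(0.0, 1.0, 200)
peak = np.exp(-((x - 0.5) ** 2) / 0.001)
baseline = 2.0 + 3.0 * x + 1.5 * x**2
corrected = baseline_correct(peak + baseline)
```

After subtraction the drifting baseline is gone and the peak at the center survives, which is the visual change seen in FDE after this step.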

And  now  the  next  step  would  be to  model  this.

We're  going  to  use  another  new  function in  Functional Data Explorer

called  wavelets.

It's  wavelet  modeling  here.

And  you  can  see  down  here that  our  model  has  been  built,

and  we're  explaining  a  lot of  the  variation  with  about

five functional principal components.

But  if  you  look  at  these,  we're  explaining the  shape  with  our  shape  functions.

That's where our eigenvalues come into play.

This  is  really  just  a  nice  way   to  look  at  the  data

and make sure that our spectra are being well modeled.

As  I  said,  we've  got

five  shape  functions  that  are explaining  things  really  well.

So  let's  clean  this  up  a  little  bit and  pull  this  back,

make  our  model  a  little  simpler.

We  won't  go  all  the  way  back  to  five, but  we'll  leave  it  at  10  for  now.

You  can  look  at  the  score  plots.

There's  still  some  scattering  here in  the  data,

but  we'll  clean  that  up  in  a  second.

And then one of the other steps you want to take care of is to do a wavelet model.

This  is  new  in  JMP  17, this  wavelet  analysis.

And  what  this  is  really  all  about  is looking  for

can  we  find  the  important  wavelengths

that  are  going  to  give  us  a  telltale  sign of  what's  going  on  with  the  data.

And  what  I'm  looking  for, especially  with  the  spectra,

is  something  where  I  can  see  a  shift   in  the  baseline.

And  I  can  see  that  we've  got  a  good  shift in  the  baseline

and a grouping of spectral wavelengths around 8820 to maybe 8850.

So  that's  the  important  part  here.

So  we  get  an  idea  of  what the  important  wavelengths  are.
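JMP's wavelet analysis is far richer than this, but the core idea of locating important wavelengths can be shown with the simplest wavelet of all: one level of the Haar transform, whose detail coefficients light up wherever the spectrum changes sharply (toy data, illustrative names):

```python
import numpy as np

def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform: returns
    (approximation, detail) coefficients for an even-length signal."""
    s = np.asarray(signal, dtype=float)
    approx = (s[0::2] + s[1::2]) / np.sqrt(2.0)
    detail = (s[0::2] - s[1::2]) / np.sqrt(2.0)
    return approx, detail

# Toy spectrum: flat except for a sharp step; the step shows up as the
# one large detail coefficient, locating the "important" region.
spectrum = np.concatenate([np.zeros(31), np.ones(33)])
approx, detail = haar_dwt(spectrum)
```

The single large detail coefficient marks the step, analogous to the baseline shift that flags the 8820-8850 region in the talk.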

All  the  data  that  I  had  done  up here  before,

let  me  pull  this  back,

the  pre- processing  that  I  had  done  before,

I  want  to  save  that  data  out and  do  some  analysis  on  it.

So  I'm  going  to  go  here  to  the   Functional  Data  Explorer,

select  Save  Data.

This  is  going  to  be  a  new  data  set,

and  now  we've  got  to  do  some  work with  this  data  and  clean  it  up

and  make  sure  we're  ready  to  go.

I  want  to  do  a  transpose.

Transpose  Y. X  is  our  label.

And  these  two  drop  into   [inaudible 00:13:38] .

See  if  we  got  this  right, let's hit okay.

And  yes.

So  we've  cleaned  that  table  up, we've  taken  those  300  spectra,

and  then  we've  transposed them  into  another  data  table.

This  is  all  the  pre- processed  data,

so  we're  going  to  do  a  few  more things  to  this

to  show  where  that  pre- processing  has  really  cleaned  up  the  data

and  where  we  can  build   some  models  with  it.

So  let  me  get  rid  of  this  column.

All  right,  and  ready  to  go.

So  we're  going  to  do  the  same thing  that  we  did  before.

Let's  go  to  Analyze, Hierarchical  Clustering.

Actually,  let  me  take  a  step  back  here real  quick.

Here, I want to group  these  columns   to  make  things  a  lot  easier.

So  group  those  columns   and  let's  go  back  to  where  we  were.

Let's go to Analyze, Hierarchical Clustering,

pull  our  columns  in,  say,  okay.

We'll  do  the  same  thing  we  did  before, we'll  look  at  three  different  clusters,

and  color  those  clusters.

This  will  be  a  quick  comparison,

but  if  you  look  at  what  we  did  before to  what  we've  got  now,

we've  got  a  lot  tighter  clusters.

And  these  actually  are   pretty well  dispersed.

They're  pretty  even.

Those  clusters  are  fairly  even  right  now.

Let's  go  back  to  our  data  table.

Let's  go  back  to  our  Graph  Builder

and pull our wavelengths in again, as we did before.

We're  going  to  make  a  parallel plot  out  of  this  again.

It  doesn't  look  great  right  now, but  let's  right- click  here,

and go  to  combined  scales, parallel  merged.

And now you see that the data is really cleaned up

where that pre-processing has taken things to...

They look a lot better.

Let's  see  if  we  can  compare  that here.

What  we  had  before, and  what  we  have  now.

So  we've  got  that  data  much  cleaner.

Any  analysis  that  we  do  from  here should  be  much  better.

So  let's  go  back  to  Analyze,

Multivariate Methods, Principal Components.

Pull our  wavelengths  in.

Say okay.

And  now  we've  taken  that  data

and  we've  broken  that   correlation  structure  that  we  had  before.

This is currently after pre-processing,

this  is  what  we  had  before.

Just  to  show  you  the  difference.

So we've really cleaned things up.

Now  we'd  want  to  take  maybe one  more  step  in  the  analysis.

Analyze, let's go to Quality and Process,

Model-Driven Multivariate Control Chart.

We're  just  looking  for  maybe some  unusual  behavior  in  here.

In  this  case,  it's  based on  the  principal  components.

Say okay.

And  this  is  looking  at   two  principal  components.

You can  see  that  there's some  potential  outliers  here.

But  this  is  spectral  data, we're  not  going  to  get  rid  of  anything.

We  just  want  to  kind  of  view  that.
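The model-driven multivariate control chart is based on Hotelling's T-squared computed from the principal-component scores. As a rough sketch of that statistic, assuming nothing about JMP's implementation details:

```python
import numpy as np

def hotelling_t2(X, n_components=2):
    """Hotelling's T-squared from the leading principal-component scores:
    large values flag rows (spectra) far from the multivariate center."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T
    variances = s[:n_components] ** 2 / (len(X) - 1)  # score variances
    return np.sum(scores**2 / variances, axis=1)

# Toy data with one clearly unusual row.
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 10))
X[7] += 8.0
t2 = hotelling_t2(X)
```

The shifted row produces by far the largest T-squared value, which is exactly how the chart highlights potential outlier spectra without anyone having to delete them.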

And  one  other  thing  we  want  to  look  at is  to  go  to  Monitor  the  Process,

we're  going  to  look  at  score  plots.

And  now  we  can  look  at   our  subgrouping  down  here

and  we  can  actually compare  these  groups.

I'm  going  to  pull  up a  tool  here,  a lasso  tool.

I'm  going  to  do  my  best  to  group  these, a  couple  of  these.

That's  going  to  be  my  group A.

And I'm going to do another lasso here.

We'll  just  leave  that  as  is and  go  there.

We  grabbed  one  of  the  wrong  ones, but  I think we'll  be  okay.

And  now  we  can  compare  where  we're  seeing  differences  in  the  spectra

for  these  two  subgroups.

And as I was saying before, if we look right in here,

those wavelengths are somewhere in that 8800 range,

and  then  we  can  see  that  there's a  real  difference  there.

O ne  more  thing  we  want  to  do, and  this  is  the  last  step.

What  I'd  shown  before, I'd  done  model  screening,

and  I  want  to  do  model  screening  again.

I  go  to  Analyze, Predictive  Modeling,  Model  Screening.

We're going to set this up; we're going to do our active ingredient.

This is our response that we're trying to model,

and  we're  going  to  use  our wavelengths  to  build  this  model  out.

I'm  going  to  clean  this  up  a  little  bit.

We  don't  need  all  these different  modeling  types.

We're  going  to  pull  this  out.

But  the  nice  thing  about  this  is  I can  build  all  these  models  at  once

and  really  find  out  what's  the  best modeling  approach  to  take  with  this  data.

I don't need that. I don't need that.

Let's  add  those.

One thing I'm not going to do, for time's sake,

is  I'm  not  going to  add any  cross- validation.

If  we  take  that  into  account,

it'll  actually  run  a  lot  longer.

But  as  you'll  see,  this  is going  to  be  fairly  quick.

I'm  going  to  go  ahead  and  say  okay.

And  as  this  is  going,  just  talk  a  little bit  more  about  what  we're  seeing.

We're  building  out  these  models.

You  can  see  it's  stepping  through.

And let's see, in another few seconds here it should be done.

There  we  go.

Taking  a  little  longer.

There  we  go.

Based  on  what  I  said,

I didn't use any validation, but the neural model requires it.

So  that's  the  validations that  you  see  there.

But  overall,  we  get  a  really  good  idea

XGBoost is going to be the best model to fit this data.

We  could  use  any  of  these  others,

because  they're  all  really good  models  as  well.

But you get to choose; select one, let's say partial least squares.

Because  that's  the  go- to  analysis method  for  spectral  data  anyway.

But  we've  got  that.

We can say Run Selected,

and  fill  out  that  model  and  find  out, can  we  make  it  even  better?
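Partial least squares is indeed the traditional workhorse for spectral calibration. For the curious, here is a minimal NIPALS-style PLS sketch in NumPy; it is a bare-bones illustration of the algorithm family, not JMP's implementation, and the toy calibration data is invented:

```python
import numpy as np

def pls_nipals(X, y, n_components=2):
    """Partial least squares (NIPALS, single response): returns the
    coefficient vector mapping centered X to centered y."""
    Xr = X - X.mean(axis=0)
    yr = y - y.mean()
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)      # weight vector
        t = Xr @ w                     # scores
        p = Xr.T @ t / (t @ t)         # X loadings
        q = (yr @ t) / (t @ t)         # y loading
        Xr = Xr - np.outer(t, p)       # deflate X
        yr = yr - q * t                # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.column_stack(W), np.column_stack(P), np.array(Q)
    return W @ np.linalg.solve(P.T @ W, Q)

# Toy calibration: y depends on a few of the 6 "wavelengths".
rng = np.random.default_rng(3)
X = rng.normal(size=(80, 6))
y = X @ np.array([1.0, -2.0, 0.0, 0.0, 0.5, 0.0]) + 0.05 * rng.normal(size=80)
beta = pls_nipals(X, y, n_components=4)
pred = (X - X.mean(axis=0)) @ beta + y.mean()
r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
```

A handful of latent components recovers the response almost perfectly, which is why PLS copes so well with the many correlated wavelengths in near-infrared data.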

Hopefully  what  I've  done  and  showed you  is  that  we  can  build  these...

Let  me  pull  this  back to  our  beginning  here.

Just a few steps.

This  is  our  workflow, and  I've  added  those  steps.

So  we've  got  that  table that  we're  working  with.

We cleared the row states. We transposed the data.

Anything  that  we  closed  out is  now  part  of  that  workflow.

So  we  continue  to  build  that  workflow.

One  thing  I'll  say  is  that

I  would  typically  not  build  a  workflow inside  a  project,

but  just  showing you  that  it  can  be  done.

Let  me  go  back  to  my  slide  here, and share.

Let's flip this.

One  more  step  here.

I just  want  to  say  thank you  to  a  few  people.

Jeremy  Ash,  who's  no  longer  at  JMP, but  he's  a  great  inspiration  for  this.

Mark  Bailey  has  been  a  great  help.

Ryan  Parker  and  Clay  Barker have  done  really  fantastic  things

with  genreg  and  Functional  Data  Explorer.

Chris  Gotwalt  has  been  really  helpful in  getting  things  set  up.

And then Mia Stephens has been a really supportive person

in  helping  me  build  the  spectral analysis  out  within  JMP.

So I really appreciate everything, and with that I'll say thank you.

That's  it.