Predicting Functional Responses using JMP Functional DOE and Curve DOE Modeling - (2023-US-30MP-1520)

Fangyi Luo, Group Scientist, Procter & Gamble
Christopher Gotwalt, Director Statistical R&D, JMP 

 

Functional or curved responses frequently occur in industry. Thanks to new features in JMP, we can now model and predict functional responses using key DOE or product design factors with JMP Functional DOE or Curve DOE modeling. A Functional DOE model is purely empirical. However, a Curve DOE model can incorporate mechanistic or expert knowledge on the functional form of the curved responses.  In this presentation, the methods and results of predicting functional responses using Functional DOE and Curve DOE modeling are compared using case studies from the consumer product industry.

 

 

Hello, my name is Fangyi Luo and I'm from Procter & Gamble.

Today I'm presenting with Chris Gotwalt from JMP.

We're going to talk about how to model data from designed experiments

when the response is a functional curve.

Functional  or  curve  responses occur  very  often  in  industry.

Thanks  to  the  new  development  of  JMP,

we  can  now  model and  predict  functional  responses

as a function of key DOE or product design factors

using either functional DOE or curve DOE modeling.

A  functional  DOE  model is  purely  empirical.

However,  a  curve  DOE  model can  take  into  account  mechanistic

or  expert  knowledge  on  the  functional form  of  the  curve  responses.

In  this  presentation, the  method  and  results  of  predicting

functional  responses  using  functional  DOE and  curve  DOE  modeling  will  be  compared

using  case  studies  from  the consumer  product  industry.

This is the outline of our talk.

We  will  break  the  talk  into  two  parts.

In the first part, Chris will explain

what functional data are, with examples of functional data,

and then he will give you a fundamental understanding

of functional DOE modeling,

including functional principal component analysis,

as well as curve DOE modeling.

In  the  second  part,

I  will  use  two  examples from   Procter & Gamble

and  compare  the  results of  functional  DOE  and  curve  DOE  modeling

using  these  two  examples.

The first example is Modeling Viscosity

over Time Data from a Formulation Experiment.

The second example is Modeling Absorption Volume over Time Data

from a Diaper Design of Experiment.

Then  I  will  finish  the  talk with  a  brief  summary  and  conclusion.

Thanks, Fangyi.

Now  I'm  going  to  give  a  quick  intro to  functional  and  curve  data  analysis.

But  first  I  want  to  point  out

that  there  is  a  lot  of  this  kind  of  data out  there  and  JMP  really  has  made

analyzing  curve  response  data  as  fast, easy  and  accurate  as  possible.

If  you  haven't  heard of  functional  data  analysis  before,

you  have  certainly  seen  it  out  there.

It's  all  over  the  place,

and I'll  show  you  some examples  to  make  that  clear.

For  example, here  are  annual  home  price  indices

from 1992-2021 for all 50 US states.

Each  function  has  a  beginning  measurement

followed  by  a  sequence of  other  measurements

and  then  a  final  measurement.

They  all  have  a  beginning, a  middle  and  an  end.

The  functions  don't  have  to  all  have the  same  start  and  endpoints

or  measurements  at  the  same  times.

In  a  time  series  analysis, we  are  really  interested  in  using  data

to  predict  forward  into  the  future using  data  observed  from  the  past.

In  a  functional  data  analysis or  a  curve  data  analysis,

we  are  generally  more  interested

in  explaining  the  variation internal  to  the  functions

than  predicting  beyond the  range  of  times  we've  observed.

In  product  and  process improvement  in  industry,

we  are  often  working on  non-financial  curves.

I'm  going  to  show  you  some  examples that  our  customers  have  shared  with  us.

Here  we  see  a  set of  infrared  spectra  of  gasoline  samples

used to develop an inexpensive tool to measure octane in gasoline.

The  green  curves  had  high  octane, and  the  red  ones  were  low  in  octane.

The  height  of  the  left  peak turned  out  to  be  critical

for  predicting  octane  level.

Microbial  growth  curves

are  a  common  type  of  functional data  in  the  biotech  industry.

Today, Fangyi will be demonstrating two methods in JMP

that  can  be  used  for  analyzing  DOEs,

where  the  response is  a  set  of  measurements.

The  first  method is  called  functional  DOE  analysis

and  is  best  for  complicated response  functions  like  spectra

when  you  need  the  model  to  really  learn the  curves  and  the  data  from  scratch.

The  second  is  a   curve DOE  analysis,

which  is  based  on non-linear  regression  models.

When  you  can  use  the   curve DOE  analysis,

I  found  that  you  get  more accurate  results  with  it.

But  if  you  can't  get the   curve DOE  analysis  to  work,

you  can  always  fall  back on  the  functional  DOE  analysis,

as  it's  more  general  than   curve DOE.

The  critical  step in  functional  data  analysis

that  will  be  new  to  most  people

is called functional principal components analysis,

also  called  FPCA  for  short.

This  is  how  we  decompose  the  curves into  shape  components

that  describe  the  typical  patterns we  see  in  the  curves,

as well as weights that indicate how strongly each individual curve

correlates with those shape components.

It's  a  kind  of  dimension  reduction and  data  compression  technique

that  reduces all  the  information  in  the  curves

into  the  most compact  representation  possible.

To  illustrate  FPCA,  take  a  look at  the  set  of  curves  in  the  plot  here.

What  do  they  have  in  common?

How  do  they  differ  from  one  another?

What  I  see  in  common

is  a  set  of  peak  shapes with  one  peak  per  curve,

and  the  shapes go  to  zero  away  from  the  peak.

They  also  appear  to  be  symmetric around  the  center  of  the  peak.

In  terms  of  differences, I  see  variation  in  peak  heights,

and  there  are  clear  horizontal  shifts from  left  to  right,

and  some  curves are  also  narrower  than  other  ones.

In  a  functional  data  analysis,

the  first  thing  we  do is  find  a  smoothing  model

that approximates the discrete measurements,

converting them into continuous functions.

There's a variety of smoothing models in the Functional Data Explorer (FDE).

I  don't  really  have  a  firm  rule as  to  which  one  is  the  best  in  general,

but  here  are  my  observations about  the  most  common  ones.

Wavelets  and  splines have  different  strengths.

Wavelets  are  new  in  JMP  Pro  17

and  are  very  fast  and  are  generally the  best  with  complicated  functions

such  as  spectra,  as  long  as  the  X coordinates  of  the  data  are  on  a  grid.

On the other hand, there are B- and P-splines,

which are slower computationally

but are better for data with irregularly spaced Xs,

and  are  also  often  better

when  there  are  only  a  dozen or  fewer  measurements  per  function.

If  the  data  aren't  large, I  would  try  both  splines  and  wavelets

and  see  which  one is  giving  us  the  best  fit

by  looking  at  the  graphs.

The  main  graphs  I  use to  make  decisions  about  smoothing  models

are actual by predicted plots,

and you want the one that hugs the 45-degree line more closely.

In  this  case,  I  would  choose the  wavelets  model  on  the  right

over  the  spline  model  on  the  left,

because those points are tighter around that 45-degree line.

Immediately  after  JMP  Pro fits  a  smoothing  model  to  the  data,

it  decomposes  the  signals

into  dominant  characteristic  shapes it  found  in  the  data.

In  mathematical  language, these  shapes  are  called   eigenfunctions,

but  a  better  and  more  approachable  name would  be  to  call  them  shape  components.

Here  we  see  that  JMP  has  found

that  the  overall  mean  function is  a  peak  shape

and  that  there  are  three  shape  components

that  explain  97% of  the  variation  in  the  data.

The  first  shape  component  appears to  correspond  to  a  peak  height.

I've learned to recognize that the second shape

is a type of left-right peak shift pattern and that the third shape component

is something that would control the peak width.

Remember  that  these  are  shapes learned  from  the  data,

not  something that  I  gave  JMP  outside  of  the  data.

What has happened is the observed spectra in the data

have been decomposed into an additive combination

of  the  shape  components

with  unique  weights for  each  individual  curve.

The  functional  PCA  is   like  reverse engineering  the  recipe  of  the  curves

in  terms  of  the  shape  components.

The  mean  function  is  the  thing that  they  all  have  in  common.

The  shape  components are  the  main  ingredients.

And  the  weights are  the  amounts  of  the  ingredients

in  the  individual  curves.
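To make the recipe idea concrete, here is a minimal Python sketch of an FPCA-style decomposition. It is an illustration under stated assumptions, not what JMP does internally: the curves are synthetic peaks on a common grid, no smoothing model is fitted first, and the shape components are obtained from a plain singular value decomposition of the centered curves. All variable names are made up for the example.

```python
import numpy as np

# Synthetic example (assumed, not from the talk): 30 peak-shaped curves on a
# common grid that differ in peak height and left-right position.
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 101)
heights = rng.uniform(0.8, 1.2, size=30)
shifts = rng.uniform(-0.3, 0.3, size=30)
curves = heights[:, None] * np.exp(-0.5 * (x[None, :] - shifts[:, None]) ** 2)

# Mean function: the thing all the curves have in common.
mean_function = curves.mean(axis=0)
centered = curves - mean_function

# SVD of the centered curves: the rows of vt are the shape components
# (discretized eigenfunctions) and u * s are the per-curve weights (scores).
u, s, vt = np.linalg.svd(centered, full_matrices=False)
explained = s**2 / np.sum(s**2)

n_components = 3
shape_components = vt[:n_components]               # shape (3, 101)
weights = u[:, :n_components] * s[:n_components]   # shape (30, 3)

# The recipe: curve ~ mean function + weighted sum of shape components.
reconstructed = mean_function + weights @ shape_components
max_error = np.max(np.abs(reconstructed - curves))

print("variation explained by 3 components:", explained[:n_components].sum())
print("largest reconstruction error:", max_error)
```

The weights matrix has one row per curve and one column per shape component, which is the compact representation that the later DOE modeling works with.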

The  functional  DOE  analysis is  the  same  mathematically

as  extracting  the  scores  or  weights

and modeling them in Fit Model with the Generalized Regression platform.

Fortunately, there  is  a  red  triangle  option

in  the  Functional  Data  Explorer that  automates  the  modeling,

linking  up  the  DOE  models with  the  shape  functions  for  you

and  presenting  you  with  a  profiler

that  connects  the  DOE  models with  the  shape  functions.

You  can  directly  see how  changing  the  DOE  factors

leads  to  changes in  the  predicted  curve  or  spectra.
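The workflow just described can be sketched in a few lines of Python. This is only a rough illustration: the two-factor design and the simulated curves are hypothetical, and ordinary least squares stands in for the Generalized Regression platform. The function predict_curve plays the role of the profiler by turning factor settings into a predicted curve.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 50)

# Hypothetical two-factor DOE in coded units (-1 to 1), 20 runs.
factors = rng.uniform(-1.0, 1.0, size=(20, 2))

# Assumed truth for the simulation: factor 1 scales the peak height and
# factor 2 adds a linear tilt to the curve.
peak = np.exp(-0.5 * ((x - 0.5) / 0.15) ** 2)
curves = (1.0 + 0.3 * factors[:, [0]]) * peak + 0.2 * factors[:, [1]] * (x - 0.5)
curves = curves + rng.normal(scale=0.01, size=curves.shape)

# Step 1: FPCA-style decomposition into a mean function, shapes, and scores.
mean_function = curves.mean(axis=0)
u, s, vt = np.linalg.svd(curves - mean_function, full_matrices=False)
k = 2
shapes = vt[:k]
scores = u[:, :k] * s[:k]

# Step 2: model each score as a function of the DOE factors
# (ordinary least squares with an intercept, swapped in for simplicity).
design = np.column_stack([np.ones(len(factors)), factors])
coef, *_ = np.linalg.lstsq(design, scores, rcond=None)   # shape (3, k)

# Step 3: link the score models back to the shapes, profiler style.
def predict_curve(f1, f2):
    predicted_scores = np.array([1.0, f1, f2]) @ coef
    return mean_function + predicted_scores @ shapes

new_curve = predict_curve(0.5, -0.5)
true_curve = (1.0 + 0.3 * 0.5) * peak + 0.2 * (-0.5) * (x - 0.5)
print("max abs prediction error:", np.max(np.abs(new_curve - true_curve)))
```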

There  are  many  potential  applications of  functional  DOE  analysis,

some  of  which  Fangyi  will  be presenting  later  in  this  talk.

There  is  another  approach in  JMP  called  curve  DOE  modeling.

This  answers  the  same  kind  of  question as  functional  DOE,

but  it  is  nonlinear  regression  based rather  than  spline  or  wavelet  based.

What  that  means  is  that  if  you  have a  good  idea  of  a  nonlinear  model,

like a three-parameter logistic model, and if that model fits your data well,

you  can  get  models  and  results

that  generalize  better than  a  functional  DOE  model,

because  the  general  shape  of  the  curve

doesn't  have  to  be  learned  from  scratch from  the  data  using  splines  or  wavelets.

The idea is that if you can make assumptions about your data

that reduce the modeling effort needed,

your predictions will be more accurate, especially from small data sets.

Curve  DOE  analysis has  a  very  similar  workflow

to  a  functional  DOE  analysis,

except that you go through the Fit Curve platform

instead of the Functional Data Explorer,

and instead of choosing wavelets or splines,

you choose a parametric model from the platform.

Just  like  in  a  functional  DOE  analysis,

you  want  to  review the  actual  by predicted  plot

to  make  sure  that  your  nonlinear  model is  doing  a  good  job  of  fitting  the  data.

A   curve DOE  analysis is  the  same  as  modeling

the  nonlinear  regression  parameters

extracted from the curves using the Generalized Regression platform.

This  is  the  same  thing  as  what's  going  on with  a  functional  DOE  analysis

with  the  FPCA  weights.

Fit Curve  automates  the  modeling and  visualization  just  as  FDE  does.

Once  you  know  functional  DOE  analysis,

it's  really  not  very  hard  at  all to  learn   curve DOE  analysis.

Now I'm going to hand it over to Fangyi,

who  has  some  nice  examples  illustrating functional  DOE  and   curve DOE.

Thanks  Chris.

Next  I'm  going  to  talk  about two  examples  from   Procter & Gamble.

The  first  example is  viscosity  over  time  curves

collected  from  a  number of  historical  formulation  experiments

for  the  same  type  of  liquid  formulation.

There  are  six  factors  we  would like  to  consider  for  the  modeling.

They  are  all  formulation  ingredients and  we  call  them  factor  one  to  factor  six.

The goal of our modeling is to use these formulation factors

to predict or optimize the viscosity over time curve.

The response being modeled is viscosity over time.

This slide shows you some viscosity over time data.

For the majority of our formulations, the viscosity of the formulations

would increase first with time and then decrease later on.

Next,  we're  going  to  perform  functional DOE  analysis  on  viscosity  over  time  data.

Before functional DOE analysis,

we need to perform functional principal component analysis

on the curves, smoothed using different methods.

Here, we apply functional principal component analysis

to the curves first using B-splines

and find five functional principal components

that cumulatively explain about 100% of the variation in the curves.

Each curve can be expressed

as the sum of the mean function plus a linear combination

of the five functional principal components,

or eigenfunctions, also called shape functions.

We also apply direct functional principal component analysis to the data,

which finds four functional principal components

that cumulatively explain

about 100% of the variation across the viscosity over time curves.

Each curve will then be expressed as the mean function

plus a linear combination of the four functional principal components.

This  slide  compares  the  functional principal  component  analysis  model  fit

using  two  different  options.

The first one is using the B-spline option

and the second one is using the direct functional PCA analysis.

As you can see, using the B-spline option, the model fit seems to be smoother

as compared to the model fit using direct functional PCA analysis.

This slide shows you the diagnostic plots,

the  observed  versus  predicted  viscosity

from  the  functional principal  component  analysis

using  two  different  options.

Using  direct  functional  PCA  analysis,

the points are closer to the 45-degree line

as compared to the one using the B-spline option,

indicating  that  direct functional  PCA  analysis

fits  the  viscosity  over  time  data

slightly  better  than  the  functional principal  component  analysis

using  B-spline  option.

After performing functional principal component analysis,

there's an option in JMP with which you can perform functional DOE modeling

and get a functional DOE profiler.

For functional DOE modeling,

basically it's combining the functional principal component analysis

with the model for the functional principal component scores

using formulation factors.

With this profiler we can predict the functional responses,

in our case viscosity over time curves, using different formulation factors.

You  can  select  a  combination of  the  formulation  factors

and  it's  able  to  predict the  viscosity  over  time  curve.

This slide shows you the diagnostic plots, the observed versus predicted viscosity

and also the residual plots from the functional DOE modeling.

As you can see, the residuals from the functional DOE modeling

are larger than those from the functional principal component analysis

before the functional DOE modeling.

Our colleagues at Procter & Gamble

have found that a Gaussian Peak model fits

individual viscosity curves very well.

This Gaussian Peak model has three parameters, A, B, and C.

A indicates the peak value of the viscosity over time curve,

B is the critical point,

which is the time when viscosity reaches its maximum,

and C is the growth rate,

the rate of the viscosity increase during the initial phase.
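As a rough illustration of fitting such a model to one curve, here is a minimal Python sketch. It assumes the common Gaussian peak form a * exp(-0.5 * ((t - b) / c)^2), in which c sets how wide the peak is and therefore how quickly the curve rises toward it. The data are simulated, not P&G data, and SciPy's curve_fit stands in for JMP's curve fitting.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian_peak(t, a, b, c):
    """Assumed form: a = peak value, b = time of the peak, c = width-type rate."""
    return a * np.exp(-0.5 * ((t - b) / c) ** 2)

# Simulated viscosity-over-time curve (hypothetical values, not from the talk).
rng = np.random.default_rng(2)
t = np.linspace(0.0, 10.0, 25)
viscosity = gaussian_peak(t, 50.0, 4.0, 2.5) + rng.normal(scale=0.5, size=t.size)

# Starting values read off the data: highest point, its time, a rough width.
start = [viscosity.max(), t[np.argmax(viscosity)], (t.max() - t.min()) / 4.0]
(a_hat, b_hat, c_hat), _ = curve_fit(gaussian_peak, t, viscosity, p0=start)
c_hat = abs(c_hat)  # the sign of c does not change the curve

print("estimated a, b, c:", a_hat, b_hat, c_hat)
```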

This  is  the  fitting of  the  viscosity  over  time  curve

using  the  Gaussian  Peak  model

using  a  feature  in  JMP, called curve  fitting.

These  are  the  diagnostic  plots

of  the  viscosity  curve  fitting using  the  Gaussian  Peak  models.

It looks like the model fit is not too bad;

however, the errors seem to be larger than the errors from the fitting

using functional principal component analysis.

After curve fitting using the Gaussian Peak model,

there's an option in JMP with which you can perform curve DOE modeling.

Basically, the curve DOE model combines

the parametric model for the curves, the Gaussian Peak model,

and the model for the parameters of the Gaussian Peak model,

which expresses the parameters as a function of formulation factors

using generalized regression models.
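The two-stage idea can be sketched in Python as follows. This is a simplified illustration under assumptions, not the JMP implementation: there is a single hypothetical coded factor, the curves are simulated from an assumed Gaussian peak form, SciPy's curve_fit does the per-curve fitting, and ordinary least squares replaces the generalized regression models.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian_peak(t, a, b, c):
    # Assumed Gaussian peak form; a = peak value, b = peak time, c = width.
    return a * np.exp(-0.5 * ((t - b) / c) ** 2)

rng = np.random.default_rng(3)
t = np.linspace(0.0, 10.0, 25)

# Hypothetical single coded factor. Assumed truth for the simulation:
# the peak value and the peak time change linearly with the factor.
factor = np.linspace(-1.0, 1.0, 9)
true_params = np.column_stack([
    50.0 + 10.0 * factor,        # a
    4.0 + 1.0 * factor,          # b
    np.full_like(factor, 2.5),   # c
])

# Stage 1: fit the parametric model to each run's curve separately.
fitted = []
for a, b, c in true_params:
    y = gaussian_peak(t, a, b, c) + rng.normal(scale=0.5, size=t.size)
    start = [y.max(), t[np.argmax(y)], 2.0]
    params, _ = curve_fit(gaussian_peak, t, y, p0=start)
    params[2] = abs(params[2])   # the sign of c does not change the curve
    fitted.append(params)
fitted = np.array(fitted)        # shape (9, 3): one row of (a, b, c) per run

# Stage 2: model each fitted parameter as a function of the factor.
design = np.column_stack([np.ones_like(factor), factor])
coef, *_ = np.linalg.lstsq(design, fitted, rcond=None)   # shape (2, 3)

# Profiler-style prediction: factor setting -> parameters -> curve.
def predict_curve(f):
    a, b, c = np.array([1.0, f]) @ coef
    return gaussian_peak(t, a, b, c)

predicted = predict_curve(0.25)
expected = gaussian_peak(t, 52.5, 4.25, 2.5)
print("max abs prediction error:", np.max(np.abs(predicted - expected)))
```

Because the curve shape is fixed by the parametric form, only three numbers per run have to be related to the factors. That helps when the form is right and hurts when the form does not match the data.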

Then  you  get  the   curve DOE  model

and  this  is  a  profiler of  the   curve DOE  model.

Using  this  profiler  you  can  predict the  shape  of  the  curve

by specifying a combination of the formulation factors.

Actually,  this  profiler is  somewhat  different

from  the  functional  DOE  profiler we  got  previously.

These  are  the  diagnostic  plots from   curve DOE  model.

As you can see here, the curve DOE model

does not fit the data well and it's much worse than the functional DOE model.

These are the curve DOE model fits on the original data.

As you can see, for a number of formulations,

the curve DOE model does not fit the data well.

This  is  a  comparison  of  the  profilers

from  functional  DOE  model and   curve DOE  model.

As you can see, the profilers look quite different.

This  compares  the  diagnostic  plots

from  functional  DOE  model and   curve DOE  model.

As you can see, the functional DOE model

fits  the  data  much  better than  the   curve DOE  model

with  a  smaller  root  mean  square  error.

Now  I'm  going  to  show you  the  second  example.

This example is from a diaper design of experiment

with four different products, A, B, C, and D,

at three different stations labeled as S1, S2 and S3,

so it's a factorial design.

Diaper absorption volume was measured over time

for these four different products at three different stations.

The  response  is  diaper  absorption volume  over  time

and  the  goal is  to  understand  the  difference

in  diaper  absorption  curves across  different  products  and  stations.

These  are  a  few  examples  of  diaper absorption  volume  over  time  curves

where  the  fitting  lines are  smoothing  curves.

We  performed   functional principal  component  analysis

on  the  diaper  absorption volume over  time  curves

and  this  functional principal  component  analysis

was able to find five functional principal components

that, cumulatively,

explain almost 100% of the variation among the curves.

These are the functional principal component analysis model fits.

As  you  can  see,  for  almost  all  the  curves,

the  fitted  curve  plateaued after  a  certain  time  point.

The functional principal component analysis model fits the curves really well,

as  you  can  see  from  the  diagnostic  plots.

We  performed  functional  DOE  modeling

of  the  functional principal  component  analysis

and this is the profiler of the functional DOE model.

This model allows us to evaluate the shape of the curve

for  different  diaper  products at  different  measuring  stations.

The  product  comparison at  station  two  seems  to  be  different

from  the  product  comparisons at  station  one  and  station  three.

These  are  the  diagnostic  plots of  the  functional  DOE  model.

Next,  we  would  like  to  perform curve DOE  modeling.

Before   curve DOE  modeling,

we  would  like  to  find some  parametric  model

that  fits  the  diaper  absorption volume  over  time  data  well.

I found that there's a model in JMP called the Biexponential 4P model.

This model is a mixture of two exponential models

with four unknown parameters.
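For reference, a mixture of two exponentials with four parameters is usually written in the form below. The symbols a, b, c, and d are labels assumed here for illustration, with a and c acting as scales and b and d as rates.

```latex
\[
  y(t) = a\,e^{-b t} + c\,e^{-d t}
\]
```

With positive rates both terms decay toward zero, while a negative scale or a negative rate lets the sum describe a curve that rises over time.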

This  model  fits  all  the  diaper  absorption volume  over  time  curves  really  well.

These  are  the  diagnostic  plots of  the  curve  fitting  and  you  can  see

that  the  biexponential 4P   model fits  all  the  curves  really  well.

After  fitting  diaper absorption  volume  over  time  curves

using  biexponential 4P  model, we  performed   curve DOE  modeling  using  JMP

and  this  is  a  profiler of  the   curve DOE  model.

Using  this  profiler,  you  are  able to  see  the  shape  of  the  curve

as a function of diaper product as well as measuring station.

This  is  a  profiler  of  product  A at  station  two  and  then  station  three.

These are the diagnostic plots of the curve DOE model

and you can see that the curve DOE model fits the data well,

except that at higher diaper absorption volume,

the residuals are getting larger.

These are the curve DOE model fits on the original data.

As you can see, for most of the curves,

this model fits the data really well.

This compares the model profiler

of the functional DOE model versus the curve DOE model.

As you may notice, there's some difference

between these two profilers at the later time points.

The predicted diaper absorption volume at the later time points

tends to plateau in the functional DOE model,

but it continues to increase at later time points

using the curve DOE model.

This  compares  the  diagnostic  plots from  the  functional  DOE  model

versus  curve DOE  model using biexponential 4P  model.

As you can see, both of these models fit the data really well,

with functional DOE being slightly better,

with a slightly smaller root mean square error.

Now,  you  have  seen  the  comparison of  functional  DOE  modeling

versus  curve  DOE  modeling using  two  P&G  examples

and  this  is  our  summary  and  conclusions.

Functional  DOE  modeling is  always  a  good  choice.

When  the  parametric  model fits  all  the  curve  data  well,

curve DOE  modeling may  perform  really  well.

However,  if  the  parametric  model does  not  fit  the  curve  data  well,

then  the  curve  DOE  modeling may  perform  poorly.

The functional DOE model is purely empirical.

However, the curve DOE model

may take into account mechanistic understanding

or expert knowledge in the modeling, so it can be hybrid.

It's good to try different methods, like different smoothing methods,

before functional principal component analysis.

When modeling functional responses from a DOE,

try a functional DOE model versus a curve DOE model

and see which one performs best.

This is the end of our presentation.

Thank  you  all  for  your  attention.