
Using Graph Builder’s Splines To Align Measurement Curves (2023-EU-30MP-1329)

A frequent task in data analysis is aligning curves before a descriptive or root cause analysis. Often an additional complication occurs when the measurement intervals are not equidistant in the series to be compared. There is not one single value that quantifies the shift for a whole curve. Interpolation is the solution in cases like this. Simple linear interpolation may lead to numerous random errors; a spline interpolation is more robust. Since the Graph Builder exports the formula for the spline in its current shape, it became an easy, accessible tool for the alignment of curves. And as all steps can be programmed in JSL, it provides a framework for automating curve alignment. This presentation will describe the background, concept, and case study application for the alignment of curves.

 

 

Welcome, everybody, to this presentation about a use case of curve alignment. Experienced analysts often say that in a larger analytical project, roughly 60% of the total time goes into the preparation of data. If curves play a role, and especially if the alignment of curves is needed, then that is certainly close to the truth.

Curves are very specific types of data, and JMP has some tools to work with curves and to address all the related problems. In the sample library, there is the Algae Mitscherlich data, which is one of my favorite data sets in that respect because it lets you deal with many aspects of fitting curves.

This is just an example of the development of algae density in different treatments. The type of curves that I'm going to talk about are typically observations or measurements over time. But this doesn't mean any loss in generality. The presented concepts work in all kinds of curve relationships.

This is an example, algae measurement over time, and one of the aspects in the focus of the analysis for this data set is to specify curves of a known shape that are driven by certain parameters, and then to estimate those parameters based on the data.

In those cases, the parameters very often have a technical meaning, like a slope, an inflection point, or a limit that gets approached. That platform also has sliders that let you analyze how changing one of those parameters affects the shape of the curve.

In the specific case that we are going to talk about, we are not specifically interested in the curve. The curve itself is only an aid, because we are facing another problem.

This is the data set, or rather a part of the data set, that goes back to the real problem. We had one series of measurements and another series of measurements, and they belong to two different devices. Unfortunately, the clocks of these devices were not in sync. But luckily, each of the devices had one sensor that measured the same substance. We could look for times where the measurements were very close to each other, then try to find out how to correct one of the clocks, so to say, so that we get aligned measurements, and then use those to evaluate the data from all the sensors that were available in that data set.

What was the problem with the task? You see the curves here. The red curve is the one that we took as the reference curve, and the blue one is the one that we wanted to shift. You see not only that the curves are quite some distance apart, although they should theoretically have measured the same substance at the same time, but also that the time points of the two series are completely unrelated. With the naked eye, we don't see any lag that we could use to correct one of the data sets.

Therefore, I compared the time points, not the Y measurements, of the two series. If there was just a shift, then we would expect to see all the points exactly on one line. But here you see there are ups and downs, so this is obviously not the case.

Perhaps we can understand more if we calculate, row by row in the data table, the difference of the two times and look at those. Here, with some imagination, we see a little bit of curvature, so toward the end the times seem to be more closely related than at the beginning. But in the beginning, this looks like truly random data. This as well does not help us a lot to figure out how to relate the data.
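The row-by-row check described above can be sketched in a few lines. This is a minimal Python illustration, not the JSL or the data from the talk; the function names and the sample time stamps are hypothetical. It only shows the idea that a pure clock offset would give the same difference in every row, while unrelated sampling times would not.

```python
def time_differences(t_ref, t_obj):
    """Row-by-row difference of two equally long lists of time stamps."""
    return [b - a for a, b in zip(t_ref, t_obj)]


def is_constant_shift(diffs, tol=1e-9):
    """True if all row differences agree within tol, i.e. a pure clock offset."""
    return max(diffs) - min(diffs) <= tol


# Hypothetical time stamps in seconds (made up for illustration).
t_ref = [0.0, 5.0, 10.0, 15.0]
t_shifted = [t + 8.0 for t in t_ref]     # same sampling grid, clock 8 s off
t_irregular = [7.0, 14.5, 17.0, 24.0]    # unrelated sampling times

print(is_constant_shift(time_differences(t_ref, t_shifted)))
print(is_constant_shift(time_differences(t_ref, t_irregular)))
```

In the first case a single number would describe the shift for the whole curve; in the second case it would not, which is the situation in this project.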

I thought I had the link to the data table, but we can look at this screenshot as well. This is the data set that you have seen before, a little bit annotated. We see that two rows have specific markers, the star and the circle. This is due to the fact that the whole measurement project had a ramp-up phase, and at the star point, the real process time of the measurement series started.

The circle is where we, after visual or manual inspection, saw the starting point in the second time series, and we want to align both. We need to change the relationship between the data and the rows. We want to shift one of the data sets, and that reminded me of the paternoster that I liked to use when, many years ago, I was working for a company that had a very old administrative building with a paternoster in it.

It came to me that the strategy that we are following is a kind of paternoster shift, which gives the term elevator pitch a completely new meaning, by the way. How do we find the right steps, the right place to fit? We do not have similar or identical time points in both series of times. We need to construct those time points somehow. Of course, this is done through interpolation.

The first thing that comes to mind is linear interpolation. If I zoom in to only a part of the data set, then it is evident that linear interpolation, so just checking, so to say, the regression between neighboring time points, has some problems, especially if we look into areas where we have horizontal lines, which may easily happen. Then the time point in that range is quite arbitrary: it always leads to the same result. The opposite is true if we are in an area with a very steep ascent. Then a little change on the X-axis, the time, may lead to significant changes in the Y value. This is not a very good technique. We can use splines to interpolate between the values.
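The two weaknesses of linear interpolation mentioned here can be made concrete with a small Python sketch. The `lerp` helper and the four data points are hypothetical, chosen only so that one segment is flat and the next one is steep.

```python
def lerp(t, times, values):
    """Piecewise linear interpolation at time t; times must be increasing."""
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the measured range")
    for i in range(len(times) - 1):
        if times[i] <= t <= times[i + 1]:
            w = (t - times[i]) / (times[i + 1] - times[i])
            return values[i] + w * (values[i + 1] - values[i])


# Hypothetical series: flat from t=0 to t=10, then a steep rise within 1 s.
times = [0.0, 10.0, 11.0, 20.0]
values = [2.0, 2.0, 6.0, 6.0]

# Flat segment: very different times give the same value, so the value
# carries no information about the time.
flat_a, flat_b = lerp(3.0, times, values), lerp(7.0, times, values)

# Steep segment: a 0.2 s change in time moves the value noticeably.
steep_a, steep_b = lerp(10.2, times, values), lerp(10.4, times, values)

print(flat_a, flat_b)
print(steep_b - steep_a)
```

On the flat segment a time error of several seconds is invisible in Y, while on the steep segment a fraction of a second already matters, which is why matching curves point by point on linear segments is fragile.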

You certainly know splines from the Graph Builder. If you make a scatter plot, then by default the smoother is switched on, and the smoother provides splines. You can even change the stiffness, or the degree of fit, or the closeness to the data, with a slider in the Graph Builder.

The advantage of using a spline as an interpolation tool is that it also takes into consideration points further away. Splines build a smooth curve. That is why, in the Graph Builder, it's called the smoother. This makes it easier to use them for interpolation and to use them as a base in our alignment process.
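To illustrate how a spline uses points beyond the two neighbors, here is a minimal Python sketch of a natural cubic spline. Note the assumptions: this is an interpolating spline that passes exactly through every point, a simpler stand-in for the Graph Builder smoother, which additionally has a stiffness setting; the function name and sample data are hypothetical and not taken from the talk.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    xs must be strictly increasing. The second derivatives m[i] at all knots
    are coupled through one tridiagonal system, so every data point
    influences the whole curve.
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # natural boundary: m[0] = m[n] = 0
    if n >= 2:
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        for k in range(1, n - 1):            # forward elimination
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        m[n - 1] = rhs[n - 2] / diag[n - 2]  # back substitution
        for k in range(n - 3, -1, -1):
            m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    def evaluate(x):
        if not xs[0] <= x <= xs[-1]:
            raise ValueError("interpolation only: x lies outside the data range")
        i = min(bisect_right(xs, x) - 1, n - 1)
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return evaluate


# Hypothetical, irregularly spaced measurements.
t = [0.0, 1.5, 4.0, 4.5, 7.0, 10.0]
y = [1.0, 1.2, 2.5, 3.1, 3.0, 3.4]
spline = natural_cubic_spline(t, y)
print(spline(3.0))  # value at a time that was never measured
```

The returned function refuses to evaluate outside the measured range, in line with using the curve for interpolation only.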

We need to fit splines. How can we do this, or which platforms help? First of all, a simple tool, Fit Y by X, comes to mind very fast when you work with JMP and do that kind of data analysis. This is the data, one of the curves. There is the spline fit, and here is the slider that lets me choose how closely I want to fit my data. Very good, very easy to use, and you can save the spline, but only the values, not the formulas. We are keen on getting the formula for the spline.

Next stop, Fit Model. If you have a continuous variable, you select it, and you can give it the attribute of being a knotted spline effect. When you do so, you are prompted to say how many knots that spline should have: the more knots, the more flexible. I accept the default and say run. We get the typical report from Fit Model. Also, we have Fit Model's functions for saving formulas. We can use Fit Model and save the formula.

A little disadvantage here is that I need to specify the number of knots before I start the analysis. Once the analysis is done within the platform, I don't have the option to play with it or change it as I can, for example, in Fit Y by X.

Another tool is the Functional Data Explorer. The Functional Data Explorer has splines as a core function, and it also has functionality to find optimal definitions, optimal fits for the splines. You can export everything. It's a bit much, though, because simple tasks like this are not what the Functional Data Explorer is made for. You need some more clicks to come to a result. Also, it's only available for people who have JMP Pro.

That leaves the Graph Builder. You have seen it before, and this time I want to show the spline control as well. As I said, we can use the slider to determine the fit. A very nice feature, by the way, is that you can check this box, and then the confidence interval for the smoother is estimated through a bootstrap sampling method. You see how that changes when I move the slider. Now you can see it better: I have a lot of data and there is not too much variability.

Here the confidence band is quite small. But if we zoom into one of these areas here, for example that place, and look at what happens when I change this, then we see that the line of the smoother can even walk out of its own confidence band. This is another visual help to find a good fit for the smoother, for the spline. It should stay within its own confidence limits. Then comes the very important option here. We can save the formula. Then we have a formula for this spline.

The Graph Builder surprises as a modeling tool. Who would have expected this? How does that help? This is again a small part of my data table. You see that now I have two columns here where I saved the formulas for the smoother. Down here in the colored rows, I put in some arbitrary time points. That leads to an interpolated response for the time point that I have given. It only works for interpolation. We cannot extrapolate this way.

But this way I can, for example, manually add different time points. I have this one here plus X seconds in that case, and then I can see the difference in the interpolated value. Now I can put reference times in, and I see exactly what the expected value is, plus or minus a little bit, for both measurements.

I did this for two different phases. I can go here and experiment some more. In my journal, you see that in the yellow rows I added eight seconds, and in the orange ones, 10 seconds. Depending on what you want to do, this is the principle of how you can work with this. If your task is a one-off task, this is good enough. You can go in here, play with the data, and see the difference.

Our task was more regular. The good thing is that everything can be controlled with JSL. As usual, for many commands that you do manually, you have a corresponding JSL statement, and I just listed some. This is not a working program. First of all, you need to set up the graph, clearly. Then you have commands that you can send to the graph and specifically to the smoother element in your graph. We can change the smoother, so we could even interactively try to determine good fits.

We can also give the command to save the formula in the data table. That is the command that plays an important role for our solution here. You can also read out the current setting of the Lambda slider and a few other small things.

How did we want to use this? The concept here was, of course, that first you need to determine which is the reference curve and which is the objective curve, the one that needs to be shifted. Then you calculate the spline function for the reference curve and determine the direction of the shift. Where are we? Do we need to shift our time up or down?

Then we move the Y values of the objective curve one row in the desired direction and calculate the spline function for this new curve. We save that, use the reference values, and then calculate the differences in Y for each row. Then we take the total sum of those differences as the criterion for when to stop the process. After every step, we calculate that difference, we save it, we do the next step, and we check: was there an improvement?

If yes, we move up or down one more row, and then we repeat that whole activity until there is no improvement anymore. The whole program in the real project, of course, runs behind the scenes. You wouldn't see anything. But I added some graphs to make it visual and to demonstrate how that works step by step.
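The loop just described can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the JSL program from the project: the reference curve is passed in as any callable (in the talk it is the spline formula saved from Graph Builder), the mean absolute difference over the remaining rows is used instead of the total sum so that the shrinking overlap does not bias the criterion, and the spline of the shifted objective curve is not refitted at each step. All names and the synthetic data are hypothetical.

```python
def shift_rows(values, k):
    """Move values k rows up (k > 0) or down (k < 0); vacated cells become None."""
    n = len(values)
    return [values[i + k] if 0 <= i + k < n else None for i in range(n)]


def misfit(times, shifted, reference):
    """Mean absolute Y difference to the reference over rows that hold a value."""
    diffs = [abs(y - reference(t)) for t, y in zip(times, shifted) if y is not None]
    return sum(diffs) / len(diffs)


def align(times, values, reference, direction=1):
    """Shift one row at a time in `direction` until the misfit stops improving."""
    max_steps = len(values) - 2              # always keep at least two rows
    k, best = 0, misfit(times, values, reference)
    while abs(k) < max_steps:
        candidate = misfit(times, shift_rows(values, k + direction), reference)
        if candidate >= best:
            break                            # no improvement: keep the previous shift
        k, best = k + direction, candidate
    return k, best


# Synthetic demo: a smooth stand-in for the reference spline, and an
# objective series whose clock lags by exactly three rows.
def reference(t):
    return (t - 15.0) ** 2 / 10.0


times = [float(i) for i in range(30)]
objective = [reference(t - 3.0) for t in times]

rows, remaining = align(times, objective, reference, direction=1)
print(rows, remaining)
```

On this synthetic series the loop should stop at a shift of three rows, because the fourth step makes the misfit worse again. With real data the misfit will not reach zero, and the direction of the shift has to be chosen beforehand, as described above.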

The starting situation is this one. On the left-hand graph, you see the dashed line and the solid line. The dashed line is the reference line. The solid line needs to be moved. On the right side, you see the differences per row.

You see that here in the starting area, the differences are pretty small. Then they get larger and larger, and they are negative. That is why it goes down here on a negative scale: very small differences in the beginning, and then they get larger.

This is the starting situation. You will see this picture again. Then the program will start shifting the objective curve one cell up, in our case. Then you see how these graphs update for every step. Yes, first we need to tell JMP which are the time and measurement values for the reference and the objective curve. Here we go. It will take a little bit in the beginning, then afterwards the steps come faster.

You see how, for every step, the blue curve approaches the dashed curve, and how the differences decrease. The last step did not improve the situation anymore; therefore, the program went one step back.

Now we have the data table in a situation where we have shifted up the objective curve. Now we can use this shift for all the other measurements, for all the other sensor results that we had for this device, and start the analysis.

That was it. I hope I could inspire you a little bit and that it was an interesting presentation. If you have any questions, please don't hesitate to contact me. My email was at the top of the presentation, bernd.heinen@stabero.com. Thank you very much.