Saturday, March 4, 2023
Do you or your colleagues ever wonder if JMP can do a particular analysis? It probably can, and now in JMP 17 there is a new way to find out how. JMP Search is a new help capability that can guide you to JMP features and other resources like the JMP Community. Whatever your experience with JMP, it is designed to help find new-to-you features or re-find features you have used but forgotten. You will learn to use JMP Search for data manipulation, statistical analyses, and visualizations. And we will touch on some of the underlying technology behind JMP Search and how you can use it yourself.

Hi, my name is Evan McCorkle. I'm a developer with JMP, and I'm here today to talk to you about a new feature in JMP version 17 called JMP Search. It's available under the Help menu as Search JMP, and there's an accelerator key, Ctrl+Comma. The idea behind this feature is to help you, whether you're a new user of JMP or an experienced user, find features within JMP that will help you get your job done.

I'm going to go through this demo using one of our sample data tables you may have seen before, Big Class. We're not going to focus on the actual statistical analysis; I'm just going to focus on using JMP Search to find different features within JMP to use on that table.

I can start by opening JMP Search, and I get a dialog here. I'm going to type Big Class. I misspelled it, but that's okay, it knows what I meant. I see results here and a details pane over here. I'm going to click on that and open the data table. From here, we can do a lot of data table cleanup and manipulation. For instance, what if I wanted to exclude all the men in this table? I don't quite remember where this feature is, but it's something about finding matches. I can bring search up with the accelerator key, Ctrl+Comma, type "matches," and look through the results. Under the Rows red triangle, Row Selection, I see Select Matching Cells. I think that's what I want. I could learn more by looking at the topic help here, but I think I just want to run it. I could run it through the Show Me button there, or I can just double-click on this item, and I see a guided path down to that item, just as if I had opened the Rows red triangle menu myself.

I've selected all the men, and now I can come over here and hide and exclude them. Now let's say I'm going to expand this table from 40 rows to millions of rows. I might want to turn compression on to do that. I can open search and type "compression," and I see a couple of options here: one under the red triangle for the table, Compress File When Saved, and then under Utilities, Compress Selected Columns. Let's do the one under Utilities, Compress Selected Columns. If we look at the result, we see it has turned age, my selected column, into a 1-byte integer compressed column.
We don't need to reopen the search, because I remember that under this red triangle there's that compression option. I'm going to go ahead and bring these back. From the data table we can do other things like splitting, stacking, joining, recoding column names, and so on. But of course JMP is more than just data tables; we also have statistical analyses and statistical platforms. Let's look at some of that.

I can come in here and type "Anova." There are a couple of options, but I want to look at the first two. One is a tutorial, and I might want to go through that a little later, but not now. Under Analyze, Fit Y by X, under Oneway, there's Means/Anova. If I look at this diamond here, I can see that when I hit Go, it's going to launch the platform launch dialog and ask me to put columns into Y and X, turn knobs, and flip switches, and depending on the data used and the options chosen, this Means/Anova option may or may not be available to me. In particular, if we look here, it says this option is only available when X has more than two levels, but I think that's going to be okay. Let's hit Go. It brings up the Fit Y by X dialog; I want to do height by age. We see Oneway here, which is just what I wanted. I click OK, and age has more than two levels, so I can open the search again and search for that option. I see it's right under the red triangle, second one down, and we can turn that on.

Then I want to do a letters report. I can't really remember: connecting, connected... something about letters. Let's look at that. I see a couple of options with Fit Model, and I see one under Oneway, and I'm already in Oneway. It would be great if it's appropriate to use Oneway for this. It's not available to me right now, but I see it's under Oneway Means Comparisons, and I see lots of different techniques in here, like Student's t and Tukey. Let's look at that under Compare Means, Student's t. I'm thinking that if I do this, and then from here we look at "letters" again, we can see under the outline for Oneway, and then under the outline for Means Comparisons, we have a red triangle, and Connecting Letters Report is actually already on. If we go down, we can see it under this Means Comparisons outline. I've already got what I wanted.

We've talked about data tables and some statistical platforms and analyses. Now let's talk about some visualization. I go back to the table, and I'm actually going to go down and look at this Fit Polynomial under Bivariate. I want to do height by weight, so I bring up Bivariate here and search for Quadratic. I see this red triangle entry, and this is another red triangle entry, but I'm going to look at this one and do a quadratic fit. I'm not quite sure about that fit, but I know I don't like the red, so let's change it to something else. Let's change it to blue.
Under that red triangle there, we can change the line color; I'm going to change it to a blue. Now I want a little more in this visualization. Let's look at the Nonparametric Density and turn that on. Those are a couple of options for visualization in this frame within Bivariate, but search also works in other platforms and other situations within JMP.

To go back to the table here, I want to call a couple of things out. When I showed this, I typed "matches," but we got results for "matching" and "matched," and that's because we are doing stemming on the search query and on all the content within JMP. This search will work no matter what your display language is within JMP. It'll work in English, French, Italian, German, Spanish, Japanese, Korean, and Simplified Chinese. Whatever your display language is in JMP, you can search in that language, get results that are localized for you, both in the results list and in the details pane, and navigate the same way I did in English.

The technology we're using to turn "matches" into results for "matching" and "matched" is also available for you to use on your own data tables through Analyze, Text Explorer. Within Text Explorer, if you have a bunch of text data to look at, you can tell it what language the column is in, what stemming you want, and what tokenization you want, and JMP will do that same collapsing of different conjugations of words into a single form. That same technology is what we're using within JMP Search to make it work in whatever display language you happen to use.

With that, we've gone through JMP Search for data tables, statistical tests, and visualization. Again, I hope that JMP Search will help you, whether you're a new user or an experienced user, either find new things or re-find things that you knew about but maybe forgot where they were, and use JMP to get your job done quickly and easily. Again, that's JMP Search under Help, Search JMP, and it's available in JMP version 17. Thank you very much.
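As a minimal illustration of the stemming idea described above (why typing "matches" also finds "matching" and "matched"), here is a short Python sketch using NLTK's SnowballStemmer. This is a stand-in for the concept only; it is not the engine that JMP Search or Text Explorer actually uses.

```python
# Stemming collapses different word forms to a common stem, so a query
# for "matches" can hit content that says "matching" or "matched".
from nltk.stem.snowball import SnowballStemmer

stemmer = SnowballStemmer("english")
query = "matches"
content_terms = ["matching", "matched", "match", "mismatch"]

query_stem = stemmer.stem(query)                              # "match"
hits = [t for t in content_terms if stemmer.stem(t) == query_stem]
print(query_stem, hits)                                       # match ['matching', 'matched', 'match']
```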
This presentation is an extension of a case study presented at a Discovery conference a few years ago, where a client’s protocols required a Gauge R&R study to be performed before running a functional response designed experiment. As in a standard Gauge R&R study, there were several Parts, several Operators, and several replicates per combination of Part and Operator. However, in this case, the test equipment returned a set of curves as the response instead of a single point. A functional random effects model is appropriate for this type of data. In this application, the functional model is expanded using basis splines and then expressed as a mixed model, where variance components can be estimated using standard methods. This is done using the Functional Data Explorer and Fit Mixed platforms. Due to the functional model expansion, multiple variance components may be associated with each of the Part, Operator, and Part*Operator terms. It is shown that these variance components can be summed and written in the form of a standard Gauge R&R computation, therefore providing a Functional Gauge R&R analysis.

Hi, my name is Colleen McKendry, and I am a senior statistical writer at JMP, but I also like to experiment with functional data. This presentation is an extension of, and was inspired by, a presentation that was originally given in 2020 titled Measurement Systems Analysis for Curve Data. There was also a slightly earlier presentation at a JSM conference in 2019. My talk is essentially how I would go about solving the problem that was presented in those original papers. I'll discuss them a little bit more later, too.

First, a little background on measurement systems analysis. MSA studies determine how well a process can be measured prior to studying the process itself. They answer the question: how much measurement variation is contributing to the overall process variation? Specifically, the Gauge R&R method, which I'll be using later in this analysis, determines how much of the variation is due to operator variation versus measurement variation. These types of studies are important, and they're often required before any type of statistical process control or design of experiments.

A classical Gauge R&R MSA model is shown here. For a given measurement, your response, Y_ik, is the kth measurement on the ith part. In this model, you have a mean term, a random effect that corresponds to the part, and your error term. The random effect and the error term are normally distributed random variables with mean zero and corresponding variance components. This is simply a random effects model, and we can use that model to estimate the variance components and then use those variance component estimates to calculate the % Gauge R&R using the formula shown on the screen.
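The slide's formula is not reproduced in the transcript, but the classical model just described is commonly written as follows (notation mine):

$$
Y_{ik} = \mu + P_i + E_{ik}, \qquad P_i \sim N(0, \sigma_P^2), \quad E_{ik} \sim N(0, \sigma_E^2),
$$

and one common form of the computation for this simple model is

$$
\%\,\text{Gauge R\&R} = 100 \times \sqrt{\frac{\sigma_E^2}{\sigma_P^2 + \sigma_E^2}}.
$$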
Here we have the same model, but the crossed version. Your response, Y_ijk, is the kth measurement made by the jth operator on the ith part. Again, we have a mean term, a random effect that corresponds to the part, and now we also have a random effect that corresponds to the operator, a random effect that corresponds to the cross term, which is the interaction between the operator and the part, and of course our error term. All of these random effects are normally distributed random variables with mean zero and some corresponding variance component. Just like the classical model, this is a random effects model, and we can estimate the variance components and use them to calculate the % Gauge R&R.

In both of the models I just described, the response, or the measurement, was a single point. But what happens if this isn't the case? What if your measurement is something like a curve instead? This was the motivation behind those initial presentations in 2019 and 2020 that I talked about. There was a client of JMP that was a supplier of automotive parts, and they had a customer that specified that a part needed to have a specific force-by-distance curve. Obviously, the client wanted to match their customer's specified curve, and so they wanted to run a functional response DOE analysis in JMP to design their product to do that. However, before spending the money on this type of experiment, they first wanted to perform an MSA on their ability to actually measure the part's force. There are a lot more details in the paper noted at the bottom, so if you're interested in more of the background, please see that.

This is what the data looks like. We have force on the Y axis and distance on the X axis, and the curves are colored by part. It looks like there are only 10 curves, but there are actually 250 curves in total; it's just that a lot of the curves are clustered together. In the data, there were 10 parts, five operators, and five replications per part-operator combination. I just want to note that this is simulated data. It's simulated to look similar to the actual data, but we aren't sharing any proprietary data here.

A few function characteristics I wanted to point out: the functions are all different lengths, so they have a different number of observations per curve. Although the functions were collected at equally spaced time intervals, they were not collected at equally spaced distances. That means there's no true replication in terms of distance. When this project was first presented, one of the original ideas thrown out was whether we could pick a set of distance locations, do a standard Gauge R&R MSA at each of those locations, and then summarize that information for a final result. The problem with that is that if we picked a specific location, there wasn't a guarantee that there would be an observation there for each of the curves, because there wasn't replication for distance. Another, more general problem is that with this type of curve data, doing a pointwise analysis like that does not take into account the within-function correlation.
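For reference, before moving on to the functional version, the crossed model described at the start of this section can be sketched in the same notation (again my rendering, not the slide's exact formula):

$$
Y_{ijk} = \mu + P_i + O_j + (PO)_{ij} + E_{ijk},
$$

with $P_i \sim N(0,\sigma_P^2)$, $O_j \sim N(0,\sigma_O^2)$, $(PO)_{ij} \sim N(0,\sigma_{PO}^2)$, and $E_{ijk} \sim N(0,\sigma_E^2)$, so that

$$
\%\,\text{Gauge R\&R} = 100 \times \sqrt{\frac{\sigma_O^2 + \sigma_{PO}^2 + \sigma_E^2}{\sigma_P^2 + \sigma_O^2 + \sigma_{PO}^2 + \sigma_E^2}}.
$$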
Luckily, there's a whole field of statistics dedicated to this type of data, called functional data analysis, and there are a variety of techniques to handle unequally spaced data. A lot of those techniques are now available in JMP through the Functional Data Explorer platform. The question became: can functional data methods be combined with traditional MSA methods to perform some type of functional measurement systems analysis? This was the solution presented in the older papers that I referenced, so this is just a little bit of a review of what they did.

First, a penalized spline model was fit to estimate the part force functions, so there were 10 functions that were estimated, averaged over operators and replicates. Then these functions were subtracted from the original force functions to obtain a set of residual force functions. These residual functions no longer contain any variation due to the part; all of the variation in those residuals was due to the operator and the replicates. They then fit a random effects model to the residuals to obtain the corresponding variance components from the model. A graphical method was then used to find the smallest part variance to use as an estimate for the part variance component. This was then used to calculate a type of worst-case-scenario % Gauge R&R. This method worked fairly well. They got results that made sense, and the client was happy. But this was just a generalization of a standard MSA with some functional components sprinkled in.

When I looked at this data and at the problem, I wanted to take a more traditional functional approach. I have a background in functional data analysis; that is what my dissertation was on, specifically functional mixed models. There was a chapter in my dissertation dedicated to estimating and testing the variance components from functional mixed models. I did that by expanding the functional model using eigenfunction or basis function expansions, rewriting it as a mixed model, and then using known techniques to estimate those variance components. I started to think: could I use the same type of technique? I don't need a full mixed model, I only have random effects here. Can I create a functional random effects model for the part and operator variance components?

This is what I came up with for a functional MSA crossed model, since we do have an operator term. Functional models are set up a little bit differently, because they're all based around the input. In this case, your response, Y_ijk, is the kth replicate made by the jth operator on the ith part, but this time at a particular distance, d. We have a functional mean term, a functional random effect that corresponds to the part, a functional random effect that corresponds to the operator, a functional random effect that corresponds to the cross term, and our error term.
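A sketch of the functional crossed model just described, where every term is now a random function of the distance d (my notation, not the slide's exact formula):

$$
Y_{ijk}(d) = \mu(d) + P_i(d) + O_j(d) + (PO)_{ij}(d) + E_{ijk}(d).
$$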
In this method, I subtract the mean term out, so I'm left with this set of residuals, and that's what I'm going to model. This here represents the eigenfunction expansion of the functional model. We're going to have capital B eigenfunctions and sum all of those parts together to create this big, long random effects model. But for one eigenfunction, what is shown in these brackets is the expansion. Each functional random effect is split into two parts: a functional part and then just a regular part. The functional part is taken care of by evaluating those eigenfunctions, and then we have standard random terms for the part, the operator, and the cross term.

What this essentially does is build this long random effects model, and then you have a number of variance components for each term. For the MSA, there will now be three sets of capital B variance components: B part variance components, B operator variance components, and B cross-term variance components. Because of the way eigenfunctions are structured, they are known to be independent, so based on how we structured the model, we can assume that all of these variance components are independent of each other as well. That means we can sum them together to obtain functional variance components. And since we're just summing them together, we can also substitute them into the formula for the % Gauge R&R and compute it just like we did in the standard models.

How do I actually do this in JMP? Well, it's a multi-step process. I'm going to briefly outline it here, and then I'm going to do a demo for you. First, I'm going to estimate the mean curve in FDE and obtain the residual curves. I'm then going to model those residual curves, also in FDE, to obtain the eigenfunctions needed for the eigenfunction expansion. I'm going to save those eigenfunctions to the original data table and use them in Fit Mixed. Using Fit Mixed, I'm going to fit a random effects model to the original data, using nesting and the eigenfunctions to define the appropriate model specification.

Hopefully, that all makes a little more sense once I demo it. We're going to exit out of here. Here is our data: we have a column for the ID variable, a column for the part variable that defines the 10 parts, the operator, which defines the five operators, our distance column, and our force column. Just as a reminder, this is what the data looked like. My first step is to estimate the mean function of the force curves and then use that to obtain some residuals. To do that, I'm going to model the force functions in FDE. I'm going to go to the Analyze menu, Specialized Modeling, and select Functional Data Explorer. I'm going to define force as my output and distance as my input. Then, because I want the mean function averaged over all of the IDs, I'm not going to specify an ID variable here. I'm going to click OK, and we have our basic initial FDE report.
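As a quick aside before the demo continues, the expansion just described can be sketched as follows, with eigenfunctions \psi_b estimated from the residual curves (my notation, not the slide's exact expression):

$$
Y_{ijk}(d) - \mu(d) \;\approx\; \sum_{b=1}^{B} \psi_b(d)\left[\, p_{ib} + o_{jb} + (po)_{ijb} \,\right] + E_{ijk}(d),
$$

with $p_{ib} \sim N(0,\sigma_{P,b}^2)$, $o_{jb} \sim N(0,\sigma_{O,b}^2)$, and $(po)_{ijb} \sim N(0,\sigma_{PO,b}^2)$. Summing over the eigenfunctions gives the functional variance components

$$
\sigma_P^2 = \sum_{b=1}^{B}\sigma_{P,b}^2, \qquad \sigma_O^2 = \sum_{b=1}^{B}\sigma_{O,b}^2, \qquad \sigma_{PO}^2 = \sum_{b=1}^{B}\sigma_{PO,b}^2,
$$

which can then be substituted into the same % Gauge R&R formula as in the standard crossed model.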
But I want to fit a model, so I can go to the red triangle menu, Models. Technically, you could fit any of those models. I just chose a B-spline because it's first, it's easy, and it'll just take a few seconds to run here. Okay, so here's our model fit. There's a red line here that's pretty hard to see, but that is the estimated mean function. I'm going to give you a better picture of that in a minute or so.

I actually want to save the formula for this mean function. I can do that by going to this Function Summaries report. I can click the red triangle menu and select Customize Function Summaries. I only want the formulas, so I'm going to deselect them all, reselect that one, and click OK and Save. I get a new data table with what appears to be this lonely little entry here. There is a hidden column, so I'm going to unhide that. We have a distance column, and then we have this force mean functional formula. Let's take a look at what that actually looks like. When we look at the formula column, we can see that this is a function of distance: for any value of distance, it is evaluated to give what the mean function is at that distance. This formula column can be put into any data table that also contains a distance column, and that's exactly what we're going to do. We make sure this is highlighted, right-click, and select Copy Column Properties. Then we find our way back to our original data table, double-click to create a new column, right-click here, and do Paste Column Properties. Now we have the mean force evaluated at every distance value in our data table.

We can use that to find our residual functions. I'm going to double-click again to create a new column and title it Force Resids. Now I'm going to create my own formula column that is simply force minus the mean force, and click OK. Now we have our set of residuals, and this is what it looks like. In the top graph, the light gray curves are the original functions from the force column. This red line is the same red line that I tried to show you in the FDE report, the one that was hard to see; that's the mean function. Then the bottom graph, in green, shows the residual curves. These are the curves I'm going to use to proceed with my analysis.

My next step is to model the residual curves using FDE to obtain the eigenfunctions that I need for the model expansion. I'm going to go to the Analyze menu again and select Specialized Modeling, Functional Data Explorer. This time I'm going to specify the residuals as the output and distance as the input, and I am going to specify my ID column this time. I'm going to click OK. We have our initial report here, and now I want to fit a model to this data, so I go to the red triangle menu to look at the models. Again, technically, you can fit any of these models.
In my experimentation, I found that the top three took a really long time to fit, and they didn't provide super great fits for what I needed. The Wavelets models and the Direct Functional PCA were much, much quicker while also providing better fits. The caveat with those two models is that they require the data to be on an evenly spaced grid, and as I mentioned when I introduced the data, that's not the case. However, in FDE we have some data processing steps that help us manipulate our data a little bit. We can go to Clean Up, Reduce, and this first tab is what we want, and we can use that. That just puts the data so that every distance value has a force residuals observation. Now we can go ahead and fit one of those models. I just chose Direct Functional PCA, and as you can see, it was very quick; the fitting was super fast.

This Functional PCA report is where we're going to get all of the information we need, but I was just going to scroll down to look at the data fit a little bit. We can see that these look pretty good. Then if we look at the diagnostic plots, the points are on the diagonal and the residuals look good. This looks like a pretty good fit, and I can use this information from the Functional PCA.

In the Functional PCA report, we have a table of eigenvalues and then these graphs of our shape functions. The shape functions are actually our eigenfunctions; they're just called shape functions in JMP. The way these functions work is that your original input, distance, is on the X axis, and the eigenfunction evaluation is on the Y axis. For any distance d, you're going to have an evaluation of eigenfunction 1, an evaluation of eigenfunction 2, and so on. You can use a linear combination of the eigenfunctions to get an estimate of the original functions.

The eigenvalues table gives you an idea of how much of the overall data variation you're taking into account when you use a certain number of eigenvalue/eigenfunction pairings. In this case, the first eigenvalue and eigenfunction pairing actually accounts for 99.9% of the total variation in the data, which is pretty incredible. This is important in determining how many eigenfunctions you're going to use for the basis expansion. Typically, when you're selecting a number of eigenfunctions, you don't actually want to use all of the eigenfunctions you're given. You want to use the smallest number of eigenfunctions that still accounts for an adequate amount of variation in the data, because the more eigenfunctions you use, the bigger your random effects model is going to be, and the harder it will be to estimate. What does accounting for an adequate amount of variation mean? It can mean different things to different people in different fields. Typically, when I'm working with this, I have used about 90% as a cutoff.
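As a sketch of the eigenfunction-count decision just described: keep the smallest number of eigenvalue/eigenfunction pairs whose cumulative share of the variation reaches the chosen cutoff. This is a generic Python illustration with placeholder eigenvalues, not the values from the FDE report.

```python
# Choose the number of eigenfunctions B so that the cumulative proportion
# of variation explained reaches a cutoff (the talk mentions ~90%).
import numpy as np

eigenvalues = np.array([4.2e3, 2.9, 0.8, 0.1])   # hypothetical FPCA eigenvalues
share = eigenvalues / eigenvalues.sum()           # proportion explained by each pair
cumulative = np.cumsum(share)

cutoff = 0.90
B = int(np.argmax(cumulative >= cutoff)) + 1      # first index reaching the cutoff
print(share.round(4), cumulative.round(4), "-> keep B =", B)
```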
If I was just doing this analysis to do the analysis, I might only take that first eigenfunction, since it explains so much, and run with that. For demonstration purposes, I'm going to take the first two, just so you can see what the model expansion looks like using two eigenfunctions. To save these, I'm going to go to this Function Summaries report again, click on the red triangle menu, and select Customize Function Summaries. I just mentioned that I'm only going to save two of them, so I want to enter two here. I'm going to deselect these all again and only save the formulas. I can click OK and Save, and I have another new data table now. The things that I need are actually hidden; we want to look for these Force Resids shape functions, which represent our eigenfunctions. We can unhide those so that they're now included in our data table.

We can take a look at these formula columns. Just like with our mean function, each is simply a function of distance: for any value of distance, these formulas give you what the eigenfunction value is at that distance. Also, like the mean function, we can put these formula columns into any data table that also contains a distance column, and that's what we're going to do again. We're going to put these formula columns into our original data table. We make sure both of these are highlighted, right-click, and select Copy Column Properties. Again, find our way back to our original data table. I want to add two new columns to my data table, so I'm going to go to Cols, New Columns. We're just going to title them E1 and E2 to represent the eigenfunctions. I want to add two columns, and I want to add them as a group. I'll click OK. Then, with these two new columns highlighted, I'm going to right-click and select Paste Column Properties. Now we have our eigenfunctions evaluated for every distance.

Just to give you an idea, this is what they look like. It's almost the same graph as what was in the FDE report; we're just taking what was there graphically, and now we have the numbers for every value here. Now we want to do the eigenfunction expansion and expand our functional model. But what does that model actually look like when you have two eigenfunctions? I'm going to hop back over to my slides real quick and show you what the model expansion looks like when capital B equals 2. I have this divided into a section for the part, the operator, and the cross term. Within each of these sections, we see that we have a term that involves eigenfunction one and a term that involves eigenfunction two. Essentially, this means that we're going to have two variance components for the part, two variance components for the operator, and two variance components for the cross term.

Now I'm going to go back to my data table. I want to fit this model using Fit Model, so I'm going to go to the Analyze menu and select Fit Model. I want to specify my personality as Mixed Model.
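For reference, the B = 2 expansion shown on the slide a moment ago can be written out as follows (my notation, not the slide's exact expression):

$$
Y_{ijk}(d) - \mu(d) \;\approx\; \psi_1(d)\,p_{i1} + \psi_2(d)\,p_{i2} + \psi_1(d)\,o_{j1} + \psi_2(d)\,o_{j2} + \psi_1(d)\,(po)_{ij1} + \psi_2(d)\,(po)_{ij2} + E_{ijk}(d),
$$

which is why the fitted model will have two variance components each for part, operator, and part*operator.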
Now I'm going to specify the residuals as my Y. Then I'm going to move down to the effects section. In the fixed effects tab, I don't have any fixed effects, and I also don't have an intercept, because I subtracted the mean term out originally. Now I'm going to move to the random effects tab. Here is where I'm going to use the eigenfunctions and the part and operator variables and nest them in an appropriate way to define the model that I just showed you. We can add both of these eigenfunctions, and then we're going to select them and also select part, and we're going to nest part in each eigenfunction. Then we do the same thing for the operator and the same thing for the cross term. That's how we're going to define our model. The last thing I want to do in this launch window is deselect the Unbounded Variance Components option. When this is selected, it means that you can have negative estimates of variance components, and I don't want that; any negative estimates are just going to be set to zero.

Now I can run this, and we have our report here. This is the table that's going to give us the estimates we need to calculate the % Gauge R&R, but I'm going to poke around the report a little bit first. This actual by predicted plot looks really weird at first, but it makes sense when you think about it. Since we don't have any fixed effects or an intercept, when we don't take the random effects into account, our estimate for everything is just zero. When we do take those random effects into account, we see that the actual by conditional predicted plot looks a lot better and that the observations fall pretty well along this diagonal. We can also take a look at the conditional residual plots, and we can see that they're pretty small and centered around zero. We do have some deviation from this line here, but there's nothing too crazy about the residuals. I feel okay about using these estimates to calculate the percent Gauge R&R. I actually pulled this table and put it back into my slides, and I'm going to go back to my slides for the remainder of the presentation.

Here's that table, not the data table, but the report table that was just there. As you can see, you have two variance component estimates for the part, two for the operator, and two for the cross term. As I mentioned when I was describing the model, we can sum these together to calculate the functional variance components. The specific numbers aren't as important in this analysis as what you get when you put them into the formula for the % Gauge R&R. When I do that, we get a % Gauge R&R of 3.3030, which is what Barrentine defines as an acceptable measurement system. If this had been my project, I would have gone back to the client and said that it seems like your measurement system is acceptable, and you can go ahead and proceed with your design of experiments.
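To make that last step concrete, here is a small Python sketch of the computation just described: sum the per-eigenfunction variance component estimates and plug the totals into the crossed % Gauge R&R formula. The numbers below are placeholders, not the estimates from the Fit Mixed report.

```python
# Sum per-eigenfunction variance components (B = 2) and compute % Gauge R&R.
# All values here are hypothetical placeholders for illustration only.
import math

vc_part     = [1.10, 0.20]    # hypothetical part variance components
vc_operator = [0.004, 0.001]  # hypothetical operator variance components
vc_cross    = [0.002, 0.001]  # hypothetical part*operator variance components
vc_error    = 0.0005          # hypothetical residual variance

sigma2_part  = sum(vc_part)
sigma2_gauge = sum(vc_operator) + sum(vc_cross) + vc_error
pct_grr = 100 * math.sqrt(sigma2_gauge / (sigma2_part + sigma2_gauge))
print(f"% Gauge R&R = {pct_grr:.2f}")   # values below ~10% are commonly called acceptable
```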
That's basically it for this analysis and this particular data, so here are just some thoughts that I had. This result was actually very similar to the worst-case-scenario % Gauge R&R that was presented in the 2019 JSM presentation; it was higher by just a few decimal places. It would be really interesting to compare these methods on other data sets to see if they are always similar or if this was just a happy coincidence. I don't have much experience at all with measurement systems, so I don't have any other data to play around with, or even really know how to obtain it. If anybody has any data that they think might apply to this type of project, any functional data that they might also want to do an MSA on, I'd be really interested in hearing about it.

For future work, the first thought I had was: should I add a functional random effect for the ID? This is very commonly done in a lot of functional mixed models, at least in the fields I worked in, and the use of this functional random effect for ID was a big contribution in my dissertation. Typically, it captures the within-function correlation across, in this case, distance. I played around with this random effect in this data, and no matter which model or how many eigenfunctions I used, the variance component associated with this random effect always came out to be zero, and that's not useful. I think in this case, once you took into account the variance from the part and the operator, there just wasn't any variation left to account for. I don't know if that's true for all functional MSA studies or if it was just true for this particular data. If I were ever able to get my hands on some different data, this is definitely something I would keep in mind, to see if it could be added to a model more successfully in other data sets.

I also think it would be cool if we could calculate a confidence interval for the % Gauge R&R. And finally, I wanted to talk about the one thing I wasn't super happy with in this project, which was the residuals. What was wrong with them? These are graphs of some different models and the residuals for each one; I'm going to go back and forth between this slide and the next one. So yes, the residuals are relatively small, they're centered around zero, and there are no crazy spikes or outliers. That's good; that's what we want. However, in all the models I fit, I still didn't love how they looked across distance. Looking at the residuals this way is especially important when working with functional data, because it can really show when you're not capturing all the functional parts of the data. A lot of times in functional data, you see a fanning effect where the residuals are really good in the beginning, and then, as you get towards the end of your domain, they fan out a little bit. This data actually had almost the opposite problem: we can see that the residuals are a little wider in the beginning of the domain and get closer to zero as distance gets larger.
There's also definitely some type of cyclical pattern in these residuals. I don't think it's the end of the world; they're still centered around zero, but you can see in these graphs that there are clearly some up-and-down patterns. Essentially, what that means is that I'm missing something. I'm not capturing the full functional nature of the data, and I don't really know why yet. I'd really like to figure that out and fit an even better model. Whether that's possible with this data or different data in the future, I'm not sure, but it's definitely something I want to spend a little more time on, and I'd be open to any discussion anyone would like to have about it. That's it for me. Thanks for watching. If you have any questions, suggestions, or feedback, feel free to email me. Thank you.
Saturday, March 4, 2023
Climate change is a reality. The Paris climate goal of limiting global warming to 1.5 degrees is barely achievable. The only remaining question is: what can we still do to keep the consequences reasonably limited? How many conferences did you fly to last year? How many times a month do you eat meat? Do you drive to work? All of these questions matter, but how much? In this talk, we will try to find a data-based answer to this question using JMP as enabling software, and we will show how each one of us can contribute to preventing climate catastrophe. I am a mathematician, not a climatologist, but we need to get involved, all of us.

Thanks for tuning into my talk. My name is David Meintrup. I'm a professor at Ingolstadt University of Applied Sciences, and today I'm going to talk about the Earth, the climate, and you. Let's start with a warm-up talking about the climate, no pun intended. Please consider these four actions, and imagine you kept doing them for one year: no plastic bags, go vegan, drive fuel-efficiently, or always switch off standby modes. Can you order them by the amount that they reduce your carbon dioxide footprint? No? Well, don't worry, you're in good company. A study performed by A.T. Kearney in 2019 came to the conclusion that we generally have no clue what reduces our personal footprint. They gave people seven personal actions, from no plastic bags to one flight less per year to regional and seasonal food, et cetera. You can see what people thought, their answers, on the left side. I will give you the correct answers at the end of my presentation, but I can already tell you that people were completely wrong. I stumbled across this study a year ago, and I will openly admit I had no idea either. But how are we going to save our planet if we don't know? This was the motivation for today's presentation.

I'm a mathematician by education, and I've been promoting statistical literacy for many years. There's this famous quote, "Statistics is too important to be left to statisticians," and I think the same is true for the climate. We need climate literacy to know and understand the Earth's climate, the impacts of climate change, and approaches to adaptation and mitigation. In the same spirit, I would like to say that climate change is too important to leave to climatologists, and despite the fact that I'm a mathematician, I wanted to study the topic and talk about it. In one sentence, the goal of my presentation today is to increase my own and everybody else's climate literacy. I would like to do this by answering three questions. Why exactly does climate change happen? Since when do we know? And what can each and every one of us do about it?

Let's start with another question. Did the average global temperature increase? Yes, no, or one can't say? Well, as you all know, the answer is obviously yes, and that is not difficult to prove, as one can simply measure the temperature. Here you see the development of the global temperature over the last 140 years.
Compared to the reference interval from 1880-1910, we have an increase of approximately 1.1 degrees Celsius. In addition, it took only 30 years to double the increase from 0.5 degrees to one degree. Next question: what causes global warming? Well, again, I guess that you are all familiar with the answer: carbon dioxide emissions from burning fossil fuels like coal, oil, and gas. But do you also remember why? Why do these emissions cause global warming? The answer is the greenhouse effect, and I would like to present a few more details on that.

The temperature on Earth is completely determined by the radiation balance. We have incoming solar radiation that is partially absorbed and partially reflected by the Earth and the atmosphere. Of the absorbed energy, one part is radiated back into space as heat, and another part is absorbed by greenhouse gases and then re-emitted down to the Earth. This part is what is called the greenhouse effect, and that is what causes global warming.

Now, which gas contributes most to the greenhouse effect? Is it water vapor, carbon dioxide, methane, or ozone? I guess that most of you will have answered carbon dioxide, but actually it's a trick question, because the trap here is that I didn't ask about the manmade greenhouse effect. Let's have a look at the details. Greenhouse gases actually keep us warm. Without an atmosphere, and therefore without any greenhouse gases, the temperature on Earth would be on average minus 18 degrees; no life on Earth would be possible. Now, if we add an atmosphere including the natural greenhouse gases, water vapor, methane, and carbon dioxide at approximately 280 parts per million, then we have a natural greenhouse effect. This raises the temperature from minus 18 degrees to plus 15. It's a huge effect, an increase of 33 degrees, and this is what is called the natural greenhouse effect. The main gas contributing to it is water vapor. Now, if we continue and add anthropogenic, manmade greenhouse gases, for example raising the carbon dioxide to 410 parts per million, which is more or less where we are right now, then we also add another layer of warming, as I said before, of approximately 1.1 degrees, and this leads to an average temperature of 16.1 degrees. In this additional manmade greenhouse effect, carbon dioxide is indeed the most important contributor.

You can see this confirmed on this slide. Roughly two thirds of this additional greenhouse effect is caused by carbon dioxide, and methane contributes more or less one sixth. There's an important difference between these two gases, though, and it concerns their lifespan. Every molecule of carbon dioxide in the atmosphere is adding to global warming for the next 100 years and more. Methane, on the other hand, has a lifetime of about nine years. So cutting methane emissions is a very quick and good fix for the short term, but in the long run we will have to reduce carbon dioxide emissions. Let's have another look at the greenhouse effect.
The first slide I showed you was, of course, an oversimplified illustration; this one has some more details on it. I would just like to repeat two elements. On the right side, you have the greenhouse effect, this very important downwelling radiation. And on the left side, you have the information that part of the incoming radiation is reflected by the Earth's surface, and of course, the brighter the surface is, the more radiation is reflected. This is important for understanding some feedback loops.

These feedbacks are self-reinforcing. The most famous one is the ice-albedo feedback. The surface of the ice reflects 85% of the solar energy; only 15% is absorbed. The dark sea, however, only reflects 7% of the energy and absorbs 93%. Now, if global warming causes the ice to melt and turn into dark sea, then more energy is absorbed, causing more global warming, causing more ice to melt, et cetera. The same feedback happens with the melting of the permafrost, which is a huge store of methane and carbon dioxide. We can even see an increase of water vapor over time, which also has an obvious feedback loop, because warmer air can store more vapor. Unfortunately, these effects are difficult to quantify; in fact, they are not included in many models.

Let's quickly summarize the physics we've seen so far. Temperature on Earth is a question of radiation balance. The natural greenhouse effect amounts to about 33 degrees and is a prerequisite for life on Earth. The anthropogenic, manmade greenhouse effect consists in adding additional greenhouse gases, in particular carbon dioxide and methane. Carbon dioxide in the atmosphere increased from more or less 280 parts per million to 410, inducing an increase in global temperature of 1.1 degrees Celsius. If we want to stop global warming, this means stopping greenhouse gas emissions.

Let me turn to the second question: since when do we know? Longer than you think. Many of you might be familiar with the mathematician and physicist Jean-Baptiste Joseph Fourier. He was the first one to realize that the temperature on Earth is much higher than one would expect. The explanation he came up with was that the atmosphere acts as an insulator, storing heat that would otherwise escape. Roughly 30 years later, John Tyndall proved that Fourier was actually right: he demonstrated that carbon dioxide absorbs and emits infrared radiation. Finally, at the end of the century, the Swedish chemist Svante Arrhenius was able to quantify the greenhouse effect, the amount of global warming due to carbon dioxide emissions. By the way, living in Sweden, he considered this a positive effect; he hoped that life would become more pleasant with slightly warmer temperatures.

I would like to jump to the '70s and show you 30 seconds from a very popular German TV show of that time. The host describes in detail how global warming works. It's in German, unfortunately, but I put subtitles in English so that you can read it.
Again, this is from 1978. [Video clip in German with English subtitles.] "The consequences will be dramatic." Isn't it incredible how precisely they predicted global warming in 1978? I find this amazing every time I see it. This TV host was not the only one who knew. Many companies knew, including Exxon. Exxon knew, and Exxon knew exactly. You might have heard about it, because it just recently made the news: a group of scientists just published a Science article assessing ExxonMobil's global warming projections.

I would like to read two sentences from the abstract. The first one is, "Their projections were also consistent with, and at least as skillful as, those of independent academic and government models." In other words, they had excellent predictions and excellent scientists. The final sentence says, "On each of these points, however, the company's public statements about climate science contradicted its own scientific data." This is a very polite way to express that they invested a huge amount of money to actually dismiss global warming.

As we do have the documents, I can show you this in a little more detail. This is the original letter from Exxon from 1982, called CO₂ Greenhouse Effect, ending with the remark "Not to be distributed externally. For internal use only." Exxon estimated the development of carbon dioxide in the atmosphere and global temperature until 2100. Let's zoom in. Here we see the year 2022, and the corresponding most probable values that Exxon predicted were 420 parts per million of carbon dioxide and a temperature increase of 1.1 degrees. This is spot on, right? All one can say is: excellent work. By the way, if you wonder why they were interested in this question, it was partially because they knew that global warming would lead to a rise in sea level, so that they had to build their oil platforms higher.

Now, starting with the First World Climate Conference in 1979, did climate policy have measurable success over the last 40 years? This is my final question to wrap up the historic part of this talk. Yes, no, or one can't say? Very unfortunately, the answer is a very clear no. Below the graph, you see the famous temperature stripes showing the increase of temperature over the last 60 years. The graph itself shows the carbon dioxide in the atmosphere. We have the First World Climate Conference, the first IPCC report, the first UN Climate Conference, the Kyoto Protocol, the Copenhagen Accord, and finally, the Paris Agreement. During all these meetings, conferences, and agreements, carbon dioxide in the atmosphere increased from 316 to 420 parts per million, the value that we have right now. None of these conferences, agreements, or meetings had any measurable effect on our actual situation regarding climate change.

Finally, so as not to leave it at this somewhat depressing news: what can each and every one of us do about it? Let me first very quickly remind you that there is a practically linear relationship between temperature increase and cumulative global emissions.
If we want to keep the 1.5 degrees goal from the Paris Agreement, we can very easily estimate that we have approximately 500 gigatons of carbon dioxide left. This was in 2020, three years ago: three years ago, we had 500 gigatons left for the 1.5 degree goal of the Paris Agreement. Now, if we relax this a little bit to two degrees, then we have 1,350 gigatons left. To demonstrate the current status of our emissions, I'm going to switch to JMP.

Let's start by having a look at the global emissions. If we look at it historically, since 1850, you can see the map of the main emitters on the left side, and I have the data here on the right side. The three top units that contributed historically to global emissions are the United States, responsible for one quarter, the European Union with a little bit less, and 13% from China. This adds up to roughly 60% that these three top units are responsible for historically since 1850. Now, if we look at the current status, this is data from 2018, you can see that the top three are still the same, but the order has changed. China is now by far the country emitting the most, followed by the United States, and third in place is the European Union. These three still add up to roughly 50% of global emissions per year. The conclusion here is that without these players, we are not going to get anywhere.

Now, to make the comparison a little fairer, let's look at emissions per person. Here in this lower graph, you see the emissions per capita for different countries. The top ones are the Gulf states like Qatar and similar countries, followed by a second group of high emitters: Australia, Canada, and the United States. Then there is a third group in the middle that consists of China and the European countries, and then we have low-emitting countries, typically found in Africa.

Now, please follow me through the following calculation. I said before that in 2020 we had 500 gigatons left. It's easy to turn this into a per-person budget, which is 56 tons. If we want to be carbon neutral by 2050, this leaves us 28 years and 56 tons. The personal budget per year, on average, that is compatible with the 1.5 degrees of Paris is two tons per person. This is the red line that you see down here. Now, let's look at the United States, for example. For the last three years, the United States has emitted approximately 18.4 tons per person per year. In three years, they have already used the 56 tons that they had left until 2050. At the top, you see the years left until the corresponding country has entirely used its budget, and you can see that the US, Canada, Australia, and Qatar are at zero or below. In other words, these countries have already used everything they had left to keep the Paris goal of 1.5 degrees. Every breath they take now, every car they drive, every plane they fly, is already on the debt side of this climate goal. And don't worry, the Europeans are all going to follow in a couple of years.
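To recap the arithmetic behind the two-ton figure, here is a small Python sketch using the numbers from the talk; the population value is back-calculated from the stated 56 tons per person and is my assumption, not a figure from the presentation.

```python
# Per-person carbon budget implied by the talk's numbers.
remaining_budget_gt = 500           # Gt CO2 left for 1.5 degrees (as of 2020, per the talk)
world_population    = 8.9e9         # assumed population consistent with ~56 t per person
years_to_2050       = 28            # years left to reach carbon neutrality by 2050

per_person_total  = remaining_budget_gt * 1e9 / world_population   # ~56 t per person
per_person_annual = per_person_total / years_to_2050               # ~2 t per person per year
print(round(per_person_total, 1), round(per_person_annual, 2))
```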
The conclusion of this is, unfortunately, that we have absolutely no chance to reach the 1.5 degree climate goal of the Paris Agreement. Every ton that we can save is good, but the 1.5 degree goal is gone. Unfortunately, there's agreement on this.

If we now look at the global emissions by sector, to approach the question of in what areas we can personally contribute, you can see that almost three quarters of the emissions come from burning fossil fuels: oil, gas, and coal. So if someone says the climate crisis is a global energy crisis, he or she is absolutely right. Almost three quarters of the emissions are due to burning fossil fuels. Around 20% come from agriculture, and then there are cement and waste.

Here in the middle, you see this in a little bit more detail, and I would just like to emphasize one item, and that is livestock. Livestock is responsible for 8% of global emissions. What that means is the following: if you put all the cows, pigs, sheep, and everything else in one country and looked at their emissions, they would be number three in the world. There's China, the US, and then all the animals. The country consisting of all the animals would be the third biggest emitter on this planet.

This is one reason why agriculture is a huge contributor and is actually the field with the highest impact for your personal influence. It is followed by buildings, meaning how you heat and the electricity you use, and the third question is how you move. So transportation, buildings, and agriculture are the three big contributors where you have personal influence on global emissions.

I would now like to turn the attention to these three fields. I will start with transportation because I think this is the best known. But it's always good to look at this personal budget. Let me remind you: your personal budget is two tons. One transatlantic flight, Frankfurt to New York, consumes four tons. Twice your personal budget is spent on one transatlantic flight.

There are other ways to use your personal budget quickly. One seven-day luxury cruise: 2.8 tons. Driving your fossil-fueled car for one year: 2.3 tons. Each of these is already above your personal budget, and you haven't eaten anything yet.

It will generally be known to you that taking a plane is the worst way of moving. You cut emissions more or less by half if you take the car, and you cut them to roughly one tenth if you take the train. Of course, public transportation is better than private, and the best way to move is to use your own muscles, on a bicycle or just by walking.

Now, for buildings, the situation is quite clear. 60% of the emissions come from direct or indirect use of fossil fuels for heating, cooking, and electricity. The conclusion here is very easy: turn to renewable sources for the power use in your house, for heating, cooling, and electricity. Turn your house into a green house and you will significantly contribute to a reduction of your carbon footprint. Almost 20% of the emissions in the buildings area come from building materials.
And it's very interesting that there's a lot of research going on to replace classic building materials with carbon-neutral or even carbon-negative ones. I included one example: this is a company from Switzerland that stores carbon dioxide in recycled concrete and tries to reduce the carbon footprint that way.

Finally, agriculture. I have here the data for four different diets and their carbon footprints. It's data from the US; it's not that easy to find the data for other countries, which is why I took the one from the US. The average American diet, again, uses your full budget of two tons. If you leave out dairy, or if you leave out meat and turn vegetarian, this significantly reduces your footprint. But the really huge step is leaving out both and becoming vegan. If you wonder why this is the case, it's because of the footprints of different types of food. You can see that all the vegan foods here are in the lower section. This is split into methane and non-methane greenhouse gases. All the high emitters are in the upper section. Actually, I didn't arrange the scale quite right: if you look at the top, beef from beef herds, it's not 40, nor 50, nor 60, nor 70; it's actually about 100 kilograms of greenhouse gases per kilogram of the corresponding food. So if you only want to do one thing in your diet, leave out beef.

Personally, I find this one of the most impressive statistics: 29% of our Earth's surface is land, and 71% of that land is habitable. Half of it we use for agriculture. Of this part, 77% is directly or indirectly used for livestock. That is roughly one third of the habitable land, but we only produce 18% of our calories from meat and dairy.

This is why the lead author of the corresponding article, Joseph Poore, says, "A vegan diet is probably the single biggest way to reduce your impact on planet Earth, not just greenhouse gases, but global acidification, eutrophication, land use, and water use." He himself turned vegan after conducting the study.

Let me wrap up a little. Here are the four actions I introduced in the beginning, and I hope that by now it is no surprise to anyone that going vegan is the most efficient thing you can do. No plastic bags is good for the environment, but it doesn't really have an important impact on the carbon footprint.

Here are the answers from the survey I showed you. As you can see, no plastic bags was the answer with the highest rank, highly overestimated, just like only eating regional and seasonal food. On the other hand, reducing meat was highly underrated.

I would like to wrap up with a quote attributed to Al Gore, attributed maybe because it's not 100% clear that he said it: "Vote, voice, and choice." What can you do personally? You can vote in every election, make climate policy a priority, and let officials know what you want. Make your voice heard: support organisations, talk about it in your company, et cetera. Finally, your personal choices matter.
Ideally, eat a plant-based diet, reduce the use of fossil fuels for mobility, in particular flying, and make your home green by using renewable energy for electricity and heating. My contribution to making our voices heard was to give this talk today. I would like to thank you very much for taking the time to listen to my message.
Has your world changed in the past couple of years? Ours has too! We hope our changes will make your job easier. This talk will present our newly free and low-cost options to educate your engineers and scientists. We offer instructor-led courses on our public schedule and will add courses to the public schedule on request. If you have instructors in-house, we can provide them with our course materials, and they can adapt our demonstrations to use data relevant to your students. We also have free self-paced eLearning available. We are looking for ways to help you even more! What topics do you want to see for half-hour Mastering JMP sessions, one-hour Deeper Dive sessions, or multi-day analytics education? What times should we offer these sessions? In-person or remote? This session will include time to gather your feedback.

I'm Di Michelson from JMP Education. I'm happy to be with you today to talk about the resources that JMP Education has to offer you. They are centered around learning how to get the most out of JMP, including both how to use JMP and how to use statistical and analytical methods in JMP. In the live session at the end of this recorded talk, I'll provide links for where to find the content in the talk. I also want to get your feedback on how the JMP Education group can provide even more services to you and your company's JMP users.

I'm part of the JMP Education group, managed by Ruth Hummel. Monica Beals is also part of the group; maybe some of you know her. Together, we have over 40 years of experience teaching people how to collect and analyze data to get the information needed to make smart decisions, mostly using our favorite software, JMP.

Today, I want to talk about what we can offer and to get your feedback on other ways we can help you learn JMP and analytics. We'll talk about eLearning, instructor-led classes, a new way for your trainers to develop courses quickly using our course materials, as well as the one place for you to learn about JMP, the new Learn JMP space in the JMP user community.

Let's start with free, on-demand, self-paced eLearning. We converted some of our paid eLearning courses to free courses in 2022. Of course, you know STIPS; it's been available for quite a few years. STIPS is a very broad course with over 30 hours of self-paced learning on many analytical methods, and you can integrate STIPS into your academic or corporate training program.

JMP Education's analytical eLearning courses go deeper into statistical methods and JMP usage than STIPS does. We have self-paced eLearning on many topics and released both our introductory JSL course and our SPC course last year. By the time you see this recording, I hope that a few more courses will have been released. Our plan is to convert all courses available in the JMP learning subscription, which is currently a paid service, to free eLearning.

Data Exploration is our most popular course. It teaches students how to use JMP by means of several case studies. That course is usually followed by ANOVA and Regression, which teaches the basics of statistical modeling.
The custom DOE course teaches the principles of design so that you can use the Custom Design platform in JMP to collect the data needed for analysis with statistical models, enabling you to make good decisions about your processes.

There are two other courses currently in the JMP learning subscription that we're planning to convert to eLearning: our classic DOE course, which covers fractional factorial and response surface designs in a different way than our custom DOE course, and a course on stability analysis, which is written for those in the pharmaceutical industry doing shelf-life studies.

Here's an example from our SPC course. Each lesson consists of videos on the theory of control charts or process capability, along with demonstrations in JMP, plus quizzes and practices to help you retain what you've learned. Self-paced eLearning is great, especially when you need to learn control charts at 2 AM. But some people learn better with an instructor, and for you, we also offer live, instructor-led classes. These are public classes with students from many different companies and industries.

We organize these courses into these buckets, and currently, Monica, Ruth, and I are teaching a few times per month. We have many courses, ranging from how to get started with JMP and analytics through designing experiments. We have three courses on designing experiments: the first one here is general, and the next two are for specific types of experiments.

We have classes on quality improvement, including quantifying the variability in your process that is due to your gage (measurement systems analysis), controlling the variability of your process using control charts (statistical process control), and analyzing time-to-event data in our reliability analysis class.

We have lots of courses on advanced analytics, including platforms in JMP Pro, like analyzing many response variables at once, modeling categorical or discrete responses, functional data analysis, text analysis, generalized regression, and methods for explanatory and predictive modeling. We also have two scripting courses: the introductory course that teaches you the language, and a course that takes you through two examples of designing and building a production script, including building an interactive user interface, pulling data by querying different data sources, building custom reports, and then presenting results back to the user.

The public course schedule is in the Learn JMP space in the community and on jmp.com/training. I'll take you there after this recorded talk is finished. From this page, you can see the course descriptions and the schedule, as well as register for classes.
One thing I'm excited to tell you about is that we've recently implemented a Request a Course button. That's for you to use if you look at our public schedule and don't see a course that you want to take, or if you found the course but it's not scheduled at a time that's convenient for you, especially in Europe. You can click the Request a Course button and ask us to put a course on the public schedule at a date and time that's convenient for you. You can also use it if you just want to be notified when we add a particular course to the public schedule. We're really excited about this button, and we hope it will be useful for telling us when you want to take instructor-led courses.

We've talked about on-demand eLearning and public instructor-led classes. Now let's talk about how your trainers can use our course materials in their course development process. We are providing our course materials free of charge: the PowerPoint slides, the PDF file of the course notes, and the course data, and for most classes, a JMP journal. You can request access from your sales team, your account manager, or your JMP systems engineer. If you don't know who they are, ask your JMP administrator. You'll sign a contract with some very basic terms of use, and you can modify our materials as much as you want, including replacing the data in the demonstrations and practices with data that are relevant to your learners. We hope that your trainers will be able to use the courses we have created to quickly create more relevant learning content for your company.

Here are those categories of courses again, with the number of courses that have course material available within each category. If your trainer wants to see how we teach a course, have them come to a public class, or use that Request a Course button to ask us to schedule a public class at a time that's convenient for them. We've talked about the courses that JMP Education has to offer. There's much more JMP learning content at the Learn JMP page in the user community: just go to jmp.com/community and click on Learn JMP.

Our vision is for Learn JMP to be the one place to access all JMP learning materials. All the pieces are not there yet, but we will be continually improving this space. We want the Learn JMP space to be helpful for users across the spectrum, from never having used JMP before to being a JMP expert, and from new to analytics to a trained statistician. The learning materials cover different learning styles, with live sessions, recorded or created videos, as well as things to read.

We also organize by time commitment, from short one-page documents or a five-minute video, to a half-hour Mastering JMP session, to full courses. All of the material is organized according to the JMP Analytic Workflow.
It ranges from data sources, through visualization and analysis using JMP platforms, to sharing results with JMP users and people who don't have JMP. You can see this JMP Analytic Workflow in action at jmp.com/workflow.

In the Learn JMP space, you'll find the brand new getting started in JMP on demand, which is for new JMP users. There's also lots of additional material on how to use JMP. There are Mastering JMP live webinars and on-demand recordings, and there are the eLearning and instructor-led courses that I talked about earlier. There's also something new this year, Deeper Dive, which fits into that one-to-four-hour slot: longer than a half-hour Mastering JMP session, but not as in-depth as a two-day formal course.

We'll be adding to the Deeper Dive topics as the year goes on. Please let us know what topics you are interested in learning more about. We want to make learning content that you want to use.
Saturday, March 4, 2023
Extracting pertinent information from unstructured text data can pose a daunting challenge. Someone may wish to mine blocks of text for websites, telephone numbers, emails, or physical addresses. It could be that units of measurement between documents need standardizing. The Regex function, quietly incorporated into JMP a few releases ago, is an extremely powerful tool to quickly and easily perform these and other tasks. It is also a tool that, for many, is shrouded in mystery. This presentation seeks to highlight this often overlooked and underrated function and decode its inner workings to allow anyone and everyone to tap into its full potential.

Hi, welcome. You've found our talk on Regex. It's a powerful text analytics tool that Hadley and I are going to explore the basics of today in our talk.

Yes, thank you very much for clicking on this link and watching this presentation. What is Regex? Well, Regex is a function that searches for a pattern within a source string and returns a string. That definition was taken from the Regex function entry in the JMP Scripting Guide, and I'm not sure it quite does it justice. Before we go into some details about how you can use it and what its power and value are, what I'd like to show you here is just the format of the function. It takes in a source, a pattern, and then, if you like, a replacement string. It has other functionality as well, but for the purpose of this presentation, we are going to be talking about these first three inputs to the function.

Before we dive too deeply into it and show you some examples, I'd just like to talk a little bit about how to set up a pattern in Regex, and specifically about the concept of escape characters. These are characters that can mean many things. For example, \w can match the letter w, but it can also match any other lowercase or uppercase letter A through Z, as well as the digits zero through nine and a... what's that called? An underscore. The way you refer to that whole class is simply by typing \w; if you wanted to refer to a literal w, you would just write the letter w. Digits can be expressed in their actual form, or they can be expressed generally as \d, and then \s refers to a single whitespace character, including tab, return, new line, vertical tab, and something called form feed. Probably some of you watching know what that means.

I'd like to mention some special characters now. The period, the question mark, the asterisk, and the plus refer to matches of different numbers of characters. The period refers to any single character. The question mark matches zero or one instance of whatever is put in front of it, the asterisk matches zero or more, and the plus matches one or more. There are some other characters as well. I won't go through all of these, and there are many more that I haven't captured, but I put them in this table and saved them here so that, if you like, you can pause this and see exactly what they are.
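To make the escape characters above concrete, here is a minimal JSL sketch using a made-up source string; Regex returns the first match of the pattern, or missing if there is none:

// \d+ matches one or more digits; \w+ matches one or more word characters.
Regex( "Lot 42 shipped", "\d+" );   // returns "42"
Regex( "Lot 42 shipped", "\w+" );   // returns "Lot"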
Let's look at an example. Let's say that you wanted to extract all email addresses from blocks of free text, with many email addresses in all different formats. How would you do that? Well, let's look at our source, which could be, for example, "For help contact support@jmp.com. It's free with your license of JMP." If we wanted to look through this and extract the email address, we'd have to refer to it as a pattern. That pattern is one or more instances of any word character, letter or number, which we can refer to as \w+, followed by an at sign, followed by one or more instances of \w again, followed by a literal period indicated by \., and then the letters c-o-m. If we were to set that up in a Regex function, the returned result would be the email address support@jmp.com. That would be the pattern that matches.

Now, some of you watching this, I know what you're thinking: not all email addresses follow this format. Some of them have other characters in them, some of them have multiple periods, some of them perhaps don't end in com, they end in something else. That's all very true, and this pattern isn't going to match those. What you could do is take this pattern and generalize it in different ways so that more of what you're looking for matches the pattern. We'll show you an example of how you can do that and what that process looks like.

The examples we're going to look at are these. First, automated machine messages indicating errors in different parts of a system; what we want to do is extract the components that are broken from all of these messages, and I'll show you how to do that. Second, we're going to take phone numbers that have been entered manually in all different crazy formats, and we're going to put them in a uniform format. Third, we're going to extract info from coded text. In this case, these are file names that contain information about how different biological samples were run: the temperatures, the stress tests, the times, and so on. All of this is coded in the name of the file. We're going to pull out all those pieces and then organize them in a table so that we can work with them.

Now, probably you've all clued into the fact that Pete and I are not Regex experts. I think the word novice is probably a better description of our competency in Regex. The purpose of this talk really isn't to show off our Regex prowess so that everybody should be impressed. The purpose of this talk is to demonstrate how powerful Regex can be, even for novices. Even with a very little bit of knowledge about how Regex works and how patterns work, you can get a lot of use and a lot of functionality. Regex can be intimidating, but it needn't be, because at its core it really is very simple. We're going to take you through some examples and show you exactly how simple it is and how you can start using it right away.
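As a rough sketch of the email pattern just described, here is how the call might look in JSL; the source string is the one from the talk, and the function returns only the matched portion:

src = "For help contact support@jmp.com. It's free with your license of JMP.";
email = Regex( src, "\w+@\w+\.com" );   // returns "support@jmp.com"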
With that, and without further ado, I will turn things over to Pete.

All right. Thanks, Hadley. I'll go ahead and get started here with the first example. Like Hadley said, this is an example where we're trying to extract, out of a description, what part was actually broken. There are probably many different ways you could get at this, but we're going to show you how to do it with Regex.

I'm going to create a new column and generate a formula here. I'm going to look for Regex in the filter, find it there, and then start with my description; that's what I want to run the Regex on. Then I'm going to define a pattern. If you remember what Hadley shared, there are a couple of little tricks with Regex that will make this a lot easier. The first thing I'm going to do is put in a \w, which is a character, but I want this to be more than one character, so I'll use \w and a plus. Then after that \w+, I'm looking for something that has a space and says the word "broken". As long as I type that out right, you'll see here that my formula result appears.

If I hit apply, you can see that it tells me what is broken, but it also contains the word "broken". Maybe I don't want that; maybe I just want the part, not the word "broken". To do that, I can go in here and containerize this, wrapping it in parentheses to make it the first capture group. Then I just say, hey, I only want that first captured word. If we look at the preview here, it's giving me just that. Now if I hit apply and okay, I've extracted exactly what I was looking for. This is a simple example, and you could probably think of other ways to get that specific part of the description out, but I wanted to show you how you could do it with Regex, and it really is a very simple start.
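In formula terms, the step Pete just built interactively might look roughly like this; it is a sketch, with the column name Description taken from the demo, and \1 refers to the first capture group:

// Capture the word that precedes " broken" and return only the captured group.
Regex( :Description, "(\w+) broken", "\1" )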
Let's look at a little bit more complex example. Here we have phone numbers that were entered by hand, and they have different spacing and different delimiters. Sometimes there's a one at the front, sometimes there's not. Sometimes there are extensions, sometimes there aren't. We want to reformat all of that and end up with a cleaner format; here's the end result. Unlike the last example, I think this one is a lot more difficult to do without Regex, so let's walk through how we can do it with Regex.

It's very similar. I'm going to type in Regex here, and I'm going to move this down so you can see the results pop up as we build the pattern. I'm going to put the phone numbers in as my original data, and then I'm going to start on the pattern. If we remember what Hadley said, we're looking for digits this time. Our pattern is digit, digit, digit, then something; we don't know what, so we'll put in a period with a question mark, because it could be many different things. Then we have digit, digit, digit. Let me pop this open a little so we can see it. Then again, we have a period and a question mark, because we don't know what that delimiter is, and then we have four digits. If we look at a preview, you can see it catches some of these. I'm going to hit apply, and now you can see some of these numbers were captured, but some were not, and the output format isn't what we're after.

Let's go back and open this up, and we're going to containerize those like we did in the previous example. We're going to capture three individual sets of digits. We've containerized them; we'll hit okay. Then we want an output that looks a certain way: the first set of digits followed by a dash, then the second set followed by a dash, and then the third. When we hit apply, you can see this has cleaned it up a little, and at least the output format is what we're looking for, but we're missing a few. Let's look at this one specifically: it has a space here. How do we tell Regex that there might be a space, but there might not? We go back and edit the pattern a little: we put in a potential space, a space with a question mark, because it might be there or might not. I hit okay and apply, and there you can see it captured those two with the space.

But you can also see that some of these have a one at the start, like line five here. How do we tell Regex that there might be a one there? Just like we did with the space, we go in and say, hey, there could be a one here. If we do that and hit okay and apply, you can see that it cleaned those up.

Now we're pretty happy; we've got everything in the format we want. But you can see there are other styles of phone numbers here: if people have put in letters instead of numbers, it's not capturing all of that. There's more we could do to clean these up further, but we've taken a lot of messy phone numbers and cleaned them up into a nicer format. This is a good way to use Regex. Now I'm going to pass it back to Hadley for the last example.
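Putting those pieces together, the finished formula from this demo could be sketched roughly as follows. This is our reconstruction, with a hypothetical column name Phone; the optional leading one, the optional single-character delimiters, and the optional spaces mirror what was built on screen:

// Optional "1", optional delimiters and spaces between the digit groups;
// the three captured groups are reassembled as ###-###-####.
Regex( :Phone, "1? ?(\d\d\d).? ?(\d\d\d).? ?(\d\d\d\d)", "\1-\2-\3" )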
All right, thank you very much, Pete. Very well done. What I'm going to do now is show you this example here, which uses descriptions taken from file names. The first seven characters, I think, are the name of the sample, and then comes how it was run: temperatures, sometimes included but not always; days, or sometimes weeks; times, sometimes included but not always. Let's extract all of this information. What we ultimately want it to look like is this: we are going to use Regex to extract the sample project code from the front, the stress condition from within, the temperatures as well as the mean of the temperature range, and then, if there is a time, we'd like that as well, expressed in days and not in weeks.

Let's delete all this and see how we can do it. The first thing we can do is add our project code, and we could do this in Regex, but you know what, this is actually pretty simple to do using a substring: it's just the first seven characters. There we go; let's not complicate our lives.

Now, the rest of it, I think, is a little more tricky. What I'm going to do is open up a new script. We start out the way we start all these scripts: we go in and grab all of these descriptor names. We're just going to create a list called Description with all the values in this column. Then I'm going to show the log, and you can see that if I run Description, I've now got all my descriptions.

What do we feel like starting with? I think temperature is probably a good one, because all of the temperature codes are going to be in about the same format. We're going to create a list container to hold the results, and we're going to loop over all of the items in Description. The temp code is going to equal something extracted from each description, and once we have all of these, we can just slap the whole thing into a new column.

What is that extraction going to look like? Well, it's going to be a Regex. First of all, there's our description; I think this is just description i. Then, what's the pattern? We're talking about temperatures here, so it's one digit, maybe a second digit, followed by a dash, followed by another digit and maybe a second digit, and then the letter C. What we want is this first set of digits followed by this second set of digits. If I run this, hopefully it works. There we go. As I'm doing this, I see that I probably could have gotten away with just doing this; that would have been fine too, and I probably didn't need that second one. But if it works, it works; if it isn't broken, don't fix it. There we go.
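Here is a minimal JSL sketch of the temperature-code loop described above, reconstructed from the talk; the data table and the column name Description are assumptions taken from the demo:

dt = Current Data Table();
description = dt:Description << Get Values;   // list of file-name strings

tempCode = {};
For Each( {d}, description,
    // one or two digits, a dash, one or two digits, then the letter C;
    // keep the two captured groups as "low-high"
    Insert Into( tempCode, Regex( d, "(\d\d?)-(\d\d?)C", "\1-\2" ) )
);
dt << New Column( "Temp Code", Character, Set Values( tempCode ) );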
Let's move forward. What should we do next? Let's grab our time. Time is going to work exactly the same way: we create a container for time, and we loop over the descriptions. Now, what do we want? We want our time code to equal a Regex. What does this look like? First of all, we've got our description, followed by, what's our pattern? It is the word "day" or the word "week", then one digit. Might there be two digits? I guess there might be. We're going to wrap some containers around this, so we have a day or a week, not both, and then one digit and maybe a second digit. We want our second container; we don't want the word "day" or "week", we want just the number. If I run this, let's see what the time code looks like. There, you can see that where it was able to, it grabbed the day or week value and put it in. Let's take all of this and drop it into a column. But before we do that, we perhaps want it expressed as numbers rather than characters.

What I can do is run that and express the whole thing as a number instead of a character. Now we're getting closer to where we need to be. Of course, we want to know whether these are days or weeks, and right now we don't, and that's going to affect what we need to do here. If it's days, then it's fine; if it's weeks, then we should take whatever number is in there and multiply it by seven, so that we are consistently counting days. Then we'll put that in a new column. What is that going to look like? It's going to be an If statement with another Regex: if the day-or-week word we pull out of the description equals "week", then take whatever time code we have and multiply it by seven. What did I do? I think I probably need to close that guy; sorry about that, everyone. Okay, now if we run our time code, you can see that our weeks are now multiplied by seven, and we can take all that and drop it into a column.

All right, so far so good. What's left? Oh, yes, we want the mean temperature rather than the ranges. What I'd like to show you right now is how we can make use of Regex once more, and that is to take whatever was in our temperature code and apply Regex to it again: the minimum temperature is going to be the container on the left side, and the maximum temperature is going to be the container on the right side. To set these up, I'm going to take all of this and wrap it into a loop again, like that. Now we've got our minimum temperature, our max temperature, and our mean. This is how we set it up in Regex: any time we've got a temp code, and this is the pattern, take the first one, take the second one, turn them into numbers, calculate the mean, and then slap that entire thing into a new column.
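The week-to-days conversion and the mean-temperature step might look something like this in JSL, as a sketch under the same assumptions as the previous snippet (dt, description, and tempCode carried over). For simplicity it assumes every file name contains a day/week token and a temperature range; real data would need missing-value handling.

// Time in days: capture the digits after "day" or "week"; multiply by 7 for weeks.
timeCode = {};
For Each( {d}, description,
    t = Num( Regex( d, "(day|week)(\d\d?)", "\2" ) );
    If( Regex( d, "(day|week)", "\1" ) == "week", t *= 7 );
    Insert Into( timeCode, t )
);
dt << New Column( "Time (days)", Numeric, Set Values( timeCode ) );

// Mean temperature: average the two captured groups from each temp code.
meanTemp = {};
For Each( {tc}, tempCode,
    lo = Num( Regex( tc, "(\d+)-(\d+)", "\1" ) );
    hi = Num( Regex( tc, "(\d+)-(\d+)", "\2" ) );
    Insert Into( meanTemp, (lo + hi) / 2 )
);
dt << New Column( "Mean Temp", Numeric, Set Values( meanTemp ) );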
Oops. Okay, so the last thing we want to do is grab this middle piece here, the stress condition. Now, I'm not going to walk through this in its entirety. Let me take that back: I am going to walk through it in its entirety, but some of you watching, if Regex is as new to you as it is to me, may not get this on the first try. That's the beauty of a recording: you can pause it, look at this, and try it out for yourself.

Basically, we're going through the same process. We're creating a container for stress, we're looping through all of our descriptions, and we're using each of those individually as the source. What are we saying? Well, there are going to be up to eight characters: any letter, number, or underscore, and potentially a space as well. I didn't think there were any spaces, but oh yes, there are; that's why I included it. Then I like this bit here: this is "some stuff", anything, one or more of it, I think is what that meant. What this does is it tells the pattern to start at the beginning and keep looking.

Okay, and now where are you going to stop? You're going to stop when you find "day" or "week", or a space, an open parenthesis, a closed parenthesis, or some digits followed by C, or when you get to the end of the line. When you go through all this, what are we looking to extract? The second parenthesized group here; this one was a literal open bracket. That's what we're looking for. Then just drag all of these and drop them into your column.

As you can see, this was a little bit more complicated. It used some more complex functionality, including lookahead, and I'm not going to go into the details of that right now. But I'll leave this up here so that you can see how it was done and how you would go about doing it yourself. All this says is: keep looking forward until you see "day", and then take everything before it. That's what these mean.

With that, let me open this up again, just to summarize. Regular expressions are a specification of a pattern, frequently used to clean up or extract pieces of data. You can search for a pattern and replace it with a different string, or extract different parts of the string. You can define the pattern using the Regex function or the Regex Match function, which we didn't talk about but which we invite you to check out in the help files. They contain lots and lots of information about Regex, as well as examples of how you can use it to solve the problems you're looking to solve, in whatever industry or situation you're dealing with.
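The Regex Match function mentioned in that summary is worth a look when you want the match and its captured groups back as a list rather than as a single formatted string. A tiny, hypothetical example (the file-name string is made up; see the Scripting Index entry for the exact form of the returned list):

// Returns a list containing the matched text and the captured groups, e.g. the
// temperature range and its two bounds from a coded file name.
m = Regex Match( "SAMPLE01_heat_18-22C_day7", "(\d\d?)-(\d\d?)C" );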
I would like to thank you very much for your attention, and I hope you enjoy the rest of the conference and check out the other talks. Thanks again. Bye-bye.

At the 2021 JMP Discovery Summit Americas, we presented a method for creating an “easy button” for data access, combining, cleaning, filtering, visualizing, analyzing, and generating new data. The use of the singular form of “button” is not a typo. It only takes one button to perform any combination (or all) of these techniques, thus saving time and allowing problems to be diagnosed earlier. Informed answers to questions that lead to the best possible outcomes can be made faster, ultimately saving costs and speeding products to market. We now utilize the new OSIsoft PI connector in JMP 17 to extend the sources of data that can be quickly and effortlessly imported, and to do everything listed earlier with just the push of a button.

Hi, thanks for finding our talk today. Hadley and I are going to be talking about making an easy button for data access. This is a talk that we gave previously at a former Discovery Summit, and we're going to be talking about how you can extend this capability with our new OSIsoft PI connector. Hadley, take it away.

Yes, that is absolutely right. Before I move into what we're going to be showing you today and what you can use yourself, I'd like to introduce those of you who aren't familiar with it to the JMP Analytic Workflow. Everyone watching this talk likely already knows that JMP contains all of the analytic capabilities necessary to take any data that you have, in any raw format, and transform it into insight that can then be shared throughout an organization. What we are going to focus on today are the data access and the data blending and cleanup aspects of the analytic workflow.

Why are these important? Well, any problem-solving effort begins by collecting and compiling the data. One big problem is that this can often be time-consuming and tedious, especially for scientists and engineers who don't have a background in this kind of data work. What this effectively means is that it's often not done, or not done in a timely enough manner, so problems can go unnoticed and, therefore, aren't solved. The other problem is that data can be found in many different places, and it takes effort to grab all of it, put it in the right format, and compile it together.

A solution is an easy button for quick access to data wherever it is. What we have, and what we are going to show you, is a simple interface built using Application Builder: a simplified, stripped-down option allowing people to press a button and get data from exactly where they need it, in the format they need, to be able to solve their problems. They can pick a data source and filter what is needed if necessary, even combining multiple sources and automating this. As Pete mentioned, we're going to be building on a tool that we showed previously, which used SQL, web APIs, and even manual entries, as well as combining data from other sources. Where have we shown this before? In a previous Discovery talk, so those of you watching can look at the past Discovery presentations and check those out if you like.
What we are going to do now is take it one step further from where we were back in 2022, and that is to extend it to data contained in OSIsoft PI servers. We're going to be making use of two features that were introduced in JMP 17: the ability to connect to a PI server and the OSIsoft PI import wizard. With that, I'll turn things over to Pete to demonstrate that functionality.

Thank you, Hadley. I'll share my screen here. I'm going to launch that PI importer; you'll find it in the same place you find all of the database connectors. Just like we would do for SQL, you go under File, Database, and Import from OSIsoft PI. You enter the name of your PI server and your authentication method and hit okay. Then it gives you this nice interface here, and you can browse to what you're interested in and pick out a couple of attributes or tags that you want. Let's just pick one for now. Then I can select my start time. I'm going to go back a little bit in time and shorten this query a bit so it goes a little quicker. Once you're ready, you hit Import.

This is a big improvement over what you had to do before, which involved a fair amount of scripting. The nice thing here is that once I've imported this, everything I need to pull it up again is captured right here in the source script. If I hit Edit, you can see that everything that needed to be passed into that PI data source is right here. Hadley is going to take this now and start to make our easy button. I'm going to stop sharing and pass it back to Hadley.

All right, thanks very much, Pete. I'm going to share my screen once again. What we are going to do is take that script that Pete just generated using the OSIsoft PI import wizard in JMP 17 and turn it into a simple add-in that literally anybody could use to select whatever tags they need and then grab that data. If you know what server it's coming from and what the configuration is, and you're always grabbing the same data in exactly the same way, with the only thing that might change being the tags, then this may be the stripped-down, simplified solution that anybody could use. Of course, if you had other things that you wanted to filter on, like timelines and such, that's easy to include as well. And if you wanted to take this a step further and combine these data sources, and maybe do some automation or automated analysis on them, that's an easy step from there; Pete will show you how to do that a little bit later.

What I'm going to do is take the source script that was used to generate this data, copy it, and paste it into a JMP script. Now, when I run the script, it goes back and collects the data from this tag, IA, right there. It could very well be that there are multiple tags you would like rather than just one; maybe you have a whole list of tags that you need.
What I'm going to do first is define a tag list, which may contain the tag IA as well as IB and IC, and as many more as I feel like including. This would be a good option if you were always getting the same tags every time and didn't need to select them. Here they are. What I'm going to do is run this for each one of the tags in this tag list. To do that, I'm going to make use of another relatively recent addition to JMP, the For Each function, which I believe was added in JMP 16. For each tag in my tag list, run this; my tag goes here. Instead of running IA, we're just going to concatenate the tag into the script and then go ahead and run that. Excuse me. There we have it. It really is just that simple. It'll take a few seconds, but there we've got our tags.

That's a good solution if you were always pulling the same tags. But if you wanted to take this functionality and extend it a bit to allow a user to select some tags using these configurations, I can take this code and set it up in Application Builder. Now, rather than hard-coding a list, I'm going to ask the user to select the tags from a list here. We'll just add a few tags to that: let's add tags IA, IB, IC, and maybe one more, kilowatt A. There they are. Now we'll add a button that the user can press to grab whatever they've selected in the list and then get the tags. Button 1 is a good enough variable name, but we need a better label so the user knows what to do. There we have it. When we press this button, we are going to have it run the script that we just wrote; of course, instead of getting the tags from the hard-coded tag list, we'll have it grab whatever the user selected from our List 1 list box.

Can it really be that simple? Yes, it can, and yes, it is. Of course, if we wanted to extend this functionality, at this point the sky, our imagination, and our needs are the limit. With that, I will pass things back to Pete to show you how you can go ahead and do that.
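To make the idea concrete, here is a rough JSL sketch of the loop and the stripped-down window described above. The PI import itself is represented by a hypothetical helper, importPITag, standing in for the source script the wizard generates; the helper, the tag names, and the window layout are illustrative, not the presenters' actual add-in.

// Hypothetical stand-in for the wizard-generated source script, with the tag
// name concatenated into the query. Replace the body with your own source script.
importPITag = Function( {tag},
    Print( "Importing tag: " || tag )
);

// Fixed list of tags, imported one by one with For Each (available since JMP 16).
tagList = {"IA", "IB", "IC"};
For Each( {tag}, tagList, importPITag( tag ) );

// A stripped-down picker: the user selects tags and presses one button.
nw = New Window( "Easy Button",
    lb = List Box( {"IA", "IB", "IC", "kW A"} ),
    Button Box( "Get Tags",
        For Each( {tag}, lb << Get Selected, importPITag( tag ) )
    )
);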
Wow, Hadley, that really does look easy. Very nicely done. Why don't I share where we went from here? Let me share this screen. All I did was take what Hadley had shown and add a few more tags. The next thought is, "Hey, it's great that the PI importer brings in these tables individually, but what happens if I want to bring them together?" Let me just show what this does first. I'll do the data pull; this is what Hadley showed. Then I'm going to do the next step, which is a data compile. This takes advantage of the Workflow Builder, and we'll walk through and actually build it, since it's really easy to do. I'm going to just pick a couple of things to compile here. There you go. Now I'll show you how all of this was done.

Basically, what JMP has done is it went through, grabbed a bunch of those data tables, concatenated them, and then split them apart, and all of these steps are here: it concatenated those data tables, it split them apart, then it recoded the column names, and finally, it made a simple report. So let's walk through how we would do this inside the Workflow Builder. I'm going to close out of these, minimize this, and just start with those three tables that were pulled from the data. Here I have the IA, IB, and IC metrics that I'm looking at.

To start a workflow, you'll find it under File, New, and Workflow. All it is doing is grabbing steps out of the log when I tell it to. If I hit record, it will capture everything I do: any table manipulations, any joining or splitting of tables, any renaming or recoding of variables. All of that will be captured in here. Let's start with that. The first thing I'm going to do is concatenate these. Under the Tables menu, Concatenate: I have A there, and I want to add B and C. I'll give it a name; we'll just call it Stacked Data and hit okay. There you can see that this was stacked, and it's captured here as well; everything I did was captured. I want to back up here: anything I do while the recording is on is captured. With recording off, it won't capture anything; with it on, it will.

Let me back up, start over, and show this one more time, because I forgot to click one button there: I want to add a source column. So, Tables, Concatenate again, call it Stacked Data again, and hit okay. That was my first step. The next thing I'm going to do is split this apart, because I actually don't want it stacked; I want the metrics together in the same table, but in separate columns. So we go to Tables and Split. I want to split by that source column, which is why I redid the concatenate to include it. Here you can see I'm splitting by the source column. This preview is also a new feature. I don't know about everyone else, but I used to struggle with this; I wasn't quite sure what I was going to get, especially with things like Transpose, Split, and Join, some of the more complex table operations. Here I have all of my 500 rows of data for each of these different metrics, but what I'm missing is a timestamp. Without the preview I might have done this wrong, but now I can see I want to group by timestamp, so that each of these values has a timestamp associated with that particular metric. Now I'm going to call this Split Data and hit okay.

There, back in my workflow, you can see I've concatenated and I've split, but now I have this big ugly column name that I don't want.
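For readers who prefer script to the interactive recording, the two table steps just shown could be sketched in JSL roughly as follows. This is a reconstruction from memory of the Concatenate and Split messages, with table and column names taken from the demo; the exact option and column names are best checked against the script the workflow itself saves.

// Stack the three imported tables, keeping a source column, then rename the result.
// The source column created by Concatenate may be named differently (e.g., Source Table).
stacked = Data Table( "IA" ) << Concatenate(
    Data Table( "IB" ),
    Data Table( "IC" ),
    Create Source Column
);
stacked << Set Name( "Stacked Data" );

// Split the stacked values back into one column per source, grouped by timestamp.
splitData = stacked << Split(
    Split By( :Source ),
    Split( :Value ),
    Group( :Timestamp )
);
splitData << Set Name( "Split Data" );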
There's a nice feature inside JMP to recode these column names. If I go to Columns and Column Names, there's Recode Column Names, and it works just like Recode for your normal data. I'm going to do a little advanced extract-segment step here: I want to pull out just the portion at the very end. That looks like the right values, so I hit okay, then I hit Recode, and there we go.

The last thing, and you'll notice again that this is all captured, is to make that graph. I'm going to go to Graph, grab those metrics, A, B, and C, and plot them against the timestamp. Now, you'll notice one thing: this is not added to the workflow yet. I can hit Done and it's still not added. The reason is that I could still be making changes to it. Maybe I don't like this name and want to call it Metric versus Time. Maybe I don't like the format of these timestamps, so I can change that. I'm making all of these changes, and the workflow doesn't capture them until I close this. When I hit Close, there you go.

I'm going to stop recording now, close the two new tables that were made, and show you that this works. There we go. Now you may be asking, "Well, why would I use a workflow? Why not just use a script? What's the advantage?" Let's close out of this and I'll show you why. I gave you an accidental preview of this earlier, but we'll show it here.

We're back to our application. I went through and pulled these three tags, and when I concatenated them, the workflow was looking for those three data tables with those three names. What if I pulled different tags? Let's say I just want all of the data with an A at the end here, so I'm going to do a data pull on those. If I was using a script that was looking for those specific data table names, the script wouldn't work, but the workflow has this generalizability. If I look here in this Concatenate Tables step, it's looking for three tables, IA, IB, and IC, and I don't have those tables open. I have IA open, but the two other tables I have open are different. Let's see what happens when I run this.

It prompts me. It says, "Hey, what data do you actually want to compile? You have different data sources here." I actually want to compile these three that I have open. Now it says, "Oh, wait, I couldn't find the column names." Again, when I went through and recoded columns, a script potentially just wouldn't work because it couldn't find that column name. But here it says, "Hey, I can't find this column IB. Which one is that?" Let's just use a replacement column. There we go, and it worked. The workflow has the ability to be generalizable with these references. By default, a workflow can use replacement references, and I can manage them: here you can see the tables, and I can prompt you to pick those tables, and here are the columns that are referenced, so I can have substitutes there.
Unlike a script, this will prompt you if it doesn't find what it's looking for, so it's very nice in that aspect. That was basically how to build a workflow and then use it to compile data and have it be generalizable. I'm going to pass it back to Hadley here for some closing thoughts. Thanks very much. Let me just share my screen. In summary, making an easy button for data access solves some problems and makes a lot of things easier. What it does is address difficulties in accessing data, because problems persist longer than needed when you don't have access to the data. Getting the data into the right format is really 80% of a solution. Once you've gotten the data collected, formatted, compiled, and cleaned, doing the rest of it is really the fun and easy part. Creating these buttons allows data to be quickly and easily imported. It's possible to add filters; that's not something we showed today, although I guess selecting from the list was one of the filter options, and of course you can always add others. You can see that in the previous presentation that we did back in 2022, as well as how to extend this to SQL, web APIs, and any other place that your data may be. There are two add-ins on the community that we'd like to mention: the OSIsoft PI Importer as well as the PI Concatenator. You can find those if you just Google them or look on our community, community.jmp.com. Thank you very much.
Working as a manufacturer in the biopharmaceutical industry means we often need to show that we obtain similar results on different sites when transferring a manufacturing process and, notably, when scaling processes up or down. Comparison techniques such as t-tests and ANOVAs are widespread, but equivalence testing has become a standard way to show two processes behave similarly. When we look at a few parameters, those techniques are easy to apply, but when we have large numbers of variables, it becomes difficult to see the bigger picture. The challenge with equivalence testing is that it requires the scientists to provide a value for what they deem an acceptable difference between the groups of data. In addition, many processes change over time, and we are interested in capturing whether they behave similarly across the duration of the process. JMP scripting is a great way to automate the data prep, visualisations, and production of all the plots and comparison tests for those data sets. The multivariate platform in JMP helps create a holistic picture of the process for each time point. We can now use equivalence testing and relate it to the individual variable contributions. Hello, everyone. Thanks for joining my JMP talk today. Today, I would like to talk to you about how we look at equivalence between sets of batch data over time at Fujifilm. In particular, I'd like to speak about a new multivariate take on the two one-sided t-tests. Although this particular bit is a new comparison technique, it doesn't replace, but rather complements, the usual single-point techniques, and in the workflow we are still relying heavily on those. The last bit here will describe how we compare data sets, from the time series we get, for two different scales, but it could be any logical group. I'll quickly go through how we prepare the data, or rather get it into a state where we can run the scripts, and we will look at the visualizations for the usual single time points in JMP, and we'll also look at some scripts that I use to run PCA on all the variables and test equivalence. Talking about TOST, or equivalence tests: the TOST is a two one-sided t-test, and it checks whether, on average, your two data sets for a given variable are equivalent. Very similarly, the multivariate test checks whether, on average, the nth principal component, or PC, for a given day of a fermentation here is equivalent for two different scales. You need the data to be in a specific format, and in particular you need two different groups; here it's two scales, and if you have more than two, you will need to split them into sets of two. It's more suitable for time series, because it's a data reduction technique that you wouldn't need if you didn't have a lot of data points. This was originally part of a script that was all done in R, but as I moved into using JMP for visualizations more and more, I thought it was much easier to use, especially if we want to pass those scripts on to staff; R is not always that accessible. I have moved a lot of the script into JMP by now. The only thing that's still in R is the data imputation.
The scripts and pre-work take care of outliers, missing data, and inconsistent entries; R does the data imputation. Why have I moved to JMP? First of all, it was because I visualize in JMP. Why is it good to visualize data in JMP? Because JMP is just made for that. It's really good for looking at missing data and outliers and any graphs, and the time series are no exception. The missing data visuals in JMP give you a color map of where your data is missing, so you can find rows where data is missing. That means a day might not be the best to keep in the data set. You can immediately visualize chunks of missing data, which would be days in a row, and you can make a decision on whether you want to keep all those days or do the analysis twice, for example, and it will quickly show you if you have data missing from one group and not the other, in which case you'd have to do away with that variable altogether. Also, outliers need processing before you interpolate; otherwise they will have a huge effect on the PCA and the comparison test. There is an outlier detection platform in JMP, but here in our workflow it is used in combination with watching the time series and the comparisons. I'll show all that in the demo. Then the Graph Builder is used to plot all the time series. There are many ways to do that, but I will show you in the script the two main graphs that we use to check that our data is good to go. Here they are, the time series, and those in particular are wrapped by batch, so that you get an individual plot for each time series. It's a small plot, but usually it's enough to spot missing data, outliers, or any weird or different behavior. Here, for example, let me get the laser on. There we go. Here we have a cluster of points that are questionable. We need to check whether this is behavior that we want to capture or if it's unusual behavior that we want to impute along with the rest of the missing data. Or we could have single outliers like here. We see this quite often, but here you can notice it's all on the same day. So is it something that happens on day six in those fermentations? Another way to plot time series, and we do that as well, is to actually overlay them. By overlaying them, you can see whether your data is consistent for a given day. Here we have individual value plots, which is what we look at. But I've also asked JMP to put a little box plot around all those data, because this mimics what we have when we are doing our ANOVAs in the second step of our data visualization. This is very typical of what happens in our processes over time. In the first week, the data shows very low variability. The box plots are small and they are usually fairly well aligned; the average is around the same value. When we reach the second week of the fermentation, things start to drift apart and get much more variable.
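For readers who want to reproduce that overlaid view, here is a rough JSL sketch. The column names (:Time ID, :Variable 1, :Run Type) are placeholders rather than the presenter's actual script, and the modeling-type change is one way, under my assumptions, to get one box per day.

dt = Current Data Table();
Column( dt, "Time ID" ) << Set Modeling Type( "Ordinal" );  // so each day gets its own box plot
dt << Graph Builder(
	Variables(
		X( :Time ID ),
		Y( :Variable 1 ),
		Overlay( :Run Type )      // large scale vs. small scale
	),
	Elements(
		Points( X, Y ),           // the individual value plot
		Box Plot( X, Y )          // the box plot drawn around each day
	)
);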
If you were plotting day six by itself in a one-batch-per-day type of plot here, you'd be able to see what the difference is on average between the large scale in red and the small scale in blue. On day six, you have a small difference, and on day 12, you have a large difference. Those differences are what we are looking to test when we do single-point t-tests with our ANOVAs. Before we carry on, I just want to quickly recap the differences between a t-test and a TOST. A t-test is completely statistical, whereas a TOST requires user input for a practical acceptable difference. In a t-test, you hypothesize that there is no difference between the means, and if you get a small p-value, a significant result, then you reject that and say there is a difference between your data sets. A TOST tells you that there is no difference between your data sets if you have a significant result. If you fail a t-test, the confidence interval for the mean difference, which is that black square in those plots, does not cross the zero line. But if you fail a TOST, the confidence interval for the difference is not contained within the practical acceptable difference. You have two outcomes for a t-test and two outcomes for a TOST, which means you have four combinations. Either you pass or fail both, or you pass one and fail the other. In JMP, there are different platforms that do TOSTs and t-tests, but usually you will have a normal distribution plotted for the difference. If you pass a test, then your mean difference is in that little bell curve; if not, it's outside. So you can quickly visualize which ones passed or not in an ANOVA. Here they are, the ANOVAs; that's step two of our visualization and cleaning process. We use a script to plot all of those together in a report so that we can look at all of them. If you think about the data set here, it was about 15 variables over 12 days, so you have over 150 such plots, which is a lot of data to look at, especially if you change things and plot them again. But here are some examples of what you might see. You might pass a t-test or fail it. You might pass a t-test, but only because you have an outlier that's pulling one of the data sets up or down, for example. There are many possible results that you would get here. Not everything is on this screenshot, but we're also looking at the variance in that report. For the principal component comparisons, we don't do this with the script here; I use the Graph Builder to actually plot those, because they are like the plots I showed a couple of slides earlier. But the difference that you can see here is in the scale. Because we're talking about principal components, the scores tend to be around zero on average, and they vary between minus three and three, because we normalize the data before carrying out the PCA.
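Written out in standard notation (a textbook sketch, not JMP output), the equivalence test used throughout this workflow is, for a practical acceptable difference \Delta:

H_0:\ \lvert \mu_1 - \mu_2 \rvert \ge \Delta \qquad \text{vs.} \qquad H_1:\ \lvert \mu_1 - \mu_2 \rvert < \Delta

Equivalence is declared when both one-sided tests reject at level \alpha, which is the same as requiring the (1 - 2\alpha) confidence interval for \mu_1 - \mu_2 to fall entirely inside (-\Delta, +\Delta).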
The advantage of this is that now, instead of having to provide an acceptable value for a TOST, because the data is normalized, we can actually blanket-calculate that acceptable difference by taking a multiplier of the standard deviation of our scores. Here we have the first principal component, and it clearly shows there's a big difference between the large and the small scale. Here, with the second principal component, there is a smaller difference. This is typical of what we see, because the first principal component tends to capture the broader shape of the fermentation profile. So if there is a difference in that broader shape, the TOST for the first PC tends to fail. Typically, what I've seen is that for our data, two principal components can capture about 60% of the information in the variables. For those of you who may have done PCA on data before, that may seem a low number, but that's probably because all the variables have a different story to tell. Another thing I'd like to spend a little bit of time on is the loadings plot. This is part of the PCA platform in JMP; it is the plot at the very top of the platform, and it's a good one to look at if you're a scientist who is more interested in what's really going on. But the reason I have it on a slide here is that it is a good representation of how much each variable contributes to the model that we're choosing before doing the equivalence test. Here, for example, all the variables related to viability for our fermentation are highly correlated, because they are close together, and the way they project onto PC one and PC two here, they get high values. We said that those map well to the first two PCs, so that means they are contributing a lot to the model. Here we have some other variables that are closely clustered together: sodium, potassium, and glutamine. They are highly correlated. They map very well to the first PC, but not to the second, so they don't contribute a lot to the model through the second PC. Then here you have problematic variables. They do not map well to either PC. That means that in a two-PC model, you are not going to capture the behavior of those variables. When you see this, you already know that a two-PC model is not going to give you a lot of equivalence for those variables. The last step is to actually plot the TOSTs. This is done again using a script. Those graphs are not the graphs that you would usually find in JMP, but they're pretty typical of TOST plots. For each PC for each day, we will have a TOST, or equivalence test, result. If the confidence interval is outside of the acceptable range, which is three times the standard deviation of the scores in this case, then we fail the TOST. When we fail the test, we give it a zero in the script. To summarize what happens here: each PC will capture a certain amount of the variability, and each PC can pass or fail a TOST.
Furthermore, each variable contributes to a certain extent to each PC. A principal component is a linear combination of all the variables. Altogether, a variable that has a strong contribution to a principal component that passes a TOST will have a strong impact on the overall equivalence between the batches. This is what we are trying to put together. How do we put this together? There are many ways we could do it. I have done something pretty simple here: it's just a sum-product of passing or failing a PC, times the contribution of the variable. Basically, for example, with two PCs here, we're failing the first equivalence test, and this variable had a 40% contribution, so that gets zero, plus one for passing the TOST, times the contribution here to that PC. That's the overall score for it. In black, you have the basic scores for each day here. Let's call it the IEQ; you'll see that in the tables later. On average, we're getting about 70%, which is not too bad over the course of the fermentation. Adding PCs doesn't make a very big difference, because this variable mapped very well to the two PCs. The pH, which was one of the variables that did not map very well to the first two PCs, gets a really bad average score if you have only two PCs in the model, around 30%. But if you add another four PCs, that number goes up to over 80%. There are no good or bad numbers here, but it's something you need to keep in mind: it really depends on the model that you choose for running this. Moving on, this is the very last output from the script, and that's what we're really interested in, especially if we are comparing different processes or different ways to run them. You have a bar chart of all your individual equivalences. This shows you, by variable, which ones are similar from one scale to the other. Here we have three variables that are pretty similar among batches, and then it really drops down to the last one here, which has a very low equivalence. In the top right corner here, JMP will put an average if you ask for it. That's a good metric, although it's very reductive. It's a good metric for comparing the same process if you're using different ways to run the TOST or different numbers of PCs. I have more slides about this, but I think it's better to run straight into the demo in the interest of time. I've put a little JMP journal together. I'm not very good at this; I hope it's going to work. In JMP, we'll just look at the data, the three scripts, and two different ways to run the last one. That's the anonymized data set here. My computer is very slow. There we go. I think it's working; it's just really sluggish. In this data set is the bare minimum that you need: a run type with two groups, a batch ID, which is a categorical variable despite being a number here, and the time ID; in this case there is one recording a day, so it runs over a number of days. The first bit we do is plot all the time series.
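For reference, the sum-product score just described can be written out as follows (my notation, restating the talk's description):

IEQ_v(d) = \sum_{j=1}^{p} \mathrm{pass}_j(d)\,\cdot\, c_{v,j}(d)

Here pass_j(d) is 1 if PC j passes the TOST on day d and 0 otherwise, c_{v,j}(d) is the contribution of variable v to PC j on that day (in the script, derived from the cosines pulled from the PCA report, as far as I read the description), and p is the number of PCs in the model. In the example above, that is 0 times 0.40 for the failed first PC, plus 1 times the contribution to the second PC.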
I've left this with a few bits and bobs that are not really good, so that I can point them out. All the scripts in this group, and there are three of them, basically do the same thing at the start: a bit of cleaning up and prepping. Each one creates a clone of your data table to work on without breaking your data table, and also a directory to save all the outputs from the script. Then the scripts basically loop here over the number of variables, plotting them one at a time and putting everything in a report. Let's run this. This is a very generalized script and it works well on all the data sets. I'm always nervous that things are not going to work because it's so slow. Here we go. It's still thinking; I'll just be patient and wait. This will plot the wrapped-by-batch time series and the overlaid time series as well. You'll have one of each for each variable. It will say Variable 1, the actual name of your variable, here, and plot them. This is what we want to see, basically. We have the same shape for all the batches and they are consistent across the scales. We'll move down to one that doesn't look as good. There we go. For this variable, we have roughly the same shape for this scale, and then a very different shape for the small scale. We need to find out why this is happening and whether we want to keep this variable in the model. In particular here, we have one batch that is very badly behaved. If you look at this in an overlay plot, it is very obvious that this average curve doesn't represent either the large scale or the small scale. This is a variable that you need to come back to. Variable 6, I think, was in the presentation. This shows you where you have some outliers that you may have missed the time before. Then another thing you need to look for in those plots is whether your small and large scale numbers are mingled together. If your red and blue points are all mixed together, then chances are your scales are pretty similar. But in some cases, like here, for example, the small scale data is almost always above the large scale data, so you can expect to see a difference here. Lastly, here I have variable 11, which is really, really bad. This happens quite often to us when we have a difference in how the variables are recorded. Here it was actually a different unit, and that's why we have very different numbers. Now, when you put the graphs in a report like this, unfortunately, you lose my favorite feature in JMP, which is interactivity. You can't actually highlight a point or a series of points and go see what they are doing in the table. But the script is saving all those individual plots for you. Here we are. In here, it's created a directory with all the plots that we've just seen, plus it saved that clone with the time series tagged at the end. In here, you can see, if I can actually use my mouse, those time series one at a time.
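The looping structure the script uses is roughly the following JSL sketch, written under my own assumptions: the column names, the variable list, and the output folder are placeholders, not the presenter's actual script.

dt = Current Data Table();
outdir = "$DOCUMENTS/TimeSeriesPlots/";
Create Directory( outdir );

vars = {"Variable 1", "Variable 2", "Variable 3"};   // the measured variables

For( i = 1, i <= N Items( vars ), i++,
	ycol = Column( dt, vars[i] );
	gb = Eval( Eval Expr(
		dt << Graph Builder(
			Variables(
				X( :Time ID ),
				Y( Expr( ycol ) ),
				Wrap( :Batch ID ),       // one small plot per batch
				Color( :Run Type )       // large vs. small scale
			),
			Elements( Points( X, Y ), Line( X, Y ) )
		)
	) );
	Report( gb ) << Save Picture( outdir || vars[i] || ".png", "png" );
	gb << Close Window;                  // keep the session tidy
);

The presenter's script additionally appends each plot into a single report window (which is what loses the live interactivity) and saves copies that can be reopened interactively; the PNG export above is just a placeholder for whichever save step you prefer.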
Then you can select the points as you normally would to use the interactivity in JMP. If you have several open at the same time, all of them will be highlighted in all your plots. That's it for time series. We'll move on to the second part of the process, and that's looking at all your ANOVAs. This is in the Fit Y by X platform, and like the other script, it does a bit of tidying up at the start, and then it loops over days and creates a subset of the table for each day. Then there's a report on the differences between the groups. If you have written scripts before, you will see that this is pretty typical of writing a script in JMP. Some of it is written by hand, and a lot of it, the bulk of what's happening, I basically ran in JMP and copy-pasted into the script. We'll run this on this dirty data set. Hopefully, it doesn't take too much time. If you have a fast computer, I believe you would not even see those windows open; it would be instant. You can see here how JMP is basically going to a given day, taking a subset of that table, running a script, and saving it to that little data table, and it's doing this for every day, and we have 12 days here, so it takes a while. This will also save everything in its own folder. In this case, we'll just look at one of the saved reports. Here you have a subset; all of this is for day one, but it's a much smaller table. Here you have the report that you and your scientists would want to look at, which shows you all the t-tests. Now you can look at all those t-tests just to see if they pass. You could count how many pass and take a proportion of passing t-tests. But this is also a good place for finding those more subtle outliers, because each box plot might have some data that you want to question. Again, you would highlight those points and check whether you want to keep them in your final data set or not. Moving on again. We're finally at the last bit, which is probably the most interesting: all the PCAs. This is a much bigger script, because it has to fetch information from the JMP platforms. I don't have a lot of time for this, but I can answer questions at the end if you're interested. The other thing with this script is that I have hard-coded some bits, so it needs to be modified for every data set; I need to fix that at some point. For example, here it's actually doing a principal component analysis on one of the days, so a subset of the data table. Then we switch to the PCA report, and this becomes an object in your JMP script. Then from this object here, you can get items. For example, I run the PCA and I have it as an object, and now I say I want the eigenvalues in there. The way to find the objects that you need is to open the tree structure in your JMP report; everything is numbered and aligned, so you can get everything that you need from the JMP report as a value, as a matrix, or as an array. It really depends on what you want.
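As a minimal JSL sketch of that pattern: the outline box title and column names below are assumptions, so use the report's tree structure, as described above, to find the exact names in your own report, and note that the Eigenvalues outline may need to be turned on from the red triangle first.

dt = Current Data Table();
pca = dt << Principal Components( Y( :Variable 1, :Variable 2, :Variable 3 ) );
rep = Report( pca );                        // the report becomes a scriptable object
eigs = rep[Outline Box( "Eigenvalues" )][Number Col Box( 1 )] << Get As Matrix;
Show( eigs );                               // eigenvalues pulled out as a matrix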
But you can see I've done this here. Once it has all these values, I extract the principal components and I fit, again with Fit Y by X, the principal components versus my scales here. Again, I switch to the report. I'm doing this so that I can get the root mean square error from that report. That's because it's the best estimate I will have for my standard deviation. I'm using this standard deviation here to blanket-calculate my acceptable difference for my TOST. I can finally actually run my TOST here. So again, that's another group, and this time it's Fit Y by X, but I'm asking for an equivalence test with delta as my acceptable difference. The rest of the script plots all the TOSTs, and it's very boring. Then at the end, it creates a table with all the outputs and all the things that we need to create our bar chart, and eventually we could also create the bar chart. We'll run this for this data set. Just checking I have the right one. There we go. There it is. I'll click on it now. You can see it in the background here. It's subsetting the tables, and it's doing this painfully slowly. For every day, it will select the day, make a smaller data table, do a PCA on all the variables, and then it will save the principal components, the eigenvalues, and the cosines for further calculations. It will use the principal components first for doing a t-test, because that's where we're going to get our estimate of the standard deviation, and second for doing an equivalence test to check whether it passes equivalence. I think we're on day seven; we're going to get there eventually. It will also plot all our equivalence tests, and it will also create the bar chart and the new directory. Bear with my computer. Well, this is taking longer than it should, really. I hope it's going to work. Sometimes scripts that are quite busy mean that it's hard for JMP to catch up with what's happening in the background. I hope it's not going to fail because of that. No, here we go. It's now created a report, and for each day, it puts each TOST in a column of graphs. I have written the script in such a way that they're all the same size, and that was suggested by one of our scientists, actually, so they're much easier to compare. Here we had data that really needed some extra cleaning up, so it comes as no surprise that all our equivalence tests for the first principal component are failing. That's because the PCA is done on variables that are not similar between groups. But the more subtle behavior that's captured in the second PC is still passing a lot of the equivalence tests. I'll close this to show you what's been saved in the directory for this one. For this, you have the individual subsetted tables, each with its PCA and its saved script. Even opening a small table like this is taking a long time. There we go. Here are the PCAs. Here's the loading plot. This is where the eigenvalues come from, and here are the cosines, which are pulled out by the script.
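Pulling those pieces together, the core of the per-day calculation looks roughly like this in JSL. This is a sketch under several assumptions: the column names are placeholders, Means( 1 ) is assumed to be the option that turns on the Means/Anova report (which holds the root mean square error), and the one-argument Equivalence Test message is assumed; check the Scripting Index for the exact syntax in your JMP version.

dt = Current Data Table();
d = 6;                                      // one fermentation day
sub = dt << Subset( Rows( dt << Get Rows Where( :Time ID == d ) ) );

pca = sub << Principal Components( Y( :Variable 1, :Variable 2, :Variable 3 ) );
pca << Save Principal Components( 2 );      // adds score columns (Prin1, Prin2 in my JMP; check yours)

ow = sub << Oneway( Y( :Prin1 ), X( :Run Type ), Means( 1 ) );
rmse = Report( ow )[Outline Box( "Summary of Fit" )][Number Col Box( 1 )] << Get( 3 );

delta = 3 * rmse;                           // blanket acceptable difference
ow << Equivalence Test( delta );            // assumed message for the TOST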
It has the TOST results that are used for making the TOST graphs, but we've already seen those. It has a table that shows you which TOSTs passed, with a zero or a one here, and the explained variance, and the calculations for the explained variance in the same table. These columns here are what we're going to use to create our bar chart. The bar chart gets saved in the journal in this case. There are many ways you could do this, really. For 15 variables and not the best of cleanup jobs, let's see what equivalence we get here. It is all working; it's just really slow. Sorry about that. There we go. I've had, again, feedback from scientists saying that they would prefer to see the variables in the order they were in originally, because most of our data is recorded in templates, so people are used to seeing those variables in order. But it's also nice to have it in descending order so that you can quickly see which variables are quite equivalent and which ones are not doing so well. Here, on average, we have 21% equivalence across all our variables. It's not a very high number. I don't have a criterion for that number, but I think around 60%-75% would be quite desirable. I'll close everything I can to make some space. We'll go back to see what happens if we remove one offending variable. I haven't done enough cleaning up here, but I'm removing variable 11, which was really not an acceptable variable to have in our data set. I will run the TOST with three PCs this time, so that I can at least have a shot at capturing the variability in things like pH or pO2, which tend to be much more complex. We'll run this one and we'll have a look at the bar chart and see how much equivalence we can capture. I suspect this is going to be slow again. This is going slowly; we're only on day two, so I need to fill up the time. As I said, we don't have a criterion for this total number; it's more of a relative number. Either you have a set of criteria for cleaning up your data, or, maybe because you are running batches and recording them in similar ways, you would say we will always only look at those 10 variables, and then you can compare the overall equivalence or the bar charts for given sets of variables that are comparable. The other way you could do it is by using the same data set, like I have today. I know we have 21% equivalence for 15 variables, but once we remove variables 11 and 5, for example, and clean up some of the outliers, that number starts going up. Or it could be that we have only 21% with two PCs, but if we add a couple, because some of the variables don't map very well to the first two PCs, then this number also goes up. It's very difficult to put a criterion on that number, but it's pretty good for comparing different models or different data sets that have been treated reasonably similarly. How are we doing here? Almost there. I'm very sorry about this; my computer is particularly slow today. Here we go.
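The bar chart itself is just a Graph Builder call on the summary table the script creates. A hypothetical sketch follows; the table name, column names, and the Caption Box options are all assumptions rather than the presenter's actual script.

ieq = Data Table( "IEQ Summary" );          // hypothetical summary table name
ieq << Graph Builder(
	Variables( X( :Variable ), Y( :IEQ ) ),
	Elements(
		Bar( X, Y ),                                       // one bar per variable
		Caption Box( X, Y, Summary Statistic( "Mean" ) )   // the average shown in the corner
	)
);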
Here are the TOSTs, and this time there are three PCs, so they're aligned in threes. I think if we made this bigger, it would start sticking out of the window here. Because we have removed one variable already, we can see that some of the TOSTs are passing even for the first PC, so that's definitely made a very big difference. I will close those and go back into the directory that was created. The way I've written this, if I'm doing two data sets in the same directory, it's going to get erased, because Save As in a JMP script will save on top of existing files if they have the same name. Here it was the same data; we just removed one variable and added one PC, and we went from 21% to about 47% on average across the variable equivalences. That's showing you what a big difference it can make, from just a small cleaning step or from choosing a slightly different model with one more PC in this case. That's it from me. I've gone through all the scripts. I'll put my very last slide back up here to conclude. This is a new technique to look at equivalence, this multivariate technique. I haven't seen it used anywhere else. It's a complement, not a replacement. You should still, especially if you're heavily involved with the data, be looking at all the time points that you're interested in. It gives a holistic picture with a lot of detail, because you have a lot of output. But if you're only interested in the final information, really that bar chart gives you a lot of information in just one graph. You could do this with any types of groups that you want. This happens to be scales, because we look at the difference between manufacturing and lab scales a lot at Fujifilm. That's it, really. It's your multivariate two one-sided t-test, as part of our process flow to look at scale-up and scale-down data. I'd be happy to take any questions.
Since the Functional Data Explorer was introduced in JMP Pro 14, it has become a must-have tool to summarize and gain insights from shape features in sensor data. With the release of JMP Pro 17, we have added new tools that make working with spectral data easier. In particular, the new wavelets model is a fast alternative to existing models in FDE for spectral data. This presentation introduces these new tools and how to use them to analyze your data. Hi, everyone. Thanks for coming to our video. My name is Ryan Parker, and today I'm going to present with Clay Barker some new tools that we have added to analyze spectral data with the Functional Data Explorer in JMP Pro 17. First, I just wanted to start off with some of the motivating data sets that led us to add these new tools. They're really motivated by these chemometric applications; they can definitely be applied to other areas, but, for example, we have this spectroscopy data where the first thing you might notice is that we've got a lot of data points sampled, but we also have some very sharp peaks in our data. That's going to be a recurring theme: we need to really identify these sharp features, and the existing tools we have in JMP have a little difficulty capturing those. For example, we're thinking about the composition of materials or how we can detect biomarkers in data. These are three spectroscopic examples that we'll look at. Another example of data that is of interest is this mass spectrometry data. Here we're thinking about a mass-to-charge ratio that we can use to construct a spectrum where the peaks represent proteins that are of interest in the area of application. One example is comparing these spectra between different patients, say a patient with cancer and a patient without, and the location of these proteins is very important for identifying differences between the two groups. Another example is chromatography data. Here we can think about running a chemical mixture over a material that's going to help us quantify the relative amounts of the various components that are in these mixtures. By using the retention time in this process, we can try to identify the different components. For example, if you didn't know this, I was not aware until I started to work with this data, trying to impersonate olive oil is a big deal. We can use these data sets to figure out what's a true olive oil, or what's just some other vegetable oil that someone might be trying to pass off as an olive oil. The first thing I want to do is go through some of the new preprocessing options that we've added to help work with spectral data before we get to the modeling stage. We have a new tool called the standard normal variate; multiplicative scatter correction, for when you have light scatter in your data; and the Savitzky–Golay filter, which is the smoothing step for spectral data that we'll get into.
Finally, there is a new tool to perform a baseline correction on the data, to remove trends that you're not really interested in and that you want to get out first. Okay, so what's the standard normal variate? Currently in JMP, we have the ability to just standardize your data in FDE. But when you use that tool, it's just taking the mean of all of the functions and a global variance and scaling the data that way. With the standard normal variate, we're thinking about the individual means and variances of each of the functions to standardize and remove those effects before we go to analysis. When I'm on the right here, after performing the standard normal variate, we can see, okay, there were some overall means, and now they're all together, and any excess variance is taken out before we go to analysis. Multiplicative scatter correction is the next step, and it's an alternative to the standard normal variate. In some cases, whenever you use it, you may end up with similar results. The difference here is the motivation for using multiplicative scatter correction: that's when you have light scatter, or you think you might have light scatter, because of the way that you collected the data. What happens is that for every function, we fit this simple linear model where we've got a slope and an intercept, and we use those estimated coefficients to standardize the data that we're going to work with. We subtract off the intercept and divide by the slope, and now we have the standardized version. Again, you can end up with similar results as with the standard normal variate. Now, the next preprocessing step I'm going to cover is the Savitzky–Golay filter. When you have spectral data, before we get to the modeling stage, the new modeling tools we have are developed in such a way that they try to pick up all the important pieces of the data. If you have noise, we need to do a step where we smooth that first. That's where the Savitzky–Golay filter comes in. What we're doing is fitting an nth-degree polynomial over a specified bandwidth that we can choose, to help try to remove any noise from the data. In FDE, currently, we're going to go ahead and select those best parameters for you, the degree and the width, to try to minimize the model error that we get. One thing I do want to point out is that we do require a regular grid to do this operation, which will come up again later, but FDE is going to create one for you. We also have this reduced grid option available if you want finer control first, before you rely on us making that choice for you. The nice thing about this Savitzky–Golay filter is that, because of the way the model is fit, we now have access to derivatives. This is something that had come up prior to spectral data, and now that we have this, we've got a nice way for you to access and work with modeling these derivative functions. The last one I want to cover is the baseline correction.
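As a quick reference before the baseline-correction walkthrough, the two standardizations just described can be written as (standard formulations, not JMP's internal notation):

\text{SNV:}\quad \tilde{x}_i(t) = \frac{x_i(t) - \bar{x}_i}{s_i},

using each function's own mean \bar{x}_i and standard deviation s_i, and

\text{MSC:}\quad x_i(t) \approx a_i + b_i\, m(t), \qquad \tilde{x}_i(t) = \frac{x_i(t) - \hat{a}_i}{\hat{b}_i},

where m(t) is a reference spectrum (typically the mean spectrum) and \hat{a}_i, \hat{b}_i are the fitted intercept and slope. The Savitzky–Golay filter then replaces each point with the value (or derivative) of a degree-n polynomial fit by least squares within a moving window of fixed width.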
What baseline correction is doing is recognizing that there might be overall trends in our data that we want to get rid of. This data set on the right has just a very small, linear difference in the functions. What we're thinking is, okay, we don't really care about that, we want to get rid of it. What this tool allows you to do is select the baseline model that you want. In this case, it's just a really simple linear model, but you may have cases where you've got exponential or logarithmic trends that you want to get rid of, and so we have that available. Then you can select your correction region. For the most part, you're going to want to correct the entire function, but it may be that only the beginning or the end of the function is where you want to correct. We end up with these baseline regions, which are these blue lines. If we click this Add button, it'll give us a pair of blue lines. We drag these around to parts of the function that we believe are real. All the peaks in these data are something that we don't really want to touch; this is the part of the functions that we want to keep and analyze, and it is going to give us the information that we're interested in. Also, if you select this Within Region option, anything that's within these regions is what will get corrected. You're either going to do one or the other, right? You either leave those regions alone or you change only what's within them. Finally, you don't see it here, but you can also add anchor points. It may be, depending on your data, easy to just specify a few points that you know describe the overall trend. When you click Add, you'll get a red line, and that tells you that wherever I drag that line is definitely going to be included in the model before I correct the baseline. When you click OK here, you'll just end up with a new data set that has the trend removed. Okay, so that brings us to the modeling stage. What we've added for JMP Pro 17 are wavelet models. Okay, so what are wavelet models? They are basis function models, not like anything we have currently in JMP, and they can have very dramatic features. What these features are doing is helping us pick up these sharp peaks or these large changes in the function. We also have the simple Haar wavelet, which is just a step function. If it turns out that something really simple like the step function fits best, we will give you that as well. You can see we have a few different options. If you think about bending these wavelets and stretching them out, that's how we are modeling the data to really pick up all these features of interest. To motivate that, I want to show you the current go-to in JMP, which is a B-spline model; it has a very difficult time picking up on these features without any hand-tuning. The P-spline model is doing a little bit better.
It still has some issues picking up the peaks, but it might in some ways be the best. Direct functional PCA is doing almost as well as P-splines, but not quite. Then we have wavelets. We're really picking up the peaks the best. In this particular data set, it's not fitting them perfectly, but looking at the diagnostics, the wavelet model is definitely the one we would want to go with. Again, we have these five different wavelet model types, and we'll fit all of these for you so that you don't have to worry about picking and choosing. Outside of the Haar wavelets, all of the other wavelet types have a parameter, and we have a grid that we are going to search over for you in addition to the type. Now, it may be that in some cases users have said, hey, this particular wavelet type is exactly how my data should be represented, so you can change the default model; but by default, we're going to pick the model that optimizes this model selection criterion, the AICc. Really, what you can think about here is that there could potentially be a lot of parameters in every one of these wavelet models. We're effectively using a Lasso model to remove any parameters that really just aren't fitting the data, so we get a sparse representation, no matter the wavelet model. We saw earlier that we have to have our data on a grid; it's the same thing with wavelets. If you just start going through the wavelet models and your data are not on a grid, we'll create one for you. But again, I just wanted to point out that you can use that reduced grid option to have finer control. Okay, so something else we show that can help give you some insight into how these models work is this coefficient plot. The X axis is the normal X of the input space of your function, but the Y axis is the resolution. For these top resolutions here, you're thinking about overall means. As we get into these high resolutions, these are the things that are happening really close together. A red line means it's a negative coefficient; blue means it's positive. They're scaled so that they're all interpretable against each other. The largest lines give you an idea of where the largest coefficients are. We can see that the higher frequency items are really here at the end of the function. We have some overall trends, but it's just something to think about, that these wavelet models are looking at different resolutions of your data. Something else that we've added, before we get to our demo with Clay, is wavelets DOE. In FDE, we have a functional DOE that works with functional principal components. If you don't know what those are, that's okay. All you need to know is that with wavelets, we have coefficients for all of these wavelet functions. In this DOE analysis, we're thinking about modeling the coefficients directly. The resolution gives you an idea of whether it's a high-frequency or low-frequency item.
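In symbols, using standard wavelet notation rather than anything JMP-specific, the fitted model is a penalized basis expansion:

f(t) \approx \sum_{j,k} c_{j,k}\, \psi_{j,k}(t), \qquad \psi_{j,k}(t) = 2^{j/2}\, \psi\!\left(2^{j} t - k\right),

where j indexes the resolution (how compressed the mother wavelet \psi is) and k its location, and the coefficients are estimated with an L1 (Lasso) penalty,

\min_{c}\ \lVert y - \Psi c \rVert^{2} + \lambda \lVert c \rVert_{1},

with \lambda chosen to minimize the AICc; that penalty is what zeroes out most of the coefficients. Under the usual squared-coefficient definition, the relative energy reported for a coefficient is c_{j,k}^{2} / \sum_{j',k'} c_{j',k'}^{2}.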
Then this number in the brackets is telling you the location. You can think, okay, these items here are in the threes, and that's where some of the highest features were that we saw in that coefficient plot. Those have what we're calling the highest energy. Energy in this case is just this: if we square all the coefficients and add them up, you can think of that as the total energy. So this energy number here is a relative energy, giving you an idea of how much of the energy in the data it is explaining. The nice thing about using the coefficient approach is that these have a direct interpretation, right to the location and to the resolution. It's an alternative that you can try and compare against functional PCA or functional DOE, and you get this interpretability of the coefficients. Now I think I'll hand it over to Clay. He's got a demo for you to see how you use these models in JMP Pro. Thanks, Ryan. Let's take a look at an example that we found. Ryan mentioned briefly the olive oil data set that we found. It's a sample of 120 different oils. Most of them are olive oils; some of them are blends or vegetable oils. What we wanted to see is, can we use this high-performance liquid chromatography data? Can we use that information to classify the oil? Can we look at the spectra and say this is an olive oil or this is not an olive oil? These data came out of a study from a university in Spain, and Ryan and I learned a lot about olive oil in the process. For example, olive oil is actually a fruit juice, which I did not know. Let's take a look at our data. Each row in our data set is a different olive oil or other oil, and each row's values represent its spectrum. We'll use the Functional Data Explorer, and it'll take just a second to fit the wavelet models. You'll see here, we fit our different wavelets. As Ryan mentioned earlier, we try a handful of different wavelets and we give you the best one. In this case, the Symlet 20 was the best wavelet in terms of how well it fits our data. We can see here, where we've overlaid these fitted wavelets with the data, that this wavelet model fits really well. Let's say you had a preferred wavelet function that you wanted to use instead; you can always click around in this report, and it'll update which wavelet we're using. If we wanted the Symlet 10 instead, all you have to do is click on that row in the table, and we'll switch to the Symlet 10. Let's go back to the 20, and we'll take a look at our coefficients. In the wavelet report, we have this table of wavelet coefficients. As Ryan was saying earlier, these give us information about where the peaks are in the data. The father wavelet, we think about that like an intercept, so that's like an overall mean. Then every one of these wavelet coefficients with a resolution lines up with a different part of the function. Resolution one is the lowest frequency resolution, and it goes all the way up to resolution 12.
These are much higher frequency resolutions. As you can see, we've zeroed a lot of these out. In fact, this whole block of wavelet coefficients is zeroed out. That just goes to show that we're smoothing. If we used all of these resolutions, it would recreate the function perfectly, but we zero them out, and that gives us a much smoother function. We fit the wavelet model to our spectra and we think we have a good model. Let's take these coefficients, and we're going to use them to predict whether or not an oil is olive oil. I've got that in a different data set. Now I've imported all of those wavelet coefficients into a new data set, and I've combined it with what type of oil it is. It's either olive oil or it's other, and we've got all of these wavelet coefficients that we're going to use to predict that. The way we do that is using the generalized regression platform. We're going to model the type using all of our different wavelet coefficients. Since it's a binary response, we choose the binomial distribution, and we're interested in modeling the probability that an oil is olive oil. Because we don't want to use all of those wavelet coefficients, we're going to use the Lasso to do variable selection. Now we've used the Lasso and we've got a model with just 14 parameters. Of all of those wavelet coefficients that we considered for our model, we only really needed 14 of them. We can take a look: we've zeroed out a lot of those wavelet coefficients. Let's take a look at the confusion matrix. Using our model, we actually perfectly predicted whether one of these oils is an olive oil or something else. That's pretty good. We took our wavelet coefficients and we selected the 13 most important, because one of those 14 parameters is the intercept. We only needed 13 of those wavelet coefficients to predict which oil we had. In fact, we can take a look at where those wavelet coefficients fall on our function. What we have here is the average olive oil spectrum in blue and the other oils in red, and each of those dashed lines lines up with one of the coefficients that we used. Some of these really make a lot of sense. For example, here's one of the wavelet coefficients that is important, and you can see that there's a big difference between the olive oil trace and the other oils. Likewise, over here, we can see that there's a big difference between the two. You can look through and see that a lot of these locations really do make sense. It makes sense that we can use that part of the curve to discriminate between the different types of oil. We just thought that was a really cool example of using wavelets to predict something else. Not that olive oil isn't fun, but Ryan and I both have young kids, and we're both big fans of Disney World. We also found a Disney World data set where someone had recorded wait times for one of the popular rides at Disney World. It's called the Seven Dwarfs Mine Train; it's a roller coaster at Disney World.
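Before the Disney example, here is a rough JSL sketch of the olive-oil classification step described above. The table and column names are placeholders, and the Generalized Regression option spellings (Estimation Method, Validation Method) are assumptions; the script saved from the platform itself is the authoritative version.

coefdt = Data Table( "Olive Oil Wavelet Coefficients" );  // hypothetical table name
gr = coefdt << Fit Model(
	Y( :Type ),                                 // olive oil vs. other
	Effects( :Coef 1, :Coef 2, :Coef 3 ),       // the wavelet coefficients
	Personality( "Generalized Regression" ),
	Generalized Distribution( "Binomial" ),
	Run(
		Fit(
			Estimation Method( Lasso ),         // variable selection via the Lasso
			Validation Method( "AICc" )
		)
	)
);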
Someone had recorded wait times throughout the day for several years' worth of data. I should also mention these are a subset of the data. One of the problems is that the parks are open for different amounts of time each day, and some of the observations are missing. We subset it down and got it to a more manageable data set. I would say that this example is inspired by real data, but it's not exactly real data once we massaged it a little bit. If we graph our data, we can see that the horizontal axis here is the time of day, and the vertical axis is the wait time. In the middle of the day, the wait time for this ride tends to be the highest. We can look around at different days of the week: Sunday and Monday are a little bit more busy, Tuesday is a little less busy, Saturday is the most busy. We can do the same thing looking at the years. This is 2015, 2016, 2017. It looks like every year the wait times get longer and longer, until something happens in 2021. I think we all know why wait times at an amusement park would be lower in 2021. We've got the idea that you can use this information, like day of the week, year, and month, to predict what that wait time curve will look like. Let's see how we do that in FDE. I'll just run my script here. What we've done is come to the menu and asked to fit our wavelet model. It takes just a second, but really not that long to fit several years' worth of data. This time we're not using the Symlet anymore; we're using this Daubechies wavelet function. What Ryan mentioned earlier was the wavelet DOE feature. Now, what I didn't show was that we've also loaded the day of the week, the year, and the month into FDE. We're going to use those variables to predict the wavelet coefficients. Let's go to the red triangle menu and ask for wavelet DOE. Now, what is happening behind the scenes is that we're using day of the week, month, and year to predict those wavelet coefficients, and then we put it all back together so that we can see how the predicted wait time changes as a function of those supplementary variables. Of course, we summarize it in a nice profiler. We can really quickly see the effect of month. If we're just going by the average wait time for this particular ride, we can see that September tends to have the lowest wait time. We can really quickly see the COVID effect: the wait times were here in 2019, and then when we went forward to 2020, they really dropped. You can look around to see which day of the week tends to be less busy and which months are less busy. It's really a cool way to look at how these wait times change as a function of different factors. Thank you for watching. That's all we have for today, and we hope you'll give the wavelet features in FDE a try. Thanks.
Despite the development of new network and media technologies, the intense use of bandwidth and data storage can be a limiting factor in industrial applications. When recording sensor signals from multiple machines, a question must always be asked: which meaningful information can be extracted from the data, and what should be saved for later analysis? The answer to this question is a method proposed and implemented by the Production Data Engineering Team at Bundesdruckerei GmbH in Berlin, a wholly-owned subsidiary of the German federal government that produces security documents and digital solutions. This method focuses on pre-processing data directly in the machine controller, strategically reducing the amount of data so that only the meaningful information is sent to the network over OPC UA, stored in the database, and further analyzed using JMP. A case study is presented, describing the implementation of this method on torque and position data from a servomotor used in a cutting process. The JMP Scripting Language is used to automatically generate reports of cutting tool wear, which is also analyzed in combination with the quality data of the product. Those reports allow the production engineers to understand the machines better and strategically plan tool changes.     Hi, I'm Günes Pekmezci, and this is my colleague, Luis Furtado. We both work at Bundesdruckerei as engineers in the production department, on the data team. Today, we would like to present a method to strategically process data from industrial processes before analysis and storage. First of all, I would like to tell you a little more about our company. Bundesdruckerei is a government-owned company that produces security documents and digital solutions. We are getting bigger every day; right now we have 3,500 employees, and we continue to grow. These figures are from 2021. In that year, we had sales of €774 million, and we hold over 4,200 patents. Most of our profit comes from German ID systems, which I will talk about a little more in the next slides. Secure digitization solutions are another large profit driver for us. If we look at the target markets and our customers, we see, like I said, the official ID documents first. We physically and digitally produce official identity documents like ID cards, passports, and residence permits, and this is our biggest market. We also produce security documents, meaning banknotes, postage stamps, tax stamps, and related security features for the government. On top of that, we have a growing eGovernment department, where we create solutions for the authorities, mostly German state authorities, to digitize their public administration systems. We also have high-security solutions: in this department, we create solutions with higher security requirements for security authorities and organizations. We also have a target market in the health industry, where we create products and systems for secure and trusted digital health. Other than that, we are also active in the finance field.
Here we create products and systems to control and secure financial transactions, in both the public and the enterprise sector, which can include tax authorities, banks, insurance, et cetera. Coming to our use case, what we want to share with you today is a use case that we decided to implement for predictive maintenance. Like every other company, our aim was to create use cases for the new digital era. We thought about what we could analyze: big data, predictive maintenance, and things like that. We decided to start with our biggest document, the German passport. This document is very complex, and it has a lifetime of 10 years. We also have a high production rate here, and we decided to create a predictive maintenance use case for one process on this document. Our process is the punching process. It was a good process for us because we have a good understanding of it, and, which is very important in the Industrial Internet of Things, we had access to the data that we could analyze to create our information. Our objective for this use case was to achieve better product quality by doing predictive maintenance on the tool wear state. Instead of reacting after the tool is worn out, we decided to look at the data and create information that allows us to plan our tool change time. We can also minimize our downtime and minimize our scrap rates. We could also apply this use case to other machines and use it to follow the long-term behavior of the process. It was a really good use case for us to start with. I will hand over to Luis to explain further how we approached this use case, what exactly we did, what our challenges were, and how we found solutions for them. Thank you, Günes. I'm going to present a bit more about our product and process. The product that we are analyzing in this study is the passport. The passport, when you think about it, is a book. It's like a sandwich full of pages, and those pages have a lot of security features: the picture that is printed, the data that is lasered, the chip, the antenna for the chip, and holography layers. There are several security features inside the German passport. To make that sandwich, there are a lot of machines that bring all those features into the product. When you have made the sandwich, you need to cut it to the right size according to the norm. When you cut it, we separate the finished book from the borders that we don't need anymore. The point is that for this cutting process we use a punching machine. The tool that is installed at the end of this punching machine wears over time, and the quality of the cut is no longer as good at the end as it was in the beginning. What we are trying to work out in this project is how to know the perfect time to change the tool, given that wear. Here's a picture of the end product, the passport.
Here are the borders that were cut. I'm going to present a bit more detail with a sketch of the machine, how it works, and what the original idea was. First, we have our original data architecture. We have a machine with several sensors, sensor number 1, 2, up to however many sensors we need to measure. We bring all the sensors into the machine PLC, which is the controller of the machine, then we mirror this data to the master computer, and mirror it again to the database. That was the first, original implementation that we had. The database ends up holding a lot of data, and then we start analyzing it to try to understand what is happening in the machine, and in this case, what is happening in the punching tool that is cutting the passport. When you look at the sketch of this machine, we have a servomotor that turns a wheel. Through a mechanical linkage, it moves the punching tool up and down. At the end, we have the cutting tool, which has exactly the final shape that we need. Over time this tool wears; it's not that sharp anymore, and then we start to get poor quality in the product we are producing. Then you need to change the tool to make it sharp again. Good. How can you be sure that this tool is still good to cut? We measure the position of the servomotor and the torque of the servomotor, and we bring the position and torque data into the controller. Then, as I presented in the previous slide, we mirror the data to the master computer and then to the database. In an industrial controller, the signal is not continuous. The curve is not continuous like here, but discrete: think of one measurement every CPU cycle, every clock tick of the controller. In this case, we get all this data, it is transferred to the master computer, and then we do the analysis from the database. But the point is, we realized that using OPC UA, not 100% of the data arrives. This is a scenario where everything is fine: we have all the points in the server, in the database. But sometimes we have missing areas, gaps where data is not coming. We realized that we only get about 95% of the data; 5% of the data is lost at the cycle rate we are running. Well, this loss could be at a point we don't care about, but it could be exactly the point where we have the peak. When you miss data here and miss data here, we compromise our measurement of the tool. And even with only a 5% data loss, the 95% of the data that does reach storage, for all the sensors that we have in a machine, for all the machines that we have in the production process, is a lot of data.
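To make the risk concrete, here is a tiny Python illustration, on synthetic data rather than plant data, of how randomly losing about 5% of the discrete samples can hide the narrow torque peak that the tool-wear measurement depends on.

```python
# Illustration (synthetic data, not plant data): dropping ~5% of the sampled
# torque values at random can hide the very peak needed for tool-wear monitoring.
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 200)                       # one punching stroke, discretized
torque = np.exp(-((t - 0.5) ** 2) / 0.001)       # sharp synthetic torque peak

keep = rng.random(t.size) > 0.05                 # ~5% of samples randomly lost
true_peak = torque.max()
observed_peak = torque[keep].max()

print(f"true peak = {true_peak:.3f}, observed peak = {observed_peak:.3f}")
# If the dropped samples happen to fall on the narrow peak, the observed
# maximum underestimates the cutting torque and distorts the wear trend.
```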
Then you start to realize that after a year, we have a large amount of database storage, and this is something you want to reduce. With this original implementation, we still had missing data, and often missing data exactly at the points we need to measure. So we had open questions about this implementation. The first question is: is it possible to measure this tool wear in a reliable way using the motor torque? The other one is: how do we reduce the amount of data that we're sending to the database? Good. The first idea we had was, "Okay, we won't take the data from the database. We're going to collect the data directly on the machine with a different method, so that we don't lose data and 100% of the data reaches the computer, because we're measuring directly in the machine controller." Let's do this experiment many times for different settings of the machine, and let's see if the curve keeps the same shape and whether it changes in amplitude when we change the scenario. In the end, we had four scenarios, we ran this test extensively on the machine, and this is the result of the experiment. We tried an old, worn tool, so the tool was not that sharp anymore, with a 32-page passport. We have two products: a passport with 32 pages and a passport with 48 pages; the customer can order accordingly, so if you're going to travel a lot, you order 48 pages. We tried the old, worn tool with 32 pages, then the old, worn tool with 48 pages. Then we changed the tool for a new one and repeated the experiment with the new, sharp tool for the 32-page and the 48-page product. This is the result. We realized that all the curves have the same shape; this is a superposition of many curves that we recorded, and the variation is very small. But we can also see very clearly that the peak value for the old tool with 48 pages is well separated from the old tool with 32 pages. Also, with the new tool, the peak value is lower because less force is needed to cut. This is the torque in the motor: when less force is needed to cut, because the tool is sharp, the torque is lower. Good. With this, we got some information. All the scenarios present the same shape of curve, so we realized, "Okay, then I don't need to record the whole curve. I can record only the peak." That is what is interesting for us in the new implementation we are proposing here. The peak value can be used for two different things. It can be used for tool wear monitoring, which is the original idea that we wanted. Another thing that is also important for us is product classification: from the peak you can check whether you are producing a 32-page or a 48-page product, so it is a safe way to confirm which product you have. Good. Then what is the difference?
The difference is the implementation directly in the controller of the machine. The whole sketch of the machine is the same, and you get the data inside the controller in the same way. But what we do differently here is that we preprocess the data: we filter, we define a window, and in this window we search for the peak. When we find it, we take the peak of the torque and the motor position at which this peak happened. Then we transfer only that one set of values, not the whole curve. How does that work out in the end? With the original implementation, per sensor in a machine, we had 11.7 gigabytes per year. That was quite a lot. When you consider that we have several hundred, almost a thousand, sensors in a machine, and more machines in our production area, this is something very critical for us. With the proposed implementation, everything is very similar: the sensors go to the machine, but inside the machine we do a preprocessing step. We filter out just the meaningful information that we need, then transfer much less data to the master computer and on to the database, and we do our analysis with this smaller but meaningful amount of data. In this case, the reduction is roughly a factor of a thousand: from 11.7 gigabytes down to about 8 megabytes per sensor per year. This is a good implementation. This was implemented in JMP and JMP Live. I'm going to give the word back to Günes, so she can explain the next steps and what we did afterwards. Thank you, Luis. How did we generate information in JMP with this analysis? Like everyone else, we started analyzing our data in JMP first, and it was easy to analyze even our huge data sets, around 20 million records, in JMP. But once we decided to keep just the peak values, we were able to create reports that are much lighter and more informative. Then we decided, since it works so well, to send our results to JMP Live. Right now in JMP Live we have the following reports, generated automatically every week. There is a weekly meeting for the machine colleagues, and they look at this report to decide when it is time to change the tool. Here you can see the different machines; we have six machines of this kind. You can see the peak value of the torque and its development over the weeks. You can also see that when there was a tool change on machines 1 and 2, the following week the peak values start again from a lower level, and Luis already explained why that happens. This is the JMP Live report we use to plan the tool change time. If we go to the method that we are proposing... I want to tell you again how we approached this use case. We started, like every other use case, by first defining our project requirements.
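To illustrate the preprocessing idea, here is a minimal Python sketch of windowed peak extraction, emitting one (peak torque, position) pair per stroke instead of the full sampled curve. It is an illustration under assumed data shapes, not the actual PLC code running on the controller.

```python
# Sketch of the on-controller preprocessing idea (illustrative Python, not the
# actual PLC code): inside a window around the cutting stroke, keep only the
# torque peak and the motor position where it occurred.
import numpy as np

def extract_peak(torque, position, window):
    """Return (peak_torque, position_at_peak) within the index window."""
    lo, hi = window
    idx = lo + int(np.argmax(torque[lo:hi]))
    return float(torque[idx]), float(position[idx])

# Synthetic stand-in for one stroke's samples (real data comes from the servomotor).
rng = np.random.default_rng(3)
position = np.linspace(0, 360, 500)                     # crank angle in degrees
torque = np.exp(-((position - 180) ** 2) / 50) + rng.normal(0, 0.01, 500)

peak, pos = extract_peak(torque, position, window=(200, 300))
print(f"store only: peak torque {peak:.3f} at position {pos:.1f} deg")

# Storage impact: one record per stroke instead of the full sampled curve,
# which is what takes ~11.7 GB/year/sensor down to the order of megabytes.
```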
Then we took all the data, like many other companies try to do in the Industrial Internet of Things. We said, "Okay, we need all the data." We tried to take all the signals from the machine and analyzed them somewhere else. Then we looked at the data and asked, "Okay, is the quality of the information good enough? Does it meet our project requirements?" It didn't, because of the missing data. With the missing data, we weren't able to see the right data to get the relevant information. So we said, "Okay, let's go to the machine and understand the process a little better. Why is this happening? What can we do about it?" Then we started doing the experiments that Luis explained directly on the machine, and we collected the data locally. Then we came back to our analysis process and said, "Yes, now the data is good, the quality is good." We also asked, "Okay, is all of this data relevant? Is there a way to reduce the storage without reducing the data quality?" Then we decided to implement this preprocessing algorithm directly at the machine to reduce the size of the data. What we are suggesting for you, too, is that when you start a use case for production processes, after defining your project requirements, it is better to go directly to the machine, do experiments there, and collect the data locally. If you do this step first, you will save yourself a lot of time building the architecture needed to get all this data somewhere else. You will also save a lot of money, because you may not need that much space on your servers, et cetera. If you start directly there, you can go through all the other steps, and you will reach a use case that works well, in less time. If we summarize our lessons learned and the benefits of the use case, we can definitely say that an application-oriented approach is very good for implementing production use cases. You really need deep process and machine understanding for Industrial Internet of Things use cases. It will definitely be better for you if you create a team of engineers, people who work at the machines, and the data people together, because you need a really deep understanding of what is happening and what exactly you need in order to get a benefit out of it. Our personal benefit from this specific use case was to create a method that we can use for other machines and processes, which we are also sharing with you today, hoping that you can use it for your processes as well. This method was then reused for other machines and other punching processes. We also gained really good knowledge about the tool wear state at the end of this use case. We could also reduce our downtime, because instead of waiting for a tool to wear out, we were able to plan the tool change.
That automatically means we were also decreasing our costs. On top of that, we were able to use this method and this analysis to follow the long-term behavior of our tools, which is also a great thing, because in the end we had a real predictive maintenance use case. As a cherry on top, we were able to reduce our data storage needs significantly. In today's world, where we talk so much about energy, it's very important to keep just the relevant data on our servers, because it's more sustainable and more energy efficient. We were really happy with our results, and we hope you will get some inspiration from our method and maybe be able to use it yourselves. Thank you for your attention, and this was our method. Have a nice day.
Solvay was the first company to succeed in industrializing the production of soda ash in 1863, a product that only requires salt and limestone as raw materials. However, behind this simple idea is an intricate continuous process able to handle reactant solids, liquids, and gases. Similarly, using data to bring value needs a deep understanding of any manufacturing process behind it. This presentation will showcase how Soda Ash at Solvay is scaling up the use of data-driven techniques in the chemical industry. To succeed, we trained our subject domain experts (process engineers) to use JMP and its predictive analytics capabilities to accelerate daily tasks such as monitoring and root-cause analysis. We will discuss our open-source JMP add-in to connect to industrial historians (Aspentech IP.21 and OSIsoft PI), the current training program, and the lessons learned in this digital transformation journey.     Hello  all.  I'm  David  Paige,  I'm  the  Digital  Champion  of  the  Global  Business  Unit  of   Soda Ash & Derivatives  at  Solvay .  Together  with  me,  we  have  Carlos  Perez,  who  is  Industrial  Data  Scientist  at  Solvay  at  corporate  level,  who  will  be  co- presenter  of  this  presentation. This  presentation  is  about  the  scaling  up  of  the  use  of  machine  learning  techniques  in  the  chemical  process  and  concretely  at  Solvay.  Here  in  this  slide,  we  have  the  agenda  of  this  presentation.  First  of  all,  I  will  share  with  you  a  brief  introduction  of  our  multinational  company,  Solvay. T hen  some  words  also  about  our  general  business  unit,   Soda Ash & Derivatives. Then  here  in  the  point  number  three,  we  will  enter  in  discussion  about  how  machine  learning  techniques  helps  improve  our  production  process.  Then  we  will  go  a  little  bit  deeper  about  the  usage  of  JMP  in  our  GBU.  I  will  explain  to  you  the  awareness  sessions  and  the  training  that  we  provided  to  our  population  of  engineers.  A lso,  we  will  see  a  couple  of  practical  use  cases. Then  my  colleague  Carlos  will  share  with  you  one  add- in  that  they  developed  internally  at  corporate  level  at  Solvay,  which  is  very  useful  for  us,  for  the  final  users  to  connect  to  our  main  source  of  data,  which  is  the  MES,   manufacturing execution systems.  Finally,  I  will  share  with  you  the  main  challenges  that  we  faced  during  this  journey  and  also  the  lessons  learned. Brief  introduction  of  our  group,  Solvay.  We  are  a  science  company  founded  in  1863  whose  technologies  bring  benefits  to  many  aspects  of  daily  life.  Our  innovative  solutions  contribute  to  a  safer,  cleaner,  and  more  sustainable  product  found in homes,  food,  and  consumer  goods,  planes,  cars,  batteries,  smart  devices,  health  care  applications,  and  water  and  air  purification  systems. Very  important,  our  group  seeks  to  create  sustainable  shared  value  for  all.  Notably,  through  its  Solvay  One  Planet  program,  we  have  three  pillars:  protecting  the  climate,  preserving  natural  resources,  and  fostering  better  life. Here  at  the  bottom  of  the  slides,  you  can  see  some  key  figures  of  the  group  in  2021.  As  you  can  see,  we  have  a  little  bit  more  of  21,000  employees  all  over  the  world.  We  have  presence  in  63  countries  and  we  have  98  industrial  manufacturing  sites. 
Now, let's jump to our Global Business Unit, Soda Ash & Derivatives, the business unit I work for, coordinating the implementation of what we call digital transformation initiatives. As you can see, we have 11 production sites distributed around the world: six production sites here in Europe, two in North America, and one in Asia, plus other locations around the world such as warehouses and offices. We also have three R&I centers, located in Brussels, in Dombasle, our manufacturing site in France, and in Torrelavega in the north of Spain. Globally, we are 3,200 employees. Our products: these are the two main products we produce in our Global Business Unit, soda ash and sodium bicarbonate. Soda ash is mainly used for glass manufacturing, different types of glass for buildings but also for photovoltaic panels and for containers, as you can see here with the example of the bottles. Soda ash is also used to produce detergents, and, very new with the [inaudible 00:04:44] of electrification that we are seeing around the world, soda ash is also used in the production of lithium for batteries. Our sodium bicarbonate is used in different markets: first, for exhaust gas cleaning in industry, which is our SOLVAir market, and also a very new application for the same purpose, gas cleaning, but for ships. Sodium bicarbonate is also used in the pharmaceutical industry and the food industry. In this slide, I would just like to show you the complexity of our production processes. As we said, our final products can be soda ash, light or dense, and refined bicarbonate. To produce them, we consume different raw materials such as limestone, brine, and the coke and anthracite used in our lime kilns. As you can see, the production process is quite complex, because we use many different assets: absorbers, the distillation sector, dissolvers, precipitation columns, filters, compressors. We have a long list of assets used in the manufacturing process and very complex chemical reactions mixing gases, liquids, and solids. We need to take into account thousands of parameters in terms of temperature, pressure, flows, and so on. So the use of advanced analytics and machine learning techniques is very important in order to improve this production process. Here we enter the chapter on how machine learning can help improve our production process. First of all, let me share our strategy. Clearly, for soda ash and bicarbonate, our strategy is to be competitive and keep our worldwide leadership position in the global commodity market of soda ash, but also in the premium market of bicarbonate. What is our objective to reach this ambition?
Our  objective  is  to  reduce  as  much  as  possible  the  variable  and  fixed  cost  in  our  manufacturing  sites  while  ensuring  the  overall  equipment  efficiencies, so  the  OEE,  and  the  quality  of  our  products. Let  me  put  some  examples  of  how  we  can  impact  in  our  variable  cost  and  fixed  cost.  In  the  variable  cost  side,  clearly,  one  of  the  levers  that  we  can  improve  is  the  yield.  If  we  are  able  to  increase  our  sodium  precipitation  yield  in  our  carbonation  sector,  clearly  what  we  are  going  to  do  is  to  reduce  the  need  of  raw  material  and  energy  in  our  production  process  to  produce  the  same  quantity  of  soda ash. The  same  for  the  topic  related  with  energy  efficiency.  In  the  previous  slide,  I  showed  you  the  complexity  of  the  production  process  and  the  energy  that  we  need  to  use  in  the  different  sector  such  as  the  distillation  sector,  calcination  sector,  or  lime  kilns.  If  we  are  able  to  improve  the  main  parameters  on  these  sectors,  we  will  be  able  to  reduce  the  specific  consumption  of  energy  in  our  production  process. In  terms  of  fixed  costs,  one  of  our  main  fixed  cost  in  our  production  process  is  the  maintenance  cost.   We  have  unplanned  events,  unplanned  mechanical  breakdowns  in  our  industrial  assets,  and  also  we  perform  regular  maintenance  activity  and  cleaning  of  our  assets.  If  with  these  machine  learning  techniques  we  are  able  to  anticipate,  to  predict  these  unplanned  events,  we  could  also  potentially  reduce  our  fixed  costs. Our  idea,  our  ambition  is  to  combine  the  deep  expertise  that  our  process  and  control  engineers  have  on  the  domain,  the  soda ash  production  process,  together  with  the  IT  and  computer  science  skills  and  math  and  statistics  skills.  This  is  our  ambition. Traditional  method.  Traditionally,  what  our  engineers  is  doing  is  to  use  the  inputs  of  our  process  using  a  theoretical  model.  For  example,  the  thermodynamics  or  the  chemistry  in  order  to  understand  the  process  and  to  get  an  output.  This  is  the  traditional  method.  But  now  with  the  machine  learning  techniques,  what  we  can  benefit  for  is  about  the  historical  data. In  our  site,  as  I  explained  before  in  this  very  complex  environment  of  the  production  of  soda  ash,  we  store  thousands  of  different  sensors  data  in  our  MES  systems,  in  the   manufacturing execution systems.  Data  from  temperature,  flows,  pressure  in  different  parts  of  the  process.  We  have  this  historical  of  data,  so  we  can  provide  with  the  machine  learning  algorithms,  inputs  and  outputs  of  our  process.  Creating  machine  learning,  big  data  models  that  could  help  us  to  improve  our  process  and  to  understand  better  our  process  for  the  future  inputs. Now,  here  in  this  slide,  just  to  share  some  publications,  also  from  Dow,  another  important  multinational  chemical  company,  that  is  sharing  with  us  here  that  a  chemical  company  must  invest  to  create  a  critical  mass  of  chemical  engineers  with  technical  skills  in  statistics,  mathematics,  modeling,  optimization,  process  control,  visualization,  simulation,  and  programming. 
But it's much easier to train chemical engineers on data analytics topics than to train data scientists on chemical engineering topics. We completely agree with this statement, and this is what we want to do at Solvay. We have a lot of very skilled chemical engineers, and we want to train them in these advanced analytics techniques. This is the main reason why we launched the Machine Learning Techniques with JMP program in our GBU, our Global Business Unit. The program started in 2021. The target population was 47 engineers in our GBU, and it was led by the industrial data science team at corporate level. What was the content of this program? We had a one-hour session on the first day, where we explained why we want to use machine learning techniques with JMP to improve our production processes, as I explained before. Then, over seven days, each of the 47 engineers followed an individual online course on statistical thinking, just an introduction to the statistical thinking methodology. Then we moved on to the JMP introduction part, explaining the tool, its main features, the benefits of using JMP, and the main tips to start creating graphics, statistical reports, and other basics. We combined theoretical lessons with practical exercises and plenary sessions on the web. Then, over 15 days, we went into more detail about what we can do with machine learning techniques, again with an individual online course, practical exercises, and a plenary session. All of this training lasted around one month. But the most important part was the selection of real cases to solve in the different manufacturing sites for the different participants. We made this selection, provided a JMP license, of course, and regular support with weekly plenary meetings and individual coaching. Let me give two examples of practical use cases from this selection. The first one is about increasing the sodium precipitation yield in the production processes of Rheinberg, our manufacturing site in Germany, and Torrelavega, our site in Spain. How did we use JMP on this project? First of all, to screen the multiple variables that we think can impact our main target variable, the sodium precipitation yield, in order to explain the variability of this target. The goal was to investigate which variables best explain the variability of our target. For this, we used one of the tools we learned during this one-month course, Predictor Screening. This is very important because, as I explained before, we have hundreds of variables impacting this output, the yield of the process, so it's very difficult to analyze them one by one.
This tool allows us, in a very quick, fast, and intuitive way, to understand the main contributors explaining the variability that we have in our target. We also have to say that JMP is a very intuitive, code-free advanced analytics tool, and this is very important because not all production engineers have the knowledge to use programming-based tools. It is also important to say that visualizing the long-term variability of the target, and its relationship with the most important variables, is a very valuable feature that JMP has. Finally, we also used JMP to produce the statistical reports about the performance of the different approaches and trials that we performed along the project. This is the first example where we used JMP to better understand our process and improve our yield in both of these manufacturing sites. The second one: in this case, we are talking about finding the root causes of the variability of one important parameter of the final product, the carbonate content of the sodium bicarbonate. Here we used JMP similarly to the project I explained before: we screened the multiple variables and selected the most important ones to explain the variability of this target. For this, of course, we used Predictor Screening again, as you can see on the right-hand side of the slide. Also here, it was very important to visualize graphically the interaction of the main variables that we identified thanks to Predictor Screening. You need to understand that on this type of project, we need to collaborate with different stakeholders; production engineers cannot solve this kind of very complex project alone. Here we need to align, speak, and generate debate with the production operators in the field, the production operators in the control room, site managers, other engineers in other plants, experts at corporate level, and so on. It's very important to translate what we analyzed into a graphical form to generate these debates. Finally, it is also very important to support the decision-making process, so that in the end decisions are taken on these main variables that we demonstrated in an objective way to the people who ultimately decide to make a modification in the process. This is what we did in this project: in the end, it was about making a modification, a small investment to modify part of our installation, in order to reduce the variability of this carbonate content in the final product. That's all from my side for the moment. Now I will give the floor to my colleague Carlos, data scientist at corporate level, who will explain an add-in that we developed internally at Solvay that allows us to connect the data from our MES, which is, as I explained before, where we store all the data, into JMP. Thank you, David. I will go ahead and share my screen.
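For readers working outside JMP, JMP's Predictor Screening ranks predictors with a bootstrap forest; a rough Python analog is to rank hundreds of process variables by a random forest's feature importances against a target such as precipitation yield. The data and sensor names below are synthetic placeholders, not Solvay data, and the approach is an approximation of the idea rather than JMP's implementation.

```python
# Rough analog of predictor screening (illustrative, not JMP's implementation):
# rank hundreds of candidate process variables by how much they contribute to
# a random forest's ability to predict the target (e.g. precipitation yield).
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)
n, p = 2000, 300                                      # synthetic historian extract
X = pd.DataFrame(rng.normal(size=(n, p)),
                 columns=[f"sensor_{i}" for i in range(p)])
yield_target = 0.8 * X["sensor_7"] - 0.5 * X["sensor_42"] + rng.normal(0, 1, n)

forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X, yield_target)

ranking = (pd.Series(forest.feature_importances_, index=X.columns)
           .sort_values(ascending=False))
print(ranking.head(10))                               # top contributors to investigate first
```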
Can you all see? Yes. Okay, I want to get started. Do you see this ribbon, or is it only me? Yes, it's visible. Thank you, David. In this section of the presentation, I will demonstrate one tool, an open-source add-in that we created in the team of industrial data scientists at corporate level in Solvay. This is a team that supports all of the global business units, which means we have to provide solutions for all of the different MES that exist in Solvay. We automated this task because of the situation we saw before, where we had to download the data into a spreadsheet, sometimes without many advanced capabilities, then import this data, treat it, and finally be able to use it. Sometimes it was not even clearly identified, because all you had was the name of the sensor, and the sensor name is not always clear since the notation is not well standardized. To leverage the power of the data, we said, "Okay, let's make the process of extracting the data as automated as possible so that all the process engineers can use it." We have leveraged this in JMP, in the GBU Soda Ash and also in other GBUs, with an add-in that is able to connect to the two most common databases in Solvay, the MES historians IP.21 and PI, from AspenTech and AVEVA, respectively. This add-in connects directly to the databases if we are on the local network and can fetch with a query whatever information is stored there. We have automated the tasks of connecting to the server, selecting the query parameters, and downloading the sensor data table in a regular format with the description and units. We have also integrated other functions. It's worth mentioning that we are dealing with a lot of sites. The soda ash sites are among these, of course, where we use, as I mentioned, two main historian databases, and this is more or less the range of sensors that we have to take into account. This is how the add-in looks today; it is available from the Add-Ins menu. It was built as an application and also has some scripts in the background. I will show a demo of this, so bear with me. It is focused on process engineers. As I mentioned, it integrates the description and engineering unit, which is very useful for identifying what data you are using in your analysis. At the end of the day, you get two data tables: one is a summary with typical statistics, and the other is a time series with the details of every sensor according to your extraction. On top of that, there are the functions that I mentioned. I will leave this script to show you the demo of the tool. Just bear with me. This is JMP, as you know it. Here we have an add-in; when it is connected to the database, you can see the full list of servers here.
It is connected to a server list that is maintained by another team, an IT team. In this server list, it is also possible to modify the details in case one of the servers is not available. For example, one can enter the IP address and domain here to add a new server that is connected to the internal network of Solvay. After selecting your server, the next step is to filter your sensors by name or description. This is important because, as we mentioned, we have on the order of thousands of sensors, which means that if you try to list everything that is available, it might take a long time or the server might crash. For that reason, we have this filter, so that you can see what's relevant to you: flow sensors, temperature sensors, pressure sensors, or you can look for the sensors by description, or both. After you are done with this filter, you select the relevant tags from this other list; what you see in the presentation is just an example, because in this case I'm not connected to the local network. You will see the list of available sensors here, and you add them to the right-hand side, like this. The right-hand side list means these sensors are ready to be extracted. Now you choose the start time and end time for your extraction; it could be one day, one month, one year. Then you choose what type of extraction method you want. The most common is interpolated, because it means you will have evenly spaced data, by minute, by second, by hour, or by day. We also offer an aggregation, which in this case is the average, and we also offer to extract the actual data exactly as it was recorded by the sensor. One more thing: if for some reason you already know the list of sensors that you want to download and you don't want to browse by name or description, you can also paste this list directly in CSV format. When you have all these parameters ready, you hit the extraction button. This will take as long as the SQL query takes to go to IP.21 or PI. When it finishes, you get two tables, as I mentioned before. One is the summary, which allows you to understand the typical statistical values for each sensor, row by row. In this case, you have the name of the sensor, its description and units, and also the mean, standard deviation, max, min, and range of the sensor. In this way, you can see whether a sensor is perhaps not working or something odd is going on, so that you don't need to extract it. Furthermore, you also get the time series data, which in this case looks like this. You get a column with the timestamp, and then one column per sensor in the proper format. For example, this one is a continuous measure, this one is discrete, and everything is properly formatted.
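Because the actual add-in talks to IP.21 and PI through their own interfaces, here is only a generic Python sketch of the two outputs it describes: an evenly spaced, interpolated time series table and a per-sensor summary table. The raw historian extract, tag names, and column layout are assumptions for illustration; this is not the add-in's code.

```python
# Generic sketch of the two outputs described above (not the Solvay add-in):
# an interpolated, evenly spaced time series plus a per-sensor summary table.
# `raw` is a hypothetical historian extract with columns: timestamp, tag, value.
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
raw = pd.DataFrame({
    "timestamp": pd.to_datetime("2023-01-01")
                 + pd.to_timedelta(rng.integers(0, 86400, 5000), unit="s"),
    "tag": rng.choice(["FI101", "TI202", "PI303"], 5000),
    "value": rng.normal(50, 5, 5000),
})

# Time series table: one column per tag, resampled to 1-minute steps and interpolated.
wide = (raw.pivot_table(index="timestamp", columns="tag", values="value")
           .resample("1min").mean().interpolate())

# Summary table: typical statistics per sensor, row by row.
summary = wide.agg(["mean", "std", "min", "max"]).T
summary["range"] = summary["max"] - summary["min"]

print(wide.head())
print(summary)
```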
On top of that, you get the description of the sensor, as I mentioned, and the units, which is very useful for the process engineers. This already allows you to apply all the methods in JMP. So here you have an automated add-in that lets you extract data directly into JMP. It's also open source, so if you are interested in contributing, you can go to the JMP Community or to GitHub and contribute your own developments. On top of that, we also offer three functionalities. The first is Update table. Update table makes sure that once you are done extracting your data and have performed an analysis, you can keep updating the same analysis the next day. For example, let's say yesterday I downloaded this data and created a column calculating some value, and today I want to see how this calculated value looks with the newest data. I just hit this button, and the table is updated with the data from yesterday up to today. We also offer a Refresh functionality, which is meant to work like a dashboard. It keeps a fixed time window, and you see your analysis with respect to the current time. That means that if I performed an analysis yesterday with a new formula column, I can hit this button and see only the data for the current period rather than the past. In other words, Refresh keeps a single moving time window of fixed length, while Update table extends the full time window. Furthermore, there is Add new tags: if for some reason you forgot a tag and realize it is important, you can add it afterwards. With all this said, I will go to the next slide. By this point, you already have a nice data table in JMP with all the functionalities we mentioned: update table, refresh table, and add new tags. This already allows you to use the typical methods for advanced analytics in JMP; here I am showing both the JMP and the JMP Pro versions, but that is up to you. We also empower the user with another add-in that we developed, called Predictor Explainer, which will be presented in another Discovery talk, and we have other types of analysis as well. This allows us to perform the typical tasks in data analytics: root cause analysis, anomaly detection, process optimization, and others. With this, I will let David conclude the presentation. Yes, thank you very much, Carlos. I don't know if you can see my screen now. If you stop sharing, maybe. Yes, stop sharing. Okay, good. Let me reshare the screen. Yeah. Okay, can you see my screen now? Yeah. Perfect. Thanks, Carlos. Thanks for the support you provided to our GBU, not only developing this add-in but also coaching our production and process engineers on JMP.
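To make the difference between the update and refresh behaviors concrete, here is a small pandas sketch. The function names and table layout are hypothetical, purely for illustration; they are not the add-in's API.

```python
# Hypothetical illustration of the two behaviors described above (not the
# add-in's API): "update" appends everything newer than the last timestamp,
# "refresh" keeps only a fixed-length window ending at the current time.
import pandas as pd

def update_table(table: pd.DataFrame, new_rows: pd.DataFrame) -> pd.DataFrame:
    """Append rows newer than what we already have (the full window keeps growing)."""
    last = table["timestamp"].max()
    return pd.concat([table, new_rows[new_rows["timestamp"] > last]],
                     ignore_index=True)

def refresh_table(table: pd.DataFrame, new_rows: pd.DataFrame,
                  window: pd.Timedelta, now: pd.Timestamp) -> pd.DataFrame:
    """Keep a fixed-length window ending at `now` (dashboard-style view)."""
    combined = pd.concat([table, new_rows], ignore_index=True).drop_duplicates()
    return combined[combined["timestamp"] >= now - window]
```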
The last slide is to share the main challenges that we faced during this journey of scaling up the usage of JMP in our GBU, and also the lessons learned and the next steps. Today, roughly 20% of the target population that started this program two years ago is still using JMP on a routine basis. The main blocking points that we found are, of course, resistance to change. Some people are more comfortable using other tools like Minitab or just Excel files. In any project or initiative that requires a change of tool, there is always this resistance to change, which requires time and effort to overcome. Another reason is the lack of time, which is linked to priorities. The role of production and process engineers is not always fully oriented toward process optimization, because sometimes there is too much reporting to do and other topics to cover in their role. The main points to keep during this process are this type of awareness built on practical industrial success examples. This is very important: in order to convince people and show the value of using machine learning techniques to improve our process and reach the competitive level that we want as a company, we need to use these practical industrial success examples. Because this is a population of chemical engineers, they will not connect with examples from marketing or finance; we need to show them clear and concrete examples related to the process industry. Another important point is the role of the Predictor Screening tool as a key tool for us for variability sourcing. The main problem that we have, as I explained before, is the variability of certain parameters of our process that we need to reduce. If we are able to reduce the variability of the key parameters, we are really going to reduce our variable and fixed costs in our manufacturing sites. This tool is very important for our production engineers to find the root causes of this variability and act on them. Also important is the combination we made between plenary sessions, all together sharing thoughts and experiences, and individual practice: give people time to practice on their own and then exchange in a common call. Finally, the points we identified to reinforce and implement in the near future: first of all, in order to tackle the problem of resistance to change, we need to convince site management about the importance of analytics for the production and process engineers, and we need to launch a series of awareness sessions dedicated to them. This is an item we are going to work on a lot this year.
Also very important for us, we identified the need for strong individual coaching of the production and process engineers when they start to use JMP on real cases, on real projects. Because JMP requires time, and the different tools, such as Predictor Screening and others, require time, it's very important that for the very first projects an engineer develops using JMP, they have a good coach, a good trainer, to accompany them through the process. That's all from our side. Thanks a lot for your attention. If you have any questions for me or for Carlos, we are available. Thanks a lot.
JMP software was initially implemented at CEA in 2010 by R&D teams who develop nuclear glass formulations. Over the years, JMP has been used for multiple purposes, such as data visualization of highly complex composition domains, optimal mixture designs, and machine learning techniques to create property-to-composition predictive models. More recently, JMP enabled us to develop very innovative methodologies. Two case studies will be presented. First, we will show an original approach based on an automatic and intelligent subsampling of the data, combining techniques of optimal designs and several predictive methods in JMP and JMP Pro to create very robust and accurate predictive models. Second, we will present an amazing benefit of using the Simulation platform where a response is below the limit of detection in most of the design space.     Thank you for watching this presentation for the Europe Discovery Summit online conference. My name is Damien Perret. I am an R&D scientist at CEA in France. I am joined by Francois Bergeret, statistician and founder of Ippon Innovation in France. Francois and I are very happy to be here today, and we would like to thank the steering committee, who gave us the opportunity to talk about this work on innovative approaches using JMP. We will give you two case studies: Francois will present the first case, and I will present the second. Let's start with a few words about CEA. CEA is a French government organization for research, development, and innovation in four areas: low-carbon energies, technological research, fundamental research, and defense and security. CEA counts about 20,000 people, and we are located on nine different sites in France. We have strong relationships with the academic world and many collaborations with universities and partners, both in France and all around the world. A few words about Ippon Innovation. Of course, we are a smaller company compared to CEA. It was created 15 years ago, and we are based in the south of France, in Toulouse. We are a team of statisticians, only statisticians, with skills in industrial statistics, for example SPC, measurement system analysis, and of course machine learning and so on. I have been a JMP user since 1995, a long time ago; I started with JMP 3. Of course, Ippon is a JMP Partner because we use JMP a lot. For example, for yield optimization we have a tool called Yeti, for automatic yield optimization in complex manufacturing systems. We also develop solutions based on customer requests, what we call software on demand, for example a full solution for outlier detection or statistical process control. Using JMP and JSL, we have several JSL experts here, including Carole Soual, who is a co-author of this talk today. We also have classical consulting and training expertise based on JMP and industrial statistics. As for the content of the presentation today, we will present two real case studies with Damien. The first one is based on simulation and a computer design of experiments, and the second one, presented by Damien, is on a machine learning tool for prediction.
I will present case study number 1, based on a mixture. Is it okay? You can go to the next slide. The context is a risk assessment and a probability calculation. To explain it a little bit, a mixture design of experiments was created by the CEA to evaluate the performance of a material for nuclear waste. The conditioning is done with salts in a matrix. The performance is determined by a threshold on the energy: the energy has to be higher than the threshold. It is not so easy to estimate the probability of being below this threshold. We will use all the tools and all the data that we have to do this task. Of course, the probability has to be as small as possible, and Damien, the CEA expert, will assess whether this probability is okay. Now, what methodology did we use to estimate this worst-case probability?

First of all, based on the data from the mixture design of experiments, we estimated several models: basically, some classical linear models, classification and decision trees, and neural networks are the main models used. For each model, we have done three analyses. First, Monte Carlo simulation on the factors, so classical random simulation with Monte Carlo. Second, what we call a space filling design, a computer design with JMP, where we try to explore the design space by computer simulations, adding noise on the response; this is done with JMP Pro. The last thing we did is a blending of Monte Carlo and space filling design. We will detail this, but it is very useful as we want to estimate a worst-case probability. We can go to the next slide.

Case study number 1: classical JMP simulation for the mixture DOE. First of all, we have to select the best model. Before doing the simulation, we need to find the best model. There is a very nice feature in JMP Pro called Model Comparison: very quickly, you can compare the models based on criteria such as R Square or the AICc. We have done this, and I will do the demo right now. I share my screen now. I have to share my screen; we have to use JMP. Here, I open the data set. This is a data set from a mixture design of experiments. There are eight factors, X1 to X8, and the response is the energy. To show you an example of the models that we fit, we perform here, for example, a predictive modeling neural network. The response is the energy, and the X factors are here. We click here. For information, we decided not to divide the full sample into a learning sample, a validation sample, and a test sample, because the data come from a design of experiments, so we do not have a lot of data: thirty-one experiments maximum here. We use K-fold validation to save samples, I would say. A very simple neural net here, one layer, and we click Go. This is quite a good model, with a correct R Square, let's say. How does model comparison work in JMP? You need to save the prediction formula: you click on this spot and I save the formula. Doing this, you have the formulas of the neural net with its hidden layer.
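For readers following along, the saved neural net prediction has this general shape (a sketch with K hidden tanh neurons; the actual coefficients are the ones JMP writes into the formula columns):

```latex
\hat{E}(x) = c_0 + \sum_{k=1}^{K} c_k \,\tanh\!\Big( a_{0k} + \sum_{j=1}^{8} a_{jk}\, x_j \Big)
```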
You see the formula: a hyperbolic tangent of the linear predictor. The second neuron of the hidden layer has another formula, and so on. At the end, for the neural net, the final prediction formula is a linear combination of the hidden neurons. I keep this formula for the moment, and I'm going to do the same with a linear model. The linear model was saved here to save time. We decided to have a linear model; for a mixture design, we have some cross effects here in the linear model. Of course, we need to clean the model: we need to remove what is not significant, and so on. When that job is done, we also save the formula, what I call the prediction formula. Here, once again, somewhere in the JMP table you will have the prediction formula for the energy from ordinary least squares, a classical linear formula for the linear model.

To summarize, here I have two models, a neural network and ordinary least squares, and I have a formula for each of the two models. Then, with these formulas, I go to Analyze, Predictive Modeling, Model Comparison. This is the nice JMP platform for machine learning. All you have to enter is the prediction formulas. Here, for the example, I just enter the two formulas, ordinary least squares and neural net, and that's all. You just click OK. Here you have a model comparison with some criteria, and you select the best model. Note that I added noise to the data, so the R Squares are not so good. In addition, we can compare both models. In that case, for the worst-case analysis, we decided to perform the simulation both on the linear model and on the neural net. Now, we are going to do that work. Let me show you what we did. Maybe I'm going to share this slide, Damien, because I need to show something here; it will be easier with my PC. So the live demo is done. Here I'm going to use a Monte Carlo simulation, which is very easy with the JMP prediction profiler.

First of all, this is my first demo. After that, I will do the space filling design on the mixture design of experiments. I have to say that, both for Monte Carlo and for the space filling design, we have mixture constraints: the sum of the components has to be equal to one. It's not a true space filling design and not a true Monte Carlo, because you have this constraint, but JMP has a smart, iterative algorithm to take the constraints into account: we simulate Monte Carlo on the first factor, then on the second factor taking the first one into account, and so on. At the end, I will present the full simulation with both Monte Carlo and the space filling design. This is new, because what we have done is, for each run of the space filling design, 1,000 Monte Carlo simulations. It's really a worst case here, but that was the objective for the CEA, to have the worst case. Thanks to this, we will get a really good estimation of the worst-case probability. Now, let me jump to the demo.
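Before the demo continues, the fitting-and-comparison workflow just described can be scripted roughly as follows. This is only a sketch: the table name, the formula column names, and some option names are assumptions rather than the presenters' actual steps.

```jsl
// Rough JSL sketch of the workflow described above; the table name, the
// formula column names, and some options are assumptions, not the
// presenters' actual script.
dt = Data Table( "Mixture DOE" );                         // placeholder name

// Neural net: one hidden layer of 3 tanh nodes, K-fold validation (JMP Pro);
// the validation option name is assumed from the launch dialog.
nn = dt << Neural(
	Y( :Energy ),
	X( :X1, :X2, :X3, :X4, :X5, :X6, :X7, :X8 ),
	Validation Method( "KFold", 5 ),
	Fit( NTanH( 3 ) )
);

// The prediction formulas are saved to the table from each fitted model
// (red-triangle menu), giving two formula columns that can be compared.
// The column names below are assumed placeholders.
Model Comparison( Y( :Pred Formula Energy, :Pred Formula Energy OLS ) );
```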
First of all, Monte Carlo. Here I have the result for the Monte Carlo. No, sorry, it's not the right one. Here it is. Let's use the neural network. For the neural network, you can ask for the profiler, and in the profiler you can ask for the simulator, which will randomly move the factors. What is important here is that I'm asking for uniform distributions, because that is more classical for a mixture design. Here, I'm going to simulate random data for the factor X1, but between this value and this value, because there is a constraint here. I continue with the second factor, with a uniform simulation between this value and this value, and so on. To save time, I will run the simulation automatically here. Here it is. Sorry, I don't have the right data set. Sorry about that. Where is it? Model comparison and simulation. Here it is. Okay, here it is: Monte Carlo simulation. To save time, I'll show you directly the result of a random simulation with the mixture constraint.

We have done this Monte Carlo simulation and we have this result on the energy. What is also nice is that you can put the result in a table. This is the result: 10,000 Monte Carlo simulations. For each simulation, with the model, you have the energy. Now, to calculate the probability of being lower than the spec limit, we just have to do a distribution of the simulated energy. This distribution is not exactly Gaussian, it is closer to a Laplace distribution, but anyway, it doesn't matter. We will do the process capability, and the spec given by the CEA is minus 100. I just click on this and we have what we call the capability analysis, the overall capability. Ppk is quite good, higher than 1.3, and the expected percentage out of spec is very low, because this number is clearly very low. We can have a look at it in scientific notation: we are at a very small probability, and clearly it was acceptable for the [inaudible 00:15:13]. At this point, based on the neural network model and on Monte Carlo simulation, we have estimated the first probability of being out of specification, and this probability is here.

Next step. What we have done here is that we are going back here. Sorry, I don't close this. I don't close this. I'm going here to do another simulation with what we call a simulation experiment. A simulation experiment is also sometimes called a space filling experiment: you try to explore the design space. Here, we have to remember that there is a mixture constraint, so we will not explore the full design space, of course, but we will explore part of it. Here is the result: 128 computer runs with the simulated data. We can have a look at the result of the simulation. If we do a scatterplot matrix on the simulated experiment, on the factors, here is the exploration of the factors with the mixture constraint. With this, once again, we have simulated energy, but here it's not a Monte Carlo simulation, it's a computer simulation experiment. Same job: we can put the energy here.
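A quick aside on the capability numbers above: with the simulated energies saved to a table, the out-of-spec probability can also be cross-checked by simply counting the rows below the spec limit. A minimal JSL sketch, with an assumed table and column name:

```jsl
// Minimal sketch: estimate P(energy < -100) directly from the saved
// simulation table. Table and column names are assumptions.
dtSim = Data Table( "Monte Carlo Simulation" );           // placeholder name
nBelow = N Rows( dtSim << Get Rows Where( :Simulated Energy < -100 ) );
pOut = nBelow / N Rows( dtSim );
Show( pOut );                                             // compare with the capability report
```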
Where is the simulated energy? Here it is. Once again, roughly normally distributed. Here I'm going to calculate, once again, the process capability with a spec of minus 100. Here it is. Here I should have the process capability, which is good. Once again, you have a probability of being out of spec, which is close to 7 per 1,000. It is also an acceptable result, but you can see that in this case the probability is higher than the previous one, because the calculation was different: it was a simulation experiment exploring the space, which is different from a Monte Carlo simulation.

The last thing that we did, and I'm going to open the data file. Here it is, Simulated Monte Carlo. Here it is. Not the right one, sorry, I have a lot of things open. Here it is. What is this file? What we did here is both a space filling design and, for each run of the space filling design, 1,000 Monte Carlo simulations. The total number of points here is 128,000 lines: both Monte Carlo and a computer design of experiments. Here we really have a good data set with all the potential variations: some are forced by the design of experiments, others are clearly random from the Monte Carlo. Here, once again, we are going to estimate the probability. You have the nice distribution of the energy, and then we will once again calculate the probability of being out of spec. Entering the spec here, here it is. We have a probability of being out of spec which is close to 1 per 1,000. Once again, it was quite a good result, and this result is quite innovative. Just for information, we had to create a little JMP script here to do this, to mix the computer design and the Monte Carlo simulation; there is a little JSL code for that. That's all for my part. Damien, maybe you can go. I stopped the sharing.

Okay. This is now case study number 2. For this study, we have developed a custom tool for a predictive application. The objective here was to create a tool including statistical models in JMP Pro in order to predict a specific property, the glass viscosity, as a function of composition and temperature. To do that, experimental data come from both a commercial database and from our own database at CEA. As we will see, the originality of the approach comes from the methodology for data subsampling. We wanted the algorithms to be coded in JSL and implemented in JMP Pro. The response of the model is the glass viscosity, of course, and the factors are the weight percents of the glass components. Here is some background information. You have to know that glass is a non-crystalline solid obtained by a rapid quench of a glass melt. From a material point of view, glass is a mixture of different oxides. The number of oxides varies from two or three in a very simple glass to about 30 and even more in the most complex compositions.
There is a long tradition in the calculation of glass properties, and we think that the first models were created in Germany at the end of the 19th century. Since then, the amount of published literature in the field of glass property prediction has increased a lot, so that today we have a huge amount of glass data available in commercial databases. Several challenges remain for the prediction of glass viscosity, because viscosity is a property that is very difficult to predict. First, viscosity has a huge range of variation, over several orders of magnitude. Also, viscosity is very dependent on physical and chemical mechanisms that can occur in the glass melt, depending on the glass composition, like phase separation or crystallization, for example.

Here is just a short example that shows this difficulty. We selected three compositions of what we call SBN glass, which is a very simple glass with only 3 oxides. We applied the best-known models from the literature to calculate the viscosity, and then we compared the predicted values with the experimental values we measured with our own device. You can see that even for a very simple glass, it is not easy to obtain one reliable value for the predicted viscosity. Here is a picture we like to use to give a view of the database, where each dot is one glass in a multidimensional view of the domain of compositions. Data may come from different isolated studies, from studies using experimental designs, or from parametric studies with variation of one component at a time. We spent a lot of time in the past applying different machine learning methods. A classical approach was used, partitioning the data into a training set and a validation set, but in the end, no statistical model with acceptable predictive capability was found to predict the viscosity. That's why we decided to use a different approach.

Instead of using all the data, we think it is better to create a model using data close to the composition where we want to predict the viscosity. So, for example, if we want to predict here, one model will be created from the data we have in this area, and a different model will be created if we want to predict the property here, for example. That's why we say this technique is dynamic: the model depends on the composition and is built and fitted where we want to predict. We say the approach is automatic because we don't have to do this manually: every step is done by an algorithm implemented in the tool. One of the most important points is certainly the determination of the optimal data set to create the model. For that, we have implemented two methods of subsampling. In the first method, a theoretical or virtual design of experiments is generated around the composition of interest. Then each run of the design is replaced by the most similar experimental data point present in the database, leading to the final training data set.
The second method we have implemented in the tool is based on data sets of different sizes created around the composition of interest. A small data set is generated by the tool, and models are created on this small subset to predict the viscosity. Then bigger and bigger data sets are generated automatically, and the optimal size is evaluated by several statistical criteria associated with each subset. Finally, the construction of the models is based on three different algorithms implemented in the tool: first, a polynomial model obtained by multi-linear regression; second, a Genreg model; and third, a neural net model. With two subsampling methods and three algorithms, we end up with six different calculated values, which makes the prediction very robust.

Let's go to JMP to see how it works. Let me first show you the code. The script, as you can see, is quite long, about 700 rows, which is quite complicated JSL code. The first thing you have to do is enter the composition of the glass for which you want to predict the viscosity. To do that, you can use an interface we created: you just select the oxides entering the composition and, for each oxide, enter the weight percent. Or, if you want, you can enter the composition directly in the script, which is a little bit quicker, I would say. Then you launch the script. I won't do that now because it takes about one or two minutes; it's not very long, but for this demo I have already run the script. Let me show you the results.

At the end of the calculation, you have this window where you can get a statistical report, which is very interesting. First, here you have the composition you entered, just as a reminder. Then we have the graph showing the predicted values. On the Y axis, we have the predicted values of the viscosity calculated by the three algorithms and for the two methods. On the X axis, we have the number of enlargements for the second method I described. In red, which is the most important value, I would say, is the average of all the different predictions; it is the best prediction, I would say, of the glass viscosity. If we need more statistical details, we have a lot of information in this report to study the quality of each model. For example, we can check the values of the PRESS statistics for the multi-linear BIC F model. Here we can see that the PRESS values tell us that the prediction using method number 1 is a little bit better than for the second method. We also see the model degradation with the enlargement of the training set. We can also check the R Square values for the two different methods and for the different algorithms, and we can compare them. We can get even more details on the designs of experiments that were created and all the prediction formulas. This is a lot of information, but the most important part is here: the predicted value of the viscosity. Let's go back to the PowerPoint.
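Before the results, here is a small illustration of the subsampling idea behind both methods: pick the database rows whose composition is closest to the point where we want to predict. This is only a sketch under assumed names, not the CEA tool, which uses virtual designs, several distance definitions, and growing subsets as described above.

```jsl
// Illustrative sketch only (not the CEA tool): keep the nKeep database rows
// closest in composition to the point of interest. Table name, column
// layout, distance metric, and subset size are all assumptions.
dtBase = Data Table( "Glass Database" );          // placeholder name
target = [60, 15, 10, 5, 5, 3, 1, 1];             // wt% of the composition of interest
nKeep = 50;                                       // assumed neighborhood size
dtBase << New Column( "Dist To Target", Numeric );
For( i = 1, i <= N Rows( dtBase ), i++,
	d = 0;
	For( j = 1, j <= 8, j++,                      // assumes the first 8 columns hold oxide wt%
		d += (Column( dtBase, j )[i] - target[j]) ^ 2
	);
	Column( dtBase, "Dist To Target" )[i] = Sqrt( d );
);
dtBase << Sort( By( :Dist To Target ), Order( Ascending ), Replace Table );
dtTrain = dtBase << Subset( Rows( 1 :: nKeep ), Selected Columns( 0 ) );
// dtTrain would then be used to fit the local polynomial, Genreg, and neural models.
```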
This is the result obtained for the simple SBN glass, which is, as I said earlier, a simple glass with only 3 oxides. We calculated the viscosity for three different compositions of SBN glass, and we compared the results obtained from our tool with the results from the models available in the literature, in terms of the relative error of prediction, which has to be as low as possible. We can see that the best predictions are obtained with our tool, which is really great. Here, we have the results obtained on more complex glasses. The tool's predictive capability was evaluated by extracting 230 rows from the global database. In this table, we have the relative error of the viscosity prediction for different types of glass and for the global subset of data. Three quantiles are given: the median, meaning that 50% of the predicted values have a relative error below the value indicated here, and also the 75% and 90% quantiles. When we talk about glass viscosity, we traditionally consider that a predictive error of around 30% is very good. We see that for the majority of the data, the model capability is fine, and we were very happy with the results we obtained.

Here are some very important key points. It is very important to take into account as many inputs from the glass experts as possible; for example, we had to create specific algorithms to handle the nature and the role of the oxides on viscosity. Another point of major importance is related to the origin and the reliability of the data. For this, a significant amount of time in this project was spent building a reliable database. We also had to implement and study different ways of calculating the distances between glass compositions.

It's time to conclude now. We have presented two different case studies. In the first case study, we created several models and compared them for the risk assessment. We have seen that it was easy with JMP to perform Monte Carlo simulations, even for mixture designs with constraints, and that it was also easy to perform space filling designs, again even for mixture designs. By combining Monte Carlo and space filling designs, a worst-case probability has been estimated. In the second study, we presented the tool we created, with an original method of subsampling the data: for each composition of interest, a specific viscosity model is constructed around this composition. We have seen that the prediction accuracy on the viscosity is very promising and much better than the models available in the literature. Thank you for your attention.
Siltronic AG is a global technology leader in the semiconductor wafer industry. This presentation will introduce the Siltronic AG approach to preparing batch process data for modeling with JMP Pro. It will demonstrate some interactive steps to clean and rearrange the dataset before modeling, using an anonymized dataset containing both historical and experimental batch data. Once the best model algorithm is found, the boosted tree model will be tuned. The Siltronic AG team found that a technically sound model may be physically worthless, meaning it had been overfitted. Therefore, the team started with a large set of factors, gradually reducing the factor list and testing the model's behavior to find the most effective factors (step backward strategy for a boosted tree in a small JSL routine). The last step provided the best insight into which levers are the strongest to optimize the process.

Hello, everyone. Thanks for joining in. In this talk, I want to talk about how we prepared our batch process data for modeling with JMP Pro and gained valuable insights with a team approach. My presentation starts with a PowerPoint part; that is the first part, and all the details shown there will follow in JMP: how the data set looks and which platforms I have used, like Missing Data Pattern, multicollinearity, Functional Data Explorer, Predictor Screening, modeling batch data with Boosted Tree, and the Profiler. The summarized data will be analyzed by Boosted Tree as well, and then by a script with Boosted Tree backward selection.

First, my company: Siltronic has world-class production sites all over the world, like shown here, and about 4,000 employees. Here are some key figures. Imagine that we have complex process flows like the one shown here, with molten silicon in a crucible; the silicon ingot is created here. That's my special task: to develop processes for growing silicon ingots. Then the ingot is ground and sliced. Edge rounding is done for the wafers, then laser marking, lapping, cleaning, etching, polishing, and maybe epitaxy for the final wafer to be created. Our portfolio is that we sell 300 millimeter, 200 millimeter, and smaller diameter wafers for different applications, like shown here: silicon wafers with several specifications.

About me: by education I'm an electrical engineer, I did some Six Sigma training, my main task is to develop processes for growing silicon crystals like shown here, and I'm also responsible for around 500 JMP users at Siltronic. What does the task look like? What we see here is the final table, but creating it takes a lot of effort as well. There are some database queries behind it to get the data from the database. We fetched the results into JMP data tables, enlarged the data set with archives from earlier dates, enriched it with information like details of experiments and details on consumables, and wrote some scripts for graphs and evaluations.
Then we did the modeling tasks and, of course, looked for missing data and correlations, to see which effects are most significant and to do feature engineering, that is, to see which features are important for generating an optimal result.

At this point, I will switch into JMP. We can see here my journal that I'm working with, the JMP main window, and the abstract here. We will start with some technical hints. The data set I show here is fully anonymized and standardized, and all identifiers are generic, for a better understanding of what the features are, what the result is, and so on. The aim of this presentation is to show all the steps we needed for getting an overview, restructuring, and understanding the data set, and how to build the models to get some insight into the content of the data set. I will show some results that we discussed as a team. The team is very important here, because the team drove a lot of discussion and work as well: how to analyze, which features may be interesting and which may not be, and what the physics behind them may be.

I will start with the data set here; it's also part of the contribution in the community. Here it is opened, and I will change the layout a little bit to see how it looks. We have around 80,000 rows in this data set, and it's a batch data set, so we have a batch ID. This data set is quite challenging because it has a mixture of historical data, like these POR batches here. We can see that most of the data is historical data, and there are only a few special experiments, shown here. Then we have several features: one categorical, the consumable; the batch maturity, which is the time, also standardized; then several numeric features, these X values here. We have one result column, and to reduce the noise a little bit, we have calculated a moving average as well.

Let's have a look at the data set in more detail. If we do a summary on the data like this (you can do this from the Tables menu as well, Summary), we get around 500 rows, 500 batches. This is a summary by batch, and we see that there is no variation in the parameters X1 to X4, meaning that they are constant for each batch, and the others are changing at different rates. To get an overall look at the data, we can see here the result parameter, the yield, over time for all the rows of the batch data set; this smoothing is done by the JMP Graph Builder platform. We implemented the moving average as a column formula, as it is available as a function in JMP. We can have a look at some special batches as well: if we use the local data filter, we can see how the average works and how much noise is in the single data points. The blue points are the original yield data, and the orange ones are the moving average (a Graph Builder sketch of this view follows below). I will close this then, and the next point may be to look at how much data is missing. We have this in JMP as well.
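The view referred to above can be reproduced with a Graph Builder call along these lines; the table and column names are assumptions based on the anonymized data set, not Martin's saved script.

```jsl
// Sketch with assumed names: raw yield points over batch maturity,
// overlaid by batch, plus a smoother to show the average trend.
dt = Data Table( "Batch Data" );                  // placeholder name
dt << Graph Builder(
	Variables( X( :Batch Maturity ), Y( :Yield ), Overlay( :Batch ID ) ),
	Elements( Points( X, Y, Legend( 1 ) ), Smoother( X, Y, Legend( 2 ) ) )
);
```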
We can mark all the columns and run the Missing Data Pattern platform like this. It shows us that out of about 80,000 rows, we have 178 rows with some missing data in one column. This can also be shown as a graph. This is very important: at least for the data creation steps, it was important to see where data is missing and then to fix the missing data as much as possible. Another way to look at the data is the Columns Viewer. I put all the columns in, and here we can see again, as before, that some rows are missing for parameter X2. We can see the min, max, mean, standard deviation, and so on for all the parameters. Here we nicely see that everything is standardized; the yield, for example, is between zero and 100. From here we can also start the Distribution platform: all the columns are marked, and with only one click we get the distributions for all the data. We can see which consumables are used how often, that we have most data from historical processes, and only a few experiments with special settings. The time, of course, looks nicely distributed, but the others don't look that nice: there is a lot of room between some settings, and most parameters are sparsely and non-normally distributed, which makes this data even more challenging to analyze.

We go to the next steps; I will close these reports. Then we may look even more in detail at some things, like how the parameters are correlated. We can see this in the Multivariate platform; it needs some time to be calculated. You will find it under the Analyze menu, Multivariate. It takes the numeric columns and generates this correlation report, and you will see that parameters like X6 and X5 are highly correlated, and X10 and X9 as well, which makes feature engineering difficult. What we want to know from the analysis is which parameter causes the yield drop, and if two parameters are correlated, it's not so easy to find out which one is the responsible one. In the scatterplot matrix, you can also see which parameters change with time (X1, X2, up to X4 are constant over time, and the others are changing), how they are distributed, and you can nicely mark some rows like here; they are then selected in the data table, and you can see how the curves look for each parameter over time, or how each parameter combination looks.

Next, I want to use the Functional Data Explorer. The Functional Data Explorer allows us to fit curves for each batch and extract the features of each curve. Then we can have a look at which batches behave similarly, or maybe find extreme ones. The start is like this; we can have a look at how I started the analysis. We launch the analysis: I put time as the X parameter, Yield as the output parameter Y, and the Batch ID as the ID function. Then we have some additional columns here, like Part and Group.
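The launch just described corresponds roughly to this JSL; the role names and column names are assumptions based on the dialog Martin describes, not his saved script.

```jsl
// Rough sketch of the Functional Data Explorer launch (JMP Pro); role and
// column names are assumptions based on the dialog described above.
Functional Data Explorer(
	Y( :Yield ),              // output modeled as a curve
	X( :Batch Maturity ),     // time axis within each batch
	ID( :Batch ID )           // one function per batch
);
```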
This platform is available in JMP Pro only, and when we start it, we can do some data processing here, but in this case it's not necessary. We can have a look at how each batch looks; there are a lot of graphs here like this. We can mark the rows, and we can see the marked rows in the data table as well. To go on with this platform, we need to fit models like P-splines for each batch. JMP does this and decides itself which splines are used and how many supporting functions are needed, like the knots shown here. The best result is given by a cubic spline with 20 knots. You can see how each batch is modeled by the red line shown here and how it looks. Here we have the shape functions: each curve is built up from a combination of shape functions, and for each batch we get the coefficient of each shape function. If we look at Shape Function 1, this is the main behavior of all batches, with a drop here at around 0.7. Here we have Component 1, which is the coefficient for Shape Function 1. If we select these batches, we see that they have a pronounced shape like Shape Function 1; we can see it here. We can also use the Profiler. This is mostly for understanding the data, but we have not used it for further analysis, because we did not really need the information about how each batch looks as a shape, curve by curve. We were more interested in the average yield of each batch, because we cannot decide to use only the first part of the batch and forget about the second part; that would not work in our case.

To see again how this fits together, we can have a look in Graph Builder at the graph of some batches we have just seen. Maybe you recognize this number here; we have seen it before. Here it is shown again, together with the moving average of the yield. The next step would be to start modeling the batch data. When doing modeling, it may be interesting to have an idea of which parameters are most important for the variability of the output. For that, we have the Predictor Screening platform. You can also start it from the Analyze menu, Predictor Screening; I wrote it here as a script simply to start it by pressing a button. When we do so, we see a Bootstrap Forest analysis running, and it shows us the importance of the features we have in our data set. Time is the most important, but in the end this is useless for us, because we need to use the full batch. Then comes X1, then Part, X8, X5, and so on. Here we could also select a few rows, copy them, and put them into a model. I will stop this here. To see which model works best, I used the Model Screening platform. I will not run it here because it takes several minutes, but we have seen that the Boosted Tree platform may perform well. There is maybe not such a big difference from the next ones, but that's the reason why I used the Boosted Tree platform.
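For reference, the predictor screening step shown above is a one-line launch like the following; the column list is assumed from the anonymized table, not Martin's saved script.

```jsl
// Predictor Screening runs a Bootstrap Forest behind the scenes and ranks
// the factors by their contribution to the response. Column names assumed.
Predictor Screening(
	Y( :Yield ),
	X( :Batch Maturity, :Part, :Consumable,
	   :X1, :X2, :X3, :X4, :X5, :X6, :X7, :X8, :X9, :X10 )
);
```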
Then we run Boosted Tree like this on the batch data, and it works quite quickly. We see the result, and a nice feature of the Boosted Tree platform is that you get column contributions, so you can nicely do some feature engineering. We can see here that we have 71% R Square for training and 66% for validation, which may be okay, and we still have all the features in. But we are mostly interested in which features are reliably the most important ones. We can save the model as a column, Save Prediction Formula, in the data table. We see in the data table that we have a formula now and we can use it. We can have a look at how the model performs, or use it in Graph Builder simply to see how the modeled data looks over the batch maturity. I hope Graph Builder shows the graph soon, and here it comes. Yes. We have seen that this modeling works quite well, so we now have a formula that rebuilds the data and we can work with it.

But especially for the batch data modeling, we have a problem here: validation will not work properly, because for some batch we may have these rows in the training set and the rows next to them in the validation set, so they are not well separated with respect to the features that control the batch. Additionally, the model is not very stable, so we get different results for different runs of the model; this is known for tree-based methods, which may give different results on high-variability data. If we run Boosted Tree twice, we also get different column contributions, and maybe a different order, as we can see here: Part and X5 are switched for these two runs. I will show it again here as well, if we run Boosted Tree twice. I don't know. Yes, here I should have the script; it comes later.

So at this point, we said that it may be better to model the summarized data, because we need to use the full batch. Here I have a script to summarize the data in a form where we have only one row for each batch (a sketch of this summarizing step follows at the end of this section). There is a nice option, statistics column name format, so that we get the same column names for the summarized data as in the original table, and we can use the same scripts for both. Doing so, we get the summary data table (I can close the script) with around 500 batches. It's a lot easier to model, and here I have summarized the data for the 0.6 to 0.8 time window, which is where the yield drop was. We can again do some predictor screening like this; I still kept time in that data set, mostly to see where the noise level is, and the parameters around it are likely also just noise for the model. Then we can, of course, do some model comparison. I selected a few parameters that we found to be most probably responsible, I run two Boosted Tree analyses, and then I do a model comparison for both. It looks like this.
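Here is the summarizing sketch referred to above: one row per batch over the 0.6 to 0.8 maturity window, using the statistics column name format option so the column names match the raw table. The table and column names are assumptions, not the actual Siltronic script.

```jsl
// Sketch (assumed names): restrict to the maturity window where the yield
// drop occurs, then summarize to one row per batch.
dt = Data Table( "Batch Data" );                  // placeholder name
dt << Select Where( 0.6 <= :Batch Maturity <= 0.8 );
dtWin = dt << Subset( Selected Rows( 1 ), Selected Columns( 0 ) );
dtSum = dtWin << Summary(
	Group( :Batch ID, :Part, :Consumable ),
	Mean( :Yield ), Mean( :X1 ), Mean( :X5 ), Mean( :X8 ), Mean( :X10 ),
	Statistics Column Name Format( "Column" )     // keep original column names
);
```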
We can see that we get a Profiler and can compare the results for different settings, maybe like this. Here we still have the problem: we see some features, like this one here for X10, in one model and not in the other, so it likely is noise. At the beginning, we discussed these differences a lot. We saw them sometimes and sometimes not, and asked the question: what is true, what is physical, and what is not? That brought me to the conclusion that we need to continue with feature selection, and that's why we created this script (a rough outline of the loop appears after the talk). It takes this summary data, and it has been done for the batch data as well. For each step, it builds a Boosted Tree model for the current parameter set, saves the model into the Formula Depot, saves the model performance, R Square and so on, into a data table, and shows us the column contributions. Here we can see something that we have seen often: the higher-numbered model, which is the model with fewer parameters, as we can see from the column contributions, gives the best result. It looks different for each run, but we see the same tendency most of the time. So here we can be more or less sure that Part, X1, and X5 are the most important parameters. This one may be there sometimes and sometimes not, so we will focus on these three parameters.

We can also have a look in the Formula Depot. There we can start a model comparison; maybe we can compare the first model. We do it like this: Model Comparison, this is our data table, take the first number. The numbers here are shifted by one, and maybe the fifth should be that one, and the last one. This will not work; I think it's number three, and the last one. These are the ones with the highest validation score, and we compare them here. We see the Model Comparison dialog, and we see that the last model is among the best models we could fit at all. We can see the Profiler here, for example, and also use extrapolation control. We have seen that we have sparse data, so there is not data behind every point. Let's look where it is. Here it is: extrapolation control warning on. It shows us when there is no data between the points. Here we can compare the models, and we see that there is no variability on the X factors that haven't been used.

To sum up, let me close some tables and dialogs first. To sum up, we have prepared a workflow for modeling this data, and we have created several steps and additional scripts to enhance understanding and to drive the discussion about what's important and what's not. I have a proposal for a model and some tasks we can focus on to improve the yield of our process, and you will find the data and the presentation in the user community. If you have other ideas on how to explore this data set and how to find the final best model, you can contact me or post something on my contribution in the community for this Discovery Summit. Thanks for your attention and bye. That's it, Martin.
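As a footnote to this talk: the step-backward idea Martin describes, refitting Boosted Tree with one factor fewer at each step and keeping every model for comparison, can be outlined in JSL roughly as follows. This is only a sketch under assumed names and options, not the actual Siltronic script, which also writes the fit statistics and column contributions to a table and publishes each model to the Formula Depot.

```jsl
// Outline only; platform options, the Save message, the drop order, and the
// list-substitution idiom are assumptions, not the actual Siltronic script.
dtSum = Data Table( "Summary Data" );                      // placeholder name
factors = {:X10, :X9, :X8, :X6, :X5, :X1, :Part};          // presumed weakest listed first
While( N Items( factors ) >= 3,
	Eval( Eval Expr(
		bt = dtSum << Boosted Tree(
			Y( :Yield ),
			X( Expr( factors ) ),                          // current factor set
			Validation Portion( 0.3 )
		)
	) );
	bt << Save Prediction Formula;                         // keep one formula column per step
	Remove From( factors, 1 );                             // drop the presumed weakest factor
);
```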
A client working on blending his wine wanted to get as close as possible to a known wine with different ingredients. To help the client do this, we created a mixture design with four ingredients for 16 samples to compare with the known blend. The samples were tasted randomly, and the panel was asked to create groups of similar samples and describe the groups. The result was a distance matrix developed with the help of a JMP script. This matrix was processed by multidimensional scaling to obtain a map that was easy to describe to the panel. A K-means classification was used to find the samples close to the target. Finally, the distance between the target and the other samples was calculated and represented by a contour plot to show the best part of the mixture design. The terms used by the tasters to describe the groups were processed by JMP Text Explorer and then by AFCM (multiple correspondence analysis) to show a map with samples and terms to better describe each sample's position using sensory properties.

Good morning or good afternoon to all of you. I'm Margaux Renaud, and today we will talk about a mixture plan for wine blending and the tasting of its modalities to validate the recipe of a wine. First of all, I would like to present my company. I'm working for Chêne Company, which is a group of cooperages. It owns a French cooperage, Taransaud, which makes barrels and vats from French oak; an American cooperage, Canton; Kádár Cooperage; and [inaudible 00:00:42], which makes oak wood sticks and chips, XtraChên. The French cooperage and the R&D department are based in the Bordeaux area in France. In our R&D department, we have eight people with different backgrounds: a PhD in chemistry and enology, an engineer, an agronomist, an enologist, a technician. With all these various skills, we run a lot of different trials: from the forest, for example, trials about the DNA of the oak in the forests; on the aging of the wood used for the barrels or the vats, in relation with climate change; and analyses directly in our clients' wine in our barrels.

Today, I would like to talk with you about a mixing plan for a wine blending. It's a client trial, and I will present the client's problem. In this case, the client has different wines, and for one of them, they want to keep the same wine style but optimize the ingredients. In the wine industry, what we call ingredients is very diverse. It can be different varieties of wine; in the Bordeaux area, we usually mix Merlot and Cabernet, for example. It can be different qualities of wine, or different types of aging, whether the wine is aged in barrels or in tanks, with oak chips or without. In this case, the client has four different ingredients to mix. Our team followed the different ingredients during all the wine aging, and at the end, we created a mixing plan and we tasted it. Just before going to JMP to present the way to process the data, I want to talk a bit about wine tasting. An important thing in the wine industry is that all wine recipes are decided by tasting. We do a lot of analyses, but analysis is not the last word on a recipe.
It's always the tasting. The usual way to taste wine is a quantitative tasting: we do a profile with grades on different descriptors. It could be bitterness, for example, or the fruity notes or the woody notes, and all the tasters rate the intensity of each descriptor. Then we process the data with a two-factor ANOVA: the first factor is the modality, the different modalities in the trial, and the second is the taster. To have really significant results, you need a large and trained panel for your tasting.

In our case, when we do a trial with a client or within our group, we have different types of tasters. Most of the time you have the winery team, part of the commercial team, and part of the R&D team. All these tasters do not taste the wine in the same way; they do not have the same target when they taste the wine. The target for the client is not the same as for the commercial side, and it's not the same for us. Most of the time, we are not trained to taste the wine in the same way. When we analyzed the data, there was a really big effect of the taster: in fact, the tasters do not have the same feeling about the profile they are asked to rate.

For us, it's complicated to use the profile, so we decided to use another type of tasting, free sorting. Free sorting is a tasting where I ask my tasters to taste the different modalities and to make groups among them. I put a little example in the PowerPoint. In this case, I ask the tasters to group the wines that are similar, and if there is a difference between two modalities, to put them in two different groups. In this example, there are 11 samples, and the taster decided to make four groups: a first one with four samples, a second one with three, another one with three samples, and the last one with only one glass of wine. After making the groups, I ask them to describe each group a bit. In this case, the taster decided to put four samples together because they have some chestnut notes not present in the other samples. With this approach, we don't need a trained panel: if the difference between my modalities is big enough, all the tasters will normally put the really close wines together and put the other wines separately.

This type of tasting is really easy for us to use because we don't need a trained panel, we can have a small panel, and it can be used in different languages. It doesn't matter if we have a French panel or an Italian panel, for example; they just have to make groups. The other thing, thanks to JMP, is that it's easy to present the results right after the tasting. When you do a profile, you have to process the data, run the ANOVA tests, and send the results to the client; most of the time it takes a few days, or a few weeks if you are really late. With free sorting, thanks to JMP, we can present the results right after. This type of tasting creates a distance matrix between all the samples.
In fact, if you put samples in the same group, there is no distance between them. If you put them in two different groups, there is a distance of one between them. At the end, you can build a distance matrix between all the samples. That is what I do with JMP, and I will show you just after.

Okay, I will switch to JMP. To process this data, I'm using a project: I'm using several data tables, and it's easier for me to keep them in the same place. Before going to the tasting results, I just want to talk a bit about my mixing plan. I told you that my client has four ingredients. Unfortunately, I didn't make the mixing table with JMP, because when I began to work on the mixing plan, I was not yet confident enough with JMP to do it there. The client gave us a lot of rules for this mixing plan, a bit complicated, so we decided to make it by hand and to treat the rest of the results with JMP. Just to show you, this is my mixing plan. I have a code for each of my samples, the ingredients one, two, three, four, and the proportion of each one in these samples. There are just a few constraints: for each ingredient, there is a minimum and maximum proportion. The important thing is that the ingredients work two by two. Ingredients one and two work together: ingredient one plus ingredient two is always equal to 14% of the blend. It's exactly the same for three and four: the sum of these two is always equal to 86% of the sample. Those are the rules given by the client. Thanks to that, we made a mixing plan with 16 samples and a target. The target is the historical recipe of the winery, the typical wine, and the client wants the blend of the other ingredients to be as close as possible to the historical wine. You can see the mixing plan here; as I explained just earlier, the ingredients work two by two. Okay, this is the mixing plan. We created and blended the samples, and we did the tasting with the client.

This is my results data table. It's in fact very simple. I have a first column with my samples. In wine tasting, most of the time you have to taste without knowing which modality is in your glass. To do that, we create a random three-digit number, so that you can't know which sample it is. I put it in my first column, and after that I have one column per taster; in this case, I have five tasters. In each column, I put the group where the sample has been put. Just to show you with the Distribution: for taster one, in group three, he put the samples 474, 486, and 910, and it's the same for all the samples. I'm not sure I said it... Yes, I told you at the beginning: I ask my tasters to describe the groups with a few words. When I do a tasting with my clients, I don't write group six or group one in my results data table; I write directly the descriptor, the term used by the taster to describe the group. I will explain why a bit later.
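For readers curious what the community script computes from a table like this: a minimal sketch, not Margaux's script, that counts, for each pair of samples, how many tasters put them in different groups. The table and column names are assumptions.

```jsl
// Minimal sketch (not the community script): build a sample-by-sample
// distance matrix where each taster who separates two samples adds one
// unit of distance.
dt = Data Table( "Tasting Results" );             // placeholder name
tasters = {"Taster 1", "Taster 2", "Taster 3", "Taster 4", "Taster 5"};
n = N Rows( dt );
D = J( n, n, 0 );
For( a = 1, a <= n, a++,
	For( b = a + 1, b <= n, b++,
		For( t = 1, t <= N Items( tasters ), t++,
			If( Column( dt, tasters[t] )[a] != Column( dt, tasters[t] )[b],
				D[a, b] += 1;
				D[b, a] += 1;
			)
		)
	)
);
dtD = As Table( D );                              // distance matrix ready for MDS
```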
I have this data table. To get it, most of the time I ask my tasters to put the results in an Excel file on a tablet, like that; they enter all the results directly in the file, and I just have to open it afterwards with JMP. I need another data table, called NUMMOD. You can see that the first column is my random number and the second one is the modality, so you can see which modality is behind each number. Then the other columns are the description of each modality; in this case, it's the proportion of each ingredient.

I need these two data tables and I need a script. To process the data directly after the tasting, I created a script. For this script, I have to thank the JMP Community a lot, because they helped me a lot with this really complicated part. In fact, this script helps me create the distance matrix just from the results data I showed you earlier. I will not explain every line because it's a bit complicated, but I will show you how I use it. I just check that I am on the right data table and I run the script. I can save the results; thanks to the project, I can save the result directly inside the results folder. Yes, the results folder. Directly, I have my distance matrix. You can see I still have my sample number in the first column, then all the samples in columns and the distance to all the other samples; for 001 against itself, it's the same sample, so it's 0, and then you have the distances to the other samples. In the script, I have also joined the information from my other data table, so I can add the modalities and the ingredient proportions to the same data table.

The best way to show the result is to create a map. To build the map, I'm using a multivariate method, more precisely multidimensional scaling. In this case, I put my distance matrix in the columns. I didn't show you, but I have grouped all my matrix columns directly; that is also in the script. That way, I just have to select this group of columns to put into the launch. I add my distance matrix, I run it, and I get this map. I can see all my samples, the 16 plus the target, without knowing which one is which. What we can see is that some samples are really close. For example, 246 and 592 are really close; they looked really similar to all the tasters. Not identical, because they are not on the same point, there is a little distance between them, but really close. At the opposite, 246 and 661 are really far away from each other; they look really different. At this point, when I present the results to my panel, I begin by showing which sample is which. We can discuss whether all the tasters agree with the map; if they say, okay, I can find my groups on this one, we can talk about that, and I show which sample is which. For that, I just have to label the modality. I go back to my map, and you can see the code of each sample of the mixing plan and, most important, the target.
You can see the original recipe here, and we can say that some samples are really close to it. I think those would be interesting to use, with all the ingredients, to keep the same wine style as the target. To be sure of that, I do a clustering to ask JMP to show me which samples are really close to each other. For that I run a clustering, more precisely a [inaudible 00:19:04] cluster. As I did for the multidimensional scaling, I use the distance matrix as [inaudible 00:19:15], sorry. I run it. Usually I test three, four, or five clusters, because that is more or less the number of groups I usually get in a tasting. For this one I already know that three clusters is the best choice, so I test three and I save the clusters to the data table like that.

I can then put the cluster row state in the legend, and the map is colored by cluster. You can see we have three clusters that are really well separated. One looks very interesting, the green one: you have the target and four samples really close to the target. I could stop the process here and say to the client, okay, you can use one of these four samples from the mixing plan to keep the same quality, the same type of wine; they are really close, and maybe you choose this one, it is the closest.

But if I want to give the client more information about where they can play inside the mixing plan, I do another treatment. I would like to know the distance between each sample and the target. For that I saved the coordinates of each sample; you can see them right here, dimension one and dimension two, and I simply calculated the distance between the target and all the other samples. To go a bit faster, I had already created a script that adds a new column with a formula computing the distance between the target and each sample. I just run it, and you can see the new column with the distances.

To represent the best part of the mixing plan, I use Graph Builder. As I said, the ingredients work two by two, so I can represent the plan in two dimensions. I put ingredient three on one axis and ingredient one on the other; since they work two by two, we know that the complement of ingredient one is ingredient two, and the complement of ingredient three is ingredient four. We do not need to show the target, so I hide and exclude it. I have my 16 samples right here. I put the distance on color and I represent it with a contour and the points. To make it easier to read, I just change the color gradient; I take this one, green to yellow to red, so that the samples close to the target, with the shortest distance, are in green and the others are in red. I do not really know how to change the color of the points, so we do not see them very well. You end up with this kind of map, which is really interesting for the client.
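A rough JSL sketch of the distance column and the contour map is below. It assumes the MDS coordinates were saved as columns Dim 1 and Dim 2 and that the target's coordinates are known; the column names, the target coordinates, and the Graph Builder element options are illustrative, not the exact script used in the talk.

```jsl
// Minimal sketch: distance of each sample to the target in the 2-D MDS map.
// Assumes saved coordinate columns :Dim 1 and :Dim 2; 0.12 and -0.05 are hypothetical
// target coordinates.
dt = Current Data Table();
dt << New Column( "Distance to Target", Numeric, Continuous,
	Formula( Sqrt( (:Dim 1 - 0.12) ^ 2 + (:Dim 2 - (-0.05)) ^ 2 ) )
);

// Contour + points of the mixing region, colored by distance (option names indicative)
dt << Graph Builder(
	Variables( X( :Ingredient 3 ), Y( :Ingredient 1 ), Color( :Distance to Target ) ),
	Elements( Contour( X, Y, Legend( 1 ) ), Points( X, Y, Legend( 2 ) ) )
);
```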
You can see there are different spots in green and different spots in red. We know it is not interesting for the client to play with the mixing plan in this red area; it does not look like the historical wine. It is the same for this other area. But there are two other green areas. In this one there is in fact only one sample really close to the target, and if you look at the points around it, they do not really resemble the historical wine, so it is not very interesting to play in this area. This other one is much more interesting, because you have three samples really close to the target and two others a bit farther away, but still close. So we can say to the client: okay, if you want to keep the same type of wine, you can add between 4 and 10% of your ingredient one and between 20% and 60% of your ingredient three. The most interesting region is this area, above 50% of ingredient three and around 7% of ingredient one. With this information the client, depending on the vintage, the tank volumes, and the aging they want to do, can play a bit while being sure to keep the same quality and the same type of wine. This really helps the client a lot.

We can do another treatment. I will explain it quickly, because it is a long one to do. Up to now I only used the groups; it does not matter whether a group is called "group one" or "Fruity" or "Woody", it is just a group. But I asked my panel to describe the groups, so in this case I do another treatment. From the results data table, this one, I run a Text Explorer, a classic JMP Pro analysis, and I can get this kind of data table with my samples and the descriptors. In fact, I ask it to count how many times each descriptor was written for each sample. With that, I can build another type of map with a multivariate method, this time a multiple correspondence analysis.

In this case I put the descriptors in the response role and the modalities in the factor role, and I add the count as the frequency. Rather than running it live, I will just show you the saved script, because the presentation is better that way. Okay. You get this type of map, with all the modalities, all the samples, in blue, and all the descriptors used in red. It is a complementary map to the first one, because on the first map you see which samples are close to or far from each other, but you do not know why: why these samples are together, why these samples are on the right of the map and those on the left, why they are separated. With this treatment we try to explain a bit why the samples are separated. It is not always exactly the same map, because it is not the same treatment. This one also needs a longer process than the first, because sometimes the same word is not written exactly the same way, and in French we have accents, so you have to check the data table before running the analysis.
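A rough JSL sketch of that launch is below. The platform name and the Y / X / Freq roles match the dialog described here, but the column names are hypothetical, and the exact argument spelling should be checked against a script saved from an interactive run.

```jsl
// Minimal sketch, assuming a stacked table with :Descriptor, :Modality, and :Count columns.
dt = Current Data Table();
mca = dt << Multiple Correspondence Analysis(
	Y( :Descriptor ),   // descriptors as responses
	X( :Modality ),     // samples / modalities as factors
	Freq( :Count )      // how many tasters used the descriptor for that sample
);
```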
Some words have more or less the same meaning, so you have to group them together. It takes a bit longer, so I cannot do it right after the tasting; I do it afterwards. But then we can explain a bit better why the samples are located the way they are on the map. In this case, you can see that some samples are high in coconut, some in vanilla and nuts, others are more toasty and spicy, and unfortunately some have really negative descriptors. So you can explain the samples a bit better, and that is really good complementary information for the mixing plan: if you choose to go to that side of the mixing plan, this is how your wine will be described. That's it.

I will just conclude now; I hope it was not too fast. For us, tasting is a difficult exercise to model and to present with the panel we use, because it is not trained, it is not a big one, and we do not have the same objective each time we start a tasting. That is why we decided to use a descriptive test rather than a quantitative one: free sorting. This type of tasting is only possible thanks to JMP and thanks to the script. I can run the whole process really quickly, be reasonably sure about the significance of the results, and show them right after the tasting. That way we can discuss the results with all the tasters, and when we leave the tasting we are all clear about the wines we have tasted and about the result. It is much more powerful than a simple tasting with scores, and we can use this type of test with a small, untrained panel.

Just to finish: in this trial the client was really happy with the mixing plan and was able to adjust the recipe. The recipe has now been working for two years with the four ingredients; the client can play a bit each year, but the recipe is fixed and he is really happy with it. Thank you very much for your attention. Have a good day.
Many know JMP as a powerful tool for analytics and modeling and aspire to leverage JMP’s advanced capabilities to champion improvements and business understanding. It can take time and domain experience to achieve a high level of proficiency. Don’t dismay; we all start somewhere! Even at modest experience levels, value can rapidly be achieved using JMP fundamentals. Fundamentals can be quickly propagated across an organization to seed and inspire a culture of analytics. Hear how our team has integrated offerings from JMP education in a JMP “boot camp” format. The faster an organization can establish basic proficiency in JMP, the sooner it can benefit from that investment. Additionally, having a shared platform for both basic and advanced analytics creates a collaborative community, increases self-sufficiency, and provides a learning path to foster employee development. While sharing our training approach, we will demonstrate foundational JMP features, including data filters, tabulate, summary, recode, column formula, and column properties functions to track student progress. See JMP in action as we highlight methods to construct, customize and journal graph builder visuals in ways that entice spreadsheet users to make the “JMP” to becoming JMP data ninjas.

Hi, I'm Trish Roth. I am going to be presenting to you today about managing a learning program with JMP that is developing the next generation of JMP Ninjas. A little bit about myself: I am a data scientist in core diagnostics. My colleague, Jeff Pennoyer, who helped develop this training and the presentation materials, isn't able to join us, but I want to acknowledge his contributions, and the contributions of many other colleagues over the years, in putting together training to improve the JMP skill sets we have within our organization. We both have biochemistry and technology backgrounds and have worked in the data science and analytics space for a number of years, along with many other folks in our division.

A little bit about Abbott, in case you aren't familiar: it's a large global health care company. We've been in business for over 130 years and operate around the world. We have over 113,000 employees, all focused on bringing life-changing health technologies to the people who need them. You can see on the right-hand side a number of the different product lines that we support. It varies by country; if you go to abbott.com from your location, you'll see more information about the kinds of products that Abbott delivers. Both Jeff and I, and a number of colleagues who've been involved with this project and training, are from the diagnostics division, particularly core laboratory, where we work with large hospitals and reference laboratories who provide diagnostic testing directly to patients or to physicians. You can see some of the other product lines here.

The purpose of the presentation is really twofold. I wanted to give some insight and thoughts around how we approach training, the types of things we include, and
how we organize it. I also want to talk about some of the features and functions of JMP that we focus on, particularly in our beginner training, to get people comfortable with data manipulation, data preparation, and data summarization. These skill sets really serve them as they continue to grow and develop as data analysts and move on to more advanced analytics, statistical analysis, and modeling, but this is a good foundation to get people started and comfortable.

That's the approach we have taken. We leverage materials that JMP provides, we leverage area experts, and we try to have area-specific examples to make the training relevant to people, so that they can see the value and how they might apply it in their day-to-day work. We talk to both managers and employees about what they want and what they need as we think about how much we can really deliver: What kinds of skill sets are we lacking, or where do we not have enough people? How do we grow those skill sets? How do we fill those knowledge gaps? On the employee side, people want to contribute. They want to grow. They want to become functioning members of their departments, especially when they're new, and they want to be independent. Hopefully we can find an intersection between those needs and wants to develop some training. We always have to have the conversation around investing time: managers have to give their employees time and space to work on developing their skill sets, and employees have to be willing to invest time, practice, and think about how they're going to apply things so that it sticks and they really do hone their skill sets.

We've defined a body of knowledge that we focus on, particularly at the beginner and intermediate stages. We deliver a fair amount of information and knowledge to get people started, and we do it in what we call a boot camp style. Sometimes it's intensive over a couple of days, a couple of hours at a time, to really get people into JMP and familiar with how to work with data there. Oftentimes they're coming from Excel, so we have to reorient them a bit, but it's the basics: What are the menus? What are the preferences? How do you get data in? How do you do basic data clean-up and summarization, basic graphing, and creating formulas? This allows people to get up to speed and actually deliver some analysis pretty readily once they get through this boot camp core content.

Then, depending on how deep we want to go with the learning, what the organizational needs are, and what the time availability is, we start to get into more traditional exploratory data analysis and statistical analysis: capability analysis, control charting, hypothesis testing, regression. As people move into these topics, they go on to do much more modeling, and we've got a lot of interest in scripting.
Once people have these foundations, they can start to move on to these other topics and really deliver value to the organization.

Once we've settled the material and how much we think we're going to present, we have to think about the best way to deliver it. We do like in-person training; obviously, in the last couple of years we've not done a lot of that, but there are some folks who just do better face to face, where they can have somebody standing over their shoulder watching what they're doing. We predominantly use virtual conferencing; we can bring together people from a lot of different sites and locations that way and minimize travel. We also do some very informal things, small bursts on one particular topic, maybe over a lunch hour or in a small group meeting. We also have some fully independent learners; we point them to the learning resources from JMP and to our curated set of information, presentations, and recordings that we have internally. We've planned for how to organize, centralize, and post information so that it remains accessible to others when they want to come and do some training.

Here's a little snippet from our SharePoint site, just basic information. You don't have to be a really good website developer. You can put up a little calendar of events, you can have information for beginners and intermediates, and we provide links to past recordings. When folks finish their training, we like to do a little congratulations and give them some recognition; internally, these links would take you out to presentations and to listings of folks who have successfully completed elements of our training. This leaves a body of knowledge and a body of resources internally that folks can leverage as well. We have a lot of links out to the JMP Community, where there's a lot of good information, and SharePoint document libraries with presentations and data files, so we can keep it all centralized and people can access it on demand when they have time or interest in training.

We also maintain a list of subject-matter experts; here it's just showing Jeff and me, but there are many other colleagues who have been involved and give their time and talent to help others develop. We put a direct link to the "top five countdown of why data preparation is faster, easier, and better in JMP". It's about a three-minute video from Julian Paris at JMP, and it gets people energized, motivated, and excited when they see all of the features and functions they're going to be learning about. We sometimes kick off training with a couple of little videos.

As I mentioned, we've defined some learning levels that help folks figure out what they should sign up for or where they might fit. It is a challenge, because there's such a broad base of functions within JMP that an intermediate or beginner level can cover a lot of territory.
But we do our best to try to get people into a group where they feel comfortable and are at the same learning pace and level. We have our more advanced and intermediate folks do teach-backs and presentations; it helps hone their skills and, again, builds the community within our organization.

Here's a little example of how we survey to solicit people who might be interested in the training. We leverage Microsoft Forms. We can create internal surveys and collect demographic information about people and who their managers are. Again, it's very important that there's good collaboration and communication between the learners and their managers to make sure this is something that can be supported. We need to know where people are so we can consider time zones as we think about how we're going to schedule training. And just survey 101: the more you can give canned responses that a user selects from, versus having them enter their own text, the easier it will be to analyze and summarize that information when you get it back. We also have a multi-response question, because one of the things we'll look at shortly in the demo is how JMP can handle a single question that has multiple responses, so you can understand the different categories people might have selected. We do also want to understand why folks want to participate in the training, so we can ensure that we meet their needs and that they're coming into it for the right reasons. Sometimes we collect other information about other things they might be interested in learning.

Now we'll move into JMP. Where we're going to start is: we've done a survey and we've gotten back our results. We're going to get that survey information into JMP to find out who's interested in taking the class, and then we're going to work up through a series of columns and formulas: how are we going to keep track of these folks as they move through their training? I will be using JMP 17 standard, but most of this actually started out in 15 and 16, so it will work there as well. Just so you can see where we're going: we're going to import this information, do some cleanup, enrich the information by adding some formulas and columns, and then create a subset so we can track our beginners. Then we're going to start to import information into that table to keep track of what people have completed. I've also got some scoring formulas so I can figure out who has completed the training and, if not, which elements of the training they're missing. Then we can use that data table with the scoring to communicate congratulations back to both the student and their manager.

I'm going to get out of PowerPoint and go to a JMP journal; for the remainder of the discussion we'll be in JMP. Again, the registration form comes back in the form of an Excel file, embedded in this worksheet. When you launch it, JMP is going to look to import that information.
As a best practice, I always click Restore Default Settings. This is a fairly simple worksheet; it only has one tab, and you can quickly evaluate whether the columns look right, whether the headers are in the right place, and whether the data elements look like they're going to be properly imported. So we have a quick look at the data. If we did have any hidden or empty rows or columns, we could decide whether or not we wanted those imported. We're going to leave the defaults for this and simply click Import, and there we go: JMP has ingested the information from the spreadsheet. There are 83 respondents, and you can see all of the categories; these were the questions in the questionnaire, and each one comes in as a different column in JMP.

Obviously this is anonymized, so you see the learner's email, first name, and last name. The email is going to be important because that's the key. That's the piece of information we'll match on when we import reports: Did they attend a training session? Did they turn in their homework? That's how we join the information. All of the reports that we get will have the learner's email, and that's how we can combine the data that comes in as we progress through the training.

What are we going to do with this file? We've imported it, and now we're going to use some data functions to clean it up and enrich it. I've listed those here. We're going to look at the location information, see that we have some permutations, and use the Recode function to clean that up. When we surveyed, we combined track and level; we actually surveyed for more than just JMP training, and this is a subset of that. But we want to separate those into two pieces of information rather than having them glued together. Then we'll do some summaries, tabulations, and graphics so that we understand the learner population that has signed up for training.

I'm going to jump over to the version where I've already cleaned this up, and we'll take a look at what that looks like. Here in the survey we have location, and you can see there's a new column with a plus sign, meaning it has a formula. You can have a look at it and see that it's doing some manipulation: where it said Lake Forest, we're actually converting that to Chicago, Illinois. The Recode function generated those formulas; I did not have to write them. The way you do that is you go to learner location, right-click, and select Recode. It shows you all the data values, and you can see pretty quickly that there are some permutations. Somebody entered Knoxville, Tennessee with and without a parenthesis. Geographically, you may or may not know, Chicago is a big city, and Des Plaines, Waukegan, and Lake Forest are all suburbs that are really part of Chicagoland, so we want to group those together.
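If you ever want to script that import instead of using the wizard, a minimal JSL sketch might look like the one below. The file name and worksheet name are hypothetical; the wizard's own Source script (saved to the data table after an interactive import) is the authoritative version of these options.

```jsl
// Minimal sketch: open a registration workbook as a JMP data table.
// "Registration.xlsx" and "Form1" are placeholder names.
dt = Open(
	"Registration.xlsx",
	Worksheets( "Form1" )   // pick the tab to import; other wizard options can be added here
);
Show( N Rows( dt ) );       // e.g., 83 respondents
```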
They're all in the same time zone; those folks are within 15 to 30 minutes of each other. We've got them all highlighted; I highlighted multiple values by holding down the CTRL key. I'm going to right-click and say I want to group these all as Chicago. Now you see all four of these entries are going to be Chicago, and I'm actually going to add Illinois. We go through that process for a number of the different permutations. Again, Santa Clara, California got entered with and without the state designation, so I can group those; I just want to use the two-letter designation. Similarly, we can go through the different permutations and do this data clean-up.

As for the way we want to save it: we could overwrite the data, but I don't like to overwrite data in my data tables. I want to save the formula, because if I run another class or another survey, it's likely that I might see similar permutations. Just for the purposes of the demo, I'm going to rename this as demo; now it's going to create a new column formula. I hit Recode, and you can see that it created this new column here with the formula. I didn't do all of the clean-up permutations, but you can see how it did the mapping: in the future, if it sees Des Plaines, it's going to group it under Chicago. Why do that? It obviously reduces the number of variables if you try to plot or summarize, and it just cleans things up.

We did some additional clean-up items. As I mentioned, this track-and-level column gets broken into two pieces using a word formula. We just said take the first word of track and level, and there you have it: JMP. And take the last word of track and level, and that gets you the level. You can do that by simply taking the combined data column and using some pre-set column formulas that JMP provides; here's first word, or you can select last word. Then I just retitled these columns to simplify the names.

Now, why do that? I want to have a look at what's in this data set, so I can use the Analyze, Tabulate function. Now that I have them separated, I could leave them like this, and you can see the population of beginners and intermediates, but I like to have them broken out. I'm going to do track and then level; there are different drop zones where you can put these depending on what you want to see. I'm going to build up a series of charts, and I'm going to add location. Now you can see that there were 83 respondents: 61 beginners and 22 intermediates. If I check the box down here for "order by count of grouping columns", you can see that it re-sorted so that the grouping with the highest count is listed first; rather than being alphabetical, it's in descending order by how many are in each category. You can quickly see the distribution of locations of the people interested in your training, and you can start to plan for how you're going to deliver it.
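The formula columns that Recode and the "first word / last word" shortcuts generate look roughly like the sketch below. This is an illustration, not the column scripts from the demo; the location values shown are examples, and Recode's actual output may use a slightly different expression.

```jsl
// Minimal sketch of the kind of formula columns described above.
dt = Current Data Table();

// Location clean-up: map suburb spellings to one label (Recode writes something similar)
dt << New Column( "Learner Location Clean", Character, Nominal,
	Formula(
		Match( :Learner Location,
			"Lake Forest", "Chicago, IL",
			"Des Plaines", "Chicago, IL",
			"Waukegan",    "Chicago, IL",
			:Learner Location   // anything else passes through unchanged
		)
	)
);

// Split the combined "Track and Level" text into two columns
dt << New Column( "Track", Character, Nominal,
	Formula( Word( 1, :Track and Level ) )    // first word, e.g. "JMP"
);
dt << New Column( "Level", Character, Nominal,
	Formula( Word( -1, :Track and Level ) )   // last word, e.g. "Beginner"
);
```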
When you're done, you can click Done. Then what I've done is used the script function within JMP, Save Script to Data Table, and given it a descriptive name, and it saves it right back to the data table. Here we go: Tabulate by location and level. If I click that button, it basically repeats that analysis. I've added a little more detail: in addition to the number of respondents, a percentage of total, so you can see what proportion is in each category.

Instead of tables, graphics are always nice. Again, I've pre-saved some, and we'll take a look quickly at how to build them. I've taken advantage of a function called the column switcher: I built up a graph that I liked, and now I can easily toggle between different categories. This one's a little bit messy, but you can go between categories to look at different managers. Some only submitted one person to the training, some submitted multiple. It's a little busy, but you can see what functional area people come from and what location they come from. I've also added a data filter, so if I really just wanted to hone in on beginners, I could select beginner or intermediate. You can see that on one page I can very quickly get a variety of graphs and easily put them into a presentation or save them so I can communicate. The beginners are mostly from Chicago, but there's a good chunk from Dallas, and we've got two Irish sites with a number of folks who are interested. If I go to intermediate: Chicago, Germany, Texas again; you can see where the folks interested in the different levels of training are coming from.

The way we built this: I'll pull it off to the side and we'll just look at it quickly in Graph Builder. I started with location; I'm going to drag location to the Y axis and hit the bar chart element. Now I see each of the categories. The reason I like to do this is that if the text is long, it's a little easier to read in this horizontal orientation than in a vertical orientation where you're trying to read it sideways. If I right-click, I can change the ordering and order ascending; since it's count data, that puts the category with the highest counts at the top and the lowest counts at the bottom. I'm going to change this summary from mean, because it's not really a mean, it's just a count, and now I can add a label that is percent of total values. I can see that Chicago is 40%, and you get the proportion; then on the X axis you can see the actual counts. You're getting a lot of information in one graphic. One other feature that's really nice in Graph Builder: I right-click, add a caption box, then right-click again and change the caption box location; for the Y position we'll put it at the bottom, since there's more real estate there. Again, you can see there were 83 participants and 40% are from Chicago: a lot of information in one graphic.
You can further customize it: right-click, hide. I don't really want all these annotations; I think it's obvious that it's a count, so I'm just going to hide some of them, which makes it a little cleaner. The way I got the coloring was to drag location over to the color zone. Now each category has a different color, but I can customize those. I can click on the bar: Chicago's blue, that's fine. The next one down is Longford; if I right-click, I can change the coloring to a slightly lighter blue. Dallas, Texas was the next category; fill color, I'll do light blue. Then for the rest of them, I'm just going to hold down my CTRL key as I click through all of those categories to highlight them, pick one of them in the legend, right-click, and send them all to gray. That's how I got to the customization. Now I don't really need to see this legend, so I can get rid of it.

The way the column switcher works: if I look up here in the toolbar, there's the column switcher icon. It's currently on location, and I can add organization and manager's email to it, again holding the CTRL key. OK. That's what brings up this window where I can now switch between them. It gets a little busy, and I'd have to go through the same color customization if I wanted just the simple blue and gray, but it remembered it for location. I can say Done. Again, to get more real estate, I don't really want to look at that legend, so under Show I can turn it off and then resize. If I wanted to be able to toggle between beginner and intermediate, we can use a local data filter and say I want to filter on level: select it, hit the plus sign. One nice feature I've noticed in 17 is that if I resize this, it enlarges the font automatically, which is very nice. Now I can toggle between beginner and intermediate, or I can clear the selection and leave it at both. Once I'm happy with that, again, go to the red hotspot, Save Script to Data Table, give it a descriptive name so you know what it is, and it will save that script to the data table so you can re-execute it.

The other thing you can do, under the Edit menu, is Edit Journal, or CTRL+J; it will grab an image of that analysis and place it in your journal, which is what I have done here. If I go to the journal, you can see that I've captured these images of the graphs the way that I like them. The nice thing about doing it in a journal versus grabbing a static picture is that you're still in JMP and you've still got your red hotspot, which means you can have interactivity. I can select the graph from the journal, and as long as the table behind the graph is open, you can say run in a new window and I get back my interactive graph. If I wanted to make some additional changes, change the text, I could do that, but I've got everything saved in a nice workbook. Now we have a learner list, and we know where they're from.
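For reference, a saved Graph Builder script for this kind of view looks roughly like the hedged sketch below. The column names follow the demo's description, but the element and option arguments are indicative; the exact script is what Save Script to Data Table writes after you build the graph interactively.

```jsl
// Minimal sketch of a horizontal bar chart with a local data filter and a column switcher.
// Option spellings are indicative; compare with a script saved from an interactive graph.
dt = Current Data Table();
gb = dt << Graph Builder(
	Variables( Y( :Learner Location ), Color( :Learner Location ) ),
	Elements( Bar( X, Y, Legend( 1 ) ) ),
	Local Data Filter( Add Filter( Columns( :Level ) ) ),                   // toggle Beginner / Intermediate
	Column Switcher( :Learner Location, {:Organization, :Manager Email} )  // swap the plotted category
);
gb << Save Script to Data Table;   // adds a re-runnable script to the table panel
```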
I wanted to quickly touch on the skill set piece, the multi-response question. The way I handled that is I created a copy of the column called skill set. If you look carefully at the original column, each of the selected items was separated by a semicolon. JMP can handle a semicolon as a delimiter, but I found that it didn't work very well in this analysis. As a workaround, I created a copy of the column, did CTRL+F, and used a simple find and replace to replace the semicolon with a comma; JMP liked the comma a whole lot better. The final thing I did was tell JMP that this column is actually a multiple response column; that's what prompts JMP to look for the delimiter and understand that there are multiple categories in that column.

Why do that? We'll just quickly go to Graph Builder and you can look at the difference. Now, if I take the multi-response column, you can see each category is only represented once. Of all the different reasons why people are interested in participating in training, each category gets counted independently, and you don't see all the permutations. Again, right-click, order by, and you can see the most popular ones: people want to improve their skill set so that they can be more efficient. We've been talking a lot about storytelling with data, how to get a message across, how to drive action with data stories; these are all the reasons people want to participate in training. That multi-response function is nice, particularly if you're doing surveys.

The final thing we're going to do on this data table is take a subset, because we're going to focus on the beginners. Again, you can use the data filter on level; I only want the beginners, so out of the 83 it's highlighting the 61. I'm going to select a set of columns in my data; I don't want all of them, and I don't want that one. Then we're going to create a subset table: Subset, and we tell JMP to use only the selected columns. It gives you a nice preview: here's the email address, where they're from, and their level, and we can say OK. Now we have just a list of the beginners, where they're from, and basic information, and this is what we can use to start building a tracker: these are the folks who are beginners, and I have to make sure that they complete their requirements.

How am I going to do that? I'm going to take you to the version where I already have this set up. Let's close the registration information; now we're into tracking completion. What I've done is taken this basic data table with the information about who registered for training and started adding a whole bunch of columns. Beginner training consists of five different classes, five sessions they need to attend, and three homework assignments.
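Scripted, the delimiter swap and the beginner subset might look like the sketch below. The Substitute formula stands in for the interactive find-and-replace described above (a named swap, not the presenter's exact steps), and the column names are placeholders; the Multiple Response column property itself is then set in Column Info.

```jsl
// Minimal sketch: comma-delimited copy of the multi-response column, then a beginner subset.
dt = Current Data Table();

// Copy of the survey column with ";" replaced by "," (stands in for the manual find/replace)
dt << New Column( "Skill Set (comma)", Character, Nominal,
	Formula( Substitute( :Skill Set, ";", "," ) )
);

// Keep only the beginners and a few identifying columns in a new table
dt << Select Where( :Level == "Beginner" );
beginners = dt << Subset(
	Selected Rows( 1 ),
	Columns( :Learner Email, :Learner Location, :Level )
);
```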
We assign them a couple of STIPS modules; that's Statistical Thinking for Industrial Problem Solving, free courses and modules available through the JMP learning community. We assign a couple of them to the beginners, and they can take more if they want to. I've got them all listed here. I will talk about these hash marks in a minute and why these columns aren't blank like the homework columns. We also request that they provide a data example. These are all the elements of the training.

What I've leveraged is a couple of column features. If I go to class 1 and open column information, you can see that, one, I've used the list check function: I've told JMP these are the only values that can go in that column. That helps keep the data sheet clean; if I do any data entry, it forces consistency across the data table. The other really nice feature is called value colors. I've assigned a specific color to each value: Y, they attended class, gets a nice dark green; R means they watched the recording later instead of coming to class live; and sometimes people tell me they're out of the office, so I color-coded that red. The key step is to click the little box at the top and hit Apply; it will then color-code your cells based on their content, similar to what people are used to seeing in Excel. It makes it very easy to look across this data table and see where we're at: how many people are missing things, how many are green, how many are red. Once you have one column set up, you can use copy column properties and broadcast that across the remaining four class columns, which is what I've done. When you're using these value colors, JMP puts a little black X mark on the column as an attention indicator to let you know it's going to color-code based on what you enter there; in the homework field I hadn't yet activated that.

These are all the elements required for the training, and now I have my workbook for managing it. One other feature we'll talk about is joining. We held our first class; I'm going to clear this a little bit. I actually already have this Excel file open: Microsoft Teams provided me with a summary of the meeting. I had 42 participants; here's who they are and here's their email address. It does actually tell me how long they were in the meeting, so if I scroll down to the bottom, I can decide whether Learner 26, who was there for 12 seconds, is going to get credit for attending or not. But really all I need out of this worksheet is their email address, because that's how I know who they are in my tracking sheet. I've highlighted it in Excel. One other thing you can notice: Learner 79 must have had a little trouble; they were in for one minute and then must have gotten dropped and had to come back in.
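In JSL, a tracker column with restricted values and value colors can be sketched roughly as below. The List Check and Value Colors syntax shown is an assumption about how these column properties are written (the color numbers are arbitrary JMP color indices, and the final message name is a guess); the reliable way to get the exact form is to set the properties interactively and look at the column's saved script.

```jsl
// Minimal sketch, assuming a tracker table; property syntax is indicative only.
dt = Current Data Table();
col = dt << New Column( "Class 1", Character, Nominal );

// Restrict entries to the allowed codes: Y = attended, R = watched recording, N = no-show
col << Set Property( "List Check", List Check( {"Y", "R", "N"} ) );

// Color-code the cells by value (numbers are JMP color indices, chosen arbitrarily here)
col << Set Property( "Value Colors", {"Y" = 4, "R" = 5, "N" = 3} );
col << Color Cell by Value( 1 );   // the "Apply"/color-cells step; this message name is an assumption
```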
In fact, there are two entries each for Learner 76 and Learner 79; you always want to look at your data first. But with JMP and a join, we don't have to worry about that; JMP will do a good job of merging the information. Another really fun feature is the JMP add-in for Excel. I've highlighted what I want, and there's a JMP add-in within Excel: select it, Data Table. It's opening on my other screen; I'll pull it over. I've literally just grabbed that information and put it in a JMP data table.

I'm going to quickly add a column called Class 1. This is not a data table we're going to keep, so I'm not going to spend much time on it; it's character. This is the list of people who attended class 1, so I'm just going to enter Y and fill to the end of the data table, and the table is already called attendance. I don't need the Excel spreadsheet anymore, so I'm going to go back to my tracker. You can see that I've deleted some of the entries here. I'm going to use the table Update function. I like Update because I just keep building onto the same table; I don't constantly generate new tables that I have to rename and save.

In that attendance list, I know that email matches email. It is case sensitive, so I actually had to make sure it was all lowercase in both locations in order for it to match up. JMP will give you a really good preview; if you don't see anything, you can take a look at whether you maybe missed something. What I want is for the attendance table, which is the update table, to update the class 1 information; I want it to replace the class 1 information in the master table, because it's just blank. Let me see if I can do this so you can see what happens. Here we go: it's giving you a preview, but if you watch up here, this is the tracker sheet. I'm going to update it, hit OK, and there we go. Now it has updated the tracker sheet with the information that, yes, these people attended class 1.

As we move through the training, we're going to do that on a repetitive basis. We'll get reports about attendance and reports about who completed their homework. I don't have to go in manually and mark, okay, you were there, you were there; I can do it in a much more automated fashion. Then if somebody emails me or lets me know, "Hey, I wasn't there, but I watched the recording," I can just enter that manually, and it really reduces the amount of manual intervention. It's a lot like what people are used to in spreadsheets, but I think once you get used to it, you can actually do a lot more here.

File, close this one; it's just transient, I don't need to keep it for any reason, so I'm not going to save it. Now time has gone by and we've run a whole bunch of classes. You can see people who've come to class, people who have missed things, and some people who came and, I guess, decided it wasn't for them.
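The same update can be scripted; the sketch below uses the data table Update message, with table and column names taken from the description above (so treat them as placeholders for your own names).

```jsl
// Minimal sketch: mark attendees in the tracker by matching on email address.
tracker    = Data Table( "Beginner Tracker" );   // master tracking table
attendance = Data Table( "attendance" );         // table built from the Teams report

// Add the Class 1 flag to the attendance list
attendance << New Column( "Class 1", Character, Nominal, Set Each Value( "Y" ) );

// Pull Class 1 into the tracker for every matching email (emails must match case exactly)
tracker << Update(
	With( attendance ),
	Match Columns( :Learner Email = :Email )
);
```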
People have completed some STIPS modules. This is what it looks like in the end: the accounting of what all of the participants have completed. One thing that would be nice: for STIPS, they send me a copy of their certificate, so I do have to enter that information manually. It would be great to be able to get a report that I could just join in, or some easier way of tracking that, but that's to be determined.

Then the last piece is: okay, great, I know who was there and I know what they did; now I have to score it. Again, we've got the tracking spreadsheet, and we come all the way over to the end. I've added a whole bunch of other columns which are based on formulas, and added color coding. It's a little busy, but it's relatively easy to see how much green there is versus red, and I've got things color-coded so it highlights who's missing information. I've even created a formula for each person listing exactly what is missing; if nothing's missing, it's just a series of commas. And then again a conditional formula, which we'll look at in a moment, tells me whether they met the minimum requirements, all the things I said they had to do. If they did, I get a "finished," so it's really easy for me to say who's finished and who hasn't, and it updates automatically. All this information will be available the next time I run the training; once I've built it once, I can make minor modifications, and so it becomes a really helpful tool.

Just to finish out, the power of the formula building. Here's class score. I said we held five classes, and I decided you had to make it to a minimum of three. How did I score that? I created a column called class score; you can see it's got a formula. We'll take a look at it. It looks pretty busy, but we can build it up in pieces, and once you get accustomed to building the logic, you can copy and paste the elements and replicate them pretty quickly. If we take a quick look, each box is a different class. For class one: if there's no entry in that field, it means they didn't come. If the entry is not a Y and not an R (remember, Y stands for yes, R stands for recorded), so they didn't come to class directly or watch the recording, they get zero points for class one. If they have a Y or an R in one of those ways you can write the logic, they get a point, and then you basically take this element, paste it, and update the same formula for class two, class three, and so on. You can see I'm adding them up: one point for class one, one for class two, three, four, and five, and it totals up. It's a little hard to see with some of the color coding, but this person only came to one class, and this one came to all four. Again, this is where we're using the value colors on the score.
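A sketch of that kind of scoring formula is below. It follows the logic described (one point per class attended live or via recording, summed across five classes); the column names are placeholders, and only two of the five class terms are written out to keep it short.

```jsl
// Minimal sketch of a class-score formula column: 1 point per class with a "Y" or "R" entry.
// Only Class 1 and Class 2 terms are shown; Classes 3-5 repeat the same pattern.
dt = Current Data Table();
dt << New Column( "Class Score", Numeric, Continuous,
	Formula(
		If( Is Missing( :Class 1 ), 0,            // no entry: did not attend
			:Class 1 == "Y" | :Class 1 == "R", 1,  // attended live or watched recording
			0                                      // any other entry gets no credit
		)
		+
		If( Is Missing( :Class 2 ), 0,
			:Class 2 == "Y" | :Class 2 == "R", 1,
			0
		)
		// + ... Class 3, Class 4, Class 5
	)
);
```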
If it's zero, one, or two, which is below the minimum requirement, I color-coded it in red, and three, four, and five, which is at or above the minimum, is green. Again, there's a lot of information you can build up with formulas pretty quickly.

The same thing goes for homework, where the minimum is two. I used slightly different logic this time, just to show you the flexibility. If nothing's in the homework field, that means they didn't get a check mark saying they completed the homework. So the formula says: if homework is missing, and this little exclamation point means "not", so if it's not missing, which means it's there, they get a point. Again you sum them up, so if they did all three homeworks they get three points. You just work your way across. For STIPS we required two; some people overachieved, and you see this person at the top, where it's really dark, did all seven. I built an extra-credit formula saying, okay, if you were assigned two and you did more, I'll give you some bonus points, and that way, if you missed a homework, you can cover it with an extra STIPS module. Again, you can just build up logic statements; you have to think through what your requirements are and what the logic is going to be.

We'll do a quick look at how to build that. I think I had this column seven as an example. Yes, all right, we'll just clear this out and edit the formula; I'm going to clean it up and build it from scratch really quickly. Once you're in the formula editor, if you're not sure where things are, you can type and it'll show you; it's under Conditional. Then it guides you: If what? Make sure you highlight the box. Class one, I want to do a comparison: Is Missing, zero, else one. We're just doing a simpler formula here, and there you have it. Now you can add. I need to do the same thing for class two. Depending on where you click, you'll highlight different parts of the formula; you want to make sure you get the whole box, and you can use your up arrow. Once the whole formula box is highlighted, I can do CTRL+C to copy and then just paste, so I don't have to rebuild the if-then-else logic; instead I can say, okay, apply the same thing for class two. That's how you iteratively build up a formula, and you can start to see that's how we added up the scores based on the Y or R content in the class-attendance formula.

Then the final piece: I want to know what's missing. One thing you'll notice is that this particular row isn't designated as having finished the training. If I look across the row, they completed four classes, they completed all three homeworks, and they actually completed four STIPS modules. I know it's difficult to see with the coloring, but what they didn't complete was providing a data example; it's blank.
That is a mandatory element of completing the course. Even if they overachieved on everything else, unless they apply their learning and provide us with some example of how they used JMP, they can't get full completion credit. That's why, even though they have the points, they're missing the one critical element.

These formulas are stored in the table so you can reference them later, but it gives you an idea: you can build up some pretty complex formulas. This one says, okay, if their class score is less than three, that means they didn't attend enough classes, so note that class is one of the items missing. These double pipes are for concatenation; they just put a comma delimiter between the elements. Then it says if homework is less than two, they didn't finish the minimum number of homeworks, and so on. Then you can see the data example: if the data example is missing, there are no points for it, it's just black or white, either it's there or it's not. So you get a listing of what each person has completed and what they haven't.

Once you have that, you can quickly tabulate; we'll just go to missing and then add email. Because I did it in the opposite order, here are the people missing one STIPS module, here are the people missing two, and here are the people missing two STIPS modules who also didn't do a data example. Then I can communicate back to those groups of folks exactly what they're missing, and they can either get it done or say, I'm not going to be able to finish this.

Close this. Once we've spent the time to build up those formulas, we can do some graphics based on that finished column, and I can see what percentage of people by site or location finished the training. I can tabulate it: we had just over a 50% completion rate. Not great, but that's reality, and we can circle back on what the barriers to finishing were. You can look at your metrics and report back on what's happening, all driven off this one data table using different formulas and different graphics; it's very simple bar charts and summary tables.

Hopefully that gives you a flavor of how, without getting into advanced analytics, model building, and response surface modeling, you can get a lot of mileage out of the fundamental features of JMP. It's really, in my mind, a very good jumping-off point for folks, and we've had a lot of success getting people up and running and comfortable. If you can navigate these tabulations, summaries, and data clean-ups, make some graphs, customize them, and think about how to annotate them so they carry a quick, meaningful message in the crispest presentation, you will have really moved the needle on the capabilities of your organization and hopefully generated some excitement for the use of JMP. With that, I thank everyone for tuning in.
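The "what's missing" column described here can be sketched as a concatenation formula like the one below. Column names and thresholds follow the description above; treat it as an illustration rather than the exact formula in the tracker.

```jsl
// Minimal sketch: list the requirements a learner has not met, separated by commas.
dt = Current Data Table();
dt << New Column( "Missing Items", Character, Nominal,
	Formula(
		If( :Class Score < 3, "Classes", "" )                   // fewer than 3 classes attended
		|| ", " ||
		If( :Homework Score < 2, "Homework", "" )               // fewer than 2 homeworks turned in
		|| ", " ||
		If( :STIPS Score < 2, "STIPS", "" )                     // fewer than 2 STIPS modules
		|| ", " ||
		If( Is Missing( :Data Example ), "Data Example", "" )   // mandatory, pass/fail
	)
);
```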
Hopefully that gives you a flavor of how, without getting into advanced analytics, model building, and response surface modeling, you can get a lot of mileage out of the fundamental features of JMP. It's really, in my mind, a very good jumping-off point for folks, and we've had a lot of success getting people up and running and comfortable. If you can navigate through these tabulations, summaries, and data cleanups, make some graphs, customize them, and think about how to annotate them so they carry a quick, meaningful message in the crispest presentation, you will have really moved the needle on the capabilities of your organization, and hopefully generated some excitement for the use of JMP. With that, I thank everyone for tuning in.

Hopefully, when this is posted in the community, if you have questions, thoughts, or suggestions, I certainly welcome the discussion and hearing what other people have to say. But don't undervalue how far you can get by getting a broad base of beginners up and running. They can go out and do great things. As I said by way of summary: get people excited and get people up and running.

The nice thing is that you get beginners and advanced practitioners on the same platform, so they can start to talk to each other. The beginners can move along, and the advanced practitioners don't have to go backwards; instead of trying to remember how Excel works, they can stay in the platform where they do most of their analytics. When you do that, you can join the ninja community and dare mighty things, like flying helicopters on Mars. Thank you very much.
One of the most important product test machines (ATOS) is investigated in this global Autoliv project, with the target of introducing an automated alarm system for product test data and a root cause analysis. We wanted a flexible, automated software solution to transfer data into an SQL database and perform a root cause analysis. Furthermore, we wanted to send web-based links to reports to an existing "leading-to-lean" (L2L) dispatch system, which informs machine owners via mail. We use JMP, automated via Task Scheduler, for all these tasks.

Hello. My name is Astrid Ruck. I'm working as a Senior Specialist for Autoliv. Autoliv is a worldwide leading manufacturer of automotive safety components such as airbags, seatbelts, and active safety systems. Today, I would like to show you an automated process for controlling product test data and creating alarm reports for root cause analysis.

We will start the presentation with a video on the working method of Autoliv's most important product test machine, called ATOS. These machines perform a 100% control so that no defective part will be delivered. The resulting tests are written into a log file, including additional information, and automatically sent to a server in Amsterdam. In the blue circle, you see our usage of JMP. In the first step, the log files are transferred into a database and daily reports are created, which are saved on the server. If, and only if, there is an alarm, a second table is used from the traceability system of our retractors, called Atraq, which includes component information for every retractor. This is used for predictive screening for root cause analysis in our alarm report. The alarm report is saved to the server, and we use an HTTP post to send the link to Autoliv's dispatch system, which is called Leading2Lean. Leading2Lean sends an automated mail to the corresponding machine owner.

Here we see the retractor. It has an orange webbing and a clear cover. Let us start with the video. This is the ATOS machine. Here you see the retractor, but now with a black cover instead of a clear cover. Here you see the webbing, and sometimes you will see a little marker on the webbing, because then you can see whether there is a webbing extraction or retraction.

We will start with the tilt lock testing. Tilt lock testing ensures the blocking of your seatbelt in the case of a roll-over scenario. We start with the tilt lock right test, and here in this little display you see the corresponding tilt lock angle, which should be between 15 and 27. So let us run it: it tilts to the right, it tilts to the left, it tilts forward, and it tilts backward.

The next step is the measurement of the webbing length, because this also belongs to the blocking system. Take a look here at the right-hand side; now the webbing measurement starts, and now, already here, very briefly (I might go a little bit back), Web Sense Lock and No Lock are tested here in this little box with a sensor.
Web Sense Lock ensures blocking of webbing extraction in the case of a crash, while Web Sense No Lock ensures free wheeling when you are in a parking position.

All of this information is written into the log files, including machine parameters and an internal barcode. This internal barcode is unique per retractor: it includes the retractor number, its global line ID, its production day, and the key index.

These log files are transferred once per day to a server. This is not SPC; it is used for root cause analysis, and therefore we don't want to disturb the testing of the products. The transfer time is therefore between two shifts, and it is synchronized within an Autoliv facility but differs between Autoliv facilities. For example, in Hungary the log files are transferred at 6:05, and in Romania the log files are transferred at 7:15. Here you see the folder structure on the server. It starts with the directory of the plant, so Autoliv Hungary, Autoliv Romania; then in each plant folder you find separate folders for the machines; then in each machine folder you find folders for the year and the month; and at the last level you find the daily log files.

Since JMP 16, each action in JMP is recorded in the Enhanced Log, so point-and-click work can now be saved as a playable script. Jordan Hiller from JMP says, "JMP writes 90% of your code, the skeleton." The consequence is that the other 10% is learning by doing, and this presentation should give you a small idea of how you can write your own scripts for automated data analysis and root cause finding. I will give you some short scripts which you can copy and paste into your own scripts.

In the beginning, we start with the Multiple File Import; then we create the relevant columns, select the relevant columns, and delete the irrelevant columns. This procedure is independent of the order of the columns, because sometimes columns are added to the log file; it works with the names of the columns, so we are quite independent of any particular order. Then, of course, we clean the data in the first step. Here we have an example: we have a product family with an empty space, "Retractor XXX", but we would like to have "RetractorXXX" without this empty space, and here you see the corresponding script. Then we transfer the data into the database. We use the JMP command New SQL Query, we say what kind of connection string we take, and then we use the JMP function Custom SQL, write our SQL command, and run it in the foreground, because Run Foreground ensures that the transfer into the database is complete before the next procedure runs. And don't forget to close all and to exit.

So here we start with the Multiple File Import, and here you see once again the folder structure, and here at the beginning you see which folder you select.
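A minimal sketch of that skeleton in JSL might look like the following. The folder path, DSN, and column names are placeholders, and the Multiple File Import messages are best copied from the Source script that JMP writes for you, since the exact message names can vary between versions.

// 1) Import all log files from the plant folder, including its subfolders
dt = Multiple File Import(
	<<Set Folder( "\\server\ATOS\ALH\" ),    // placeholder path
	<<Set Subfolders( 1 ),
	<<Set Name Filter( "*.log" )
) << Import Data;                            // may return a list of tables if the files are not stacked

// 2) Clean the data, for example remove the empty space in "Retractor XXX"
For Each Row( dt, :Product Family = Substitute( :Product Family, " ", "" ) );

// 3) Transfer into the database; sql holds the INSERT statement built further below
New SQL Query(
	Connection( "ODBC:DSN=ATOSdb;" ),        // placeholder connection string
	QueryName( "transfer_logs" ),
	Custom SQL( sql )
) << Run Foreground;                         // wait until the transfer is complete

// 4) Close everything so the scheduled task ends cleanly
Close All( Data Tables, No Save );
Exit( No Save );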
One of the best things is that you can check Include Subfolders, because our daily log files sit in deeply nested subfolders, and this helps us a lot. We are interested in log files, and we are interested in data from 6:15 to 6:15, and it was the 21st of December last year when we uploaded the log files. Here you can see the relevant files found in this time slot with a matching file pattern, and here you see that the tabulator is my field separator. This is the result of the Multiple File Import: the worksheet with the machine IDs, the start date, the start time, the seatbelt, and here you see the results of the tilt lock testing. The tilt right result is pass or fail, the tilt right angle gives the angle, and here you see the other fields. If you open the table's Source script, you get the script, and you can copy and paste it into your own script; that will be the first script you can run.

Now we would like to transfer this data into the database. We transfer 1,000 rows per loop. Here you see, from the worksheet, the first row and row number 1,000. We say: get the rows from the first row up to row number 1,000 and call it MyList. DT is my data table, R is the number of rows, and this argument tells JMP that I don't want the column names, I want the values.

This is how the list looks. Here you see that the upload date is in quotation marks, and the start date and start time are also of type character, because we had some difficulties transferring the data, and this script works nicely. Here at the end, if you look at this table, you see that the tilt left result is empty, it is character, and here tilt left is also empty, which you can see here by double quotation marks and here by a dot. But SQL doesn't know any numeric empty cells, and therefore we use the next trick and make a substitution.

First of all, we would like to get rid of the doubled quotation marks and have only one of them. Therefore, we use Substitute, and because the quotation mark is a very specific character, we have to escape it with backslash and exclamation mark in front, and then we replace it. Then we would like to get rid of the first and the last character, so we remove them, and since SQL doesn't know curly brackets, we replace them with round brackets. In the case of the dot, we cannot simply remove it, because a dot is also part of real numeric values, so we use a little trick: we replace "dot comma" with "null comma".

In green below, you see the resulting SQL value list. This is the way it should look in SQL. So we have it once again, and the corresponding SQL command used in Custom SQL is nothing else than a plain string, and that goes directly into the database. The form is shown here once again.
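As a hedged sketch of that string-building step: the version below builds the VALUES list directly in a loop, so the quotation-mark cleanup from the talk is not needed and only the missing-value trick is shown. Table, column, and variable names are placeholders, not the ones used at Autoliv.

// Build one VALUES list for rows 1 to 1,000 of the imported worksheet dt
valueList = "";
For( i = 1, i <= Min( 1000, N Rows( dt ) ), i++,
	valueList = valueList || "('"
		|| dt:Upload Date[i] || "','"
		|| dt:Start Date[i] || "','"
		|| dt:Tilt Right Result[i] || "',"
		|| Char( dt:Tilt Right Angle[i] ) || "),"
);
valueList = Substr( valueList, 1, Length( valueList ) - 1 );      // drop the trailing comma

// SQL has no numeric missing cells: JMP shows them as a dot, so turn them into null
valueList = Substitute( valueList, ",.,", ",null,", ",.)", ",null)" );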
We use an SQL template. We say INSERT INTO, then comes the name of the database, SPC is the name of the table in our database, in brackets there are the column names in the database, then VALUES and the placeholder TABLE. Now we use the same trick as before: we substitute TABLE with x, and x is my value list. Here it is, and this is called SQL. Then we say New SQL Query with our connection string, we use the JMP function Custom SQL with this SQL string, and if you would like to see what the SQL looks like, it looks like this. One main trick I learned from the people at JMP was to use this substitution; it's a very good tool for building such commands.

Every program is started via Task Scheduler. Here is a display of the Task Scheduler. On the General page, you can see myself as the author, and the setting to run whether I'm logged on or not, because it also has to run at the weekend and on holidays; it runs all the time. Here it is quite necessary to choose the Windows Server version that matches the server where you have installed JMP. If you choose the wrong server here, you can end up with background processes.

Here we trigger our transfer scripts daily at 6:15. If you check the history, it should look like this: your task should be completed. It should not say "task stopping due to timeout reached", because that means you have background processes, and that's not good.

Here we define the action in the Task Scheduler, and we browse to the location of the batch file. The batch file is nothing more than a notepad file: here you have the location where JMP is installed, and here is the location of your JMP script. Don't forget to say exit at the end. If you start a script from a batch file, you have to put slash-slash-exclamation-mark (//!) in the first line of the script, not in the second or third line; it must be the first line.

The key idea of every program we use is that it is the same program, but at the beginning we say which plant it is. For the main program here, we have the same code, but instead of ALH there will be ARO, for Autoliv Romania. If we use the Multiple File Import at the beginning, we evaluate the plant name into the path. So the only thing you have to change is the plant name at the beginning; that's all.

Here you see our daily report. The structure is always the same: two tables on the top, followed by two graphs. Here we see all tests over all machines. You can see we have had nine not-okay values for tilt lock overall and a lot of okay values, and the corresponding percentages are 0.33% and 99.67%. Here we have the number of not-okay parts and the percentage of failures, and the same for pass. And here you see the absolute numbers of the test results, pass and fail, for the local confection line.
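Two small details from this part, sketched in JSL; the paths and plant codes are examples only.

//! // must be the very first line of the .jsl file so it runs when launched from the batch file

plant = "ALH";                     // the only line that changes per plant, e.g. "ARO" for Autoliv Romania

dt = Multiple File Import(
	<<Set Folder( "\\server\ATOS\" || plant || "\" ),   // the plant name is evaluated into the folder path
	<<Set Subfolders( 1 )
) << Import Data;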
So here we have three ATOS machines, and we can see that tilt right was not okay five times, tilt left was not okay three times, and tilt backward was not okay one time. Tilt lock overall is the summary of all four tilt lock angles, so here we have nine not-okay.

On the right-hand side, you see the same bar chart, but now the scale is different: here we have a percentage scale. You can see that for tilt right and tilt left, the scrap rate is larger than 1%. Therefore, an alarm must be created.

But first, I would like to describe how the daily reports are created, because the same idea is used to create the alarm reports. First, we create a new window, which is a vertical box, and it is called Report. Then we create a second new window, which is a horizontal box, and that is called Table. In the third step, we create a table, call it tab1, make a report out of it, and append it to the horizontal box. Then we do the same thing once again, so we have a second report, which is also appended to the horizontal box. At the end, the horizontal box is appended to the vertical box, and this is how it looks. If you would like to add some graphs, you create one more horizontal box, which is also appended to the vertical box, like this, and then the graphs appear below the tables. Here, once again, are some ideas and some scripts; I hope they will help you.

We save our daily reports as a picture. We don't save them as a PDF, because we are not interested in all the page breaks; we would like to have high flexibility and no additional software. If we had used a PDF, we would have had four pages: Table 1, Table 2, and two further pages for the graphs. How do we store the path string? We create a variable, and this is nothing more than the path where we would like to save our report. Here it is Reports, here it is ALH (as I said before, we evaluate the plant, so the right plant ends up there), then Daily Test Result, here comes the timestamp, and then PNG. These vertical lines mean we concatenate everything, and then we save the picture under this variable name, and that's it.

Here I would like to show you the rules for an alarm on every level. We have several levels. If we have more than 200 parts, we will have an alarm if the scrap rate is larger than 1%. If we have only a small number, so fewer than 200 parts, we will have an alarm if we have more than five not-okay parts, which means a scrap rate above 2%. But we can also have a potential alarm if three parts are not okay.

On the first level, we take the table from the daily report over all machines. If you take a look, tilt lock overall, tilt right, and tilt left have more than three not-okays, so here we have potential alarms.
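A hedged sketch of that window-building pattern and the picture export. The platform calls, box names, column names, and the path are illustrative only; note that appending a platform's report box moves it into the new window.

// Vertical box that will hold the whole report
reportWin = New Window( "Daily Report", reportBox = V List Box() );

// Horizontal box with the two summary tables side by side
tableBox = H List Box();
tab1 = dt << Tabulate( Show Control Panel( 0 ), Add Table( Row Table( Grouping Columns( :Test ) ) ) );
tableBox << Append( tab1 << Report );
tab2 = dt << Tabulate( Show Control Panel( 0 ), Add Table( Row Table( Grouping Columns( :Machine ) ) ) );
tableBox << Append( tab2 << Report );
reportBox << Append( tableBox );

// Second horizontal box with the graphs below the tables
graphBox = H List Box();
gb = dt << Graph Builder( Variables( X( :Tilt Right Result ) ), Elements( Bar( X ) ) );
graphBox << Append( gb << Report );
reportBox << Append( graphBox );

// Save the report as a picture; the vertical bars concatenate the path pieces
picPath = "\\server\Reports\" || plant || "\DailyTestResult_" || Format( Today(), "yyyy-mm-dd" ) || ".png";
reportBox << Save Picture( picPath, "png" );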
If we have potential alarms, we dive deeper. Now we take the machine into account. Here you see machine 123xx, here comes machine 124xx, and so on. You can see that if we take the machine into account, then the scrap rate for tilt right and tilt left is larger than 1%; therefore, we have an alarm. For tilt lock overall, the scrap rate is low, but we have a potential alarm.

So we will create an alarm for this machine, and now we use Atraq. Atraq is the traceability system of Autoliv, so it has the information about which components are included in which retractor, and we have the full information for every part.

Here you see the display of using the database in JMP. You see two tables. The first table is the table we transferred into the database, with our test results and retractor information. The second table comes from the traceability system. If you press this little [inaudible 00:23:58], then you come to this picture, and we make a left outer join. How do we make the join? We use the internal barcode. In the beginning, I told you that every retractor has a unique internal barcode, and this unique internal barcode is called Serial. If they match, then I have all the information, and therefore I make a left outer join.

This is how an alarm report looks. It starts once again with a table. It tells us the upload date, the location, and which machine is affected by this alarm. Here is the test, and here, once again, the information about the number and percentage of okay and not-okay parts. The same information is given here in the graphs, as absolute numbers and percentages.

Now we take the information from matching the ATOS data. The first table is for the ATOS data; here we consider tilt lock overall, and now you see that this seatbelt is affected. We also consider machine parameters. What does the 15 mean? Here is the translation: it means lower specification limit; the upper specification limit is 27, and so on. Every value here has this kind of title.

Below are the component data given by the traceability system Atraq. We have the CS-Ball, the CS-Sensor, and every component has four columns: part number, lot number, box number, and supplier. Part, lot, box, supplier. So this is the information we have got.

Now we start our predictive screening. First of all, we try to find out what was constant. If something is constant, it will not have an effect on your okay and not-okay values, and the remaining predictors are used in a predictive screening. Here you see directly the results of the combinations, and you see that the shift itself has a very high impact. We like this predictive screening because it is easy to read for non-statisticians, and it identifies predictors which might be weak alone but strong when used in combination with other predictors.
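A sketch of the two steps just described, the left outer join on the internal barcode and the predictor screening on the remaining columns. Data table and column names are placeholders.

// Left outer join: keep every ATOS test row and attach the Atraq component data where the serial matches
joined = atosDt << Join(
	With( atraqDt ),
	By Matching Columns( :Internal Barcode = :Serial ),
	Drop Multiples( 0, 0 ),
	Include Nonmatches( 1, 0 ),          // keep non-matching rows from the main (ATOS) table only
	Preserve Main Table Order( 1 )
);

// Predictor screening: which components and parameters separate okay from not-okay parts?
joined << Predictor Screening(
	Y( :Tilt Lock Overall Result ),
	X( :Shift, :Box Serial, :CS Ball Lot, :CS Sensor Lot ),   // constant columns are excluded beforehand
	Number of Trees( 100 )
);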
Then, based on this predictive screening, we append graphs to the alarm report, and we color the graphs according to the predictive screening. First, the shift came out as relevant. Here you can see the afternoon shift in blue. You can see directly that there was no failure in the morning shift (the red lines here are the specification limits), but it starts in the afternoon. Box Serial was also a significant predictor, and you can see that the purple and the blue Box Serials also have an effect. This is our root cause analysis: test these Box Serials; they behave differently from the others.

Now we save the alarm report, and we would like to send the link to the alarm report to Autoliv's dispatch system, called Leading2Lean. Leading2Lean is configured to automatically send notifications to the correct owner. Usually you could simply send a mail, but sending a notification via Leading2Lean includes a dispatch process, and that dispatch must be closed.

This is the way we do it. First, we define a variable called alarm; it gives me the path and location of the corresponding alarm report. Then we use an associative array, where we also set the site, and as the description we include the alarm. This is sent via an HTTP request: here we have the fields array, which we defined just before, and then we send it. This is the way, so copy and paste this skeleton into your own JMP script.

This is how Leading2Lean, the dispatch system, looks. Here we have a dispatch number, the name, and the date when it was created. Here we have the link which we sent via the HTTP request. If you press it, or open it via email, then you will get the alarm report.

This whole process which I have described, using queries, making predictive screenings, and making HTTP requests, could all be realized with JMP. In the same way, I would like to make such analyses for components based on subassemblies.

As Jordan Hiller said, JMP writes 90% of your code, the skeleton. I hope that I could give you a few more percent for your own scripts. I hope that this presentation helped you, and that you like working with JMP as much as I do. Thank you.
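A minimal sketch of that dispatch call in JSL. The URL and the field names are placeholders, not the real Leading2Lean API; the pattern is simply an associative array sent as the Fields of an HTTP POST.

// Path of the alarm report that was just saved (placeholder)
alarm = "\\server\Reports\ALH\Alarm_TiltLock_" || Format( Today(), "yyyy-mm-dd" ) || ".png";

// Fields of the dispatch, collected in an associative array
fields = Associative Array();
fields["site"] = "ALH";
fields["description"] = "ATOS alarm - see report: " || alarm;

// Send it as an HTTP POST
request = New HTTP Request(
	URL( "https://example.leading2lean.com/api/dispatch" ),   // placeholder endpoint
	Method( "POST" ),
	Fields( fields )
);
response = request << Send;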
Batch processes are subject to high variability: raw material composition, initial conditions, unit degradation, and their intrinsic dynamic nature. Additionally, they are characterized by several distinct phases and steps that drastically change the conditions during the manufacturing process. In this presentation, we will illustrate with an industrial example how to use data science and machine learning to convert this high variability and apparent excess of data into valuable information. First, we will show how to summarize batch properties into features and use the open-source Predictor Explainer add-in to identify the most relevant ones using AutoML and Explainable AI. Then, we will discuss the need to align the data timewise before performing trajectory analysis, briefly introducing the pros and cons of different methodologies to achieve this result. Finally, we will dive into trajectory analysis. In this last step, we will use the Functional Data Explorer functionality of JMP Pro to monitor and identify deviations. Analyzing these deviations will lead us to identify key process control improvements to optimize production further.

Thanks, everybody. I'm Mattia Vallerio, working in Advanced Process Control at the Solvay site in Spinetta Marengo, Italy. Today I'm here to present work that we did together with the University of Leuven on the analysis of industrial batch data. More specifically, I will present a JMP plugin that we developed that uses autoML to do feature screening, and then I will move on to using functional principal components to analyze batch data. The idea is that, on one side, the autoML is used for automated screening of relevant parameters, and on the other side, the functional principal components are used for anomaly detection in batch manufacturing processes. While doing that, I will also talk about the need to align data timewise to be able to analyze it properly: why you need to do it and how you could do it in a simple way.

Just for reference, this work has been published in a book, but it is also available on arXiv. This is the reference with all the authors listed, and you can download it for free. Feel free to have a look at it; you will find more details on what I will talk about today. In the same way, the plugin that I will present is freely available on GitHub, on the JMP Community page in the material for this talk, and on a dedicated page that is also called Predictor Explainer.

Moving back to the talk: the data that we use is based on a use case that was published by Salvador Munoz back in 2003. You can download his code, where he is using PCA and PLS methods to analyze batch data, and the use case contained within it is also used in this talk and in the publication I showed just before.

If we look at the data we are analyzing, it is basically a drying process. This drying process is composed of three different phases: the deagglomeration phase, the heating phase, and the cooling phase, which you can see here as phases 1, 2, and 3.
The purpose of this process is fairly simple: it is just to remove solvent from the dry cake, from the material that has been introduced into this drying unit. As you can see, a different initial cake weight is introduced into the system each time, and there is variation because the starting material is different every time. The purpose is to reach a specific target concentration for the solvent at the end, so it should not be too dry or too wet at the end of the phase. You can already see clearly from this picture that we have some variation in the shape and duration of the temperature profile, and therefore also of the process itself.

If we go a bit further in analyzing the data, you can see that we have a variety of batch durations; this is the color on the right side in the legend. You can see here, even more clearly than before, that there are different shapes. This is also true for the solvent concentration. This shouldn't be too much of a shock for anybody in the process industry: the longer the batch, the lower the final solvent concentration, and the shorter the batch, the higher the final solvent concentration, more or less, with a few exceptions.

But as you can see, the lengths are all over the place, and the main phases are not aligned. If you took the data for all these batches now and started to analyze it, you would be comparing the wrong samples: for example, at this point in time you would be comparing data from the deagglomeration phase with data from the heating phase, or even the cooling phase with the deagglomeration phase. Of course, this is not what we want to do. That's why, before you do anything else with the data, it's important that you squeeze, shrink, or stretch the data in time so that all the different batches have the same length.

You can do this in different ways. This is technically called dynamic time warping, and it is also a feature that is included in JMP when you do functional data exploration. Very complex mechanisms and algorithms have been developed over the years; the references for these methods are in the publication I just showed you. One of the drawbacks of the advanced methodologies is that you need a reference trajectory to be able to use most of the dynamic time warping algorithms.

There are other ways to synchronize the batches. One is that if you have a monotonically increasing latent variable (most of the time this is the conversion, or the total amount of material fed into the reactor, so the cumulative feed), it can be used as a way to plot the data in a standardized way and have all the data aligned.
The methodology that we used for this use case, and that we propose in the article we wrote, is to normalize the data based on the automation triggers. By automation triggers, we mean the changes between the different phases. Every beginning and end of a phase is then normalized between 0 and 1, as you can see here: the deagglomeration phase goes from 1 to 2, the heating phase goes from 2 to 3, and the cooldown phase goes from 3 to 4. Then all the data is squeezed or stretched to fit into these buckets (a minimal sketch of this normalization appears below). Then something very nice happens: you can see abnormal batches directly, much more clearly than you would have on the left side. And if you look at the plot of the phase time, the one in the middle, you can clearly see that the slope of the line basically tells you how long the batch lasted; the steeper the line, the longer the phase you are currently looking at.

The drawback of this methodology is that it cannot be applied online, of course. It can only be applied once the batch, or the phase, is finished, because online it is basically impossible to know when the phase is going to end. You therefore need to resort to another kind of alignment procedure, like the dynamic time warping described in the paper. I won't be touching on that today.

So how do you actually analyze batch data? There are different ways to do that. The first way we look at is using fingerprints. What do we call fingerprints? Basically, you can define a fingerprint as an aggregated or statistical summary of the data that has physical meaning or engineering value. These are normally the variables your engineers look at to know whether the batch has been performing well or not. If you ask your experts in the field or on the process, they will have this kind of KPI that they monitor to know if a batch has been performing well.

For example, one of them could be the maximum level of the tank in the deagglomeration phase, or the maximum temperature in the drying phase, or the standard deviation between the set point and the measured variable during the drying phase; you name it. You can go as crazy as you want and build basically as many features as you want from the data you have. This is a way to remove the burden of the transient behavior of batches, and it's a way to compare batches by using simple statistics on different features of the batch.

The problem with this is that you can end up with a lot of different statistics to track and monitor, and sometimes it's very difficult to understand which ones are really relevant and which are not relevant at all.
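Going back to the phase-based alignment: a minimal JSL sketch of the normalized time axis could be a formula column like the one below. It assumes columns :Batch ID, :Phase coded 1, 2, 3, and :Time within the batch; these names are placeholders.

// Normalized phase time: phase index plus the fraction of that phase already elapsed,
// so every batch runs from 1 to 4 regardless of its real duration
New Column( "Normalized Phase Time",
	Numeric, "Continuous",
	Formula(
		:Phase +
		( :Time - Col Min( :Time, :Batch ID, :Phase ) ) /
		( Col Max( :Time, :Batch ID, :Phase ) - Col Min( :Time, :Batch ID, :Phase ) )
	)
);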
That's why we developed the plugin I showed you before, which uses autoML to do feature selection on all these fingerprints that you can create yourself. The add-in can be installed by everybody in JMP, and it basically looks like any normal menu you would have in JMP. It requires a Python installation, which is also managed automatically by the installer of the plugin.

Let's try to use it. We want to model the final concentration of the solvent, which is our Y. You can just pop in all the sensor data that you have, and it will automatically create all the different engineered features: it will take the maximum, the minimum, the standard deviation, the median, the mean, and all the statistics you can possibly imagine for all the variables we introduce. If you have information on the batch ID and the phase ID, you can just plug it in. Additionally, if you have the Python installation, you can ask the tool for a SHAP plot of the SHAP values of the different features, to get a better understanding of what the boosted tree is doing behind the scenes to select the features that are relevant or not.

Then you can tweak the number of trees and the signal-to-noise ratio, you can add weights, you can choose. If we click OK, then the magic happens. As you see, it's still computing, because the Python script behind it is computing the SHAP values, so it might take some time before we get the results.

This is the result, basically. As I said, the tool generates a lot of different statistical aggregations of the data: the standard deviation of the agitator speed, the standard deviation of the torque, the mean of the agitator speed, and so on; you can see it for yourself. And it's still computing; that's the beauty of doing it live, sometimes it doesn't go as planned. Here it is: this is the SHAP plot, and we're going to look at it later. Let's run it again, but without the SHAP value request, just because I want to show you another feature. I'll do the SHAP plot again afterwards, but let's move on with this.

As you can see, we also have random and uniform noise columns, with statistical features of their own. These are introduced as a cut-off, as a way of selecting which features are really relevant and which features cannot be distinguished from noise. This is built in as standard, and it gives you an automatic cut-off as well. One of the things you can see is that it selected the torque and the agitator speed as some of the interesting variables to look at.
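The kind of per-batch, per-phase aggregation the add-in automates can also be sketched by hand with a Summary table; the column names below follow the example data and are only placeholders.

// Fingerprints: one row per batch and phase, with simple summary statistics per sensor
fingerprints = dt << Summary(
	Group( :Batch ID, :Phase ),
	Mean( :Torque ), Max( :Torque ), Std Dev( :Torque ),
	Mean( :Agitator Speed ),
	Max( :Dryer Temperature ),
	Freq( "None" ),
	Weight( "None" )
);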
Now, you cannot really use this as... actually, if you think about it, it's quite understandable, because depending on the wetness of the cake that you introduce, the torque consumed by the agitator will be higher or lower, depending on whether it has to work more or less. It's completely normal that at the beginning the batches that are a bit wetter might offer less resistance, and the ones that are less wet will have a little more resistance.

But this is the kind of feedback you get from the tool. The standard output is a plot with the variables that are most relevant; you might have seen it passing by. It also makes a parallel coordinates plot of the output, colored by the target. Here you can see again that if the torque is a bit higher, then the final concentration of the solvent is also a bit higher, and if the torque goes down, then the wetness of the cake at the end is lower. The agitator speed basically reflects the effect of the torque as well, and this one is also the torque. This is just a visual representation of what the tool does.

You can also use SHAP values to look at the data in a different way. SHAP values, if you're not familiar with the term, are a way of visualizing the impact or effect of the different variables on the target. It's a way to explain the result of the machine learning algorithm used behind the scenes.

Let's try to do it one more time, maybe selecting fewer parameters: let's say torque, agitator speed, and dryer temperature set point, which are the ones that have been [inaudible 00:20:48]. Then we add phase and batch, and we ask it for the SHAP plot. Bear with me a bit; it should be any second now. Here we go.

The legend gives you the normalized value. If the point is on the left side, it has a negative effect on the target value, and if the point is on the right side, it has a positive effect on the target value. As you can see, the torque is again one of the most important ones. You basically see the same thing we saw in the parallel coordinates plot and in the results of the analysis: a lower value of the torque has a negative effect, so it sits on the left side, and a higher value of the torque has a positive effect. Then you can analyze this for the other variables as well.

We think it's a very powerful tool to visualize the effect and to break down and analyze what the algorithm spits back at you in a more efficient way. At least, this is what we think, and that's why we included it in the tool. Then you can just scroll down and look at all of the variables.
Then, of course, we still have the random and uniform noise inside as well, even though it's not really relevant. You might have noticed that the batch ID was also flagged as relevant. That's a bit fishy, right? This is actually a good point to move into the next part of the talk: anomaly detection for batches, or having a way of analyzing whether one of the variables is going out of spec.

The standard way to do this for batches, or for industry in general, is to look at some KPIs and see how they evolve over time. For example, we might want to look at the different phases and their durations, to look at the variation we see: we expect a lot of variation in the deagglomeration phase and a little less in the heating phase and the cooldown. The other way to look at this is basically to make a control chart of different parameters and see whether these parameters are inside the limits you have specified. One place to start is the target variable itself; that would be one of the first variables you need to monitor.

Now, remember the graph I showed before, where you could see that the batch ID had an impact on the solvent. Plotted like this, it makes more sense what we were looking at in that SHAP plot, because there has definitely been a trend. Going from batch 0 towards batch 70, there is a variation in where the final solvent concentration ended up: up to batch 30 we were on target, then we went under target, and then we went too high in solvent as well. Definitely something changed during the process, and therefore we had that signal in the SHAP plot as well; it picks up that the batch ID is relevant for predicting the final solvent concentration, but it's just an artifact of this data. We don't know the cause: most likely these are batches of different products, and the initial concentration differed between campaigns, or something else was going on. But this is an additional uncertainty that is inherent to batch processes when you have this variation in your raw materials. It is also true for other processes, but for batch processes it's much more relevant, as you can see here.

One way to look at the data, or to do anomaly detection, that has been widely published and is widely used in this industry, is the combination of PCA and PLS to understand the multivariate space at a specific point in time. If that point is not representative of what is going on in the ongoing batch, then you will have an alarm. It's a multivariate way to look at the data. Now, with functional principal components in the Functional Data Explorer, we can basically do the same, but instead of using standard PCA, we use the entire information of the trend. This is a standard tool that you can find inside JMP, under Specialized Modeling.
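As a hedged sketch of that first monitoring step: an individuals chart of the final solvent concentration, one point per batch. The table and column names are assumptions; batchDt stands for a one-row-per-batch summary table.

// Individual and moving range chart of the target value, batch by batch
batchDt << Control Chart Builder(
	Variables( Y( :Final Solvent Concentration ) )
);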
This is the Functional Data Explorer. It's part of JMP Pro, only JMP Pro, and that's what I'm using; if you have it, you can use it. I already ran the analysis, so we'll just relaunch it.

What you see, for example if we look at the tank level as a variable, is that it first gives you summary statistics. The idea behind functional PCA, like PCA, is that it identifies eigenfunctions that can explain the shapes we see up to a certain percentage. In this case, for the tank level it identified two eigenfunctions, and the sum of these functions can explain 97.3% of the totality of the shapes that we see.

Here you see all the shapes on the left, and you can clearly see that there are some that are not similar to the rest. You can play around a bit and increase the number of eigenfunctions to include the third one, but JMP automatically selects the most appropriate number of eigenfunctions as a trade-off in explaining the shape.

If we go back to two: how does this work? Basically, as you can see, you have all the batches here, and then you have the score plot, which is what actually allows you to understand which batches are anomalous and which are not. You definitely have batch 61, which is a bit out there with respect to the rest, and then you have batch 55. Going from left to right, you can see that there is an evolution of the batches along the Component 1 axis, which corresponds to this specific shape over there. Depending on where you are on this Component 1 axis, the batches will have different shapes, and the maximum level basically increases and increases until you reach batch 55 and batch 66, which are a bit anomalous with respect to the rest. This is the same concept as a PCA, but with an analysis of the shape functions instead of a multivariate analysis done row by row and point by point.

The idea, in the end, is that you could use this online to understand whether the batch is inside or outside the specification. You could do it per phase, for example: if you have a specific shape for one of the variables that you need to trend, you could use this to analyze where you are.

The same is true for the other variables. You can do this for the dryer temperature variable. In this case, we have three different eigenfunctions, and they explain up to 87% of the variation. Again, by looking at the score plot, you can spot anomalous batches basically just by looking at it: batch 34 has a flat top, while all the other batches have a pointy shape that you can find back in basically all the other ones.

The model that comes out can be used for online anomaly detection, if you can implement it.
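For reference, a launch of the Functional Data Explorer for one of these variables might look like this in JSL (JMP Pro only). The column names are assumptions, and the exact role names are best taken from the platform's saved script.

// Functional Data Explorer on the tank level trajectories, one function per batch
dt << Functional Data Explorer(
	Y( :Tank Level ),              // the measured trajectory
	X( :Normalized Phase Time ),   // the aligned time axis from the normalization step
	ID( :Batch ID )                // one curve per batch
);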
By the way, if you have the new version of JMP, you can connect directly to your process historian if you have OSIsoft PI. Otherwise, there has been another talk by my colleague, Carlos, about another plugin that we developed to extract data from your historian, which can connect to both OSIsoft PI and AspenTech IP.21. You can download your data directly, pop it in, and see whether a batch has been behaving according to your specification or not.

I think this more or less covers what I wanted to show: the two different methodologies that we have been using at Solvay to look at process data. I'm looking forward to seeing you at the summit in Spain next month, in March, if you are there. Otherwise, feel free to reach out to me or to any of my co-authors if you need more information.

Just as a wrap-up, this is where you can find the article that we published about this, with a bit more information and a bit more detail than what I have just shown you. Again, it's open access; you can download it for free from the link, and it's all there for you to look at and browse. Thanks again for your attention. That's it.
The three PQ batch concept in the pharmaceutical industry is being replaced by continued process verification. However, there is still a validation event (stage 2) before going into commercial manufacturing (stage 3). In stage 2, you are supposed to prove that future batches will be good, and in stage 3, you are to prove that the validated state is maintained. If this can be done, there is no need for end-product testing any longer, leading to large QC cost reductions. JMP has the ideal toolbox for both stages. If the process can be described with a model, prediction intervals with batch as a random factor can be used to predict the performance of future batches. To live up to the assumptions behind the model, JMP has an excellent toolbox for the sanity check. Even cases with variance heterogeneity can be handled through weighted regression. A JMP script that combines the necessary tools described above to calculate control limits, prediction limits, and process capability in stages 2 and 3 will be demonstrated. The script has been heavily used by many of our customers. It only requires that data can be described with a model in JMP.

Thank you for giving us this opportunity to talk about how we use JMP in pharma process validation and continuous process verification. My colleague Louis and I will give you a guide to the background, how we do it, and what the outcome is. The agenda is that first we will describe the background and the concepts. Since this is validation, we must justify the assumptions behind our models, so the sanity check is important. Then I will go on to talk about the particular methods we are using and which formulas we have to build for things that are not in JMP up front. Then Louis will take over and show how the JMP script automates the calculations and implements what is missing in JMP. Finally, we will draw some conclusions, and hopefully there will also be time for questions.

Now to the background and the concept we are working with. Louis and I come from a company called NNE. We do consultancy within the pharmaceutical industry, and we are JMP partners, so we help our clients get value out of their data, of course using JMP. A big issue within the pharmaceutical industry is process validation when launching a new product. Traditionally, for many, many years, it has been done by making three batches; if they were good, the validation was passed. But this is not really predicting the future, this is predicting the past. Of course, you can make three batches that are good, and the problem can appear later on. To compensate for that, the industry has traditionally had an extensive, and thereby costly, QC inspection on every batch to ensure it was okay.

However, if we could prove that all future batches will be good, then we could reduce the QC cost heavily. But how do we predict with confidence that future batches will be good? We could use prediction intervals, or individual confidence intervals, as they are called in JMP, at least in some menus.
If we put batch in as a random factor, then the validation batches we are looking at will be seen as a random sample of all future batches. Thereby, we can predict how future batches are going to behave. If this prediction is inside the specification, we have actually proven that all future batches will be good, given that we can maintain the validated state, of course.

However, this might take more than three batches. But do they all have to be in stage 2, as we show in this graph? Maybe not. We suggest you still make three batches in your first process validation, and if that's not enough, make the rest in what we call stage 3A, which is after commercialization, because then you can sell them one by one, so it's not a big problem that you have to make more. However, this requires two things to be fulfilled. The first is that we still have the extensive QC testing, so we can find out whether a batch is bad, because we have not yet proven that it cannot happen. Second, of course, it requires that the estimated performance is okay. How can we see whether the estimated performance is okay? We can do that by checking whether the control limits are inside the specification. Then we can go to the market, and later on, when the prediction limits are also inside the specification, we can actually remove, or at least reduce, the QC inspection. We then move from monitoring the product to monitoring the process.

So what is it all about? Validation is about predicting the performance of future batches with confidence, not just predicting the past. How can we do that? By using prediction or tolerance intervals in JMP. How many batches do we suggest making in stage 2? We still suggest three, but you might not be done.

How can you then pass stage 2 with few batches so you can go to the market? As I said before, look at the control limits, because they are without confidence, so you are not punished for only having three batches. Then, later on, look at the prediction or tolerance intervals. If the control limits are inside the specification but the prediction intervals are not, the most probable reason is lack of data, and this shouldn't keep you from going to the market.

Then how many batches should you make in stage 3A? That's fairly simple: simply continue until your prediction limits are inside the specification, and then you have passed stage 3A. These limits you can also use as prediction limits after stage 3A, when you go into stage 3B.

We can also see it in this flow chart, where you start in the upper left corner by analyzing your validation batches, typically three. Then you calculate your prediction limits in JMP with batch as a random factor. If they are inside the spec limits, everything is fine: you have passed both stage 2 and stage 3A. If it turns out that the prediction limits are too wide compared to the spec limits, I would, as the next thing, look at my control limits, which are without confidence.
If they are inside, then we just need more data. We collect more data, recalculate, and at some point in time we exit here and have also passed Stage 3A. Of course, if it happens that even the control limits are outside specification, then the estimated performance is actually bad, you are not really ready for validation, and you have failed. Hopefully, this is not going to happen.

Let's go on to the sanity check. Now I will go into JMP and try to demonstrate how it works. Here you see a JMP table, which actually comes from ISPE, the International Society for Pharmaceutical Engineering, who published this data set as a good example of what a validation data set should look like. Many companies and consultancies have tried to run the calculations on it and see how they would conclude. It is basically a data set from tablet manufacturing, and classically it contains three batches, A, B, and C. Powder samples are taken at 15 locations in the blender, and we measure the API content, that is, the strength of the active pharmaceutical ingredient, which must be between 85 and 115 to be inside specification.

If you look at the variability chart of the data, you can see the three batches A, B, and C, and the 15 locations within each batch. Within each batch-location combination, we have four measurements. The first thing you can see is that the within-location variation in batch B is somewhat bigger than in batches A and C, so we do not have variance homogeneity across batches. To make that more objective, you can put S-chart control limits on the standard deviation chart, and you can see that batch B is out of control. Another way of seeing it is to run a heterogeneity-of-variance test, as shown down here. Then it is also very clear that there is more within-location variation in batch B than in the other batches.

Strictly speaking, you cannot pool these variances. However, you can if you do weighted regression, where we weight with the inverse variance. That is the next step: we go into log-variance modeling, which you can do in JMP. We have simply made a log-variance model where you can see that the location, the batch, and the interaction all have a significant influence on the level, but only the batch has a significant influence on the variance. You can also see that the within-batch variance for batch B is somewhat bigger than for batches A and C. When you have built this model, you can save the columns from the red triangle menu, including the variance formula. I have already done that, so it is in my data set. If I unhide the columns, you can see the variance column from my log-variance model. It has a formula saying what the variance is for batches A and C and what it is for batch B. Based on that, I can now do weighted regression: I simply make a weight column where I weight my regression classically with the inverse variance.
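A minimal JSL sketch of that last step, assuming the variance column saved from the log-variance model is named "Variance of API" (the actual name depends on your response column):

// Sketch only: build a weight column as the inverse of the saved variance formula.
// ":Variance of API" is an assumed column name from the LogVariance model's Save Columns.
dt = Current Data Table();
dt << New Column( "Weight",
	Numeric, "Continuous",
	Formula( 1 / :Variance of API )   // classical inverse-variance weighting
);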
Then I am ready to do a sanity check on my data, because after weighted regression I can pool the variances. The first thing I would do is look at a systematic model with location, batch, and the interaction. On the studentized residual plot I can see that I have no outliers; everything is inside the limits. I can also see from the Box-Cox transformation that there is no need to transform my data. So it is reasonable to assume normally distributed residuals, which is the assumption behind a least squares model.

The next step is to put batch in as a random factor, not just a systematic factor, because I would like to predict future batches. This is what I have done here, and then you get these variance components: the between-batch variation, the batch-by-location variation, and the residual, meaning the within-batch, within-location variation. Since I have weighted with the inverse variance, that last number of course gets close to one. However, in this model there are also some additional assumptions, namely that the batch effect and the batch-by-location effect are randomly distributed, meaning that these random-effect predictions should follow a normal distribution. To test that, you can just put them into a table, which I have done here, and test the BLUPs by making an empty model with them as the response. You can see that both the batch effects and the interaction effects are all inside the limits, so here too we can justify the assumption that these are random factors. That is the sanity check of the model.

Now, to make models for batches A and C and for batch B, we need to scale them differently. That is what I have done in the columns over here. If I unhide them, you can see I made an "AC scale weight" where I simply multiplied my weight factor by the scaling, so I get the variation I have seen for batches A and C, and the same thing for batch B. I scale the weight with the variance I would like to have as the residual variance, which comes directly from my log-variance modeling. That is also pretty straightforward to do. Now I actually have my two models from which I can predict: one scaled with the AC scale weight and one scaled with the B scale weight. If I run the one scaled with the AC scale weight, you can see the model, the residuals, and the variance components. What is very easy in JMP is to go up here and save the columns: you can save the prediction formula, which will be your center line, and you can also save the prediction interval, which here is called the individual confidence interval. I have done that for both this model and the other model.
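As a rough JSL sketch of the model and saved columns just described (the column names :API, :Location, :Batch and the weight column are assumptions, and the exact save-columns messages are easiest to grab from a saved script):

// Sketch only: weighted REML model with Batch and Batch*Location as random effects,
// saving the center line and the individual confidence (prediction) limits.
dt = Current Data Table();
fit = dt << Fit Model(
	Y( :API ),
	Weight( :AC Scale Weight ),                       // assumed weight column, scaled for batches A and C
	Effects( :Location, :Batch & Random, :Location * :Batch & Random ),
	Personality( "Standard Least Squares" ),
	Method( "REML" ),
	Run(
		:API << { Prediction Formula, Indiv Confidence Limit Formula }
	)
);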
I also save the columns from the other model, the one with the different scaling for batch B, as individual confidence intervals. They are then stored down here as prediction formulas, and I can make a combined version by picking between them depending on which batch group a row comes from. Now I have limits for batches A, B, and C that I can plot on the same graph.

This is straightforward to do, and up to this point I have only used what is built into JMP. But to get the control limits and the tolerance limits, I need to calculate them myself, because in JMP you do not have tolerance limits or control limits in Fit Model. Of course, you have control limits when you make a control chart, but only for one mean and one standard deviation. You also have tolerance limits in the Distribution platform, but again only for one mean and one standard deviation. So here we have to calculate them ourselves.

The first thing we do is get the variance components. We simply save the variance components from the two models, one scaled to batches A and C and one scaled to batch B. Once we have the variance components, it is pretty straightforward to calculate the control limits, because they are basically just the prediction formula plus or minus a normal quantile times the square root of the total variance, using either the A/C variance or the B variance depending on which batch we are looking at.

We can do the same for the tolerance limits, if you prefer tolerance limits instead of prediction limits, as many companies do. Again, unfortunately, they are not in Fit Model, but we can calculate them. There we simply put in the classical formula for a one-sided tolerance limit: the t quantile with its degrees of freedom, the normal quantile for the desired proportion, and then we multiply by the square root of the total variance from the variance components. The only slightly tricky thing is that we need the degrees of freedom of the total variance, and JMP does not really give you that; it only gives you degrees of freedom for systematic models. But based on the width of the prediction intervals, we can back-calculate how many degrees of freedom we effectively have, and then we apply the same degrees of freedom to the tolerance limits. Then we are ready to go, and we can save all of these limits down here.
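In JSL, the control-limit calculation described here boils down to something like this sketch (the variance value and column name are placeholders; the real number is the sum of the REML variance components of the A/C-scaled model):

// Sketch only: 3-sigma control limits around the saved prediction formula.
// 1.23 is a placeholder for the total variance; ":Pred Formula API" is the
// saved prediction column (assumed name).
dt = Current Data Table();
dt << New Column( "LCL (A,C)", Numeric, "Continuous",
	Formula( :Pred Formula API - Normal Quantile( 0.99865 ) * Sqrt( 1.23 ) ) );
dt << New Column( "UCL (A,C)", Numeric, "Continuous",
	Formula( :Pred Formula API + Normal Quantile( 0.99865 ) * Sqrt( 1.23 ) ) );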
If we want, we can plot them, which is what we have done here. You can see the prediction limits, the tolerance limits, and the control limits: the control limits are the dotted lines, the prediction limits are the solid lines, and the coarse dotted lines are the tolerance limits. You see them for the 15 different locations and for the three batches A, B, and C. Of course, the limits are wider for batch B because it has more within-location variance. You can actually see that even the control limits, the dotted lines, are outside specification for locations 1 and 2, and actually also for location 4. So even though this data set is presented as a good example of a validation, I would say that if batch B is representative, you have actually failed the validation. If you look at A and C, all the dotted lines are inside, meaning you have passed Stage 2 because the estimated performance is fine. For a few locations you can see that the prediction limits are just outside, but with this big gap between the control limits and the prediction limits, just one more batch would probably bring them inside. This is how it works.

Some companies do not think it is enough to check whether the limits are inside specification; they would also like a Ppk capability index. We have put that into our method as well. If you look at the formulas here, you can see that where we have the prediction limits, we also compute a Ppk. Basically, we take the distance from the prediction formula to the spec limit and divide it by the half-width of the prediction limits. This gives a Ppk that corresponds to the prediction limits, and because these limits are with confidence, this is actually a Ppk with confidence. You can do exactly the same with your control limits: take the classical Ppk formula and just put in control limits instead of prediction limits. Because control limits are just estimated values, this is a Ppk without confidence. Of course, you can also do the same thing for the tolerance limits, which we have down here: a Ppk based on the tolerance limits. You can calculate all of these and, if you want, plot them on a graph as you see here.

Up here we have the estimated Ppk without confidence, the red one. Then we have the blue one, which is based on tolerance limits, and the green one, which is based on prediction limits. The upper chart is for the good group, batches A and C, with the lower within-location variation; the lower chart shows how it looks with the larger batch B variation. As we saw with the limits, for the batch B variation even the estimated value has a Ppk below one for locations 1, 2, and 4, while all the estimated values are nicely above one for A and C. The Ppk based on prediction, the green one, is above one for most locations and very close to one at locations 1 and 2. One additional batch would probably be enough to bring those inside as well.
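As a sketch of that Ppk-with-confidence calculation in JSL (spec limits 85/115 from the ISPE example; the saved prediction and limit column names are assumptions):

// Sketch only: Ppk based on the saved prediction limits, i.e. spec distance
// divided by the half-width of the prediction interval on each side.
dt = Current Data Table();
dt << New Column( "Ppk (prediction)", Numeric, "Continuous",
	Formula(
		Min(
			(115 - :Pred Formula API) / (:Name( "Upper 95% Indiv API" ) - :Pred Formula API),
			(:Pred Formula API - 85)  / (:Pred Formula API - :Name( "Lower 95% Indiv API" ))
		)
	)
);
// The same pattern with control-limit or tolerance-limit columns gives the
// Ppk without confidence and the Ppk based on tolerance limits, respectively.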
However, as you can probably see, it is a bit tedious to do all these calculations by hand. This is why we have made a script for it, which Louis is now going to demonstrate. I will let Louis take over now and show how the script works; I am sure he will also show a few more examples.

Yes, Pierre, thank you very much. I will just try to share my screen here. First things first: as Pierre has shown us, we face a lot of issues during process validation. Luckily for us, JMP has a lot of straight out-of-the-box functionality to help us deal with these issues, and we really like to use it, because it lessens our own validation burden going forward. Some of the things JMP has solved for us: for example, we are working with complex processes that cannot be described by just one mean and one standard deviation. To deal with this, we leverage the Fit Model platform in JMP, where we can have multiple systematic factors to handle the many means and multiple random factors to deal with the many variance components. Then there is the issue that data often needs a transformation; again, JMP has the Box-Cox transformation, which we use directly in the Fit Model platform. We also usually see a number of outliers to handle in bigger data sets. Here we again leverage the Fit Model platform, more specifically the studentized residuals plot, where we exclude outliers that fall outside the Bonferroni limits (a small sketch of this rule follows below). It is important to note that our concept is to exclude outliers from the actual model but always report them separately; it has to be visible that they are not included in the analysis. Then there is the lack of homogeneity of variance, which Pierre showed how to handle using log-variance modeling to estimate the variance and then compute a weight formula from it. And then there are the prediction intervals, which we use quite a lot; here as well, JMP is ready to go with individual confidence intervals directly in the Fit Model platform.

But, as Pierre also mentioned, we felt there were some parts where JMP could not bring us all the way home. This includes, for example, control limits, which in JMP can only handle one mean and one variance component. We overcome this by calculating control limits from the Fit Model output, using the prediction formula, the standard error of the prediction formula, and the residual variance to calculate the total variance and, from there, the control limits. Then there is the fact that tolerance intervals are preferred by many of our customers, because they allow you to set the confidence and the coverage separately. However, we found that JMP only does tolerance intervals in the Distribution platform, and again only for one mean and one variance component. So the script has built-in calculations for tolerance intervals, similar to the control limits, and when we calculate them we use the same degrees of freedom that JMP uses when it calculates its prediction intervals.
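Going back to the outlier-handling step above, a minimal JSL sketch of that rule might look as follows (the residual column name, the parameter count, and the exact Bonferroni cut-off JMP draws are all assumptions here):

// Sketch only: exclude rows whose studentized residual exceeds an approximate
// two-sided Bonferroni cut-off at overall alpha = 0.05; rows stay visible so
// they can still be reported separately.
dt = Current Data Table();
alpha = 0.05;
n = N Rows( dt );
k = 6;                                       // placeholder: number of estimated fixed-effect parameters
cut = t Quantile( 1 - alpha / (2 * n), n - k );
For Each Row(
	If( Abs( :Studentized Resid API ) > cut,  // assumed name of the saved residual column
		Excluded( Row State() ) = 1 )
);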
Then maybe the primary reason for doing this script, or at least the initial reason, is that calculating and visualizing all these intervals is very time consuming and very prone to human error. So we decided to make the script to automate the calculations and visualizations. Lastly, many of our customers require capability analysis such as Ppk, so we have of course also included this in the script, such that it calculates a Ppk corresponding to the prediction limits, what we call Ppk with confidence, and a Ppk corresponding to the control limits, what we call Ppk without confidence.

To say a little about the script itself: it consists of two parts. We have the JMP script you see here in the middle, and we also have a template document, as you see up in the upper left corner. When making the script, there were a few things we really wanted to achieve. We wanted it to be transparent, meaning that our customers can look under the hood, so to say, follow the calculations of the different limits, and potentially even edit them themselves, which brings us to the next point: we wanted it to be, at least to some extent, customizable by our customers. If they do not like the way we calculate Ppk, we want to give them the option to go in and edit the equations.

The template document itself is essentially an empty data table. It has a lot of predefined columns with column formulas, but no rows yet. What our script does is work from this template document as an input. It also takes the original data file on which you base your analysis as an input, and then it takes the model window. It should be said that the model window is assumed to be a complete model: you have done all your data manipulations, ensured that the response is proper, and made the sanity checks that justify the assumptions behind the model, ensuring it is valid. The script then takes the template document and the original data file and copies the data from the original data file into the template document; we do it this way so that we do not interfere with our customer's original data. It also takes all the relevant parameters from the model window and inserts them into the template document. From there on, the template document more or less fills itself out through the formulas already defined in its columns. In the end, the template becomes an output table in which the first many columns look very much like the original data file, with the remaining calculations and input values as additional columns further on.
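A minimal sketch of that template idea in JSL (file names are hypothetical): the template is an empty table whose formula columns evaluate as soon as the customer's rows are appended.

// Sketch only: fill the empty template (predefined formula columns, zero rows)
// with the rows of the original data file, leaving the original untouched.
template = Open( "Template.jmp" );          // hypothetical template file
source   = Open( "Original Data.jmp" );     // hypothetical customer data file
template << Concatenate( source, Append to first table );
// The relevant model parameters (variance components, spec limits, ...) would
// then be written into table variables or columns of the template, and its
// predefined formulas do the rest.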
I think we should go to an example and see how this looks in real life. I will just stop sharing this screen and share this one. What we have here is in fact the same ISPE case Pierre showed earlier. What you would do is take your data and run a model. In this case, I run the model where Pierre scaled it to correspond to the variation in batches A and C, the batches with the lower variation. You simply have your model, ensure it is final in terms of sanity checks and so forth, and then we run the script on it, which I do here: we have the model, and we run the script.

We see a few things pop up. First, we have the populated template, the output document, which now contains a lot of rows, some table variables, and so forth. We have our Ppk graph here, visualizing the Ppk based on the different limits: the blue is based on control limits, the red on prediction limits, and the green on tolerance limits. Lastly, we have our graph plotting all the data, the specification lines, and our prediction limits, control limits, and tolerance limits. We get the picture Pierre also showed earlier. What we see is that if we look at our data and assume all batches vary as much as batches A and C, we are actually in an okay position. At least we would have passed Stage 2, because the control limits are within specification for all batches and all locations. We do, however, have a small problem out at locations 1 and 2, where we see the prediction and tolerance limits extending a bit over the upper specification limit. However, we believe this could be fixed by just adding a few more batches to know the variation better.

But we also see that we have an observation outside our limits, and this is because what we are looking at here is in fact batch B, which, as we remember, had a higher variance. I will just run the model scaled according to batch B so you can see the difference. Now we see that all our limits have moved out from the mean. They are actually going beyond specification, not only at the first two locations but at multiple locations, which, especially because the control limits are outside specification, would in essence mean that you have failed your validation. So maybe this data set is not the best example to showcase a good validation.

Now I thought I would go through how we envision the entire process, relating it back to the flowchart Pierre showed at the beginning of this presentation. I have brought a customer example from an actual PPQ validation run; they ended up producing a total of six batches. What we have done here is build a model, and I will just deploy a local data filter to simulate where they started, having only the first three batches. Then what we have is, so to say, the finished model. After having produced the first batches and analyzed the data, we just run the script and get a result looking like this.
The scenario we see here is that both our prediction limits, up here, and our tolerance limits, here, are very much outside the specification. However, luckily for us, our control limits are actually inside specification. In the sense of the flowchart, this means that we have now passed Stage 2, and while we continue the validation and work on getting the prediction and tolerance limits inside specification, we can actually start sending batches to market simultaneously.

But bear in mind that we still have a pretty high QC effort at the end. Many customers ask how we can do this if we are so uncertain about the batches we produce. If you want to know the quality of the batches you have already produced, you actually have to look at it in a different way. In this model, we have batch as a random effect, because we consider it part of a larger population. But if we want to look at the batches individually, we have to put batch in as a systematic effect. If I run the script on that model, we see the limits narrowing significantly. I forgot to apply the data filter, but I think it makes the point anyway. What we see is that the limits are now much closer to the center lines of each batch, and they are well within our specification, because we are now describing the batches we have already produced rather than trying to predict the performance of future batches. This is also why we believe it is acceptable to go to market at this point while still keeping the high QC effort at the end.

Continuing with the example: what we saw was that we have passed Stage 2, because the control limits are inside specification, but the prediction limits are outside. The next step is simply to produce another batch, build the model again, as done here, and run the script again. What we see now, compared to the first view, is that our prediction and tolerance limits are moving toward the spec limits and toward our control limits, which is the behavior we expect as we get to know the between-batch variance better and better. However, the prediction limits are still not inside specification, so the routine is simply to do it all again. Because I know how this ends, I will simply include the last two batches and run the model again. Finally, after including the sixth batch, all our limits are within specification, which means we passed Stage 2 after three batches and have now also passed Stage 3A. This means we go into Stage 3B, where we will turn our attention to reducing the high QC effort at the end and replacing it with continued process verification, CPV. Just to sum up, I will go back to the presentation.
The purpose of a validation is, at least in our opinion, to predict that future batches will be okay. We firmly believe that if you can describe your validation data set with a model, you can also predict the future with confidence. This can be done with either prediction intervals or tolerance intervals. We believe JMP has unique possibilities to model your data and to justify the assumptions behind your model. It offers the opportunity to check for variance heterogeneity; if we do not have homogeneity of variance, we have the option to fit a log-variance model to find a weight factor, as Pierre showed you, and include that in our normal regression models. Then we can check whether the residuals are normally distributed; if they are not, we can do a Box-Cox transformation, also directly from the Fit Model platform. Then there is the problem of outliers, which we handle through the studentized residual plot, excluding observations outside the 95% Bonferroni limits, again a window directly accessible through the Fit Model platform. Lastly, we require the random factors to be normally distributed, at least in terms of their random-effect predictions. This we check by testing the BLUPs, and here we can also exclude outlier levels.

Then, to make the process easier for us and our customers, and to add the functionality we need on top of this, we decided to make a script that automates the visualization of prediction intervals. It calculates and visualizes tolerance intervals using the same number of degrees of freedom as JMP uses when it calculates prediction intervals. It also calculates control limits for more complex processes with many means and many variance components. Lastly, it calculates capability values where these are needed.
In many instances, relevant data exists, yet it is often not directly accessible and either cannot be utilized for data-driven analyses or requires painstaking manual effort to extract. One classical instance of this type is PDF documents. In this presentation, we will demonstrate an example of standardized PDF reports from a Laboratory Information Management System (LIMS) and show how the JMP Scripting Language can automate data extraction from these PDF files. The presentation will also show how the resulting scripts can be packaged as an add-in for distribution to many users.

____________________________________________________________________________

Explanation of the attached materials:
2023-03-20 Automated extraction of data from PDF documents using the customized JMP Add-ins.pdf --> The slide deck, in which the numbering of the examples refers to the correspondingly enumerated sections in 02 Step by step development.jsl
02 Step by step development.jsl --> The JSL file that guides you through the step-by-step development of the JSL code needed to read Freigabedaten_Beispiel.pdf
Freigabedaten_Beispiel.pdf --> Exemplary sample data stored in a PDF file
03a Functional code.jsl --> Summarizes all code developed in 02
03b Custom_Functions.jsl --> Example file to demonstrate how multiple JSL files can be packaged in a JMP add-in
03c Values for add-in creation.txt --> The values utilized to define the JMP add-in
Example PDF Data Parse.jmp --> The JMP add-in created from 03a - 03c
PDF Data Load Example.jmpaddin --> The same add-in but with extended functionality (file selection, progress window, etc.) that was not discussed in the presentation

Good day, everyone. My name is Peter Fogel. I am an employee of CSL Behring Innovation, and it is my pleasure today to talk to you about Automated Extraction of Data from PDF Documents using what I call customized JMP add-ins. Let me give you a high-level overview of what we are going to do today. First of all, in the introduction, I want to motivate why we should want to extract data from PDF documents at all. Second, in the approach, I want to show you how you can leverage JMP to actually do so, and what it means to use JMP and create JMP scripts. Then we want to transfer those JMP scripts into what I would call an add-in, and I want to explain a little bit why add-ins are actually the better way to store JMP scripts, if you like. Finally, I want to tell you what you can do once you are at the add-in level.

Why should we want to extract data from PDF documents? On the right-hand side, you see one example of a PDF document, and you see that it contains quite a lot of data. Quite often, this data is unfortunately not accessible in any other way, whether because of old software systems, proprietary software, or whatever else it may be. Sometimes PDF documents, and here you can replace the word PDF with any other document format, are really the only choice.
You want to have this data; otherwise, you would need a lot of manual operations on the data, which is both annoying and potentially really demotivating for your team members. The last point is that if you do not have the data at hand, you cannot make the decisions you want to. Quite often, data is key to making informed decisions, and without informed decisions you are at a real disadvantage in today's business world. What I want to show you now is how we can use structured data in PDF files, how we can leverage it using JMP, and how we can make decisions based on it. Today, I will only focus on how to get the data out of the PDF and hand it over to the user; everything else, such as how to analyze the data, could be a topic for another talk at another time.

Before we start with JMP itself, let's talk a little bit about what I would call the guiding principles. The first one is: understand what you want to do. If you do not understand the problem itself, you cannot really work with it. In this case, we know we have a PDF document, or potentially multiple PDF documents, which we want to parse. Then we might need to do some organization of the data, and finally potentially some further processing, depending on what is in there and what specifics we have. In the end, that is more or less a three-step approach, and from there you are ready to do any data analysis you want. So really understand your question at hand; we will do so in the next slides in a little more detail.

The next principle is: break it down into modules. The more modules you have and the better they are defined, the easier it is. Cut your problem into smaller pieces, and then you can tackle each piece on its own, which is much easier than handling one big chunk of work at the same time. The third principle, I believe, is to always use JMP to the best of its ability, because JMP can do quite a lot of what I would call heavy lifting for you. We will see one example, which in this case is the PDF Wizard, but there are many, many more, from analysis platforms like the Distribution platform to other platforms. They can really do a lot for you, and in the end you just have to grab the generated code and that's it; you get it almost for free. The fourth point is: if you define modules, make sure that they are standardized. Standardized in this sense means they should have defined inputs and outputs, so that if you later decide to do one part of a module slightly differently, it still does not break the logic of the code, because it still has the same inputs and outputs.
The last part should be clear: first focus on functionality, and only later make it user-friendly and suitable for any end user. That is also what we will do today; we will focus more on functionality and less on appearance.

Let us now look briefly at our PDF document, and I will also share the actual document with you in a second. But first, look at this snapshot on the right-hand side. What do we see? This PDF consists of several pieces. The first is typically the header, which holds very general information, of which we might use only some fields, but potentially all of them. Then we get the actual table, this data table here, which has both a table header and some sample information. If we look at an actual example, we can open the PDF, and you see that the table continues and continues across multiple pages. On the last page, we see that there is again data, and at some point we have a legend down here. We should also note that there might not necessarily be data on the last page; it can contain just the legend. As background information, we now know how the structure of this document works: the first page looks slightly different, then we have the interior pages, and the last page, as mentioned, can contain sample information but does not have to, and it always contains the legend.

If we now go into a little more detail, we see that each line, or entry, in this data table consists of measurement information and typically also the actual measurement, and these again are separated into multiple pieces. You will have, for example, the assay here, you might also have some code or assay code, it depends, you will have a sample name, and you will typically have a start and end date. You might have some requirements, and so on and so forth, until finally you get what we call the reported result at the end. Our task is really how to get that out. We see that the first line of each entry holds different information than the second one, and the third line holds only what we call the WG requirement. It is not yet perfectly structured, but we see there is a system behind this data, and that allows us to scrape and parse the data and utilize it to its full extent. Let's now break it down into modules, as I said.
What we can do is think around this three-step process again, and I believe we can break it down into even more steps. The first step could be, and this is user dependent, that the user says which PDFs to parse; the user tells you it is PDF one, two, and three, for example. Then, per PDF, you always do exactly the same thing, because in principle every PDF is the same; one has more pages than another, but the logic always stays the same. You would first determine the number of pages (this we will not cover today, but in general we can think it through). Then you want to read the general header information, as we saw it, and process it. You certainly want to read the sample information and process that, and you might want to combine the two; again, this one we will skip today. And you obviously want to combine the information across files. At that stage, we would have all the information we want available. Finally, we just need to ask the user where to store it, and then store the result. Those two last steps we will not cover today either, but I think you can imagine they are not too complicated to achieve.

Now let's actually jump into JMP itself. What I want to show here is: let JMP do the heavy lifting. In this case in particular, let the PDF Wizard do all the parsing of the data for you, and then, if you like, all you have to do is change the structure of the data. You can really leverage the PDF Wizard in JMP to its full extent. Let's switch very quickly over to JMP and see how that works. I have taken the example called Freigabedaten_Beispiel.pdf, and we will see what happens. If you open it, either by double-clicking on it or by going via File > Open, you can see that when we select a PDF file, we can use the PDF Wizard; let me make it a little larger so you can read the data. We see that JMP already auto-detects some of the data tables in there, but we want to be really specific and, in this case, only look at the header. So let's ignore the rest for now and just look at the general header table. In that case, it starts here with the product and ends with the LIMS Product Specification. We can simply draw a rectangle around it, let go, and you will see in an instant what happens over here. You see that JMP recognizes that it has two lines; that seems about right. It also recognizes that, in principle, there are only two fields.
Now, one could argue that this is one field, this is a field, and this is a field, so that may or may not be right. It depends a little bit on how you want to process the data whether you tell JMP to split the data here. If we do not want to do so, we just need to keep in mind that the second part of the field starts with something like a LIMS log number. In any case, we now have the data at hand in the format we want and can just click OK; JMP will open that data table for us. Now, very interestingly, we can look directly at the table's Source script and see that there is actually code, and this code we can really leverage. I will copy this code for a second so we can create a first script; for this, I will quickly open a script window myself. We can add the code there, and you should see that the code I have just added is exactly the same as the code we have down here, no difference whatsoever. So let's just use the code as it is.

If we look a little closer at that code, we see a couple of things. The first is that this is just the file name of the file we used. Instead of having the long file name hard-coded, I said down here, okay, let's define it as a variable and just use that variable. We also see that this table name is the name of the table as it is returned by JMP. In this case, we would probably not call it something generic like that, but rather give it a name that carries the actual information. And then we see that JMP tells us how it parsed that PDF table: it says it was page one, and that it looked for data in this rectangle. Everything else was done automatically. If we execute this statement now, we see that it gets us exactly the same data as before, and that is it. So far, so good. That is, if you like, all there is so far about reading a PDF file.
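The pattern described here, sketched in JSL (the PDF Tables arguments themselves are whatever the wizard wrote into the table's Source script; only the file-name variable and the rename are added):

// Sketch only: reuse the wizard-generated Open() call, but with the file name
// pulled out into a variable and a more meaningful table name.
pdfFile = "Freigabedaten_Beispiel.pdf";
dtHeader = Open( pdfFile,
	PDF Tables( /* page and rectangle settings copied from the wizard's Source script */ )
);
dtHeader << Set Name( "Header Info" );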
However, as I said, we also wanted to look at the actual sample data, not only the header data. Let's do that once more; let me enlarge it again a little so we can look at it. Again, you could say, okay, let's ignore the rest and focus only on one specific part, in this case the sample data on page one. Where does the sample data start? It starts here with the LIMS Probennummer, goes down exactly to here, and extends out to the rightmost column. We can read that in as is. What we see directly, both over here and over here, is that JMP uses two lines as a header, so two rows. That is not what we want, because only the first line is really the header; everything else is content. If you click on this red triangle, you can adjust that and say, I do not want to use two rows as a header, only one.

Once you change that, you see that we start with the End Date as the first actual value, which is perfectly fine. The other thing we might spot is that this first column actually contains two pieces of information: the one that holds the sample number and, here, the start date. The reason is that many of those values are too long for JMP to break the field into two columns automatically. We can tell JMP to enforce the split by right-clicking at more or less the right vertical position and telling it to add a column divider there, and we immediately see that JMP splits it. Unfortunately, we now get a bit of a mess in this first column, where the word SOP is split into an S and an OP, but in return we get a Start column. Here I would say let's accept it as it is; keep in mind that we always split this field, which is a little unfortunate, but it is good enough for now.

Again, if you capture that content, you get a JMP data table, and you can again use the Source script to look at the code. If you compare this code to the code I captured previously, you will see it is almost exactly the same, except for the part where we set the header rows and the column divider; the rectangle might be shifted a little, but the rest is identical. We can really read off how it works: you see that there is one header row, that it is page one, that a rectangle defines where to read, and that column borders are defined as we want them. As before, you could pull out the file name and the table name as variables, or replace them, and that is more or less what we now call our content file. If I close that and run this code once more, you see that it creates our JMP data table, just as we want. Getting a first shot at your data is perfectly feasible and not too complicated, I would argue.

Now, how do we go on from here? We have the data in principle, but obviously we need to organize it a little. For this, we can use a number of features, depending on what we want to do. There is the Log, which records more or less all your actions in the JMP graphical user interface; from there, you can really scrape code, and we will see an instance of that here. In addition, you can use the Scripting Index, which I highly recommend, and which holds quite a number of functions and examples,
and so really helps you to use them. We can use the Formula Editor, and we can also use Copy Table Script, for example, to really get things going.

Let's demonstrate that on our JMP data table. In this data table, we see that we have a number of things in here, and we now want to get them organized in a meaningful form. First of all, let's define what that format should look like. Let's open a new JMP data table, which will be, if you like, our target; this is the table we want to write into, so let's define what should be in it. We could say, for example, that the first thing we want is the assay. We then probably also want an assay code, whatever you want to call it. We want the sample name, because that is a field that should clearly be captured as well, since it is highly relevant. You might also want to include a start date or an end date, and so on, until you have included all the fields you want. At that stage, I would also say the columns should get the right attributes: since the data over here has particular types, we should standardize those attributes by selecting the appropriate data type for each column and making sure it is correct from the start.

Now we have that data table, but this alone does not help us much, because it is not yet reproducible. However, there is an option to record it. For example, you can use Copy Table Script (without data), which I will do for a second, and then insert that script here. If we look at it, we see that it creates a new data table with the name Untitled 4, which we can obviously change. It has zero rows so far, and it has all the columns we just created, from Assay to Start. We could give it a proper name, and I have actually already created a data table called "Data for page 1" that holds those first four attributes as well as all the others we want. Let's leverage that and continue with this one, since the other was really just a demonstration. Let's create it and run it, and you see it is just an empty data table, as it should be, with all the fields we want to fill from now on. What we also want to do is rename the content data table, which currently has a generic name; we call it something like Content, and we abbreviate the column LIMS Probennummer to LIMS Probe for simplicity.
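For reference, Copy Table Script (without data) produces something along these lines, which is what the script reuses (the column list is shortened here, and the names and types are only illustrative):

// Sketch only: an empty target table with the desired columns and attributes.
New Table( "Data for page 1",
	Add Rows( 0 ),
	New Column( "Assay", Character, "Nominal" ),
	New Column( "Assay Code", Character, "Nominal" ),
	New Column( "Sample Name", Character, "Nominal" ),
	New Column( "Start Date", Character, "Nominal" )   // dates could also be Numeric with a date format
);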
Now, what do we actually want to do? We want to work with the data a little, and I want to illustrate two examples of how we could do so. Let's look first at the column Anforderung. Within it, you see that there is both an AG and a WG, and we want to split them into two separate columns, so that later on one column captures the AG values and another captures the WG values, and so that the sample information is no longer spread across three rows but follows a proper target data format with everything in one row.

How could we do so? Let's insert a column and call it, say, AG Requirement, just to more or less translate the word Anforderung into English. What do we want to see? If there is an AG in the row, then let's capture the value after the AG in this column. If there is nothing there, then let's capture nothing. And if there is a WG, then let's also not capture anything, because that does not relate to AG. How could we do that? I would say let's build a formula; the Formula Editor is typically the best place to start. As I said, we want something conditional: if there is an AG in there, we want to see something; if there is no AG in there, we do not. The easiest way to do that is the If condition, which says: if there is something, do one thing, and otherwise do something else. So we write If(Contains(...)); Contains looks for a substring. We look in the column called Anforderung for the string AG, and we say that then something should happen, and otherwise something else should happen. We have just created a very simple If statement, and those two branches we still have to specify. Even at this stage, though, we can check that what we described makes sense: whenever there is an AG, like here or here in the Anforderung column, we see the "then" branch, which is good; otherwise we see just the "else" branch, which is also good. So let's refine it a little.

What do we want in the "then" branch? Ideally, we want what is in the Anforderung column, but with the AG part removed. To do that, you have many options. One of them is the so-called Regex, or regular expression, function, which says: take what is in this column, look for this AG part, replace it with nothing, and give me back the rest. If we do so and look at the whole expression, we see that if there is an AG with a minus, we get the minus returned, and if there is a "less than or equal to 50 minutes", we get the "less than or equal to 50 minutes". That sounds good.
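Putting those pieces together, the column formula sketched here (with the empty else branch that is completed in the next step) could look roughly like this in JSL; Anforderung is the original requirements column:

// Sketch only: keep the text after "AG" when the row is an AG requirement,
// otherwise return an empty string.
If( Contains( :Anforderung, "AG" ),
	Regex( :Anforderung, "AG", "", GLOBALREPLACE ),   // strip the "AG" part, keep the rest
	""                                                // no AG (e.g. a WG row): return nothing
)

In practice this expression would sit inside the Formula() of the new AG Requirement column.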
In the else branch, as we said, we just return an empty string, so nothing is returned there. And that actually works: you see in this column that only where you have the AG does it return the value after the AG. That looks perfect. Now I would use this logic, this idea, and include it in my script. We could again capture the code from the data table, and we would see the column down to its formula, but in principle we could also write it ourselves.

Before we continue, I have inserted here a little bit of additional logic: in case we read the last page, we saw that there was the legend, so in that case we say, let's remove the legend and it should be good. In addition, if there are any completely empty rows, I want to remove them as well.

To continue, let's look for where the samples are, and then capture the data of each sample. Let me quickly execute this part, and we see, okay, these are the rows where each sample starts. It simply looks for the rows where the End value is missing and, similarly, where the Anforderung is missing, because those are the two columns that tell us where only the sample header resides.

Now, iterating across each sample on its own, we look at where its data is. Taking, for example, this Lösezeit sample here as the second sample: we first look at where it starts, which in this case is row 4. We then combine the data of those two fields to get the full name again. Then we look at where the assay sits. The assay, in plain words, is just the first part of this whole string, before the forward slash. You can capture that, potentially also removing the "1" because that doesn't add anything. Similarly, you can look at the code, which is the second part here, and so on, and so forth.

Now, obviously, I agree this part of the code doesn't look all that simple, but if you read it carefully, it always has the same structure: you look at the part of the text that is in the respective line at the respective field, and potentially do a little bit of twisting, just as we did with the AG column. If you look at this AG column, you see again our regular expression; there is the AG part that we replace with nothing, and that's about it.

Once you have done that, you want to create one additional row in the target table where you can now enter all the data that we have captured. How would we do that?
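A minimal JSL sketch of that idea might look like this. The column names (End, Anforderung) and the assumption that the sample name is split across the first two columns come from the walkthrough above, so treat this as an illustration of the pattern rather than the presenter's exact code.

// Rows where both End and Anforderung are empty mark the start of a sample block
sampleStarts = LIMS Probe << Get Rows Where(
	Is Missing( :End ) & Is Missing( :Anforderung )
);

For( s = 1, s <= N Rows( sampleStarts ), s++,
	r = sampleStarts[s];
	// combine the two name fields into one full name, e.g. "Lösezeit 1 / 12345"
	fullName = Trim( Char( LIMS Probe[r, 1] ) ) || " " || Trim( Char( LIMS Probe[r, 2] ) );
	assay = Trim( Word( 1, fullName, "/" ) );   // text before the forward slash
	code  = Trim( Word( 2, fullName, "/" ) );   // text after the forward slash
	// ... capture start date, end date, and the AG/WG requirements for this block here
);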
We would right-click... sorry, left-click on the Rows menu, say Add Rows, and enter the number there. Interestingly, at this point you can also look into the log, and you'll see there is one statement that says Add Rows, and you can just copy that Add Rows part. It is really the same as what I have here. You see there is, in addition, this At End; that's the default value, so it doesn't matter whether I have it or not.

From there on, once I have added that row, I just write all the values that I captured previously, everything stored in those c variables, into the respective columns. In principle, if I remember correctly, this should execute in one go, and it should work as is. We see that the second row is the one that was correctly added; or, if I delete the rows for a second, it again executes as is. We can do that line by line, sample by sample, and if we do it across all the samples, that should be very good.

Now, let's return for a moment to the presentation and look at how we continue from there. At this stage we have really captured all the sample information, but we want to make it a little more handy. So far it is one massive block of code, but we can certainly break it down a little better.

That is what we do now: we make functions out of it. Functions have the nice property that they tell you what you have as an input and what you get as an output, which means you get that standardization of inputs and outputs anyway. In my eyes, functions are also much easier to debug and to maintain, there is no need for any copy-paste operation, and they really enforce good documentation of the code.

So let's do that. What could we do? As we saw previously, when we read our data we used this Open statement and that was it. Here we can now say: let's define a function that takes just a file name, reads the data, and returns the data. In principle that's not too different from what we did; it's just that it is now a function that takes one argument, in this case the file name (it could also be several), and returns something. If we execute it, we see that it creates exactly the data table we initially brought in.

Similarly, you can do the same for the transformation: a function that creates the new data structure and then organizes the data as we want it. If we initialize that as well, we see that it also works as is. This maps exactly onto what we did previously.
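Under the assumption that the work is split exactly as described, the pair of functions might look roughly like this in JSL; the names Read Sample Data and Transform Sample Data are just illustrative, and the body of the transform is heavily abbreviated.

// Read one source file from disk and return the resulting data table
Read Sample Data = Function( {filename},
	{Default Local},
	Open( filename )   // the last expression is the return value
);

// Reorganize a raw table into the target structure and return the result
Transform Sample Data = Function( {dtRaw},
	{Default Local},
	dtTarget = New Table( "Data for Page One", Add Rows( 0 ), New Column( "Assay", Character ) /* ... more columns ... */ );
	dtTarget << Add Rows( 1 );
	dtTarget[N Rows( dtTarget ), "Assay"] = "example value";   // write the captured values row by row
	dtTarget;
);

Calling Transform Sample Data( Read Sample Data( somePath ) ) would then reproduce the whole pipeline for one file.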
So it really means you have, if you like, only two functions left to call, which I believe is a really good way of organizing your code.

Now let's think about the last part, which in my eyes is really about UX, or user experience: how should I present this to the user? What I believe is that you can certainly play around with which data tables are visible at which stage. You see here a short snippet showing that you can create a data table as invisible from the beginning, or simply hide it after it has been created. Or you could say: if I store data somewhere, I can give users a link to that directory, so they don't have to go looking for the file but can just click the link and the directory opens. Or you can inform the user about the progress of the execution, so he or she knows, oh, I'm still on file 1, but already at page 8 out of 12. There are a number of options, but as I mentioned, I would only tackle that once the whole core code is implemented.

What we can state at this stage is that, yes, we now have all the code in place to run this collection of data from our PDF files into a data table. However, there is one issue, and that is bringing it to the user. The point is that I have one big JSL file, potentially with quite a lot of lines, and the user would have to interact with it at least to some degree. That is something I typically want to avoid, because it is not something users want to do, and I would also be a little afraid that they might break the code.

Instead, I would turn to a JMP add-in, which has the nice property of being only one file that just requires a one-click installation. It is also easily integrated into the JMP graphical user interface, so you don't have to interact with the script, and you have a lot of information at your fingertips. There is actually a lot of material on how to create an add-in. There is, for example, the Add-In Manager (I've added the link here), but there is also the option to do it in a manual or script-based way. I believe that, while it takes a little more effort, it is much better for your understanding, and I want to show you very quickly how that works.

For that, in the folder where I have stored all the JSL code so far, I have created the functional code, which holds all the code we have written, just in a slightly more organized form. You will recognize, for example, this Read Sample Data Page or this Transform Sample Data.
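Here is a small sketch of two of those UX ideas in JSL; the table name and the progress message are placeholders, so read it as an illustration of the pattern rather than the snippet on the slide.

// Keep intermediate tables out of the user's way
dtWork = New Table( "Intermediate Results", Invisible );   // created invisible from the start
dtWork << Show Window( 0 );                                // ...or hide an already-open table afterwards

// Tell the user where the run currently is
Caption( "Reading file 1 of 7, page 8 of 12 ..." );
// ... long-running work goes here ...
Caption( Remove );   // take the progress message down when finished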
Plus, I have added an additional file here which just holds an example of additional code. You could imagine that you might want to move some functions out of the functional code into this custom functions file, for example to keep the code more readable, and so on. Now, from those two files I want to create a JMP add-in, simply by going to File and New.

There you have the option to create the add-in. You now have to specify a name and an ID. I thought about those beforehand, so I won't worry too much here about what exactly they are called, but please do look at the naming suggestions for JMP add-ins. Then you look at which menu items you want. You add a command, you give it a name, let's say in this case Launch PDF Reader, and you have to specify whether you want to paste the JSL code directly in here or whether you have it in a file. In this case, let's use the file, as that is how we built it; it should be in here, and you include that one. You can also see that there are a number of additional options, like startup or exit scripts. At the end, you include any additional files you want to ship with it; in this case, let's assume that is our custom function code. Finally, you save it as, say, our example PDF data browser add-in.

Once that is stored, you can install it simply by double-clicking on it, and you will see that under Add-Ins you now have a Launch PDF Reader entry, which in this case really just reads this one specific PDF. So it is still quite fixed; there is quite a lot of information we could make more dynamic, for example the file selection, as I mentioned at the beginning. But it is at least one way to read the data.

Now, let's return very quickly to what we could do in addition, and take a short look at the JMP add-in itself. What is very nice about a JMP add-in is that it contains everything in one place. Let's look at our example PDF data browser and see where it was installed. If you look into that folder, you will see that it holds all the JSL code that we have, plus two additional files which define what the add-in is named and what its ID is, plus the integration into the graphical interface. If you read those two files a little carefully, you will see how you can easily adapt them to your purposes if needed.

The last part I want to show is what you could do once you had it fully functional, and that is what I want to show you now. We will install what I would call the final add-in.
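One pattern worth knowing here, shown as a hedged illustration: the script behind a menu command can simply Include() JSL files that ship inside the add-in, addressed through JMP's $ADDIN_HOME path variable. The add-in ID com.example.pdfdatabrowser and the file names below are hypothetical.

// Script attached to the "Launch PDF Reader" menu command (illustrative only)
Include( "$ADDIN_HOME(com.example.pdfdatabrowser)/custom_functions.jsl" );
Include( "$ADDIN_HOME(com.example.pdfdatabrowser)/functional_code.jsl" );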
This is a version of the add-in that, in addition, has a few of those user-friendly touches. You can see I have added it now under this GDC menu here.

There are a few more buttons to click than before. You can now say, oh, what do I actually want to read? In this case, I want to read those seven files; as mentioned, they are all copies of each other, just to have examples here. You see that there is a progress window here, which for demo purposes waits two seconds after each file; it reads each file, and you also see that the reading speed itself is actually quite impressive, I believe.

At the end, the data are being processed in the background. The user can see that in principle, but it doesn't happen in the foreground, so the user isn't bothered by it. Only once the data are processed do we get the final result here, and we see that this is the whole data table: it holds data from the first file all the way to the last one, file number six, and that is more or less the way it works.

Now, as I mentioned, getting this far is quite a lot of work, so we could still ask: what is next? Is there any next step? I would argue yes. The first one, in my eyes, is really to celebrate. Getting to this stage is not a trivial task; it is a true achievement. Be happy about it, and really congratulate yourself.

The second part is that you might want to do a little more around it. You might want to think about code versioning: how do you go back a version, or forward a version, when you have developed something further or are looking for a feature that doesn't work anymore? Code versioning, I believe, is quite helpful there. Similarly, if you think about collaborative development, Git might be the answer. If you think about unit testing, that is, how to make sure that code you have tested once still works after you have changed it a little, then unit testing is the answer. And if you want to deploy add-ins to a larger user base, you still have to think a bit about how that works; so far, I believe, there is no really good solution on the market.

The other part is that I would obviously love to hear feedback and any questions. You can reach me at this email address, and I am happy to hear any suggestions or criticism, whatever it is, so please feel free to reach out. I hope you could learn a bit today. I am really happy to share with you the script, the code, the presentation, everything that I showed you in the last 30-ish minutes. Thank you very much and have a wonderful afternoon.
A picture is said to be worth a thousand words, and the visuals that can be created in JMP Graph Builder can be considered fine works of art in their ability to convey compelling information to the viewer. This journal presentation features how to build popular and captivating advanced graph views using JMP Graph Builder. Based on the popular Pictures from the Gallery journals, the Gallery 8 presentation highlights new views and tricks available in the latest versions of JMP. We will feature several popular industry graph formats that you may not have known could be easily built within JMP. Views such as Integrated Tabular Graphs, Satellite Mapping, Formula Based Graphs, and more will be included that can help breathe new life into your graphs and reports!

Welcome, everybody. This is Pictures from the Gallery 8. My name is Scott Wise. I'm a senior systems engineer and data scientist. Every year, we get a chance to show you six or more views, really compelling or cool graphs that you probably didn't know you could generate through the JMP Graph Builder. I want to start you off with something a little more interactive. Hopefully, this is something that can help amaze your friends.

Our inspiration came when my daughter, Sammy, and I were having a lot of fun at the National Video Game Museum in Frisco, Texas. Besides me being able to relive my childhood of all the arcade games and the home video games, they did a good job of showing how the technology improved. A game that I particularly liked in the arcade was Atari's Battle Zone. It was the first arcade game that was successful in big numbers using 3D vector graphics. You felt like you were on a 3D planet, the battle zone, and it was also first-person perspective, because you felt like you were in the tank.

It had all these obstacles littered around, like these cubes and pyramids. You could hide behind them; you couldn't drive through them, but they were great shields. They protected you from the enemy fire, and you could duck out and take a shot.

This was actually big technology for the time. It took a lot more electronics and programming to do 3D rendering, and they had to answer a problem: can you recognize your orientation to a solid shape? If there's a wall depicted, are you behind the wall or in front of the wall?

Given that, Sammy and I came up with two challenges in Graph Builder. I'm going to show you two shapes. The first shape is a basic shape. Just using a custom map, I'm going to put that shape in the Graph Builder. Also in that Graph Builder pane, there are going to be two points, a point A and a point B, and I want to know if point A is inside or outside the shape. I want to do the same thing with point B, and I'm only going to give you three seconds. Let me bring up the data. Are you ready? Get these in your head or write them down. Three seconds.

All right. I imagine everybody didn't think that was too challenging. Let's take a look at the answers. Point A is in, point B is out. Now, this one is really easy to eyeball.
I can just tell that point B is outside the U shape. In fact, if I click into the shape or color by the shape in Graph Builder, you can readily see, okay, B is outside in the non-shaded area and A is inside. Well, that's all well and good, but what about the next shape?

This one's going to be a little more challenging for you. It's a spiral shape. Same instructions: I want to know if point A is inside or outside the shape, which in this case is a spiral, and I want to know if point B is inside or outside. Three seconds. Are you ready?

All right. Did you get that answer correct? Let's see what the official word is. Point A is in, point B is out. Now, this one was a little harder to eyeball. I didn't give you that much time to trace it with your finger. Just looking at it and making a guess, I wouldn't know which way to guess. If I can click into the Graph Builder and highlight the point, now I can see A is in and B is out.

But it's hard to see if I don't have that capability. This was the problem those video game designers ran into with the U shape, the spiral shape, or any shape they might encounter. So they developed a methodology called ray casting. Think about drawing a line out in any direction from the A point or the B point, and you just count the number of times it intersects, that is, crosses one of the shape's lines. If it crosses an odd number of times, the point is within, or in, the shape. If it crosses an even number of times, the point is outside, or out of, the shape.

Let's see how that works. Let's go back to our U shape. What I did was include a column here that lets me also include intervals; we will see how to do this a little later when we talk about forest plots. I'll just look at the finished product here, and you see I've drawn an interval of plus or minus 30 around B and A, enough to get through the shape. Go with B. I'm at B right now; pick a direction. I'll go right. I see one crossing, two crossings. That's two. It's even, which means it's outside the shape.

What about A? Same thing, let's go right. One, two, three. Three is odd, so it's inside the shape. Very cool. Will this help us with the harder one? I bet it will. Let's take a look. Here we go. Let's look at B. Go either direction; I'll go left this time. One, two, only two. It's even, so it means it's out.

What about A? A's right here. Okay, there's one line crossing, two line crossings, three line crossings going to the left. Three is odd, so it's in. That easy. Now you know something cool you can do, and you know what ray casting is. You've got something that can help amaze your friends.

Let's go have some more fun with the Graph Builder and see what we have in our Pictures from the Gallery 8. This year we've got formula-based graphs. We have tabular data that's been integrated with the graph. We have a flow parallel, a special type of parallel plot. We have forest plots that make use of those intervals.
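For anyone who would rather script the even-odd test than count crossings by eye, here is a small, generic JSL sketch of the ray-casting rule described above; it is not the table or script used in the demo.

// Even-odd ray casting: is the point (px, py) inside the polygon whose vertices
// are given, in order, by the coordinate vectors xs and ys?
Point In Polygon = Function( {px, py, xs, ys},
	{Default Local},
	n = N Rows( xs );
	crossings = 0;
	For( i = 1, i <= n, i++,
		j = If( i == n, 1, i + 1 );   // next vertex, wrapping back to the first
		// count the edge if it straddles the horizontal ray going right from the point
		If( (ys[i] > py) != (ys[j] > py),
			xint = xs[i] + (py - ys[i]) / (ys[j] - ys[i]) * (xs[j] - xs[i]);
			If( xint > px, crossings++ );
		);
	);
	Mod( crossings, 2 ) == 1;   // an odd number of crossings means the point is inside
);

// Example with a unit square: (0.5, 0.5) is inside, (1.5, 0.5) is outside
Show( Point In Polygon( 0.5, 0.5, [0, 1, 1, 0], [0, 0, 1, 1] ) );
Show( Point In Polygon( 1.5, 0.5, [0, 1, 1, 0], [0, 0, 1, 1] ) );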
We have Percent of Factors for doing comparisons, which is cool, and we can even do satellite drill-downs.

Let's dive right in. Now, I'm going to give you this journal. The journal has everything I'm showing you: it has pictures, it has instructions and helpful tips, even the step-by-step instructions on how to do this yourself in Graph Builder. Then I'm going to give you the raw data.

Now, with this first graph, one of the tips is that we need to include a formula, and all its elements, in the data table. What happened was that my father challenged me to help him buy a garden hose. He was doing some spraying, so he attached a little spray wand to the end of the hose, and he wanted good water flow. He knew there was a certain water pressure coming out of the tap, but he also knew he could buy shorter or longer hoses, and he could buy smaller or larger diameters of hose. He wanted to see which one worked the best.

To do this, all we had to do was find the formula and put that formula into JMP. It is right here under this water flow rate column. There are the constants that have been customized for hoses, and you can see how diameter, pressure, and length come into it. Then I have all of the components of that formula: different hose lengths, different diameters. It looks like I've got three different diameters, three different pressures, and four different lengths available.

Now when I go into the Graph Builder, I can just put the water flow on the Y. I want to see length on the X, maybe diameter on the overlay [inaudible 00:09:24], there we go. Then maybe I'll put water pressure on the Group X, because I know I can right-click there and show one level at a time.

Now I've got a smoother line, and that's not really telling the full story. Neither is just doing a line, because the line is just connecting the points I plotted; it's not really filling in the values between the points. But I have a formula here, so I should be able to use it. Yes, now I can go in and do a formula. You can right-click and change that line element to a formula, or you can click right up here on the highlighted icon. Now you can see I've got the formula-based line.

Now I could probably answer something about a 60-foot hose and where I'd expect the water flow to be at a certain pressure and a certain diameter. It worked out that the bigger diameter, 0.75, the three-quarter inch, was definitely the green line; it had the best water flow performance, and the shorter the hose, the better the water flow. That's because as water travels through a long hose, it rubs against the insides, creating friction, and that slows down your water flow.

Now, the other thing you might see me do from time to time is drag pictures in, and it's literally as easy as grabbing a picture and dragging that picture in there. You can right-click; there's an Image area under the right-click, and you can size and scale it.
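For reference, a saved Graph Builder script for this kind of view looks roughly like the sketch below. I wrote it from the description above rather than saving it from the demo, so the column names and element options are assumptions; the reliable way to get the real script is the red triangle's Save Script option.

// Rough sketch of the water-flow view: points plus the formula-based line
Graph Builder(
	Variables(
		X( :Length ),
		Y( :Water Flow ),
		Overlay( :Diameter ),
		Group X( :Water Pressure )
	),
	Elements(
		Points( X, Y, Legend( 1 ) ),
		Formula( X, Y, Legend( 2 ) )   // draws the curve from the column formula rather than connecting the points
	)
);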
I can fill the graph completely. I can go right back to it and add some transparency so the points pop on top, and now you get a better view.

That was our first graph that you probably didn't know you could do in Graph Builder. It's been there for a while, just hidden from many. Now, let's talk about something that came in with JMP 17: tabular data. I want to thank Joseph Reece for the inspiration and for the support in coming up with the best solution.

We're able to create not only reference lines but also tables below the graph, and they're actually integrated with the graph. This is something really special they added in JMP 17, this integrated tabular data. Let's bring up this data set. This data set is chemical production.

With this data set, I am going to pull up the Graph Builder. I am going to put the material vendor on the X, and I'm looking to see if there's a difference among those vendors in terms of the rate of reaction of my process when I use their products. I like box plots, so I'll switch to box plots. There's not a lot I can do to help my comparison here. Maybe I can go into this lower left-hand box plot element panel and turn on these confidence diamonds. Maybe I could color by the rate of reaction, so lower values show up in a different color than the higher ones; I'm not exactly sure it helps.

Now, Joseph recommended, "Hey, why don't you add back in the points?" I'm going to right-click in here and add points. But this time, instead of looking at all the points, let's look at the mean of the points, and let's look at the confidence interval. It lines up with the ends of my means diamond, which makes sense. Instead of an error bar, I can now do a band or a hatched band. That's cool, and it's giving me a better look. I get a little more confidence that Acme might be different from the vendor in green.

It would help, though, to have a reference line. All I have to do is go into the graph area, right-click, and add a caption box. These are all hidden under caption boxes. You're thinking, well, I've done caption boxes, and that's what I expected to happen: it just put the mean up there.

What we can do instead is change the location, and you can make it an axis reference line. There it goes, right there at the bottom. I'm going to go right back into this area and add a second caption box. I'll close up the other panels I don't want to see, and now I can add the mean, not on top of the other one, because my location can now be an axis table.

I can even add more summary statistics, like maybe the standard error. I can click on the number format and maybe do a fixed decimal with two places. Then I'll just say done.

Now I've got a really good view. All that's left is to clean up the legend. I'll go to legend settings; I don't need all these items, maybe just the one that shows the color gradient. I'll go to the position and drop it to the bottom.
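If you want this view in a script, the saved Graph Builder script uses the Caption Box element. The sketch below is pieced together from the steps above rather than saved from the demo, so the option strings ("Axis Reference Line", "Axis Table") are my best recollection of the JMP 17 names and should be treated as assumptions; if they differ in your version, Save Script from the red triangle will show the exact ones.

// Rough sketch: a box plot plus caption boxes used as a reference line and an axis table
Graph Builder(
	Variables( X( :Material Vendor ), Y( :Rate of Reaction ) ),
	Elements(
		Box Plot( X, Y, Legend( 1 ) ),
		Caption Box( X, Y, Legend( 2 ), Summary Statistic( "Mean" ),
			Location( "Axis Reference Line" ) ),   // option names assumed, see note above
		Caption Box( X, Y, Legend( 3 ), Summary Statistic( "Std Error" ),
			Location( "Axis Table" ) )
	)
);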
Then I'll right-click on it, go to the gradient, and switch it to run horizontally in this direction. That's what it will look like. I like that.

Now I've got my graph. Going back under the Graph Builder hotspot, I can go to Redo, go to Column Switcher, and now I can switch out the rate of reaction for a couple of the other continuous measures. Watch what happens when we go from rate of reaction to agitation. Everything recalculates: the axis table recalculates, your reference line recalculates, and your table of summary statistics at the bottom, lined up under the columns, recalculates. This is a wonderful thing you can add to your charts, put in dashboards, and share with each other, even on the cloud with things like JMP Live. This would be awesome. That is integrated tabular data.

Let's go to our next view, Flow Parallel Plots. I want to thank Jeb Campbell for the inspiration and also the solution for this. It might look like a regular parallel plot, but I want you to see the flows: inflows, let me get this right, inflows are coming in to a big bucket of budget, and then outflows go out, like the taxes here and the savings here, and it further gets split up.

How do I get these inflows and outflows into the same parallel plot? The first thing I'm going to do in my data is make sure that every branch of my data is laid out, starting from the back and going forward. I had a 12K outflow; that came from a 20K outflow 1. Outflow 2, savings, was 12K, and it was part of the 20K in outflow 1, along with Roth and savings, which together made up the 20K. That went into the total of 101K. On the other side, the inflow was part of the money I got from my job, which was 90K, but the amount for this branch is 12K. That way, the amounts will add up to the total, which will be 101K.

All of this is set up, and these are all categorical. If I go into Graph Builder, I'm just going to lay out all these categorical columns in the X. I'm going to size by the amount, color by the outflow, and now I'm going to select the parallel plot. You can right-click and select it, or you can select it from the icon. Now I have something that looks close to what I want, but this little bit in here doesn't look like it's resetting. I need this to reset, right? The inflows go into the big bucket, and then the outflows come out of the big bucket.

You do that by clicking on Combine Sets. When I do that, it gives me the right behavior, and I'll say done. Let's take a look at it. You can play with the ordering here to make it a little more pleasing to the eye. Now I can pick one of these outflows, like this auto car payment, and I can see the 8K comes from here. It was part of a bigger auto category, which was 11K, and I can see that the side-hustle money fed into that. I can see home as well; most of that was the home mortgage, and there was 2K here for the upkeep.
I can follow that one all the way back to see that it came from my job money, and that's where it came from out of this total budget. It's really cool. We can do input-output boxes, project budgets; there's a lot you could do with this.

All right, so that was a really cool view. Next, forest plots. As I mentioned before, intervals are a really great way to do a lot of comparisons. Here I'm looking at some mean comparisons among three of the four Cs of diamond buying: clarity, color, and cut. I have different levels of them, and I want to see if there's a difference in the price of the diamond. Say you're shopping for an engagement diamond. What I will do is pull up the data. This is some summarized data; again, I have color, clarity, and price, and you can see I've got different levels of those.

I have the number, the mean, the standard error, and the lower and upper 95% confidence interval around the mean. All of that has just been saved into a JMP table. I will go to the Graph Builder. I will put my three Cs, three of the four Cs, on the X, and I'm going to put my level right to the right of it, so now they're lined up. That looks pretty nice.

I will put the mean on the Y, and I will color by the level. Now, how do I get the intervals? Well, there's an interval box. If you only have one side, you can drop it up into a corner; you have to play around with it a little bit. But if you have both sides of the interval, you can grab them both and put them right there in the interval box. I say done, and it did a nice job. Now it's really easy to see what groups together and what might be statistically different, based on the 95% confidence interval, compared to another level.

I'm going to make it easier on my eyes. I'm going to right-click and go to the axis settings under that X. I might show a grid, which gives me a little outline, and I'm going to reverse the order. Now when I do this, I can see that, okay, the very fine clarity, that almost flawless clarity, and the very, very slight imperfections are different from the others, but different in a bad way: they're actually cheaper prices. That doesn't make sense at first.

For color, the clearest diamonds are category D, so K would be more cloudy. I can see there is a group which is different from some of the others, but some of these less clear diamonds are more expensive, so maybe color is not the right thing to look at. But I can see there is a logical order for cut. An ideal diamond should be cut better and be worth more money than the ones that are not cut very well, and you can see that; you can see which ones are different. This is a nice way of doing means comparisons and interval comparisons: do the intervals contain a certain reference amount, do they contain zero? There are a lot of ways to use this, but you can do forest plots now in Graph Builder.

We're cooking right along; let me get to the next one. This is Percent of Factors.
If you have ranked or scale data, this is a great way of doing comparisons on a zero to 100% scale. My family likes to visit all the coffee shops in Austin. Here's some old rating and sentiment data that came from Yelp. You've got ratings here, and sometimes there's sentiment in here. It's a lot of fun. These are all coffee houses that are still open in Austin, we go to some of these, and it's easy to set up now.

Go to Graph Builder and just put your levels, the coffee shops, on the graph; you don't have to put anything on the X. Put the ratings on the overlay. I'm going to ask for bars, and instead of side by side, I'll go stacked. Okay, am I done? No, it's showing a count going from zero to 250; it's not showing me zero to 100%. How do I do that? Really easily: change your summary statistic to Percent of Factors.

Change that, and it fills it in. Now you can see it, really nice and really interactive. I also had a low/high rating within my data, and I could see the high ratings were the ones where it looked like people gave it 4 or 5 stars. What's really nice is that now I can come over here and go to Order By. I can order by another column; it doesn't have to be in the graph, it just has to be in my data table. I can order by that high rating, say go, and now I can see that, wow, this one here, if I'm saying the name correctly, was the highest rated. Flightpath Coffee is one that my family really likes; this one right here got a lot of positive ratings.

I can even play around with filtering by the vibes. I put in a vibe sentiment: the review mentioned the word vibe and it was positive. I selected it, and you can see that Flightpath came out pretty well there too. Good music, good place to study, good location, all just the right vibes, the right crowd, a nice place to hang out.

We've got time for our last picture from the gallery. We're going to look at some satellite mapping. Really, all the mapping changed in JMP 17. You can drill down, I think, in even better detail now, because we switched to the Mapbox-type maps.

Remember, to do a map in JMP Graph Builder, you just need positional data, here latitude and longitude. I'm going to look at some of these places I've stayed at, different places in California, and I'm going to focus on this Delta King. You right-click in the graph, go to Background Map, pick Street Map Service, and here are all your options.
Instead of using this plus or minus up here, I find I like using the magnifier tool in JMP. If I click on this one, I can see this Delta King. Oh, my goodness, what is that? That's not a hotel. Now I can go and switch my background map away from the streets and give it a satellite view.

Now my satellite view shows me, wow, that's a ship. The Delta King is one of the old paddle-wheeled steamboats that used to run between Sacramento and San Francisco, and it's still there. It is now a hotel you can stay at. Thanks to my friend and coworker Bonnie Rigo, who gave me a chance to experience staying at the Delta King once. We had a good stay there in a very unique hotel.

Okay, so there are other really cool views. I'll let you explore those, including the Luxor in Las Vegas and the Fontainebleau in Miami Beach. I've got some good ones in here, so you can go play with this data.

But what I'd like to do is wrap up here. I did include a bonus picture from the gallery. This is a combination Paynter chart, a combination of line charts, Pareto charts, and bar charts that can be ordered to show increasing or decreasing performance of defect reduction. This was used at Ford in the 8D program and is very popular with folks doing defect reduction. If you want to learn how to do the Paynter chart, you can do it in JMP 17 in the Pareto platform, but in any version of JMP you can get there just from the Graph Builder.

All right, where to learn more? There are lots of other Pictures from the Gallery journals; over the years we've gone and done more views. Go look at all of the galleries. We're on our eighth, so there are also one through seven to look through. You can also take a look at the blogs on the JMP Community; a lot of them have been done on these graphs or on other really cool views.

There are other presentations, tutorials, and training that I recommend, and you will have these in the journal. Xan Gregg, the father of the Graph Builder, is always good to learn from, as well as our training resources in the new Learn JMP area in the JMP Community, where we have formal training as well as Mastering JMP training on things like Graph Builder and dashboards.

If you want to suggest views, please do go to the community and put them in the JMP wish list. We get some of our ideas from you saying, "Wouldn't it be great if JMP Graph Builder could do this and look like this? That would be so helpful." A lot of these will make it into releases of JMP.

All right, so we are done with our presentation. I hope you enjoyed Pictures from the Gallery 8. I want you to go out and enjoy the rest of the presentations, but for sure, go have fun graphing and exploring your data in JMP Graph Builder. Thank you.
Often as we are trying to gain insights from our data, understanding that two variables are related is not enough. We need to dig deeper and ask questions like: under what circumstances are they related? For whom are they related, why are they related, and how? Moderation (i.e., interactions), mediation, and moderated mediation models allow us to answer these types of questions. These models are popular and important but cumbersome to fit. Furthermore, visualizations essential for understanding interactions are difficult to create from scratch. This presentation will describe the Moderation and Mediation Add-In for JMP Pro, which enables easy specification, fitting, and visual probing of interactions in three popular models: moderation, first-stage moderated mediation, and second-stage moderated mediation. With minimal user input, the add-in automatically specifies and estimates the appropriate model. Then, the results are processed and packaged into ready-to-publish output. An interactive Johnson-Neyman plot, as well as a simple slopes plot, is created. We will provide an in-depth demonstration of these features using an example from psychology. Academics and data analysts across the social, behavioral, educational, and life sciences will benefit from this novel functionality.

Blog post describing the Moderation and Mediation Add-In: https://community.jmp.com/t5/JMPer-Cable/Who-what-why-and-how-Tools-for-modeling-and-visualizing/ba-p/527173

All right. Hi, everybody. My name is Haley Yaremych. I worked at JMP this past summer as a statistical testing intern, and I'll be returning this coming summer in the same role. This past summer, I built an add-in that helps users fit and visualize interactions, and I'm excited to talk to you all about that today.

Okay, to set up the example that I'm going to be using throughout the talk, let's take a look at this clip from a website called ScienceofPeople.com. The clip reads: Do you know the impact of your work? When we don't have our why at the front of our mind, it can be hard to feel motivated and excited about what we're doing. When we get busy or overwhelmed, the why just seems to slip away. This clip tells us that when we feel that our work has meaning, this tends to lead to greater job satisfaction. With a structural equation modeling path diagram, we would display that cause-and-effect relationship like this.

But if we're too overwhelmed at work, this relationship might weaken. The meaningfulness of our work should be related to job satisfaction, but only if overwhelm is low. Conceptually, we could represent that like this. In the social sciences, this is what we call moderation, because overwhelm is going to moderate that relationship between meaningfulness and job satisfaction, but more widely this is known as an interaction. When we find a significant interaction, we need to visualize it in order to understand what's going on, and to do that, we often need to look at simple slopes.

A simple slope describes the relationship between the predictor and the outcome at a particular value of the moderator.
In this plot, we're taking a look at the relationship between meaningfulness and job satisfaction at three different values of overwhelm. The red line is that relationship when overwhelm is low, the blue line is when overwhelm is at its mean, and the purple line is when overwhelm is high. Just as we would expect, the relationship between meaningfulness and job satisfaction is strongest when overwhelm is low, and when overwhelm is high, that relationship weakens.

Being able to visualize simple slopes is a really essential part of fitting and understanding models that involve interactions. But in order to publish these results, we also often need details about the values of those simple slopes and their statistical significance at different values of the moderator, just like I've shown here for high and low values of overwhelm.

We can also take things a step further, beyond simple moderation. This clip also mentions that meaningfulness might result in greater job satisfaction because it tends to lead to greater motivation at work. There might be a cause-and-effect pathway here, and this is what we would call mediation. But again, overwhelm needs to be low in order for these benefits to play out. We might expect that overwhelm needs to be low in order for this first effect to be present; we would call this first-stage moderated mediation. Or we might think that low overwhelm is more important for the second effect to be present; we would call this second-stage moderated mediation.

In these moderated mediation models, if we find a significant interaction, we still need to probe it and assess significance at different values of the moderator. But this time, we're interested in plotting and testing this entire effect of meaningfulness on job satisfaction through motivation. We call this the indirect effect. We're going to see an example of this in our demo in just a few minutes.

These types of questions come up all the time, not only in social science research, but also in other areas. Given their popularity, it's no surprise that we've had a lot of requests from JMP users to incorporate quick and easy ways of fitting and visualizing these types of models. A lot of these user requests mention moderation, mediation, and simple slopes. The Johnson-Neyman plot is an extension of the simple slopes plot that I showed earlier, and I'll get to that in a few minutes. But basically, these are all different jargony ways of asking for the same functionality.

You'll notice that a lot of these requests mention the PROCESS macro. The PROCESS macro is a very widely used tool for fitting these types of models. It provides easy model fitting and a lot of numeric output about these models, but right now it doesn't provide visualizations. The burden would be on the user to take this numeric output and create a graph with it elsewhere, and that can be very cumbersome and error prone.
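For readers who want the algebra behind a simple slope, the standard form for a single moderator (my summary, not a slide from the talk) is

$$Y = b_0 + b_1 X + b_2 W + b_3 XW + \varepsilon, \qquad \left.\frac{\partial \hat{Y}}{\partial X}\right|_{W=w} = b_1 + b_3 w,$$

so the simple slope of X on Y at a chosen moderator value w is simply b_1 + b_3 w, which is what the red, blue, and purple lines show at low, mean, and high values of w.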
This is a really important drawback, because these graphs are essential for understanding interactions. Just to give you a sense of how difficult it is for the user to create these graphs on their own, these are the formulas that underlie the two plots you're about to see in the demo. Imagine having to code these up yourself; it would be really tough. With this add-in, we wanted to draw upon the strengths of the PROCESS macro that make it so popular, namely easy and automated fitting of these models, but we also added features that cannot be found elsewhere and that really capitalize on the unique strengths of JMP: engaging visualizations that would otherwise be really tough for users to make from scratch.

Here's a quick summary of the features of our add-in, as well as what users are currently up against if they want to fit these models with the structural equation modeling platform in JMP but without the add-in. We've automated all the details of model fitting, whereas without the add-in there's a lot of data preprocessing that's often required, and it can be difficult to specify the correct structural equation model. We also provide a lot of numeric output, but we additionally sift through that output and do the further calculations that are needed to really distill it. Then, as I mentioned, all visualizations are now automated, so users can avoid those complex formulas.

Now I'm going to jump over to a demo of the second-stage moderated mediation model with the add-in. Here's the model that we're going to fit. Within JMP, I'm going to open up our... oops, I moved my bar here. Okay, I'm going to open up our Moderation and Mediation add-in, and I'm going to pick the second-stage moderated mediation model.

Within the user input window, the first thing we see is these figures. Like I mentioned, a difficult aspect of fitting these types of models can be understanding how to make the jump from what we think is going on conceptually to the statistical model that needs to be fit. The goal of these figures is to take that burden away from the user, and the only input we need from the user is to select a variable for each role. I'm going to do that here.

Then, optionally, any number of covariates can be added. By default, any variables involved in an interaction term are going to be mean centered, but this can be turned off, or they can be centered around a user-specified value. Then, those plots that I mentioned are only going to be shown in the output if the interaction is significant at alpha 0.05, but this can also be turned off.

When I click OK and pull up our output, the first thing we see is the output from the structural equation modeling platform. But again, this can be a lot to sift through.
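As a point of reference for the demo (my notation, not the presenter's slide), the second-stage moderated mediation model and its conditional indirect effect are usually written as

$$M = a_0 + a_1 X + e_M, \qquad Y = c_0 + c' X + b_1 M + b_2 W + b_3 MW + e_Y,$$
$$\omega(w) = a_1 (b_1 + b_3 w),$$

where \omega(w) is the indirect effect of X on Y through M at moderator value w. In the demo, X is meaningfulness, M is motivation, Y is job satisfaction, and W is overwhelm.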
The goal of this moderation detail section is to pull out all the most important parts of the SEM output, to do any necessary computations with that output, and then to package everything into sentences that can be easily understood and copied and pasted into a publication or report. You'll see here that we get some details about the conditional indirect effects. Again, these are very similar to simple slopes, but now we're calling them indirect because the effect of meaningfulness on job satisfaction is traveling through motivation.

The next section here is going to be our Johnson-Neyman plot. This plot really is the state-of-the-art method for probing an interaction, because it's going to provide a lot more detail than the simple slopes plot that I showed earlier. Here on the X axis we have the moderator, so overwhelm is on the X axis, and the Y axis is going to be the effect of meaningfulness on job satisfaction through motivation. That indirect effect is what's changing as a function of overwhelm, and we're looking at that effect at each possible value of overwhelm.

We can see that the effect is weakening as overwhelm increases. But this plot can sometimes be kind of hard for people to wrap their heads around, mainly because we have an effect on the Y axis. As in this example, although most of these effects are positive, they're just becoming less positive as overwhelm increases, and this can sometimes be a little confusing. To make things even clearer, we added graphlets to this plot.

When I hover over this line, I'm going to see a graphlet that shows me the effect of meaningfulness on job satisfaction at this particular value of overwhelm. We can see that when overwhelm is low, that effect is strong and positive. Then, as overwhelm increases, the effect weakens, until eventually, when overwhelm is really high, the effect is basically flat. A really nice advantage of JMP is that we were able to add these graphlets and really aid user understanding here.

Another nice aspect of this Johnson-Neyman approach is that we can calculate these significance boundaries. This boundary is the exact value of overwhelm where the effect goes from being statistically significant, which is in blue, to non-significant, which is in red.

Typically, there are going to be two significance boundaries. You can see up here that they were both calculated, but only one appears in the plot. This is because the plot only shows values of the moderator that were observed in the data set; we did this for extrapolation control. Here we can say that as long as overwhelm is less than about 1.25, there's going to be a significant effect of meaningfulness on job satisfaction through motivation.

Our final section of output is going to be a conditional indirect effects plot. This is a lot like the simple slopes plot that I showed earlier. Basically, we're taking a few of those graphlets and putting them into a static plot.
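For completeness, here is the usual algebra behind those significance boundaries, written for a generic conditional effect \theta(w) = b_1 + b_3 w; the add-in applies the analogous calculation to the conditional indirect effect, and, as the presenter notes shortly, the bands are currently computed analytically rather than by bootstrapping. A boundary is a value of w satisfying

$$\frac{b_1 + b_3 w}{\sqrt{\operatorname{Var}(b_1) + 2w\,\operatorname{Cov}(b_1, b_3) + w^2\,\operatorname{Var}(b_3)}} = \pm\, t_{\text{crit}},$$

which rearranges to a quadratic in w; its (at most two) real roots are the Johnson-Neyman boundaries, the moderator values where the confidence band for the conditional effect crosses zero.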
Same idea here. We end up with the same takeaways, but this specific type of graph is often needed for publication. There are some features that aren't included in the add-in right now that we would love to add in the future. The first is bootstrapping. Right now these confidence bands are calculated mathematically, but finding them with bootstrapping is sometimes preferable, so we would love to be able to add that in the future. We would also love to add more types of models. The PROCESS macro that I mentioned earlier offers dozens and dozens of model options. Here we only have three, but we did choose the three most popular types of these models, and we'd love to be able to add more in the future. All right. With that, I'm going to go ahead and wrap up. Thank you so much for your attention. You can feel free to email me with questions at this address. I've also included a link to the JMP Community blog post that provides a lot more detail than what I had time to get into today. It goes through basic moderation as the running example, which I think will be really applicable to anybody in any field who is interested in testing and probing interactions with these tools. Again, thank you for your attention.
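As a hint at what the bootstrapping mentioned above might look like, here is a minimal Python sketch of a percentile bootstrap for a conditional indirect effect under the same kind of first-stage moderated mediation model (X to M moderated by W, then M to Y). The array names x, w, m, y and the model layout are placeholder assumptions, not the add-in's implementation.

```python
import numpy as np

def boot_conditional_indirect(x, w, m, y, w_value, n_boot=5000, seed=1):
    """Percentile bootstrap CI for the indirect effect of x on y through m,
    with the x -> m path moderated by w, evaluated at w = w_value.
    Inputs are 1-D NumPy arrays of equal length."""
    rng = np.random.default_rng(seed)
    n = len(y)
    est = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)                 # resample rows with replacement
        xb, wb, mb, yb = x[idx], w[idx], m[idx], y[idx]
        # a-path regression: m ~ 1 + x + w + x*w
        Xa = np.column_stack([np.ones(n), xb, wb, xb * wb])
        a = np.linalg.lstsq(Xa, mb, rcond=None)[0]
        # b-path regression: y ~ 1 + m + x + w (x and w kept as covariates)
        Xb = np.column_stack([np.ones(n), mb, xb, wb])
        bcoef = np.linalg.lstsq(Xb, yb, rcond=None)[0]
        est[i] = (a[1] + a[3] * w_value) * bcoef[1]  # (a1 + a3*w) * b
    return np.percentile(est, [2.5, 97.5])
```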
JMP has a wealth of design of experiments (DOE) options from which to choose. While this array is incredibly powerful, it also has the potential to be a bit intimidating to those who are new to this area. What category of design should I choose from the many possibilities? How do I know which one is best for my experimental objectives? This talk provides some ideas for how to strategically tackle Step 0 of the process of constructing the right design by considering the following questions: What are the goals of the experiment? What do we already know about the factors, responses, and their relationship? What are the constraints under which we need to operate? Once these questions are answered, we can match our priorities with one of the many excellent choices available in the JMP DOE platform.

Hi. I'm here to talk today about the crucial Step 0 of design of experiments. Really, the idea is to take full advantage of the wealth of different tools that are under the DOE platform in JMP. I'll walk through what we should be thinking about in those early stages of an experiment. If you look at the DOE platform listing, what you'll see is that there are a lot of different choices. Within each choice, there are many more choices, and within some of those, there are nested possibilities. If you're an expert in design of experiments, this wealth of possibilities really feels like such a wonderful set of tools. I love all of the options that are available in JMP that allow me to create the design that I really want for a particular experiment. But if you're just getting started, then I think this set of possibilities can feel a little bit intimidating and sometimes a bit overwhelming. It may be a little bit like going to a new kind of restaurant that you've never been to before. Someone who's a seasoned visitor to those kinds of restaurants loves all the possibilities and the wealth of options on a big menu. But if you're there for the first time, it would be nice if someone guided you to the right set of choices so that you could make a good decision for that first visit and have it be successful. Here's what I'm planning on talking about today. First, I think the key to a good experimental outcome is to really have a clear sense of what the goal of the experiment is. I'll talk through some common goals for experiments that really help us hone in on what we're trying to accomplish and what will indicate success for that experiment. Then I'll do a quick walk-through of some of the more common design of experiments choices in JMP, and then I'll return to how we interact with the dialog boxes we get once we've chosen a design: what factors to choose, the responses, and the relationship between the inputs and the outputs. That's where we're headed through all of this, and I will say that the first and third steps really need a tremendous amount of subject matter expertise. If you're going to be successful designing an experiment, you really need to know as much as possible about the framework under which you're doing that design.
We want to incorporate subject matter expertise wherever possible to make sure that we're setting up the experiment to the best of our ability. What are we trying to do? I've listed here six common experimental objectives. I think that gives you a checklist, if you like, of things that you might be thinking of accomplishing with your experiment. We might start with a pilot study, where we're just interested in making sure that we're going to get data of sufficient quality for the experiment and for answering the questions we want answered. We might be interested in exploration or screening: we have a long list of factors, and we want to figure out which ones seem to make a difference for our responses of interest and which ones don't seem particularly important. We also might want to do some modeling, actually formalizing the relationship that we're seeing between inputs and responses and capturing it in a functional form. Sometimes we don't get the level of precision that we need, and so we need to do model refinement, which might be a second experiment. Then once we have a model, we want to use it to actually optimize: how do we get our system to perform to the best of its capability for our needs? Then lastly, there's a confirmation experiment, where we make the transition from the controlled design of experiments environment in which we often do our preliminary data collection to production, and make sure we can translate what we've seen in that first experiment into a production setting. You can see from this progression that we may actually have a series of small experiments that we want to connect. We may start off with a pilot study to get the data quality right, then figure out which factors are important, then model those, then use that model to optimize, and then lastly translate those results into the final implementation in production. We can think of this sequentially, or for an individual experiment, just tackle one of these objectives. Now that we have some framework for what the goals of the experiment are and how to think about them, we'll transition to looking at some of the common choices in JMP and how they connect with different goals. I'll open up the DOE tab in JMP, and you can see that we've got the list of possibilities here, with the nested options tucked underneath some of the main menu items. The talk is only a half hour, so I won't be able to cover all of the tabs, but I've given a brief description of some of the tabs that I won't have time to talk about. Design Diagnostics is all about having a design, or maybe several designs, and comparing and understanding their performance. Sample Size Explorer is all about how big the experiment should be, with some tools to evaluate that. Consumer Studies and Reliability Designs are really rather specialized; I'm setting those aside for you to do a little research on your own.
In Consumer Studies, we're usually asking questions of consumers about what their priorities are and what features they like. That tends to be a comparison between two options and how they value those choices. Reliability is all about how long our product will last. That's a little different from the things I'll talk about in the rest of the talk. I'll start off with some of the Classical Designs, the general designs that have been developed over the years, and then I'll finish with some of the JMP-specific tools that are much more flexible and adaptable to a broader range of situations. I'll start with that bottom portion of the tab. Here we are in JMP in the DOE tab, and I'm going to start with Classical. You'll see that I'm tackling this in a little different order than the list is presented by JMP. I think those are presented by JMP in order of popularity, and I'm choosing to tackle them more from the principles of how they were developed. In Classical Designs, a Full Factorial design looks at all combinations of all factors at all levels. That works nicely if we have a smallish number of factors, but it can get a little out of control if we have a large number of factors; it's exploring the entire set of possibilities very extensively. The next one I'll talk about is a Two-Level Screening design. Essentially, it chooses a subset of the two-level factorial possibilities, a strategic subset that allows us to explore the space while keeping the design size more manageable. You'll notice that those first two possibilities I've shown at two levels, and that's typical for screening designs; usually, we just want a simple picture of what's happening between the inputs and the responses. When we want to start modeling, a Response Surface Design typically allows for exploring curvature. When we're modeling, three levels, or sometimes more, can be a good way to understand curvature and also understand interactions between the factors and how they impact the response. All right, that's three of the items under the Classical tab. The next one is a Mixture Design. In all the other possibilities, we can typically vary the individual factors separately from each other. But in a Mixture Design, where we're talking about the composition or the proportions of the ingredients, they're interdependent. If I increase the amount of one ingredient, it probably reduces the proportion of the other ingredients in that overall mixture. It's a bit of a specialized one for when we're putting ingredients together into an overall mixture. Taguchi Arrays I've listed here as a kind of optimization, and the optimization that they're interested in is making our process robust. Typically when we're in a production environment, we might have noise factors. These are factors that we can control in our experiment, but when we get to production, we're not able to control them. Then we have a set of factors that we can control both in the experiment and in production.
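To make the contrast between a full factorial and a two-level screening design concrete, here is a short textbook-style Python sketch that enumerates a full 2^3 design and then builds a half-fraction of a 2^4 design from the generator D = ABC. This only illustrates the underlying idea; it is not how JMP constructs its designs.

```python
from itertools import product
import numpy as np

# Full factorial: every combination of every factor at every level (here 2^3 = 8 runs).
full_2x2x2 = np.array(list(product([-1, 1], repeat=3)))

# Two-level screening via a half-fraction: keep the full 2^3 design for A, B, C
# and set the fourth factor from the generator D = A*B*C, giving 8 of the 16 runs of 2^4.
half_fraction_2_4 = np.column_stack([full_2x2x2, full_2x2x2.prod(axis=1)])

print(full_2x2x2.shape)         # (8, 3) -- all combinations of three two-level factors
print(half_fraction_2_4.shape)  # (8, 4) -- a strategic subset for four factors
```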
The goal of Taguchi Arrays is to look for a combination of the controllable factors that gets us nice, stable, predictable performance across the range of the noise factors. You can see C1 here has a pretty horizontal line, which means it doesn't matter which level of the noise factor we're at; we'll get a pretty consistent response. Those are the classical options. The next of the items on this JMP design tab that I'll talk about are Definitive Screening Designs. These are specialized designs that were developed at JMP, and they are a blend of an exploration or screening design, with a focus on a lot of two-level factor settings, and modeling. You can see with the blue dots that we have some third levels, a middle value for the factors, that allows us to get some curvature estimated as well. It's a nice compact design that's primarily about exploration and screening, but it does give us an all-in-one chance to do some modeling as well. That's very popular in a lot of different design scenarios. The next tab is Special Purpose, and you can see there's quite a long list of possibilities there; I'll hit some of the more popular ones that show up in a lot of specialized situations. A Covering Array is often used when we're trying to test software. A lot of times, what causes problems in software is the combinations of factors. This is a pretty small design that's typical for Covering Arrays: 13 runs, and we're trying to understand things about 10 different factors. What's nice about these Covering Arrays is that they give us a way to see all possibilities of, in this case, three different factors. If I take two levels of each factor, a zero and a one, there are eight different combinations for how I can combine those three factors: all zeros, all ones, and then mixtures of zeros and ones. I've highlighted those with eight different underlines. What's really nice about these Covering Arrays is that whichever three factors I choose, I will be able to find all eight of those combinations. There are 10 choose 3 different combinations of three factors that I might be interested in, and all of them have all of those possibilities represented. That's a very small design that allows us not so much estimation, but a check for problems that we might encounter, particularly in software. Next, a very important category: Space Filling Designs. Compared to the other options that I've talked about, which are model-based, this one just says, maybe I don't know what to expect in my input space. Let me give even coverage throughout the space that I've declared and just see what happens. You can see that I have many more levels of each of the factors. There are a lot of specialized choices in here, but they all have this same feel of nice, even coverage throughout the input space. These are often used in computer experiments, or in physical experiments where we're just not sure what the response will look like.
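The coverage property described above, that every set of three factors shows all eight of their level combinations, is easy to verify programmatically. Below is a minimal Python sketch of such a check for a generic two-level array; it does not reproduce JMP's 13-run, 10-factor example, only the idea of checking it.

```python
from itertools import combinations, product
import numpy as np

def covers_strength_t(design, t=3):
    """True if every choice of t columns of a two-level (0/1) design
    contains all 2^t combinations of levels at least once."""
    design = np.asarray(design)
    needed = set(product([0, 1], repeat=t))
    for cols in combinations(range(design.shape[1]), t):
        seen = {tuple(row) for row in design[:, cols]}
        if not needed <= seen:
            return False
    return True
```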
I'll talk a little more about that when we get to the decision-making portion in Step 3 of the talk. Next is the MSA Design, or Measurement System Analysis, and this is typically associated with the pilot study. Before I dive in and really start to model things or do some screening, it's helpful to understand some basics about the process and the quality of the data that I'm getting. Here, I can divide the variability that I'm seeing in the responses and attribute it to the operator, the measurement device or gage, and the parts themselves, and understand the breakdown of what's contributing to what I'm seeing. That's very helpful before I launch into a more detailed study. Finally, Group Orthogonal Supersaturated Designs are really compact designs. In this example, we have six runs, and we're trying to understand what's happening with seven different factors. That may seem a little magical, but it's a very aggressive screening tool that allows us to understand what's happening with a lot of factors in a very small experiment. It's important with these designs not to have a lot of active factors. If all seven factors are doing something and I only have six runs, I'll end up quite confused at the end. But if I think two or three of them may be active, this may be a very efficient way to explore what's going on without spending too many resources. Those are the standard ones that I've talked through a little bit. Now I'm going to finish with the wonderful tools in JMP that are more general and more flexible for different scenarios. Custom Design, I think, is just an amazing tool for its flexibility. What's really nice in Custom Design is that I have a wealth of different possibilities for the kinds of factors I can include: continuous factors, discrete numeric ones, and also categorical factors. I have a lot of different choices, so I can put together the pieces, and if I'm not sure what the design should look like, that bottom portion of the list gives JMP some control to help guide me to a good choice. On the next page, I have the option of whether I'm just interested in main effects, whether I want to add some two-factor interactions, and whether I want to build a response surface model, so more the modeling goal of the experiment. This is an easy way to build a design. I have flexibility to specify whatever design size I feel would be helpful and is within my budget, and the expertise of the JMP design team is going to guide me to a sensible choice. This is a great way to go if you're not sure how to proceed, while still making some key decisions about what the goal of the experiment should look like. Next, the Augment tab. If you think back to the experimental objectives I talked about, you see that there's this connection between the stages. Maybe I've done some exploring or screening, and then I'd like to transition to modeling.
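To connect the model choice (main effects, two-factor interactions, response surface) to what an optimal-design algorithm actually evaluates, here is a small Python sketch that builds a model matrix for main effects plus two-factor interactions and scores a candidate design with the D-criterion, log det(X'X). JMP's Custom Design has its own search machinery (coordinate exchange and related methods), so this only illustrates the scoring idea; the function names are my own.

```python
from itertools import combinations
import numpy as np

def model_matrix_me_2fi(design):
    """Intercept + main effects + all two-factor interactions,
    for a design given in coded (-1..1) units."""
    design = np.asarray(design, dtype=float)
    cols = [np.ones(len(design))] + [design[:, j] for j in range(design.shape[1])]
    cols += [design[:, i] * design[:, j]
             for i, j in combinations(range(design.shape[1]), 2)]
    return np.column_stack(cols)

def d_criterion(design):
    """log det(X'X); larger values mean more precise coefficient estimates."""
    X = model_matrix_me_2fi(design)
    sign, logdet = np.linalg.slogdet(X.T @ X)
    return logdet if sign > 0 else -np.inf
```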
Well, this allows me to take an experiment that I've already run and collected data for, connect it to the Augment Design platform, assign the roles of what's a response and what's a factor, and then add in some additional runs. There are some specialized options here, but if I choose the Augment portion, that allows me to specify a new set of factors, perhaps a subset of what I have or an additional factor, and then also what model I would now like to design for. This is a flexible tool for connecting several sets of data together. Lastly, Easy DOE is a great way to get started for your very first experiment. It allows you to build sequentially, and it guides you through the seven different steps of the entire experiment. It allows us to define and design, so that's figuring out what the factors are, what the levels are, and their general nature; then we can select what kind of model makes the most sense for what we're trying to accomplish; then progress all the way to actually running the experiment, entering the data, doing the analysis, and then generating results. This is a wonderful progression that walks you all the way from what you're trying to do to having some final results to look at. What I will say is that this is designed for a model-based approach. You'll see that all of these look like they're going to choose a polynomial form of the model, and that needs to make sense as a starting point. But if it does make sense, and it does in a lot of situations, then this is a wonderful option. Just to finish things up here: now that I have a goal and I know a particular choice that I want to use in JMP, what are some of the other key questions before I actually generate that design? A whole category is about the factors. We need to use our subject matter expertise to figure out which factors we should be looking at. If we have too long a laundry list of factors, then the experiment necessarily needs to be quite large in order to understand all of them, and that's going to have an impact on how expensive our experiment will be. If we have too few factors, then we run the risk of missing something important. What type are they going to be? We need to think about getting the right subset. As I showed you in Custom Design, we have quite a wide variety of roles for the factors that we're looking at, and that's another set of choices. How much can we manipulate the factors? Are they naturally categorical, or are they continuous? Then we need to think about the ranges or the values for each of those. Let's go to DOE and Custom Design. I'll just start off with three different continuous factors. What you can see is that I can give a name to each of the factors, but I also get to declare the range that I want to experiment in for each of those factors. As you can imagine, this has a critical role in the space that I'm actually going to explore.
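Behind the ranges you declare for continuous factors, designs are typically generated in coded units and then mapped back to natural units. A minimal Python sketch of that mapping is below; the temperature range in the comment is a hypothetical example, not anything from the talk.

```python
import numpy as np

def to_coded(x, low, high):
    """Map natural units on [low, high] to coded units on [-1, 1]."""
    return 2 * (np.asarray(x, dtype=float) - low) / (high - low) - 1

def to_natural(z, low, high):
    """Map coded units on [-1, 1] back to natural units on [low, high]."""
    return low + (np.asarray(z, dtype=float) + 1) * (high - low) / 2

# Hypothetical range for a temperature factor declared as 150-200 degrees C:
# to_coded([150, 175, 200], 150, 200)  ->  [-1, 0, 1]
```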
I need to hone in on what's possible and what I'm interested in to get those ranges right. If I make the range too big, then I may actually have a lot going on across the range of the input, and I may not be able to fully capture what's going on. If I make the range too small, then I may miss the target location, and I may get a distorted view of the importance of that factor. Here, this input actually has a lot going on for that response, but if I sample in a very narrow range, it looks like it's not doing anything. Lastly, if I'm in the wrong location, I may miss some features and not be able to optimize the process for what I'm doing. Again, the choice of which factors and which ranges relies a lot on having some fundamental understanding of what we're trying to do and where we need to explore. The next piece to talk about is the relationship between inputs and responses. I will say that one of the common mistakes I often see is that we run an experiment, and then after the fact, people realize, oh, we should have collected this. In textbooks, a lot of the time it looks like there's a single response we're interested in, and we run the experiment to collect just that response. In practice, I think most experiments have multiple responses, and so this is a key decision: to make sure, before we collect that first data point, that we include the right set of responses so that we can answer all of the questions from that one experiment. Then we need to think about what we know about the relationship. Is it likely to be smooth? Is it going to be continuous in the range that we've selected? How complicated are we expecting it to be? All of these have an impact on the design that we're going to have. A couple of common mistakes about the relationship are, one, being a little too confident, so we assume that we know too much about what's going to happen and don't build in some protection against surprises; and two, when we have multiple responses, not designing for the most complicated relationship. If for one of them we're interested in main effects and for the other we think there might be curvature, we need to build the design so that it can estimate the curvature, because that's the more complicated relationship. A first key decision that I think is a little hidden in JMP is that we have to decide between model-based, which is usually sensible if we're confident that our responses will be smooth and continuous and that we're not investigating too big a region, and space filling. Space filling can be a good safety net if we're not sure what to expect, if we're exploring a large region, or if we want to protect against surprises. I'm pointing here, on the last slide, to a paper with more details about that, which I wrote with a colleague, Dr. Lu Lu at the University of South Florida, where we talk about the implications of that first fork in the road: how do we choose between model-based and space filling, and what are the repercussions?
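For the space-filling side of that fork in the road, here is a minimal Python sketch of a basic Latin hypercube, just to show the "many levels, even coverage" idea. JMP's Space Filling platform offers several more sophisticated methods (sphere packing, fast flexible filling, and others), so treat this only as the basic concept.

```python
import numpy as np

def latin_hypercube(n_runs, n_factors, seed=1):
    """Basic Latin hypercube sample on [0, 1]^n_factors:
    each factor is stratified into n_runs equal bins, with one point per bin."""
    rng = np.random.default_rng(seed)
    sample = np.empty((n_runs, n_factors))
    for j in range(n_factors):
        perm = rng.permutation(n_runs)                       # shuffle the bins
        sample[:, j] = (perm + rng.random(n_runs)) / n_runs  # jitter within each bin
    return sample
```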
Then lastly, we need to think a little about constraints. Our input region, if we've declared ranges for the different inputs, naturally seems like a square or a rectangle. But within that region, there may be portions where we can't get a response, or we just don't care what the responses look like. Imagine I'm doing an experiment about baking, and I'm varying the time the cookies are in the oven and the temperature of the oven. I might know that the coolest temperature for the shortest amount of time won't produce a baked cookie; it'll still be raw. Or the hottest temperature for the longest time will overcook the cookies. I want to chop off regions of that space that aren't of interest or won't give me a reasonable result. In JMP, there are easy ways to specify constraints to make the shape of that region match what you want. The last thing is all about budget: how big should my experiment be? That's a function of the time I have available and the cost of the experiment. In JMP, if we jump to here, maybe I specify a response surface model, you'll see that there's a new feature called Design Explorer, which, when I activate it, allows me with a single click of a button to generate multiple designs. I can optimize for good estimation, so D- or A-Optimality, or for good prediction of the responses with I-Optimality. I can vary the size of the experiment, and the center points and replicates. If I click Generate All Designs, it will generate a dozen or so designs, which I can then compare and consider to figure out which one makes the most sense. I think understanding the budget, and thinking of it as a constraint, is an important consideration. To wrap things up, just a few helpful resources. The first is a JMP web page that talks in a little more detail about the different kinds of designs; it fills in a lot of the details I wasn't able to cover today about those individual choices on the DOE tab. The model-based versus space-filling entry is the paper I referenced earlier, where we discuss the implications of choosing a model-based design or doing space filling, which is a little more general and a little more protective if we are expecting some surprises. Then the last two items are two white papers that I wrote. The first one talks about how you can use Design Explorer to consider different design sizes and different optimality criteria, and then choose between the candidates by looking at the Compare Designs option in JMP. Lastly, everything I've talked about here depends on subject matter expertise, and the second white paper, on the why and how of asking good questions, gives some strategies for how to interact with our subject matter experts so we can target those conversations and make them as productive as possible. I hope this has been helpful and will help you have a successful first experiment using JMP software. Thanks.
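To make the baking constraint example above concrete, here is a minimal Python sketch that chops the infeasible corners off a candidate region. The specific limits and grid are hypothetical; in JMP you would instead enter linear constraints or disallowed combinations directly in the design dialog.

```python
import numpy as np

# Hypothetical candidate grid: baking time (minutes) and oven temperature (deg C).
times = np.linspace(8, 14, 25)
temps = np.linspace(160, 220, 25)
tt, pp = np.meshgrid(times, temps)
candidates = np.column_stack([tt.ravel(), pp.ravel()])

# Hypothetical constraints: drop the "raw" corner (short time AND cool oven)
# and the "overcooked" corner (long time AND hot oven).
raw = (candidates[:, 0] < 9.5) & (candidates[:, 1] < 175)
burnt = (candidates[:, 0] > 12.5) & (candidates[:, 1] > 205)
feasible = candidates[~(raw | burnt)]
print(len(candidates), "candidate points,", len(feasible), "feasible")
```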