Cation-exchange chromatography (CEX) is the industry gold standard for the analysis of biopharmaceutical charge variants. However, the development of CEX methods in a time- and resource-efficient manner constitutes a bottleneck in product characterization. CEX separations are complex and governed by multiple factors. Several scientific publications have proven the successful application of design of experiments (DoE) in chromatography method development. Nevertheless, performing DoEs with a large number of factors may be challenging, time-consuming, and expensive. This work illustrates the use of a split-DoE approach to aid the development of a CEX method for the analysis of the charge variant profile of a mAb candidate. Analytical method development was intended to provide a high-throughput (HT) CEX method to support charge variant analysis with minimal sample and time requirements. The split-DoE approach is based on fundamental knowledge of the CEX separation mechanism and aims to reduce the number of experimental runs whilst exploring a wide experimental space. Regression modeling was used to study the effect of both individual process parameters and their interactions on the separation efficiency to ultimately identify the optimal method conditions. This study provides an efficient workflow for leveraging the development of CEX methods.

Hello, everyone. Thank you for joining my talk. I am Giulia Lambiase, a Senior Scientist at AstraZeneca. I work in biopharmaceutical development in the analytical science team. Today, I want to talk to you about the use of DoE for the development of analytical characterization methods, most especially chromatography methods. In today's talk, I'm going to cover therapeutic proteins, what they are and why they are challenging for analytical testing, introduce the use of design of experiments for analytical method development, and show the application of DoE to the development of a charge variant method, specifically a cation-exchange chromatography method.

To start off, protein therapeutics are inherently very complex due to their large size and the presence of post-translational modifications, as well as chemical modifications that the protein can undergo during expression in cells, purification, and storage. Monoclonal antibodies dominate the biopharmaceutical market, representing about 70% of the total sales of biopharmaceutical products. However, there is now a push for new products, next-generation biopharmaceuticals, which are bispecific antibodies, antibody fragments, fusion proteins, and many other formats. All of them come with unique challenges due to their complex structure and the presence of higher-order structure, glycoforms, charge variants, disulfide bonds, oxidized and deamidated species, isomerization, aggregation, and fragmentation. All of these modifications, chemical and process [inaudible 00:02:37] modifications, can impact the potency, safety, and quality of the final drug product. This is why thorough analytical characterization and analytical testing throughout all stages of the product life cycle is key to meeting regulatory standards and to delivering a product that meets the required quality profile.
We use a plethora of analytical techniques for analyzing proteins, based mostly on chromatographic methods, electrophoretic methods, and [inaudible 00:03:25]. Due to the inherent structural complexity of proteins, analytical method development can be quite challenging. In today's talk, I'm going to focus specifically on chromatography methods and the use of design of experiments to help develop chromatographic separations. Chromatography methods can be quite complex, especially if you have a complex analyte like a protein. This is because the separation depends on the interplay of several variables, such as mobile phase composition, buffer pH, flow rate, column chemistry, temperature, and the type of detector that you decide to use for the analysis. All of these parameters need to be fine-tuned and controlled during the separation process in order to achieve the desired separation. Here, DoE can be very useful compared with a one-factor-at-a-time approach.

A one-factor-at-a-time approach involves varying one parameter at a time while keeping the others constant. This may lead to a large number of experimental runs and a lack of information, because factor interactions are not investigated. That lack of information also leads to additional experiments during method validation, which may lengthen the method development process even more and ultimately delay overall product development. DoE, in comparison, enables the variation of multiple parameters at a time. With a reduced number of experiments, it allows a large number of factors to be investigated, including the interactions between them, and it supports the development of mathematical models for assessing relevance and statistical significance, facilitating the steps required during method validation. DoE really enables you to investigate a wide design space with fewer resources, in a more efficient way. In fact, I like saying that DoE enables faster, cheaper, and smarter experiments to deliver stronger and better analytical methods.

In today's talk, I'm going to walk you through a split-DoE approach for the development of a cation-exchange chromatography method. Cation-exchange chromatography is used for the analysis of charge variants. Specifically, on the left-hand side of this slide you can see a chromatogram of a protein with some acidic species [inaudible 00:07:17] to the left of the main species peak, and some basic species peaks. All these acidic and basic species can be formed due to chemical modifications that lead to variations in the surface charge distribution of the protein. Cation-exchange chromatography methods are quite complex because the separation efficiency is affected by a number of factors and is quite sensitive to small changes in those factors, such as column chemistry, mobile phase pH, temperature, flow rate, salt content, and the time of the separation.
In this approach, I'm going to walk you through an efficient way to develop a cation-exchange chromatography method using DoE. If you are familiar with DoE, you may know that it often requires a sequential approach. In this work, I first performed a main effects screening design to select the best column chemistry and mobile phase pH for the charge variant separation of this specific mAb molecule. In the second DoE, I used response surface methodology, specifically a central composite design, to optimize the chromatographic separation by changing the flow rate and [inaudible 00:09:36].

Let's look in more detail at the first DoE experiment. This was a main effects screening design in which I screened four column chemistries from four different providers, Agilent, Sepax, Phenomenex, and Waters, and a pH range from 5.5 to 6.5. My response was the experimental peak capacity, which is a parameter that tells you the efficiency of a chromatographic separation, precisely the number of peaks that can be separated within the chromatography time that you set. Other parameters, such as buffer concentration, salt concentration at the start of the chromatography gradient, flow rate, gradient time and shape, temperature, injection volume, sample concentration, and UV absorbance, were kept constant.

These are the results for the first DoE. On the left-hand side, you can see the results for the four different columns: how the experimental peak capacity changes with the mobile phase pH for each column. We aim for high experimental peak capacity values, and you can see that the Phenomenex column performed best. Across the columns, a pH of 6.5 gives greater experimental peak capacities, but the Phenomenex column allowed for the best separation. This is also visible on the right-hand side of this slide, in panel A, where you can see, at pH 6.5, how the separation differs between the chromatography columns: Agilent, Waters, Phenomenex, and Sepax. The separation of the charge variants on the Phenomenex column is much better than on the others, because the acidic peaks are very well separated, as are the basic species, from the main product peak. In panel B, we have isolated the results of the Phenomenex column, showing how the chromatographic separation looked with a mobile phase pH of 5.5, 6.0, and 6.5. We can see how the separation improves as the pH increases. Obviously, the mobile phase pH range is dictated by the intrinsic pI of the molecule, so we could only investigate this range; otherwise, the molecule would have struggled to bind to the column. Based on our fundamental knowledge of separations on cation-exchange columns, we decided that these parameters, the Phenomenex column and a pH of 6.5, were optimal to carry development forward.
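The talk does not state the exact formula used for the experimental peak capacity response, so as a small illustration only, here is a sketch assuming the common gradient-elution definition, peak capacity = 1 + (gradient time) / (average peak width). The gradient time and peak widths below are hypothetical numbers, not values from the study.

```python
# Minimal sketch: experimental peak capacity for a gradient separation.
# Assumes the common definition P = 1 + t_gradient / mean(peak widths);
# the exact formula used in the talk is not stated, so treat this as illustrative.

def peak_capacity(gradient_time_min, peak_widths_min):
    """Return the experimental peak capacity for one chromatogram."""
    if not peak_widths_min:
        raise ValueError("need at least one measured peak width")
    mean_width = sum(peak_widths_min) / len(peak_widths_min)
    return 1.0 + gradient_time_min / mean_width

# Hypothetical example: 20 min gradient, baseline widths of the resolved peaks (min)
print(peak_capacity(20.0, [0.25, 0.30, 0.28, 0.22]))   # ~77
```

Higher values mean more peaks fit inside the same gradient window, which is why the comparison across columns and pH levels was made on this response.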
We then carried on with the second DoE using a central composite design. A central composite design is a type of DoE falling under the umbrella of response surface methodology, which is used to optimize conditions, to investigate the presence of curvature, for instance, and to extrapolate optimal values. In this case, we used the Phenomenex column and a mobile phase pH of 6.5, and started to vary other parameters, such as buffer concentration, the salt concentration at the start of the gradient, and flow rate, to find the optimal conditions. The central composite design enabled us, very efficiently and with a small number of runs, to identify the optimal separation and method conditions. In fact, at the end of the split-DoE approach, we could say that with the investigation of four columns, the mobile phase pH range, salt composition, gradient, and flow rate, and with only 27 experimental runs, we had optimized a method for a monoclonal antibody. This method is very useful because it is now used as a quick, high-throughput analytical method for screening differences in the charge variant profile of this specific molecule expressed under different conditions and comparing it to a standard. You can see here that the blue line is our reference standard and the red line is stressed material of the same molecule. You can see how the charge variant profile changed as a consequence of the stress conditions applied to this molecule. This was achieved thanks to this analytical method, which was developed and optimized with a DoE approach.

We also decided to implement this DoE approach as a platform workflow for analytical method development for new products, new biopharmaceuticals, and we screened a number of products. For all of them, we first applied the main effects screening design and identified the best column and mobile phase pH to use. Secondly, we applied the central composite design to optimize the separation. Now we have identified a platform column and mobile phase composition for this class of therapeutics. When new molecules come into the pipeline, we can very quickly, just by running a central composite design, which involves only 12 runs, optimize the chromatographic profile and deliver an optimal cation-exchange method for a specific product.
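The exact design tables behind the 27-run split DoE and the 12-run platform CCD are not shown in the talk, so the sketch below only illustrates what a generic face-centred central composite design in the three optimization factors looks like. The factor names follow the talk; the coded levels and run count are placeholders and will not match the actual designs.

```python
# Rough sketch: candidate runs for a face-centred central composite design (CCD)
# in three factors (buffer concentration, starting salt, flow rate).
# Factor names follow the talk; coded levels and run counts are illustrative only.
from itertools import product

factors = ["buffer_mM", "start_salt_mM", "flow_mL_min"]

corner = list(product([-1, 1], repeat=3))                 # 8 factorial points
axial = [tuple(a if i == j else 0 for j in range(3))      # 6 axial points (alpha = 1)
         for i in range(3) for a in (-1, 1)]
center = [(0, 0, 0)] * 3                                  # replicated centre points

design = corner + axial + center
for run, coded in enumerate(design, start=1):
    print(run, dict(zip(factors, coded)))
```

The factorial points estimate main effects and interactions, the axial points add the curvature information needed for response surface optimization, and the replicated centre points estimate pure error.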
The key take-home message from my talk today is that DoE-driven method development, followed by appropriate statistical analysis, enables you to plan experiments very efficiently based on the time, cost, and analytical resources available, and to schedule the execution of experiments with adequate sample type and size to extract the maximum amount of information from our chemical data and efficiently address the challenges and goals of the intended research. It definitely saves time and cost in experiment execution in comparison to one-factor-at-a-time approaches. Most especially, it allows you to manage the complexity of analytical method development whilst still interrogating several factors at a time and studying the effect of both individual method parameters and their interactions on the dependent variable. With today's talk, I hope I have inspired you to apply more DoE in your experiments. Thank you very much, everyone, for your attention. If you have any questions, feel free to reach out to me. Thank you.
In catalyst development, tests to measure performances can be extremely time-consuming and expensive. In this study, we have explored the possibility of using faster and less expensive characterization data to validate mathematical models linking production parameters and performances. Specifically, we first modeled the production parameters to the performance data available in our dataset, using generalized regression, and from this model, we predicted a set of optimal production parameters. We then analyzed the Infrared Spectroscopy (IR) data using the Wavelet model in Functional Data Explorer plugging the production parameters as additional parameters. Thanks to this addition we were finally able to generate a synthetic spectrum for an optimal catalyst. The generated spectra combined with the predicted production parameters can be used by the scientist to more quickly understand the underlying mechanisms driving performances. Finally, a new catalyst material developed using the predicted parameters can be analyzed using IR and the synthetic spectra can be used to validate the model.     Hello,  I'm  Chris   Gotwalt with  JMP,  and  my  co- presenter,  Giuseppe  De  Martino  from  Topsoe ,  and  I  are  giving  a  presentation  that  tells  the  story  of  a  data  analysis  project  that  showcases  the  new  wavelet  analysis  in  the  Functional  Data  Explorer,  one  of  the  most  exciting  new  capabilities  in  JMP  17. The  case  study  begins  with  a  product  formulation  problem  where  Tops oe  wanted  to  design  a  catalyst  that  optimizes  two  responses,  but  the  responses  are  in  conflict  with  one  another  in  that  improving  one  response  often  comes  at  the  expense  of  the  other. A  candidate  for  the  optimal  tradeoff  was  found  with  models  fit  by  the  generalized  regression  platform,  and  the  optimal  factor  settings  were  found  using  the  desirability  functions  in  the  profiler.  This  was  a  fairly  standard  DoE  analysis.  But  in  addition  to  the  measured  responses,  NIR  spectra  were  taken  from  some,  but  not  all  of  the  sample  batches.  This  is  information  that  can  be  used  to  give  a  clue  to  the  R&D  team  about  what  the  chemical  structure  of  the  ideal  formulation  should  look  like. In  addition  to  the  GenReg  model  of  the  responses,  we  also  used  wavelet  models  as  the  basis  function  of  a  functional  DoE  analysis  of  the  spectra  using  the  DoE  factors  as  inputs.  We  were  then  able  to  get  a  synthetic  spectra  of  the  optimal  formulation  by  plugging  in  the  optimal  factor  settings  found  in  the  initial  analysis  of  the  two  critical  responses. Before  going  into  the  presentation,  I  want  to  point  out  that  at  the  beginning  of  the  project,  Giuseppe  was  very  new  to  JMP  and  didn't  have  a  background  in  this  type  of  statistical  analysis.  Giuseppe  learned  all  he  needed  to  do  the  analysis  on  his  own  after  a  couple  of  web  meetings  with  me. This  obviously  shows  he's  a  clever  guy,  but  also  that  JMP  makes  learning  how  to  do  some  very  sophisticated  data  analysis  projects  quick  and  easy.  Now  I'm  going  to  hand  the  show  over  to  Giuseppe. Thank  you,  Chris.  Here  is  some  background  about  our  project.  It's  a  catalyst  development  project.  Therefore,  we  are  developing  many  different  recipes.  
Each recipe has a unique set of production parameters, and once the sample is prepared, we characterize it in our analysis lab at Topsoe. Finally, we do a performance test. During the performance test, we look for two values that we call Response 1 and Response 2 here. In this specific case, we are trying to minimize Response 1 while maximizing Response 2, which would lead us to the ideal space in the top left corner of the graph. But as you can see, the 55 samples that we've tested get stuck in the middle of the graph. That is because Response 1 and Response 2 are intercorrelated, meaning that improving one comes at the expense of the other. Therefore, we moved to JMP and looked at our response data and our characterization data to see if we can move away from this line in the middle of the graph. We identified two target areas that we want to reach, and together with Chris, we thought about using JMP to create a model that would connect the production parameters to the response values. Then we looked further into our infrared spectroscopy data to try to validate our model and to get some extra information about the target samples.

Here is an overview of the dataset. We have produced 112 samples. Each sample has a unique set of production parameters. We have analyzed all the samples using infrared spectroscopy, so we have one spectrum for each sample. Then we have used many other characterization techniques that we have in-house, which account for 21 more columns of data. Finally, we have tested half of the samples, and that accounts for the last two columns, which we call the responses. At the beginning of the project, we actually wanted to include the infrared spectroscopic data in our larger dataset, and that's why we wanted to use JMP, because we now have this new wavelet model capability. That enables us to include the principal components coming from the wavelet model in our dataset, and we can use them to create models in JMP.

But before we start our analysis, we need to have a look at the raw data to find outliers. We do that by clicking Analyze, then Multivariate Methods, Multivariate. Here we can select our production parameters and characterization data as Y columns, and we get this scatterplot matrix. This is an example, looking just at the production parameters, of what we would identify as an outlier. We can see that there is a set of points in production parameter two that is far away from all the other points. Furthermore, we have background knowledge about these samples, and we know they wouldn't be optimal for our catalyst development, so we decided to right-click and say "Rows, Hide and Exclude." We did this also with other points, looking at the scatterplot matrices of all the characterization data. Now that we have cleaned up the data, we can fit a model. We click Analyze and Fit Model, and we select the production parameters as variables in our model.
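Before the model specification continues, a brief aside: the outlier screen above was done visually in JMP's Multivariate platform, combined with subject-matter knowledge. A rough numerical analogue, offered here only as a sketch and not as the authors' procedure, is to flag rows with a large Mahalanobis distance from the multivariate centre.

```python
# Aside: a numeric stand-in for the visual outlier screen done in JMP's
# Multivariate platform. Flags rows whose Mahalanobis distance from the
# multivariate mean is unusually large. Illustrative only; the actual screening
# in the talk was visual plus domain knowledge.
import numpy as np
from scipy.stats import chi2

def mahalanobis_flags(X, quantile=0.99):
    """Return a boolean mask of rows with large Mahalanobis distance."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    inv_cov = np.linalg.pinv(cov)                 # pinv tolerates collinear columns
    d2 = np.einsum("ij,jk,ik->i", X - mu, inv_cov, X - mu)
    cutoff = chi2.ppf(quantile, df=X.shape[1])    # chi-square approximation
    return d2 > cutoff

# Hypothetical usage with a (samples x parameters) array of production parameters:
# mask = mahalanobis_flags(prod_params)
# print(np.where(mask)[0])   # candidate rows to review, then hide and exclude
```

Flagged rows are candidates for review, not automatic exclusions; as in the talk, the decision to hide and exclude still rests on background knowledge of the samples.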
We  click  Macros  and  response  surface  to  create  a  second  polynomial  combination  of  these  variables.   Then  we  select  our  responses  as  Y  values. Then  we  decided  to  use  generalized  regression.  H ere  Chris  can  add  some  more  info  about  why  we  decided  to  use  this  specific  type  of  model. We  used  a  quadratic  response  surface  model  because  the  design  had  three  or  more  levels  for  each  factor,  so  I  knew  that  we  would  be  able  to  fit  curvature  terms  if  necessary  and  also  be  able  to  fit  quite  a  variety  of  different  interaction  terms.  We  use  the  generalized  regression  platform  because  it  does  model  selection  with  non  Gaussian  distributions  like  the  log  normal.  In  my  opinion,  there  aren't  many  reasons  not  to  use  the  generalized  regression  platform  if  you  have  JMP  Pro  because  it  is  so  easy  to  use  while  in  many  ways  being  so  much  more  powerful  for  DoE  analysis  than  the  other  options  in  JMP  and  JMP  Pro. After  that,  we  can  select  our  distribution.  We  know  that  the  responses  are  going  to  be  strictly  positive,  so  we  select  the  log  normal  distribution  and  then  we  click  Run,  and  we  say  no. In  this  slide,  we  can  see  that  we  have  now  created  a  model,  but  we  have  also  the  possibility  of  creating  other  type  of  models  using  different  estimation  methods.  We  decided  to  use  best  subset.  Here,  Chris  can  add  some  more  words  about  it. Well,  so  here  we  use  best  subset  selection  because  the  full  model  isn't  terribly  large,  so  why  not  try  every  possible  subset  of  that  full  model  and  find  the  one  that  provides  the  absolute  best  tradeoff  between  accuracy  and  model  simplicity? On  the  other  hand,  had  there  been  eight  or  more  factors,  I  would  have  used  a  faster  algorithm  like  forward  selection  or  pruned  forward  selection  because  with  larger  base  models,  it  would  take  a  very  long  time  to  fit  every  possible  submodel  to  the  data.  We're  going   to  be  using  the  AICc  model  selection  criteria  to  compare  GenReg  models. The  AICc  allows  you  to  compare  models  with  different  effects  in  them  as  well  as  different  response  distributions.  With  the  AICc,  smaller  values  are  better  and  the  rule  of  thumb  I  use  is  that  if  a  model  has  an  AICc  value  that  is  within  4  of  the  smallest  AICc  value  seen,  then  those  two  models  are  practically  identical  in  quality  of  fit. If  the  two  models  have  AICc  values  within  10  of  each  other,  then  they  are  statistically  similar  to  one  another.  The  main  point  here  being  that  if  we  have  two  models  and  their  AICc's  differ  by  more  than  10,  then  the  data  are  pretty  strongly  suggesting  that  the  one  with  the  smaller  AICc  is  the  better  model  to  be  working  with. As  with  any  individual  statistic,  you  should  view  the  A ICc  as  a  suggestion.  If  your  subject  matter  experience  strongly  suggests  one  model  over  the  other,  you  may  want  to  trust  your  instincts  and  ignore  the  recommendation  of  the  A ICc. Once  we  have  created  this  new  model,  we  can  see  that  the  non- zero  parameters  have  now  dropped  from  16  to  nine.  If  we  want  to  compare,  if  the  model  has  improved,  we  can  look  at  the  A ICc  values.  
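Chris's AICc rules of thumb (a difference within 4 means practically identical fits, within 10 statistically similar, more than 10 favors the smaller value) can be made concrete with a short sketch. JMP Pro reports AICc directly; the log-likelihoods, parameter counts, and sample size below are hypothetical and only make the arithmetic explicit.

```python
# Sketch: corrected AIC (AICc) and the delta-AICc rule of thumb described above.
# Assumes each model's log-likelihood and parameter count are already in hand;
# the numbers in the example are hypothetical.
import math

def aicc(log_likelihood, n_params, n_obs):
    aic = -2.0 * log_likelihood + 2.0 * n_params
    correction = (2.0 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
    return aic + correction

def compare(aicc_a, aicc_b):
    delta = abs(aicc_a - aicc_b)
    if delta <= 4:
        return "practically identical fits"
    if delta <= 10:
        return "statistically similar; either is defensible"
    return "prefer the model with the smaller AICc"

# Hypothetical values for a full model vs. a best-subset model:
full, subset = aicc(-210.0, 16, 55), aicc(-214.0, 9, 55)
print(round(full, 1), round(subset, 1), compare(full, subset))
```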
We  can  see  that  there  is  an  improvement  of  more  than  10,  which  is  an  important  difference.  Therefore,  we  decided  to  go  with  the  best  subset. We  did  the  same  for   Response 2,  and  then  we  moved  on.  Now  we  can  click  on  the  red  arrow  and  say  profiler.  In  the  profiler,  we  can  play  around  with  the  production  parameters  and  see  how  the  model  is  expecting   Response 1  and   Response 2  to  change.  This  is  already  a  great  tool  for  the  scientist  to  understand  how  the  model  is  expecting  the  responses  to  vary,  but  we  can  do  more.  We  can  click  on  optimization  and   desirability  and  desirability  functions. Since  from  slide  one,  we  know  that  we  have  two  targets  that  we  want  to  reach,  we  can  change  the  desirability  function  to  match  those  targets.  So  we  double  click  on  the  Desirability  function and  we  say  match  target  and  select  the  target  area  that  we  want  to  reach.  Finally,  we  can  click  again  on  optimization  and  desirability  and  say  maximize  desirability. Here,  the  profiler  will  try  to  reach  the  optimal  points  for  the  production  parameters.  To  summarize,  we  can  say  that  now  we  have  the  first  model,  we  go  from  production  parameters  to  responses  and  we  have  set  two  targets  that  we  want  to  reach.  This  way  we  can  get  ideal  production  parameters  that  we  can  communicate  to  the  development  team  and  they  can  use  to  move  on  in  their  research. In  the  second  part  of  the  presentation,  I'm  going  to  talk  about  how  we  use  the  IR  Spectra.  Here  we  have  a  file  for  each  spectrum  and  therefore  we  need  to  click  on  file  and  import  multiple  files.  Then  we  need  to  specify  that  we  want  the  file  name  to  be  columned  in  the  data  set  and  then  we  can  use  this  sample  name,  which  is  the  name  of  the  file  as  an  ID  to  connect  it  to  the  other  table  where  we  have  all  the  data.  We  click  on  the  column  and  say  link  reference. Now  that  the  two  tables  are  connected,  we  can  click  Analyze  and  specialize  modeling  and functional  data  explorer.  In  the  Functional  Data  Explorer,  we  want  to  use  the  intensity  value  as  the  Y  output.  We  can  use  the  sample  name,  matrix  name  as  our  ID  function. Then  this  is  very  important.  We  use  the  production  parameters  as  supplementary  data.  And  the  weight  number  is,  of  course,  the  X- axis.  And  we  say,  "Okay,  here  we  can  see  that  the  data  is  already  clean."  We've  imported  all  the  Spectra  and  the  data  looks  clean  because  I've  done  the  preprocessing  outside  of  JMP.  I  used  Python  because  I'm  more  familiar  with  that  and  there  is  a  very  nice  module  that  is  able  to  remove  the  background  and  reduce  the  range  that  we  want  to  look  at. JMP  was  good  to  work  with  as  an  extra  tool  after  this  reprocessing.  Then  we  decide  to  click  on  models  and  wavelets.  We  move  from  discrete  data  to  continuous  data.  Now  we  can  also  look  at  the  diagnostic  plots.  This  is  for  you,  Chris,  to  take talk  about. It's  a  good  idea  to  look  at  actual  by  predicted  plots  as  you  proceed  through  a  functional  data  analysis.  These  have  the  actual  observed  values  in  the  data  on  the  Y  axis  and  the  predicted  values  on  the  X  axis.  
We want the predicted values to be as close to the actual values as possible. A plot like this one, tight along the 45-degree line, indicates that we have a good model. Now, some of you may be concerned about overfitting, since the predictions fit the data so well. In my experience, I haven't found that to be a problem in the basis function fitting and functional principal component steps of a functional DoE analysis. I'd also like to point out that in JMP 17 we've added a lot of new features for spectral preprocessing, like standard normal variate, multiplicative scatter correction, and Savitzky-Golay filters, so those of you who don't know Python have access to these capabilities in JMP Pro 17.

After that, we can also have a look at the functional PCA analysis. Here, we'll spend a few more words on it. After the wavelet model is fit, JMP Pro automatically does a functional principal components analysis of the wavelet model. This decomposes the spectra into an overall mean spectrum and a linear combination of shape components, with coefficients that are unique to each sample spectrum. When we do the functional DoE analysis, GenReg automatically fits models to these coefficients behind the scenes and combines the resulting models with the mean function and the shape components to predict the shape of the spectrum at new values of the input parameters.

If we look at the principal component analysis, we can see that the wavelet model has created a mean function of all the spectra that we gave as input, and then it has created different shape functions. We decided to stop at six. What these shape functions describe is the variation in the data that we are analyzing. As you can see from the left, the first shape function accounts for 72% of the variation, while the second accounts for 22%. Together, the six components account for 99.5% of the variation. As an example, we can look at principal component 2, and we can see that around 3,737 there is a reduction, a minus sign. That means that increasing principal component 2 will decrease the peak at 3,737. This is just an example to say that already from these six shape functions we can get a lot of information, if we have subject-matter knowledge about the infrared spectroscopy and this catalyst system. We spent quite a lot of time looking at the principal components, but this is not what we are going to focus on in the next slides.

What we want to look at instead is the functional DoE analysis. Here we have a profiler as well, but we can now plug in the production parameters that we got from the first model we developed. Therefore, knowing the target production parameters that we want to use, we can generate a fake spectrum, or a synthetic spectrum, as we could call it. This is the spectrum of a sample that was never produced. It could be wrong, but it can give the scientists some ideas about what you would expect to get from these new production parameters.
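To make the decomposition Chris describes a little more tangible, here is a structural sketch: each spectrum is treated as an overall mean plus a weighted sum of shape functions. This uses a plain SVD-based principal component analysis on a matrix of spectra, which mimics the structure of the output but is not JMP Pro's wavelet-based implementation; the function and variable names are ours.

```python
# Sketch: the functional-PCA idea -- each spectrum is an overall mean plus a
# weighted sum of shape functions. Plain SVD-based PCA on a (samples x wavenumbers)
# matrix; JMP Pro's wavelet/FPCA machinery is more sophisticated, so treat this as
# a structural illustration only.
import numpy as np

def fpca_reconstruct(spectra, n_components=6):
    """Decompose spectra and rebuild them from the first n_components shapes."""
    X = np.asarray(spectra, dtype=float)        # shape: (n_samples, n_wavenumbers)
    mean_spectrum = X.mean(axis=0)
    centered = X - mean_spectrum
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    shapes = Vt[:n_components]                  # shape functions (eigen-spectra)
    scores = centered @ shapes.T                # per-sample coefficients
    explained = (s[:n_components] ** 2) / (s ** 2).sum()
    reconstructed = mean_spectrum + scores @ shapes
    return reconstructed, scores, shapes, explained

# Hypothetical usage with an IR matrix of 112 spectra:
# recon, scores, shapes, explained = fpca_reconstruct(ir_matrix)
# print(explained.cumsum())   # e.g. ~0.72, ~0.94, ... as reported in the talk
```

The functional DoE step then models the per-sample scores as functions of the production parameters, which is what makes it possible to predict a synthetic spectrum at factor settings that were never run.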
To sum up, we can say that we now move from production parameters to an infrared spectrum. We have a second model that uses the wavelet model to generate a synthetic spectrum. I imagine this to be like baking a cake: now you have the recipe, but you also have a snapshot picture of the final cake. It doesn't explain how to make it, but it adds information about what you want to achieve. Finally, we can move from model to test. That means we can give the R&D team a new recipe and also the synthetic spectrum of that recipe. With this information, they can try to develop the new catalyst and see whether the model is validated or wrong.

Another thing the group can do is look back at the previous samples. Half of the samples were not tested; those are the black dots in the slide. We can look for outliers: is there a sample that could perform really well that we haven't looked at? We actually have one, so that's another test we can do. Looking at future work, I added this slide from the beginning just to say that we focused on the production parameters and the IR spectra, but we haven't really looked at those 21 extra columns of characterization data. In the future, we could spend some time trying to identify the most predictive parameters among these 21 columns and create a new model from this characterization data to the responses, and use it as a screening model to avoid testing samples that would not perform as well. That's the end of the presentation.
Principal Technology and Development Engineer, Microchip Technology Rousset

To address the new space market focused on cost reduction, the use of commercial off-the-shelf (COTS) products is a good option. To be compliant with space reliability requirements, COTS must be evaluated and modified to meet space agency specifications, especially on Single Event Latchup (SEL). This effect is caused by the strike of a heavy ion on the circuit. The energy transmitted to the matter (LET) triggers the parasitic thyristor and induces the latchup. The SEL sensitivity is characterized by the LET threshold and the holding voltage criteria. Until now, to evaluate the LET threshold, TCAD simulation and experimental tests were performed, but TCAD is time-consuming and irradiation sessions are very expensive. An analytical LET threshold model is a solution to obtain a quick estimation at lower cost and could help harden the product against radiation. JMP is well adapted to assist in building and analyzing the results of a design of experiments (DOE). Its profiler is well suited to finding the best combination of inputs to meet the LET threshold criteria.

Hello, I am Laurence Montagner. I'm working for Microchip Technology Rousset, in the Aerospace & Defense Group. I'm going to show you how we use JMP to develop a predictive single event latchup model. This project, called SELEST, started four years ago to develop an internal SEL prediction tool. This work was funded by CNES, the French space agency. Two posters were presented at the RADECS conferences in 2019 and 2021. The first poster presented a new approach and the feasibility of using an analytical model; that's what we are talking about today and what I'm going to present. The second poster presented a more accurate model using a neural network approach.

For context: to address the new space market, which is low cost, our circuits are going to be launched into low Earth orbit. We are going to reuse COTS, commercial off-the-shelf circuits, and make them radiation tolerant to meet space agency specifications. We need to analyze a lot of products to know if we can make them [inaudible 00:01:33] tolerant. So we need a model and a predictive tool to save time and money before any experimental tests. When they are sent into space, those circuits are under radiation. We have several sources of radiation: the sun, cosmic rays, and the Van Allen belts. The sun and cosmic rays emit electrons, protons, and ions. As for the Van Allen belts, the inner belt emits protons and the outer one emits electrons. Those particles, when they strike our circuits, cause damage.

We have two families of damage. One is TID, the total ionizing dose, which we won't talk about today. We are going to focus on SEE, single event effects, and more specifically on single event latchup. A single event latchup can lead the component to destruction; that's why we absolutely need to predict this phenomenon. The mechanism of the single event latchup is very similar to the electrical latchup, but it is not provoked by the same causes. In the single event latchup, a heavy ion strikes sensitive devices such as inverters [inaudible 00:03:33].
In the worst case, it strikes in the middle of these devices and triggers a parasitic thyristor, composed here of an NPN and a PNP bipolar. When the supply voltage of the circuit is above the Vhold of this parasitic thyristor, the thyristor stays on, and this can lead to destruction. As you will have understood, this Vhold parameter is very important for us.

I just talked about energy, the energy of the heavy ions. There is a very important physical quantity, which is a space agency criterion: the LET, linear energy transfer. This is the amount of energy lost in the matter per unit track length. For ESA, the European Space Agency, a circuit is said to be immune to latchup when this value is above 60 MeV·cm²/mg.

Our objective is to be able to predict the Vhold and the LET threshold of a circuit, so we need to build a model. We use TCAD Sentaurus to run simulations and build the model with a DOE. For the DOE, we need to define the inputs and outputs. As you have guessed, the outputs are Vhold and the LET threshold. Regarding the inputs, for the first try we decided to define four: two from the process, epi thickness and epi dose, and two from the design, the length between the two wells of the inverter test structure and the length between the tap and the well. Keep in mind that if the Vhold obtained by simulation is above the supply voltage of the circuit, the circuit is immune to SEL. If the Vhold is below the supply voltage, a single event latchup is possible, and we are then very interested in knowing the LET threshold.

In the flow, we build a Full Factorial DOE with JMP, input this DOE into TCAD Sentaurus, run our simulations, and take the outputs, Vhold and LET threshold. We put all the results and inputs into JMP, where we screen the data and build a model.
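As a small illustration of the first step of that flow, here is a sketch of building a full-factorial run table for the four inputs named in the talk, ready to export for the simulator. The factor names follow the talk, but the units and level values are placeholders, since the real process and design values are not disclosed.

```python
# Sketch: building the full-factorial DOE fed to TCAD Sentaurus.
# Factor names come from the talk; the levels below are placeholders.
from itertools import product
import csv

levels = {
    "EpiThick_um":   [3.0, 5.0, 7.0],        # placeholder values
    "EpiDose":       [1e15, 5e15, 1e16],     # placeholder values
    "Spacing_AC_um": [1.0, 2.0, 3.0],        # placeholder values
    "BodyTie_um":    [2.0, 4.0, 6.0, 8.0],   # placeholder values
}

runs = [dict(zip(levels, combo)) for combo in product(*levels.values())]
print(len(runs), "simulation runs")          # 3*3*3*4 = 108 with these levels

with open("sel_doe.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(levels))
    writer.writeheader()
    writer.writerows(runs)
```

Each row becomes one TCAD simulation, and the two simulated outputs, Vhold and LET threshold, are appended to the table before it goes back into JMP for screening and modeling.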
Now, let's go to the JMP data table that we used to study the feasibility of using an analytical model. This is the table with our four inputs and the two outputs here, and the full DOE, one value per color. For the inputs, we can see a preview of what we have in each distribution. To get a better display and a better exploration of the data, I begin by plotting the distributions of the inputs and outputs. We can check that for each input we have the same number of runs at each value, so it is okay. We can go to the end and see, for the outputs, something to note quickly: the Vhold is below the supply voltage of the circuit, which is 1.95. I can put that limit on the graph; so we can have a single event latchup. We have the values of the LET threshold here, and we see that some values are above 100. We can highlight them to see if we notice something special in the other inputs. We quickly note that the BodyTie, at one value, whatever the other input values, is the source of the LET thresholds above 100. If we look at the values of the LET threshold above 60, the effect is not the same; maybe it comes from the EpiThick. With this first analysis, we can explore our data and get an idea of what we have.

The second graph to be plotted for this analysis is a very interesting one, the Variability / Attribute Gauge chart. We plot all our inputs on X and our outputs on Y. Now we can first analyze whether we have main effects or interactions. We can connect the cell means, and we can see that there is maybe a problem in the TCAD results. It is not a blocking problem; we can continue our analysis and study whether it is feasible or not to build the analytical model, because it is just one dot, but we must note what happened here. With this kind of graph, we quickly see whether there is a problem in our data and whether we have all the data for each condition. We have all the data, but this one point needs to be analyzed.

As for the Vhold, there is no problem in this output. We can see what we know from the physics: when the spacing between the two wells is higher, the Vhold increases. This result is consistent. If we look at the BodyTie, when the BodyTie increases, the Vhold decreases. That's what we know, too. Here we can see what we already remarked in the previous analysis: for this value of BodyTie, the LET threshold is at a value of 100 MeV. That's what we had already noted. We have a trend that when the [inaudible 00:12:07] increases, the LET threshold increases, and this agrees with what we know. It is interesting. We can say with this graph that, on Vhold, we have a main effect of the [inaudible 00:12:29].

We can keep the same graph and the same analysis but change the order of the inputs. Recall is better: remove this input, recall, and we get the graph. What we can see in this representation, just by changing the order of the inputs, is that the EpiDose, for some values of the other inputs, has no effect under some conditions here. So we can deduce that there are interactions between inputs. For the LET threshold, there is a slight interaction here and no effect for other values.

Now we have checked that we have no problem in our data and we have seen that we have some main factors. Out of curiosity, we can use another tool, Partition, to see which input appears first. For the LET threshold, we see BodyTie first, which is what we have already seen, and then EpiThick. We can continue by [inaudible 00:14:24] BodyTie, too. BodyTie is very important. As for Vhold, the BodyTie is important, and after it comes the spacing.

Now that we have a more accurate idea of what we have in our data, we can go and build our model. We put in our inputs. I don't know what I have done; I removed our inputs. We have run a Full Factorial DOE, so we try this model. We put in our outputs, and we can run the model. We can see that I missed something here. I have already prepared something: Fit Model, EpiDose, Macros, Full Factorial. Okay, run. It's better now.
Or is something wrong? I prefer this presentation, with the Actual by Predicted plot. In this plot, we can see that the model is not really satisfying, because we have a lot of dots far from the red area; even if the R-squared is 0.81, it's not fully satisfying. For the Vhold, we have the same remark. We can have a look at the Profiler. The Profiler shows us the right trends, but we are not going to spend a long time on this. We are going to look at another model to see if we can get a more accurate model than this one. We try another one, not so different: we take our inputs and try Response Surface with our outputs, and we run it. We are going to remove EpiDose because it has no effect.

We can see that this model is better; the R-squared is better. The Vhold model is a good model. Now we have a look at the Profiler. In the Profiler, the trends are good for all parameters. We note that there is maybe still something to do, because in this part of the curve the LET threshold values are higher, and that does not agree with the physics and what we know, of course. As I have already said, this is a first try. We need to keep working, as we have done, on the inputs and on the accuracy of the LET threshold to get better results here. You can see that for the EpiDose we have almost no variation, just a little variation here.

However, this kind of graph is very interesting for us, because when we want to make a radiation-tolerant circuit, we can use it to help us by giving a set of values. We can use the Desirability Functions here. Now we are going to set our desirabilities. For the LET threshold, we want to be above 60, as the ESA criterion requires; I put 60, 80, and 100, match target. For the Vhold, we want to maximize it; I put these values, 2 and 2.05. Now I'm going to maximize the desirability. Of course, it works. And now, not. Match target. Target. Now, maximize the desirability. Now we have the set of values we wished for, because here we have a range of values for the BodyTie; we can play with it, and here we can play with the spacing.

Another representation, and a tool we like using in JMP, is the Contour Profiler with some fixed parameters. We can have a contour plot here. If we want to change a design value, I put the BodyTie and the Spacing_AC on the axes. We put our famous value of 60 for the LET as the Lo Limit. To have a LET above 60, we know we can have a value of BodyTie up to, if we take the cross here, about 6.5, and we can use the full range of the spacing. This tool and this representation are very interesting for us for making our products radiation tolerant. This is not the last model we implemented, but I can show you that, in the red triangle, we can save the prediction formula to the table so that afterwards we can take it and encapsulate it if we want.
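The two desirability settings used in this demo (a match-target window of roughly 60/80/100 for the LET threshold and maximization of Vhold toward about 2 to 2.05 V) can be written out explicitly. The sketch below uses simple Derringer-Suich-style linear ramps combined with a geometric mean; JMP's profiler uses smoother functions internally, so this is only to make the idea concrete, and the candidate response values are hypothetical.

```python
# Sketch: Derringer–Suich-style desirability functions combined with a geometric
# mean, mirroring the demo settings (match-target for LET threshold, maximize for
# Vhold). JMP's profiler uses smoother functions; these linear ramps are illustrative.
def d_target(y, low, target, high):
    if y <= low or y >= high:
        return 0.0
    if y <= target:
        return (y - low) / (target - low)
    return (high - y) / (high - target)

def d_maximize(y, low, high):
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return (y - low) / (high - low)

def overall(d_values):
    prod = 1.0
    for d in d_values:
        prod *= d
    return prod ** (1.0 / len(d_values))

# Hypothetical predicted responses at one candidate factor setting:
let_thr, vhold = 75.0, 2.01
print(overall([d_target(let_thr, 60, 80, 100), d_maximize(vhold, 2.0, 2.05)]))
```

Maximizing the overall desirability over the inputs is what the profiler does when "Maximize Desirability" is clicked, which is how the set of BodyTie and spacing values was obtained.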
Okay, that's all for me on JMP. Let's go back to the presentation. With this method, we built another model, this one using a neural network. The predictions obtained with this model were compared to experiments on circuits. We can see that when the experiment shows there is no single event latchup, the prediction by SELEST, the internal tool, says the same thing: there is no latchup. When there is a latchup experimentally, SELEST, even if there is a difference between the experimental value and the prediction, shows that there is a latchup. For us, this was a good result, though not so accurate. That's why the work continues on this model, to make it more accurate, and why we continue working on DOEs. We do it per technology node to get better accuracy from the model. Okay, thank you for your attention.
It is common to need to compare two populations with only a sample of each population. Statistical inference can help the comparison. Our presentation is about inference involving two hypotheses: the alternative hypothesis and the null hypothesis. Sometimes the goal is to provide sufficient evidence to decide that there is a significant difference between two populations. The goal at other times is to provide sufficient evidence that there is significant equivalence, non-inferiority, or superiority between two populations. These two situations require different tests. We will review these situations, appropriate hypotheses, and appropriate tests using common examples. Another common comparison is between two measurements of the same quantity. Our presentation will focus instead on the Method Comparison protocol for chemical and biological assays used by pharmaceutical and biotechnology development and manufacturing. We will present two methods that are available in JMP® 17 to assess the accuracy of a new test method against an established reference method. One method is known as Deming regression or Fit Orthogonal in JMP. The second method is known as Passing-Bablok regression. We will review the background of assessing accuracy and the unique nature of data from method comparisons, and demonstrate both regression methods with examples.

Hello, and welcome to our presentation, Approaches to Comparisons with JMP® 17. My name is Mark Bailey. I'm a Senior Analytics Software Tester, and my co-presenter today is Jianfeng Ding, a Senior Research Statistician Developer. Before we get into the new features, we're going to take a moment to make sure that everyone has the proper background to appreciate these new methods. This has to do with using statistical inference when we're trying to compare two populations. This is a very common task, and the comparison usually leads to a decision between two ideas about these populations. If we could observe the populations in their entirety, we wouldn't need statistics, but that's not usually the case, so we have to work with samples from the populations. Statistical inference can provide some really valuable information about those samples. In particular, is there sufficient evidence to reject one idea about the two populations? A clear statement of these ideas, or hypotheses, is essential to making the correct choice of test and also to the correct interpretation.

So let's talk a little bit about the ideas, or hypotheses, that are part of these statistical tests. The alternative and null hypotheses, as they're known, represent mutually exclusive statements about these populations, and no other hypothesis is possible. For example, one statement might be that the means of population A and population B are equal. The other idea is that they're not equal. Those two ideas, or hypotheses, are mutually exclusive, and no other hypothesis is possible. What's the role of these two ideas? The alternative hypothesis states the conclusion that we would like to claim. It represents the populations, and it will require sufficient evidence in the data to overthrow the other hypothesis. The other one is called the null hypothesis.
It states the opposing conclusion that must be overcome by strong evidence. It serves as a reference for the comparison, and it is assumed to be true. Now, currently, there's somewhat of a misunderstanding in hypothesis testing, in that people only think about the comparison in one direction. That's because, historically, this is the way it was presented in training: the most often taught test is used to demonstrate that there's a difference between two populations. The resulting lack of understanding can lead to a misuse of these tests. To be clear, the choice of the test is not a matter of what data is collected or how the data is collected. It's entirely about the stated hypotheses for the purpose of your comparison.

Let's look at these two possibilities. Let's say that the goal of our comparison is to demonstrate a difference. We want to explicitly state these ideas to make sure the test is clear. In the first example, I want to demonstrate that a temperature change causes a new outcome; that is, there is a difference. We'd like to claim that a new level of the response will result from a change in the process temperature; perhaps we expect a higher yield, or we'd like to show more stability. A designed experiment is used to randomly sample from the population for a low-temperature condition and from the population for a high-temperature condition. For the two hypotheses of this test, the null hypothesis states that the temperature does not affect the outcome. Remember, it's our reference, and we assume it to be true. The alternative hypothesis is our idea: temperature affects the outcome. But we can decide this only if the evidence is strong enough to reject the null hypothesis.

Now, let's reverse that. Let's say that in this comparison I want to demonstrate equivalence. In the second example, I want to show that a temperature change does not cause a change in the outcome; that is, the outcome either way is equivalent. We want to claim that a planned change in the process temperature will improve the yield but not affect the level of an impurity in the product. We design the same experiment and collect the same samples, but now our hypotheses are switched. The null hypothesis is that the temperature affects the outcome. It's our new reference, and we still assume it to be true. The alternative hypothesis states that the temperature does not affect the outcome, the impurity level. But we can make that claim only if the evidence is strong enough to reject the null.

So, do I test for a difference or for equivalence? The key is how you state your hypotheses. These two examples, as you can see, use identical data but different tests. The choice of the test is not about the data; it's about the claim that we want to make and how we state that properly in the hypotheses. Remember that statistical tests are unidirectional: we can reject the null or not, and a good test rejects the null hypothesis with high probability when it is false.
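To make the two setups concrete, here is one standard way to write the hypotheses for two population means, with a difference test on one side and an equivalence test with margin delta (the two-one-sided-tests formulation) on the other. This notation is ours, added for clarity; it is not taken from the slides.

```latex
\begin{aligned}
\textbf{Difference test:}\quad
  & H_0:\ \mu_A - \mu_B = 0
  && \text{vs.} \quad H_a:\ \mu_A - \mu_B \neq 0 \\[4pt]
\textbf{Equivalence test (margin } \delta\textbf{):}\quad
  & H_0:\ |\mu_A - \mu_B| \ge \delta
  && \text{vs.} \quad H_a:\ |\mu_A - \mu_B| < \delta
\end{aligned}
```

Note how the roles are swapped: in the equivalence test, "the populations differ by at least the margin" is the null hypothesis that the data must overturn.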
Now, let's get to the new features in JMP® 17. Jianfeng will now present the equivalence tests, and when she's finished, I'll present method comparison.

Now, I'm going to share my screen. Hello, my name is Jianfeng Ding. I'm a Research Statistician Developer at JMP. In this video, I'm going to talk about the Equivalence, Noninferiority, and Superiority Tests in JMP® 17. The basic hypothesis test on the left is a test that most quality professionals are familiar with. It is often used to compare two or more groups of data to determine whether they are statistically different. The parameter theta can be a mean response for a continuous outcome or a proportion when the outcome variable is binary. Theta T represents the response from the treatment group, and theta zero represents the response from a control group. There are three types of the basic hypothesis test. The first is a two-sided test, and the rest are one-sided tests. If you look at the two-sided test on the left, the null hypothesis is that the treatment means are the same, and the alternative hypothesis is that the treatment means are different.

Sometimes we really need to establish that things are substantially the same, and the machinery to do that is called an equivalence test. An equivalence test aims to show that the difference between theta T and theta zero is within a pre-specified margin delta, which allows us to conclude equivalence with a specified confidence level. If you look at the equivalence test, the null hypothesis is that the treatment means are different, and the alternative hypothesis is that the treatment means are within a fixed delta of one another. This is different from the two-sided hypothesis test on the left. Another alternative testing scenario is the noninferiority test, which aims to demonstrate that results are not substantially worse. There is also a testing scenario called superiority testing, which is similar to noninferiority testing except that the goal is to demonstrate that results are substantially better. There are five different types of equivalence-type tests, depending on the situation; when to use each of them will be discussed next.

These tests are very important in industry, especially in the biotech and pharmaceutical industries. Here are some examples. If the goal is to show that the new treatment does not differ significantly from the standard by more than some small margin, then an equivalence test should be used. For example, for a generic drug that is less expensive and causes fewer side effects than a popular name-brand drug, you would like to prove it has the same efficacy as the name-brand one. The typical goal in noninferiority testing is to conclude that a new treatment, process, or product is not significantly worse than the standard one. For example, if a new manufacturing process is faster, you would want to make sure it creates no more product defects than the standard process. A superiority test tries to prove that the new treatment is substantially better than the standard one.
For example, a new fertilizer has been developed with several improvements, and the researchers want to show that the new fertilizer is better than the current fertilizer. How do you set up the hypotheses? The graph on the left summarizes the five different types of equivalence-style tests very nicely. This graph was created by our SAS/STAT® colleagues, John Castelloe and Donna Watts; you can find their white paper on this on the web. Which test to choose depends on the situation. For each situation, the region that we are trying to establish with the test is shown in blue. For an equivalence analysis, you construct an equivalence region with upper bound theta zero plus delta and lower bound theta zero minus delta. You conduct the equivalence test by checking whether the confidence interval for theta lies entirely in the blue equivalence region. Likewise, you conduct a noninferiority test by checking whether the confidence interval for theta lies entirely above the lower bound if a larger theta is better, or below the upper bound if a smaller theta is better. These tests are available in JMP® 17 in Oneway for comparing normal means and in Contingency for comparing response rates. The graphical user interface of the equivalence test launch dialog makes it easy to find the type of test that corresponds to what you are trying to establish. A forest plot in the report summarizes the comparison very nicely and makes it easy to interpret the results.

Next, I'm going to do a demo of the equivalence, superiority, and noninferiority tests. I'm going to use a data set called Drug Measurement that is in the JMP sample data library. Twelve different subjects were given three different drugs, A, B, and C, and continuous measurements were made. I first launch Fit Y by X and assign the measurement as Y and the drug type as the X factor. This brings up the Oneway analysis. Under the red triangle menu, let's first find Equivalence Test; there are two options, Means and Standard Deviations. We're going to focus on Means for this example. This brings up the equivalence test launch dialog. In this section, you can choose which test you would like to conduct, and the graph represents the choice of the selected test. For the superiority and noninferiority tests, there are two scenarios: one where a larger difference is better, and one where a smaller difference is better. Which one to choose depends on the situation. You need to specify the delta, or margin, for the test, and the alpha level for the test. You can choose either pooled or unequal variances to run the test. For this example, we run the equivalence test first and specify three as the margin. We are going to run the equivalence test for all the pairs. Click OK, and it brings up the equivalence test results. At the top is the statistical detail for the equivalence test, and at the bottom is a forest plot. You notice there are two regions.
The blue region is the equivalence region, and the red regions are the non-equivalence regions. The lines here represent the confidence intervals of the mean differences between pairs of groups. If we look at this line, it is the confidence interval of the mean difference between drugs A and C, and you can see it is completely contained inside the blue region. Looking at the p-value of the equivalence test, it is 0.02, which is smaller than 0.05, so at the 5% significance level we can declare that drugs A and C are equivalent. But when you look at the confidence intervals of the mean differences between drugs A and B and between drugs B and C, they extend beyond the blue region, so at the 5% significance level we cannot conclude that A and B, or B and C, are equivalent.

Next, if we assume drug C is the standard drug and we would like to find out whether drugs A and B are better than drug C, we want to do a superiority test. Let me close this outline node for now, launch the equivalence test again, and click Means. This time we're going to run a superiority test. We select the superiority test, indicate that a larger difference is better, and specify 0.4 as our margin. This time we need to set drug C as our control group and click OK. This brings up the superiority test. From the forest plot you can see that the confidence interval of the mean difference between drugs B and C is completely contained inside the blue region, so at the 5% significance level we can declare that drug B is superior to drug C. But we cannot make the same conclusion for drugs A and C. This concludes my first example.

The next example shows how to conduct a noninferiority test for the relative risk between two proportions. Let me open the data table. A randomized trial compared drug FIDAX as an alternative to drug VENCO for the treatment of colon infections. The two drugs have similar efficacy and safety. Two hundred twenty-one out of 225 patients treated with FIDAX achieved clinical cure by the end of the study, compared to 223 out of 257 patients treated with VENCO. We launch Fit Y by X again, assign cure as Y, drug as the X factor, and the count as the frequency, and click OK. This brings up the contingency analysis. The Likelihood Ratio test, Pearson test, and Fisher's Exact test all indicate that there is no significant difference between the two drugs. But we would like to find out whether drug FIDAX is not inferior to drug VENCO. We go to the top red triangle and bring up the equivalence test. There are two options: one is the risk difference and one is the relative risk. For this example, we choose relative risk. This brings up the equivalence test launch dialog again. For this example, we're going to run a noninferiority test; a larger ratio is better, we specify 0.9 as our margin, and because we care about the treatment effect, we choose Yes and click OK. This brings up the noninferiority test.
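As an aside before the forest-plot interpretation that follows, here is a rough back-of-the-envelope check of that relative-risk comparison using the counts quoted above. It uses a log relative-risk normal approximation and a common (not universal) convention of a 90% two-sided interval for a one-sided test at the 5% level; JMP's calculation may differ in detail.

```python
import numpy as np
from scipy import stats

# Counts quoted in the example: cures / patients for each drug
cured_fidax, n_fidax = 221, 225
cured_venco, n_venco = 223, 257

p1, p2 = cured_fidax / n_fidax, cured_venco / n_venco
rr = p1 / p2                                   # relative risk, FIDAX vs VENCO

# Approximate CI on log(RR); the standard error uses the usual delta-method form
se_log_rr = np.sqrt((1 - p1) / cured_fidax + (1 - p2) / cured_venco)
z = stats.norm.ppf(0.95)
lo, hi = np.exp(np.log(rr) - z * se_log_rr), np.exp(np.log(rr) + z * se_log_rr)

margin = 0.9                                    # "larger ratio is better"
print(f"RR = {rr:.3f}, approx CI = ({lo:.3f}, {hi:.3f})")
print("Noninferior at the chosen margin:", lo > margin)
```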
From the forest plot, we can see that the confidence interval of the relative risk is completely contained in the blue region, and the p-value for the noninferiority test is very small. So we conclude, at the 5% significance level, that drug FIDAX is not inferior to drug VENCO. This concludes my part, and I will hand it back to Mark. I need to stop this share.

Thank you, Jianfeng. In the last part of our presentation, I'm going to talk about another comparison, where we want to compare the results or measurements from two different methods of measuring some quantity. We assume that there is a standard method that already exists and has been validated. We can use it to measure the level of some quantity; that might be a temperature or the potency of a drug. But for some reason we have developed a new method for the same result, and we must compare its performance to the standard method before we use it. This is a long-standing issue; the comparison has been codified for a long time by numerous international organizations, a few of which I've listed on this slide. So this is a very well-studied and established comparison.

In this case, we're going to compare to identity. We compare the two methods, where ideally the test method would give us the same value as the standard method. We plot the data using a scatterplot, with the test method on the vertical axis and the standard method result on the horizontal axis. We can even plot the identity line, where Y equals X, for reference. Ideally, we would get the same result from both methods, but that won't happen because of measurement error in both methods. We'll use regression analysis to determine the best-fit line for the test method versus the standard method. Then the estimated parameters of our model can be compared to the identity line. For the null hypothesis, we start with the idea that they are not the same: the intercept of the line is not zero, or the slope is not one, or possibly both. In other words, the results are not equivalent. The alternative, which we would like to claim, is that they are equivalent, so there we state that the intercept is zero and the slope is one.

To make this comparison using regression, we have to postulate a model. In this case, it's a simple linear regression model: a constant term A, plus a proportional term B times X. We estimate the parameters A and B and use our hypotheses to decide. We also have a term epsilon, which represents the measurement error, the random variation. Using linear regression, we assume that Y and X are linearly related. We assume that the statistical errors, the epsilon, are in Y, not in X. We also assume that those errors are distributed in a way that is independent of the response; in other words, the statistical error is the same across the entire range of the method. Finally, we assume that no data points exert excessive influence on the estimates.
Well, in method comparison we usually violate these assumptions. First of all, there is measurement error in the standard method as well. Also, the errors are often not constant; that is, we observe a constant coefficient of variation but not a constant standard deviation. Outliers may also be present that can strongly influence the estimation. Other regression methods are required in such cases. Deming regression simultaneously minimizes the least-squares error in both Y and X, which is appropriate here. Passing-Bablok regression is a nonparametric method based on the median of all possible pairwise slopes; it is resistant to outliers and nonconstant errors. Let's talk about each of these briefly.

Deming regression is provided in the Bivariate platform through the Fit Orthogonal command and has been available in JMP for many years. Deming regression can estimate the errors in Y and X, assume that the errors in Y and X are equal, or use a given ratio of Y to X error. Passing-Bablok regression is new: JMP® 17 introduces this method in the Bivariate platform through the Fit Passing-Bablok command. This command also includes checks for the assumptions that the measurements are highly positively correlated and exhibit a linear relationship. Method comparison often includes a comparison by difference as well. The Bland-Altman analysis compares the pairwise differences, as Y, to the pairwise means, as X, to assess bias between the two values. The results are presented in a scatterplot of Y versus X for your examination and to identify any anomalies. This occurs in the Matched Pairs platform, which has been part of JMP for many years as well, but the Bland-Altman test is a new addition. The report also presents the hypothesis test.

Now I'd like to demonstrate these methods. As I said, Deming regression has been available for a long time, but for completeness' sake I'm going to demonstrate it here alongside the new methods. I select Fit Y by X from the Analyze menu. I'm going to compare test Method 1 to my standard method: the standard method goes in the X role and the new test method goes in the Y role. You could evaluate more than one test method at the same time. Here's my plot. Initially, I see a scatterplot. I expect these two methods to agree very well, so I expect the points to follow the diagonal, to be linear, and so forth. I'll click the red triangle next to Bivariate and select Fit Orthogonal. In this case, I don't know that the variances are equal, and I don't have any prior information from which to specify a ratio, so I'll have JMP estimate the errors in both; I'll use the first option here. Now we have the fitted line using Deming regression, and below it the report. The report includes an estimate of the intercept, which is small and close to zero, and an estimate of the slope.
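The interpretation of that slope estimate continues below. As a sketch of what an orthogonal (Deming) fit is doing, here is the textbook closed-form Deming regression for a known, or assumed, ratio of error variances; JMP's option to estimate the two error variances from the data is more involved and is not reproduced here. The data are simulated.

```python
import numpy as np

def deming(x, y, var_ratio=1.0):
    """Deming regression of y on x.

    var_ratio is the assumed ratio of the y measurement-error variance to the
    x measurement-error variance (1.0 means the errors are assumed equal).
    Returns (intercept, slope).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxx = np.var(x, ddof=1)
    syy = np.var(y, ddof=1)
    sxy = np.cov(x, y)[0, 1]
    slope = (syy - var_ratio * sxx +
             np.sqrt((syy - var_ratio * sxx) ** 2 + 4 * var_ratio * sxy ** 2)
             ) / (2 * sxy)
    intercept = y.mean() - slope * x.mean()
    return intercept, slope

# Hypothetical paired readings: standard method (x) vs test method (y)
rng = np.random.default_rng(7)
truth = rng.uniform(5, 50, 40)
std_method = truth + rng.normal(0, 1.0, truth.size)
test_method = truth + rng.normal(0, 1.0, truth.size)
b0, b1 = deming(std_method, test_method)
print(f"intercept = {b0:.3f} (ideally 0), slope = {b1:.3f} (ideally 1)")
```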
Using the confidence interval, we see that it includes one, which we would expect if the test method agrees with the standard. That's Deming regression. Now we're going to take a look at the new methods, Passing-Bablok regression and Bland-Altman. Same start: select Analyze, then Fit Y by X. I'm going to use the Recall button here. I want to compare Method 1 to the standard, but with the new regression technique, so I'll click on the red triangle and select Fit Passing-Bablok. There are actually two lines here: a red line that represents the best-fit line from the Passing-Bablok regression, and, for reference, a line that represents where Y equals X. It's hard to see, so I'm going to use the magnifier tool to zoom in a few times. Now you can see that there are in fact two separate lines, one for the identity and one for the fit, but they overlap quite a bit; they are quite similar.

In the numerical reports, first I have a test for high positive correlation using Kendall's Tau, and we can see that it is highly significant: we reject the idea that the methods are not strongly correlated. Next, we have a test of linearity. This test assumes that they are linear, and we are looking for strong evidence against that; but we have a very high p-value here, so we do not reject the assumption of linearity. Finally, we have the parameter estimates from the Passing-Bablok regression, with point estimates and interval estimates. For the intercept, the interval includes zero, so we cannot reject an intercept of zero. The slope estimate is contained within an interval that includes one, so similarly we cannot reject a slope of one.

Let's say we'd also like to compare these two methods by difference. To do that, I click on the red triangle for the options of the Fit Passing-Bablok results, and here we see the Bland-Altman analysis command. It takes all the information here and launches Matched Pairs with the additional information for the Bland-Altman analysis. The plot shows, on the Y axis, the pairwise difference between Method 1 and the standard method, plotted against the mean of the two values on the horizontal axis. The Bland-Altman analysis is helpful because it gives us an idea about the bias. Here the estimate of the bias is -0.113, but we can see that the interval estimate of the bias includes zero, so we cannot reject the idea that the bias is equal to zero, and so on. So now we have, in JMP® 17, a much more complete set of tools for comparing different test methods. That concludes our presentation. Thank you.
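Before moving on to the next poster, here is a minimal Bland-Altman sketch under the usual conventions (bias = mean of the pairwise differences; 95% limits of agreement = bias ± 1.96 standard deviations of the differences). The paired readings are simulated and the function name is just illustrative.

```python
import numpy as np

def bland_altman(test, standard):
    """Bland-Altman summary for two paired measurement methods."""
    test, standard = np.asarray(test, float), np.asarray(standard, float)
    diffs = test - standard            # pairwise differences (y axis)
    means = (test + standard) / 2      # pairwise means (x axis)
    bias = diffs.mean()
    sd = diffs.std(ddof=1)
    loa = (bias - 1.96 * sd, bias + 1.96 * sd)   # 95% limits of agreement
    return means, diffs, bias, loa

# Hypothetical paired readings
rng = np.random.default_rng(3)
std = rng.uniform(5, 50, 30)
new = std + rng.normal(-0.1, 0.8, 30)
_, _, bias, (lo, hi) = bland_altman(new, std)
print(f"bias = {bias:.3f}, 95% limits of agreement = ({lo:.3f}, {hi:.3f})")
```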
Measurement System Analysis is a methodical approach for identifying and managing the sources of variation that can influence the measurement system. At the top level, this type of analysis enables the quantification of the measurement system variation (MV) present in the Total Observed Variation (TV) of a process or system and separates it from the Process (part-to-part) Variation (PV). The measurement variation can be broken down further into Precision and Accuracy. For the Precision component, we follow a sequential method for continuous data to determine the adequacy of a measurement system: a Type 1 Gauge Study examines the accuracy and consistency of the measurement device, and a Full Gauge R&R Study explores the Repeatability and Reproducibility of the entire measurement system. This poster demonstrates the practical application of these tools in JMP 17, as well as the new MSA Design tool that supports the initial phase of measurement system analysis (the data collection plan).

Attached are the JMP journal and JMP data tables (including reports) presented in the video. Hope you enjoy!

Welcome to this poster presentation on the applications of the MSA platform tools in JMP 17. Before we go into it, I'd like to give a brief description of Measurement System Analysis, MSA for short. When we look at the total observed variation in a process, we use measurement system analysis to try to identify and manage the sources of variation that can influence the measurement system being used: a combination of measurement devices, people, procedures, standards, and so on. We can decompose that total observed variation into two components: the process component, sometimes called part-to-part variation, and the measurement system variation component, which is what we're really interested in with measurement system analysis.

The measurement error associated with this measurement system variation can be broken down into further components, precision and accuracy. For the purposes of this poster, I will concentrate on the tools that let us identify sources of variation within precision, specifically repeatability, and touch briefly on the bias component under accuracy.

When doing measurement system analysis, particularly for continuous data, there are several methods involved. We start by examining the accuracy and consistency of the measurement device alone using a technique called a Type 1 Gauge study, sometimes described as analyzing the pure repeatability of the system. We have one single part, one single device, and, if the measurement system requires manual intervention, one operator. The idea is to evaluate the pure repeatability of the system before going into more complex analyses where other sources of variation are part of the measurement system, which is essentially the second step, the Full Gauge R&R, examining both repeatability and reproducibility. Last but not least, there is the continuous gauge linearity and bias study, but that is not covered specifically in this poster.
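In variance terms, the decomposition described above is usually written as follows (standard MSA notation; the symbols are not from the poster itself):

```latex
\[
\sigma^2_{\text{Total}}
  \;=\; \underbrace{\sigma^2_{\text{Part}}}_{\text{process (part-to-part)}}
  \;+\; \underbrace{\sigma^2_{\text{MS}}}_{\text{measurement system}},
\qquad
\sigma^2_{\text{MS}}
  \;=\; \sigma^2_{\text{Repeatability}} \;+\; \sigma^2_{\text{Reproducibility}}
\]
```

Accuracy (bias, linearity) sits alongside this precision decomposition rather than inside it, which is why a Type 1 study reports a bias test separately from the repeatability metrics.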
So let's have a look at what this means in JMP 17. In the new version of JMP, we have a new MSA method, the Type 1 Gauge study, which essentially helps with that initial phase of the analysis, the pure repeatability of the system. I'll show you a quick example of an output report from a Type 1 Gauge study. By default, the report shows a run chart. This looks at 30 repeats of the same part using the same device or equipment, in order, so the timeline really helps us identify any special situations, any measurements that didn't work very well. There is a reference line at the nominal value: if you're using a reference part, for example, we can see whether the average of those measurements is in line with the reference. The mean value can be added to the graph if we want; as you can see, it sits right on top of the reference, and if I remove the reference line you can see the average and the reference are very similar for this example.

In a Type 1 Gauge study we also work with a reference of around 20% of our tolerance: we limit the analysis to only 20% of the total tolerance in order to assess whether the pure repeatability is acceptable. This specification for the Type 1 study can all be configured in the settings of the tool. The report provides some summary and capability statistics: the usual location and spread references, particularly six standard deviations, the number of measurements taken, and the tolerance. The two limits above the reference on the graph mark that 20% of the tolerance, so plus or minus 10%.

If you are used to process capability indices, the capability-of-the-gauge metrics, Cg and Cgk, work in exactly the same way. The biggest difference is that we're looking at the capability of the gauge and assessing its variation against, in this case, 20% of the tolerance; Cg looks at the variation relative to those limits, while Cgk accounts for both variation and location. This gives us a summary of metrics to evaluate the Type 1 Gauge study results. There are also some percentages calculated, such as the percentage of variation due to repeatability. And if you're using a reference part whose nominal value you already know, we can evaluate not only the pure repeatability of the system but also the bias, that is, the difference between the average of the measurements and the reference value.
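The bias test itself is discussed right after this aside. As a rough sketch of the calculations behind the report, here is how Cg, Cgk, and the bias t-test are commonly computed, using the usual convention of 20% of the tolerance and 6 standard deviations (both of which JMP lets you change); the measurement values are made up.

```python
import numpy as np
from scipy import stats

def type1_gauge(measurements, reference, tolerance, k_percent=20, sd_mult=6):
    """Type 1 gauge study summary: Cg, Cgk, and a bias t-test.

    Compares sd_mult * s of the gauge against k_percent of the tolerance,
    following the common 20% / 6-sigma convention.
    """
    m = np.asarray(measurements, float)
    xbar, s, n = m.mean(), m.std(ddof=1), m.size
    bias = xbar - reference
    cg = (k_percent / 100 * tolerance) / (sd_mult * s)
    cgk = (k_percent / 200 * tolerance - abs(bias)) / (sd_mult / 2 * s)
    # Test H0: bias = 0 (the average equals the reference value)
    t = bias / (s / np.sqrt(n))
    p = 2 * stats.t.sf(abs(t), df=n - 1)
    return cg, cgk, bias, p

# 30 repeat measurements of one reference part (hypothetical values)
rng = np.random.default_rng(5)
reads = rng.normal(10.02, 0.03, 30)
cg, cgk, bias, p = type1_gauge(reads, reference=10.0, tolerance=1.0)
print(f"Cg = {cg:.2f}, Cgk = {cgk:.2f}, bias = {bias:.4f}, bias p-value = {p:.3f}")
```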
That bias check can be added as an additional test if we want. It is a hypothesis test of whether the bias is equal to zero, that is, whether the average and the reference value are the same or very close to each other. As you can see from the reported p-value, in this case there is no statistically significant difference between the average and the reference value. Another useful visualization within this tool is the histogram, where we can look at the distribution of the measurements taken, in this case those 30 measurements of the reference; that can be customized in the report as we go along. So, very quickly, we now have a great tool to start our measurement system analysis process, with the Type 1 Gauge study as part of the MSA platform in JMP.

Sometimes we also add a second step before going into a Full Gauge R&R assessment of both repeatability and [inaudible 00:07:43], which can be called a Type 2 study. We have an example of that here; the only difference is that in between the 30 measurements we removed the part from its holding fixture before each measurement. As you would expect, by adding that intermediate step between measurements we expect more variation, and that is what we see here: not only is there more variation, but the average of the readings is also much lower than its target location. Turning on the bias test, we can see that, compared with the Type 1 study where the part was fixed and measured 30 times consecutively, we now have a low p-value showing a significant difference between the reference value and the average. This can be built up in several increments, adding sources of variation, even before we bring multiple parts, operators, or pieces of equipment into the analysis.

When we do, JMP already had a Gauge R&R study tool in previous versions. Here is a quick example of what that means in terms of the variability of the gauge; in this case we use the Gauge R&R method. If you go to Analyze, Quality and Process, there's an updated version of the Measurement Systems Analysis platform. Under MSA Method, you can see we now have the Type 1 Gauge study, which I used for both the Type 1 and Type 2 examples; the only difference is that on the output report for the Type 2 I edited the title and called it Type 2 to differentiate the two reports. For the Full Gauge R&R, or the variability analysis, I used the Gauge R&R method, and this is where we can also choose the type of model; normally we use Crossed, so all the effects are crossed with each other in the analysis, and some additional options are also available.
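The walk-through of the JMP report continues below. As a rough outside reference, here is the textbook ANOVA decomposition for a crossed Gauge R&R study (parts × operators with replicates); JMP's defaults will not necessarily match these hand calculations exactly, and the study data here are simulated.

```python
import numpy as np

def gauge_rr_crossed(data):
    """ANOVA-based Gauge R&R for a crossed design.

    data: array of shape (parts, operators, replicates).
    Returns variance components (negative estimates clipped to zero).
    """
    data = np.asarray(data, float)
    p, o, r = data.shape
    grand = data.mean()
    part_means = data.mean(axis=(1, 2))
    oper_means = data.mean(axis=(0, 2))
    cell_means = data.mean(axis=2)

    ms_part = o * r * np.sum((part_means - grand) ** 2) / (p - 1)
    ms_oper = p * r * np.sum((oper_means - grand) ** 2) / (o - 1)
    ms_int = r * np.sum((cell_means - part_means[:, None]
                         - oper_means[None, :] + grand) ** 2) / ((p - 1) * (o - 1))
    ms_err = np.sum((data - cell_means[..., None]) ** 2) / (p * o * (r - 1))

    comps = {
        "repeatability": ms_err,
        "interaction": max((ms_int - ms_err) / r, 0.0),
        "operator": max((ms_oper - ms_int) / (p * r), 0.0),
        "part": max((ms_part - ms_int) / (o * r), 0.0),
    }
    comps["reproducibility"] = comps["operator"] + comps["interaction"]
    comps["gauge R&R"] = comps["repeatability"] + comps["reproducibility"]
    return comps

# Hypothetical crossed study: 10 parts x 3 operators x 3 repeats
rng = np.random.default_rng(11)
parts = rng.normal(0, 2.0, (10, 1, 1))
opers = rng.normal(0, 0.3, (1, 3, 1))
y = 50 + parts + opers + rng.normal(0, 0.5, (10, 3, 3))
for name, var in gauge_rr_crossed(y).items():
    print(f"{name:15s} {var:.4f}")
```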
In this report I've customized a few things; in this case I've added some specification limits. Here, essentially, we're still not looking at the reproducibility of the system; it's just another increment in which we evaluate repeatability, now using not one but 10 parts and evaluating the variation across five repeats of each part. As you can see in the Gauge R&R report and table, the reproducibility component is zero because no additional equipment or operators are being evaluated in this analysis, so all of the variation in this study is due to repeatability. This is the traditional output you get from the Gauge R&R method inside JMP, for reference.

To finish up the new tools in JMP 17 for the MSA platform, I would also like to highlight that, as part of the planning phase for any Gauge R&R study, it's important to understand not only the method being used but also what a good data collection plan looks like. Under DOE, Special Purpose, we now have a new tool called MSA Design, which lets you add factors like parts and operators. If I quickly add three factors there, I can identify the MSA role for each one, for example. This is a great opportunity during the planning phase to come up with a design that will help with data collection even before any analysis is done. For more information about how to use the MSA Design feature, you can follow this link, which takes you to the JMP user community video where Hyde Miller, JMP systems engineer, has provided more information about this tool. Hope this was useful for you. Thank you very much.
The structural equation models (SEM) platform continues to grow and evolve into a more complete and powerful platform. An important feature added to SEM in JMP® Pro 17 is multiple-group analysis (MGA). MGA allows users to test for differences in parameters across populations by enabling the specification of models that can have group-specific estimates or equality constraints on parameters across groups. In this presentation, we will demonstrate the use of MGA and other new features in SEM using real data examples. We start with a simple regression example and then turn to a longitudinal analysis example that showcases the flexibility of MGA. Lastly, we show how survey development can be expedited by a new feature that links Exploratory Factor Analysis to the SEM platform.

Hi, everyone. I'm Laura Castro-Schilo, a senior research statistician developer working on the Structural Equation Models platform. I'm really excited today to show you some of the new features that we have in JMP Pro 17. One of the big ones is going to allow us to explore group differences, and we're going to talk about that a lot today. Our plan is, hopefully, to spend most of the time in a demo really showing you those new features. But very briefly, before we get into that, I want to remind you what Structural Equation Modeling is and why you might want to use it, and then, through the demo, show you how to use it. Our presentation today is not very long, so I also want to share some additional resources from previous Discovery presentations and developer tutorials where you'll learn in much more detail how to use SEM.

The new features we're going to cover: first multiple group analysis, and then some improvements for longitudinal modeling within SEM and for survey development. After we go over those, we'll go straight into the demo and show how all of them are used.

Structural Equation Modeling is a very general analysis framework for investigating the associations between variables. That is a very broad definition, and purposefully so, because Structural Equation Modeling is a very broad technique within which a number of different models can be fit. Here I've listed a few of the models that you could fit within SEM, but this is not an exhaustive list; it really is a very flexible framework. A natural question is: if I can do some of these analyses somewhere else, why would I want to use SEM? Sometimes you might not need to, and that would be just fine. But there are some circumstances in which SEM is particularly helpful, and I've listed here what those circumstances might be. The first is when you are interested in understanding the mechanisms by which things happen. This is a circumstance where SEM can be very useful, because understanding mechanisms often means you have variables that are both predictors and outcomes.
Yet not many statistical techniques allow you to specify models where a variable can be both a predictor and an outcome. That is something very natural in SEM, so if this is something you're working on, SEM could be very helpful for you. You really want to leverage your domain expertise when using SEM, because in order to specify your models you need to think about your theories and what you know about your data. You come up with those theories, translate them into a testable model, and then when you fit your models you see whether or not there is support for those ideas.

A very important use case for SEM is when you're working with variables that cannot be measured directly. Latent variables are very important in a number of domains, for example if you're interested in customer satisfaction or the quality of a product, or in the social sciences, where there are so many latent variables; personality and intelligence are the cliché examples, but really, latent variables are all over the place. If you work with them, if you have research questions that involve latent variables, then you're really going to benefit from using SEM.

A somewhat related reason to use SEM is when your variables have measurement error and you want to account for it; SEM can be very helpful there too. I say this is related to latent variables because the way we account for measurement error is by specifying latent variables in SEM. Measurement error can sometimes have unexpected consequences on our inferences, so it can be quite useful to account for it.

Another benefit of SEM, and this one is very practical, concerns missing data, which of course are everywhere. The most popular estimation algorithm for SEM handles missing data in a seamless fashion, so the user doesn't really need to do anything; missing data are handled with a cutting-edge algorithm and you don't have to worry about it as much. If you have missing data, I sometimes tell people that even for a simple linear regression you can benefit from using SEM, just because missing data are handled so easily.

Lastly, path diagrams are a critical tool for Structural Equation Models. These diagrams are very helpful because even the most complex statistical models can be conveyed in an intuitive fashion by relying on them. In JMP, we use the diagrams to facilitate the specification of our models, but also to convey the results, and they can be very helpful when you're presenting your results to any type of audience.

All right, so this is just a brief list of why you might want to use SEM. I do want to share a link to a presentation that I gave along with James Cuffler, who is also in JMP development.
We did a developer tutorial where we went into much more depth about the reasons you might want to use SEM. If you want to check that out, the link is here; if you download the slides from the community, you don't have to type this long link, you can just click on it and watch that video.

All right, so, how to use SEM. Again, I'm going to go into a demo and show you how to use SEM, but my demo is not going to be a tutorial-style presentation, mostly because of time constraints; we want to keep this short and sweet. What I want to do in this slide is share additional video presentations where you can learn, in tutorial form, how to use Structural Equation Models for a few different case studies. This first video is a link where I covered how to model survey data and latent variables; we cover things like confirmatory factor analysis and path analysis with and without latent variables. If you have longitudinal data, this next video can be quite helpful; there I went over how to fit latent growth curve models and how to interpret the results. We'll do a little bit of longitudinal modeling here today in the demo, but we won't be able to go into the details in a tutorial way, so I definitely encourage you to watch that if you're interested. And if you don't have prior SEM experience, I very much encourage you to watch this other video where James Cuffler and I talked about building Structural Equation Models in JMP Pro. That one is very introductory, so you might want to start with it before going to the others.

Okay, so now it's time for a little overview of the new features in JMP Pro 17. Multiple group analysis is a feature I've been really looking forward to presenting, because it extends all of the models that can be fit within SEM. It does so by allowing us to investigate similarities and differences across subpopulations, which we do by incorporating a grouping variable into the analysis. The most popular multiple group analysis examples usually show a grouping variable with few levels; demographic variables are used very often, and indeed, in the demo I'm also going to use a simple demographic variable. But really, there's no limit to how many levels you can have. What matters is how many observations you have for each level: you want a reasonably good sample size for each of those subgroups.

Now, there is a general strategy for the analysis. We're going to see this in practice, but I want you to start thinking about how it works, and it's actually quite simple. In multiple group analysis we fit two models, one of which is a more restricted version of the other.
Once we fit both of them, we can do a likelihood ratio test, a chi-square difference test, to make an inference about whether the restrictions we imposed in one of the models are in fact tenable. This is how we figure out whether there are statistically significant differences across groups. Again, we'll see that play out in the demo.

In terms of longitudinal data analysis, we've made it a lot easier to interpret the results of your models by looking at the model-implied trajectories through a new predicted values plot. We've also made it much easier to specify multivariate growth curves. If you're familiar with these models, they allow you to investigate the association of multiple processes over time; they can be very helpful, but specifying them used to be a little tedious. Now that is very easy and fast through the use of model shortcuts. For some advanced applications, we've also made it easier to define an independence model based on what users want the independence model to be.

There are also some improvements for surveys, mostly focused on streamlining the workflow for developing them. Usually the analytic workflow starts with exploratory factor analysis, and then you take those results and confirm them with an independent sample using confirmatory factor analysis in SEM. With the help of Jianfeng Ding, who is the developer for Exploratory Factor Analysis, we have now linked the two platforms: you can copy the model specification from exploratory factor analysis and paste it into SEM so that you can easily and quickly confirm your results. We also have a new shortcut for switching the scale of your latent variables, which is sometimes helpful for specifying models when you're developing surveys, and a number of new heat maps that make it easier to interpret the results of your analysis.

Last but not least, our platform has always been very fast, but in this release Chris Gotwalt put a lot of great effort toward improving the performance of our internal algorithms even further. If you have lots of variables or lots of data, definitely give it a shot; I am very impressed and excited about what we have to offer in terms of the performance of the platform as well.

Okay, so it's time for the demo. I have a journal, and hopefully we have enough time to work through three examples. The first one, perhaps not surprisingly, uses our Big Class data table. It's going to be a very simple example just to introduce the notions behind multiple group analysis. We have two variables, height and weight, and what I want to do is investigate the association between these two variables by sex.
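As a point of comparison only: outside of SEM, this kind of single-equation group-difference question is often checked with an interaction term in ordinary regression. The sketch below is not how the SEM platform works; it just shows the analogous test, with a small hypothetical data frame whose column names mirror Big Class.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data with the same structure as Big Class (height, weight, sex)
df = pd.DataFrame({
    "height": [61, 65, 63, 70, 68, 64, 66, 72, 62, 69],
    "weight": [95, 120, 105, 150, 140, 110, 115, 160, 98, 145],
    "sex":    ["F", "F", "F", "M", "M", "F", "M", "M", "F", "M"],
})

# The height:C(sex) interaction coefficient tests whether the slope differs by group
fit = smf.ols("weight ~ height * C(sex)", data=df).fit()
print(fit.summary().tables[1])
```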
I'm going to go to the Analyze menu, down to Multivariate Methods, and then Structural Equation Models. I select both of those variables and click Model Variables. Now, the brand-new multiple group analysis feature can be found in this launch dialog under the Groups button. This button is new, and it lets us select our grouping variable and click Groups in order to use it as the grouping variable. We're going to look at males and females and whether they differ in the association between height and weight. We click OK.

This is the platform. If you have seen our platform before, it looks very similar, with the exception of these new tabs right here. The tabs tell us about the different groups in our analysis; in this case there are only two levels of our grouping variable, so we have a tab for the females and a tab for the males. One thing you'll notice is that the path diagrams already contain a default model for each group, and those default models are the same; that's why nothing changes when I switch tabs. The Union tab, as the name implies, shows what is common across all the levels of our grouping variable, which is why its diagram also looks the same.

To specify a simple linear regression in SEM, I select height in the From list and weight in the To list, and link the two variables with a one-headed arrow, which adds that regression path to my model. This is just a simple linear regression where height predicts weight. Sometimes I like to right-click on the canvas of the path diagram and choose Customize Diagram to make the nodes a little larger, because especially when the diagrams are small that looks a lot nicer.

Notice that because I did my model specification under the Union tab, both the females and the males inherited the same changes to the model. If I make any changes within a group-specific tab, those changes apply only to that group. In this case, what I want to do first is fit an initial model where males and females each get their own estimates for this linear regression. Keep in mind that the estimation of this model is all done simultaneously; we are not fitting the model separately for females and males. Everything is estimated at once, but I can still allow each group to have its own estimates. I click Run, and we see a model comparison table where we can learn a lot about the fit of the model. Something new in the report is that we have tabs for each of our groups, one for the females and one for the males.
Now, if you focus on the regression coefficient, I can go back and forth between the tabs and see that I do, in fact, have a different estimate for that coefficient in each group. The coefficients look different, but I don't yet have a formal statistical test that tells me whether that difference in the association is statistically significant. At any rate, the males here have a value of about 3.4, and the females have a slightly larger value. What we really want is to fit a second model where we force an equality constraint on that parameter estimate, and then use it to compare against this model. Let's go ahead and do that.

On the Union tab, I select that regression path and click the Set Equal button. This brings up a dialog asking me to confirm that I want to apply this equality constraint across all of my groups, which I do, so I click OK. Notice that a new label has been placed on the edge, and if I look at the female tab and the male tab, that label shows up on the same edge in both. That is our way of conveying to you, the user, that the same parameter estimate will be used to describe that association in both groups.

Okay, let's go back to the model name; we'll change it to "regression effect is equal", since we forced that parameter to be equal in this model, and click Run. Again, we can look at the model comparison table to examine the fit of the different models. I can select the two models I just fit, and because one of them is a restricted version of the other, we say the models are nested, and we can do a likelihood ratio test. That is done very easily in our platform simply by selecting the two models and clicking Compare Selected Models. We obtain the difference in chi-square, which represents the change in the misfit of the model, along with the difference in degrees of freedom between the two models and the differences in fit according to some of the most popular fit statistics in SEM.

According to this test, the change in chi-square, the increase in misfit, is not statistically significant. So if we use just this chi-square difference test, we come to the conclusion that even though those two estimates look different, they are not statistically different. Now we can go back down to the tabbed results and see that the regression coefficient is the same across the tabs. We could then say that there is no difference between males and females in terms of how height predicts weight. This is a very simple example of how we can use equality constraints across groups to test a specific hypothesis.
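The Compare Selected Models step boils down to a chi-square difference (likelihood ratio) test. Here is a tiny sketch of that arithmetic, with placeholder chi-square and degrees-of-freedom values rather than the ones from this report:

```python
from scipy import stats

def chisq_difference_test(chisq_restricted, df_restricted, chisq_free, df_free):
    """Compare a restricted (constrained) model against a freely estimated one.

    If the restriction holds, the difference in chi-square is itself chi-square
    distributed with df equal to the difference in degrees of freedom.
    """
    d_chisq = chisq_restricted - chisq_free
    d_df = df_restricted - df_free
    p = stats.chi2.sf(d_chisq, d_df)
    return d_chisq, d_df, p

# Placeholder values, just to show the mechanics
d_chisq, d_df, p = chisq_difference_test(12.8, 6, 9.3, 5)
print(f"delta chi-square = {d_chisq:.2f} on {d_df} df, p = {p:.3f}")
```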
Now, as you can imagine, I could go back into my model specification and also put equality constraints on the variance of height and on the residual variance of weight. If testing those differences is of interest to me, this framework allows me to do that. A lot of times you're going to have more complicated models, well beyond linear regression, or more levels of your grouping variable, and that's totally fine. This is a simple example that hopefully lets you see how you could extend this to a more complicated setting. Okay, so that is this example.

I want to move on to an example that uses longitudinal data. We're not going to move away from multiple group analysis entirely; we're going to highlight some of the longitudinal analysis improvements and then bring back the notion of multiple group analysis. For this example, imagine we have a data table with data from students who have taken an academic achievement test for four consecutive years. What we really want to find out from these data is how students' achievement develops over time, and whether males and females differ in their trajectories. Those are the two questions we're going to focus on for this example.

There is a sample data table in our sample data folder called Academic Achievement that you can use to follow along. In these data we have 100 rows, each representing a different student who took the test, and these four columns represent the scores on that multiple-choice test taken four years in a row. Those are the data I'm going to use to fit a longitudinal model.

I go to the Analyze menu, Multivariate Methods, Structural Equation Models, select those four variables, click Model Variables, and click OK. Remember, the first question was: how does students' academic achievement develop over time? We want to characterize that growth, or figure out whether there is growth at all. We have our model shortcuts at the bottom left, and you can see that under the Longitudinal Analysis menu we have a new option, multivariate latent growth curves, which we'll get to later today. But we have also had a few other options here that make longitudinal modeling very quick and simple. For this example, I'm going to use Fit and Compare Growth Models. When I do that, three different models are fit, and I obtain a chi-square difference test for all of the possible combinations. If I look at the fit indices and the results of those chi-square difference tests, I recognize that the best-fitting model here is the linear growth curve model.
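For reference, the linear latent growth curve selected here is usually written as follows (standard notation; time scores of 0 through 3 for four yearly waves are a typical default rather than something stated in the talk):

```latex
\[
y_{it} \;=\; \eta_{0i} \;+\; \eta_{1i}\,\lambda_t \;+\; \varepsilon_{it},
\qquad \lambda_t \in \{0, 1, 2, 3\},
\]
```

where $\eta_{0i}$ is student $i$'s intercept (initial status) and $\eta_{1i}$ is the yearly slope, with means $(\alpha_0, \alpha_1)$, variances $(\psi_{00}, \psi_{11})$, and covariance $\psi_{01}$ describing the average trajectory and the individual differences around it.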
In other words, it appears that the scores on this academic achievement test over time are best characterized by linear growth. Based on that, I will focus on interpreting the results from the linear growth curve model. I open that up, and recall that one of the new features for longitudinal modeling is a new predicted values plot that lets us interpret the results of our models much more easily. If you're familiar with growth curve models, you know that some of the key parameter estimates are these right here: they tell us, on average, where the students start, how they are changing over time, and how much variability there is in those trajectories.

Under the red triangle menu of this particular model, if I scroll down, I find an option called Predicted Values Plot. When I click it, you'll see that by default we show box plots of the predicted values for all of the outcome variables in the model. When you have longitudinal data, there is a very convenient option that connects the data points and produces a spaghetti plot showing each individual trajectory predicted by the model. It's pretty cool because the plot is linked to the data table, so whatever selections you make on the plot you also see in your data table, which is something you expect from JMP.

In terms of interpreting the results, it's no surprise that these are all straight lines, because we fit a linear model. But you can certainly see that there is a lot of variability in the way these students are changing. Some students start at the top and are still increasing; others start low and actually exhibit a little bit of decline over time. We also see an average trajectory that shows a modest increase over time. So, on average, there is some increase, but there is a lot of variability in how people are changing. Of course, a natural follow-up question is what factors predict those different trajectories, the variability in the intercept and slope; that's something I've covered in other presentations, so I'm not going to talk about it now. But again, I encourage you to use the predicted values plot to better interpret your longitudinal analyses.

We also talked about users being able to specify their own independence model. That is something we do here in the model comparison table, and it can be very useful for longitudinal analysis. We do have an independence model that is fit by default, but if you choose to change it, you can always right-click on any model you want to set as the independence model, and we will take care of that change for you. That is an advanced technique.
If you're not familiar with what the proper independence model is for your analysis, I very much advise you to look at the literature to make sure you're using a good one, because it really varies by context. (I'm sitting next to this beautiful window on a gorgeous, sunny day, so I'm going to adjust my computer so I don't have all the light on my face. I apologize for that.)

Okay, let's get back to the questions. We asked how students' achievement develops over this period of time, and we now understand that it develops in a linear fashion and that there is substantial variability; that answers the first question. The next question is: do males and females differ in these trajectories? The way we're going to address that is with multiple group analysis. Back in the platform, we can use the main red triangle menu to redo, to relaunch, our analysis. What we're going to do now is bring in the grouping variable: I add sex as the Groups variable. Just by doing this, we invoke the multiple group analysis functionality. I click OK, and now you can see that the platform report has the levels of our grouping variable as tabs.

Just as before, the males and the females have the same default model, but we can change that. We're going to work within the Union tab because I want the changes I'm about to make to the model specification to apply to both males and females. I'll also highlight, as a little side note, that under the Status tab you'll find group-specific information that we didn't have before multiple group analysis: information about your data, missing data, and so on, specific to each group.

Okay, so let's answer the question: do males and females differ in their trajectories? I already know that the linear model fits best, so I go to the model shortcuts, Longitudinal Analysis, and click on the linear latent growth curve. The shortcuts set up the model for me very quickly and simply, and they do that across all the levels of the grouping variable. So I have the linear growth curve model. Notice that the key aspects of the model, the estimates that really characterize the change in our data, don't have any labels on their edges, which means they are freely estimated across males and females. My first model here is a linear growth curve model; I'll just add a little keyword to the name. Oops, I erased it. "Linear growth curve," noting that it is freely estimated across the groups. I click Run. Excellent. We can see some fit indices here, and in my report I can see a tab for the males and one for the females.
Of course, as you'd expect, if I go back and forth, I can take a look at the results for the females and then go back and look at how those results are perhaps different for the males. This is interesting. There appear to be some differences, but we might want to figure out whether the differences that we observe just from looking at these estimates are, in fact, statistically significant.

What I'll do is go back to my model specification and do an omnibus test. In other words, rather than just putting an equality constraint on one of these estimates, I'm actually going to do that for all of them: the intercept mean, the mean for the slope, the covariance of the intercept and slope, and their variances. You don't have to do it this way; really, it's your research question that should guide where you place those equality constraints. In my case, I just want to do an omnibus test where I figure out whether the trajectories for males and females are different and whether or not I need separate estimates for those parameters.

I have all of those edges selected, and I'm going to click Set Equal. Here I confirm that I do want those equality constraints across both groups. This is actually quite helpful when you have more than two levels in your grouping variable. It might be that you want equality constraints across, say, two groups but not the third. You can uncheck some of those groups here if you need to. I'm going to click OK. Now all of those edges got a different label. You can see that if I go and look at the model for the males and the model for the females, those labels are the same. Again, that's just to remind us that we're going to estimate only one value for each of those edges across groups.

Okay, so this, once again, is a linear growth curve, but with equal growth estimates. Let's go ahead and run that model. We can focus on the fit of this model; it doesn't seem to be as good as the previous one. Because this second model is a restricted version of the first, we can actually select those two models and do a meaningful comparison by clicking on Compare Selected Models. As before, we are able to see the change in the chi-square along with the change in the degrees of freedom. This tells us how much increase in misfit there is in our model, and whether that increase in misfit is statistically significant.

If it is, which in this case it is, then we are basically saying that those equality constraints are not tenable. It was not a good idea to place them. Now we can say, with a formal statistical test, that there are statistically significant differences in the trajectories across males and females. Now, you might want to look at those differences by using the predicted values plot. That's something that we can do just by going into the red triangle menu.
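The comparison reported here is the standard likelihood-ratio (chi-square difference) test for nested models. With the constrained model nested within the freely estimated one:

```latex
\Delta\chi^2 = \chi^2_{\text{constrained}} - \chi^2_{\text{free}},
\qquad
\Delta df = df_{\text{constrained}} - df_{\text{free}},
```

and Δχ² is referred to a chi-square distribution with Δdf degrees of freedom. A significant result, as found here, means the equality constraints are not tenable.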
But first, I don't really want to look at the model that has the equal growth estimates, because we just realized that those equality constraints were not a good idea. I'm not going to look at that. Instead, I'm going to look at the first model we fit, and I'm going to do the same for the males here. Under the red triangle menu, I'm going to click on Predicted Values Plot, and I'm going to connect those points because I know my data are longitudinal.

This is the plot that is specific to the males, to the male sample. But it would be really helpful to look at this plot side by side with the plot for the females. It's actually quite nice that all of our red triangle menu options here are automatically turned on across all of your groups, so you don't have to go group by group turning on the things you want to see. Another trick that I really like is that when you have a tabbed report, you can always right-click on it and change the style of the report so that it's on a horizontal spread. This is going to allow you to see the tabs, the content of them, side by side.

I'm going to click on Horizontal Spread. Now notice that I have the males and the females side by side. I'm going to use the red triangle menu along with the Option or Alt key in order to turn off the summary of fit, the parameter estimates, and the diagram. Really, all I want to see is the predicted values plot. I'm going to click OK. Perfect.

Now I can see the predicted values plot for the males and for the females side by side. Very purposefully, we have the Y axis on the same scale so that these plots are comparable. Now you can see how the trajectories differ. We see that there's a lot more spread in the sample for the females. There also seems to be a little bit of a difference in that average trajectory, in the amount of growth.

Again, there are many more follow-up tests that we could do here in order to figure out where the specific differences lie. If we wanted to test, say, whether there is a difference specifically in the variance of the slope, we could put that equality constraint in and do more specific tests as follow-ups. But for now, I hope this example really allows you to see how multiple group analysis can be used in a more complex setting and how this new predicted values plot can be used to really facilitate the interpretation of your longitudinal models.

All right. We're almost at the end of the demo, and what I want to do very quickly with the same data is highlight the multivariate growth curves shortcut. Let me go ahead and go back here to the Structural Equation Models platform, and this time, imagine that we have two sets of scores over time. We're going to be looking at two processes. We don't just have academic achievement on that one test. We have it on two different tests, and we want to see how those two processes are changing.
How are they related over time? I'm going to use all of these variables here, click on Model Variables, and OK. Under the model Shortcuts, remember that under Longitudinal Analysis there is a multivariate latent growth curve shortcut; it allows me to select the variables for one specific process.

Here, I might take those first four variables. That was the first process I want to look at. Let's just say that those were math scores, so I'm going to call that math. You get to choose here what type of growth you want to specify for that specific process, for that set of variables. We're going to stick to linear growth, and then we can click the plus button in order to have that done for us right away. You can see the preview in the background here. We have an intercept and a slope for math. Then we can change the name here. Maybe the second process is science, and now we can select the variables, the repeated measures for that science test over the four years.

Again, we're going to stick to the linear model, and we're going to click the plus button. Very quickly, that model is changing there in the background. We're done now, so I'm just going to click OK. Again, now I could just click Run and very quickly get the results for that model. This is an advanced application, but it is a really interesting one, because it allows you to look at how the initial time points, the intercepts across the two processes in this case, are related. Are they associated? And also the rates of change over time.

If you have not just a higher score but a higher slope over time in math, does that mean you also have a higher rate of change in science? According to this, you do, because we have a positive association between those two factors. Again, I'm just highlighting some of that new functionality.

My very last example is for survey development. This is going to be very brief, I promise. Let's just say here that we want to figure out what the key drivers of customer satisfaction are. We know that the perceived quality of our product and the reputation of our brand are really important. But before we can even answer any questions about customer satisfaction, we really need to make sure that we have a valid and reliable way to assess those variables. These are variables that are not observed directly, they're latent variables, and therefore it's difficult to make sure that we are measuring them in a reliable and valid way. Survey development is all about achieving that goal.

I have an example here that is going to allow us to see how exploratory factor analysis is now linked to SEM so that you can do survey development in a really streamlined fashion. I have 843 rows in this data table. Each row represents an individual who filled out a survey. In that survey, they gave us ratings, answered questions about the perceived quality of our product. They also answered questions about the perceived reputation of our brand.
Then they also answered questions about their satisfaction with the product. This could be things like, how likely are you to recommend our product to someone you know? Those types of questions.

I already have a saved script for the Factor Analysis platform. I'm not going to get into the details of how you use this platform, but I do want to focus on the fact that the results from this analysis are right here in the rotated factor loading matrix. That is the key result from this analysis. Usually, what we want to see is that the questions that are supposed to measure, in this case, satisfaction are in fact loading onto the same factor. In this case they are, and that's good news. We see the same pattern for quality. The more substantial loadings are for these first three quality questions. Notice that there is one quality question that doesn't seem to have a high loading on any of the factors. So maybe we would go back and make sure that the wording of that question is good, or we might just want to throw out that question altogether.

There are also a couple of questions for perceptions of our brand that didn't seem to do very well. Again, usually you do very careful selection of your questions. You would go back, read what those questions were, and decide: is there something we should tweak, or shall we just get rid of them? For the time being, the feature I want to highlight is that under the red triangle menu of this model, there is a new option for copying the model specification for SEM. I'm going to click that. What it does is take the loadings that are bold here in our final rotated factor loading matrix and store them so that we can use them in the SEM platform.

Normally, you'd want to collect a new independent sample so that you can confirm these exploratory results. Let's just assume for a minute here that this data table is my new independent sample. I would now go to Analyze, Multivariate Methods, Structural Equation Models, and I can use all those same variables, click on Model Variables, and then OK to launch the platform.

Again, normally you want to confirm the results that you found with an independent sample. What you can do is, in the main red triangle menu, click on Paste Model Specification. Now notice that the factor loadings from the Factor Analysis platform were rescaled by the standard deviations of the indicators. I'm going to click OK, and you can see now that the values here are fixed for the loadings of those latent variables. They're fixed to correspond to the values from the Factor Analysis platform.

Again, they have to be rescaled because the variance of the variables is taken into consideration. That's the proper way to specify the model. But it's really nice to be able to streamline this workflow, because normally, if you really want to fit a confirmatory factor model based on an exploratory factor analysis, you would have to put these constraints in by hand.
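As a rough guide to the rescaling mentioned above (this is my reading of it, not a statement of the platform's exact computation): if the latent variables are kept at unit variance, a standardized EFA loading for an indicator y is converted to the raw metric approximately by multiplying by that indicator's standard deviation,

```latex
\lambda_{\text{raw}} \;\approx\; \lambda_{\text{std}} \times s_{y}.
```

That is why the fixed values shown on the SEM diagram do not match the rotated factor loadings digit for digit.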
That's really tedious. So we've made it very easy. These latent variables have loadings that are fixed to known values from a previous study, from a previous exploratory analysis, and we can now confirm whether or not that factorial structure still holds with a new sample.

One thing I should clarify is that the three variables that did not have substantial factor loadings in the report are not being linked to any of the latent variables. Really, we don't want these to be here in the analysis. What I can do is use the red triangle menu, which also has an option for removing manifest variables from the analysis. I'm going to use that so that I can quickly find quality 3, brand 3, and brand 5, and I can just click OK to get rid of those variables, because I don't really want to fit my model with them in there.

Again, now I can just run this model, assess the fit, and figure out whether I can, in fact, confirm my results from exploratory factor analysis using confirmatory factor analysis in SEM. That is all I have for this demo. I hope that this is helpful, and I look forward to answering all your questions during the live Q&A. Thank you very much.
What if you could save time in your process of collecting data, cleaning it, and readying it to begin your analysis? Accessing data and preparing it for review is often the most time-consuming part of creating a new data analysis or project. With that in mind, we would like to introduce the Workflow Builder in JMP® 17. With this exciting new feature, JMP users can now record their entire process from beginning to end, starting with accessing data from multiple sources. Working with the action recorder (added in JMP® 16 to track steps and provide scripts that can be saved and reused), Workflow Builder tracks all your changes in data prep and cleanup, data analysis, and reporting. In this presentation, we will show how to operate the Workflow Builder, save each action, and then replay and share them in a polished report. This is sure to become your new favorite feature in JMP 17. No manual cleanup means extra time in your day!     Hi,  I'm  Mandy  Chambers,  and  I'm  a  Principal  Test  Engineer  in  the  JMP  Development  team.  I  want  to  talk  to  you  today  about  navigating  your  data  workflow.  The   Workflow Builder  is  new  for  JMP  17,  and  I  think  it  can  grant  all  your  wishes  for  your  data  cleanup. The   Workflow Builder  is  the  ability  to  record  your  JMP  data  prep  and  analysis  workflows.  It  tracks  all  your  changes  that  you  make,  cleaning  up  and  reporting.  It  records  your  steps  and  you  can  use  them  over  and  over  again.  It  allows  you  to  save  your  data  and  your  workflows,  package  them  together  nicely,  and  share  them  easily  with  others.  It  also  is  just  going  to  save  you  lots  and  lots  of  time  in  your  day. This  is  a  screenshot  of  the   Workflow Builder.  It's  listed  underneath  the  File  menu  system  in  JMP,  File,  New,  New  workflow.  You  need  to  open  a  data  table  or  import  data  to  begin  and  you  will  be  prompted  with  a  question  that  says,  Do  you  want  to  record?  When  you  say  yes,  then  the  screen  on  the  right  will  show  you  that,  like  in  this  case,  we  opened  Big  Class  and  it  will  be  recorded. I'm  going  to  get  into  this  today  and  show  you  as  much  as  I  can.  You  can  see  here  there's  lots  and  lots  of  stuff  happening.  There  are  step  settings,  there  are  navigation  buttons,  there  are  images,  and  so  much  more.  I  probably  can't  hit  everything  today,  but  I'm  going  to  do  my  best  to  try  to  show  you.  Let's  get  right  to  the  demo  and  I'll  get  started. This  is  a  new  utility.  I  do  think  it  saves  you  time  and  it  allows  you  to  do  clicks  instead  of  coding.  I  think  that's  really  cool  and  I  think  for  a  new  JMP  user,  especially,  this  is  going  to  really  make  their  life  easy. Great  job  to  JMP  for  building   up  ways  that  we  can  capture  JSL.  In  16,  JMP  introduced  the   Action Recorder,  and  now  in  17,  we  give  you  the   Workflow Builder  along  with  even  more  JSL  added  to  the   Action Recorder.  The   Workflow Builder  allows  the  JSL  to  be  saved  and  the  steps  replayed  as  well  as  shared.  A gain,  no  coding  is  really  required  unless  you  want  to  add  that. I'm  going  to  get  right  into  this  by  showing  you  a  workflow.  This  is  a  workflow  that  I  saved  as  a  demo.  
Most of these examples, I have tried to use sample data so that you can go back yourself and maybe try some of these. I'm using the Food Journal data table from JMP. I'm recoding a column, I'm changing a column property's order, and then I ran a Text Explorer report.

You see the buttons here at the top, and this is a saved workflow. I'm going to just click this middle button, which says to execute the workflow, and run it so you can see how quickly it runs. Notice a couple of things here really quickly. Over here on the right, you see these little images pop up. All that is, if I click on it, is a screenshot of each of the steps. It doesn't really do anything. It's a screenshot. It just shows you what's in there. Again, if I click this one, this is the column where I changed the column property. That's nice to have in case you don't remember what you did. The little green check is the check that says the step executed properly. If it didn't execute properly, you would get a red X, so you would know something didn't work right.

Now, I'm going to recreate this for us. Looking at this Workflow Builder here, I had created my local data filter, which is why I changed the order; I wanted it to be in this order. I hung out on this late snack category because I was looking and thinking, "Good heavens, if I had cappuccino, mocha, chocolate, candy, and sugar at night, I would never go to sleep." I don't know about you, but anyway.

Let's close this up. That's this little reset button right here; it closes everything in the workflow. Then I'm going to go up under the File menu system. I'm going to click New, and I'm going to go down here and click New Workflow. Here's my new workflow. Let's observe a couple of things here real quick before I get started.

This is the little record button, as I said, so we'll put that... I'm not going to push that. I'm going to show you how that executes. But you have places where you can get data from. The JMP Log history over here, I had reduced it down just because I didn't want to see it while my workflows were running. But throughout the day, you could open the Workflow Builder and just use it all day long, and it will stack your statements down here in increments of 10 minutes, an hour, two hours. It'll save things. You'd even have something, if you left it up, that would show it was done yesterday.

You could come back to this and grab statements, and all you have to do is grab them and hit this arrow. You can push them up, or you can grab them and drag them, or copy and right-click, I believe, and copy and go up and paste them in. All sorts of ways to get things into your Workflow Builder. I'm going to right-click here and delete this. I'm going to cut it out because I want to do this the other way.

Let's get out to my JMP home window and just simply open my Food Journal. There's the button I told you you would get that says, "Hey, do you want to record this?"
I'm going to go yes. Notice that in the Workflow Builder right here, the little red dot is now hollowed out. I am in record mode, so everything I do will be recorded.

I'm going to quickly go in here so I can make sure I get through a lot of examples. I'm going to recode this, and I'm going to do it in place because I don't want an extra column. I'm not going to type a whole bunch of stuff this time. I'm just going to convert this to uppercase so it makes the change, and I'm going to recode it. Then I'm going to go to my column info, and I'm going to go down and add value ordering on this. Now I'm going to reorder this to where I have breakfast first. I've got an AM snack. I need lunch to move up here. I need the PM snack to be after lunch, then I have dinner, then I have a late snack. There's the ordering I wanted.

You'll see that those steps have been recorded. There's my open, there's my recode, there's my change of column property. Now I'm just going to grab this Graph Builder here; this is the one I think I ran. That's easy, and then I don't have to recreate a lot of stuff. I'm going to add a local data filter, and I'm going to use that Meal column. Again, here's the order I wanted. If I click on this, I get a slightly different view. It's not a Text Explorer, but you can see here, here's the column for chocolate, and of course, it has the most calories. One of my favorite things is chocolate.

Anyway, notice that this Graph Builder is not recorded yet in the Workflow Builder. Platforms don't get recorded right away; a platform records after you close it, or we added a button under here: if you want to save the script, you can save it to the workflow. But for all I'm doing today, I'm just going to close it. There, it's been written to the Workflow Builder.

The other place that you can go and grab stuff, and this is one of the things I love about the Action Recorder: if I wanted to, I can go over here, and you can click on these steps as well now. Under this red triangle menu, you can save the script and say add it to the workflow. You could always go over there and get something if you needed to.

Let's stop recording. Let's hit the reset button and close this thing. Now, let's run it, and it should run exactly the same way as the first one. I did a slightly different platform, but you can see that the column has been recoded, and there's my Graph Builder. Now we have created your first workflow. It's as simple as that.

Moving on to a second example I have, let's look at this virtual join example I created. I used the Pizza examples from the sample data library. If you're not familiar with virtual join, I'll give you a crash course here in a minute or less. I opened the tables, I created link IDs, which you need with virtual join, and then I created link reference columns, and I simply ran a Graph Builder.
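For readers curious what those recorded virtual-join steps look like as JSL, here is a rough sketch. The table and column names are assumptions based on the Pizza sample tables, and the exact arguments of the Link Reference property may differ from what JMP records:

```
// Sketch of virtual-join setup: mark ID columns and point the main table at them.
// Table and column names below are placeholders for illustration.
Column( Data Table( "Pizza Subjects" ), "Subject ID" ) << Set Property( "Link ID", 1 );
Column( Data Table( "Pizza Profiles" ), "Profile ID" ) << Set Property( "Link ID", 1 );
Column( Data Table( "Pizza Responses" ), "Subject ID" ) << Set Property(
	"Link Reference",
	Reference Table( "Pizza Subjects.jmp" )   // columns from the referenced table become virtually joined
);
```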
I'm going to run this, and then I'm going to show you some things about it. I'm going to tweak it a little bit and show you how we can clean up the Workflow Builder.

Here's virtual join in 30 seconds or less. A link ID is created in Pizza Subjects there. A link ID is created in the profiles here. Then I went to Pizza Responses, which is what I like to call my main table with virtual join. All my references are set up this way. All of these steps I did do manually. It was all saved and created in the Workflow Builder, and then I simply ran this graph.

That's that. One way you could clean this up as a presentation is, if I were showing this to somebody, they don't really care about seeing the data tables and all that stuff. They would be more interested in my graph and, maybe if I did a distribution or a tabulate, the reporting I would get. I'm going to show you a way to go in here and do some things on the right-hand side of the Workflow Builder.

You open the step settings, and this is where all the magic happens. When you're doing these things, the JSL is getting saved. This is where it's happening, and we have some additional buttons over here of things we can do. Like I said, I really don't care very much about the tables showing, so there's a nice little option in here called hide tables. I'm going to click this and go down and add hide tables to every one of these, so that it quickly hides those three tables and I don't have to worry about that anymore.

The other thing that's nice is that these steps right here are actions that are all doing things to the data tables. I'm going to select these, I'm going to right-click, and I'm going to say group those selected things together, and it will put them in a little group. I'm going to name this something. They're always going to be called group, group one, group two, so you might want to name them something that means something to you. I'm going to call them link ID and link reference. The nice thing about this is that as you're building a workflow, I can collapse this down. Your workflows could be really long. If I collapse this down, I have a lot more space. You can do groups within groups. That's nice too. Anyway, I'm going to expand this for right now.

Then, you probably remember when I ran this, it ran really fast. My final step is a Graph Builder, and I want it to hesitate just a little bit so you can see it hesitate. I'm going to go in here and add a custom action. This is where the coding comes in. Since I do know JSL, I'm going to add a wait statement here, and I want the wait to happen before my step, so it's going to be right here. You do have the ability in this to use the arrows to push things down, to push things up. You have a little bitty trash can right here, so if you add something, you can delete it. You also can simply leave something in and uncheck it, and that actually works really well, too.
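As a hedged illustration of the kind of custom-action JSL described here, small steps such as a pause or hiding a table window might look like this (the table name is a placeholder, not necessarily what the demo uses):

```
// Small custom-action steps one might add to a workflow.
Wait( 2 );                                            // pause two seconds before the next step runs
Data Table( "Pizza Responses" ) << Show Window( 0 );  // hide this table's window (name is a placeholder)
```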
I'm going to go in here, and this is where, because I know JSL, I'm going to type a wait statement, just as simple as that. Let's close this up, let's run it, and let's watch. There's the hesitation, and there's the Graph Builder. Let me close it one more time; look right here and watch for a little running man when I run this. I'm going to execute it. There's the running man, and there's the Graph Builder. You can see that this is much more presentable if I'm presenting something. Again, if I had other reports I wanted to show, that would work really well.

A couple more things you can do. Obviously, when you create a workflow, you save it up here by saying File, Save or Save As and giving it a name. Every workflow has a .jmpflow ending. If you want to create a journal, the developer was cool in creating something called Add Steps to Journal. You get this right here, which is nice. Journals are nice, but they're sometimes hard to make. This is nice. Each step is in here. The code is in here. You can run it, you can clear it, you can look at it. I've got a thumbnail down here for my Graph Builder, and if I click it, I get the full-size Graph Builder. A really nice feature to have in there.

The other thing I wanted to point out is that if you go up to the red triangle, you have the ability to save the script to the script window. This is the entire script for doing all of these things we've just done. Even the hide function is added at the top to hide the tables. The only thing you need to know is that this will run just like JMP would run prior to JMP 17, without a Workflow Builder. This script does not rebuild the Workflow Builder window. The only way you can use the window is through the UI. But if you want the script, or you just want to save it that way, you have the ability to do that.

Moving on. The next Workflow Builder I wanted to show you is an educational type. Let's look at this one. This was done by Peter Hersh. He did something much more complicated, and I took it and made one of my own. Basically, it's a workflow that opens a data table and runs a distribution. It goes out to a particular section, like right here, and it uses what we call a show message window, a modal window where it stops. I put in a definition for quantiles that I took right out of a JMP book, because I don't know that stuff. Then I said OK, and I did a second definition, which shows a summary of the other stats that are listed in the summary statistics.

Kind of a nice thing to do here. Notice a couple of things about this workflow. I'm going to close this up. Notice I don't have a red button. If I go here and say File, New, New Workflow again, I don't have a red button, and I don't have the JMP Log history. When I set this up, I put it in something called presentation mode. That's the first thing under the red triangle. You'll see, if I uncheck that, those objects come back.
But I didn't feel like you actually need that in there for this, so I turned it off, because it's really something you're using for teaching. The other thing that you can do in here is duplicate a workflow. You can just open a new one, and this would be one where you could change something.

Let's make a couple of changes here, because I just want to show you how nicely this can be changed. I'm going to go and get a different JMP table. I'm going to go get the Cleansing table. Then I'm going to show you how this is set up inside of the report. The report step code was to run the distribution, which I will have to change as well to use the same table. Then I need to change the column, which in this table is called pH. The dispatch for the quantiles will have to be pH as well. The distribution is run, but then it's been assigned; this is a little piece of JSL where it was assigned to a report.

Then the show message window, which you grab from here, was added in here. You give it a title; the title of this message window is Quantiles, and that's where I pasted the definition. Then your next step is to clear. It's another JSL step to select your report section and deselect it. Then you go on and select the second section of the report, which is the summary statistics. I need to change this again to pH. Then at the end, here's my second show message window that was added. You can add as many of these as you want to. It's a modal window, so it will stop and wait for you to do something.

If I've made these changes correctly, let's run this. There you can see the Cleansing table, and there's the pH column with the distribution. I'm selecting the definition of the quantiles, and I'm going on and using the definition for the summary statistics at the bottom. A really nice feature here, a really cool thing that you can do for teaching, I think. I think there are going to be a lot of people who enjoy using that.

The workflow subset is also a really cool thing. You can create a data subset for a workflow by going in here and selecting temporary subset, or you can actually do it on the right-hand side, where you open a table, you add an action, and you say subset in here, which is what I've already done with this. Here's my subset. You get this little window whether you do it from the temporary side or whether you do it here, and it's always stored right here, just so you know. We built in some selections here. Your selections are to get all of the table, 50% of the table, 25%, or whatever, and that's what I've checked. These are choices. This is going to make a lot more sense for a table that's really big. This is great because you can run some analysis on part of a table that may be big. Then, when you're all done with capturing everything, you would go back and maybe change it back to all of your data, and it should work seamlessly.
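For anyone who wants to script a step like the show message windows used in this teaching workflow, a minimal sketch of a modal window in JSL is below; the title and text are placeholders rather than the exact wording from the demo:

```
// A modal window pauses execution until the user clicks OK, similar in spirit
// to the "show message window" steps described above.
New Window( "Quantiles",
	<<Modal,
	Text Box( "Quantiles are values that divide the ordered data into intervals containing equal proportions of the observations." ),
	Button Box( "OK" )
);
```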
What I did here was use something everybody's probably really familiar with, Consumer Prefs, and there are about 446 rows, so I'm using 25%. I'm going to run this, and I just ran a categorical platform. You can see here, this is the output from this on just 112 rows. That's all it used in this case. Now, I'm going to close this up and go over here and say, "Okay, let's just say I'm done with what I want to do and I don't need a subset anymore." Well, I can go back here and delete this with the trash can, or I could simply just uncheck it this time and basically run it again, because I might want to do something with it later. Now when I run it again, you can actually see right here that you get the 448 rows. Okay, that's how many are in there. You get your full platform, you get all the data analysis. I think people are going to get a lot of mileage out of that as well. Being able to do the subset, especially if you're using millions of rows of data, I think will be very, very helpful.

One more thing I almost forgot. Let me show you quickly how to create a workflow package. If you go in here, the last thing in the menu is to create a package. The difference between saving a workflow and creating a package is that saving locally is just for you, so it's going to go wherever you put things on your drive, on your Mac or Windows or wherever. But when you create the package, it creates a little temporary place, and you can package everything together and send it to somebody, and it should work seamlessly for them.

What this is telling me is that I need to have this data source of Consumer Prefs attached to this for it to work. There's that little button I told you about that says presentation mode. If you were giving it to somebody and you didn't really want them to change anything, you could check that, or you don't have to check it. Let's say OK, and I'm going to name this workflow package; I'll just let it default. It adds the same .jmpflow ending. I'm going to save it here. Then what I want to do is go back and hopefully find it. There it is. I'm going to open it, and I want to show you real quickly how it saved that. It did use presentation mode, but just so you can see, here's the temp Consumer Prefs. Just to prove that it runs, there it is, and it runs the same way. A really nice feature. Packaging is great for sharing stuff with coworkers, sending reports, and that sort of thing.

The generalized workflow is another really nice utility that I think people are going to use. This particular example starts out by not opening a data table. It was built with a template data table, and I ran a distribution. If there are things you like to do every day, say you run a couple of distributions and maybe your favorite Graph Builder, and you're always doing the same analysis over and over in a given day on all your data, this is the way to set it up.
The way you set this up is you go under the red triangle menu to References, and you click this little button that says Manage. Now I'm in this managing window, and you can see here this is where I used this ANOVA template JMP table. I typed in this prompt that says select the table I want to use for analysis, and I set the mode so that every time I run this, it prompts me. Then I had three columns in that table that I used for my three analysis variables. For every one of them, I'm saying on each run, select a column for analysis; I'm going to be prompted for that. Then you save this, and this reference is all set up for this particular workflow.

Let's run this real quick and see. Here's my choose table prompt right here. Let's go to Other. It's giving me an option to go out here, and I'm going to go to my data tables, and I've got a couple of things I wanted to try to show. I'm going to grab Hybrid Fuel Economy here. Here are my prompts: what columns would I like to use? The first one I want is Combined MPG, then I want to grab Engine, and then I want to grab the number of cylinders. Here are my three distributions, and then I've got my favorite box plot. That's just showing one.

Just for fun, I wanted to show you a second one. Again, Hybrid Fuel Economy is open, so I'm going to select Other, go to the sample data again, look for Titanic Passengers, and open it. Then I'm going to go in here and select age, then sex, and then passenger class. Again, here's my favorite box plot, here are my distributions, and everything ran exactly like I wanted it to. It's a really good idea to use the reference manager in order to build out a template.

Notice that when I close this, it didn't close the tables. The reason it doesn't close those tables is that there's not a statement in here that says open a table. We opened those tables from the File menu system, so I'll just simply go up and close them myself.

The final example I want to show you is an archived project in the Workflow Builder. We had some comments about people wanting to use workflows for archived projects. This is a project I did back in 2015, '16, when I started with JMP, and I think I showed this another time and mentioned that I had been working six years, and I went back the other day and thought, wait, that's like eight years. I can't count, but other than that. This is something where I went out to a website, I read in data on nesting in North Carolina, and it would continue to change. You would read it throughout the summer, and I did some work down at Oak Island, which is a beach where we go often. Anyway, I had saved all of these scripts in a folder that I worked on. I thought, I'm going to try the Workflow Builder with this and see what happens.
Basically, instead of opening a table first, what I ended up doing is I went out and used the custom action field, where it says add custom action under here, as my very first step. For my first few steps, I took the script I had saved and pasted it inside of this window. This is what's in here. At the bottom, I'm saving a table. For the data that I was actually using in this, all I had to do was go to the same web page and change it to 2020, 2021, 2022. That's what I did. Then there's a turtle species table where you go and do the same thing.

Basically, I read these tables in, and then I went to the Workflow Builder and said, okay, now here's what I want to do. Once I got them in, I started building my steps. I concatenated the tables. I thought, well, I don't need those original tables, so I added a close so they get closed. I added some columns, made a change here, hid and excluded some totals I didn't need, and then I ran my platforms.

I'm going to run this so you can see. The cool thing about the platform runs that I'm going to show you is that these are scripts that I didn't have to change at all. As long as my columns were named the same thing, I just took the scripts, copied and pasted them right in here, and they worked beautifully, just like they did eight years ago.

Notice in this workflow, here at the bottom, these two steps right here are italicized and grayed out. The reason they're in there is that I didn't want to show them today. If I right-click on this, you'll notice that the step enabled button is not checked. I'm going to check it just so you can see that change. Now it changes so that it's dark and it's like everything else. It's a nice feature that you can do this, because as I was looking at this, I thought, "Well, I don't want to show this today, but I might want to keep them. I don't want to delete them." This turns it off so it won't run, but I don't have to get rid of it.

Let's run this. Pay attention at the top. This is a good example to watch the little running man that pops up here. It takes a minute. You see him because the steps are going out to the internet and actually getting data. This little part goes a little slower, but then it takes off and goes very fast. Once the data is in here, everything else runs really fast.

It's really nice that the script all worked. Here's my data. I have this turtle nesting data showing May, June, July, and August for each of these years, 2020, 2021, and 2022. Nesting, just as a case in point, tends to trend up, and then the next year it might go down. Again, it did that. If you go back and look at data from way back, it's just up and down, up and down. But 2022 was a pretty good year for nesting in North Carolina. Then this was a really cool graph that I had before that I loved. It's a bubble plot, but I went out and grabbed a turtle SVG file and plugged the turtles in for the bubble plot.
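As a rough sketch of the combine-and-clean-up steps just described, the JSL might look like the following; the yearly table names are placeholders for whatever the web import creates:

```
// Combine the yearly nesting tables into one, then close the originals.
dtAll = Data Table( "Nests 2020" ) << Concatenate(
	Data Table( "Nests 2021" ),
	Data Table( "Nests 2022" ),
	Output Table Name( "Nests 2020-2022" )
);
Close( Data Table( "Nests 2020" ), NoSave );
Close( Data Table( "Nests 2021" ), NoSave );
Close( Data Table( "Nests 2022" ), NoSave );
```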
This is showing nest totals with false crawls. False crawls are when a turtle comes out to lay her eggs and then she's scared or something and she doesn't lay, or she changes her mind. They're usually pretty even numbers; on a certain beach, they'll be pretty close to each other. That's why this bubble plot is cool the way it works. It's showing the nesting totals with the false crawls. I'm interested in this turtle right here because that's Oak Island, where I said that I go to the beach, and I did some work down there with them back in 2015, '16. Kind of a fun graph. Kind of neat that this is an archived project and it actually works now. It's a good way to record and save stuff.

That's all I have today. I just want to thank the development staff for working so hard on designing this: Ernest Pasour, David White, and Evan McCorkle, just to name a few. Julian Parris was instrumental during the design phase and advisement of this. I do have a reference down here for the sea turtles if you're interested.

In closing, I just want to say I think the Workflow Builder is the best new feature in JMP 17. I'm a little biased, but I do believe it's going to save you time with less coding and more clicking. I think you're going to get more and more out of reusing recorded and repetitive steps. It should simplify your work efforts, and it will definitely accelerate your daily processes, leaving you much more time in your day. Thank you for listening today. Thank you for letting me share the Workflow Builder with you. Please try it out, please let us know what you think, and we'll look forward to hearing your feedback.
In past Discovery talks, we've shown how to acquire data, create a report, and publish it to JMP® Live using the desktop task scheduler. But what if your JMP Live report does not change daily? What if only the data changes, and you want to share those updates with your colleagues? JMP Live can now schedule the refresh of your data on JMP Live without having to publish your work again. This presentation will show how to use this new capability and discuss this new feature in the context of JMP and JMP Live plans.   2023-EU-30MP-1194 - Automatic Data Refresh in JMP Live 17.mp4 My  name  is   Brian Corcoran,  and  welcome  to  Automatic  Refresh  of  Data  in  JMP  Live  17.  I'm  a  JMP  Development  Manager,  and  my  group  is  responsible  for  JMP  Live. What  is  JMP  Live?  For  those  of  you  who  may  not  know , it  may  be  worth  giving  you  a  little  introduction.  JMP  Live  is  a  web- based  collaboration  site,  so  users  can  publish  reports  from  JMP,  their  desktop  version  of  JMP,  and  the  data  to  JMP  Live. Users  at  JMP  Live  can  interact  with  those  reports,  and  if  they  have  JMP,  they  can  also  download  them  to  work  on  those  reports  with  their  desktop  copy  of  JMP.  A  copy  of  JMP  is  not  necessary,  though,  to  use  the  JMP  Live  site. Now,  in  JMP  Live  15  and  16,  we  required  users  to  publish  new  content  from  the  desktop  application  in  order  to  update  things.  A  common  request  that  we  got  is  that  it'd  be  nice  if  the  server  could  do  this  for  me  when  I'm  not  working  or  whatever,  and  I  could  just  look  at  an  updated  copy  at  my  leisure. That  forced  us  to  revisit  how  we  treat  data  in  JMP  Live  17.   JMP  Live  17  really  represents  a  major  rewrite  of  the  product.  We  made  data  an  equal  to  a  report.  Before  that,  it  was  along  for  the  ride  hidden,  and  you  really  wouldn't  see  it  being  transmitted  up  to  JMP  Live. Now  you  can  publish  data  independently.  You  can  look  at  it  on  the  JMP  Live  site.  You  can  update  just  the  data  in  any  reports  that  use  it,  and  they  can  share  that  data  will  all  automatically  be  recreated  with  that  new  data.  This  work  was  all  done  to  provide  the  foundation  for  refreshable  data. Here,  the  contents  of  the  data  post  on  JMP  Live  are  refreshed  on  the  server  side,  and  there's  no  intervention  by  a  user  in  the  JMP  desktop  client.  Usually,  data  of  this  nature  is  in  a  database  or  some  REST- based  web  endpoint  or  something  like  that.  It  has  to  be  data  that  is  accessible  from  the  server  where  you  have  JMP  Live  installed. JMP  Live  provides  us,  we  hope,  easy  to  use  scheduler,  so  you  can  put  in  a  repeatable,  hands- free  refresher  of  your  data.   This  fulfills  the  dream  of,  I  go  home  at  night  and  the  data  is  refreshed,  the  reports  are  regenerated.  When  I  come  in  the  morning  and  I'm  drinking  my  tea  or  coffee  or  whatever,  I  can  look  at  the  updated  report  and  make  decisions  based  on  that  new  data. I'm  going  to  provide  a  variety  of  scenarios  on  how  you  can  learn  to  do  data  you  fresh  with  JMP  Live.   Let  me  first  shut  down  PowerPoint,  and  I'll  bring  up  a  copy  of  JMP  Pro  17,  but  this  will  work  on  regular  JMP  equally  well. 
All right, so first, I'm going to start with a really simple explanation of how we separate reports and data in JMP Live 17. I think it's important to understand that before we proceed to the more complicated examples.

I'm just going to bring up a sample data set that we ship with JMP, the financial data. It's just data for 500 companies across a variety of industries, and you can see them organized by type. It's basically sales, profitability, number of employees, things like that. Let's suppose that I create a simple bivariate plot of sales by number of employees. There's not a lot to it, but I can hover over the points and see the data. Let's suppose I want to publish that to JMP Live.

I'm going to go here, and I'm going to say publish report to JMP Live. I've set up a connection to a server that's internal to SAS and that has access to SAS resources here. I'm going to publish this as a new report. I've set up a folder called Discovery to publish this report into, so we'll just go ahead and do that. The first time it makes the connection, it can take a little bit longer, but there we go. Let's go ahead and bring up this JMP Live site. I'm going to go to the space that I just published to, Brian Corcoran, in my Discovery folder.

There is my report. If I open it up, you can see I can hover over it just like I did in JMP. But I'm looking at this, and it's a boring report. There's not a lot to it. Maybe I should have included some more information. Well, let's go back to our report. There we go. Let's suppose I want to add a fit mean and a fit line to that. I also want to supply a local data filter. I'm going to make a filter by industry type so that I can cycle through each industry and look at the individual companies involved. If I get the drug and pharmaceutical companies, I can hover over the one with the most sales, and we see it's 9.8 billion.

Now, let's suppose that I want to update this report. But in the meantime, maybe I've got information that says, "Hey, this is not a $9 billion company. It's a $19 billion company, but I'm waiting for verification on that." I don't want to publish the data with this, but I really would like to update the contents of my graphic. Well, we can still do that.

I'm going to go ahead and publish the report. But this time, we're going to do a replace operation. We'll select replace an existing report down here. It's going to ask us, what report do you want to replace? It's going to give us the candidates, like the most recently accessed, and so there's our financial report. I'm going to say next there. Here it says, "What do you want to do about the data?" I'm going to select, use the data that's already up on JMP Live. I'll say replace the report, and it goes ahead and does that. If I go up to my site, I can see that it did indeed add my fit lines and means and my data filter. I can manipulate these like we do in JMP, but you'll see that my outlier company is still a $9 billion company.
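If you would like to script roughly the same report, here is a sketch in JSL. The Bivariate options and Local Data Filter call are standard JSL, but the column names ("Sales ($M)", "# Employ", "Type") are my assumptions about the Financial sample table:

```
// Build the bivariate report with a fit mean, a fit line, and a local data filter.
dt = Open( "$SAMPLE_DATA/Financial.jmp" );
biv = dt << Bivariate(
	Y( :Name( "Sales ($M)" ) ),   // column names assumed
	X( :Name( "# Employ" ) ),
	Fit Mean,
	Fit Line
);
biv << Local Data Filter( Add Filter( Columns( :Type ) ) );  // filter by industry type
```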
All right, so we did not update the data. Now, let's suppose I've shut this report down, but I do get information that the sales are indeed $19 billion for drug company number one. I can choose to publish just the data. I'll say, update existing data. Once again, I'll select my financial post, and I want to replace that. Now you see that it's automatically reloading the data. It's going to recalculate all of our statistics down here as well, with new fit lines and so on. Now we can see that our outlier is represented here as a $19 billion company. If I wanted to, I could even bring up the data with the data viewer that I mentioned earlier, allowing us to explore it, and there is our update to our drug data for company number one.

All right, so that is the separation of reports and data, and that provides the foundation for our data refresh. Let's go ahead and get into a real refresh example. Let's minimize this for a minute and do a little cleanup, because otherwise we will get confused about where we're at.

Now, my next example will be a simple data refresh, and it allows me to introduce another feature that's new to JMP 17, and that is access to the OSI PI historian database. If you're not familiar with historian databases, they're often used to collect lots of process data, maybe from manufacturing lines, with lots of different machines and devices putting real-time data into a database. Then you can look at this historian database at your leisure to analyze trends and see where there are problems.

We have a historian database with sample data here at SAS, and I'm going to select that and get some information out of it. Here is our connection to the PI server. I'll open the sample data area. What we have here is a simulated data center where we have lots and lots of racks of computer servers. Essentially, we're looking at power consumption on all of those to see where we're spending a lot of money and things like that. All of these represent a table, and we could import all of them at once if we wanted to. I'm just going to import the data for power meter number one here. This is old data, so I'm going to go back in time on it. I'm going to ask for 5,000 points. It'll take just a second or two to import, but we'll start that up. There's our data.

I would call your attention right here to this source script. This is important. If we edit it, we'll see that it contains the information to recreate this data fetch, including the location of our PI server, the actual table that we want to import, and how many points we want. This will be useful. Let's go ahead, though, and just create a simple run chart and control chart. I can select that, and we're just going to do the values. All right, so there it is. Let's go ahead and publish that to our JMP Live server. I'm going to publish it as a new report back into our Discovery folder. I'm just going to publish it as is. Close that, and we'll bring up our browser.
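As a rough JSL sketch of charting the imported readings (this is one way to build such a chart, not necessarily the exact platform used in the demo, and the column name "Value" is an assumption about the imported PI table):

```
// Chart the imported power-meter readings from the current (just-imported) table.
dt = Current Data Table();
dt << Control Chart Builder( Variables( Y( :Value ) ) );  // column name assumed
```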
We are going to go to a different session. Hold on. We are going to look at my Brian Corcoran space again. In the Discovery folder, we'll find our Atlanta data center. Now, we can open that up and see the points and all that. We know how that works.

I'm going to call your attention to this Files tab, though. Here we have our financial report. But here's our Atlanta data center report. This is the report, but this is the data table, so let's click on this. There's the report that's based on the data; if there were multiple reports, they would show here. Here are the settings, and here's where it gets interesting. There is our source script that we had down in our table in JMP.

Here's something called a refresh script. Let's just concentrate on these two panes now. The source script has been uploaded with the data, and it provides us a basis for how we could recreate this data. A refresh script is a piece of JSL that is going to supply data when a report requests it. There's one big rule for a refresh script, and that is that the last operation that it performs must be to return a data table. Essentially, data refreshes are done through JSL scripts. Let's enable this refreshable button. Let's copy this source script as the basis for our refresh script. I'm just going to paste it in here.

If you remember earlier when we were looking at the PI dialog, I said you could import all kinds of tables at once. Because of that, the OSI PI import JSL returns a list of data tables, not a single data table, but a list. Our rule for a refresh script is that it must return a single data table. We need to get to that point. I'm going to assign the output of this refresh script to a list variable that I just arbitrarily named DT list. I'm going to put a semicolon on this run statement. Now I am going to assign the first element in that list, which is the only table that we have and the only table we care about, to a variable named DT. Since that's the last operation in the script, that's what's going to be returned here.

While we're at it, though, why don't we go ahead and change it to return 10,000 points. I'll save that script out. Now, let's go ahead and try this out. Here's a button where we can manually refresh our data server side. I'll say yes, we know what we're doing. We want to do this.

All right, it said it was done three minutes ago, and then it changes to a few seconds ago. Let's look at our report. Here's our report. Let's look at it. It looks a little different than the one we had back in JMP because now we have 10,000 points of data. We've done all of this on the JMP Live server, not on the desktop client. Now we could recreate that on the server, if we want, without ever having to involve our client. Let's go ahead. That's our first example of data refresh. I'm going to clean this up so we don't get confused with our other work.

A common operation where you'd want to do a data refresh is a fetch of data from a database.
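To make that "last operation returns a data table" rule concrete, here is a minimal sketch of what a refresh script can look like. The connection string, query, and table name below are placeholders, not the actual OSI PI import call from the demo, which isn't reproduced here; the point is only the shape of the script.

```jsl
// Minimal refresh-script sketch (placeholder DSN, query, and table name).
Names Default To Here( 1 );

// Fetch the data. Open Database() returns a single data table reference.
// An import that returns a *list* of tables (like the OSI PI import described
// above) would instead be indexed, e.g. dt = dt_list[1].
dt = Open Database(
	"DSN=PIServer;",                               // placeholder connection
	"SELECT timestamp, value FROM power_meter_1",  // placeholder query
	"Power Meter 1"                                // name for the imported table
);

// The last operation must return a data table, so end with the reference itself.
dt;
```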
One of the big ones is Postgres, and I'm going to show an example of fetching from Postgres. Before I do that, I'm going to go ahead and change the JMP Live server that I'm accessing. This one is actually outside of SAS resources. I can go to manage connections here. You can actually look at this yourself. It's devlive17.jmp.com. If you go out there, you'll be able to see these reports.

For this one, I'm going to bring up Query Builder. On Amazon Web Services, I have created a Postgres database full of sample data. I'm going to open that up. I have some stock quotes for Apple that I just put up there for demonstration purposes. Let's go ahead and open that up. I'm going to just build a query quickly. I only have date and quote data. This is essentially the closing quote of the stock price for the end of the day, and it starts on January 1st of last year. Let's go ahead and just run that query. You see what I'm doing here. Let's take this a little bit further. I'm going to shut this down.

Now, another new feature of JMP 17 is something called the Workflow Builder. I'd like to integrate the Workflow Builder into this demonstration, too. Let's do that. I'm going to go over here, and I'm going to say New workflow. It's going to capture my actions as I do them. I'm going to start recording that workflow. Let's go ahead and run our query again, this time with the workflow recording it. It'll capture that query. You see here, if we look at our source script again, once again this has information on how to connect to that Postgres database, the table that we want to look at here, and what the query actually is.

Now, let's suppose after the fact that... I don't want that. Let's see. Let's suppose after the fact that we decide we want to do some manipulation on this table. Maybe I want to do a subset of the table based on a where clause. Now, I probably could have done this in Query Builder, but I thought of this after the fact. Let's do row selection. I'm going to select where, and I'm going to select where the date is greater than or equal to June 1st of last year. I'm going to make sure that it's selected. There it is. Then I'm going to do a subset of that table just using the selected rows. Then maybe after that fact, and you'll notice that our Workflow Builder seems to be accumulating this information, I'm going to go and color by column. I'm going to color by the stock quote.

This is the data at the point where I feel like I want to do my analysis. Let's stop the Workflow Builder from recording that. Let's go ahead and create a graphic on that. I'm just going to do the quotes by the dates. We see when the stock was high back here in the summer of 2022, and it's gone down considerably. Let's go ahead and publish that. Again, this is going to a new server this time. The first connection can take a little bit of time sometimes. All right, let's publish new.
This time, I've set up a space called Discovery Europe 2023 with a folder named Automatic Data Refresh. You can look at this at your leisure and see this yourself. I'm going to name this Apple Stock Quotes Since June. I'm going to go ahead and publish that.

Let's go ahead and look at this server. All right, and this one's in dark mode. Makes you realize you're on something different. Here's our report. Let's go ahead, though, and look at that space, and I can search on Discovery to see which one I want to look at. Here's Europe. There's my folder, Automatic Data Refresh in JMP Live 17, and our report. Let's go ahead and look at the files, though. There's our Apple quotes, and there's no source script. What's up with that? Well, let's go back and look at our data here.

We look at the source script here. We see that the subset operation just picked out individual rows that were selected. It couldn't go far enough back to understand that this came from a previous database fetch. It just knows it has this table, which was unsaved at that time, and it was picking rows out of it. That's not going to be helpful for us.

We have our workflow, and I've stopped that. Let's go ahead and say, what happens if I say, save script to script window for our workflow? There it has captured our query along with our subset operation and our coloring operation. Let's go ahead and copy this and use it as a basis for our refresh script.

All right, so let's make this refreshable. We'll go ahead and edit. All right. This does require a little bit of change. First of all, I'm going to assign my returned information to data table variables. This is my query data table, the original full query. I don't really need to, but I'm going to assign that to a variable here because it provides clarity for me. Here's our where clause selection. This is what I really want to capture, this subset operation. I'm going to call this subset, and I'm going to put brackets around here. The reason is this: the way Workflow Builder builds this is it cascades or chains together operations, and we have a select where here. It's going to take the most recent operation and assign it to my variable. I don't want the selection to go into this subset variable, I want the subset operation. I put a bracket around this part to make it just one object that's referred to by subset. It'll be the subset operation that goes into this variable.

I'm going to go ahead and put that subset table in here for coloring, and then I'm going to put in an empty reference at the end here to our subset table. All that does is ensure that the last operation is to return that subset table. Let's go ahead and save this. Let's see if it works. Always good to test. It looks like it did. If it didn't, we could go to the history here, and we can see we did an on-demand data refresh.
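For readers following along in JSL, here is a hedged sketch of what that edited refresh script can end up looking like. The connection string, table, and column names are placeholders standing in for the Query Builder call that Workflow Builder captured; the structure (select where, bracketed subset, color by column, then the bare table reference at the end) is the part that matters.

```jsl
// Hedged sketch of the edited refresh script (placeholder DSN, query, columns).
Names Default To Here( 1 );

// The original full query, assigned to a variable for clarity.
query_dt = Open Database(
	"DSN=StockDB;",                                      // placeholder connection
	"SELECT quote_date, close_quote FROM apple_quotes",  // placeholder query
	"Apple Quotes"
);

// Row selection, then a subset of just the selected rows. The parentheses keep
// the subset result (not the selection) in the subset_dt variable.
query_dt << Select Where( :quote_date >= Date DMY( 1, 6, 2022 ) );
subset_dt = ( query_dt << Subset( Selected Rows( 1 ) ) );

// The cosmetic step the workflow captured.
subset_dt << Color by Column( :close_quote );

// End with the bare table reference so the refresh returns the subset table.
subset_dt;
```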
If it had failed, it would indicate that here, and the details pane, which just shows that we got a table back out of this, would instead show an error log from JMP itself. There's a hidden JMP session behind here doing the query for us, and it would provide the log of that JMP session here, so we could get an idea of what's going on. It looks like we have a valid refresh script at this point. However, I just essentially manually refreshed data that I already had. That's not particularly interesting.

Let's also look at this table. Notice that for the query information, it put in a placeholder for the password, because we don't want to transmit those credentials. Most of the time, it also puts in one for a user ID. In this particular case, it did not, because Workflow Builder doesn't do that. But if you did this directly from Query Builder, it would.

How do we provide the password, though, for this script that's going to need credentials to really fetch new data later on? We validated that this script is going to be okay, but we need the password for new data. What we do here is use this Assign Credentials option, and we have this credential that I have created, but I'll show you what you would do if you had none. You create essentially a username and password pair and a stored credential name. In this particular case, I already had one, and it did get used, but I'm going to select this radio button anyway to make sure that we understand these two are associated. If we had more than one, we would need to have one selected for it to work. What's going to happen is, when it finds a PWD or UID placeholder with percent signs, it's going to substitute my username and password in at the time of the query, so only when needed, and make the query itself. These credentials are stored using secure string technology in the database, which also can be encrypted. They're very secure, and they're only used in memory at the time of the query. We're pretty sure that we're not going to have our credentials breached.

Now, what do we do as far as creating a repeatable operation where we don't have to be around to do this? Well, we use the refresh schedule and hit Create. This is a pretty flexible but hopefully easy-to-use panel. Right now, it's saying it's going to create a schedule that's going to run every day. I don't want it to run on Sundays and Saturdays. I'm going to exclude those because the stock market is not open. You can have it repeat on a timed basis, down to every 5 minutes. I only want it to repeat once a day, so I'm going to turn that off. When do we want it to start? If all your servers are operating in your same time zone, then you don't have to worry about this. You would just put in the time you want it to run. I have to do a little more complex calculation in my case.
I'm running on Amazon Web Services, and we run all of our servers on UTC, universal time, which at this point is 5 hours different. Because I want to show this operating quickly for a demo, I'm going to essentially take the time of my demo, add 5 hours to it, and we're going to run that. We're going to put this in as running at 7:31 PM. When I say okay and save this, it's going to calculate when it's going to run, and it says it's going to run in about a minute. That's what we are hoping for.

Off screen here, I have an update to a database. Let's just pretend that this was done by some operation that's automatic, but I'm going to actually provide an additional stock quote that shows the stock jumping up in price. Maybe I'd come in in the morning, take a look at my new graphics, see the stock went up, and then maybe it's time to sell. We're just waiting on this at this point, and hopefully you'll see that it gets queued up and our report will get regenerated quickly.

While we're waiting on this, I will mention, too, that your refresh schedule can be scheduled to terminate at a certain time. If you want it to end at the end of the year or something like that, you can put that in as well. We saw it refreshed a few seconds ago. Let's go take a look at our report. There's our report. We see its declining price, and then if we hover, we see that we have a new point here, and it has indeed jumped up in price.

That same script, if we go back here, will run 5 days a week, every day at the same time, without our intervention. We've provided our credentials. Everything is automatic at this point, and we've realized our ambition to essentially just be able to come in and get a new report every morning without having to worry about anything else. You set it up once, run many times, and we're good.

That is probably the most important example I'm going to show. I'm going to show one more trick, and it's more of a trick that may help you in certain situations. Let's clean this up. What I'm going to do next is show using a REST endpoint. If you're not familiar with that: essentially, a lot of organizations make their data available through what looks like a web URL. I have a script that I developed, and I'll describe that in a second. Often you would need a credential or something called an API key to essentially have permission to use this site. Many of them cost quite a bit of money. This one does not.

You access this URL, and it returns a whole block of data. JMP has a variety of methods that it provides to help you parse this data apart and put it into a data table. That's what we're going to do here. This particular site is one called Eurostat. It has a free API I was able to use. I'm not going to go into it, but it essentially has this query language that you can append to your URL to tell it exactly which table you want and what data points.
I have it starting in 2021 and not ending, so we'll continue to get new data as it becomes available. It returns this one big block of data, and JMP knows how to interpret that and turn it into a data table here. The date information comes in as character, and we don't want that. We want numeric, and we want it to be continuous, so there's a little loop here that runs to change the column types after the import.

If I run this (what we're fetching, by the way, are natural gas prices in the EU), this data is pretty old. I don't know how often they refresh their data, but right now they're only providing data for the first half of 2022. Hopefully, they'll update with at least the second half soon.

What do I do? I can look at this column data, and the columns are named specifically for periods of time. I really just want to have a periodic refresh of the data where I just grab the latest one and use it in a report. Let's look at this. If I were to do a Graph Builder (this is really handy, though), you can drop this in. The geographic and time period columns can help you map out the map of Europe. If I drop in the latest one, unfortunately there are some missing pieces, but it will give us a general idea of what gas prices averaged in this first half of 2022.

Again, if I were to save this out, you'll see that it's going to use this date-specific column. That's not what we desire. A little trick you can use here is to just rename the latest column. Here I have another little scriptlet, and I'll open that up. Essentially, all I'm going to do is say, let's take the last column and just rename it Most Recent. Then if I want, I can create a Graph Builder script that uses Most Recent. If I do this, there's our column. Now I can bring up Graph Builder and plop in our geography column and our most recent data. I can maybe enlarge that a little, and we can go ahead and publish that.

Let's go ahead and just put that in our folder under the name Gas Prices in the EU. I'll publish that. We'll refresh our web page here. There's our gas prices graphic, just like we hoped, and if we hover over it, it'll have the mean gas price and things like that.

If I look at this data, though, and we go into our Refresh settings, it didn't understand all of our script about getting our REST endpoint and things like that, but that doesn't matter, because we already have our script. Over here, I can just copy this as is and put it in here. We will return that table as our last operation. Maybe I just want to run this one day a week or something like that, just so that whenever we finally get an update, we periodically review this and see the latest thing. This is a case where you're just occasionally viewing the data to see if there are any updates.
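As a hedged illustration of the two scripted details mentioned above, the column-type loop and the rename of the latest column, here is a small JSL sketch. It assumes the REST response has already been turned into a data table and that the period columns start at column 3; both the starting index and the exact messages are assumptions for illustration, not the actual Eurostat script.

```jsl
// Post-import cleanup sketch (assumes the imported table is the current one
// and the period columns begin at column 3; adjust for the real layout).
Names Default To Here( 1 );
dt = Current Data Table();

// Convert the character period columns to numeric, continuous.
For( i = 3, i <= N Cols( dt ), i++,
	Column( dt, i ) << Data Type( "Numeric" ) << Set Modeling Type( "Continuous" )
);

// Rename the last (most recent) period column so saved Graph Builder scripts
// can refer to a stable column name.
Column( dt, N Cols( dt ) ) << Set Name( "Most Recent" );
```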
Of course, we'd want to make sure manually that this works before we move on to other things, but we can see that it did indeed redo the report; it shows updated a few seconds ago. Even though the data is the same, we know it's working, so that when new data does come out, we will grab it and populate this graphic with it.

Again, this is on devlive17.jmp.com. You will need a SAS profile ID to log into the server if you want to look at this, but I will leave it out there for you to take a look. That concludes our data refresh examples. I hope this gives you an idea of some of the powerful new capabilities that JMP Live provides. I appreciate you attending this talk. Thank you.
All gauges have errors. They might be minuscule, or they might be large, but they always exist. Large or small, the errors lead to gauges having some likelihood of making Type 1 and Type 2 errors (passing a bad part or failing a good part). The mistake likelihood is higher for parts that lie near the specification limits. These errors cost real money! But how do we quantify those costs? This paper builds on the results shown in a 2022 JMP Americas Discovery paper (2022-US-30MP-1123) that discussed how to quantify gauge performance and how to set “informative manufacturing specs” (or guardbands) to improve the gauge’s performance in segregating good vs. bad parts. In this paper, we extend the learning and script functionality as we discuss how to combine gauge characteristics with the costs of individually passing a failed part or rejecting a good part, along with projected production volumes. This gives insight into risk analyses, e.g., how much I should budget to account for gauge errors, whether (or how much) to spend on improving our gauge, etc.

Hi, I'm Jerry Fish. I'm a support engineer with JMP, helping customers in the central part of the United States. Today's talk is entitled My Gauge Isn't as Good as It Could Be: Will Its Errors Cost Us Money, and How Much?

I'm Jason Wiggins, also a senior systems engineer, and I support semiconductor users in the Western United States. This talk is a follow-on to one we did for Discovery Americas in 2022. In our first talk, we introduced the notion that measurement systems are integral to our businesses. In fact, we have many measurement systems we interact with in our daily lives. Along with that idea, we introduced the notion that measurement system or gauge variation can impact decisions in real-world inspection situations. We introduced gauge performance curves as a way of visualizing gauge variation relative to specification limits. In this talk, we'll extend that and explore the costs associated with gauge variation through a fun role-play conversation between a quality manager of an automobile manufacturing plant, that'll be Jerry, and me acting as a quality consultant. To kick things off, I'll get on a quick team call with Jerry. Hi, Jerry.

Hi, Jason. How are you doing?

I'm doing pretty good. Thanks for spending a few minutes with me. As a quality consultant, I help quality stakeholders like yourself understand and improve processes. Now, I prefer using JMP as it's a general-purpose, easy-to-use data analytics package that has many quality and process control features. JMP makes quick work of the analytics part of process improvement, so more time can be dedicated to actually improving the process.

Well, it is nice to meet you, Jason. Just to let you know, though, we already have software in place for our internal quality programs, so I'm not really sure what your software can do that we cannot already do. Can we make this quick?

I understand completely, Jerry. I'll try to make the most of your time today. First, can you tell me a little bit about your company and your quality program?

Sure, happy to.
Acme Motors has built a reputation with our customers of manufacturing the highest quality cars. We're always concerned with quality. We have various gauges that we use to ensure our quality stays high. We've been doing this for years, and frankly, we think we're pretty good at it.

I'm familiar with Acme Motors and your high-quality reputation. My consulting team and I have recently been working with manufacturing companies like yours to advance the use and effectiveness of gauge studies, measurement systems analysis being another way of saying that. One of the things we seek to understand is the monetary cost associated with the gauges used to measure process quality characteristics in your manufacturing plant. Have you quantified how much any of your gauges are costing your business?

I'm not sure what you mean.

Well, gauges are not perfect. They make mistakes. Sometimes they'll throw away good parts, and sometimes they'll pass bad parts. Unless you have a perfect gauge, and really no one has these, these mistakes are inevitable.

I suppose so, but we've done gauge studies that say our gauges are good. Well, some of them are actually categorized as adequate by the AIAG guidelines. Doesn't that mean we're okay to use them?

Well, possibly, but there is a lot more to the story than just using the good, adequate, and poor AIAG gauge assessment criteria. For example, have you seen a gauge performance curve?

I can't say that I have. No.

Now, this is what one looks like. The X axis shows the true part values, and the lower and upper specification limits are shown with these lines. The Y axis shows the probability of passing a part. If you have a part that is truly good but very close to the lower spec limit, there's almost a 50% chance the gauge will recommend that you throw it away. That's one way of thinking about it. But also, there's nearly a 50% chance that you will accept a part that is truly bad and near the lower spec.

Very interesting. What happens if we could change the variation of the gauge, then?

Well, the shape of this curve definitely depends on how good your gauge is. Let's play with this just a little bit. What if we could reduce the variation by a factor of 10? Make a quick change here and replot our gauge performance curve. If this is possible, we will correctly accept or reject more of the parts. We're moving from incorrect to correct when we do this. Let me break from the role play for just a moment. The gauge performance curve I am showing is an add-in that we made for our 2022 Discovery Americas presentation. The add-in is available on the community. Back to you, Jerry.

That is really an interesting chart, Jason. I don't think we do anything like that. What you're saying is that the gauge errors contaminate the measurements, but all I have is the imperfect measurement. Your gauge performance curve is plotted versus true part values. I wish I knew those true part values; then I could know exactly which parts to keep and which to throw away.
Is there a way I can know the true part value?

We could know that directly if we had that ever-elusive perfect gauge, which we don't. We really can't ever get to the level of knowing the true value of an individual part, but we can estimate the true part distribution given our knowledge of the gauge characteristics and the measured part distribution.

How would you do that?

Well, let's assume that gauge errors are normally distributed, and for the moment let's ignore any bias or linearity problems that you might have, and say that we have the measured part distribution. If we have that, we can back out the variance of the true part distribution using a simple equation. That simple equation is just the difference between the measured part variance and the gauge variance. The plot on the right is shown for a situation where the measured variance is 25, the gauge variance is 16, and if we subtract those two, the true part variance is nine. We're beginning to get the parameters for that distribution because we know the results of our gauge study and we understand the variance associated with our gauge.

Now, from this we would build a normal distribution that is centered on the measured distribution mean with the new standard deviation. Again, the result is going to look like the plot on the right. In the plot, the blue bars represent your measured part distribution. The areas above and below the spec limits are shaded in pink. Notice that the measured part distribution is much wider than the true part distribution. The measured distribution is what you get when you run the true part distribution through your imperfect gauge.

Okay, that makes sense, at least for a simple case. How do you relate this to what it's costing my company?

Well, we can use this information in a numeric simulation to characterize the mistakes that our gauge is going to make. When we do that, we can generate a part inspection table like this.

Let me study this table for a minute. My eye is immediately drawn to the center, the green box that says 95.4%. Am I interpreting this right? 95.4% of my total production is truly good and we're shipping it.

Correct.

That's a good thing. Now, looking at the first and last columns, if I add 18 and 25, let's see, that's about 0.043% of my production parts that are truly low. Another 0.041% in the last column are truly high. This is bad. It says my process is making bad parts that must be thrown away or reworked. I see another problem. If I look at that center column and add those all together, I get 99.9% of my production parts that truly are good. But it says that the gauge is identifying 2.3% of those as too low and 2.3% as too high as well. Now, the customer doesn't care about that. They're still getting good parts, but I certainly do. I'm making good product and I'm throwing it away. Worse yet, look at that center row, those red squares.
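Before going further, it may help to restate the variance relationship from the dialogue above as a worked equation, using the example numbers quoted there:

\[
\sigma_{\text{true}}^{2} \;=\; \sigma_{\text{measured}}^{2} - \sigma_{\text{gauge}}^{2} \;=\; 25 - 16 \;=\; 9,
\qquad \sigma_{\text{true}} = 3 .
\]

The estimated true part distribution is then taken as normal, centered on the measured mean, with this smaller standard deviation.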
There's another 0.036%, 18 and 18, of truly bad parts, parts that are too low or too high, that are being accepted by this measurement gauge. This is serious. I do not want to ship bad parts to my customer if I can help it.

That's right. This is cool. You're beginning to see the cost of having an imperfect gauge.

This is really interesting. It shows that if I don't do something about my imperfect gauge, I'll risk accepting bad parts and throwing away good parts, both of which are bad for my business. On the other hand, I think we've got a way to handle this, Jason.

Okay, what's that, Jerry?

Well, we use something called guard bands. These are bands that are set inside the specification limits. If we set them far enough inside the spec limits, we can reduce and essentially eliminate shipping bad parts. Doesn't that fix at least part of our problem?

At least partly. Guard bands are definitely a good way to reduce the percentage of bad parts that make it through your inspection process. A lot of companies use them. But have you considered the fact that improving quality using guard bands comes at the expense of throwing away good parts?

Honestly, that has occurred to us, but we haven't tried to quantify that damage.

Well, let's extend this example out a little bit more, and let's just assume that we bring those specifications in by one unit of measure. We'll call these guard band limits. Our lower guard band limit would be 41 and our upper guard band limit would be 59. We're going to use these as our inspection screening values instead of the original upper and lower specs. Now, we can do the same numerical simulation and update the results. Let's just take a look at the differences between the tables. Can you see how the percentages have changed?

We went from shipping roughly 0.04% of parts that were truly bad to only shipping 0.03% of bad parts. That looks successful. Maybe we could even squeeze our guard bands in further and improve that. Especially given our high production volume, we're talking real bucks here.

It is. But also notice how many truly good parts are now being screened out. Every time you screen out and throw away a good part, it is costing your company money.

Well, you're right about that, Jason. Is there a way to look at this monetarily? What if we assume that a bad part in the simulation results in a bad car? Can we input the cost of scrapping the car and see how that affects the bottom line?

Absolutely, we can do that. I'll need to get a little information from you, though. First, how much does it cost to make the car?

Yeah, let's say for the sake of this demonstration, $35,000.

Okay, great. That means for each rejected car, it costs your company $35,000. You might factor in rework costs here, but let's say, for example, we just throw the car away. Now we need production quantity.

I don't know. Let's just choose a million cars.

Okay, great. Now, how much do you charge for a truly good car that makes it to a dealership?

The dealerships buy...
Let's just say they buy these cars from us for $40,000.

Okay. If I understand this right, your profit per car is that $40K minus the $35K, so your profit is $5,000 per car.

Right.

Last thing, do you know the cost associated with selling a bad car?

That's a little tougher. There are the obvious costs of repairs to the bad car or the potential cost of a return. Those are relatively easy to calculate, but there's also damage to our reputation. Our customers demand quality, and if we start putting bad product out the door, it can quickly get out of hand and result in lost future sales. That's a lot more difficult to calculate. I know you need a number. For the sake of argument, let's just say that totals to $50,000 per bad car that makes it out into the market.

Excellent. Let's take a look at the profits and losses, same simulation. Just to review and make sure that we're looking at the correct values: you told me that the manufacturing cost per car is $35,000. You then sell that to a dealer for $40,000. Our profit is $5,000. The cost of selling a bad car is $50,000. We're going to look at this across a 1 million car production run. Have I captured everything right?

I think that looks good.

All right. If we look at the net profits and losses, you stand to make about 346 billion from the 1 million cars you make.

That sounds good.

Not bad. The total profit from the truly good cars that are shipped is about 371 billion. The loss due to making truly bad cars that are caught in your inspection is 199 million, which is the sum of 98 million plus 101 million.

Okay.

The loss... Let me let you digest that for a second.

Yeah, I'm following.

Okay. The loss due to shipping truly bad cars, this is the one you are really concerned about, is 137 million, which is 68 million plus 69 million. Finally, the loss from scrapping truly good cars, this is what's costing your business, is $25 billion. That's quite a lot. That's the sum of $12.4 million and 12.4 million.

That's fascinating and also a little depressing that we're losing that much money. If you change things, let's say you change the guard band settings, will the total net profit change?

That's right. That's definitely true. You could see that change.

In that case, then, could there be an optimum? I can imagine widening the guard bands or narrowing them and looking at the net profit. Would there be an optimum where that net profit peaks out?

Yes, you can definitely explore that trade-off in a lot of different ways. You could answer questions like, how would improving my gauge by a factor of 10, like we showed with the gauge performance curve, improve my profitability? Or how much can I afford to spend on fixing or replacing a gauge? If we know what the costs to the company are for our measurement system, then we can justify the cost of fixing or replacing a gauge. Also, just to your point, what if I adjusted my guard bands? We can definitely answer that question. Another common one is, what if I improve my process capability?
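For readers who want to see the shape of that roll-up in script form, here is a hedged JSL sketch. It is one plausible way to decompose the numbers the dialogue walks through, not necessarily how the add-in computes them, and the inspection-table fractions are placeholders standing in for the simulated values.

```jsl
// Hedged sketch of the profit/loss roll-up described in the dialogue.
// The frac_* values are placeholders, not the actual simulation output.
Names Default To Here( 1 );

n_cars        = 1e6;      // production volume
cost_to_make  = 35000;    // manufacturing cost per car
sale_price    = 40000;    // dealer price per car
cost_bad_sold = 50000;    // downstream cost of a truly bad car reaching the market

frac_good_shipped  = 0.954;    // truly good and shipped
frac_good_rejected = 0.046;    // truly good but screened out
frac_bad_caught    = 0.0005;   // truly bad and caught by inspection
frac_bad_shipped   = 0.0004;   // truly bad and shipped

profit_per_car = sale_price - cost_to_make;

profit_good_shipped = n_cars * frac_good_shipped * profit_per_car;
loss_good_rejected  = n_cars * frac_good_rejected * cost_to_make;
loss_bad_caught     = n_cars * frac_bad_caught * cost_to_make;
loss_bad_shipped    = n_cars * frac_bad_shipped * (cost_bad_sold - profit_per_car);

net_profit = profit_good_shipped - loss_good_rejected - loss_bad_caught - loss_bad_shipped;
Show( profit_good_shipped, loss_good_rejected, loss_bad_caught, loss_bad_shipped, net_profit );
```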
If I just tighten the variation in my process, what does that do to my profits and losses? I could trade that off against the cost of improving that process capability.

Interesting. Well, I must say, Jason, I'm impressed. This has been a good use of time, but I think I owe it to my company to muddy the waters just a little bit. This is all great for normal distributions and simple gauge errors and those kinds of things. Those calculations that you've shown are easy. But what if I have gauge linearity or bias problems? Or what if I have a skewed distribution, which is really pretty typical in my company? We rarely run into the nice bell-shaped curve. Getting a true part distribution out of the measured part distribution becomes a lot more difficult than just using that simple formula you showed earlier. Can you even do that?

Absolutely. We are writing an add-in that will let you define the shape of any measured part distribution. We can do the same exercise with measured part distributions that are normal, log-normal, uniform, Weibull, or even a custom distribution. It's an add-in we're working on. It's a work in progress.

All right, that's fantastic. I'm ready to buy in. When will that be available?

We have the basics of the add-in worked out, but we need some time to make it more user friendly. We'll be working on that in the coming few months. Probably before midyear, we'll have that wrapped up. When it's done, we'll post it on the JMP website in our community file exchange.

A few months, really? I'll forget all this by then.

That's okay. We recognize that. Once our add-in is ready for prime time, we'll announce a series of open-to-the-public seminars where we will go into detail about what you've seen here, as well as other aspects, like relating these concepts to Donald Wheeler's EMP methodology, which is another personality in the Measurement Systems Analysis platform in JMP. Here's a quick peek at the topics for the upcoming talks.

We'll spend more time elaborating on how gauge studies that use the AIAG classification we talked about earlier can lead to unrealistic gauge assessments. We'll also explore how Wheeler's Evaluating the Measurement Process, the EMP method, can provide a more realistic gauge classification. We're going to present the problem with AIAG and present the solution using Wheeler's methods. We'll also show how the EMP method can advise us on how to use our gauge. How do we use it in the production process? One example that we'll be covering is objectively setting guard bands. For the remaining topics, we'll spend a little bit more time interpreting gauge performance curves and talk about how to blend gauge performance with part variation to determine the costs associated with imperfect gauges. Really, that is what we're talking about today, but we feel like we need to extend that a little bit so that we all understand how that works.
Our final two topics: how can Wheeler's calculations be factored into this gauge cost conversation, and how do we understand gauge cost, again, in the case of non-normal part distributions?

That's perfect. Can you make sure that I'm on that invitation list? I want to make sure that everyone in my quality department attends your seminars.

Sure thing, Jerry. Anything else I can do for you today?

Yes. Get back to work on that add-in. The sooner it's available, the better.

Will do.

All right, this concludes our presentation. I'll say that as we were doing research for this talk, we uncovered many concepts that are important to understanding how to use measurement systems. We feel like each of these concepts deserves more time than we had in our talk today. We look forward to continuing the conversation with you in the coming months. Any closing thoughts, Jerry?

Just that, for you kind folks attending today, if you're interested in attending those upcoming seminars, please let your local JMP support person or your support team know, and they'll make sure that you're included on that invitation list.

Excellent. With that, thank you, everyone, for attending.

Thanks, all.
JSL can be used to make GUIs to access data in measurement systems with pre-programmed scripts, such as for SPC. For this to work, JMP® must remember the control limits from the status quo when new data arrives. A second JSL script is needed to connect to a DB and load existing limits for variables if they are present. Control limits can be altered/added manually and updated in the data table, or automated with default values and a factor for multiple variables. When others pull the data again from the DB, the control limits are automatically added to the column properties and go into the SPC. This presentation will provide a live demo and JSL example of how to insert and update data in a DB.

Hello, my name is Mauro Gerber. I work as a data scientist for Huber and Suhner in Switzerland. I would like to introduce you to the problem we had regarding SPC and how we got it right. I want to talk about SPC scripting and why it's important to write the information back into a database. What we have is an optical measurement system that measures dimensions on parts, which we store in a database. The goal was to get the data back out of it and do statistical process control on it. Now, what happened was that over time some variables can shift, and the worst case would be that we go out of spec and then we have to take measures to get it back in. It would be preferable if we could do that beforehand.

The idea of SPC is like telling a dog to stay where it is. We achieve this by defining a stable phase and saying that if the process moves out of that window, it gives a notification before it goes out of spec, and we can take measures to get back into the stable state again. One way to achieve this is with the Process Screening platform. We can sort by the control chart alarms, which tells us how many of the variables violate test one. Test one is simply out of the control limits. As you can see, I get the most alarms from the most stable process. This is a bit of a paradox. As you can see here, I have a very stable process, and the SPC limits that get calculated automatically give me very many false positives. I shouldn't react to these, because they are just within the variation.

The second problem with it is that if I have an order and analyze the samples, it calculates the control limits for me. Like in this example, it says it's all good. If I later analyze a second order, the calculation of the limits automatically switches to the new process variation. It makes the window bigger and again says, hey, everything is okay, until we see, of course, that it moved and the variation got bigger. So I made an SPC script that deals with the measurements and loads and stores SPC limits in the database, so we can have a properly working SPC.

I would like to switch to a demonstration of how this looks with our database. What we have is a script that imports the data. What it does is go into the database and search the whole product order we have stored; for this demo I made a special data set, and it automatically contains this program.
For us, that is like an article or part. We can select, for example, PO1, the first one we did in our example. I can say, okay, I want to have a look at that data, and in the back end this is the final result. As you can see here, after [inaudible 00:04:23] and SPC. In our example, this is what happened: I have a stable process, everything is good. Look at all processes. This is where the error came from. Again, I can show you the SPC. You see here, in the beginning the process was okay. Then there was a phase where the variation got bigger, and then it got back again. This X1 got a problem. The problem was X2. As you can see here, I got some parts out of spec. This is what I would like to change now.

As we discussed earlier, what we need to do is go to the first PO, set this as the table I want to go back to, and now run the script that goes over the active table and extracts every line with a spec limit. I see here, these are the spec limits, and you can see here the empty control limits. What I can do now works two ways. I can either activate this SPC here, have a look at the data, zoom in a bit, and say manually that on X2 the lower control limit needs to be 2.98 and the upper 3.02, and down here tell the program, hey, please update the limits for me, and I go back to... no, this was wrong, sorry: 2.97 and 3.01, update. The limits I just set are now in the control limits, which I can check: on X2 it added the control limits for me. The problem is that if I close the table and re-download the data, these control limits are lost. What I can do now is save it, close all, reload the data, and the script now checks whether there are control limits present for X2 (and I set none for X1), and if I go here now to the SPC for X2, the control limits are now set as desired.

If I have a lot of variables with control limits in my data set, like 20 or so, it could be quite difficult to set the limits manually for every one. This is where I made the script so that you can actually select the desired variables and set a default. As I showed you earlier, if I set it to one, it would copy the automatically calculated limits from the system, which can be too tight. In this case I say, okay, I want the margin twice as big. Now it runs through every row and sets the limits automatically. I save the limits to the database, and if I run the limits again, you can see it's centered and has a nice window around them, quite easily. So I can manage limits for a lot of variables.

If I go now to the problem we had: I select all of them, run the script again, and run it again. Now it takes over the limits I set earlier. As you can see here, I would get warnings earlier on that something is wrong, and then we could have prevented this from happening. As you see here, with some countermeasures we are now back again in the stable phase.

What I use in the script is that I search for spec limits to identify which ones are the variables to work with.
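Before moving on, here is a rough JSL illustration of that "default limits times a factor" idea. The center-plus-or-minus-three-sigma calculation and the column name are assumptions for illustration; the actual script derives its defaults from the automatically calculated chart limits, which are not reproduced here.

```jsl
// Hedged sketch: widen default control limits for one variable by a user factor.
// The 3-sigma half-width below is an assumption, not the original script's logic.
Names Default To Here( 1 );
dt = Current Data Table();

factor = 2;                                   // user-chosen widening factor
vals   = Column( dt, "X2" ) << Get As Matrix; // column values as a matrix

center = Mean( vals );
sigma  = Std Dev( vals );

lcl = center - factor * 3 * sigma;
ucl = center + factor * 3 * sigma;
Show( center, lcl, ucl );
```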
Searching for spec limits this way can also solve problems with platforms that depend on spec limits. I can filter them and only make control limits for SPC, or spec limits for the Process Capability platform, so I don't get an error message that control limits are missing.

What is important is, of course, safety. A habit of JMP is that the source script is stored in the file. Whatever password or database connection you use gets stored in the table. This, of course, can reveal the server name or even the username and password. To get around this there's a preference, ODBC Hide Connection String. What you also can do is, in the script itself, encrypt the code that contains the password and username, as you see here, so people are unable to read it.

When I write into the database, I use Create Database Connection. I set the reference, and then there is an SQL statement I put together, starting with INSERT INTO and the database table name. Then what I write comes from the list I generate: the program names, the variable names, and the control limits. Then I execute these SQL statements from the connection reference and the SQL string. What's important is that when the log shows an error message from the SQL, it beeps and puts it in the log. When something goes wrong, I can check whether the credentials are wrong, or the connection could not be built up, or whatever else is possible.

Then I write the control limits into the data table. I check for the name of the article, and if there are SPC limits present, I update the column properties with the control limits, which then get automatically displayed in the SPC, and the program handles them accordingly.

A second use for writing something back is if we take a measurement, run a test with the part, like an environmental test or endurance test, and make a second measurement. Then I have the same part with two measurements. Or I have a false measurement, like something went wrong; I retake the measurement, and the second measurement is actually the one that counts. This I can handle with a small script called update DB. It gives me a dialog asking whether I want to set the measurements inactive or update them. If I want to update them, I can type which label I want and what its name is. It works similarly to the function Name Selection in Column. What it does is give each measurement a unique ID, even if the part itself has a serial number. So I can have the serial number measured twice, after zero hours and after 100 hours, from test option one, and it got different measurement IDs. This is how I differentiate between the measurements.

For the inactive part, it's simply that in the SQL statement I can say, if the measurement is inactive, disregard it, so these faulty measurements don't show up. Key points: be careful that you don't give out sensitive information like passwords and usernames. You can hide certain parts in the script with encryption; the code is JSL Encrypted, and then the encrypted code is in there as text. Use the preference ODBC Hide Connection String.
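To make the write-back described above concrete before the remaining safety points, here is a hedged JSL sketch. The DSN, table and column names, and the chart keyword inside the Control Limits property are all assumptions for illustration; the production script builds its SQL from lists of variables, which is omitted here.

```jsl
// Hedged write-back sketch; DSN, table, and column names are placeholders.
Names Default To Here( 1 );
dt  = Current Data Table();
dbc = Create Database Connection( "DSN=MeasurementDB;" );  // credentials hidden or via Windows authentication

// 1) Store the control limits for one variable in the database.
prog = "PO1";  varname = "X2";  lcl = 2.97;  ucl = 3.01;
Execute SQL( dbc,
	"INSERT INTO control_limits (program, variable, lcl, ucl) VALUES ('"
		|| prog || "', '" || varname || "', " || Char( lcl ) || ", " || Char( ucl ) || ")"
);

// 2) Re-apply stored limits as a column property so the control chart uses them.
//    The chart keyword inside the property (XBar here) must match the chart type in use.
Column( dt, varname ) << Set Property(
	"Control Limits",
	{XBar( Avg( 2.99 ), LCL( 2.97 ), UCL( 3.01 ) )}
);

// 3) Update a measurement's label, or flag a faulty one as inactive, by its unique ID.
meas_id = "12345";
Execute SQL( dbc, "UPDATE measurement SET label1 = 'stable' WHERE measurement_id = '" || meas_id || "'" );
Execute SQL( dbc, "UPDATE measurement SET active = 0 WHERE measurement_id = '" || meas_id || "'" );

Close Database Connection( dbc );
```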
Another way to protect credentials is to use Windows authentication and avoid stored credentials altogether. It also helps to use specific columns that are writable, like the label one and label two columns, to avoid manipulations. We have the policy that a user can set a measurement inactive, but he cannot delete it. Since everything is written to the database, we can restore all the data if something went wrong. Another good practice is to check that the data is actually written into the database, like a handshake, and to enrich the data with important information that can be handy for users years later, because they may not have the information about why you took three measurements of the same part and may be wondering why it's getting worse; that way, everyone has the same information to work with.

Let me show this little feature quickly. I can go back in here for the demo. Usually, when we have Graph Builder, we mark rows there, like this stable section. What I do is select those rows and use Name Selection in Column, and that gives me a column which I then can save. But if I reload the data, this information is lost. What I can do now is use this feature, update database. I can update it and say, hey, label one, this is stable. You can see here, label one is now marked as stable. I can go and say, okay, these measurements I want to label as unstable, update, close it, and I simply pull the data from the database again, and as you can see, the stable and unstable phases are stored. If someone later looks at this, they can draw conclusions from the phases; here one is missing, this is the stable phase, this is the unstable phase.

The same goes for the data. I can set some objects inactive. It will warn me if I want those three set inactive, and the next time I call the function they wouldn't show up. If I want to have them back, I can select "include inactive measurements" here so that those measurements come back again. That is my script. I thank you for listening, and if you have further questions, I will be [inaudible 00:19:42] later on. Thank you very much.
Cerba Research is a global company providing high-quality, specialized analytical and diagnostic solutions for clinical trials. Cerba Research Montpellier develops customized immunohistochemistry protocols to detect the expression of selected targets on patients’ tissue sections. To be used in clinical trials, these protocols must meet the regulatory agencies’ criteria to ensure that the protocol will allow consistent results on precious patients’ samples. With the diversity of parameters evaluated and the types of evaluation possible in implementing these custom protocols, automating data analysis became a need. Thanks to various JMP® tools, we have developed an automated analysis that saves time and homogenizes protocol performance reports by including statistical and graphical data in a dashboard. This process, submitted as a JMP add-in, has been incorporated into our user workflows, thus facilitating our procedures.

Used version: JMP v16.1.0

Hello. My name is Marie Gérus-Durand, and I'm working for Cerba Research Montpellier. Today, I will show you how we set up automation of immunohistochemistry data analysis for protocol validation in Montpellier using JMP and its tools, like the dashboard and add-in functions.

First, some words about Cerba Research. It's a worldwide company with capabilities on all the continents. Here I highlighted in yellow the department I'm working for. It's the Histopathology IHC Department. As you can see, I'm based in Montpellier, in France. We also have other labs in the US, in New York, and in Taiwan, in Taipei.

First of all, what is immunohistochemistry? The aim of the technique is to detect target proteins, mainly on a tissue sample. Here you have a slice of a tissue, for example, when you do a biopsy. We will look at the targets of interest using antibodies, which will detect the target. This antibody is recognized by another one, which is coupled with a chromogen, fluorophore, or reactive component that allows the detection of the target. Here you see, for example, these three components are highlighted, meaning that antibodies bound them and we can detect them. After the experiment, we can look at the slides under a microscope or using a scanner, which allows visualization and analysis of the results.

On the next slide, I just zoomed in so you can see better what it looks like. Here it's a skin sample, and you have the cell nuclei in blue and the target of interest in red. One of the challenges within immunohistochemistry and histopathology is that you have many possible protocols and colorations. Here on the left are two different histological colorations. They don't involve antibodies like I showed you; they just react with the different components of the tissues. You see here, for the MOVAT, we have five colors. For the HE, we have only one color, but the intensity depends on the type of structure you are looking at in the tissue.

On the right, you have two immunohistochemistry protocols. One is a simplex, as we call it, because we detect only one target, and it's chromogenic here; it's in brown. On the top right, you have a multiplex. Here we detect many components.
Here it's a fourplex, four targets on the same slide, and each is revealed by a different fluorophore, so you have a different color for each of the targets. On top of these coloration and detection possibilities, you then have a multitude of possible analysis methods. The slides can be analyzed by a pathologist, which gives us qualitative or semi-quantitative data, or by image analysis, which gives us quantitative data. Another layer of that is that the reportable parameters can be single (for example, if you have a simplex, you detect only one target and you assess only one parameter, like the percentage of positive cells) or there can be many: for one target, you can have the percentage of positive cells and a specific histology score, and if you have a multiplex, then you can multiply this by all the targets in the multiplex. Each reportable parameter is target-dependent. You can imagine that we have a lot of combinations to assess during our validations.

At Cerba Research Montpellier, in 2022, we had a small part, about 20% of our projects, related to animals, where we study animal samples. The other projects were on human samples. Among these, most are for clinical trials. That's a very [inaudible 00:04:56]. We have some others that are outside clinical trials, a quarter of them, and a small portion, 3%, that are CAP compliant. CAP is a specific regulation for the US. It is important to know that before being used in a clinical trial, we should demonstrate that the protocol we developed in Montpellier shows consistency in results for sections of the same sample: when the sample is analyzed at different time points (for example, the samples of patients involved in the study in the first year should give the same results as five years later), on the different automated stainers we have at our different sites, and when the samples are analyzed by different operators or pathologists. This implies a rigorous validation according to the health agencies, and this validation is mainly based on statistical criteria.

The implication for the company is that we need to increase the team to support the increasing number of projects we have each year, and we need a homogeneous statistical analysis pipeline to be sure that we give all our clients the same type of results. Obviously, we need a statistical analysis tool, and that's when we chose JMP to support the validation of our protocols. Today, I will show you only a part of what we are doing because I don't have time to show you everything. I chose a quite simple example. It's the kind of experiment we do to validate the precision of the protocol, meaning that we check the intra-run precision, called repeatability, over three slides. Here are the three slides highlighted in purple in the same cycle. The three slides are run at the same time, they come from the same sample, and we just check that we get the same data.
The inter-run precision, or reproducibility, is tested over two slides highlighted in blue in each cycle; in total, we have six slides over three cycles. For reportable parameters, I will use an image analysis dataset, which is quantitative data and usually easier to analyze. We will have two reportable parameters for one target.

How do we start? First, we need to import the data. I don't know if you are familiar with that, but in our case, we have data either from Word documents, so we use the Word Import Tool available on the JMP community website (I put the link here so you can go and find it again), or from Excel, imported either directly by opening the file in JMP or by using the JMP tool in Excel. For this presentation, just to be faster, I created some scripts so I can focus on the dashboard creation afterwards, which will take more time. These are just small scripts to save time, and I will not go into them. Here I will open a data table from Project X. As you can see, it's a quite simple table. I have four columns, with the validation, the slide ID, which are internal slide numbers, and the dataset for my two reportable parameters. These data are continuous because they come from image analysis.

Once I have this, I need to prepare my data. It's the most time-consuming part of data analysis; again, that's why I have a script. You see the five columns that I added: the sample, to be able to relate each data point to the same sample, which is part of the slide ID we have internally (I just used a formula here to help me do that); the slide number, which is the last digits of the slide ID; and the slide order, 1, 2, 3, 4, 5, 6, 7 for each sample. Thanks to this slide order, I can flag the repeatability slides, the first three slides that were stained in the same cycle, and the reproducibility slides, the first two slides of each cycle.

Here I have all the information needed to do my analysis. I go back to my journal, and we will want to do the dashboard creation. I still have some steps to do before that, because I would like to have all the analyses I want to put in the dashboard. Here are the two little tables where we are required to analyze the CV of our protocol for each sample for repeatability. I selected only the first three slides thanks to the local data filter, and the same for reproducibility, where you see I have the slides from the reproducibility column. These are the data that I need, and I updated them in here. You see I have many more columns now; it's easier to find the names here. I have the sample CV for repeatability, for reportable parameters 1 and 2, then the same for reproducibility for the two parameters, and then I take the mean of both samples for each of these columns. Here are all the data I will need for my dashboard. I can close these tables; I don't need them anymore. I will now do the graphs and tables that will really go in the dashboard.
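As a rough illustration of the per-sample CV calculation described above, here is a hedged JSL sketch. The column names (:Repeatability, :Sample, :RP1) are hypothetical placeholders, not the actual table structure used in the demo.

    dt = Current Data Table();
    // Flag and subset the repeatability slides (placeholder flag column :Repeatability).
    dt << Select Where( :Repeatability == "Yes" );
    sub = dt << Subset( Selected Rows( 1 ) );
    // Per-sample mean and standard deviation of reportable parameter 1.
    sumTbl = sub << Summary( Group( :Sample ), Mean( :RP1 ), Std Dev( :RP1 ) );
    // CV (%) = 100 * standard deviation / mean.
    sumTbl << New Column( "CV RP1 (%)", Numeric, Continuous,
        Formula( 100 * :Name( "Std Dev(RP1)" ) / :Name( "Mean(RP1)" ) )
    );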
Again, what I want to show the client is the distribution of the data for repeatability, where I also show the standard deviation and the mean (usually it's pretty good here), and for reproducibility, where I have all my six slides. Then I put a table with the CV for each sample for repeatability and for reproducibility, on the left for the first parameter and on the right for the second one, and the same layout for the mean of the two samples. These are the four parts I want to show on my dashboard.

I will show you what it looks like; I want to obtain something like this. This is one I did earlier so I can show it: the two graphs and the two tables. This is what I will now show you how to build. Note that all the graphs are tied to the same data table, which makes it much easier to build the dashboard afterwards, and I saved all the scripts so I can redo them whenever I want. I will create a new file, a new dashboard. You have many templates; I usually start from blank, and you just have to drag in what you want to see. Sometimes it's a bit difficult because the preview is small, but we always manage to find our way. My tables go at the bottom; you just drop them where you want to have them. It's pretty simple. You can change the names of each part, but I will not do it now; you can see that you can edit all the parts. You run your script, and it gives you the dashboard. It's pretty similar to what I showed you before: my two tables at the bottom and my two graphs at the top. I have inverted the two, and I usually prefer to start with repeatability, so I will just switch them. If I drop one where something already is, they simply switch places. Here we are. This is the layout I want, so it's good. Afterwards, you can adjust how much of each table or graph is shown, but this is just for the visual [inaudible 00:15:00] process.

Then, okay, now we have a dashboard, but it would be more interesting if we could go to the dashboard directly: when you have all the data you want in your table, you click, and you have a dashboard. For that, we just need to make an add-in. It's quite simple, thanks to what I call the magic triangle, the red triangle here. You click Save Script and then To Add-In, and it will create a script that builds the same dashboard again. Let's go back to the data table. I will close this one so you are sure that the one you will see is not this one. Sorry, I shouldn't have closed it before doing the dashboard; I will just use this one to be faster and not create it again. Here, you click on the red triangle, Save Script, To Add-In. This is the name you will have in the add-in list, which is used to manage your add-ins, but it's not the one that will appear in the Add-Ins menu in JMP. The name that appears in the Add-Ins menu is this one. For today, I will just call it Test so I know which one it is. Save. You see here all the script used by JMP to build this dashboard, and I will save it in our Project X.
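For JMP users who prefer scripting, a roughly equivalent dashboard layout can also be assembled directly in JSL by re-parenting platform reports into one window. This is a hedged sketch only; the platform calls and column names (:RP1, :RP2, :Sample) are illustrative placeholders, and any saved platform scripts could be substituted.

    dt = Current Data Table();
    // Hypothetical platform reports standing in for the two graphs and two tables.
    g1 = dt << Distribution( Continuous Distribution( Column( :RP1 ) ) );
    g2 = dt << Distribution( Continuous Distribution( Column( :RP2 ) ) );
    t1 = dt << Tabulate( Add Table( Row Table( Grouping Columns( :Sample ) ), Column Table( Analysis Columns( :RP1 ) ) ) );
    t2 = dt << Tabulate( Add Table( Row Table( Grouping Columns( :Sample ) ), Column Table( Analysis Columns( :RP2 ) ) ) );
    New Window( "Precision dashboard",
        V List Box(
            H List Box( Report( g1 ), Report( g2 ) ),  // graphs on top
            H List Box( Report( t1 ), Report( t2 ) )   // summary tables at the bottom
        )
    );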
Here you see it takes the name of the first tab, Dashboard, only. I save it. I had the box Install after save ticked, so it was already added to my add-in list. If the box was not ticked, you just go to the location where you saved the file and click on it, and it's installed. Now I can close this dashboard. I have created my add-in already.

Now, how to use it? I went a bit fast: if you open the file, it installs. As I already have it, I will not reinstall it; it's just listed under Add-Ins. If you go to View, Add-Ins, sorry, select Dashboard and Unregister, it's not erased: I will find it again when I go to my project. Here you see I have it. If I double-click, it asks me if I want to install it. Sure, I want to install it. It's back here again. You can share it: you just copy the same file I clicked on and paste it in a shared folder, or send it to a JMP user colleague. You can also modify it. For this, you go to Open. Again, this is the dashboard file, but don't double-click this time; just use the arrow here next to Open and choose Open using Add-In Builder. You go back to the same window as the first time, where you have your script, and you can either edit the script or add other functions that I don't really use, to be honest. There are many functions; I'm sure you will find more information on the JMP website about that. This, for example, would allow you to put all the data preparation steps in the same add-in, so when you run it, everything is done at the same time. This is it.

In conclusion, using the dashboard and add-in functions gives us report consistency, because we always send the same setup of results to the clients. We increase traceability, thanks also to the use of the scripts, because we are sure we are all doing the same thing. And it's a great time saver, because as I said, I just have to click one button and I have my dashboard. If you combine this with scripting all the data preparation, then you have your table, you click one button, and everything is done. It's a great time saver. That's all I wanted to show you today. I hope you enjoyed it. If you have any questions, don't hesitate to reach out to me, either by email (you have the email on the first slide here) or through the JMP community. Thank you.
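For anyone who wants to share or inspect an add-in outside JMP: a .jmpaddin file is simply a zip archive containing the JSL script(s) together with a small addin.def text file. As a rough, hypothetical sketch (the id and name shown are invented, and the exact set of keys can vary between JMP versions), the addin.def for an add-in like this one might look as follows.

    id=com.cerba.ihcdashboard
    name=IHC Precision Dashboard
    addinVersion=1
    minJmpVersion=16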
Statistical evaluation for biological assays is critical because a lot of data needs to be summarized for reporting to customers and authorities for drug registration. JMP® is a helpful tool not only for calculating the required parameters but also for automating evaluation. For example, it can be used to graph and automate calculations for Repeatability, Intermediate Precision, Linearity, and Robustness of a Relative Potency Assay. Because these calculations are often required, a single JMP file for calculating the parameters saves a lot of time and can be used by novice users. Furthermore, the evaluation remains consistent throughout various assays, even when different technologies are used, such as SPR, ELISA, or Cell-Based Assays.

Hello, and welcome to my presentation about JMP in qualification and validation of biological assays. I've divided this presentation into five parts. First, I want to give you a small introduction about myself and the company I'm working for, VelaLabs. The second part is a general introduction to method qualification and method validation as we often perform it at VelaLabs. The third part is how we collect and summarize the data. Then I will continue with the JMP data table, where I've created some scripts to evaluate the data generated during qualification and validation. In the last part, I will talk about some additional robustness parameters where different functions of JMP are used.

My name is Alexander Gill, and I've been at VelaLabs since 2019. I'm a laboratory expert in the ligand binding assay group. I'm mostly responsible for method development, qualification, and validation for Biacore assays and ELISA assays. VelaLabs is a contract laboratory for quality control under GMP conditions. We have four operational departments: the ligand binding assay group, the physico-chemical group, the cell-based assay group, and the microbiological group.

Method qualification and validation is important in the life cycle of pharmaceuticals and biologicals. Here, the life cycle of such drugs is shown, from the pre-clinical phase over the clinical phases to the application. During the pre-clinical phase, we use methods that are suitable and scientifically sound. Afterwards, for clinical trials phase 1 and phase 2, we mostly use qualified methods. For method qualification, we show the performance of the assay with some suitable parameters. If the assay is then validated, we derive limits from the data generated during qualification, and these limits must be reached during method validation. The validated method is afterwards used for clinical trials phase 3, new drug application, and also for batch release in post-marketing. Here, I've shown some examples of the performance parameters. The accuracy shows whether the method has any bias or shift, or rather that it lacks bias or shift. The intermediate precision is the variability between runs, where we show that different operators and different devices on different days do not influence the result.
The repeatability is the variability within one run, where we try to keep the differences between the reported values as small as possible. The linearity shows the dose response of the assay over the whole assay range. For robustness, we show whether different parameters can or cannot influence the result, for example, different ligand lots or different models of devices. Then there is the sensitivity to detect stability-indicating changes, where we mostly use stressed samples to show that they can easily be distinguished from non-stressed samples. Specificity is, for example, a blank subtraction or positive and negative controls.

The data collection is mostly performed in Microsoft Excel because it's more accessible within our company; I will come back to this later. We also collect the reported value, which is the final outcome of the assay. The reported value is calculated using a validated software like PLA, SoftMax Pro, or the Biacore software. This is to ensure data integrity. Every step where a human is involved in the evaluation has to be checked by a second operator. As I use a relative potency assay as the example for this presentation, I've also shown here what the reported value for this assay is: the relative potency, with the 95% confidence interval as a quality parameter. Here are the reasons why we use Microsoft Excel for the data collection: it's available on every PC within our company, and every employee has basic knowledge of it. The raw data from the validated softwares are also often exported to Excel. What is really important is that the data in Excel are organized in datasets, so they can be transferred to JMP more easily.

Here is a basic experimental design for a method qualification or validation. The first six runs are basically designed around the intermediate precision, where we use 50%, 100%, and 200% sample mimics in each of these six runs. These runs are spread across two devices and two operators and performed on three different days. We report the mean relative potency for each of these dosage points, the standard deviation, the coefficient of variation, and the 95% confidence interval. For accuracy, we use the same dataset as for intermediate precision, but we calculate the mean recovery, and therefore standard deviation, CV, and 95% confidence interval, both for all 18 datasets together and for each dosage point separately. The seventh run is for the determination of repeatability, where we use six 100% sample mimics within one run and also report the mean relative potency, standard deviation, CV, and the 95% confidence interval. Then for linearity, which is here in run 1, we use the sample mimics for intermediate precision and additionally a 75% and a 150% sample mimic within this one run to show that the results are linear over the whole assay range. For this, we report the correlation coefficient, the slope, the Y-intercept, and the residual sum of squares.
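For reference, the reported summary statistics follow their standard textbook definitions (these formulas are generic and not specific to the speaker's SOPs):

\[ \mathrm{CV}\,(\%) = 100\,\frac{s}{\bar{x}}, \qquad \mathrm{Recovery}\,(\%) = 100\,\frac{\text{measured relative potency}}{\text{nominal potency}}, \qquad \mathrm{CI}_{95\%} = \bar{x} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}} \]

where \( \bar{x} \) and \( s \) are the mean and standard deviation of the \( n \) reported values.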
For robustness, in this case, we test a lower and a higher immobilization level and also use two different lots of the ligand. Now I'll show you the Excel table, where we see in the first few columns the metadata for each dataset, then the reported value with the 95% confidence interval, and the slope ratio, which is an additional quality parameter and shows whether the analyte is comparable to the reference. The column for recovery is empty because the recovery will be calculated in the JMP software. Here is the matrix where it's defined which datasets are used for which parameters. There are two different possibilities to transfer this data into the JMP software. One is a function that creates a data table directly out of this table. But in this case, I won't use this function, because I have already created a JMP table with all the scripts I need, and I just copy all the data. For this procedure, it's important to show all available digits of the reported values, because only the shown digits are pasted into the JMP software afterwards. I now copy all this data with CTRL+C, go to the JMP data table, and paste it. Then we get an alert, because in the column Recovery I created a formula to calculate the recovery. I don't want to paste data in there, but the Excel table does not contain data in this column. We click OK, and everything is pasted as we wanted.

For what purposes can JMP be used under GMP conditions? We use it during the method development phase for design of experiments, for example, to investigate more different parameters of the method within one set of experiments. We then use it for the statistical data analysis and also for comparability studies, for example, if a customer wants to compare a biosimilar with the originator. During qualification and validation, JMP can also be used for the design of experiments, for example, for the intermediate precision parameters or to spread the robustness parameters over the qualification runs. Then, as I will show, it is used for the determination of assay performance in qualification and for the check of assay performance during validation. For this, an additional QC check is required afterwards to confirm that all the calculations were performed in the right way. It is also important that JMP is not used for the determination of reported values; for those, as I mentioned before, we mostly use validated softwares.

Now we go to the JMP data table, where I will first show you how I created most of the scripts. For this, I use Distribution. For example, to create the accuracy at 50%, I select Recovery and choose it for the Y column, then click OK. Then we have here all available datasets. To limit these datasets, I create a local data filter and add the Accuracy column. If I then choose all the datasets indicated with an X, we have reduced the datasets to 18.
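A hedged JSL sketch of the accuracy-at-50% script described above is shown below. The column names (:Recovery, :Accuracy, :Nominal Potency) and the "X" flag value are placeholders standing in for the real table, not the speaker's actual script.

    dt = Current Data Table();
    // Recovery is computed in JMP from the pasted reported values (placeholder formula).
    // dt << New Column( "Recovery", Numeric, Continuous,
    //     Formula( 100 * :Relative Potency / :Nominal Potency ) );
    dt << Distribution(
        Continuous Distribution( Column( :Recovery ) ),
        Local Data Filter(
            Add Filter(
                Columns( :Accuracy, :Nominal Potency ),
                Where( :Accuracy == "X" ),        // datasets flagged for accuracy
                Where( :Nominal Potency == 50 )   // 50% sample mimics only
            )
        )
    );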
To reduce it further to only the 50% sample mimics, I add, with the AND function, an additional filter for the nominal potency, which I then limit to the sample mimics with about 50% nominal potency. Then you see we have only six datasets left, with a mean recovery of 99%, a coefficient of variation of about 6%, and the confidence interval. To save this script, I go again to the red triangle here and save the script to the data table, for example as Accuracy 50% 2, because I've already created a similar script here. The only difference for the intermediate precision, if we open, for example, the intermediate precision at 100%, is that we do not use the recovery but the relative potency, and again the same parameters are reported. For repeatability, we choose only the one run with the six 100% sample mimics. We report the same data: the mean relative potency, the standard deviation, the 95% confidence interval, and the coefficient of variation.

What's also very interesting here is the linearity, where we use a different function. I created this using Fit Y by X, plotted the relative potency by the nominal potency, and created a linear fit through all these data points. Then we report the Y-intercept, the slope of the linear fit, the RSquare or coefficient of determination, and the sum of squares error or residual sum of squares. Then we go back to the presentation. For additional robustness parameters, we show, for example, the performance of the assay using different material lots. For these, we first check whether they have equal variances. If the variances are equal, we use the t-test; if not, we use the Welch test. For ELISA methods, we also sometimes measure the plates on two different models of plate readers to show whether both models can be used. This is then analyzed using a paired t-test.

At the end, I want to thank you for your attention. If you have any further questions, you can type them into the Q&A or contact me directly.
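As a hedged illustration of the linearity and robustness analyses described above, the JSL below sketches the corresponding platform launches. The column names are placeholders for the real table.

    dt = Current Data Table();
    // Linearity: relative potency vs. nominal potency with a linear fit,
    // reporting intercept, slope, RSquare, and the residual sum of squares.
    dt << Bivariate( Y( :Relative Potency ), X( :Nominal Potency ), Fit Line() );
    // Robustness: compare two ligand lots; Oneway provides both the pooled-variance
    // t-test and the Welch test (under Unequal Variances) to cover both cases.
    dt << Oneway(
        Y( :Relative Potency ),
        X( :Ligand Lot ),
        Means and Std Dev( 1 ),
        t Test( 1 ),
        Unequal Variances( 1 )
    );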
This talk will focus on how JMP® helped drastically reduce the cultivation experimentation workload and improved response from four up to 30-fold, depending on the target. This was accomplished by screening potential media components, generally the first, and sometimes tedious, step in fermentation optimizations. Taking characteristic properties such as the chemical composition of complex components like yeast extracts enables flexibility in the case of future changes. We describe the procedure for reducing the workload using FT-MIR spectral data based on a DSD setup of 27 media components. After several standard chemometric manipulations, enabled by different Add-ins in JMP® 16, the workload for cultivation experiments could be drastically reduced. In the end, important targets could be improved up to approximately 30-fold as a starting point for subsequent process optimizations. As JMP® 17 was released in the fall of 2022, the elaborate procedure in version 16 will be compared to the integrated features. It might give space for more inspiration – for developers and users alike.

Hello everyone, nice to meet you. I'm Egon Gross from Symrise, Germany. By profession, I'm a biotechnologist, and I'm looking forward to giving this presentation for you.

Hello everyone. I'm Bill Worley. I am a JMP systems engineer working out of the chemical territory in the central region of the US.

Hi. My name is Peter Hersh. I'm part of the Global Technical Enablement Team, working for JMP out of Denver, Colorado.

Welcome to our presentation, Data-Driven Selection as a First Step for a Fast and Future-Proof Process Development. First, I want to introduce the company I'm working for. We are located in Holzminden, more or less in the center of Germany, and there are two sites there, coming from our history. Globally, our headquarters are in Holzminden. We have big subsidiaries in Teterboro, in Sao Paulo, and in Singapore, and there are quite a lot of facilities in France, also due to our history. Coming to the history, Symrise was created in 2003 out of a merger of Haarmann & Reimer, which was founded in 1874, and Dragoco, which is the other site of our facility and was established in 1919. Over the years there have been quite a few acquisitions, including in 2014 the acquisition of Diana, which is mainly located in France; that's the reason why there are so many different research and development hubs. Our main products come from agricultural raw materials or from chemical processes, and there are quite a lot of diverse production capabilities for our main customers, be they human or pet. Being so diverse, we produce food ingredients for human consumption, for pet consumption, and also for health benefits. On the other side, the segment Scent and Care deals with fragrances, from fine fragrances to household care to laundry, whatever you can imagine that smells nice. As I said in the beginning, I'm a biotechnologist by training, and I deal a lot with fermentation processes, optimizing them and scaling them up or down.
One major issue when it comes to fermentation is the broth, the liquid composition of the media, which feeds the organisms. No matter which organisms they are, they need carbon sources, nitrogen sources, minor salts, major salts, suitable pH values, and other things. So it is often important which kind of media one has. When it comes to media composition, there are two big branches. One is synthetic media, where all components are known in exact amount and composition. The other is complex media, for example containing a yeast extract or a protein extract, which is a complex mixture of different substrates and chemical substances. The third approach would be a mixture of both. One of the nice side effects of complex media is that they are quite easy to work with. On the other hand, there can be compositional changes over time, as some vendors tend to optimize their processes and products for whatever region or target. Some customers get hold of those changes, some don't. Another issue might be the availability, if it's a natural product like [inaudible 00:04:38] or whatever. You might know some ingredients, but you will surely not know all of them, and there might be promoting or inhibiting substances within those mixtures.

At the beginning of a process development, the media is of main importance. Therefore, I looked at carbon sources, nitrogen sources, salts, trace elements, and so on as my different raw materials. While growing the organisms, one has to take care of different temperatures, stirring velocities to get oxygen into the liquid, and cultivation times, and there are a lot of unknown variables, to get an idea what the effect might be on the cell dry weight, for example, or on the different target compounds one has in mind. For this setup, I used a definitive screening design. As most of you know, these are quite balanced and have a particular shape, which is reflected in this three-dimensional plot: you can see the definitive screening design is somehow walking around certain edges and has a center point. Due to the construction of the definitive screening design, one can estimate interactions and square effects. These interactions are not confounded with the main factors, and the main factors themselves are also not confounded with each other. This is a very nice feature of the definitive screening design, and therefore they are very efficient when it comes to the workload compared to earlier screening designs. Some disadvantages are also there. One big disadvantage is that if about 50% of the factors or more have a high influence, you get significant confounding, which you have to take care of. In this particular case, although it's the leanest possible design I found, the practical workload would require five to six months just for screening. This is far too long when it comes to a master's thesis.
The alternative was then to build another design or another process. I was inspired in Copenhagen in 2019 by a contribution from Bill, where he talked about infrared spectroscopy, and I thought it might be a good idea to use the chemical information hidden in an infrared spectrum to describe the chemical composition of the mixtures. Therefore, I established this workflow. First, the media preparation was done for all 65 mixtures. Then the FT-MIR spectrum was measured, some chemometric treatments were performed, and afterwards the space of variation could be kept at a maximum while the number of experiments could be reduced quite significantly.

To show you the workflow, I started, as I said, with the spectral information. One of the first things one has to do is standardize the spectra to avoid baseline shifts and things like that. This is one way to do it: introducing a new formula to standardize. Or, what I did, I used an add-in to preprocess and calculate the standard normal variate, which, when it comes to the digits, is the same as the standardization, as we see here. With these standardized spectra, each based on its own measurement, I continued and first compiled all the spectra. What you see here on the top is the absorption of every sample. We have an aqueous system, so we took water as a reference. After building the difference between the absorption and the water, we then went deeper and saw differences within the spectra. One of the big questions was: do I first calculate the difference between the absorption of the sample and the water and then calculate the standard normal variate? Or do I first calculate the standardization and then subtract the standardized water background? One could think the result is the same, but the outcome is different. As you see here on the right-hand side of the dashboard, I zoomed into this area, and in the lower part the curves have a different shape and a different distance from each other than in the upper part. This might then have an influence on the subsequent regression analysis. Therefore, I chose to do the standardization first and then the difference calculation.

After these first steps came the chemometric part, that is, the smoothing and filtering and calculating the derivatives. This is a standard procedure using an add-in which is available. You can imagine that the signal might have some noise. This is seen here in the lower part: the red area is the real data, and the blue curves are the smoothed data. On the upper left side, you see the first derivative; on the upper right side, the second derivative of these functions. When it comes to polynomial fits, it depends on the procedure: what you are fitting, what the polynomial order is, and how broad the window is in which you make the calculations. If we take here only a second-order polynomial, you see that it might change. Now, this is not a two; this ought to be a 20.
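Going back to the standard normal variate (SNV) pretreatment mentioned above, here is a hedged JSL sketch of the same calculation for spectra stored in long format (one row per wavenumber, grouped by a sample identifier). The column names are placeholders; the presenter used an existing add-in rather than this formula.

    dt = Current Data Table();
    dt << New Column( "Absorbance SNV", Numeric, Continuous,
        Formula(
            // Center and scale each spectrum by its own mean and standard deviation.
            ( :Absorbance - Col Mean( :Absorbance, :Sample ID ) )
            / Col Std Dev( :Absorbance, :Sample ID )
        )
    );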
Then the curve smooths out. Although it's smooth, you can still see differences in height and in shape. To get hold of those data, one has to save the smoothed data to separate data tables. I then tried different settings for the smoothing process, because I did not know from the beginning which treatment would best fit the desired outcome of the experiment. After quite a lot of smoothing tables, which were done manually, I concatenated the tables. These are all the tables we just made; I go to the first one and say, please concatenate all of the others. The nice thing is that at the end you can distinguish the different treatments by their smoothing settings: second polynomial order, third polynomial order, with 20 points to the left and to the right for the smoothing, 30, and so on. This is just a small example to show you the procedure; I did quite a bit more. What I did was this amount of treatment: I had [inaudible 00:15:01] for a second, third, or fifth polynomial order with 10, 20, or 30 points.

Now came the big part: deciding which particular treatment represents my space best. Therefore, I made a principal component analysis of all the treatments I did. This is a general overview. The numbers represent each experiment, so you can follow them in the different score and loading plots. The loading plot is a regular picture of a loading plot when it comes to spectral data. If you take into account that we are coming from a design, this value of 24% explained variation for the first component is very high. Why? Because the factors of a definitive screening design are orthogonal to and independent from each other, one would expect lower values for the principal components. After this treatment, the first derivative with a second-order polynomial and 10 points to the left and to the right for the smoothing, it looks very evenly distributed. You might think of a cluster here on top or below. I went through all of these particular treatments and then selected a favored one, where I saw that the first principal component has a very low describing power for the variation. That's the way I proceeded.

After selecting the particular pre-processed data, I still have my 65 samples. But as we heard at the beginning, 65 is far too much of a workload. If you ask yourself why there are 132 samples: that is because I copy-pasted the design below the original design for the spectral treatment I used. If you then want to select the runs you are actually able to perform, for time or cost reasons or whatever, this is one process you can use: make use of the coverage and the principal components. This is the design space which is available, covering all the variation that is inside. But as you see, we would need to make 132 experiments.
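As an aside on the PCA step described above, a minimal JSL sketch of the launch is shown below. The wavenumber column names are placeholders; in practice, every wavenumber column of the chosen pretreated table would go into Y.

    dt = Current Data Table();  // one of the pretreated spectral tables, rows = mixtures
    dt << Principal Components(
        Y( :W1, :W2, :W3 )  // placeholder names standing in for all wavenumber columns
    );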
If we then select all the principal components and ask for only the number of runs that are feasible, you have the ability to type in whatever number you want. At this stage, I selected several smaller or bigger designs and saw how far I could go down and still reach a good description power. I made these 25 experiments, letting JMP select them. The nice thing with this procedure is that when you come back to your data table, they are selected. But I didn't do this procedure right at the beginning. At the beginning, I made a manual selection. How did I do that? I took the score plot of the particular treatment and then manually selected the outer points as well as possible, not only in the picture of the first and second principal components, but going deeper. This, for example, is the comparison of the selection method I just showed you with the DoE on the covariate factors and with the manual selection, just to show you some possible differences. If you make this DoE selection several times, don't be confused if you don't always get the same numbers, the same experiments, which might be important. With this approach, I then reduced the workload from 64 experiments to 25 experiments. And in all of these experiments, all my raw materials from the beginning were inside. I didn't leave any raw material out, and it was very nice to see that I could retain the space of variation.

After the cultivation in two blocks, which took a time frame of three weeks for each block, we analyzed our metabolome in the supernatant and determined our cell dry mass. For time's sake, I show you only the results and the procedure for the cell dry mass; for other molecules, the same procedure would be applied. The next issue I had was confounding. I had to expect confounding because I had only 25 experiments for 27 media components, coming out of a design where I knew I should expect interactions and quadratic effects. Interactions are nothing new when it comes to media composition; quadratic effects were nice to see.

Then came the next nice thing, which was introduced by Pete Hersh and Phil Kay: the SVEM process, the Self-Validated Ensemble Model. In this sketch, you see the workflow, and we will go through it in JMP. The first thing was to look at the distribution of my target value. After making a log transformation, I saw that it is normally distributed, so we have a log-normal distribution. That's nice to know. The first step was to download this add-in, Autovalidation Setup, and hit the run button. We then get a new table. The new table has 50 rows instead of the 25 rows of our original table. Why is that so? The reason is that when hitting the button, the data table gets copy-pasted below, and we get a differentiation into a validation set and a training set, as you see here.
The nice feature of this autovalidation table is that you can, through simulation, find out which parameters, which factors, have an influence. This happens via the paired fractionally weighted bootstrap weights. If you look, for example, at the second experiment, it has a weight of 1.8 in the training set, and the same sample has a weight of 0.17 in the validation set. This gives some samples a bigger weight in the training set and vice versa in the validation set: while they have a bigger weight in the training set, they have a lower weight in the validation set. To analyze this, it's necessary to have the Pro version to run a generalized regression. As we took the log value of our cell dry weight, I can use a normal distribution, and then it's recommended to run a Lasso regression. From the first Lasso regression, we get a table of the estimates, and now comes the nice part: we make simulations, changing the fractionally weighted bootstrap weights on each run. For time's sake, I'm making just 50 simulations. From these 50 simulations, we get, for each factor in the design, the proportion of simulations in which it entered the regression equation or did not. This pattern comes from the randomization of the fractional bootstrap weights. From this distribution, we go to the summary statistics and customize them; we are only interested in the proportion nonzero. This proportion nonzero is, finally, the fraction of the 50 simulations in which this particular variable entered the regression equation.

From this, we make a combined data table and look at the percentage of each variable being in a model or not. This looks a little bit confusing, but if we order it by column two, descending, we see a nice pattern. Now you can imagine why I introduced at the beginning this null factor and these random uniform factors. The uniform factors were introduced manually; the null factor was introduced by the autovalidation setup. What do these points mean? They mean that down to the null factor, these variables have high potential, because they were quite often part of the model-building processes. Those at the bottom were quite seldom part of the model-building processes, so you can reduce your complexity by simply discarding them. Here in the middle, one has to decide what to do. After having this extraction, without losing information and without losing variation, one can then think of different regression processes: a response surface model, stepwise regression, or whatever regression you have in mind. It's wise to compare different regression models and look at what's feasible and what's meaningful. That was the procedure I used in JMP 16. Coming now to Pete and Bill, they will describe something else.

Thank you, Egon. That was a great overview of your workflow. Thank you.
What's new in JMP 17 that might have helped Egon a little bit with the tools he was working with? I'm going to start off with a little bit of a slide show here. I'm going to be talking about Functional Data Explorer, which is in JMP Pro, and about the preprocessing and Wavelet modeling that are now built into Functional Data Explorer.

All right, so let me slide this up a little bit so you can see. What's new in JMP 17? We've added some tools that allow for better chemometric analysis of spectral data. Really, for any multivariate data you can think of, these tools are there to help. First are the preprocessing methods that are now built into FDE. We've got standard normal variate, which Egon showed you. We've got multiplicative scatter correction, which is a little bit more powerful than the standard normal variate. Both of these will not disrupt the character of your spectra. That's not the story with Savitzky-Golay: it does alter the spectra, which makes the data a little bit harder to interpret, but the key thing is it still helps. Then we have something called polynomial baseline correction, which is another added tool if you need it. The next step would be to save that preprocessed data for further analysis, like principal component analysis, partial least squares, and so on, so you can do some analysis there.

The Wavelet modeling is a way to look at chemometric data similar to principal component analysis. We're fitting a model to the data to determine which is the best overall fit for, in this case, 25 spectra. That's the goal here. It's an alternative to spline models; it's typically better than spline models, but not always. You get to model the whole spectrum, not point by point, which is what you would do with other analysis types. Then you get to discern these things called shape functions that make up the curve. These shape functions are, again, similar to principal component analysis in that they help with dimension reduction. And, as I said before, these are excellent tools for spectral and chromatographic data, but virtually any multivariate data is fair game. These are just examples of the Wavelet functions that are built in. I could try to pronounce some of these names, but I'd mess them up; just know that they are in there. There is a site here where you can look up what these Wavelets are all about. I got the slide from Ryan Parker, so thank you, Ryan.

Almost last but not least, with this functional principal component analysis, we're trying to determine, again, what the best overall fit is for these data and then compare the curves as needed. What comes out of the Wavelet modeling is a Wavelet DOE, and we determine which wavelengths have the highest energy for any given spectrum, or whatever we're looking at. These Wavelet coefficients can then be used to build a classification or quantification model. That's up to you.
It depends on the data and what supplemental variables you have built in. In this case, this is a different example where I was looking at percent active based on some near-IR spectra.

Let's get into a quick example. All right. This is Egon's data. I've taken the data that was in the original table, the absorption minus the water spectrum, and transposed it into a new data table where I've run Functional Data Explorer. I'm just going to open up the analysis here. It does take a little while to run, but this is the example I wanted to show. We've done the preprocessing beforehand. We've applied the multiplicative scatter correction in this case and then the standard normal variate, and then built the model off of that. After these preprocessing tools, which are found over here, I'm going to save that data out, and then that data can be used for further analysis as needed. To build on the story here, we've got the analysis done. We built the Wavelet model. After we've gotten the mean function and the standard deviation for all of our curves, we build the Wavelet model, and we get the fit that you see here. What this is telling us is that the Haar Wavelet is the best overall, based on the lowest Bayesian Information Criterion score. Now we can come down here and look at the overall Wavelet functions, the shape functions, and get an idea of which Wavelets have the highest energy and which shape functions explain the most variation between curves, and you can also reduce or increase the model with the number of components that you select here.

One thing that comes out of this is a Score Plot, which allows you to see groupings of different spectra, in this case. One that you're going to see down here is this one; it could be a potential outlier. It's different from the rest. If you hover over the data point, you can actually see that spectrum. You can pin it to the graph, pull that out, and then let's just pick another blue one here, and we'll see if we can see where the differences are. It looks like it might be at the beginning. If we look at this right here, that's a big difference; maybe that just didn't get subtracted out or preprocessed the same way in the two spectra. I don't have an example of the Wavelet DOE for this setup, but just know that it's there. If you're interested in this (this has been a fairly quick overview), please contact us, and we will find a way to help you better understand what's going on with Wavelet DOE and the preprocessing built into JMP Pro. Pete, I will turn it over to you.

All right. Well, thanks, Bill and Egon. Just like Bill, I am going to go over how Self-Validating Ensemble Models changed in JMP 17. Bill showed how you could do what Egon did in 16 much more easily in 17 using Functional Data Explorer. For me, I'm going to take that last bit that Egon showed, creating that SVEM setup with the add-in.
Egon used those fractionally weighted bootstrap columns and also created the validation column and the null factor. I'm going to show how that's done now in JMP 17, because it is much easier: just like the spectral data preprocessing with FDE, this is built into JMP 17. If you remember, Egon went through all those spectra, extracted the meaningful area, looked at smoothers and the standard normal variate, and did a bunch of different preprocessing steps. Then he took those preprocessed spectra and selected a subset of the runs to actually perform, and he came up with 25. Here are those 25 runs. From this step, what he did is that Self-Validating Ensemble Model, or SVEM.

In 16, this took a little bit of doing. You had to make the model, then simulate, then take those simulations and run a distribution on each one of them, get the summary statistics, extract that out to a combined data table, and then graph or tabulate that to see which factors enter the model most often. That was a lot of steps and a lot of clicks, and Egon has clearly done this a bunch of times because he did it pretty quickly and smoothly, but it took a little bit of doing to learn. Clay Barker made this much easier in JMP 17. Same 25 samples here, and instead of running that Autovalidation Setup add-in that Egon showed, we're going to just go ahead and go to Analyze and Fit Model. We'll set up our model. If you remember, we're taking the log of the dry weight here. We're going to add a couple of random variables along with all of our factors into the model, and then we're going to make sure that we've selected Generalized Regression. This is the setup for our model; we're going to go ahead and run it, and in JMP 17, we have two new estimation methods. These are both Self-Validating Ensemble Model methods. The first one is a forward selection. I'm going to use SVEM Lasso because that's what Egon used in his portion, and here you just put in how many samples you want. He had selected 50; I'm going to go with the default of 200. Hit Go, and you can see it's now running behind the scenes everything you would previously have done by simulating and recalculating those fractional weights, and at the end we just have this nice table that shows us what is entering our model most often up here. Then we hit something like a random variable. Just out of randomness, a random variable enters the model maybe about half the time. For things that enter more regularly than a random variable, we have pretty high confidence that those are probably variables we want to look at. Then we would go from here and launch the Profiler, or assess variable importance. I've already done that over here, so we don't have to wait for it to launch. But here, this shows us which of these factors are active.
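For reference, a scripted version of the Fit Model launch described above might look roughly like the sketch below. The response and factor names are placeholders, and the way the new SVEM Lasso estimation method is exposed to scripting is an assumption here; a saved script from a JMP 17 Pro installation would show the exact form.

    dt = Current Data Table();
    dt << Fit Model(
        Y( :Log Dry Weight ),                                          // placeholder response
        Effects( :Component 1, :Component 2, :Random 1, :Random 2 ),   // plus the remaining media factors
        Personality( "Generalized Regression" ),
        Generalized Distribution( "Normal" ),
        Run(
            // Assumed argument names for the JMP 17 SVEM Lasso option and its
            // number of bootstrap samples; verify against a saved script.
            Fit( Estimation Method( SVEM Lasso ), N Bootstraps( 200 ) )
        )
    );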
We  can  see  the  most  active  factors,  and  while  it's  not  making  a  super  accurate  model,  because  again  if  you  remember,  we  are  taking  25  runs  to  try  to  estimate  27  different  factors.  If  you  take  a  look  here  at  the  most  prevalent  ones,  this  can  at  least  give  you  an  idea  of  the  impact  of  each  one  of  these  factors.  All  right,  so  that  basically  sums  up  what  Egon  had  done.  It  just  makes  this  much  easier  in   JMP 17,  and  we  are  continuing  to  improve  these  things  and  hope  that  this  workflow  gets  easier  with  each  release  of  JMP.   Thank  you  for  your  attention  and  hopefully,  you  found  this  talk  informative.
The new Workflow Builder introduced in JMP 17 is a great time saver for automating a fixed set of tasks. But wouldn't it be great to create a workflow you could use on any data table? With Workflow Builder's Reference Manager, you can! In this presentation, we will show how to record a workflow using a specific data table, then modify it so it can be used on any other table. Using this technique, you can build a custom tool with minimal JSL coding and share it with your colleagues.     Hello, my name is Michael Hecht, and I am here to talk today about the Workflow Builder, which is a new feature in JMP 17 that allows you to record operations that you do within JMP and then play them back to recreate those operations. If you attended the plenary session this morning, then you saw Mandy Chambers' demo of Workflow Builder, which gave a good overview. Mandy also gave a talk this morning that goes into more detail on the Workflow Builder user interface. I'm going to start by talking about a more advanced feature of the Workflow Builder called the Reference Manager. This is a feature that allows you to take a workflow and manage how references to data tables and columns within those tables are mapped and resolved. It allows you to make workflows that are more generic, to be used with any data table. Let's get started. I'm going to start by creating a new workflow, which is under File, New, and there's New Workflow. You see the little tag that says this is a new item for version 17. When I choose this, I get this untitled Workflow Builder window. The panel in the center lists the steps of my workflow and, as you can see, they're empty. I'm just going to click the record button, which is this button with the big red dot. You see it changes appearance to show that recording is in progress. Then I will do some operations in JMP and have them be recorded. First, I'm going to go to the File menu and open up a data table. We see Big Class here opened, and if we look in the Workflow Builder, we see that a step was added to do that same operation. Now I'm going to go to the Analyze menu and I'll do a Fit Y by X to do a one-way analysis using age as my X factor. For the response, I'm going to create a transform column by right-clicking and choosing Formula. I'm going to use weight divided by the height squared, which is the formula for body mass index. If we were using metric units of kilograms over meters squared, then that would be sufficient. But Big Class has its measurements in imperial units of inches and pounds. We have to multiply this by a scaling factor to get a standardized BMI. Now that I have my formula correct, I'm going to rename my transform column because I don't want to use the default name; I'm going to call it BMI. Click OK. Click OK. Now BMI is in my list of columns that I can use to create the report. I'm going to add that as my response. Click OK. Here's my one-way analysis of variance.
You notice, though, that it did not yet add this step to the Workflow Builder, but there is a little note at the top saying, hey, I see you launched a platform. As soon as you finish with the analysis and close the window, I'll add it to the workflow. That's so the Workflow Builder can capture any changes you might make to the analysis. For example, I'm going to turn on Means/Anova here to get the means diamonds. Now when I close the window, you see the step gets added: report snapshot, Big Class, Fit Y by X of BMI by age. Great. I am done recording. I'm going to click the button again to stop recording. Then I'm going to switch to presentation mode for this workflow. Like the tooltip says, that removes some of the editing controls. Namely, it removes the record button so that I don't accidentally hit it again, and it takes away this activity log at the bottom. I'm going to rewind my workflow to the beginning, which closes windows that were associated with it. Then I'm going to click the Run button to replay it, just to make sure it does what I wanted it to do. I click Run and it opens Big Class. Then here's the analysis, just like we had it before. That's great. I want to look at these different steps in the workflow. You can see that behind each one is some JSL that shows up in the tooltips. But I can open up this step settings panel to see more details. I'll click on the first step and we see that there's some metadata information and then the JSL to run it, which is just an Open command, and there's a path name to the file. When I click on the second one, I see that we have JSL that sends a Oneway message to this reference to the Big Class.jmp data table. Inside the Oneway, we see references to weight and height and age. There's a reference to BMI, but BMI is computed as a transform column right there. That all looks good. But when I run this workflow, I don't always want to run it on Big Class. I might want to run it on a different data table. I'd like the user of the workflow to be able to choose that data table. I really don't need this open data table step at all. I'm going to select it and click the trash can icon here to remove it from the workflow. Well, now what's going to happen when I run my workflow? Let's give it a try. I'll click the Run button and it immediately prompts me to choose a data table. The data table Big Class, it can't find it anywhere. It's not in the list of tables that are currently open in JMP. It's asking me, do I want to go find it and open it, or maybe open a different table? It says that anywhere Big Class is referenced, it will use the table that I open. It has a list here, but there's only one item in it. I'm going to select that and click OK. Now it's prompting me to go ahead and open a table. Well, I have Big Class right here, so I'm going to drag that in and click Open, and it runs just like before. Well, that's great.
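For readers who prefer to see it as code, the recorded step is essentially the platform launch with the transform column embedded in it. A minimal hand-written sketch of the same analysis might look like the following; the exact JSL that Workflow Builder records can differ in its details, and the 703 factor is simply the standard conversion for BMI computed from pounds and inches.

```jsl
// Open the sample table and run Fit Y by X (Oneway) of a BMI
// transform column by age, with Means/Anova turned on.
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
dt << Oneway(
	Y(
		Transform Column( "BMI",
			// weight in pounds, height in inches; 703 puts the result
			// on the same scale as kg/m^2
			Formula( (:weight / :height ^ 2) * 703 )
		)
	),
	X( :age ),
	Means( 1 )   // Means/Anova, which also shows the means diamonds
);
```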
Let  me  rewind  and  run  it  again,  but  this  time  I  want  to  choose  a  data  table  that's  not   Big Class.  Let's  see  what  happens.  I'll  click  okay,  and  I  want  to  use  this  table  Football.  Football  is  data  about  a  college  football  team  playing  American  football  as  opposed  to  rest  of  the  world  football. I'm  going  to  open  that  and  there  it  is.  But  you  see,  I'm  immediately  prompted,  hey,  this  data  table  does  not  have  a  column  named  age.  What  would  you  like  to  use  instead?  I'm  going  to  choose  position  and  I'll  click  okay,  and  I  get  my  one- way  analysis. Notice  that  it  did  not  prompt  me  for  height  and  weight  because  those  columns  already  exist  in  the  data  table,  so  it  just  uses  them  directly.  T hese  positions  are  now  the  categories  that  it's  using  for  the  one- way  analysis,  and  they're  all  the  abbreviations  for  different  positions  in  American  football. You  have  your  defensive  back,  defensive  lineman,  full  back,  half  back.  I  don't  know  what  IB  is,  but  kicker,  offensive  back,  offensive  lineman,  quarterback,  tight  end.   You  see  the  wide  receivers  have  a  nice  little  group  here  with  low  BMI  because  they  have  to  be  fast. That's  all  cool.  But  I  noticed  in  this  football  data  table  right  below  position,  there's  a  second  variable  called  position  2. I f  we  look  at  what  that  is,  it's  a  different  categorization  of  the  data.  Position  divides  the  data  into  11  categories,  but  position  2  divides  it  into  7. It  might  be  interesting  to  run  my  workflow  using  position  2  for  comparison.  I'm  going  to  rewind  this  and  I'm  going  to  run  it  again,  but  this  time  I'll  choose  position  2.  Well, wait  a  minute,  it  didn't  even  give  me  a  chance  to  choose  the  variable.  It  just  went  ahead  and  used  position. In  fact,  it  didn't  ask  me  what  data  table  to  use.  It  decided  Football  was  already  open,  so  it  could  just  use  that.  Somehow  the  workflow  is  remembering  my  choices  from  the  previous  run.  If  I  want  to  choose  a  different  variable,  I  have  to  somehow  prevent  that  from  happening. Let's  take  a  look  at  this  workflow.  We  see  that  there's  an  option  on  the  red  triangle  menu  that  says  References  and  has  a  sub menu,  allow  replacement  of  references.  Well,  that's  already  checked  and  the  tool  tip  says  it  allows  prompting  for  tables  and  column  references  that  it  can't  find. Well,  that's  exactly  what  happened  the  first  time  we  ran  it.  But  then  when  we  ran  it  again,  it  reused  the  replacements  that  already  had.  But  here's  the  second  option,  Clear  Replacements.  The  tool  tip  here  says  that  it  clears  the  previous  replacement  choices,  which  is  what  we  want.  Let's  do  that,  and  then  we'll  rewind  this  and  run  it  again. Okay. Now  it's  prompting  me  for  a  data  table  again.  Because  Football  is  one  of  the  tables  already  opened,  it  appears  in  the  list  here.  I  can  just  pick  it,  click  okay,  and  now  it's  prompting  me  for  age  again.  This  is  great.  I  can  pick  position  2  and  click  okay.  You  can  see  the  seven  categories  that  are  defined  by  position  two  here. Okay,  well,  let's  go  back  to  position  1  now.  Rewind  this  and  run  it  again.  
Again, it didn't give me a chance to choose position 2. I don't really want to have to clear the replacements every time I run this workflow. What I need is some way to control which replacements are remembered. I see this third option here that says Manage. The tooltip says it manages the table and column references that can be replaced at runtime. Let's see what we can do here. This brings up a window called the Reference Manager. At the top is this check box, allow replacement of references, which is pretty much the same as that first submenu choice, just like that, when it's checked. Then there's a button, reset all replacement choices, which sounds the same as the Clear Replacements menu item. Then we have a list of table references. There's only one item in the list. That's because my workflow only has the one reference to Big Class. If this were a more complicated workflow that accessed multiple tables, then all the table references it used would appear in this list. You select one of them and then you have details that you can change. I can see that I can add a custom prompt; rather than the big long prompt, I'm going to use something a little simpler. How about "Please choose a data table." I see that this mode is set to prompt if necessary. It's necessary when it can't find Big Class, or it can't find what you see Big Class is mapped to here, Football. But what I want is for it to prompt every time I run the workflow, so I want to choose "each run." Then down here, we have a list of the column references that the workflow uses from this data table. We see BMI, which is the transform column. That should never be prompted for, because we're computing it within our workflow, so I can change this one to "never." You see age is remembering its mapping to position, and it prompts if necessary. We want to change that to prompt on each run. I'm also going to give it a custom prompt, something like "Please select a category column." I'm going to copy that so I can reuse it. Height and weight are both referenced here and, as we saw, it can pick those up automatically if they exist in the data table. That's good. I think "if necessary" is doing what we want. Let's leave it at that, but I'll give it a better prompt. We'll change this one, and the same thing for weight, we'll change here. I think we have all of our settings the way we like them, but we still want to clear these mappings to the Football data table. I'm going to click the button and hopefully those will go away. All right, that's good. I'll click OK. Let's rewind this and try it again. It's letting me choose, and I can go back to position. You saw the new prompts in both of those dialogs, the custom prompts that I chose. That all works well. In fact, I can even... Let's see, let's go open our Big Class table.
If I run the workflow now, you'll notice that even though the workflow was originally recorded using Big Class, because I set the prompt to each run, it's prompting me now instead of just using Big Class directly. But I'm going to choose Big Class. Now, even though Big Class has an age column, it's asking me to select a category, so I can choose something different, like sex. Then you get a one-way of BMI by sex. This is all doing what we want. I am going to save this workflow from the File menu, Save. I'll give it a name of BMI and it automatically gets an extension of .jmpflow, J-M-P F-L-O-W. We'll save it to the desktop and there it is. You see it has this little org-chart-looking icon. If I get info on it from the Finder, we can see that the hidden extension .jmpflow is right there. This is great. I could distribute this to my colleagues and they would open it on their installation of JMP. Then when they run it, it will prompt them for a data table, it'll prompt them for a category column, and it'll produce the report. That's fine. That's a great way to do it. I think I'm going to take it one step further, though. I would like to package this within an add-in. An add-in lets you customize items on the JMP toolbar and menu bar. For example, you can see I have an Add-Ins menu here already with some items in it, but I'm going to create a new add-in to put my own command there for BMI. Let's go to the File menu and we'll create a new add-in, and this Add-In Builder dialog appears. Let's give it a name. We'll call it BMI Report. Add-ins need a unique identifier, which is just a string, but we use this reverse DNS system where you take your company's website, like ours is jmp.com, and reverse it, so we'll use com.jmp. I am the only Hecht at JMP, so I'll put that in here, and I'm going to give it something unique for this specific add-in that I'm creating. We'll call it BMI-Report. I'm going to select all that and copy it. This is version 1 of my add-in. As I said, the workflows are a new feature of JMP 17, so I'd like to set the minimum JMP version to 17. Unfortunately, it looks like we forgot to add that as a possibility to this menu. I'll just do the best I can and choose 16, and hopefully we will get that corrected for JMP 17.1 coming next month. I want to add a menu item to the Add-Ins menu. I'll click Add Command and we'll name it BMI Report. I'll even give it a tooltip: create a one-way analysis of body mass index by the chosen category. That's pretty good. Now I do need to add some JSL here, but it's pretty simple. All I want to do is open bmi.jmpflow. However, I'd like to embed my BMI workflow within the add-in that I'm creating, which means I need to tell the Open command that it comes from this add-in's home directory, which I can get to with the path variable $ADDIN_HOME. I have to give it the add-in's unique ID as well in parentheses, and put a slash there for the directory separator. That looks good.
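In other words, the command boils down to a single Open call, along these lines; the unique ID shown below is a placeholder rather than the exact one typed in the demo.

```jsl
// Open the workflow file embedded in the add-in.
// $ADDIN_HOME(<unique id>) resolves to the installed add-in's folder;
// the ID here is a hypothetical placeholder.
Open( "$ADDIN_HOME(com.jmp.yourname.BMI-Report)/bmi.jmpflow" );
```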
Now I need to actually add my workflow as a file embedded in my add-in. I'll go to Additional Files and add it there. That, I believe, is everything. Let's close this, save changes. It gives me a default name of BMI Report. We'll save it on my desktop. There it is. It should have this workflow embedded in it. I'm going to close it here, and I'm going to put the workflow that I built in the trash and empty the trash. When we save an add-in, JMP automatically installs it as well. If I come over to JMP and look at my Add-Ins menu, now we have BMI Report. When I choose that, it should open the workflow that's embedded within the add-in. There it is. We can run it. We get our custom prompts as before, we can choose a table, we can choose a category. I am going to redo the analysis here so that I have a copy of this that's not under the control of the workflow and will stay around even when I rewind the workflow. Let's rewind it. I'll run it again and let's choose position 2. You can compare the two reports and let's see, we can look at things like this O category under position 2, which corresponds to the fullbacks, the halfbacks, and the quarterbacks, so it's the offense. L is the defensive line and the offensive line, so that's the linemen. Anyway, that pretty much concludes the items I want to cover in this talk. I direct you again to Mandy's presentation this morning, Navigating Your Data Workflow: Workflow Builder Grants Your Wishes for Data Cleanup, for a great overview of the rest of the Workflow Builder UI. I believe at this time we are going to take live Q&A. Thank you very much.
A frequent task in data analysis is aligning curves before a descriptive or root cause analysis. Often an additional complication occurs when the measurement intervals are not equidistant in the series to be compared. There is not one single value that quantifies the shift for a whole curve. Interpolation is the solution in cases like this. Simple linear interpolation may lead to numerous random errors; a spline interpolation is more robust. Since the Graph Builder exports the formula for the spline in its current shape, it became an easy, accessible tool for the alignment of curves. And as all steps can be programmed in JSL, it provides a framework for automating curve alignment. This presentation will describe the background, concept, and case study application for the alignment of curves.     Welcome, everybody, to this presentation about a use case of curve alignment. Experienced analysts often say that in a larger analytical project, plus or minus 60% of the total time goes into the preparation of data. If curves play a role, and especially if the alignment of curves is needed, then that is certainly close to the truth. Curves are very specific types of data, and JMP has some tools to work with curves and to address all the related problems. In the sample library, there is the Algae Mitscherlich data, which is one of my favorite data sets in that respect because it has the option to deal with many aspects of fitting curves. This is just an example of the development of algae density under different treatments. The type of curves that I'm going to talk about are typically observations or measurements over time. But this doesn't mean any loss in generality; the presented concepts work in all kinds of curve relationships. This is an example, algae measurement over time, and one of the aspects that is in the focus of the analysis for this data set is to specify curves, specific types of curves that have a known shape and are driven by certain parameters, and then to estimate those parameters based on the data. In those cases, the parameters very often have a technical meaning like slope, inflection point, or the limit that gets approached. That platform here also has the sliders that let you analyze how changing one of those parameters affects the shape of the curve. In the specific case that we are going to talk about, we are not specifically interested in the curve. The curve itself is only a help, because we are facing another problem. This is the data set, or this is a part of the data set, that goes back to the real problem. We had this series of measurements, one, and another series of measurements, and they belong to two different devices. Unfortunately, the clocks of these devices were not in sync. But luckily, each of the devices had one sensor that measured the same substance. We could look for times where the measurements were very close to each other, and then try to find out how to correct one of the clocks, so to say, so that we get aligned measurements, and then use those to evaluate the data from all the sensors that have been available in that data set. What was the problem of the task?
You  see  the  curves  here.   The  red  curve  is  the  one  that  we  took  as  the  reference  curve,  and  the  blue  one  is  the  one  that  we  wanted  to  shift.   You  see  not  only  that  the  curves  are  quite  some  distance  away,  although  they  should  theoretically  have  measured  the  same  substance  at  the  same  time,  but  also  the  time  points  of  each  series  is  completely,  or  the  time  point  of  both  series  are  completely  unrelated.  With  the  bare  eye,  we  don't  see  any  lag  that  we  could  use  to  correct  one  of  the  data  sets. Therefore,  I  looked  into…  I  compared  the  time  points,  not  the  Y  measurements,  the  time  points  of  the  two  series.  If  there  was  just  a  shift,  then  we  would  expect  to  see  all  the  data,  all  the  points  exactly  on  one  line.  But  here  you  see  there  are  ups  and  downs,  so  this  is  obviously  not  the  case. Perhaps  we  can  see  more,  we  can  understand  more  if  we  calculate  row  by  row  in  the  data  table,  we  calculate  the  difference  of  the  two  times  and  look  at  those.   Here  with  some  fantasy,  we  see  a  little  bit  of  curvature,  so  till  the  end,  it  seems  to  be  closer  related  than  to  the  beginning.  But  in  the  beginning,  this  looks  like  real  random  data.   This  as  well  does  not  help  us  a  lot  to  figure  out  how  to  relate  the  data. I  thought  I  had  the  link  to  the  data  table,  but  we  can  look  in  this  screenshot  as  well.  This  is  the  data  set  that  you  have  seen  before,  a  little  bit  annotated.   We  see  that  two  lines  have  specific  markers,  the  star  and  the  circle.  This  is  due  to  the  fact  that  the  whole  measurement  project  had  a  ramp- up  phase,  and  at  the  star  point,  the  measurement  series,  the  measurement  time,  the  real  process  time  started. The  circle  is  there  where  we,  after  visual  or  manual  inspection,  saw  the  starting  point  in  the  second  time  series,  and  we  want  to  align  both.   W e  need  to  change  the  relationship  of  the  rows,  of  the  data  and  the  rows.  We  want  to  shift  one  of  the  data  sets,  and  that  reminded  me  of  the   paternoster  that  I  like  to  use  when  many  years  ago  I  was  working  for  a  company  that  had  a  very  old  administrative  building  and  we  had  the   paternoster  in  there. It  came  to  me  that   the  strategy  that  we  are  following  is  part  of   paternoster shift,  which  gives  the  word  elevator  pitch  a  completely  new  meaning,  by  the  way.  How  do  we  find  the  right  steps,  the  right  place  to  fit?  We  do  not  have  similar  or  identical  time  points  in  both  series  of  times.  We  need  to  construct  those  time  points  somehow.   Of  course,  this  is  done  through  interpolation. T he  first  thing  that  comes  into  mind  is  linear  interpolation,  and  if  I  zoom  in  into  only  a  part  of  the  data  set,  then  it  is  evident  that  linear  interpolation,  so  just  checking,  so  to  say,  the  regression  between  neighboring  time  points  has  some  problems,  especially  if  we  look  into  areas  where  we  have  horizontal  lines,  which  may  easily  happen.  Then  the  time  point  in  that  range  is  quite  arbitrary.  It  always  leads  to  the  same  results.  The  opposite  is  true.  
If we are in an area with a very steep ascent, then a little change on the X-axis, or the time, may lead to significant changes in the Y value. This is not a very good technique. We can use splines to interpolate between the values. You know splines, certainly, from the Graph Builder. If you make a scatter plot, then by default the smoother is switched on, and the smoother provides splines. You can even change the stiffness, or the degree of fit, or the closeness to the data, with a slider in the Graph Builder. The advantage of using a spline as an interpolation tool is that it also takes into consideration points further away. Splines build a smooth curve; that is why in the Graph Builder it's called the smoother. This makes it easier to use them for interpolation and as a basis in our alignment process. We need to fit splines. How can we do this, or which platforms help? First of all, a simple tool, Fit Y by X, comes to mind very fast when you work with JMP and do data analysis. This is the data, one of the curves. There is the spline fit, and here is the slider that lets me choose how closely I want to fit my data. Very good, very easy to use, and you can save the spline, but only the values, not the formulas. We are keen on getting the formula for the spline. Next stop, Fit Model. If you have a continuous variable, you select it, and you can give it the attribute of being a knotted spline effect. When you do so, you are prompted to say how many knots that spline should have; the more knots, the more flexible. I accept the default and say Run. We get the typical report from Fit Model. Also, Fit Model has functions for saving formulas, so we can use Fit Model and save the formula. A little disadvantage here is that I need to specify the number of knots before I start the analysis. Once the analysis is done within the platform, I don't have the option to play with it or change it like there is, for example, in Fit Y by X. Another tool is the Functional Data Explorer. The Functional Data Explorer has splines as a core function, and it also has functionality to find optimal definitions, optimal fits, for the splines. You can export everything. It's a bit cumbersome, because simple tasks like this are not what the Functional Data Explorer is made for; you need some more clicks to come to a result. Also, it's only available for people who have JMP Pro. That leaves the Graph Builder. You have seen it before, and this time I want to show the spline control as well. As I said, we can use the slider to determine the fit. A very nice feature, by the way, is that if you check this box, then through a bootstrap sampling method, the confidence interval for the smoother is calculated or estimated. You see how that changes when I'm… Now you can see better: I have a lot of data and there is not too much variability, so here the confidence band is quite small.
But if we zoom into one of these areas here, for example, that place, and look at what happens when I change this, then we see that the smoother can even… that the line of the smoother can even walk out of its own confidence band. This is another visual help to find a good fit for the smoother, for the spline: it should stay within its own confidence limits. Then comes the very important option here. We can save the formula. Then we have a formula for this spline. The Graph Builder surprises as a modeling tool. Who had expected this? How does that help? This is again part of my data table, a small part. You see that now I have two columns here where I saved the formulas for the two smoothers. Down here in the colored rows, I put some arbitrary time points in. That leads to an interpolated response relative to the time point that I have given. It only works for interpolation. We cannot extrapolate this way; it's only interpolation. But this way I can, for example, manually add different time points. I have this one here plus X seconds in that case, and then I can see what the difference of the interpolated value is. Now I can put reference times in and I see exactly what the expected value is, plus or minus a little bit, for both measurements. I did this for two different phases, and I can go in here and experiment further. In my journal, you see in the yellow rows I added eight seconds; in the orange ones, 10 seconds. Depending on what you want to do, this is the principle of how you can work with this. If your task is a one-off task, this is good enough. You can go in here, play with the data, see the difference. Our task was more regular. The good thing is that everything can be controlled with JSL. As usual, for many commands that you do manually, you have a corresponding JSL statement, and I just listed some. This is not a working program. First of all, you need to set up the graph, of course. Then you have commands that you can send to the graph, and specifically to the smoother element in your graph. We can change the smoother, so we could even interactively try to determine good fits. We can also give the command to save the formula in the data table. That is the command that plays an important role for our solution here. You can read out the current settings of the Lambda slider, and so on. How did we want to use this? The concept here was, of course, that first you need to determine what is the reference curve and what is the objective curve, the one that needs to be shifted. Then you calculate the spline function for the reference curve and determine the direction of shift. Where are we? Do we need to shift our time up or down? Then we move the Y values of the objective curve one row in the desired direction and calculate the spline function for this new curve. Save that, use the reference values, and then calculate the differences in Y for each row.
Then we take the total sum of those differences as a criterion for when to break the process. After every step, we calculate that difference, we save the difference, we do the next step, and we check: was there an improvement? If yes, we move up or down one row more, and then we repeat that whole activity until there is no improvement anymore. The whole program in the real project, of course, runs behind the scenes; you wouldn't see anything. But I added some graphs to make it visual and to demonstrate how that works step by step. The starting situation is this one. On the left-hand graph, you see the dashed line and the solid line. The dashed line is the reference line. The solid line needs to be moved. On the right side, you see the differences per row. In the beginning, the differences are… You see that here in the starting area, the differences are pretty small. Then they get larger and larger, and they are negative. That is why it goes down here on a negative scale: very small differences in the beginning, and then they get larger. This is the starting situation. You will see this picture again. Then the program will start shifting the objective curve one cell up, in our situation, our case. Then you see how these graphs update for every step. Yes, first we need to tell JMP what the time and measurement values for the reference and the objective curve are. Here we go. It will take a little bit in the beginning; afterwards the steps come faster. You see how, for every step, the blue curve approaches the dotted curve, and how the differences decrease. The last step did not improve the situation anymore; therefore, the program stepped one step back. Now we have the data table in a situation where we have shifted up the objective curve. Now we can use this shift for all the other measurements, for all the other sensor results that we had for this device, and start the analysis. That was it. I hope I could inspire you a little bit and that this was an interesting presentation for you. If you have any questions, please don't hesitate to contact me. My email was at the top of the presentation, bernd.heinen@stabero.com. Thank you very much.
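As a rough illustration of the shift-and-compare loop described in this talk, here is a minimal JSL sketch. It assumes the two curves have already been interpolated onto a common time grid (for example, via the saved smoother formulas) and models "shifting by one row" as moving the objective values up by one index; all names and numbers are hypothetical.

```jsl
// Hypothetical interpolated values on a common time grid.
refY = [10, 12, 15, 19, 24, 30, 37, 45];   // reference curve
objY = [11, 10, 12, 15, 19, 24, 30, 37];   // objective curve (to be shifted)

// Total absolute difference between the curves for a given upward shift
// of the objective curve.
totalDiff = Function( {shift},
	{n, i, s},
	n = N Rows( refY ) - shift;
	s = 0;
	For( i = 1, i <= n, i++,
		s += Abs( refY[i] - objY[i + shift] )
	);
	s
);

// Increase the shift one row at a time until the total difference
// stops improving, then keep the last improving shift.
bestShift = 0;
bestDiff = totalDiff( 0 );
While( bestShift < N Rows( refY ) - 1,
	candidate = totalDiff( bestShift + 1 );
	If( candidate >= bestDiff, Break() );   // no improvement: stop
	bestDiff = candidate;
	bestShift += 1;
);
Show( bestShift, bestDiff );   // bestShift = 1 for this toy data
```

With a real table, refY and objY would come from evaluating the two saved spline formulas at the same set of time points, and the resulting shift would then be applied to all the other sensor columns before the analysis.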
When analyzing data, scientists and engineers often know what they want to accomplish but are unsure which statistical test is needed. This talk introduces a new Add-in, the Data Analysis Director (DAD). This Add-in was designed to make it easier to find the proper statistical test in JMP® based on the analysis task, goal, and type of data you have. DAD provides a guided flow to help you find the right analysis and run it in JMP. It includes built-in examples, links to JMP Help, demo videos, and even lets you launch the analysis on your data. As you will see, DAD is a useful tool that can help guide you along your analytic journey.     Thank  you  for  the  introduction.  My  name  is  Mia  Stephens,  and  I  am  a  JMP  product  manager.  And  I'm  also  the  lead  developer  of  STIPS,  which  is  our  free  online  course  that  we'll  talk  about  in  a  few  moments. And  my  name  is  Peter  Hersh.  I'm  part  of  the  JMP  Global  Technical  Enablement  team,  and  I  did  a  lot  of  work  finishing  up  and  developing  the   Data Analysis Director,  which  we're  going  to  be  covering  today. I'm  going  to  get  us  started.  We'll  start  by  talking  about  STIPS —Statistical  Thinking  for  Industrial  Problem  Solving —and  how  the  development  of  STIPS  was  really  the  beginning  of  the   Data Analysis Director  or  DAD.  If  you're  familiar  with  STIPS,  this  is  our  free  online  course.  If  you  were  at  the  Discovery  in  Frankfurt  a  few  years  ago,  you  heard  us  talk  about  this  for  the  first  time. STIPS  is  30 -35  hours  of  online  training  for  anyone  who  wants  to  learn  how  to  build  a  foundation  in  statistical  thinking.  It  covers  the  basics,  from  learning  how  to  define  a  problem;  exploratory  tools,  and  how  to  communicate  the  message  in  your  data,  how  to  prepare  your  data  for  analysis;  quality  methods,  SPC  capability,  measurement  systems  analysis;  basic  inferential  statistics  like  hypothesis  testing  and  sample  size;  correlation  and  regression; fundamentals  in  design  of  experiments;  predictive  modeling,  and  text  mining.  This  is  just  an  introduction  of  these  topics. All  in  all,  it's  about  30 -35  hours.  As  we  set  out  to  develop  this  course,  we  wanted  to  make  sure  that  we  included  the  right  topics  and  topics  that  are  most  commonly  used  in  the  industry.  And  we  also  wanted  to  make  sure  that  we  understood  the  challenges  that  users  face  in  industry. Before  we  started  developing  any  content,  we  did  a  survey.  One  of  the  questions  we  asked  was,  what  are  the  most  common  analysis  tasks  and  methods  that  you  use  in  industry?  SPC  was  at  the  top  of  the  list  with  some  of  the  other  quality  methods,  DOE  and  hypothesis  testing.   This  part  of  the  survey  allowed  us  to  identify  the  general  groupings  of  topics  that  we  would  include  in  STIPS. And  relevant  to  this  talk,  the  second  question,  what  are  the  biggest  challenges  you  face  when  you're  using  data  to  make  decisions?  We  weren't  very  surprised  to  see  data  preparation  at  the  top  of  this  list,  but  something  that  was  a  little  bit  surprising  was  understanding  which  method  to  use  and  how  to  use  it. 
As  we're  developing  STIPS,  there  are  a  lot  of  topics  included  in  STIPS,  and  if  you're  learning  statistics  for  the  very  first  time,  we  knew  that  this  could  be  a  little  bit  overwhelming.  We  developed  this  concept  of  a  tool  that  would  help  you  understand,  "Well,  which  method  do  I  want  to  use  based  on  what  it  is  that  I  want  to  know,  what  it  is  I  want  to  do  with  data,  and  what  type  of  data  that  I  have?" A t  the  time,  we  call  this  the  Data  Analysis  Assistant.   It  was  basically  an  unfolding  utility  where  it  started  with  just  a  general  statement.  In  general,  what  is  it  that  you  want  to  do?  And  then  based  on  how  you  answer  this  question,  it  allowed  you  to  drill  down. If  I  chose,  "I  want  to  describe  a  group  or  groups,"  and  then  the  next  question  I  answered,  "I  want  to  explore  relationships  between  two  variables  and  my  data  is  continuous,"  then  it  gave  a  recommendation.  A  statistical  technique  that  might  make  sense  is  scatter  plots.  You  can  find  this  in  the  Graph  Builder  or  in  Fit  Y -by -X.  And  we  provided  a  link  to  some  data  sets  that  were  used  in  STIPS. Our  original  plan  was  that  we  would  have  this  really  as  part  of  STIPS  to  accompany  STIPS  so  that  people  could  refer  back  to  it  after  the  fact.  But  STIPS  took  several  thousand   man-hours  to  develop  and  time  got  away  from  us.  Fortunately,  Peter  was  on  the  STIPS  development  team  and  he  saw  the  value  of  a  utility  like  this.  I'm  going  to  turn  it  over  to  Pete  and  Pete's  going  to  talk  about  how  this  original  concept,  this  data  analysis  assistant,  ultimately  became  DAD. Thanks,  Mia.  Let  me  share  my  screen  here.  There  we  go.  The  motivation  from  this  actually  really  came  from  a  couple  of  customers  reaching  out  and  asking  exactly  what  Mia  found  in  that  survey.  When  they  had  new  users  coming  to  them,  they  weren't  quite  sure  where  to  go  into  JMP  to  do  the  analysis  they  were  after,  so  they  didn't  know  what  technique  to  use  when.   Several  of  our  customers  were  communicating  through  their  training  organizations  that,  "Hey,  it'd  be  great  if  there  was  some  way  to  direct  people  to  the  analysis  they  wanted." I  reached  out  to  Mia,  and  she  had  already  laid  the  groundwork  with  that  data  analysis  assistant  and  developed  all  of  the  tasks  that  most  people  were  needing  to  navigate  to,  and  all  I  did  was  take  that  and  finish  it  off.   Let's  get  in  and  actually  look  at  what  this   Data Analysis Director  looks  like. When  you  launch  it,  it's  going  to  look  like  this,  and  this  is  just  an  application  that  is  inside  of  JMP,  and  we  have  it  deployed  as  an   add-in.  And  we'll  share  the  link  on  where  you  can  get  that   add-in.  But  you'll  notice  here  that  as  I  pick  a  task  from  this  side,  it  will  give  me  several  different  options  for  goals  for  that  specified  task.  And  then  when  I  pick  a  goal,  it  will  let  me  know  if  there's  different  types  of  data  that  might  have  that  same  goal.   Once  I  do  that,  then  all  of  these  buttons  down  here  become  active  and  I  can  do  different  things. So  to  give  you  an  idea  here,  let's  say  I  wanted  to  compare  groups.  
I  have  two  or  more  independent  populations,  and  then  there's  only  one  type  of  data  that  I'm  looking  for.  If  I  then  launch  an  example,  you'll  see  JMP  will  automatically  launch  this  sample  data  set  and  run  that  example.  T his  is  a  great  start  and  this  is  where  we  started  with  JMP  16  was  the  ability  to  do  that.  We  can  also  take  you  right  to  the  help  menu  for  that  specific  test,  the  launch  analysis,  which  will  allow  you  to  just  bring  up  that  analysis,  and  then  also  a  demo  video. This  demo  video  just  links  to  our  learning  library,  which  is  this  great  resource  that  is  basically  answering  a  question  of  how  do  I  do  X  task  in  JMP?   In  this  case,  the   Data Analysis Director  just  allowed  me  to  figure  out  that  what  I  wanted  to  do  was  a  two -sample  T -test, and  here's  a  quick  2- 5  minute  video  on  how  to  do  that. The  new  thing  with  JMP  17  that  we  added  was  this  workflow.  This  is  really  nice  for  people  who  maybe  don't  have  a  ton  of  statistical  background  and  maybe  don't  know  what  they're  looking  for  inside  a  JMP.  When  I  open  the  workflow...  This  is  a  new  feature  in 17.  So  if  you're  operating  in  JMP  16  or  older,  you  won't  have  workflows.  But  with  JMP  17,  this  is  a  nice  new  feature. When  I  hit  play,  what  JMP  does  is  it  opens  that  data  set,  just  like  the  example  I  had  launched  before,  but  now  it  has  some  extra  capability  in  there.   It's  highlighting  some  of  the  reports   where  I  should  look.  It's  telling  me  about  that  report.  It's  also  stepping  through  and  telling  me  what  each  one  of  these  reports  means.  Then  at  the  end,  I've  done  that.  That's  great.  You  could  probably  get  there  with  scripting  as  well  to  be  able  to  recreate  this,  but  workflow  makes  this  a  lot  easier.  And  then  it  also  allows  for  a  generalizable  aspect  to  this. This  is  key,  especially  with  this   Data Analysis Director.  We  encourage  folks  to  go  ahead  download  this   add-in,  but  you  can  make  it  your  own.  And  how  you  might  make  it  your  own  is  by  taking  a  look  at  the  things  that  come  with  it. First  off,  with  workflows,  if  I  do  not  prompt  JMP  to  open  a  data  set,  so  I'm  going  to  just  remove  this  data  set,  and  I  hit  play...   I  forgot  to  close  that  behind  the  scenes.  Excuse  me  one  second.  I f  this  data  set  isn't  open  and  I  am  looking  for  a  specific  analysis,  so  here  I'll  just  open  a  different  data  set.  If  I  hit  play  here,  JMP  is  going  to  tell  me,  "Hey,  I  don't  have  that  data  set  you  are  looking  for,  but  unlike  a  script,  I'm  not  going  to  crash.  I'm  just  going  to  go  ahead  and  prompt  you  to  pick  a  data  set." I'll  hit  okay,  and  then  it  says,  "Hey,  pick  a  continuous  variable  in  that  data  set."   Again,  unlike  a  script,  if  it  doesn't  find  that  column  that  you're  prompting  it  to  find,  it  just  won't  run.  For  this  workflow,  it's  saying,  "Oh,  okay,  pick  a  continuous  Y.  All right, now  pick  a  column  to  replace  gender."  And  now  it's  running  through  that  same  analysis,  it's  giving  me  that  same  report.  T his  makes  this  very  generalizable. T hat's  the  nice  thing  here  with  that  workflow. 
If you want to make this your own, with the add-in, you get this nice, easy table that has all of the scripts behind the scenes that you can edit. You can use your own sample data set that is maybe more relevant to your company. You can use different workflows. We have our demo videos, but maybe you have demo videos that you'd like to use instead. This is very easy to tweak. This will be installed right with the add-in. Without having to go in and script things, the add-in is just looking for a certain row in this data set, so it is very easy to change that. One thing you might be asking, and maybe you've heard of this, is that with JMP 17, we also got a new feature called Search JMP. You might be asking, "Well, when would I use DAD instead of Search JMP? What is the difference?" We did a nice job here of laying out the main differences. Search JMP is built right into JMP, and I'll show you what this looks like here in a minute, whereas the Data Analysis Director, or DAD, is installed as an add-in, so you won't have DAD by default; you'll have to go and install it. And the Data Analysis Director is really directed at new users, maybe people who aren't as familiar with JMP or aren't as familiar with statistics in general, whereas Search JMP can be used by anybody. You just need to know what you're looking for. So maybe if you don't know the technique, the Data Analysis Director is a better place to go. But if you happen to know the name of the analysis you want to do, Search JMP is an easier way to find that. You also get those example videos, example methods, and those workflows inside of JMP. Search JMP is not example-based. It will launch the analysis for you, but it will not walk you through an example. Search JMP is also more comprehensive. For the Data Analysis Director, we picked some of the things JMP can do and highlighted those, whereas Search JMP will look through everything inside of JMP, including the help, the sample data, the scripting index, all of that. If I am inside of JMP here, with any window open, under Help it's the second thing on the Help menu, and it's Search JMP. For folks who haven't seen this before, if I start typing in something like T-test, Search JMP will automatically open this up, and I can go to Topic Help, I can go to Go, I can launch this. You can see it's a lot like that launch analysis inside of the Data Analysis Director. Again, the difference is I just need to know the name of the technique I'm looking for. That's the difference between these two tools and when I might use DAD versus Search JMP. To summarize what we've talked about here, really, the whole point and motivation of the Data Analysis Director is to help new users determine which tool to use when. This all dates back to that survey Mia was using to figure out what people need the most help with, and that was a surprise result of that survey that came out of the STIPS development.
And  then  we  want  to  help  new  users  navigate  JMP,  so  get  over  that  initial  hurdle  of  coming  from  either  a  different  statistical  tool  or  just  not  being  as  familiar  with  statistics.  And  really,  I  think  Mia  put  this  great  when  she  said  we  really  want  to  just  help  democratize  statistics,  help  scientists  and  engineers  who  maybe  haven't  taken  many  stats  classes  be  able  to  find  what  analysis  they  need  more  easily.  And  we  do  this  with  examples,  applications  of  different  methods. And  like  I  showed,  you  can  customize  DAD  to  make  it  your  own.  So  put  in  your  own  examples.  If  there's  something  very  relevant  to  you  and  your  company,  put  it  in  there.  You  can  tweak  the  workflows,  you  can  adjust  the  examples,  the  example  data  sets,  all  of  that's  really  straightforward. And  when  we  compare  that  to   Search JMP,   Search JMP  is  a  great  tool  to  find  what  you're  looking  for  when  you  know  the  name  of  the  test  you're  looking  for. We  will  post  this  in  the  JMP  user  community.  This  is  a  free   add-in.  Here's  the  link  to  that   add-in.   You  can  also  just  search   Data Analysis Director  JMP  in  Google,  and  it'll  be  your  top  result.  A lso  for  anyone  who  hasn't  taken  STIPS,  we  strongly  encourage  you  to  do  it.  It's  a  free  online  course,  really  gets  to  the  core  of  how  to  use  statistics  in  general,  not  just  in  JMP.  It  will  walk  through  the  examples  in  JMP,  but  it's  a  great  course  for  folks  who  are  familiar  with  JMP  or  not. A  couple  of  people  we'd  like  to  thank.  Julian  Parris  helped  a  lot  on  the  front  end  with  this.  And  then  Don  McCormack  was   really  instrumental  in  us  finishing  off  the   add-in.  He  developed  a  lot  of  that  application  and  interface  you  see  there.  We  also  had  many  other  people  who  have  tested  and  provided  feedback.  And  of  course,  Evan  for  the  lead  developer  of  Search JMP,  and  he's  right  here  at  the  conference.  So  if  you  have  questions,  please  stop  by  the  developer  booth  and  talk  to  him. Thank  you  for  your  time.  Hopefully,  you  found  this  useful  and  you  will  go  and  check  out  our   Data Analysis Director  inside  the  community  and  provide  any  feedback  for  any  future  development  we  might  want  to  do  on  this.  Thank  you.  And  Mia,  any  last  thoughts? No.  Great  job.  Thank  you. Thanks.
Challenges with a JMP® and Python integration resulted in a search for an alternative solution that would allow for the evaluation and testing of the various Python libraries and powerful algorithms with JMP. This would enable JMP users to work with Python from a familiar JMP environment. After a few different iterations, a REST API service was developed, and when JMP calls this service, it dynamically creates a user interface based on the options the service currently provides. The JMP user can then utilize this user interface to employ different algorithms such as HDBSCAN, OPTICS, and UMAP by sending data directly from JMP in one click. After the algorithm has finished its operations on the server side, it will return data to JMP for further analysis and visualization.     Welcome to the Pythonless Python Integration for JMP, presented by Murata Finland. My name is Philip O'Leary. Shortly about Murata: we are a global leader in the design, manufacture, and supply of advanced electronic materials, leading-edge electronic components, and multifunctional high-density modules. Murata innovations can be found in a wide range of applications, from mobile phones to home appliances, as well as from automotive applications to energy management systems and healthcare devices. We are a global company, and as of March 2022, there were approximately 77,500 employees worldwide, just under 1,000 of them in Finland, where we are located. Our product lineup here in Finland includes accelerometers, inclinometers, gyroscopes, and acceleration and pressure sensors. Our main markets are automotive, industrial, healthcare, and medical. Today, we have two presenters: myself, Philip O'Leary, and my colleague, Jarmo Hirvonen. I've been working in the ASIC and MEMS industry for over 40 years, 32 of which have been here at Murata. I've had several roles here and have come to appreciate the importance of data within manufacturing. My most recent years have been devoted to supporting the organization in taking benefit from the vast amount of data found within manufacturing. I currently lead Murata's data integration team. Jarmo, perhaps you'd like to give a few words on your background. Yes, sure. Hi, I'm Jarmo Hirvonen and I work in Philip's team as a data integration and data science specialist. I have been using JMP for four and a half years, approximately the same time that I have been working at Murata. I'm a self-taught programmer; I have studied programming on my own, besides a couple of basic courses at university. In my position, I do a lot of JSL scripting. I write add-in scripts, reports, automation, basically almost everything you can script with JSL, as long as it stays mostly inside JMP. I'm an active JMP Community member, and I'm also a super user there. Due to my background with JSL scripting, I'm also a steering committee member in the Community's Scripters Club. I have also written, I think, at the moment, nine add-ins that have been published to the JMP Community. Feel free to try them out if you are interested. Thank you. Thank you, Jarmo. This is the outline for the presentation that we have for you today.
As this session has been recorded, I will not read through the outline, as you can do so yourselves afterwards. Why do we have the need for a JMP Python integration? Well, basically, we are very happy with the performance and the usage we have of JMP. It doesn't require any programming for basic usage, and we see this as a big advantage. JMP's visualization and interactive capabilities are excellent. The majority of people performing analysis at Murata in Finland are already using JMP. We have a large group of people throughout the organization using JMP, and we want to maintain that. However, on the Python side, we see that Python has powerful algorithms that are not yet available in JMP. We already have people working with Python in various different applications, and we have models within Python. We want to support these people and also help others understand and take advantage of the Python world. Basically, we want to take advantage of the wide use of JMP here at MFI and offer JMP users access to some common Python capabilities without the need for them to program themselves. I'll continue from here and share my screen. JMP already has a Python integration, but why are we not using that? Basically, there are two groups of reasons: JMP, and us, or our team. My experiences regarding JMP are from JMP 15 in this case. A JMP update at least once broke this integration, and it caused quite a few issues for us because we couldn't use the Python JMP scripts anymore unless we modified them quite heavily. Getting JMP to recognize different Python installations and libraries has been quite difficult, especially if you are trying to work on multiple different installations or computers. Also, JMP didn't, at least back then, support virtual environments, which are basically necessary for us. Then, on our team's side, we don't have full control of the Python versions that JMP users are using, or the libraries and packages they are using, because not everyone is using JMP as their main tool. They might be using Python, and they have some versions that don't work with JMP, and we don't want to mess with those installations. Also, in some cases, we might be running Python or library versions that JMP doesn't support yet, or maybe it doesn't support old versions anymore. What is our current solution for this Python JMP, or JMP Python, integration? We are basically hosting a Python server using a web framework. We can create endpoints on that server, and behind them there are different algorithms. We communicate with a REST API between JMP and the server. This is the biggest benefit: this way we can use JMP with the server. But we also have a couple of additional benefits. We can have centralized computing power for intensive models; for example, we don't have to rely on a laptop to perform some heavy model calculations. The server is also not limited to JMP; we can call the endpoints from Python or, for example, R. And we are not dependent on the JMP-supported Python and library versions anymore; we can basically use whatever we want.
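To give a flavor of what the JMP side of such a setup can look like, here is a minimal JSL sketch of one REST round trip. The server URL, endpoint, and payload are hypothetical; the real add-in builds its payload and user interface dynamically, as described next.

```jsl
// Minimal sketch of calling a REST endpoint from JSL.
// URL, endpoint, and data values are hypothetical placeholders.

// 1) Ping the server with a short timeout, so JMP does not hang
//    for long if the server is down.
ping = New HTTP Request(
	URL( "http://analytics-server.local:8000/ping" ),
	Method( "GET" ),
	Timeout( 2 )
);
pingResult = ping << Send;   // an empty result here means the server is not reachable

// 2) POST the data as a column-oriented JSON payload: each column name
//    appears once, followed by the list of its values.
payload = "{\!"x\!": [1, 2, 3, 4], \!"y\!": [2.1, 3.9, 6.2, 8.1]}";
req = New HTTP Request(
	URL( "http://analytics-server.local:8000/api/v1/hdbscan" ),
	Method( "POST" ),
	JSON( payload ),
	Timeout( 120 )
);
response = req << Send;

// 3) Parse the JSON response into JSL structures so the results can be
//    joined back to the original data table.
result = Parse JSON( response );
Show( result );
```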
Next, I will step away from the PowerPoint into JMP and show a little bit of the user interface. First, I will explain some terminology which might appear here and there in this presentation. We have endpoints; basically, this path here is the endpoint. These come directly from the server. Then we have methods; the method is the last part of the endpoint, t-SNE and XGBoost in these two here. Then we have parameters, this column, which are basically the inputs we send to the server. Then we have what we call the stack. It's a collection of stack items; one row is a stack item, and we can send them one after another to the server. Let me quickly jump here. What features do we have? It is easy to add new endpoints: basically, we write the endpoint on the Python server, we rebuild the server, we run the JMP add-in, and this list gets updated. The add-in supports a dynamic data table list; if I change the table here, it updates here. Also, if a new table is opened, it's on the other screen, but it doesn't really matter; you can see it here, Untitled 3 was opened. Then we can send data directly from here to the server, and there are multiple different options for sending. I can send the selections that I had here basically immediately. I will show the results here. After getting the data back, we join it; these columns are from the server. We join the data to the original data table, and then we have some metadata we get from the server during the communication: notes and column properties telling what method and parameters were used to get these two columns. Then we group them, so if I have run multiple models or methods, it's easier to see which columns came from which runs. Then we have table scripts, which are also grouped. This is a different screen, so let's move them around. We have the stack, what was sent, and the HTTP response that comes back from the server. In this case we also receive an image from the endpoint; here it's a scatter plot of the t-SNE components. As I said earlier, we can send multiple items from the stack one after the other. You can build, let's say, HDBSCAN runs with different input parameters, say 20 here and then 20 to 40, add them to the stack, just send them, come back when they're done, and start comparing whether there are differences between the results. Then the endpoints have instructions on how to use them: a documentation link if we have one, a short description of the endpoint (a very short one in this case), and then what each of the parameters does, with minimum values, maximum values, default values, and descriptions. We also have user management. In this case I'm logged in as a super user, so I can see these two experimental endpoints here that a basic user would not even be able to see. Then back to PowerPoint. This is, maybe, a partial explanation of how the add-in works.
When the user runs the add-in, JMP pings the server, and if the server is up and running, JMP sends a new request for the JSON that we use to build the interface. The JSON is parsed, the interface is built using the JMP type classes that I will show a bit later, and a custom class is created in JMP. At this point, users can start using the user interface. The user fills in the selections, parameters, data tables, and so on, and then sends an item from the stack. We get the columns based on the inputs, get the data that we need, and convert that data to JSON. In this case I call it column JSON; here is a demonstration. A normal JSON would have the column name duplicated, because each row repeats all the column names. In this case, we have each column name only once and then a list of values. This makes the object we send much smaller. Before we send the data, we ping the server again. This is done because we have different timeouts for the ping and the request; otherwise JMP would lock up for a long time if the server is not running and we are using, for example, a two-minute timeout. Then, when the server gets the data, it runs the analysis and returns the analysis results, and we join them back to the table and add the metadata, table scripts, and so on. At this point the user can continue using JMP, send more items from the stack, or maybe even jump into Graph Builder and start analyzing the data that he or she got back from the server. These are the JMP type classes. We have different classes for the different types of data we get from the server. We have booleans, which in JMP become check boxes; columns; enumerators, which would be combo boxes; type number; type string; and a not-implemented type, which is basically used to check that the server is correctly configured. This is a quick demonstration of one of those, the column type. On the server side, it has been configured like this. When we request the JSON, it looks more like this. Then this column type class converts it into an object that looks like this in the user interface. From here you can see that, for example, minimum items is one; it's the same as the minimum here. Max items, same thing. The modeling types have also been defined here. We can limit minimum and maximum values and so on based on the schema we receive from the server. All of these are made by the custom JMP classes. This is an enumerator with some options, then number boxes, and here is the boolean. Now Phil will continue with a couple of demonstrations of the PyAPI interface.
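As a concrete picture of the column-oriented payload Jarmo describes above, here is a small Python sketch that converts row-oriented records into the more compact column form. The function and field names are illustrative assumptions, not the actual add-in code.

```python
import json

# Row-oriented records: every row repeats every column name.
rows = [
    {"PARAM_1": 1.02, "PARAM_2": 0.33},
    {"PARAM_1": 0.98, "PARAM_2": 0.41},
    {"PARAM_1": 1.11, "PARAM_2": 0.29},
]

def to_column_json(records):
    """Column-oriented form: each column name appears once, followed by its values."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return {"columns": columns}

row_payload = json.dumps(rows)
col_payload = json.dumps(to_column_json(rows))
# The column form repeats no keys, so the serialized payload becomes noticeably
# smaller once the table has more than a handful of rows.
print(len(row_payload), len(col_payload))
```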
Thanks, Jarmo. All demonstrations today will be performed using standard JMP 16. There are three demonstrations I'd like to go through, each having a different task in mind. For the first one, I'll just open the data set. This is a data set which contains probe, or test, data from five different products. It's a rather small data table, just to ensure that we don't get caught for time. We have 29 probe parameters for five products within the same product family. The task at hand is to try to determine quickly whether we have anomalies or opportunities for improvement, looking simultaneously at these five different products and 29 different parameters, such that we could identify something that could help reduce risk, or perhaps reduce cost and improve yield. One possible way to do this, of course, would be one factor at a time, whereby we would just manually march through all the different parameters and look for patterns. Very inefficient; for 29 parameters it's okay, but some of our products have thousands of parameters, so it's not the best way to approach the task at hand. Another possibility would be to take all of these parameters and put them through a clustering algorithm to see whether we could find groups naturally in the data that we have. I want to use the JMP-PyAPI interface that we have here. Jarmo already explained briefly how it works, but I will demonstrate it. My intention now is to run HDBSCAN on all the probe parameters. I'm going to use the default settings; the default settings are typically already quite good. And I'm going to send this. I'm not going to make a big stack; I'm going to send this setting straight for analysis. We can see rather quickly that the algorithm came back and suggested clusters: there are actually three clusters and one grouping of wafers which do not, in fact, belong to any of the clusters. Knowing that I have five products, I'm going to go with this for the sake of demonstration. I can see from here a histogram of the number of wafers in each cluster, but it doesn't really give me a good visualization of what's going on. So I'm also going to do a dimension-reduction procedure. I go back into the same interface, and now I'm going to do a t-SNE dimension reduction on the same parameters and send it immediately. We wait for the dimension-reduction algorithm to do its job, and it returns two t-SNE components, one and two, against which I can then visualize the clusters that HDBSCAN gave me: I plot t-SNE 1 against t-SNE 2 and colour code the points according to the clusters that have already been identified. As I said, we have three clusters and one grouping of wafers which don't necessarily belong to a cluster. Maybe somewhat disappointing, knowing that I have five different products. Thankfully, I have an indicator of the product; it's here. At first this looks frustrating, because now I have two different products being clustered as the same. In actual fact, this is the medical application of the same automotive part; the parts are identical, so them being in the same cluster is not a problem. This part is rather unique.
It's different to the other products in the same family, such that it got its own cluster with a few exceptions, so that's quite good. Then the B2 and B4 versions basically have the same design. What concerns me is that the B4 has been allocated to cluster 1 but also has a lot of minus ones for wafers of the same product type. I'd like to investigate further what this might be due to, so I have scripted it to the table: I want to make a subset of this SENSORTYPE NR SA AB4, and then I'm going to plot the differences for every parameter between cluster minus one and cluster one. Here we see the parameters in question, and the biggest differences are observed for Orbot 1 and Orbot 2. I'm not going to get into the parameters themselves; suffice to say that some parameter differences are bigger than others. Now that I know these exist, I'd like to check, across all the wafers in this subset, how Orbot 1 and Orbot 2 actually look. Here we see that the wafers which have been allocated minus 1, that is, not belonging to the cluster itself, have a much higher value of Orbot 1. In fact, this anomaly is a positive thing, because the higher the Orbot value, the better. We see that there's quite a large group of wafers having exceedingly larger values of Orbot than we would typically see. The next step, of course, would be to do a commonality study to figure out how this has happened, where the wafers have been, what the process has been like, and look for an explanation. So we can see that a multi-product, multi-parameter evaluation of outliers or anomalies can be performed very quickly using this method. I will now move on to the second demonstration. I just need to open up another file. This application is very different: it is a collection of functional data. These are bond curves, curves which occur in our anodic bonding process when we apply temperature, pressure, and voltage across a wafer stack to have the glass and the silicon bond together. If we look at individual wafer curves, we can see that each wafer has a similar but still unique curve associated with it. We can see the bonding process time and the associated current. The goal I have, if I just remove the filter, is to know, without having to look through, in this case, 352 curves, but we would have thousands of these every week, how many different types of curves I actually have in my process. Then, tying that in with the final test data, can this curve be used to indicate a quality level at the end of the line? In order to do this, I'm going to split the data set, so that the time axis runs across the top and the current is in each column. The first thing I do after this splitting is to again go back to our PyAPI interface, and I'm going to look at Split Data.
What I want to do is a dimension reduction, because you can see that I have many, many columns, and it would be much better if I could reduce the dimension here. Again, I'm going to do a t-SNE analysis. I'm going to send it straight to the server, and we can see that the algorithm has come back with two components. I can show you very quickly what they look like. The 352 wafers which were represented by functional, curve-type data a few minutes ago are now represented by a single point for each wafer. Now, having reduced the dimension of the data, I'd like to perform a cluster analysis next. Again, I'll go back to my PyAPI. I'm now going to run HDBSCAN on the t-SNE components. I just need to check on this analysis what would be a suitable level. If I send it immediately and colour code by the cluster, you can see that clusters have now been allocated to the t-SNE components. This is the first-level analysis using the HDBSCAN defaults; I could, of course, try another setting. Thinking out loud, 25 wafers, a full wafer batch, and half-wafer batches are things that would be of interest to me, so I could use that and see what the clusters now look like. Now, all of a sudden, I have many more clusters. Of course, it does take some subject matter expertise: you need to know what clusters you would expect. In this case I said, okay, a natural rational group for us within manufacturing would be a lot of wafers; wafers are processed in 25-wafer batches, and sometimes we have half-wafer batches, which we do experimental runs on, and so on. Now we can see that we have clusters associated with the different types of curves. I'm going to shorten this demonstration rather than have you watch me do joins and so forth. What I'm going to do is take this cluster column and put it into the original data. It's, of course, opening on another screen. If I do cluster overlays, we can see the original data where at first I showed you each individual wafer bond curve. Now we can see that we were able to identify the distinct differences between seven clusters and one group of wafers which don't belong to any particular cluster. So very quickly we've been able to go through large numbers of wafers, determine similarities between them, and come up with clusters. If we take this one step further, we can look at the actual t-SNE components, the coloured clusters, and have a quick look at the actual contents. We can see this is cluster minus one; these seemingly have a very high bond current at the very beginning, while cluster zero has a very high bond current at the end. You can see that if we were to spend enough time on this, you would see lots of similarity between bond curves within each cluster.
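The two demonstrations so far follow the same reduce-then-cluster pattern: t-SNE down to two components, then HDBSCAN on either the raw parameters or on the components. Here is a rough Python sketch of that pattern on assumed data (a wafers-by-parameters matrix), using scikit-learn and the hdbscan package directly rather than the Murata service.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler
import hdbscan

# X: one row per wafer, one column per probe parameter or time point (assumed data).
rng = np.random.default_rng(0)
X = rng.normal(size=(352, 29))

# Reduce the many columns to two components for visualization and clustering.
X_scaled = StandardScaler().fit_transform(X)
embedding = TSNE(n_components=2, random_state=0).fit_transform(X_scaled)

# Cluster either the raw parameters or the embedding; here, the embedding,
# with a minimum cluster size matching a full wafer batch of 25.
labels = hdbscan.HDBSCAN(min_cluster_size=25).fit_predict(embedding)

# Label -1 marks wafers that do not belong to any cluster.
print({int(label): int((labels == label).sum()) for label in np.unique(labels)})
```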
This was a short demonstration of how to take functional data from hundreds of wafers, cluster them, and, with the various visualization techniques within JMP, clearly identify and present the different groupings that exist within the data sets, so that people understand them. This concludes my demonstration number two. I have one more demonstration. This is, in some respects, maybe a fun demonstration. It's not a real wafer, but I'm playing with the idea that I have a silicon wafer and there is some noise. This is a defect layout from an automated inspection tool, and the data has been simulated. The purpose of having this simulation is to look for scratches or patterns in a defect-data layout. That is rather easy and straightforward if I don't have noise, but I can see that there is noise associated with this data set. What I want to determine is whether I can find a way to identify these three spirals, assuming they simulate a scratch. In fact, they're not very similar to a scratch, except that they are patterns with high-density defects in a small area. That's the main reason for using them, rather than showing you actual automated visual inspection data from a wafer. So the task at hand is to try to identify the spirals in this data set. I'm going to use, again, a clustering method. The table I will use is the spiral data with noise. As Jarmo pointed out, we can run a whole stack. Obviously, putting in the wafer-batch numbers from before, 25 and 12, won't help me, because I'm looking at a single wafer. The numbers I put in should somehow be representative of how many defects are typically seen within a scratch, and of the smaller sample sizes associated with clusters, the minimum samples. Being a complete novice, I don't know, so I'm going to put in some numbers to play with. Twenty-five would be the minimum cluster size, with a minimum sample size of zero. Add to stack, and then I say, okay, this is rather inexpensive to do, so I'm going to add... You're missing the columns. Oh, sorry. Thank you. That will help. Let me clear the stack; in my enthusiasm to move forward, I did not include what I should have. Let me start again. Thank you. Twenty-five minimum cluster size and the minimum sample size, add to stack. Fifty minimum cluster size, add to stack. Seventy-five; I'm allowing the scratches to be bigger and bigger. Add to stack, 100. They are not necessarily bigger and bigger, but they would have more and more defects associated with them. Add to stack. And then I'm going to add another combination, 75 and two, add to stack. I could just select one of these and run it, but I'm not. I'm going to be greedy and run the whole stack at the same time. I'm going to run one, two, three, four, five cluster analyses against the data I've taken from this wafer. I send the whole stack and, looking at the clusters, something has gone wrong.
All my clusters are showing minus ones. Let me try this again. To make a long story short, given that this is being recorded and we don't want to start again from the beginning: I'm not sure why this has disappeared, but let me try it one more time. The table I need is the noise table. I'm taking HDBSCAN with the X and Y features. I'm going to take a shortcut: 75 and two, send immediately. Thankfully it works now; I don't know whether I had selected the table or something else incorrectly last time. Now that we're here, I could add a few more combinations, send them immediately, and so on. As I said, we could have run quite many. The idea then is to look at the layout and try to determine whether this particular setup is finding good clusters. Minus one says, no, you're not finding anything there. Then, if I colour code by the other clusters, it has in fact done quite well: lots of points that don't belong to any cluster, and then three individual spirals which are very well identified. You might think, what's the benefit of this? Well, now that I know what typical scratch content looks like, I could open up another wafer. If I open up data from another wafer and make the plot of the layout, we can see that there are no scratches on this wafer; it's only noise. What would happen if I run the same setup? My wafer is another wafer, I'm doing it on X and Y, and I'm looking to determine, based on my best settings for finding scratches, 75 and two, send immediately and plot with clusters. We only have minus ones, so nothing has been detected as a scratch. Having the possibility to run this algorithm against wafers in the database, I could make a collection of wafers that have scratches, or spirals in this case, and wafers that don't, and then use that data as input to a commonality study to try to determine which machines in the production line are resulting in the scratches on the wafers. This concludes the third demonstration. Now I'll hand it back to Jarmo. I'll take that. We have a couple more slides left. Here are a couple of ideas we have for possible future development: using a DoE approach for the stack building, basically what Philip did by hand but driven by DoE, with minimum and maximum values and so on, and then sending that whole stack; a metadata viewer, so you can compare the results; trying JMP 17's new multiple HTTP requests; a local server, so we don't rely on the remote server being up; and trying the new, hopefully updated, native JMP Python integration, which would allow us to have faster data transfer. We could start testing that with this application, then try, for example, running from Graph Builder, where we could trigger the functions, and combining different endpoints: for example, first reduce the input data with t-SNE and then automatically cluster the t-SNE components.
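The third demonstration and the stack idea amount to sweeping HDBSCAN settings over the same (x, y) defect coordinates. Below is a small Python sketch of that sweep; the spiral generator and the settings grid are assumptions made purely for illustration, not the simulated wafer from the talk.

```python
import numpy as np
import hdbscan

# Simulated wafer defect map: three dense spiral "scratches" plus uniform noise.
rng = np.random.default_rng(1)
t = np.linspace(0.5, 3.0, 150)
spirals = [np.column_stack((t * np.cos(4 * t + p), t * np.sin(4 * t + p)))
           for p in (0.0, 2.1, 4.2)]
noise = rng.uniform(-3.5, 3.5, size=(600, 2))
xy = np.vstack(spirals + [noise])

# The "stack" from the demo, expressed as a simple parameter sweep.
settings = [(25, 2), (50, 2), (75, 2), (100, 2)]
for min_cluster_size, min_samples in settings:
    labels = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size,
                             min_samples=min_samples).fit_predict(xy)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    # Label -1 is noise; a wafer with no scratches should come back as all -1.
    print(min_cluster_size, min_samples, "clusters found:", n_clusters)
```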
Then, of course, we're always adding new endpoints as we find out what we want to have. The last slide is that we will be sharing a small sample of the code. There will be a JMP file with the JMP script, the Python script, and installation instructions. You can try a quite simple user interface which will send data to a local server and get data back. It also has some ideas in the instructions sheet that you can try to implement if you're interested in trying this approach to JMP Python integration. That's it from us. Thank you. Thank you also from me. If you need to contact us, you can do so via the Community.
Bradley Jones, JMP Distinguished Research Fellow, JMP   There is scant literature on screening when some factors are at three levels and others are at two levels. Two well-known and well-worn examples are Taguchi's L18 and L36 designs. However, these designs are limited in two ways. First, they only allow for either 18 or 36 runs, which is restrictive. Second, they provide no protection against bias of the main effects due to active two-factor interactions (2FIs). In this talk, I will introduce a family of orthogonal, mixed-level screening designs in multiples of eight runs. The 16-run design can accommodate up to four continuous three-level factors and up to eight two-level factors. The two-level factors can be either continuous or categorical. All of the designs supply substantial bias protection of the estimates of the main effects due to active 2FIs. I will show a direct construction of these designs (no optimization algorithm necessary!) using the JSL commands Hadamard product and direct product.     Hello. My name is Bradley Jones. I lead the DoE and reliability team in JMP, and I want to talk to you today about a family of orthogonal main-effects screening designs for mixed-level factors. This is a subject I'm really excited about. We've just submitted a revision of the paper for this, and I'm hoping it will get accepted so we can include these designs, along with definitive screening designs, in the DoE platforms in JMP. Let's get started. My collaborators for this work are Chris Nachtsheim, who is a Professor at the University of Minnesota Carlson School of Business, and Ryan Lekivetz, who is a member of my DoE team at JMP. Here's my agenda. I'm going to start with a little bit of history and some technical preliminaries. Then I'm going to describe three different constructions for these orthogonal mixed-level screening designs; there are three different ways we can make them. I'll show you the JMP Scripting Language for creating these designs, which will only be necessary until we can get them built into JMP itself. Then I'll spend a little time looking at the design properties for designs constructed under these three methods, discuss data analysis for these designs, show an example in JMP, and finish with a summary and some recommendations. Let's start with some history and motivation. The first screening designs were fractional factorial designs, such as Plackett–Burman designs and other nonregular fractional factorial designs, or regular fractional factorial designs. For these designs, every factor was at two levels only. Engineers I have talked to have felt uncomfortable about these designs because they felt that the world tends to be nonlinear, and two levels just isn't sufficient to capture nonlinearity in the effect of a factor on a response. Then, in 2011, definitive screening designs arrived. Here all the factors were assumed to be continuous and each factor was at three levels, which allows you to fit curves to the relationship between factors and responses. At the bottom is the reference for the paper that first introduced these designs in 2011.
Now, there are some pros for our initial implementation of DSDs and also some cons. Let's go through the pros first. At least in our original implementation, six-factor, eight-factor, and ten-factor definitive screening designs had orthogonal main effects, but we were unable to get orthogonal main effects for more factors. It turns out that a year later, somebody published a nice way of getting orthogonal designs for definitive screening experiments for every even number of factors for which conference matrices were available. That was a big advance. Another good thing about DSDs is that the main effects are orthogonal to two-factor interactions, so the estimate of a main effect will never be biased by any active two-factor interaction. The really exciting aspect of DSDs is that all the quadratic effects are estimable, which is never possible with two-level screening designs; even with center points, you can detect that there is a nonlinear effect, but you don't know where it's coming from. Finally, with DSDs, if there are six factors or more and only three of the factors turn out to be important, then at the same time you do screening you can also do response surface optimization. You could, if you're lucky, get a screening design and a response surface design to optimize a process in one shot. You have to be lucky, of course: you have to have three or fewer active factors. The cons of the initial implementation of definitive screening designs are that, first, they couldn't accommodate categorical factors. Secondly, they couldn't accommodate any blocking. Thirdly, some researchers have pointed out that for detecting small quadratic effects, that is, quadratic effects whose size is about the same order as the error standard deviation, there is low power. Of course, if the quadratic effect is big, say three times the error standard deviation, you can detect it. Now, after the original publication of DSDs, we were well aware that it was a problem that we couldn't accommodate two-level categorical factors, so we wrote a new paper in the Journal of Quality Technology in 2013 that showed how to incorporate two-level categorical factors. Then, in 2016, we wrote another paper in Technometrics that showed how to block definitive screening designs using orthogonal blocks, blocks that are orthogonal to the factors. We were trying, step by step, to address the cons associated with the original implementation of this methodology. Another thing we noticed was that people were having a little bit of trouble knowing how to analyze definitive screening designs, so we invented an analysis technique based on our understanding of the structure of a DSD. It took particular advantage of the special structure of a DSD to make the analysis sensitive to that structure, rather than trying to use some generic model selection tool like stepwise or the lasso.
This made it possible for a non-expert in model selection to use this automated, out-of-the-box technique for analyzing a definitive screening design. That was in 2017, but there were still problems. First, we did write the paper that added categorical factors to a DSD, but if you had more than three, the quality of the design went down, and that was undesirable. In fact, if you had too many categorical factors, things didn't look good at all. That was an issue. And again, quadratic effects have been pointed out to have low power if they're small. The purpose of this talk is to introduce a new family of designs that addresses these issues. Here we go. First, I have to cover some technical preliminaries to explain what we need to be able to do in order to build these designs. I'm going to start by talking about conference matrices. The conference matrix is the tool that the second paper on DSDs, in 2012, used to introduce all the orthogonal DSDs for twelve factors, 14 factors, 16 factors, and so on. I need to show you what a conference matrix is. You can see here a conference matrix for four factors with four runs: there are zeros on the diagonal elements and ones and minus ones off the diagonal. The cool thing about a conference matrix is that if you multiply the transpose of the conference matrix by the conference matrix, you get the identity matrix times one less than the number of rows in the design; for an n-by-n conference matrix C, that is, transpose(C) * C = (n - 1) * I. It's an orthogonal design. Conference matrices are square matrices, and they only exist when the number of rows and columns is an even number. There is a conference matrix for every even number of rows and columns from two up to 30, except for the case where the number of rows and columns is 22. It has actually been proven that a conference matrix with 22 rows and columns does not exist, so there's no way to construct one, sadly, although I can't prove that result myself. Okay, the next thing I need to talk about is something called a Kronecker product, which uses that circular symbol with an X in the middle of it; when you see that in an equation, it means you want to take a Kronecker product. The Kronecker product is also called a direct product, and in fact the JMP Scripting Language, JSL, makes Kronecker products of matrices using the Direct Product command, not a Kronecker Product command. The Kronecker product of the vector one stacked on top of negative one with a conference matrix stacks C on top of negative C, as below. What the Kronecker product does is, for every element in the first matrix, substitute that element times the second matrix. So one times the conference matrix is just the conference matrix, and negative one times the conference matrix is the negative of the conference matrix.
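To make the preliminaries concrete, here is a small numpy sketch (Python rather than JSL) that hard-codes one valid 4-by-4 conference matrix, checks the transpose(C) * C = (n - 1) * I property, and uses a Kronecker product to stack C on top of -C. The specific conference matrix shown is just one example among many.

```python
import numpy as np

# One valid 4x4 conference matrix: zero diagonal, +/-1 off the diagonal.
C = np.array([
    [ 0,  1,  1,  1],
    [-1,  0,  1, -1],
    [-1, -1,  0,  1],
    [-1,  1, -1,  0],
])

n = C.shape[0]
# Conference-matrix property: C'C = (n - 1) * I.
assert np.array_equal(C.T @ C, (n - 1) * np.eye(n))

# Kronecker (direct) product of [1, -1]' with C stacks C on top of -C,
# which is a definitive screening design minus its center run.
DD = np.kron(np.array([[1], [-1]]), C)
print(DD.shape)  # (8, 4)
```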
Basically, a Kronecker product of (1, -1) with a conference matrix just stacks the conference matrix on top of its negative. Here's a case where I did just that: you have a four-by-four conference matrix on top of its foldover, which is also four by four, and if you were to add a row of zeros, you'd have a four-factor definitive screening design. Conference matrices are useful, and Kronecker products are also very useful for constructing designs, as it turns out. I have a few more preliminaries to go over. Let me talk a little bit about Hadamard matrices. Hadamard matrices are also square matrices, but they're constructed of ones and minus ones. We have Hadamard matrices built into JMP for every multiple of four rows and columns, up to 668 rows and 668 columns. Well-known Hadamard designs are the Plackett–Burman designs and the two-level fractional factorial designs; these are both Hadamard matrices. Hadamard was a French mathematician who lived in the late 19th century, and he invented this idea. If a Hadamard matrix H has m rows, then its transpose times itself is m times the identity matrix: transpose(H) * H = m * I. That means the Hadamard matrix is orthogonal, number one, and number two, it has the greatest possible information about the rows and columns that is possible given that you're using numbers between negative one and one. They're very valuable tools for constructing designs. Now we have everything we need to show how to construct these new designs. We call them orthogonal mixed-level designs, or OMLs. They're mixed level because some of the columns, or factors, are at three levels, and those are for continuous factors, while the rest are at two levels, for categorical factors or for continuous factors for which we're not worried about nonlinear effects. Here's the first method for constructing one of our OMLs. Suppose C sub k is a k-by-k conference matrix and H sub 2k is a Hadamard matrix with twice as many rows and columns as the conference matrix. If we stack the conference matrix on top of its foldover and then replicate that, we get the matrix DD, which is just C, negative C, C, negative C, all stacked on top of each other. DD is two DSDs stacked above each other, minus the two center runs that DSDs normally have. DD ends up having 4k runs, because there are four conference matrices with k rows stacked on top of each other, so there are 4k rows in this design and k columns. HH is just the Hadamard matrix stacked on top of its foldover; since H already has 2k rows and columns, stacking it on top of its foldover gives it 4k rows, just like the DD matrix. It turns out that you can just concatenate these two matrices horizontally to make an orthogonal mixed-level design. The DD part of it has three levels per factor.
And the HH part of it has two levels per factor. You can see that there are k three-level factors and 2k two-level factors; therefore, in a 4k-row design you can have as many as 3k columns. For example, if k were six, you'd have 24 rows and 18 columns: six of the factors would be at three levels and twelve of them at two levels. So now you have many more two-level factors, and you haven't lost any of the features of the definitive screening design; the main effects of this design are orthogonal to each other. Here's an example where I constructed an OML from a 6-by-6 conference matrix and a 12-by-12 Hadamard matrix. You can see there are 24 rows in this matrix and 18 columns. The first six are the six three-level columns, and the next twelve are the twelve two-level columns. Now, of course, you don't need to use every column of this design. You could still use it even if you had, say, four three-level factors and seven two-level factors; you just remove a couple of the three-level factors and five of the two-level factors, and it's somewhat arbitrary which ones you remove. Here's the second construction approach. C sub k is a k-by-k conference matrix, and H sub k is a k-by-k Hadamard matrix. We create DD the same way we did before: DD is just a definitive screening design stacked on top of itself, minus the two center runs. HH is the Hadamard matrix replicated twice, stacked on top of its foldover replicated twice. If you look at the two columns of ones and minus ones, you might notice that those two vectors are orthogonal to each other. That's what makes this particular construction really powerful. In this case, the design has 4k rows and only 2k columns: k of the factors are at three levels and k factors are at two levels. The number of runs in this design is twice the number of columns, but that's still a very efficient number of runs given the number of factors; it's the same efficiency as definitive screening designs, in fact, which have twice as many runs as factors, plus one. Here's an example created using a 4-by-4 conference matrix and a 4-by-4 Hadamard matrix. When you stack them on top of each other four times, you get eight columns and 16 rows. Columns A through D, you can see, are at three levels, because you can see those zeros, and there are four zeros in every column. I should point out that in a definitive screening design there are only three zeros in each column; having an extra zero makes the power for detecting a quadratic effect a little higher than for the definitive screening design. That's the second construction method. The third construction method is very similar to the second, except that you add Hadamard matrices to the construction in two different ways. The DD part is the same as in the first two construction methods, and in the HH part there are two HH pieces.
One is made with the vector 1, 1, -1, -1, and the other with the vector 1, -1, -1, 1. The three vectors of ones and negative ones are all orthogonal to each other, and that yields an orthogonal main-effects design. In this case, the third construction again has 4k rows and 3k columns, that is, k factors at three levels and 2k factors at two levels. Those are the three methods. Here's an example of that construction using a 4-by-4 conference matrix and a 4-by-4 Hadamard matrix. The result is a twelve-column design with 16 rows: twelve factors in 16 rows, a very efficient design for looking at twelve factors. It's also orthogonal for the main effects, and main effects are orthogonal to two-factor interactions. Now I want to show you three scripts for creating these designs using JSL. In the meantime, before we drop this methodology into JMP, you can create these designs with a very simple JSL script. The first command creates a conference matrix, in this case with six rows and six columns. Then D is the direct product of the vector 1, -1, 1, -1 and C; that gives you the matrix DD that we saw in our constructions. Then H is a Hadamard matrix with twelve rows and twelve columns. Notice that twelve is two times six; we required that for the first construction, the Hadamard matrix has to have twice as many rows and columns as the conference matrix. We make HH by taking the direct product, the Kronecker product, of 1, -1 with H. That gives you HH with 24 rows and twelve columns. The last step is to horizontally concatenate D and HH to produce ML (I just shortened OML to ML), and the As Table command makes a table out of that matrix. The OML we just created has 24 rows and 18 columns: six of the columns are factors at three levels, and twelve of the factors are at two levels. Now, the six in the first line can be replaced by eight, 10, 12, 14, and so on up to 30, except for 22. The twelve in the third line must be twice whatever number you put in the first line. You can use this construction to create all kinds of OML designs just by changing the numbers in the first and third lines. Here's the second construction method script. I start again with a conference matrix, this time with four rows and four columns. D is the direct product of 1, -1, 1, -1 and C; that's the same as before. This time I make H a Hadamard matrix of order four, instead of twice the number in the first line. I need a vector with four elements to take the direct product with H, so I use 1, 1, -1, -1 and take the Kronecker product, or the direct product in JSL speak. I get a design with 16 rows and eight columns by horizontally concatenating the two pieces; the double vertical bar operator horizontally concatenates two matrices, and then As Table makes a table out of it. This second construction has 16 rows and eight columns.
There are four factors at three levels and four factors at two levels. The four in the first and third lines can be replaced with any even number for which a conference matrix exists. Actually, I need to correct myself: the conference matrix has to be a multiple of four in order for this to work, because the Hadamard matrix is a multiple of four. Here's the last construction method. Again, we have a conference matrix of order four, but it could be four, eight, twelve, or 16. We take the direct product of this vector of ones and negative ones and C to get the replicated definitive screening design. Here we create a Hadamard matrix of order four, but we have two different direct products: the first takes the Kronecker product of 1, 1, -1, -1 with H, and the second takes the Kronecker product, or direct product, of 1, -1, -1, 1 and H. Those are two different matrices, and they happen to be orthogonal to each other. Then we horizontally concatenate all three of these matrices and make a table from that. This design has 16 rows, because it's the four runs in the conference matrix times four, and twelve columns. There are four factors at three levels and eight factors at two levels. So those are three very easy JSL scripts for making these designs. When this goes into the JMP Community, I'll add the JSL, and I'll also add several examples of these OML designs that you can use.
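The JSL itself is not reproduced in the transcript, so here is a hedged numpy sketch (Python, not the JSL shown in the talk) of the same three constructions for k = 4, reusing the hard-coded 4-by-4 conference matrix from the earlier sketch. In JSL the corresponding steps use Direct Product, horizontal concatenation, and As Table.

```python
import numpy as np

def kron_stack(signs, M):
    """Kronecker (direct) product of a sign vector with M: stacks +/-M blocks."""
    return np.kron(np.array(signs).reshape(-1, 1), M)

# 4x4 conference matrix (one valid example) and 4x4 / 8x8 Hadamard matrices.
C = np.array([[0, 1, 1, 1], [-1, 0, 1, -1], [-1, -1, 0, 1], [-1, 1, -1, 0]])
H2 = np.array([[1, 1], [1, -1]])
H4 = np.kron(H2, H2)
H8 = np.kron(H2, H4)

DD = kron_stack([1, -1, 1, -1], C)   # two DSDs stacked, no center runs (16 x 4)

# Method 1: conference matrix of order k with a Hadamard matrix of order 2k.
OML1 = np.hstack([DD, kron_stack([1, -1], H8)])                    # 16 x 12
# Method 2: same-order Hadamard matrix, with an orthogonal sign vector.
OML2 = np.hstack([DD, kron_stack([1, 1, -1, -1], H4)])             # 16 x 8
# Method 3: two Hadamard blocks with mutually orthogonal sign vectors.
OML3 = np.hstack([DD,
                  kron_stack([1, 1, -1, -1], H4),
                  kron_stack([1, -1, -1, 1], H4)])                 # 16 x 12

# Main effects are orthogonal: X'X is diagonal for each construction.
for X in (OML1, OML2, OML3):
    G = X.T @ X
    assert np.array_equal(G, np.diag(np.diag(G)))
```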
Now I want to talk a little about the properties of these designs. Here we see the design properties for method one, and the colour map on correlations shows that there are no correlations between any of the 18 factors in this design. The main effects of the three-level factors are estimated about 10% less efficiently than those of the two-level factors. That's because of the zeros in each of those columns; the zeros don't help you estimate main effects. Now I want to show you the alias matrix for this construction method. You can see that there are a lot of main effects that are uncorrelated with two-factor interactions, but there are also a lot of main effects that are correlated with two-factor interactions. The three-level factors' main effects are not aliased with any of their own two-factor interactions, and the same is true of the two-level factors, because both sides of this design are constructed from foldover designs. But there is quite a bit of potential aliasing of main effects from active two-factor interactions, so in some sense method one is a little riskier to use than the other methods. Here are the design properties for method two. You can see that here I'm making a design with 16 rows for eight factors. The three-level factors have 15% longer confidence intervals than the two-level factors. Again, that is because those four factors all have four zeros: four of the 16 runs are zero instead of 1 or -1. The cool thing about the second design construction is that none of the main effects is correlated with any two-factor interaction. That gives it many of the desirable characteristics of a definitive screening design. There's a lot of orthogonality between pairs of two-factor interactions, but there are also some correlations; you can see some here, some here, and so on. Finally, the design properties for method three show that the three-level factors are a little less efficiently estimated than the two-level factors, with about 15% to 15.5% longer confidence intervals. We can see that there is some aliasing between main effects and two-factor interactions, but not as much as for design construction number one. In terms of risk, this method accommodates more factors with less risk than the first construction method. I'd now like to compare a DSD to an orthogonal mixed-level design. You can make a DSD with eight three-level factors, and that would have 17 runs. If you get rid of the center run, you have a design that's directly comparable with the 16-run mixed-level design you've seen in design construction two. Now, if we compare the efficiency for estimating main effects, the definitive screening design is only 91% D-efficient with respect to the mixed-level design, the G-efficiency is 92%, the A-efficiency is roughly 90%, and the I-efficiency is roughly 82%. The fraction of design space plot shows that the curve for the mixed-level design is below the curve for the definitive screening design pretty much everywhere. This design is clearly preferable to the definitive screening design, for estimating main effects at least. Now I want to talk about data analysis and use an example. I created a design using the second construction, and I created a Y vector by adding random normal errors to a specific function with both main effects and two-factor interactions, rounding to two decimal places. The true function is this one here: the A, B, E, and F main effects are all active, and the AB, BE, and EF two-factor interactions are all active. That's the function without error; I added normal random errors with a standard deviation of one. Since this design can be fit using the Fit Definitive Screening Design platform within JMP, that's what I used. Here you see that Fit Definitive Screening finds all seven real effects and doesn't find any spurious effects; it gets the exact correct set of effects. The deviations between the true parameter values and the estimated parameter values are pretty small. For example, the true parameter value for factor A is 2.03 and its estimate is 2.45 plus or minus 0.35; that's one standard error, so the estimate is just a little more than one standard error from its true value. Here the true value of the coefficient of B is 3.88, and I get 3.94, which is very close to its exact correct value.
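The analysis step itself uses JMP's Fit Definitive Screening platform, which has no open-source equivalent, so the following Python sketch is only a loose stand-in: it simulates a response of the kind described (the coefficients are assumptions, since the exact true function is not given in the transcript) on the 16-run construction-two design and fits the named active terms by ordinary least squares rather than by the DSD-specific selection procedure.

```python
import numpy as np

# Rebuild the 16-run construction-two design from the earlier sketch.
C = np.array([[0, 1, 1, 1], [-1, 0, 1, -1], [-1, -1, 0, 1], [-1, 1, -1, 0]])
H = np.kron([[1, 1], [1, -1]], [[1, 1], [1, -1]])
DD = np.kron([[1], [-1], [1], [-1]], C)
HH = np.kron([[1], [1], [-1], [-1]], H)
design = np.hstack([DD, HH])  # columns A..D (3-level), E..H (2-level), by assumption

rng = np.random.default_rng(2)
A, B, E, F = design[:, 0], design[:, 1], design[:, 4], design[:, 5]

# Assumed true model: four main effects plus the AB, BE, and EF interactions.
beta = {"A": 2.0, "B": 4.0, "E": 3.0, "F": 2.5, "AB": 1.5, "BE": 1.5, "EF": 2.0}
y = (beta["A"] * A + beta["B"] * B + beta["E"] * E + beta["F"] * F
     + beta["AB"] * A * B + beta["BE"] * B * E + beta["EF"] * E * F
     + rng.normal(scale=1.0, size=len(A)))

# Ordinary least squares on the known-active terms (intercept first).
X = np.column_stack([np.ones_like(A), A, B, E, F, A * B, B * E, E * F])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coef, 2))  # estimates should land near the assumed betas
```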
You can see for yourself that the estimated root mean squared error is 1.2, and the true amount of random error I added was exactly one. So again, this analysis procedure has chosen exactly the correct analysis. Now I want to do a little JMP demo that shows basically the actual-by-predicted plot you see below. This plot shows that the residuals don't give any indication of a problem either. I'm going to leave PowerPoint for a second and go to JMP. Here's my data, and here's the function with no error; I can show you that it's just the formula I showed you in the slide. Then here's the data where I've added random error to each of these values. Delta is the difference between the prediction formula for Y and the Y with no error; these values are how far we missed the true value of Y at every point. If I run Fit Definitive Screening Design on Y, I get what I showed you before: the correct main effects and also the correct two-factor interactions. When I combine them, I get the correct model, with an RMSE very close to the true value of sigma. If I run this model using Fit Model, this is the actual-by-predicted plot I get, and this is the residual plot. Here's the prediction profiler showing the predicted value of Y. If you look at the slope of the line for B as I change the value of A, the slope of B changes; if I change B, the slope of A changes; if I change E, the slope of F changes. This indicates interactions. If I wanted to maximize this function, I would choose the high value for each of the factors. One other thing I did was create a profiler where I look at the true function, my prediction formula, and the difference between those two functions. This is the setting that leads to the largest difference between the predicted value and the true value of the function. That's what I wanted to show you in JMP, and I'll move back to my slides now. Let me summarize. We've talked about definitive screening designs with their pros and cons. I then introduced the idea of a Kronecker product and showed how to construct these orthogonal mixed-level designs in three different ways. I shared the JSL scripts for constructing these designs; you can use them, changing the numbers in the first and third lines, to make designs with increasingly large numbers of runs. I talked about the statistical properties of these designs, in particular their orthogonality, and also showed that, in the case of design construction two, not only are the main effects orthogonal, but the main effects are orthogonal to the two-factor interactions. Design construction two only exists for designs with a multiple of 16 rows, which is a slight disadvantage; there's more flexibility with the other approaches.
Then, finally, I showed how to analyze these designs. Let me make a couple of recommendations. Design construction method two is the safest approach, because of all the orthogonality involved and the fact that the two-factor interactions are uncorrelated with the main effects. I pointed out already that you don't have to use all the columns: you can create one of these designs and then drop certain columns to match the true number of factors you have. The advantage of doing that is that you also get better estimates of the error variance. It's important to remember that the three-level factors are for continuous factors only. It wouldn't make sense to put three-level categorical factors in those columns, because there are far fewer zero elements than minus ones and plus ones. A couple more things. Quadratic effects, it turns out, are slightly better estimated by an orthogonal mixed-level design than by a DSD. But if you wanted to improve the quadratic effect estimation further, you could add two rows of zeros for the continuous factors; those would be like center points. For the categorical factors, in those two added rows you can choose any vector of plus ones and minus ones for the first row, and the second row is just the foldover of the first. That is all I have for you today. Thank you very much for your attention.
Saturday, March 4, 2023
There is a big problem with how we are educating our future scientists: we tell them that you are only allowed to change one thing in an experiment and you have to keep everything else the same. When trying to learn about the effect of many factors on a process or system, it is much more effective to change all your factors simultaneously. But people rarely learn about this method, "Design of Experiments" or "DOE." Or they only hear about it later in their careers when they are resistant to new ideas. The result is a huge waste of time and resources due to inefficient experimentation. In the summer of 2022, JMP® launched a competition with an engaging and simple experiment to demonstrate the power of DOE. In this presentation, you will hear from the contest winner and the designer of the experiment. You will hear how experimenters of all ages can get their hands dirty growing garden cress under different conditions according to a statistically designed experiment. And you will see how the results can be easily analyzed with compelling visuals, as well as using sophisticated Functional DOE analysis in JMP® Pro.     I'm Phil Kay and I'm joined by Weronika, and we're going to talk about a fun experiment that we set up as a competition with the idea that it's sowing the seeds of love for design of experiments. And it's all about growing cress, which I'm sure many of you will have done when you were at school or at home. And there's a problem, I think, in how we educate our young scientists. This is taken from the British Broadcasting Corporation's Bitesize for Key Stage 2, so for young scientists. The curriculum in the United Kingdom, at least, tells people about this fair test idea. And that is: when you are testing something, you need to make sure it is a fair test. To do this, everything should be the same except the thing you are testing. So we're only allowed to change one thing at a time. And that's not ridiculous. It's not necessarily wrong, but it's not the whole truth either. It's not necessarily the best way when you come to experiment in commercial R&D and industry. The consequence of this, if we accept that we're only going to test one thing at a time: let's imagine we're experimenting to understand what affects the height of garden cress, and we want to understand the effect of light conditions, sunlight or dark. We're pretty sure that's going to have an effect, but we'd like to understand what it is. What's the effect of the growing medium, whether we grow on soil or on cotton wool? Again, we think it's probably going to have an effect. We'd like to experiment to understand or quantify the effect. The fair test way of doing this would be to take control conditions. So we grow some cress in sunlight and on soil. And then we do a fair test. We just change one thing. So for fair test one, we change the growing medium to cotton wool, and we see what the effect is. For fair test two, to understand the effect of light conditions, we change to dark and we keep everything else the same. We keep our other factor the same.
This would be fine, fair tests would be fine, except nature doesn't necessarily play by those rules. Nature doesn't play fair all the time. And what we should really be doing in this situation is a designed experiment. In this case, we would test all possible combinations. We wouldn't just be changing one thing at a time; we'd make sure we tested all the possible combinations, we'd change all the factors according to a strategy. And what this enables us to do is gain a richer understanding. So we can understand things like interactions between factors. For this cress experiment, looking at the height at day five, after five days of growing, we're looking at the effect of light condition, sunlight or dark. And we can see that the effect of light, whether it's sunlight or dark, is dependent on the growing medium. For soil, we are seeing a bigger difference between dark and sunlight than we are with cotton wool. This is an interaction. We can only understand these interactions when we use designed experiments, and these are often critical in commercial R&D. So what we need is some fun ways of introducing these ideas to young students, to students of any age. Now, let me go on a digression about where we got the name of this talk from. It's from a song by a group called Tears for Fears. It's not a very new song, so if you're young, you may not have heard it. If you're a bit older, you'll probably know it because it was nominated for best postmodern video at the MTV Video Music Awards 30-odd years ago, whatever best postmodern video means. The first line is, "High time we made a stand and shook up the views of the common man." I don't know if I like that first line very much, but I think it's appropriate here. We'd like to shake up people's views about how we should do experiments, how we should change the factors in an experiment. I was a little bit concerned that this is a British band, that people may not have heard of Tears for Fears, may not have heard of this song. So I looked at the data, and actually I found that it was a worldwide hit and particularly big in Canada. It reached number one there in 1989. The cress experiment, how did this start? Well, my colleague, Michael, in marketing here at JMP, wondered if we could make a fun experiment out of growing garden cress. That hadn't occurred to me. When I first heard this, I thought, that's a brilliant idea. What we wanted to do was create an experiment that's simple enough for anyone, for experimenters of all ages, young experimenters, old experimenters. We wanted it to be simple enough that you could do it at home. One of the challenges with coming up with good examples of design of experiments is that science is generally expensive. Measuring the outputs of your scientific experiments often requires really expensive instruments. So we wanted something that was simple and cheap to do. And we wanted it to be an interesting way, just a fun way, to introduce the key concepts of statistical design of experiments.
We didn't want it to be difficult, we didn't want you to have to do lots of very complex analysis. We wanted it to be a very immediate and fun way of introducing these ideas. I did some experiments in the Kay Family Research Kitchen here with some assistants. I had my eight-year-old daughter and my 15-year-old daughter help me with this. My 12-year-old daughter was too busy watching DOC, I think. And it was very successful. They had a good time doing it, I think, and it started some interesting discussions. We did this experiment and we set it up so that we were growing some of them in soil, some of them in cotton, some of them in dark, some of them in light conditions. And my eight-year-old child said, "Well, Dad, it would have been easier if we just put all the soil ones in the dark and all the cotton ones in the light." I didn't say anything, so I waited for my 15-year-old to respond. She said, "Well, but then we wouldn't know if it was the soil or if it was the dark that had the effect." This is a beautifully concise way of describing confounding. This was a very proud moment for me as a parent, that one of my children could explain this concept of confounding in a much more succinct way than I have ever managed to do. And we got great data. The 15-year-old lost interest after we'd set it up, but my eight-year-old daughter carried on with the experiment, observing it over a number of days. We measured the height of the tallest plant in each pot, actually within each compartment of an egg box. We measured those and she took all the measurements, and we got some really good quality data. Let me just show you, first of all, though, the actual experiment. Three factors: we tested the substrate, soil or cotton wool; the light conditions, dark or light; and we used plain or curled cress types, two different types of cress seeds. And this is a two-to-the-three full factorial for those DOE nerds out there, and we've replicated on a two-to-the-three-minus-one half fraction. So that gives us 12 runs, 12 pots, which works well because in the UK, at least, egg boxes generally come in sixes. So we could use two egg boxes to do these 12 runs. And as I said, the data was very good. We can do some simple analysis. This is one of the things I like about it: we can just look at the ones that were grown in the light and the ones in the dark and see how the height is different after seven days. And it's very compelling, there's a very big difference. It wasn't really the difference that I was necessarily expecting, and it was an interesting surprise to all of the experimenters involved. We can do some simple analysis, just some simple visuals. Let's just plot the heights versus light conditions. And again, you can see the big effect there, a big effect of substrate, very little effect of cress type there.
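For reference, here is a minimal Python sketch of that 12-run layout: a 2^3 full factorial plus a replicated half fraction. The talk does not say which half fraction was replicated, so the choice below (the runs whose three coded factors multiply to +1) is an assumption for illustration only:

from itertools import product
import pandas as pd

# Sketch of the 12-run cress design: a 2^3 full factorial (8 runs) plus a
# replicated half fraction (4 runs). Which half was replicated isn't stated
# in the talk; here I assume the half where the three-factor product is +1.
factors = ["Substrate", "Light", "Cress type"]
levels = {
    "Substrate": {-1: "Cotton wool", 1: "Soil"},
    "Light": {-1: "Dark", 1: "Sunlight"},
    "Cress type": {-1: "Plain", 1: "Curled"},
}

full = pd.DataFrame(list(product([-1, 1], repeat=3)), columns=factors)
half = full[full.prod(axis=1) == 1]          # replicated half fraction (assumed)
design = pd.concat([full, half], ignore_index=True)

# Decode to the actual factor settings for the 12 pots.
for f in factors:
    design[f] = design[f].map(levels[f])
print(design)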
So, introducing these simple analyses, and then we can obviously take it to a greater level of sophistication and build a full statistical model. And that brings us to the profiler, which I think is just such a great way of understanding design of experiments and statistical models. A very powerful, compelling way to understand the effects of each factor and the interactions between factors as well. If we look at day seven, we can see there's an interaction between light conditions and our substrate. And we can take it to an even greater level of sophistication, because this is actually functional data. If you're interested in Functional Data Explorer, well, this is a great example data set, because we're collecting the height data as a function of time for each of the runs of our experiment. We can use Functional Data Explorer and Functional DOE to understand how the factors affect the shape of this growth curve. We can see the rapid growth with soil versus cotton wool. We can see the increased rate of growth in the dark, and actually the fact that it's starting to die off towards the end of the experiment here. I was really delighted with how the experiment went. It was very simple to do, very compelling, really accurate results. It's so hard to find experiments that people can do at home where they can get an accurate, continuous, quantitative response that they can measure just with a plastic ruler, in this case. We went ahead and did this as a competition. I wrote a blog post about it, and we'll provide the link to that as well. We ran this last summer, the summer of 2022. I'm going to introduce next our competition winner, Weronika. I've also done some visuals of Weronika's results in JMP Public. We'll share the link to that as well, so you can actually see Weronika's data, download the data yourself if you log into JMP Public, and see the results for yourself. But now, Weronika is going to show you what she found in this cress experiment and the impressive results that she got that meant she was the competition winner. I think you're on mute, Weronika. Thank you, Phil, for introducing me. I would like to share with you my experience in the competition, my experience regarding design of experiments and planting the cress. The main aim of the challenge was to introduce design of experiments to researchers, to engineers, to students, to anyone. But also, in that experiment, we had to check with design of experiments which factors have an influence on the height of the garden cress. The factors defined by the organizers, by Phil, were three. The first was the surface: we used cotton wool and garden soil. The second factor was light conditions, so we planted garden cress in sunlight and in the dark. And we also had to check what influence pre-soaking has on the height. What was my first impression? As Phil said, they wanted the experiment to be simple enough for everybody.
But I was not so convinced at the beginning, because taking a look at my previous experiments with planting, it had not gone so well. So I didn't expect that my garden cress would act in a different way, and I wasn't mistaken, I wasn't wrong: my first results were not good. First of all, I put so many seeds in one spot that the pre-soaked samples became a kind of shell. They didn't germinate, so I didn't get any plants. Moreover, my egg box was broken by the water, which can be seen here. It was broken. Also, the marker was washed away and I could see no spot numbers. And the soil migrated from one hole to the adjacent one. It got mixed with the cotton, especially when I put the water on the soil; it was not good. After my first failure, I drew some conclusions about why I had failed. First of all, I decided to use plastic espresso cups instead of the paper cups, because plastic withstands water better. Use a smaller number of seeds in each hole: don't put in as many as I can, but do it smartly. And at this moment, I came to the idea of maybe adding a fourth factor to my experiment, the density of the seeds. So I wanted to check not only how surface, light condition, and soaking influenced the height, but also the density of the seeds. I set two levels, low and high. For low density, I used 20 seeds and spread them evenly in a cup. For high density, I took 40 seeds and tried to put them all in the middle of the cup, so it's [inaudible 00:16:09]. My design had four factors, and each factor had two levels. I used a full factorial design as the design type, which gave me 16 treatments, 2 to the power of 4. I decided to replicate eight treatments in order to capture variability and be able to estimate the standard deviation and so on. In total, I had 24 test runs. The experiment was done in August when it was very warm, so it was nice weather for planting and being a gardener. Okay, those are my results. Here we can see the design table with all factors and the 24 test runs. Here I put the height after three, five, and seven days. In that table, you can see the factor effect estimates after seven days. In bold font, I marked the factors which I found to be statistically significant, and they were surface, light, and density. Soak turned out not to be important, but only on its own, as a main effect. It turned out to be important in two-factor interactions: the interactions of soak with surface and with light turned out to be significant, so we cannot assume that soaking is not important. Also, two three-way interactions were significant, and the four-way interaction was not significant. Now I would like to present to you the pipeline, the steps which I used in my designed experiment. I think it's quite a good approach which everyone can use in an experiment. First of all, we have to generate the design. As a first step, we shall define which factors we want to check and at what levels.
And when we have set that, we have to choose the design type, because choosing the type usually depends on the factors: how many factors we have, whether it's only two or three or more, and how many levels. We define the number of replicates we have to include, and then we can generate the design table, which in JMP is very quick and convenient. When we have the table, we can run the experiment, collect the data, and put it in the table. When we have everything, we can go to the next step, estimation of the factor effects. We formulate the full regression model and estimate the factor effects, so we check which factors are important. Here you can see the main effects plots, two-way interaction plots, and three-way interaction plots after seven days. What is worth mentioning are the interactions. This is what Phil said, that the interactions are important; they happen in the real world. And here is a good example. For example, when we have cotton as the surface, it's better to use no soaking. If you use soil, it's better to pre-soak the seeds. And this is why checking only one factor at a time misleads us: for example, take soil. With soil, we would conclude that pre-soaking is better, and with cotton we would then also use pre-soaking, but in that case it's not true. This is the beauty of the interactions, and that's why we have to take their effects into consideration. Then, statistical tests: checking which effects are important. In JMP, we can also see the parameter estimates and the effect tests and conclude which are significant. When we see which are not significant, we should redefine the model after dropping the non-significant effects and calculate the estimates one more time, a linear regression in this case. But we cannot finish with that; we also have to check the assumptions, that our model is statistically correct. So we have to, for example, check the residuals for normal distribution. It can be done with the normal probability plot of residuals in JMP: when we see that the residuals follow the straight line and stay within the bounds, it means the normal distribution assumption is reasonable. But we can also check it with a numerical test like the Shapiro-Wilk test, to check if the residuals follow a normal distribution, and then a mean test to check if the mean value is equal to zero. When we have finished that, we can draw the conclusions. The conclusion in my experiment was that the most important factor was light, and its effect was about eight times higher than the effect of the second most important factor. Plants cultivated in the dark grew taller than those in the sun. The other significant factor was surface, and I obtained the result that garden soil is better: in garden soil, the plants grow taller. Also, the fourth factor which I added, sowing density, turned out to be important, and its significance increased over time. After three days, sowing density was not significant, but after five days it was significant, and after seven days it was even more significant; so its importance increased with time.
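Here is a rough Python sketch of that pipeline on synthetic data (not Weronika's actual measurements): build the replicated 2^4 design, fit a model with two-factor interactions, drop non-significant terms, and check the residuals, the same steps she describes doing in JMP with Fit Model and the residual diagnostics:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from itertools import product
from scipy import stats

rng = np.random.default_rng(1)

# A 2^4 full factorial in coded -1/+1 units, with 8 of the 16 treatments
# replicated once, giving 24 runs (mirroring her layout; which 8 treatments
# she replicated isn't stated, so I just take every other one).
runs = pd.DataFrame(list(product([-1, 1], repeat=4)),
                    columns=["surface", "light", "soak", "density"])
design = pd.concat([runs, runs.iloc[::2]], ignore_index=True)

# Synthetic day-7 heights, NOT the real competition data: big light effect,
# smaller surface and density effects, plus a surface*soak interaction.
design["height"] = (60 + 25 * design["light"] + 5 * design["surface"]
                    + 3 * design["density"] + 4 * design["surface"] * design["soak"]
                    + rng.normal(0, 2, len(design)))

# Step 1: full model with all two-factor interactions, then effect tests.
full = smf.ols("height ~ (surface + light + soak + density) ** 2", data=design).fit()
print(full.summary2().tables[1][["Coef.", "P>|t|"]])

# Step 2: drop clearly non-significant terms and refit the reduced model.
reduced = smf.ols("height ~ surface + light + density + surface:soak",
                  data=design).fit()
print(reduced.params)

# Step 3: check the residuals, e.g. Shapiro-Wilk for normality
# and a t-test that their mean is zero.
print(stats.shapiro(reduced.resid))
print(stats.ttest_1samp(reduced.resid, 0.0))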
Also, in general, during the seven days, three different three-way interactions were significant, which suggests that all the factors really interact together and we cannot interpret them separately. Sun, soil, water: everything in nature is connected and has some inner dialogue. Apart from that, I also observed some physical characteristics, let's say. The cress cultivated in the light became green and developed big leaves, whereas in the dark the plants were very yellowish and fragile; when I touched them, they broke. They were taller, but they were, I would say, not healthy. Also, the roots of plants cultivated in the light grew longer. Here you can see, in the light... it's very difficult to see because the roots are white and the cotton is white, but you can see somehow that they are curling around here, and here there is just plain cotton. With soil, it's easier to visualize because it's easier to discern them from the soil. And we can see that in the light we have longer roots, whereas in the dark they are very short. To maximize the height after seven days, we should use soil, we should pre-soak the seeds, we should put them in the dark and use high density. Those are the pictures of my results. We can see that throughout the experiment, the samples in the dark were yellow and thin the whole time, whereas in the sunlight they were a healthy green and thicker. My conclusions regarding design of experiments, my experience: design of experiments is a great tool which can be used to optimize any process. Even something like cultivating garden cress can be fitted into a designed experiment. It helps to incrementally gain knowledge about the process. For example, at the beginning I had no idea how the density influenced the height, but when I put in so many seeds, I learned that it does have an influence, and I had to do something about it and take it into consideration. We can also increase our confidence in our results, and that our results will indeed be statistically significant, so we will have no biases. We know that interactions are involved. Of course, some factors can be aliased with others, for example in fractional factorial designs. But the advantage of design of experiments is that we are aware which ones are confounded, and we can draw proper conclusions based on that. So if, for example, one pair of confounded factors appears to be significant but we don't know exactly which one, we at least know what we have to focus on. And also, do not be afraid or discouraged if the first try is not successful; treat it as a lesson and draw conclusions about why it happened. Don't give up, but sit and think: why did I fail, what can I do another way, what can I improve? Then do it and try one more time. And design of experiments can bring fun with the proper attitude, because I really had fun with this experiment. And as I said, it was August, it was very sunny, so it was nice weather, a nice time for spending time on the [inaudible 00:27:04].
Thank  you  for  your  attention. Yes,  thanks  very  much  and  thanks,  Weronika.
Roselinde Kessels, Assistant Professor, University of Antwerp and Maastricht University
Chris Gotwalt, JMP Director of Statistical Research and Development, JMP
Guido Erreygers, Professor of Economics, University of Antwerp
In 1919 and 1921, Raymond Pearl published four empirical studies on the Spanish flu epidemic. He explored the factors that might explain the epidemic’s explosiveness and destructiveness in America’s largest cities. Using partial correlation coefficients, he tried to isolate the net effects of the possible explanatory factors, such as general demographic characteristics of the cities and death rates for various diseases, on the variables measuring the severity of the epidemic. In this presentation, we revisit Pearl’s data and apply variable selection with a pseudo-variable in JMP®'s Generalized Regression platform instead of Pearl’s correlation analysis. We use Poisson forward selection on the variables in the platform with AICc validation. We find that our results largely correspond with Pearl’s conclusions but contain some additional nuances that are substantive. This paper contributes to the literature showing that Poisson regression proves useful for historical epidemiology. JMP’s Generalized Regression platform offers researchers much flexibility in choosing the most appropriate model.     Okay, welcome everybody to this presentation, in which I will talk about Pearl's influenza studies. Pearl was a biostatistician; he lived quite a while ago, specifically at the time of the Spanish Flu, and he was the first one to analyze the weekly data that was collected by the US Bureau of the Census about the Spanish Flu, which occurred in 1918-1920, and this for the large American cities. Pearl was the first to analyze the data regarding the Spanish Flu, and he wrote two reports about it, his influenza studies one and two, which we will revisit during this presentation, and we'll see how we are able to look into the data and analyze it using JMP. This is joint work with Chris Gotwalt, who contributed to the methodological component, and Guido Erreygers, who initiated the idea of revisiting Pearl's influenza studies. The overview of this talk is as follows. First, I'll discuss a little bit the Spanish Flu, what it was all about: quite a deadly pandemic at that time. Then I'll introduce Pearl's influenza studies, and I'll talk a little bit about the data that Pearl used in his analysis. We have added some census data from 1910 on top of that for our analysis, which consists of a variable selection procedure with a null factor, a random factor, an independent normal factor, and bootstrap simulation from the data. Then we'll discuss our results and compare them with Pearl's results, and then we'll conclude. First of all, the Spanish Flu pandemic, to frame that a little bit: it was one of the deadliest pandemics in history, as witnessed by this list here showing the world's deadliest pandemics over time. You can see that the Spanish Flu here ranks fifth in terms of the number of deaths that it caused. This list is headed by the Black Death, which killed about four times more people than the Spanish Flu did.
Then below this list, you see the COVID-19 pandemic appearing, which we all got exposed to; it also still appears in this list. Just as with COVID-19, gatherings were encouraged to happen outside, and that was the case at the time of the Spanish Flu pandemic as well, even for classes to happen outdoors, as you can see in this photo. The Spanish Flu pandemic consisted of three waves. The first one started in March 1918 in the US and spread to Europe and to the rest of the world. Then a more severe wave, the severest one, started in August 1918 in France and spread rapidly to the rest of the world, and coincided with the end of the First World War. Then a third one, which was less severe than the second but more severe than the initial wave, started at the very beginning of 1919 and hit some specific countries like Australia and Great Britain. Here you see the timeline of the three waves of the Spanish Flu pandemic occurring in the US, more specifically, because we have Pearl's data from the US, which we look into. The death toll was humongous, and most deaths actually occurred in India and China. What was specific about the Spanish Flu pandemic was that many young, healthy adults died, as shown by a W pattern of mortality rates, which is shown here by the full black line, in contrast to the U shape of the normal seasonal influenza and pneumonia mortality rates which were registered prior to the pandemic. For seasonal influenza and pneumonia, this U shape shows that individuals younger than four years of age and people older than 75 years were most hit by seasonal influenza and pneumonia. Characteristic of the Spanish Flu pandemic was that, besides those two age groups, the young adults in the range between 20 and 40 years of age also got specifically hit by this pandemic, eventually leading up to a huge death toll. Then further onwards throughout history, epidemiologists and historians of epidemiology have applied statistical analysis to the Spanish Flu pandemic. More specifically, people worked on the US as well: namely, Markel and colleagues studied the effect of non-pharmaceutical interventions in American cities, like school closures and cancelations of public gatherings. Mamelund studied the influence of geographical isolation of specific areas and regions in the US. Clay, Lewis, and Severnini studied cross-city variation across 438 US cities. Work was also done elsewhere around the world, not only in the US but also in Europe, like in Norway and Spain, where data about the Spanish Flu pandemic were analyzed together with census data and so forth. The Spanish Flu pandemic kept intriguing many researchers over time. About Pearl's influenza studies now: here you see the first of his influenza studies, the beginning of it.
First, he talks about the severity of the outbreak: the death toll for the United States was set at about 550,000, which is about five times more than the number of people who died during the First World War, showing the severity of this pandemic, how deadly and how explosive it was. That was specifically what Pearl was interested in examining and relating to other variables. The second of his influenza studies looks as follows, the beginning at least. That is an update of the first of his studies, because after the first study he got some criticism from peers who criticized him for not being accurate with certain variables. He had issues with what we now refer to as construct validity. Some of the data were not really measuring what they were supposed to measure, so they were not so accurate, and they could be defined more appropriately, more accurately, so that they really measured what they should measure. He did that: he tackled the data again, the variables, set up new definitions of the variables, and moved forward. The data for the first of his studies look as follows. We have data from Pearl for 39 American cities. The response variable he wanted to study was the epidemicity index, which is given the symbol I5 in his studies. That was the first response variable he wanted to study, and he also wanted to find other variables, like demographics, that could be predictive of this response variable. Now, this response variable was a measure of the explosiveness of the outbreak in terms of epidemic mortality, so how explosively the outbreak hit the various cities in the US. He defined the peak time ratio, so the peak of the excess mortality rates divided by the peak date. In this way, he wanted to compare cities with one single very sharp peak to cities with a long, flattened curve of excess mortality rates. He devised this epidemicity index himself. He wanted to relate this epidemicity index to various factors, like the demographics: first of all, the population density for 1916, and then the geographical position, which is the straight-line distance from Boston. He also included the age distribution chi-square, which is an age constitution index showing the deviation of the age distribution of the cities from a fixed standard age distribution. He also studied the percentage population growth in the decade 1900-1910. Besides the demographics, he also involved the death rates for 1916, so prior to the pandemic: first of all, an aggregate death rate from all causes, and then death rates for specific diseases, namely pulmonary tuberculosis, organic heart disease, acute nephritis and Bright's disease (or failure of the kidneys), influenza, pneumonia, typhoid fever, cancer, and measles. There was quite some correlation between these death rates. I already told you a little bit about the response variable that Pearl developed, namely the epidemicity index.
That did not arise from just a single attempt. He actually started from I1, a first epidemicity index that he further improved into I2, I3, and I4, and then he was happy enough to move along and work with the I5 epidemicity index, to distinguish cities with single, very sharp peaks from those with long, low, flat curves of epidemic mortality. He updated the variables from his first study in the second study, and in that sense he also modified the epidemicity index; he referred to the new epidemicity index as I6. As another variable of interest, he also defined a destructiveness variable based on excess mortality rates. He then wanted to relate these two responses in his second study to the normal death rates, which were the mean death rates over the three years 1915 to 1917. Then he also brought in the demographics. Again, he modified the age constitution index based on the 1910 census data and used that. He also involved the sex ratio of the population, of males versus females. The population density of 1910, used in the first of his studies, remained in the second study. Then, instead of the geographical position, he used latitude and longitude in his second study. And lastly, he again used the percentage population growth in the decade 1900-1910. These were Pearl's data, to which we added some additional census data from 1910, which are given over here. Instead of the age constitution index, we used the shares of the different age groups, the pure numbers actually, because we were not really happy with the way Pearl defined the age constitution index. It was really quite complex and not clear enough, so we thought to just go ahead and use the 1910 age shares instead of this age constitution index developed by Pearl. Besides that, we looked into the number of persons to a dwelling, the percentage of homes owned, the school attendance of the population 6 to 20 years of age, and the illiteracy in the population 10 years of age and over, to see whether one or some of these factors could be predictive of the three response variables in Pearl's studies. What was Pearl's analysis all about? Well, he got into multiple correlation. He studied all the data making use of partial correlation coefficients as well as the normal correlation coefficients of zero order, and he did that very rigorously. He computed this all by hand and did it quite well actually, taking into account various other factors in these partial correlations by holding those other variables constant. For the partial correlations and the other correlations, he also computed probable errors to find out whether these correlation coefficients were significant or not, so he did this quite well. Now, the analysis that we are using is one in which we are going to select variables with a null factor and we are going to bootstrap our data.
We are doing so because the P values for our data, which are unfortunately not orthogonal, can become heavily biased. Since we are not using nicely orthogonal data, the P values can become quite biased towards zero. That is always the danger with P values for observational data: unimportant factors all of a sudden become important, and the type one error rate is not under control. To deal with the fact that the P values are no longer uniformly distributed in the case of an unimportant variable, we are going to include a random variable, a null factor, an independent normal variable, in the analysis and see to what extent it appears in our variable selection procedure. This idea was inspired by, or originated from, the JASA paper by Wu, Boos, and Stefanski in 2007. Specifically, what we have done is include a single null factor in the variable selection procedure and perform 2,500 bootstrap replicates of the variable selection using JMP. Then we calculated the proportion of times each variable enters the model. Variables that enter as often as or less often than the null factor are ignorable, and variables that enter more often than the null factor are the ones that we are going to select as being predictive of the response variable. In JMP, we specified two new columns, and actually only one formula is needed: the formula for the bootstrap frequency column that you see here. The bootstrap frequency column is based on our null factor, which is an independent normal variable that is reinitialized each time during the bootstrap simulation. Based on this reinitialization, the frequency column gets updated so that we have a 100% resampling of our data with the sample size held fixed as it is. Then, for the variable selection, which kind of regression to apply we could find out in the generalized linear model platform of JMP. Because we take the frequency column into account, if we do variable selection, we also need to do the distribution determination at the same time; it has to happen simultaneously. You can't do it separately based on the original data. Actually, the graph on the left-hand side is a little bit misleading because that graph contains the original data, but the distribution determination you should actually do while doing the variable selection itself. Assuming a Poisson distribution, well, that assumption was not rejected, so we could actually move forward with the assumption of a Poisson distribution, which is also maintained throughout the literature for mortality rates and similar analyses. It was not really rejected, and it was reasonable to assume such a Poisson regression for the analysis. We also applied Poisson regression in combination with variable selection for the epidemicity index I6. However, we switched to normal regression, or least squares regression, for the third response variable, the destructiveness variable.
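JMP handles this with a column formula for the frequency column; as a rough Python sketch of the same idea, each bootstrap replicate gets multinomial case frequencies (100% resampling with the sample size held fixed) and a freshly drawn null factor. The column names here are placeholders, not Pearl's variables:

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 39  # number of cities in Pearl's first study

# Placeholder data frame standing in for Pearl's table (names are made up here).
data = pd.DataFrame({"city": [f"city_{i}" for i in range(n)]})

def one_bootstrap_replicate(df, rng):
    """Return a copy of df with a bootstrap frequency column and a
    freshly reinitialized null factor, mimicking the JMP setup."""
    out = df.copy()
    # 100% resampling with the sample size held fixed: multinomial counts
    # that say how many times each original row appears in this replicate.
    out["freq"] = rng.multinomial(n, np.full(n, 1.0 / n))
    # Independent standard normal null factor, redrawn every replicate.
    out["null_factor"] = rng.standard_normal(n)
    return out

replicate = one_bootstrap_replicate(data, rng)
print(replicate.head())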
The way to apply this regression in JMP by means of variable selection is to use the Generalized Regression platform, where we define the response variable for analysis and put the frequency column into the Freq box. Then, as model terms in the Construct Model Effects window, we include all of Pearl's variables, together with our null factor, the independent normal variable. We then move forward with the regression procedure by selecting forward selection as the estimation method. As the criterion, we used the Akaike information criterion with the correction for small sample sizes (AICc) to decide upon the final model, that is, the final selection of variables. The solution path that you see here is based on normalized data, scaled and centered, to put the variables on the same scale and also to diminish the effect of multicollinearity in the data, because there is quite some. Then you see the original predictors popping up again in the lower output. We select the model with the lowest AICc, with the variables being selected one after the other by forward variable selection based on that criterion. As you can see in this output, the null factor got into the model: the null factor, completely unimportant and uninformative, got into the model, which is not good, of course. We had to run this estimation method, this variable selection procedure, 2,500 times, and we did this in JMP by right-clicking on the estimate column, hitting Simulate, and selecting the frequency column as the column to switch in and out between the different bootstrap replicates, so that reinitialization of the null factor was always guaranteed between the different bootstrap replicates. Finally, we got the 2,500 model selections out of the bootstrap simulation. Then we computed the proportion of times each of these variables got into the model, that is, was given a non-zero estimate. Especially, of course, we were interested in how many times the null factor appeared in the selected models. This turned out to be 41% of the time, which is quite high. That is our new false entry rate, actually: a very high percentage in which the null factor got selected. We also accounted for an upper bound, an upper 99.9% confidence limit, and even went a little bit higher sometimes, to be assured that the factors that you see in green are the ones that appeared more often than that in the model selections. As you will see in another example shortly, variables sometimes got tied with the null factor as well. The factors or variables in red have not been selected, since their occurrence is lower than the occurrence of the null factor over the different bootstrap simulations.
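To make the procedure concrete, here is a rough Python approximation of the whole loop on made-up data: forward selection of a Poisson model by AICc, repeated over bootstrap replicates that each include a freshly drawn null factor, followed by the entry-rate comparison. This is only a sketch of the idea, not JMP's Generalized Regression implementation, and the variable names and data are invented:

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2023)

# Toy stand-in for Pearl's table: 39 cities, a count-like response, and a few
# correlated candidate predictors (names and values are made up, not his data).
n = 39
X = pd.DataFrame(rng.standard_normal((n, 4)),
                 columns=["density", "heart_rate", "tb_rate", "growth"])
X["tb_rate"] += 0.6 * X["heart_rate"]          # induce some multicollinearity
y = rng.poisson(np.exp(1.5 + 0.4 * X["density"]))

def aicc(model_result, k):
    # AICc = AIC + 2k(k+1)/(n-k-1), with k = number of estimated parameters.
    return model_result.aic + 2 * k * (k + 1) / (n - k - 1)

def forward_select(y, X, freq):
    """Greedy forward selection of a Poisson GLM by AICc, using bootstrap
    frequencies as case weights. Returns the list of selected columns."""
    selected, remaining = [], list(X.columns)
    best = aicc(sm.GLM(y, np.ones((n, 1)), family=sm.families.Poisson(),
                       freq_weights=freq).fit(), 1)
    improved = True
    while improved and remaining:
        improved = False
        scores = {}
        for col in remaining:
            Xc = sm.add_constant(X[selected + [col]])
            fit = sm.GLM(y, Xc, family=sm.families.Poisson(),
                         freq_weights=freq).fit()
            scores[col] = aicc(fit, len(selected) + 2)
        col, score = min(scores.items(), key=lambda kv: kv[1])
        if score < best:
            best, improved = score, True
            selected.append(col)
            remaining.remove(col)
    return selected

# Bootstrap the whole selection, adding a freshly drawn null factor each time.
n_boot = 200            # the talk used 2,500; fewer here to keep the sketch quick
counts = {c: 0 for c in list(X.columns) + ["null_factor"]}
for _ in range(n_boot):
    Xb = X.copy()
    Xb["null_factor"] = rng.standard_normal(n)
    freq = rng.multinomial(n, np.full(n, 1.0 / n))
    for col in forward_select(y, Xb, freq):
        counts[col] += 1

entry_rate = {c: k / n_boot for c, k in counts.items()}
print(entry_rate)
# Variables entering no more often than 'null_factor' would be treated as ignorable.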
The variables of interest here that got selected and are predictive of the epidemicity index I5 are the death rate from all causes, the death rate from organic heart disease, the death rate from pneumonia, the death rate from cancer, the death rate from measles, and the geographical position. Now, the death rate from all causes is an aggregate of the death rates from the individual specific diseases, so we were also interested to see what would happen if we took it out of the analysis. That is what we did on the following slides. We took out the death rate from all causes, and then we got a different picture. The death rate from measles and the death rate from cancer turned out to be unimportant now, whereas the death rate from pneumonia got into the green zone, as well as the death rate from pulmonary tuberculosis. These were also quite highly correlated with the death rate from all causes: the death rate from all causes had masked these variables, although some of the death rates are also correlated among each other, so we have to be careful here. That was a new result that we saw. Eventually, we repeated these variable selections further onwards with only the variables in green that you see here on the screen, so the ones that got selected, to which we added our new 1910 census data, like the age shares, the illiteracy, the schooling, and so forth. We did this with and without the death rate from all causes, to finally select the variables that were common across all the analyses. The different analyses each still selected some other variables, but there were some variables which were present all the time, and these are the ones that we finally retained in the model, which is the final Poisson regression model that we obtained in the end. It contains the death rate from organic heart disease; the death rate from all causes, which we kept in, as Pearl also stressed it as being quite important; the share of ages 0 to 4; the school attendance of the population 6 to 20 years of age; and the geographical position. That was our final regression model for the first response, the epidemicity index I5, and below you see Pearl's results. Having done a correlation analysis, he was able to identify pulmonary tuberculosis, organic heart disease, and acute nephritis and Bright's disease, besides the death rate from all causes, which he deemed quite important, and which was actually also always the case in our analyses; it always came out on top, so we kept it in. Now, Pearl also pointed towards some specific chronic diseases, but we were not able to put these on top. We did not find them to be stably present across our analyses, so we did not identify them. Also, Pearl actually got criticized after his first study for pointing towards these individual diseases.
Then, in the second of his studies, he was also more prudent, more conservative, and only pointed towards the death rate from all causes and organic diseases of the heart as the ones that are predictive of his modified index of the explosiveness of the outbreak, I6. Our final analysis is the one that you see here, pointing also towards the death rate from organic heart disease, the death rate from all causes, the share of ages between zero and four, and the population density. We were always able to see the population density coming up high in the list of variables that occurred almost continuously over the bootstrap replicates, so that also turned out to be an important variable in our analysis. With the destructiveness variable, we did not find all that much variation. The range of values was not as large as for the other two responses, the epidemicity indices. We were only able to identify a few variables, specifically the death rate from organic heart disease. Then we identified, besides the share of the youngest people between 0 and 4 years, also the share of people between 25 and 44 years of age, the healthy young adults prior to the pandemic; that was also indicative of the excess mortality rates over the different cities in the United States. Pearl's data sets, as we could see, contained very little data. There were only 39 observations in the first study and only 34 observations in the second study, because when he modified some of the variables, some observations got lost. The data are also observational, with quite some multicollinearity involved. The quality of the data could have been better. Therefore, our analyses are certainly useful, but they are not magical, and neither was Pearl's correlation analysis. Specifically, his first analysis was not well supported; he knew afterwards that people did not really support his first analysis. Anyway, as George Box said, "All models are wrong, some are useful." We were able to select satisfactory models in a sequential manner. First, we included Pearl's variables and retained the selected variables, to which we then added the new 1910 census variables, to finally select those variables that are informative, each time with and without the death rate from all causes. Then we retained the variables that popped up in the green zone as being quite predictive of the response, the ones that popped up all the time over all the analyses that we did, to then arrive at the models that I presented. I hope this was informative for you to listen to, and many thanks.
Autonomous vehicles, or self-driving cars, no longer only live in science fiction. Engineers and scientists are making them a reality. Their reliability concerns, or more importantly, safety concerns, have been crucial to their commercial success. Can we trust autonomous vehicles? Do we have the information to make this decision? In this talk, we investigate the reliability of autonomous vehicles (AVs) produced by four leading manufacturers by analyzing the publicly available data that have been submitted to the California DMV AV testing program. We will assess the quality of the data, evaluate the amount of information contained in the data, analyze the data in various ways, and eventually attempt to draw some conclusions from what we have learned in the process. We will show how we utilized various tools in JMP® in this study, including processing the raw data, establishing assumptions and limitations of the data, fitting different reliability models, and finally selecting appropriate models to draw conclusions. The limitations of the data include both quality and quantity. As such, our results might be far from conclusive, but we can still gain important insights with proper statistical methodologies.
Link to CA DMV disengagement reports
Link to AV Recurrent Events Paper
Hello, my name is Caleb King. I'm a developer in the DoE and reliability group at JMP. Today I figured I'd showcase what I think is a bit of an overlooked platform in the reliability suite of analysis tools, and that's the Reliability Growth platform. I thought I'd do that in the context of something that's become pretty popular nowadays, and that's autonomous vehicles. They're fast becoming a reality, not so much science fiction anymore. We have a lot of companies working on extensive testing of these vehicles. It's nice to test these vehicles on a nice track at your lab or something like that, but nothing beats actual road testing, which is why, early in the 2010s, the state of California's Department of Motor Vehicles put together a testing program that allowed these companies to test their vehicles on roads within the state. Now, as part of that agreement, each company was required to submit an annual report which would detail any disengagement incidents, or, heaven forbid, any crashes that happened involving their autonomous vehicles. Those had to be reported to the Department of Motor Vehicles, the DMV. Now, one benefit of the DMV being a public institution is that these reports are actually available upon request. In fact, we can go to the site right now, and you'll see that you can at least access the most recent reports. We have the 2021 reports; they're still compiling the 2022 ones. If you want some previous ones, you can also email them; I did that with a brief justification of what I was doing, and they were pretty quick to respond. Now, we have different types of reports and different types of testing. We're focusing on testing where there is a driver in the vehicle and the driver can take over as necessary. This isn't a fully autonomous vehicle; you do have to be in the driver's seat to do this.
We're using these disengagement events as a proxy for assessing the reliability of the vehicles. Obviously, we don't have access to the software in these vehicles. If you worked at those companies, you could probably have more information; we obviously don't. But they're a proxy because if you want your vehicle to be reliable, that means it needs to be operating as you intend within its environment. Any time you have to take over from the AI for some reason, that could be a sign that it's not exactly operating as intended. So we can use it as a bit of a proxy. Again, it's not the best approximation, but it's still pretty good. Of course, I'm not the first one to think of this. This is actually an informal extension of some work I've done recently with my advisor, Yili Hong, and a bunch of other co-authors, where we looked at this type of data from a recurrent-events perspective. I'm going to take a slightly different approach here, but there is a preprint of that article available if you want to check it out that does something similar. Let me go in and describe the data for you real quick. I'm not going to be looking at every company doing testing; there are so many out there. I'm going to focus on one, and that would be the events submitted by Waymo, which was Google's self-driving car project; now they're their own subsidiary entity. These are their annual reports. Let me define what we mean by disengagement events. I'm in the driver's seat in autonomous mode, something happens, and I need to take over driving. That's a disengagement event: I disengage from autonomous mode. That could be for any reason, and they of course need to report what that reason was. We're just using that as our proxy measure here. These annual reports go all the way back to about 2014, 2015; that's when Waymo started participating in this program. The 2015 report actually contains data back to 2014; they start in the middle there. Each report essentially covers the range from December of the previous year to November of the current year. The 2016 report, for example, contains data from December 2015 up to November of 2016. That way, they have a month to process the previous year's numbers. There are two primary sources of data we're looking at in each report. The first one lists each incident that occurred: when it happened, which could be as detailed as day and time or could just be the month. Again, there's not a lot of good consistency across years; it's something we ran into. But they at least give some indication of when it happened. They might say where it happened, and they describe what happened, which could be very detailed or could just fall into a particular category that they give. Then the second source of data lists the VIN or partial VIN of the vehicle, the vehicle identification number, something to identify the vehicle, and how many autonomous miles that vehicle drove that month.
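As a sketch of the aggregate-level analysis, the code below (with invented numbers, not Waymo's actual reports) computes a monthly disengagements-per-1,000-autonomous-miles rate and then fits one common reliability-growth model, the power-law NHPP (Crow-AMSAA), using cumulative autonomous miles as the time scale. Spreading each month's events evenly across that month's miles is purely an assumption for illustration:

import pandas as pd
import numpy as np

# Toy monthly data shaped like the two report tables (values are invented):
# disengagement counts per month, and autonomous miles per month summed over all VINs.
months = pd.period_range("2016-12", "2017-11", freq="M")
events = pd.Series([12, 9, 11, 8, 10, 7, 6, 8, 5, 6, 4, 5], index=months)
miles = pd.Series([28000, 30000, 31000, 29000, 33000, 35000,
                   34000, 36000, 37000, 38000, 40000, 41000], index=months)

# Aggregate-level reliability metric: disengagements per 1,000 autonomous miles.
rate = 1000 * events / miles
print(rate.round(3))

# A simple reliability-growth view: treat cumulative autonomous miles as the
# "time" scale of a power-law NHPP (Crow-AMSAA style). With event "times" t_i
# and observation truncated at T, the MLEs are
#   beta_hat = n / sum(log(T / t_i)),   lambda_hat = n / T**beta_hat.
# Only monthly totals are available here, so each month's events are spread
# evenly across that month's miles purely for illustration.
cum_miles = miles.cumsum()
t = np.concatenate([
    np.linspace(start, stop, num=k, endpoint=False) + (stop - start) / (2 * k)
    for start, stop, k in zip(np.r_[0, cum_miles[:-1]], cum_miles, events)
])
T, n = cum_miles.iloc[-1], len(t)
beta_hat = n / np.sum(np.log(T / t))
lambda_hat = n / T**beta_hat
print(f"beta = {beta_hat:.2f} (beta < 1 suggests the disengagement rate is improving)")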
You might see later on when I show this data, there might be a bunch of zeros. Zero just means I either didn't drive that vehicle or I just didn't drive it in autonomous mode. In either case, I was not doing active testing of the autonomous mode of the vehicle. Now, as I mentioned earlier, there was a bit of inconsistency. Prior to 2018, when they listed the disengagement events, they actually don't give the VIN of the vehicle. We don't know what vehicle was involved. We know how many autonomous miles it drove that month, but we have no idea what vehicle was involved. Starting in 2018, that information is now available. Now we can match the vehicle to the incident, which means when we do this analysis, I'm going to do it at two different levels. One is at an aggregate level, where I'm going to be looking, each month, at all of the vehicles being tested at that time and then looking at the incident rates overall in an aggregate measure. The second will be when I zoom in at the vehicle level. I'll look at it by VIN. For that data, I'll only be going back to 2018. For the aggregate level, I can take all of it. Now, before we get to the analysis, I actually wanted to show you some tools that JMP has available that allowed us to quickly process and accumulate this data. Again, to show you how easy it is and show off a few features in JMP. Some of them are really new, some of them have been around for a little while. Let me start by showing you one thing that helped us, and that was being able to read in data from PDFs. Prior to 2018, a lot of these data were compiled in PDFs. Afterwards, they put them in an Excel file, which made it a lot easier to just copy and paste into a JMP table. But for those PDFs, how did we handle that? Let me give you an example using data from 2017. This is actually one of the best formatted reports we see from companies. Some summaries here, some tables here and there. This, in appendix A, is the data I'm looking at. You can see here, this is the disengagement events. We have a cause, usually just a category here. They have the day, which is actually the month. A bit of a discrepancy there, the location and type. But this is basically just telling us how many disengagement events happen each month. Then we have a table here, or a series of tables actually, here at the back. This is showing us, for each vehicle (in this case it only gives partial VIN information; there is not a lot of information available in these early reports), how many autonomous miles were driven each month. How can we put this into JMP? Well, I could just copy and paste, but that's a bit tedious. We can do better than that. Let me come here. I'm going to go to my File, I'm going to go to Open. There's my PDF. I'm going to click Open and JMP has a PDF import wizard. Awesome. Now what it's going to do is it's going to go through and look at each page and identify whatever tables it finds there.
It's going to categorize them by the page and what the table is on that page. When you save it out, you can, of course, change the name. Now, I don't want every table on every page. What I'm going to do is I'm going to go to this red triangle on this page and just say, "Ignore all the tables on this page. I don't want these." I'll say, "Okay," and I'll do the same here. It's a nice summary table, but it's not what I want. Then I start saying, "This is the data I want." Now, we're going to notice here, this is formatted pretty well. It's gotten the data I want. If I scroll to the next one, this is technically a continuation of the table from before. However, by default, JMP is going to assume that every table on each page is its own entity. What I can do to tell JMP that actually this is just a continuation is to go to the table here on the page, click the red triangle and say, for the number of rows to use as header, there actually are none. This is a way to tell JMP that actually that's a continuation of the previous table. We'll check in the data here, and it looks like it did it now. I'm going to check here at the bottom and I noticed, "Oh, I missed that October data. That's okay. I'm going to do a quick little stretch there and boom, I got it." That's okay. You can manipulate the tables. If it didn't catch something, you can stretch and manipulate the table to adjust it. You can also add tables it didn't find. In this case, I missed this. That's okay. I'm going to drag a box around it. Boom. There's a new table for you, JMP. I'm going to go in here. It's going to assume that there are some header rows. Actually, there are none. Okay, great. Now it's captured that part of the data. There's a bit of an empty cell here. That's just a formatting error because this is technically two lines, so they didn't put this at the top. It's okay. Easy fix on the back end. Now for these tables, what we notice if we go to them is that it actually thinks, "Well, this is actually one table." Unfortunately, that's technically not correct because there are two tables, but it's an easy fix. I can simply go to each one and say, "It's actually not a continuation, JMP. This actually has its own header." It says, "Okay," and you can do that for each of these tables. I won't do it for all of them. I'm just doing this to illustrate. What we'd have to do is we'd probably end up with a bunch of tables here that we'll have to horizontally concatenate. That's just the way they decided to format the report. But JMP has a lot of tools to help us with concatenating and putting tables together. But you can see this is a lot easier than trying to copy and paste this into JMP, making sure that the formatting is all good. JMP is going to do a lot of that for us. Okay, another helpful tool that came out recently in JMP 17 is, as you've probably heard, the JMP workflow.
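If you ever need to do the same extraction outside of JMP, the idea can be approximated in Python. This is only a minimal sketch, assuming the pdfplumber library and a made-up file name; the real DMV report layouts vary by year and still need the kind of stitching and cleanup shown above.

import pdfplumber

# Hypothetical file name; the actual report PDFs differ by company and year.
with pdfplumber.open("waymo_disengagement_report_2017.pdf") as pdf:
    rows = []
    for page in pdf.pages:
        for table in page.extract_tables():   # each table is a list of rows of cell strings
            rows.extend(table)
# Continued tables, repeated headers, and merged cells still need manual cleanup,
# much like the adjustments made in the PDF import wizard above.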
That  was  super  helpful  because  obviously  we  have  multiple  reports  over  multiple  years.  We'd  like  to  at  least  combine  across  all  the  years  into  two  reports,  one  with  the  disengagement  events,  one  with  the  mileage.  What  we  did  is  we  created  an  initial...  We  followed  some  steps  to  set  up  the  table  in  a  way  that  we  can  then  concatenate  them  together  into  one  table,  and  then  we  saved  it  into  a  workflow. That's  what  I  have  demonstrated  here.  This  is  a  workflow  builder  that  we  put  together  for  that.  I'm  going  to  demonstrate  it  using  this  data  set.  This  is  particular  for  our  mileage.  What  we  have  here  is  a  table.  This  represents  what  we  would  have  a  raw  output  from  one  of  the  reports.  Here  we,  of  course,  have  it  broken  down  by  VIN  number.  We've  got  a  lot  of  information  here.  We'd  like  to  reformat  this  table.  First  thing  I'm  going  to  do,  I'm  just  going  to  walk  through  each  step.  I'm  not  going  to  show  too  many  details  in  each  step.  You'll  see  what  they  are.  It's  pretty  self  explanatory. This  first  one  is  going  to  change  the  name  of  this  column  to  vehicle.  That  way  it  matches  a  column  in  our  concatenated  table.  I'm  going  to  go  over,  I'm  going  to  delete  this  total  column.  I  don't  need  that.  Then  I'm  going  to  do  a  stack  across  all  the  dates.  You  can  see  I've  got  that  here.  We  conveniently  called  it  stacked  table,  very  informative.  Now,  one  thing  I  need  to  do  here,  I  put  a  pause  here,  that's  the  little  stop  sign.  In.  That's  because  I  would  usually  need  to  go  in  and  change  the  year. Now,  something  I  could  do  right  now,  I  couldn't  really  figure  out  a  way  to  get  a  variable,  say  year,  that  you  could  just  fill  out,  put  the  year  there,  and  then  it  automatically  fill  it  in  here.  That's  maybe  something  I  can  go  to  community.jmp.com ,  go  onto  the  wish  list  and  say,  "Hey,  it'd  be  nice  if  I  could  do  this."  But  for  right  now,  I  just  put  in  the  years.  It  was  pretty  easy  to  do  compared  to  doing  this  multiple  times.  Pretty  straightforward.  But  again,  I  can  also  highlight  for  you  how  you  can  actually  go  in  and  adjust  the  script  itself.  You  can  go  in  and  tailor  this  to  your  needs.  What  this  is  going  to  do  is  recode  these  so  it  shows  the  month  and  the  year.  I'll  do  that  real  quick.  There  we  are. The  next  step  is  going  to  take  this.  Right  now,  this  is  a  category,  it's  a  string,  I  want  a  number.  That's  what  I  do  next.  Now,  this  isn't  pretty.  This  is  just  the  number  of  seconds  since  some  date  in  1900,  I  believe.  Obviously,  that's  not  pretty.  I'd  like  to  show  something  more  informative.  That's  what  I  do  in  the  next  step.  Now,  it  shows  the  month  and  the  year.  I'm  going  to  stop  here.  I'm  not  going  to  continue  because  at  this  point  I'd  have  another  table  open.  This  next  step  would  then  concatenate  the  tables  and  then  close  off  these  intermediate  tables.  What  I'm  going  to  do  is  I'm  going  to  reset  stuff.  I'll  reset,  click  here.  I'm  going  to  reopen  this  table.  I'm  going  to  do  this  just  so  you  can  see  how  fast  this  goes. 
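For reference, the reshaping steps just described (rename the VIN column, drop the total, stack the month columns, turn the month labels into dates, then concatenate the years) map onto a few standard data-frame operations. Here is a rough pandas sketch of the same idea; the column names and month-label format are assumptions, and it ignores the wrinkle that each report actually starts in December of the prior year.

import pandas as pd

def tidy_mileage(raw: pd.DataFrame, year: int) -> pd.DataFrame:
    # Assumed raw layout: one row per vehicle, one column per month, plus a Total column.
    df = raw.rename(columns={"VIN": "Vehicle"}).drop(columns=["Total"])
    long = df.melt(id_vars="Vehicle", var_name="Month", value_name="Autonomous Miles")
    # Recode "January", "February", ... into real dates in the report year.
    long["Month"] = pd.to_datetime(long["Month"] + f" {year}", format="%B %Y")
    return long

# One call per annual report, then stack everything into a single table, e.g.:
# combined = pd.concat([tidy_mileage(tbl, yr) for tbl, yr in report_tables], ignore_index=True)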
Here I'm going to click over here, I'm going to click Play, I'm going to click Play again. Look how fast that was. Now, imagine doing this for multiple reports. How much faster that is than repeating the same steps over and over and over again. This workflow was really helpful in this situation. Now, I'm going to close all these out because it's time to get into the analysis. Let's do that. I'm going to start with the aggregate level data. Here's my table. I compiled across all the data, all the time periods. We have the month, we have how many disengagements happened in that month. I got a column here for the cumulative total. I've got here how many autonomous miles were driven. I got two columns here that I'm going to talk about in just a second. You'll have to just wait. What I'm going to do is I'm going to go in, I'm going to go to Analyze. I'm going to go under Reliability and Survival, and then I'm going to scroll all the way down until I reach Reliability Growth. I'll click that. Now we have multiple tabs here. I'm only going to focus on these first two because these last two concern if I have multiple systems. I'll revisit those when we get to the actual vehicle information. For right now, let's pretend that we're looking at the whole AV system, the artificial intelligence system in these vehicles. Think of it as one big system. There are two ways that I can assess this. One, I can do it as time to event, essentially how many months until a certain event happened, or days if we had that. Or I could do it via a particular timestamp. Basically, what was the time at which it occurred? I do have that type of formatted data. I have it at the month. The month is a fine timestamp. It just says in that month I had this many events happen. That's all I need to put in. I have all the data I need. I'll click OK. Now, the great thing about this is, before you do any analysis, you should, of course, look at your data, visualize your data. It's nice because the first thing it does is it visualizes the data for you. Let's look at it. One thing we're looking at, we're looking at cumulative events over time. What we expect is a behavior where early on we might have what I'll call a burn in type period where I have a lot of events happening. I'm tweaking the system, helping fix it, helping improve it. Then ultimately, what I'd like to see is this plateau. I'd like it to increase and then flatten off. That tells me that my number of incidents is decreasing. If it goes completely flat, that's great. I have no more incidents whatsoever. I wish it were like that; it is not. But we can see patterns here in the data. Let's walk through. We have a burn in period here, early 2015, and then about mid 2016, we flatten off until about here. We see a little blip, about summer of 2016, something happens. We get a few more incidents. We level off again until we get to about here, about late spring of 2018.
Something else happened because we start going up again. They're not very steep. This one's a bit longer. Then, pretty much at here, we've almost flattened out. We've reached a period where we're really having no incidents happen, essentially, till the end of 2020. Then something happens in 2021, where we've reached essentially another burn in period; something's going on. Essentially what we've got is four phases, if you will, happening in the growth of the system. Something's changed two or three times to impact the reliability. Another way to visualize this, I'll run this plot. This uses the same data. I'm plotting again the cumulative total. I'm also plotting something I call the empirical mean time between failures. It's a very simple metric to compute. It's just the inverse of the number of disengagements. It is a very ad hoc, naive way to try and estimate the mean time between incidents. But I plotted it here so that you can see, you'll notice these four peaks that correspond to the bends in the curve. There are four of them, indicating these four places where something has changed in the system to affect its reliability. What we can do then is try to figure out, what are those breakpoints? One way you could do that is the reliability growth platform has a way to fit a certain model. I'll pause here to talk about the model a bit. All of these are actually the same model with slight modifications. They're all what we call a non-homogeneous Poisson process. That is a fancy way to describe a counting process. I'm counting something, but the rate at which the counts occur per unit time is changing over time. A homogeneous Poisson process just means a constant rate, so the rate at which incidents occur would be constant; that would be equivalent to seeing a straight line. It's very easy to model, but it's bad for reality because obviously we don't want the rate to stay the same. We would like it to decrease to essentially zero. That's why we have a non-homogeneous Poisson process. We want it to change over time. Here we have a model where we can actually let JMP try and figure out a change point in the process. If I run it, what it's going to do is it's actually going to catch this big piece and say, "Hey, something really changed there. For most of it, it was the same thing, but after this point, it really changed." Now, here it's only going to find one change point at a time. I have talked to the developer about, wouldn't it be nice if we could identify multiple change points? Apparently that's a bit of an open research problem, so he and I might be working together to try and figure that out. But what I did is I essentially eyeballed it and said, "I think there are certain phases. I think there are about three or four phases," and I did it empirically, which is where you get this column. I'm going to run that script. Let me show you how I did it. I come here under Redo, go to Relaunch, and all I did was I added the phase column here.
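To make the two quantities in that plot and the model behind the platform a bit more tangible, here is a small sketch, not of JMP's implementation, but of the textbook power-law (Crow-AMSAA-style) form of a non-homogeneous Poisson process alongside the ad hoc empirical MTBF described above. The monthly counts below are toy numbers, not the DMV data.

import numpy as np

# Toy monthly disengagement counts, in time order (not the DMV data).
monthly_counts = np.array([12, 9, 7, 4, 2, 1, 0, 3, 2, 1])
cumulative = np.cumsum(monthly_counts)

# The "empirical mean time between failures" from the talk: the inverse of the monthly count
# (months between incidents; undefined when a month has zero events).
empirical_mtbf = np.full(monthly_counts.shape, np.nan)
nonzero = monthly_counts > 0
empirical_mtbf[nonzero] = 1.0 / monthly_counts[nonzero]

def power_law_nhpp(t, lam, beta):
    """Power-law NHPP: expected cumulative events, intensity, and instantaneous MTBF at time t."""
    mean_cum = lam * t ** beta                  # expected number of events by time t
    intensity = lam * beta * t ** (beta - 1)    # expected events per unit time at time t
    return mean_cum, intensity, 1.0 / intensity

# beta < 1 means the intensity is falling over time (reliability growth);
# beta > 1 means incidents are arriving faster, like the burn-in stretches described above.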
This  tells  you  that  there  are  different  periods  where  the  reliability  might  have  changed  significantly,  excuse  me,  in  some  way. If  we think  of  that,  we're  going  to  look  at  the  key  metric  here  as  the  mean  time  between  failure.  We're  going  to  see  early  on,  so  this  is  in  months,  this  here  is  about  three  days,  4- 5,  about  a  week,  and  this  is  about  a  day,  day  and  a  half.  Early  on,  we  have  a  bit  of  a  low  time.  It's  pretty  frequent.  We  can  also  look  here,  I'll  show  you  the  intensity  plot.  That  might  be  another  thing  to  interpret.  What  we're  looking  for  is  we'd  like  the  mean  time  between  failures  to  be  long.  We'd  like  it  to  be  a  long  time  between  incidents,  ideally  infinite.  That  means  nothing  ever  happens,  and  our  intensity  to  decrease. What  we're  looking  here  is,  we  get  a  bit  of  a  good  start.  About  middle  of  2016,  we're  doing  really  well.  In  fact,  we  get  to  about  a  week  between  an  incident  for  any  vehicle.  There  was  a  bit  of  a  blip,  but  we  primarily  get  back  to  where  we  were  until  we  get  to  the  end  of  2021,  where  now  it's  essentially  about  a  day  between  incidents  for  any  vehicle.  Something  big  happened  here  at  the  end  of  2020  with  these  vehicles  with  this  software  system,  if  you  will. Again,  you  can  see  here  with  the  intensity,  you  can  almost  do  one  curve  and  we  get  down  to  about  six  or  seven  incidents  per  month.  Whereas  here  it's  almost  30,  essentially,  once  a  day.  We've  been  able  to  look  into  here  and  discover  what's  going  on,  at  least  at  the  aggregate  level.  Before  we  get  to  the  vehicle  level,  I'm  going  to  run  one  more  graph  that's  looking  at,  we've  got  all  these  autonomous  miles.  Could  it  be  that  if  I  drive  it  more  often,  maybe  I  encounter  more  incidents?  Could  that  have  an  effect?  T here's  a  quick  way  to  assess  that.  Just  using  a  simple  graph.  We'll  just  plot  autonomous  miles  versus  the  total  disengagements. We  see  here  for  are  a  few  number  of  disengagements,  that  might  be  true.  The  more  you  drive,  the  more  you  might  see.  But  in  general,  long  term,  not  really.  There's  really  no  big,  strong  correlation  between  how  many  autonomous  miles  driven,  how  many  engagements  you  see.  There's  something  else  going  on. T hat's  actually  what  we  found  in  the  paper  that  I  mentioned  earlier  is  that  the  mileage  impact  was  very  minimal. Now,  let's  zoom  in  to  the  individual  vehicle.  We're  not  going  to  have  all  the  data,  even  though  I  actually  do  have  it  here.  But  we're  not  going  to  have  complete  data  for  all  of  the  vehicles.  Let  me  break  it  down.  I  have  the  month,  I  have  the  vehicle  identification  number.  Notice  some  of  these,  it's  only  partial.  I  have  here  what  I  call  a  VIN  series.  This  is  very  empirical.  I'm  just  taking  the  first  four  digits  of  the  VIN.  You'll  see  here,  I'm  going  to  scroll  down  a  bit  and  we'll  see.  Let's  see,  maybe  I  will  drag  down  a  little  bit.  There  we  go. Some  of  these  VINs,  a  lot  of  them  actually  start  with  the  same  four  digits,  2C4R.  I'll  call  them  the  2C4  series.  There's  a  bunch  of  vehicles  that  have  this  as  their  starting  one.  
This identifies a particular fleet of vehicles, at least from an empirical view. If I scroll down, we're going to run into a different series, which I'm going to call the SADH series. This is the one that was introduced about 2021. That's when I saw the VIN numbers change to the SADH designation. Again, I have how many miles were driven, I have a starting month, when did that vehicle start? I'm going to use this to compute the time to events. First, I'm going to do a plot. I think this is the most informative plot you'll see for this analysis. What I've done here is I've essentially created for you a heat map. You can see I've got the heat map option selected. I got, for each vehicle and over time, essentially a cell, and that's just going to indicate, was I driven in autonomous mode anytime that month? I got it color coded by the series. These vertical lines correspond to the transitions between those empirical phases I mentioned earlier. What this is telling us is basically, can we identify what might have caused those transitions? Here we see an initial series of vehicles, and it looks like there wasn't a big change in what vehicles were introduced here. Maybe there was a bit of a software upgrade for this particular series that may have introduced those new incidents. Here we see that a new series was introduced, a smaller number of vehicles, maybe a pilot series. Then a bunch of them were introduced about that same period where we saw the other transition. Here, this seems to correspond to a new fleet of vehicles with maybe a slightly updated version of the software. Here, we see a clear distinction. Obviously, in 2021, a completely new series of vehicles was introduced. We have a bit of the old vehicles still there in the mixture, but most of them are the new vehicles. That probably explains why we got a new batch of new incidents. We got a burn in period for this new series of vehicles. This is cool because now we have a bit more explanation as to what was going on with the aggregate data, which is why it's important to have this information. Now let's break it down by VIN. I have a script right here. We've got a table here and it's similar to the table I had previously. Notice some of these have been excluded, and this is because, if for that particular vehicle the total number of incidents was less than three, the platform is not going to be able to fit a model for you, because it needs at least three incidents per vehicle. That makes sense. If I have only one or two, that's not really enough information to assess the reliability. If I have three or more, now we're talking, I can do something. I also have the months since the start. I have some cumulative information there, and which month it started. I'm going to go ahead then and run the platform. Don't worry, I will show you how I did this. I'm going to go to the Redo, Relaunch. I'm going to get rid of that. That's some leftover stuff. I'm looking at one of these two last platforms.
These  are  about  multiple  systems.  Now  we're  thinking  of  each  vehicle  as  its  own  system.  The  concurrent  just  means  I'm  going  to  run  each  vehicle  one  after  the  other. That's  not  what's  happening  here.  The  vehicles  are  essentially  being  run  in  parallel.  Multiple  vehicles  are  being  driven  at  a  time.  Here  I  need  a  column  to  indicate  the  time  to  event,  in  this  case,  how  many  months  since  the  start  until  this  many  events  happened.  I  have  the  system  ID.  The  one  thing  that's  not  shown  here  is  I  actually  took  the  VIN  series  and  used  that  as  a  by  variable,  which  is  why  we  have  the  where  VIN  equals  so  and  so.  There's  only  going  to  be  two  because  the  one  with  a  little  asterisks  has  no  information  about  incidents.  That's  because  that  was  two  earlier  that  was  prior  to  being  able  to  tie  the   VIN to  the  vehicle. But  they  were  there  for  completeness,   I'm cancelled  out  of  that.  Now,  what  we're  going  to  see  here  is  a  list  of  models  that  you  could  run.  These  first  four,  I'm  not  going  to  be  able  to  do  because  I  only  have  one  phase.  Essentially,  the  phase  now  corresponds  to  the  VIN  series.  But  there  are  two  that  I  could  run  and  the  only  difference  is,  do  I  want  to  run  a  model  for  all  of  them  saying  these  are  all  part  of  an  identical  system?  Makes  sense.  These  are  all  vehicles,  they  probably  run  the  same  software.  Maybe  I  can  run  a  model  for  all  of  them.  Or  I  can  have  a  model  where  I  fit  it  to  each  individual  vehicle. Before  I  run  those  models,  let's  take  a  look  at  this  plot.  Again,  start  with  the  visualization  and  what  we  see  plotted  is  all  the  vehicles.  Notice  here  there's  a  bit  of  a  shallow  slope  to  this.  Essentially  there's  a  bit  of  a  steep  curve,  but  then  it  levels  off  pretty  quick.  This  is  a  pretty  good  sign  of  reliability  going  on  here.  I'm  going  to  compare  them,  I'm  going  to  scroll  down  to  the  next  one,  the  set  series.  Now,  initially  the  axes  here,  just  so  you  know,  when  you  run  it  next  time,  the  axes  are  going  to  only  go  to  the  complete  set  of  data.  This  would  actually  be  a  smaller  range. I  fix  them  so  they  had  the  same  range.  You  can  clearly  see  that  this  is  much  steeper  than  this.  Clearly,  we  have  more  incidents  happening  with  this  new  series  than  this  one.  But  we  can  do  a  quick  model  fit.  I'm  going  to  do  the  identical  system.  Again,  it's  a   non-homogeneous poisson process.  Although  in  this  case,  I'm  going  to  ignore  the  estimates  for  right  now,  if  you  want  to  look  at  that,  you  can.  I'm  going  to  go  straight  to  the  mean  time  between  failure  and  you'll  notice  that  for  all  the  months  it's  pretty  much  flat. What  it's  essentially  done  is  this  is  just  a  poisson  process.  The  rate  is  constant,  which  is  good  for  modeling,  not  so  good  in  terms  of  assessing  it.  It's  just  saying  across  this  whole  time  for  this  particular  series,  we  pretty  much  reached  for  any  one  vehicle,  a  mean  time  of  about  five  months  between  incidents.  Now,  let's  compare  that  to  what  we  saw  with  the  aggregate  where  it  was  about  a  week,  that's  across  any  vehicle.  
It's  just  saying  for  any  vehicle,  it  was  about  a  week  between  an  incident  for  any  vehicle.  Whereas  this  seems  to  be  implying  it's  about  five  months  for  any  one  vehicle. You  can  think  of  it  as  the  running  in  parallel,  so  you  can  see  it's  staggered.  Any  one  vehicle,  it  could  be  about  five  months.  Again,  this  is  an  average,  there's  a  lot  of  range  between  there.  For  one  vehicle,  it's  a  pretty  long  time.  But  in  aggregate,  they're  probably  staggered  enough  that  it  seems  like  it's  about  a  week  for  any  one  vehicle.  They  can  be  consistent  like  that.  But  this  is  still  pretty  good.  That's  about  five  months  between  an  incident,  a  disengagement  event  here  for  that  series.  If  we  run  to  the  SADH  series  and  do  the  same  model,  let  me  go  here.  There  we  see  clearly  increasing.  I'm  going  to  hide  that.  If  we  look  at  this,  it  says,  "No,"  early  on  we  probably  had  about  two  months.  That's  a  bit  of  a  start  there.  But  we've  dropped  to  less  than  a  month,  almost  two  to  three  weeks  between  incidents. Clearly,  there's  a  bit  more  work  to  do  on  this  series.  Again,  it  was  just  introduced,  so  this  is  probably  more  of  the  burn  in  phase.  If  we  get  the  2022  data,  we  might  start  to  see  it  level  off  like  it  did  in  the  previous  series.  This  actually  might  flip  and  be  more  of  a  level  curve.  That's  about  all  I  want  to  show  for  these  platforms.  I  can  show  you  some  of  the  individual  distinct  systems,  but  there  are  a  lot  of  systems  here  and  so  it's  going  to  get  crowded  very  quickly.  There's  a  plot  for  each  one.  There's  estimates  for  each  one.  You  can  look  at  the  mean  time  between  failure  for  each  one. If  there  are  particular  vehicles  you  wanted  to  call  out  and  see  how  they  might  differ,  this  is  what  you  can  do.  You  can  see  some  increase,  some  decrease,  but  overall  more  or  less  flat.  You  can  also  look  at  intensity  plots.  If  you  find  that  more  interpretable  than  the  mean  time  between  failure,  you  have  other  metrics  that  you  can  incorporate  here.  Okay, t hat's  all  I  want  to  show  for  this  platform.  Now,  of  course,  there's  data  I  didn't  include  here.  For  example,  we  could  break  it  down  by  cause. For  some  of  this  data,  cause  might  be,  I  just  need  to  take  over  because  it  was  getting  too  close  to  the  side  of  the  the  road.  Or  maybe  the  car  stopped  at  the  stop  sign,  did  what  it  was  supposed  to,  started  rolling,  and  some  other  driver  blew  through  the  stop  sign  coming  the  other  way.  In  which  case,  maybe  that  might  not  necessarily  be  a  reliability  hit.  The  car  did  what  it  was  supposed  to.  Somebody  else  wasn't.  It'd  be  interesting  to  break  it  down  by  that,  also  by  location.  The  number  of  incidents,  you  get  more  when  you're  in  the  city,  maybe  on  the  highway,  something  like  that. Real  quick,  we  should  look  at  the  mileage  impact.  Again,  same  information,  one  or  two  incidents,  sure,  that  might  change.  But  overall  it's  going  to  be  flat.  The  mileage  impact  on  the  incident  rate  is  minimal.  Of  course,  this  is  just  one  of  many  platforms  available  in  the  reliability  suite.  
You  can  see  there's  a  ton  of  options,  very  flexible  for  helping  assess  reliability.  Again,  that's  all  I  have  to  show  you.  Hopefully,  I've  been  able  to  demonstrate  for  you  how  well  JMP  can  help  initiate  discovery  and  analysis.  Hopefully,  you  discovered  a  lot  of  things  about  this  particular  company's  autonomous  vehicles.  I  hope  you  enjoy  the  rest  of  the  conference.  Thank  you.
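One point from the vehicle-level results above is worth making concrete: a roughly five-month mean time between incidents per vehicle and a roughly one-week mean time between incidents across the whole fleet are consistent if enough vehicles are running in parallel, because the incident rates of independent vehicles add. The fleet size below is purely an assumed number for illustration, not a figure from the reports.

# Back-of-the-envelope check (all numbers approximate; the fleet size is an assumption).
per_vehicle_mtbf_days = 5 * 30          # ~5 months between incidents for one vehicle
active_vehicles = 20                    # assumed number of vehicles testing in a given month
fleet_mtbf_days = per_vehicle_mtbf_days / active_vehicles
print(fleet_mtbf_days)                  # 7.5 -> about a week between incidents fleet-wide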
In a regulated environment, systems are put in place to ensure product safety, efficacy, and quality. Even though, as Harold F. Dodge said, “You cannot inspect quality into a product,” the Code of Federal Regulations, 21CFR820.250(b), states, “Sampling plans, when used, shall be written and based on a valid statistical rationale.” Lot acceptance sampling plans (LASP) provide the statistical rationale and have been used in many industries. Practitioners, however, usually rely on the tables in ANSI/ASQ Z1.4 (attributes) and Z1.9 (variables). Since the operating characteristic (OC) curve is the best way to evaluate and compare lot acceptance sampling plans, an add-in was needed to facilitate this process. The evolution of appropriate sampling plans within the biotech and medical device industries that balance customer design requirements and technology opportunities led us to develop an interactive JMP Add-in for designing and evaluating valid lot acceptance sampling plans for attributes and variables. During this highly interactive session using JMP, we will demonstrate how to use the add-in to efficiently design and evaluate lot acceptance sampling plans by showcasing its flexibility and ease of use.     Live from 100 SAS Campus Drive, your one and only SAS world headquarters in Cary, North Carolina. It's The Sampling Plan Show, starring Stan Koprowski. Now here he is, the host of The Sampling Plan Show, Stan Koprowski. Thank you. Thank you very much. You're too kind. Thank you. We love you. No, that's enough. Please. All right, we have a great show for you today. Before we get started, just a little background here. I wasn't very good at statistics. In fact, I got a paper cut from my statistics homework. What are the odds? Enough of that. Today we're going to talk about acceptance sampling and sampling plans. We're going to learn about the OC curve and what it is. I'll be honest, this is one of my all time favorite ANSI standards. We're going to hear about the ANSI standards. We'll make some predictions for the big sampling plan, and then finally, we'll show you some fantastic highlights using the JMP sampling plan add-in with one of my all time favorite industrial statisticians, Dr. José Ramirez. I'm glad to be here. Proud of this show. I think it's going to be fun. Sorry to hear about statistics being hazardous to your health with a paper cut. Man. It was a rough journey there, but we'll get through it. As you see, here's the title of our talk, and then I will play some other slides here for you. Your book, I like this book. We use this a lot in the division. The Statistical Quality Control book: The JMP Companion. This is the companion book to Doug Montgomery's book. I think you're going to mention Doug's book later in the talk as well. That's true. Then I have your other book up here too, Analyzing and Interpreting Continuous Data Using JMP. I know you've been a long time user of JMP, probably back since JMP 2. I think you were probably one of the first support cases that came into tech support, handled by our own director of customer enablement, Jeff Perkinson.
Jeff,  as  we  all  know,  started  out  in  tech  support  and  I'll  have  to  look  back  through  the  cases.  But  what  I  understand,  you  were  a  long  time  user  of  JMP. Welcome  to  our  show.  I'm  glad  you  could  be  with  us  today.  Super  excited  to  have  you.   We're  going  to  talk  about  some  options  here  for  the  folks  to  call  in.  If  you  want  to  call  into  the  show  or  message, you can  message us  at  JMP L ot  Plan.  The  phone  number,  if  you  have  a  rotary  dial  phone,  you  can  call  us  up  there,  or  if  you  want  to  reach  us  on  Instagram  at  JMP  Lot  Acceptance  Sampling  Plan.  Let's  go  ahead  and  see.  I  think  I  do  hear  a  call  again.  Wait  a  minute. Let me  see  who  that  is. Yes,  this  is  Dr.  Julian  Parris  calling  in.  I'm  the  Director  of  JMP User  Acquisition  and  I  have  a  question  for  Dr.  Ramirez. Go  ahead,  Julian. Dr.  Ramirez,  can  you  explain  the  difference  between  lot  acceptance  sampling  plans  and  variables  acceptance  sampling  plans? I  don't  really  believe  that  was  Julian.  That  was  probably  Julio  from  down  by  the  school yard. Julio,  do  you  have  a  question? Yes,  he  did.  He  was  asking  if  you  could  give  us  a  little  introduction  to  sampling  plans. Sampling,  okay .  He's  doing  some  sampling  by  the   schoolyard.  I  wonder  what  he's  doing  there.  Julio,  since  you're  in  a  school,  you're  by  the   schoolyard,  let's  go  back  to  the  dictionary  and  look  at  sampling,  what  the  dictionary  says.  There  are  a  few  definitions  here  in  the  dictionary,  and  one  thing  I  like  about  the  first  one  it  talks  about  a  suitable  sample.   For  statisticians  that  has  the  meaning,  part  of  that  is  there,  the  sample  has  to  be  representative.   For  those  of  you  familiar  with  stats,  the  other  piece  that  we  also  include  in  a  definition  of  suitable  sample  is  random. But  what  you're  talking  about,  about  these  sampling  plans  is  the  second  definition,  and  they  got  it  right  because  a  sampling  plan  is  essentially  deciding  on a  small  portion  of  a  lot  or  a  population.  The  population  can  be  either  infinite  or  it  can  be  a  finite  number  like  10,000  items.   We  want  to  take  a  small  sample  from  that  so  we  can  do  some  inspection.   By  inspection,  we  mean  that  we  are  going  to  decide  the  fate  of  that  population  or  the  fate  of  that  lot. Okay,   I  understand  a  little  bit  of  what  you're  saying  there.   You're  going  to  take  a  sample  from  a  population  and  then  what's  the  difference  between  the  sample  types?  It  looks  like  you  were  talking  here  about  an  inspection.  Is  it  always  an  inspection  or  are  there  other  types  of  sampling  that  you  can  do? Well,  in  general,  when  people  think  about  lot  acceptance  sampling  plans,  there's  some  type  of  inspection,  some  type  of  checking  that  goes  on.  T he  way  we're  going  to  do  this  sampling  is  we're  going  to  apply  some  statistical  principles  [crosstalk 00:06:22]. Oh, gosh, that's scary. But  actually  that's  what  the  agencies  want.  For  example,  if  you  look  at  some  of  the  documents  from  the  FDA,  if  you're  required  to  do  some  type  of  sampling,  they  want  you  to  do  it  in  a  statistical  way.   How  are  we  going  to  do  that?  
That's  part  of  this  show  that  we're  having.  But  in  the  old  days,  and  we're  going  to  talk  about  those  old  days,  people  used  standards,  and  those  are  the  standards  that  are  used  right  now.  The   ASQ/ANSI  Z1.4  and  Z 1.9. That  sounds  even  scarier,  but  I'm  a  bit  confused.  I  thought  when  I  was  doing  my  research  before  I  brought  you  on  as  a  guest,  that  there  were  some  military  standards.  I  thought  the  sampling  plans  were  based  on  military  standards  and  now  you  just  told  me  there's  some  other  standards  in  play  here.  W hat's  the  story  with  this? Well,  yeah,  that's  true.   These  sampling  plans  have  a  long  history.  They  go  back  to  probably  the  1930s,  1940s,  and  there  were  two  military  standards,  the  105 E,  which  is  the  one  that  corresponds  to  the  Z 1.4,  and  that's  what  you're  doing  using  a  technical  term,  sampling  by  attribute,  and  then  there  is  the  military  standard,   414,  which  is  the  Z 1.9  that  corresponds  to  sampling  by  variables.  You  can  think  of  discrete  versus  continuous. What  happens  is  that  those  standards  were  taken  over  now  by  the  American  Society  for  Quality,  that's ASQ,  and  the  American  National  Standard  Institute,  the  ANSI,  and  they  rebranded  those,  and  now  they're  called  Z 1.4  and  Z 1.9.  But  they're  still  the  same  standard,  the  same  table  that  people  use  to  generate  sampling  plan. L et's  talk  about  how  we  do  that.   The  way  to  understand  this  is  that  every  type  of  lot  acceptance  sampling  plan,  and  that  is  the  L-A-S-P  or  LASP,  has  components  and  risks.  T he  components  of  the  plan  essentially  is  how  many  samples  do  I  need  to  take  from  a  population  of  10,000  or  infinity  that  I'm  going  to  test?  I'm  going  to  do  some  inspection,  and  I  want  to  know  if  in  that  sample,  there  is  a  pre- specified  percent  defective. We're  going  to  define  quality  as  a  percent  defective  in  the  population.   What  we're  doing  is  taking  a  sample  to  see  if  that  sample  contains  that  or  less.   In  order  to  determine  if  we're  going  to  pass  or  fail  that  lot,  we  also  relied  on  the  acceptance  number,  which  tells  us  how  many  defective  parts  out  of  the  sample  we  can  accept  in  the  plan.  Anything  that  is  greater  than  that,  it's  going  to  say  we're  going  to  reject  or  fail  the  lot.  That's  why,  as  I  said  before,  we're  determining  the  fate  of  the  lot  with  this  [inaudible 00:10:03] . Here  we  go.  I  had  my  fate  a  long  time  ago  with  that  paper  cut.  I'm  a  little  anxious  here.  What  kind  of  fate  are  we  talking  about? Yeah,   we're  going  to  decide  if  we  are  going  to  pass  the  lot  and  make  it  available  for  whatever  purpose  that  lot  is  being  manufactured,  or  we're  going  to  put  it  in  quarantine  [inaudible 00:10:31]   and  maybe  do  some  more  inspections  or  trying  to  understand  why  the  fraction  defective  is  larger  than  the  prespecified  one. You  mentioned  that  there's  some  risk  involved.  What  kind  of  risk?  Is  it  out  of  my  control  or  is  it  within  my  control?  Whose  risk  is  this? Yeah,  there's  risk,  and  that's  the  beauty  about  my  profession.  In   statistics,  you  don't  have  to  be  certain  about  anything.  I  can   95%  confident.  
I  can  even  go  up  to  99%  confident,  I  don't  have  to  be  certain.   Again,  for  those  of  you  familiar  with  statistics,  you  know  what  we're  talking  about.   There's  the  chance  that  there's  a  perfectly  good  lot  that  we're  going  to  reject  because,  again,  we're  taking  a  small  sample,  so  we're  not  sampling  the  whole  population.   There's  a  risk  associated  that  this  sample  may  determine  that  a  good  lot  is  not  good  or  there's  a  chance  that  a  bad  lot  may  be  released.  Y ou  can  think  in  terms  of  false  positive.  Y ou  think  in  terms  of  a  medical  test. That's  similar  to  like  if  you  were  to  take  the  COVID-19  test.  If  you  actually  had  it  and  it  came  back  negative ,  is  that  a  false  positive? Yeah,  it's   like  that,  a  false  positive.   We  may  have  a  false  positive  that  means  this  lot  is  bad,  but  the  sampling  plan  that  it  was  good.  Or  vice  versa,  we  have  a  good  lot  and  we're  going  to  reject  it  because  the  sampling  [crosstalk 00:12:19] .  That's  why  it's  important  to  use  a  statistical  principle  in  designing  these  sampling  plans.  That's  part  of  why  people  use  standards.  Those  were  derived  in  a  way  that  when  you  select  a  plan  from  those  standards,  these  risks  are  going  to  be  balanced.   That's  what  we  talk  about  the  generation  of  the  lot  acceptance  sampling  plan. We  have  these  two  risks.  Again,  we  can  say  that  a  good  lot  is  going  to  be  rejected  and  there's  also  a  chance  that  a  bad  lot  may  be  released  like  the  false  positive.   We're  going  to  assign  some  probabilities  to  this  risk.   One  thing  we  want  to  make  sure  is  that  if  it's  good,  we  want  to  accept  that  most  of  the  time.   How  do  we  define  that? Well,  standard  practice  is  used  95%.  A gain  that  sounds  like  95%  confidence  when  you  use  some  type  of  statistical  test.   What  we're  saying  is  we're going  to  pre define  a  fraction  defective  for  that  lot  and  we  are  going  to  select  a  plan  that  guarantees  that  or  almost  guarantees  that  95%  of  the  time,  a  good  lot  is  going  to  be  accepted. A  good  lot  is  going  to  pass. On  the  other  hand,  we're  going  to  also  say  that  we're  going  to  have  a  high  chance  of  rejecting  a  bad  lot.  Or  if  you  flip  that,  you  can  say  there's  a  small  chance  of  passing  a  bad  lot.   90%  here  transforms  itself  to  a  10%.   We're  saying  there's  a  95%  of  accepting  a  good  lot,  but  only  a  10%  chance  of  accepting  a  bad  lot.   Those  are  standard  numbers  in  sampling  plans,  in  the  standards  and  the  way  people  use  those  sampling  plans. You  were  talking  about  the  user.  The  user  has  the  option  of  changing  those  numbers.   Rather  than  use  95%,  we  can  use  99%.  Rather  than  using  10%,  we  can  use  5%.  However,  if  we  get  too  greedy,  then  as  some  of  you  may  know,  the  sample  size  is  going  to  increase  and  that's  part  of  that  balance.  We  want  to  find  a  sample  size  that  is  small  enough  that  it's  going  to   guarantee  these  probabilities. Should  we  do  some  examples?  Are  there  things  we  could  do  to  explain  some  of  the  balance  that  you're  talking  about  between  these  two  competing  risks? Yes. 
It  sounds  like  one  might  be  on  the  consumer  side,  and  maybe  one  of  the  risks  is  on  the  producer's  side. Exactly.   The  first  chance  of ...  If  you  look  at  the  95%  chance,  that  means  there's  a  5%  chance  that  a  good  lot  is  going  to  be  rejected.  That's  on  the  producer  side.   Because  as  a  producer,  you  don't  want  your  good  lot  to  be  rejected.  The   loss  of  material  or  good  product. On  the  other  hand,  the  10%  chance  of  accepting  a  bad  lot,  that's  on  the  consumers  because  we  don't  want  the  consumers  to  receive  something  that  is  bad.  Now,  in  order  to  do  this,  there's  this  tool  that  we  use  in  sampling  plans  which  is  called  the  operating  characteristic  curve  or  OC  curve.  Some  people  may  be  familiar  with  that.   Those  curves,  I'm  going  to  show  you  some  examples,  are  the  ones  that  are  used  to  find  that  balance  between  this  risk  and  probability.  Risk  and  probability. What  is  an  OC  curve?  An OC  curve  is  essentially  a  plot  that  shows  you  the  probability  of  accepting  a  lot  as  a  function  of  that  fraction  defective  in  the  population.   You  look  at  these  two  curves,  the  blue  line  here  on  the  Y- axis,  we  have  the  probability  of  accepting  the  lot,  and  the  X- axis,  we  have  the  proportion  defective  in  the  population. As  you  can  see,  when  the  proportion  defective  is  very  small,  there's  a  high  probability  of  accepting  the  lot.  A s  soon  as  that  probability  starts  increasing,  then  the  probability  of  accepting  that  lot  decreases,  and  that's  the  shape  that  we  want  to  see  in  an  OC  curve. Now,  remember  we  have  two  probabilities  and  two  risks.   We  want  a  95%  chance  of  accepting  a  good  lot  and  a  10%  chance  of  accepting  a  bad  lot.  Now,  the  definition  of  good  and  bad  is  in  terms  of  that  proportion  defective  in  the  population.   We  as  users  of  sampling  plans,  we  have  to  define  what  is  a  good  fraction  defective.  Of  course,  ideally,  the  good  fraction  defective  should  be  zero,  but  that  will  throw  a  [inaudible 00:17:22]   in  the  math.  If  you  know  that  zero  divide  by  zero,  you  get  infinity,  meaning  you  have  to  sample  everything  if  you  want  perfection.   What  we  do  is  define  a  small  number,  and  that  is  called  the  AQL,  the  acceptable  quality  level. On  the  other  side,  we  define  the  RQL  or  rejectable  quality  level.  Granted,  there  are  many  terms  that  people  use.  Sometimes  you  may  see  the  LTPD  instead  of  RQL.  That's  the  lot  tolerance  percent  defective,  the  maximum  percent  defective  that  you  can  accept  in  your  population.   With  the  probabilities  and  those  fraction  defectives,  you  define  two  points  in  this  curve.  H ere,  the  AQL  is  1.8%,  and  we  have  a  95%  probability  of  accepting  that. What  that  means  is,  as  long  as  the  fraction  defective  in  the  lot  is  less  than  2%,  let's  say,  then  there's  a  high  chance  of  accepting  that  lot.  But  as  long  as  it  goes  higher,  like  2%,  then  we're  going  to  accept  that  lot  very  infrequently.   These  two  points  are  the  ones  that  you  look  at  in  the  OC  curve,  and  those  are  the  ones  that  are  going  to  determine  the  sample  size  and  the  acceptance  criteria. Got  it. 
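For a single attributes plan, the OC curve Dr. Ramirez describes is just the binomial probability of finding the acceptance number or fewer defectives in the sample. Here is a minimal sketch, assuming SciPy and a made-up (n, c) plan purely for illustration:

from scipy.stats import binom

def prob_accept(n, c, p):
    # Probability of accepting a lot with true fraction defective p:
    # the lot passes when the sample of n contains at most c defectives.
    return binom.cdf(c, n, p)

# Tracing out the OC curve for a hypothetical n = 50, c = 2 plan.
for p in [0.005, 0.01, 0.02, 0.05, 0.10]:
    print(f"fraction defective {p:.3f} -> P(accept) {prob_accept(50, 2, p):.3f}")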
I  think  it  will  help  Julio  if  we  run  some  example. I  think  Julio  is  texting  and  he  said  that  would  be  great  if  we  could  do  an  example.  Okay,  let's  do  that.  Why  did  you  develop  or  why  did  we  produce  a  JMP  add-in? Yes,  Julio,  what  we're  going  to  do  is  we're  going  to  show  you  an  app  to  do  this.   People  sometimes  ask  us,  "Why  did  you  do  this?  Why  do  you  spend  a  lot  of  time  for  writing  code  and  package  it  in  an  add- in?"  Well,  to  tell  you  the  truth,  [crosstalk 00:19:30] . That  is  really  tiny.  I'm  going  to  have  to  get  a  new  set  of  readers  or  a  giant  magnifying  glass.  Is  that  really  how  people  do  this? Now  it may  seem  weird,  but  still,  I  believe  in  some  industry,  people  may  use  a  standard.  Granted,  it may  not  be  the  old  book  that  they  use,  maybe  a  pdf,  but  sometimes  they  still  use  this.  But  these  are  very  tedious  and  they're  discreet  in  the  sense  that  they're  just  approximations  to  the  plans. T here's  a  process  to  do  that.  Of course, there  are  multiple  tables  that  you  have  to  go  through  in  order  to  find  the  appropriate  sample  size.  W hat  we  wanted  to  do  is  make  our  life  easier,  actually,  because  to  be  truthful  here,  we  use  sampling  plans.  We're  also  tired  of  using  these  tables.  So  we  wanted  to  automate  the  generation  of  the  LASPs. Okay,  got  it.  What  else  do  we  need  to  know  before  we  do  some  examples? This  is  JMP,  which  is  one  of  the  greatest  pieces  of  software  out  there  for  doing  statistics  and  getting  insights  out  of  your  data.  A nother  thing  that  we  did  is  that  not  only  we're  automating  the  generation  of  the  plans,  making  life  easier,  but  we're  using  all  the  visualization  tools  in  JMP,  like  the  profilers,  to  understand  these  OC  curves.  Remember,  we  just  showed  you  that  the  way  we  determine  or  generate  the  plans  is  via  an  OC curve.  The  OC curve  is  also  very  important  to  evaluate  plan.  How  do  we  know?  If  someone  gives  us  a  sampling  plan,  how  do  we  know  if  that's  good?  T hat's  part  of  this. Let's  look  at  an  example.  T his  comes  from  Professor  Montgomery's  book.  A gain,  here a  shameless  PR  for  us.  We  wrote  a  companion  book.  You  saw  that  book  at  the  beginning  for  this.  It's  called   JMP  Base .  I f  you  look  at  this  figure,  this  is  in  chapter  15, P rofessor  Montgomery  also  shows  another  approximation.  It's  another  way  of  generating  sampling  plan,  which  is  using  a  nomo graph.  A  nomo graph  is  this  figure  that  you  see  here.  Here,  you  have  to  figure  out  what  your  AQL  and  RQL  are,  the  probabilities,  and  that  you  have  to  go  in  there  and   approximate  that. Here,  they  give  you  an  example  where  they  say  the  acceptable  quality  level  is  2%  or 0.02 ,  the  rejectable  quality  level  is  8%  or 0.08 .  The  probability  of  accepting  a  lot  that  has  an  AQL  of  2%  of  less  is  0. 95.  As  we  say,  that's  high.  T he  probability  of  accepting  a  lot  that  has  RQL  of  8%  or  more  is  only  10%.  A gain,  these  0. 95  and  0.1  are  our  standard  value  in  industry. Y ou  notice  this  diagram  here,  they  put  the  0.02,  they  draw  a  line  here  to  intersect  the  0. 95.  
The 0.08 intersects at that, and then you intersect those two lines and you guess at that. Okay, that's a 90, 3 plan. Meaning, out of an infinite population, you take 90 samples, you inspect them all, and you can accept up to three defectives. If you see more than three defectives, you reject the lot. If you see three or fewer, then you accept the lot. That's a sampling plan. A little cumbersome, too. Yeah, it looks a little difficult to line up exactly. I hope we could get away from doing that with the add-in. Well, let's show Julio how we can do that with JMP. Okay, sounds like a good idea. All right, Julio, let's jump over to some examples here. Here is JMP, and to get to the add-in, you just go to your add-in menu, JMP Sampling Plans, and let's look at attribute sampling plans. For this particular one, we're going to do just a single lot acceptance sampling plan. We have the menu. After we make that choice, it comes up and it gives us three options. We can evaluate an attributes plan, we can generate or create an attributes plan, or we can compare plans. The add-in gives us the ability to compare up to five different plans. There's an option to keep the dialogue open in case you want to look at more than one plan. Let's go ahead and generate that plan that you were just sharing here from Dr. Montgomery's book. I'll give you the numbers. Let's see. All right. This is the interface. [inaudible 00:24:51] AQL. Right. You have a couple of different sections in the interface. You get to put in your quality levels, your probabilities, and then you have the optional area about the type of lot sampling you're going to do. That lot sampling is based on a distribution. So we could either do a hypergeometric distribution or a binomial distribution. Let's put Montgomery's numbers there. The AQL in Professor Montgomery's book is 0.02 or 2%. The RQL he has is 0.08. All right. Oh, they're pre-populated. Yes, those are the standard values, 0.95. The standard values. So we keep the default there. I think you told me there was no theoretical lot size here. So we're going to do type B. Actually, in Montgomery's book, it says that he's using a binomial nomograph, meaning he's using the binomial distribution. So yes, that's the right choice. Then all we have to do is hit okay, and then what happens? JMP gives us this curve here and an output window and a report. There are three sections to the report. You get a little summary at the top, a summary of the plan. It shows you your input parameters and then what the plan generated as a sample size and acceptance number. Then it gives you some information on how to interpret the plan. That was helpful. Out of the plan recommendation here, of the 98 samples, it says you can accept the lot as long as the number of defectives is less than or equal to four out of that 98 sampled. Otherwise, you reject it if it's greater than that. Then it tells you some additional information there.
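Behind that recommendation, the add-in is presumably searching for the smallest plan whose OC curve passes through the two design points; the exact search it uses isn't shown in the talk, but a straightforward version of that logic looks like the sketch below (it anticipates the two equations Dr. Ramirez describes next). My own check of the Montgomery example (AQL = 2%, RQL = 8%, 95%/10%) lands on the same n = 98, c = 4 plan.

from scipy.stats import binom

def design_attributes_plan(aql, rql, alpha=0.05, beta=0.10, max_n=10000):
    """Smallest (n, c) with P(accept | AQL) >= 1 - alpha and P(accept | RQL) <= beta."""
    for c in range(0, 200):
        for n in range(c + 1, max_n):
            if binom.cdf(c, n, rql) <= beta:           # consumer's risk met at this n
                if binom.cdf(c, n, aql) >= 1 - alpha:  # is the producer's risk also met?
                    return n, c
                break  # a larger n only lowers P(accept | AQL), so move on to the next c
    return None

print(design_attributes_plan(0.02, 0.08))  # expected: (98, 4)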
I f  we  look  at  that  OC  curve,  I  think  this  is  what  you  were  showing  us  earlier. Yes. So  we  have  a  probability  at  95%  and  the  quality  level  of  2%.  But  if  I  recall,  you  told  me  in  the  literature  that  this  was  a  plan  of  90 and 3.  So  I'm  confused  again  here  or, should I say, Julio texted  me  and  said  he's  confused.  How  is  it  different  here? Or  why  is  this  different? We're  getting  98  and  4.   Professor  Montgomery,  in  his  book,  he's  showing  the  nomo graph  and  he's  getting  90 and 3.  Let  me  go  back  there  just  1  second.  R emember,  what  we're  using  is  this  graph  here  where  you  have  all  these  approximations.  You  have  a  line  that  goes  200,  300,  you  go  between  70 and 100. Got  it. There's  not  really  a  93  or  94,  anything  like  that.  A s  I  say,  you  had  to  approximate  these.  T his  nomog raph  is  just  an  approximation.  What  we're  actually  trying  to  do  is  solve  these  equations  for   those  of  you  mathematically  inclined.  The  one  minus  alpha  is  the 0. 95,  the  beta  is  the 0.10 ,  p1  is  the  AQL  and  p2  is  the  RQL.  I f  we  put  all  those  four  things  in  here  and  we  solve  these  equations  to  try  to  find  the  minimum  n  that  satisfied  those  four  things. W hen  you  do  that,  actually, software  does  that  for  us,  the  code  that  we  wrote,  we  get  the  98  and  4. I  get  that. The  moral  of  the  story  here  is  that  the  nomograph  gives  approximate  lot  acceptance  sampling  plan  again,  because  it's  just  an  approximation  game.  T his  is  again  one  of  the  advantages  of  using  the   add-in,  because  you  get- Sampling plan add-in,  of  course. -more  exact  sampling  plan.  Y ou  show  that  we  can  evaluate  a  sampling  plan.  L et's  do  that.  Why  don't  we  evaluate  this  plan,  the  93?  Show us ho w  to  do  that. Since  I  left  that  window  open,  we  don't  have  to  go  back  to  the  menu  again.  L et's  evaluate  a  plan  now.  W e  have  in  that  interface,  again,  we're  going  to  put  in  our  previous  0.02, 2%,  and  I  think  he  told  me  the  RQL  was  8%. Yes. This  time,  though,  instead  of  a  sample  size  of  20,  we're  going  to  do  90  and  evaluate  three.  Again,  it's  a  binomial. Here,  you're  entering  the  four  quantities  that  we  talked  about  in  the  generation,  but  you  also  enter  the  actual  sampling  plan  that  you  want  to  evaluate. Exactly.  Now  when  I  say,  okay,  it's  going  to  look...  Sorry  about  that. I'll j ust  redo  that  quick. [inaudible 00:30:31],  no? Yes,  sorry.  I  want  to  evaluate  that plan,  you get 90 and 3. Three, yes. When  I  rerun  that,  we  get  the  exact  same  style  of  report,  but  the  information  is  slightly  different,  I  see  here.  I  noticed  also  down  in  this  table,  there's  some  color  coding  and  direction  of  the  arrows. Yeah,  I see some  red  there.  I  see some  red,  yes.  That's  an  issue. Is  this  an  indication  that  the  specified  quality  level  is  better? Actually,  no.  I  think  that  the  reason  that  it's  red  is  because  what  happens  here  is  what  this  is  telling  us.  If  I   use  a  sample   size  90  with  an  acceptance  number  of  three,  you  can  see  the  associated  probability  of  acceptance  is  0. 89.  Remember,  we  wanted  that  to  be  0. 95. So  it's  actually  lower. It's lower and that's  why  that  is  red.  
This add-in is giving you a signal that your probability of acceptance is less than the one you specified, and that's an issue. That's why the 98 and 4 is a better plan. Also, you can see at the bottom that the probability of acceptance for a defective lot is 6% rather than 10%. In that case, it's blue because it's better; that probability is better. You're going to have a smaller consumer's risk. You can see that it's 6.47 versus the 10.67. That is what's happening. That producer's risk is really just one minus the associated probability of acceptance there? Exactly. Got it. Exactly. The producer's risk, we want it to be 5%, and in this case it's about 10%.

Now, this is something that I haven't seen in any other software. Normally, what you see is just the plot for the plan. You get that for the associated AQL and all that, and they assume the probabilities are the ones you specified, but they're not. What the add-in also does is flip things around and say, okay, rather than fixing the AQL and RQL, I fix the probabilities. I want it to be 0.95 because that's what it is; I'm neurotic that way. I want it to be 0.95 and 0.1. Then what are the corresponding AQL and RQL for that? That's what that's telling me. Okay, so I wasn't really just seeing double. It's really a different calculation on the right-hand side. Got it. What that says is, if the probability of acceptance is fixed at 0.95, then the AQL is not 2% but 1.53%. Okay, got it. It has to be way less than that. Also, at 0.1 it's not 8% but 7%. Okay, 7.3%, roughly. That's why in both cases they are blue. They are blue.

Again, just very quickly here to show this. In summary for this one: again, the nomograph gives an approximate lot acceptance sampling plan, and the 90 and 3 plan shows you that we're not hitting that. Are you sharing? Yes, I'm sharing. Okay. Hopefully, people can see that. Our producer's risk is now 10% versus 5%. But there is one more thing that you had there, which is compare. I'm curious, can we use that? I'll tell you what I want to do. I want to use the add-in to compare the plan in Professor Montgomery's book, the 90, 3, with the plan that the add-in gave us, which is 98, 4.

All right. We did mention that we can compare up to five different plans. You want me to compare the two plans that we just created. All right. Again, that 8% RQL. In this case, we had a 90 and an acceptance of three. Yeah. Then I think you told me it was 98 and 4. That's what it does. I didn't tell you; that's what the add-in gave us. Yeah. Wow, that is pretty slick. Again, if you wanted to do more than two, you would just check the additional rows, and the add-in will calculate up to five different comparisons here. I'm going to go ahead and click OK. Now it looks a little bit different, Jose. Now I'm seeing two OC curves. I'm getting a comparison of both of my OC curves on the same plot here. Exactly.
Here the blue one is the 90, 3 plan and the red line is- The blue is the 90, 3. Yeah, exactly. I see that. The red line is the 98, 4. You can see that the red line, or curve, is on top of the blue curve. That's what you want to see; you want the curve to be on top, literally. This shows you that you have higher probabilities of acceptance, according to the AQL and RQL parameters that we prespecified, with the plan 98, 4 than with the plan 90, 3.

This is very helpful, because you may be in situations where you get an approximate plan from a book, or someone may suggest, "Hey, why don't you use this plan?" With this, you can compare them all and see which one is better. It's easier to negotiate the sample size using these tools than just getting into an argument and saying, "No, we should use 90 and 3 because that's in the book," or something like that. I think that's a great feature, being able to compare; that way you move the discussion onto actual information rather than subjective opinion, and individuals can now compare directly.

Again, about the reference lines: it didn't make sense to show all the reference lines across five different curves. What we've done is display a single set of reference lines by toggling the filter, and it will update the graphs for you as you toggle between plans. That way, if you do want to see a plan individually, you can do that, and then you're just focused on the table and the graphs for that particular set of observations. That's, I think, a nice feature, Julio. It is. All right.

Is it time for one more? [inaudible 00:38:32]. I think that Julio is probably getting tired there. He is very exhausted. Julio, there are some other things that the add-in can do, so maybe that should be a follow-up to this. Maybe we could do another session for Discovery Summit Japan or Discovery Summit China. We'll continue showing. [inaudible 00:39:05]. I think I'm getting breaking news coming across here, just into the news center of the show. Anyone, it looks like, can get that sampling plan add-in. If you just go to the JMP User Community and search for the sampling plan add-in, you can download it yourself. And that's free, no? That's free? Absolutely free. We would never charge for that on the community. Go ahead and download it. If you have feedback or you run into issues, feel free to message me, and we'll get those defects entered and get a corrected version out there as soon as we can. Just before we wrap up here, I want to say thank you to Jose. Thank you for joining us on the JMP sampling plan show today. It was great to have you here. Really great. Thank you again.
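As a quick numerical check of the values quoted in this talk, here is a minimal JSL sketch (not the add-in's own code) that evaluates the 90, 3 plan. It assumes that JSL's Binomial Distribution( p, n, k ) returns the cumulative probability P(X <= k); check the Scripting Index in your version of JMP if in doubt.

```jsl
// Evaluate a single-sampling attributes plan with n = 90, c = 3
n = 90;
c = 3;
aql = 0.02; // acceptable quality level
rql = 0.08; // rejectable quality level

// Probability of accepting the lot = P( number of defectives <= c )
paAtAQL = Binomial Distribution( aql, n, c ); // roughly 0.89, short of the 0.95 target
paAtRQL = Binomial Distribution( rql, n, c ); // roughly 0.065, about a 6.5% consumer's risk

Show( paAtAQL, paAtRQL );
```

The same two calls, swept over candidate values of n and c, are all it takes to search for a plan that meets both the 0.95 and 0.10 conditions, which is essentially what the add-in reports as 98 and 4.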
The prediction profiler in JMP® is a powerful tool for visualizing and optimizing models from designed experiments. This presentation focuses on new features in the prediction profiler for exploring and optimizing models with known constraints and for determining factor ranges that assure quality as defined by the specifications associated with Critical Quality Attributes (CQAs), thereby solving a fundamental Quality by Design (QbD) problem. While previous versions of JMP were able to create designs that respected disallowed combination constraints (combinations of factors that are known in advance to be physically impossible or undesirable), the model exploration and optimization in the profiler at the last step were unable to obey these constraints. We will demonstrate how the profiler, since JMP® 16, handles these complex design constraints automatically when exploring the model and performing optimization. We will also demonstrate how the Design Space Profiler, new in JMP® 17, finds subregions of the design space that maximize the probability of maintaining product quality for the CQA specifications while maintaining maximum flexibility. These two capabilities make the prediction profiler indispensable for high-quality product and process innovation.

Hello. My name is Laura Lancaster. I am a statistical developer in the JMP group. Today, I'm here to talk to you about disallowed combinations and operating region optimization for critical quality attributes with the JMP 17 profiler. Everything we're going to talk about today has to do with the Prediction Profiler, and I hope that everyone is familiar with it. It's a wonderful tool. But if you're not, the Prediction Profiler is a tool in JMP that's great for interactively exploring, visualizing, and optimizing the models that you create in JMP. Specifically, we're going to talk about two recent new features that were added to the Prediction Profiler. The first is the ability to explore and optimize models that you've created in DOE in JMP that have known disallowed combination constraints. The second is the ability to determine an optimal operating region for your manufacturing process that ensures both quality and maximum production flexibility.

Let's go ahead and get started talking about exploring and optimizing models from designed experiments with disallowed combination constraints. It often happens that when you're designing experiments, it's not possible, or not desirable for various reasons, to experiment over the usual entire rectangular design region. When that happens, you need to be able to apply constraints to your design region before you create the design, and certainly before you run the design. Thankfully, ever since JMP 6, which has been a long time, the custom design platform has been able to create designed experiments with constrained design regions. Since then, constraint support has also been added to fast flexible filling designs and covering array designs. Now, what types of constraints are available in JMP's DOE platforms? The first type of constraint is the simpler of the two: linear constraints on continuous and mixture factors.
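As a concrete illustration (the coefficients here are made up, not taken from the talk), a linear inequality constraint on two continuous factors simply restricts the design region to one side of a line, for example:

$$2X_1 + 3X_2 \;\le\; 10.$$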
Here's a picture where we have two linear inequality constraints, shown as the gray shaded regions. Then you can see that the design stays out of the disallowed, linearly constrained region. The next type of constraint is called a disallowed combination constraint. It's more general and can be a more complicated type of constraint. It can consist of continuous, discrete numeric, and/or categorical factors. What it is is a constraint written as a JSL Boolean expression that evaluates to true for factor combinations that are not in your design region. Here's an example. We have a two-factor design where the combination of X1 at level L1 and X2 at level L3 cannot be in the design. That combination is disallowed, and it's written as a JSL Boolean expression, which you can see right here. Notice that this design is created and stays out of that disallowed region. Now, originally, all of these disallowed combination constraints had to be entered as JSL like this. But then in JMP 12, a disallowed combinations filter was added that made it easier to create these JSL expressions if you have fairly simple disallowed combinations, such as individual factor ranges combined with and/or expressions. We'll look at an example of this shortly.

Now, what about the Prediction Profiler with constrained regions? Why is it important for the Prediction Profiler to be able to obey constraints when you have models with constraints? Well, if the profiler ignores constraints, then it's possible that the user could navigate to predictions that are not feasible, and they may not realize it. You could end up in an area that's not possible, not desirable, and where you certainly haven't tested. It's an extrapolation, so this is bad. Then, probably even worse, if you want to optimize your model, you could end up with an infeasible optimal solution. If that happens, the user would have to either try to manually find a feasible optimal solution, which could be really hard or even impossible, or use another tool.

What were the challenges with getting the Prediction Profiler to obey constraints? Why did it take so much longer to get these constraints into the profiler versus DOE? Well, the main reason had to do with the constrained optimization. The desirability function is a nonlinear function. That means our optimization has a nonlinear objective function, and possibly both continuous and categorical factor variables could be involved in constraints. This is known as a mixed-integer nonlinear programming problem, and it's an extremely difficult type of optimization problem unless you know something favorable about your objective function or your constrained region. It's just very, very hard. But good news: the Prediction Profiler now works with all the same constraints as the DOE platforms. It turns out that the Prediction Profiler has actually obeyed linear constraints on continuous variables all the way back to JMP 8, just a couple of releases after they were added to DOE in JMP 6.
We were able to do this sooner because these constraints, linear constraints on continuous variables, have really nice properties. Because of that, we were able to implement a variant of the Wolfe reduced-gradient algorithm. That algorithm does a really good job of finding the global optimum; especially if you don't have categorical variables, you should find the global optimum. Now, since JMP 16, the Prediction Profiler also obeys disallowed combination constraints on both continuous and categorical variables. This was a lot harder because these constraints are very general. You could put absolutely anything inside that JSL Boolean expression, so we cannot assume anything favorable about our constrained region in these cases. Thus, we had to implement a genetic heuristic algorithm, which is a very general type of algorithm, for the constrained optimization. Because of this, we can't guarantee a globally optimal solution, but you should find a solution that's very close to the global optimum, if not the global optimum itself.

Let's go ahead and start looking at some examples. First, we're going to look at a chemical reaction experiment. This experiment has one response, and the goal is to maximize yield. We have three factors. Two of them are continuous, time and temperature, and catalyst is categorical. We have two constraints: when catalyst B is used, temperature must be above 400, and when catalyst C is used, temperature must be below 650. We used Custom DOE to create a response surface design with disallowed combinations. Because these are fairly simple constraints, we were able to use the disallowed combinations filter. You can see here that when I set catalyst to B, temperature cannot be below 400. This is my first disallowed region. Also, if catalyst is C, the temperature cannot be above 650. Then once we created the design, you can see that the design points stay out of the constrained regions that are gray here. These are the disallowed combinations regions. Then we ran the experiment, and we used Fit Least Squares to fit a response surface model to the data.

Now, I want to show you how you would use the Prediction Profiler to explore the model and find the maximum yield. I'm going to get out of PowerPoint and go to JMP really quickly. Here is the data table from the chemical reaction experiment that we created with JMP's custom design platform. We've already run it and entered all the results. The important thing I want to point out is that Custom DOE added this data table script called Disallowed Combinations. When you open it up, you see it's got the JSL Boolean expression of my disallowed combinations. This is what the Prediction Profiler reads in, and that's how it knows about my disallowed combinations constraints. I've already saved my response surface model, and I'm going to run it and go down to the profiler. Because I have that disallowed combinations constraint saved in the table, the profiler is able to read it in and obey the constraints.
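The expression inside that Disallowed Combinations script is not shown in the recording, but based on the constraints just described it would look something like the sketch below. The exact form JMP writes may differ; for instance, categorical levels are sometimes referenced by level index rather than by name.

```jsl
// Example factor settings to test the expression against
Catalyst = "B";
Temperature = 380;

// Sketch of the disallowed regions for the chemical reaction example:
// catalyst B at or below 400 degrees, or catalyst C at or above 650 degrees.
// The expression evaluates to 1 (true) when the combination is NOT allowed.
disallowed = (Catalyst == "B" & Temperature <= 400) | (Catalyst == "C" & Temperature >= 650);
Show( disallowed ); // 1 here, because catalyst B below 400 is disallowed
```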
I f  I  set  catalyst  to  B,  notice  that  I  cannot  get  to  a  temperature  400  or  below.  If  I  set  catalyst  to  C,  I  cannot  get  to  a  temperature  650  or  above  because  those  are  disallowed  regions.  Also,  when  I  maximize  yield,  I  end  up  with  a  solution  that  is  feasible,  it's  not  in  a  disallowed  region. Now,  what  would  have  happened  in  a  version  of  JMP  prior  to  JMP  16?  Well,  we  can  see  what  would  have  happened  by  looking  at  the  exact  same  data  table  without  the  disallowed  combination  script.  I'm  going  to  run  the  same  exact  model  and  go  to  the  profiler.  Now  this  time,  the  profiler  doesn't  know  about  my  constraints.  So  when  I  set  catalyst  to  B,  I  can  go  down  into  a  disallowed  region  down  to  350.  Catalyst  C,  I  can  wander  up  into  another  disallowed  region,  temperatures  above  650.  W hen  I  do  the  optimization,  I  do  end  up  with  an  infeasible  solution.  I'm  in  the  disallowed  region  where  catalyst  is  C  and  temperature  is  750. I  would  be  forced  to  have  to  try  to  manually  find  a  feasible  solution  that's  not  in  a  disallowed  region.  But  thankfully,  that's  been  solved  since  JMP  16.   Let's  go  clean  up  and  let's  go  to  another  example. Okay,  the  next  example  we're  going  to  look  at  is  a  tablet  production  experiment.  The  goal  of  this  experiment  is  to  maximize  dissolution.  We  have  five  factors.  Four  are  continuous  and  one  is  categorical.   We  have  two  constraints.  The  first  constraint  is  that  when  screen  size  is  3,  mill time  has  to  be  below  16,  and  my  spray  rate  and  coating  viscosity  follow  this  nonlinear  constraint. I  used  Custom   DOE next  to  create  a  response  surface  design  with  disallowed  combinations  using  these  two  constraints.  Because  this  is  a  complicated  constraint,  we  could  not  use  the  disallowed  combinations  filter,  so  we  had  to  enter  it  as  a  script,  which  is  not  hard  to  do.  Here's  where  I've  entered  that  nonlinear  constraint  as  a  script.  Notice  I've  flipped  the  inequality  to  show  what's  disallowed  instead  of  what  should  be  allowed. T hen  I've  also  added  screen  size  equals   3 and  mill  time  greater  than  16  as  the  other  disallowed  region. Now,  we  can  see  by  looking  at  two  different  slices  of  my  design.  This  first  graph  is  spray  rate  versus  coating  viscosity.  I  can  see  that  all  the  design  points  stay  out  of  the  disallowed  region  set  by  this  nonlinear  constraint.  W hen  I  look  at  screen  size  versus  mill  time,  when  screen  size  is  3,  m ill time  cannot  be  above  16.   Then  we  ran  the  experiment,  and  we  used   Fit Least Squares  to  fit  a  response  surface  model  to  the  data.  N ow  we're  going  to  use   Prediction Profiler  to  explore  the  model  and  find  the  maximum  dissolution. I'm  going  to  go  back  to  JMP.  This  is  the  tablet  production  experiment  that  was  produced  by  JMP's  Custom   DOE platform.  N otice  that  once  again,  it  has  saved  the  disallowed  combinations  data  table  script  to  the  table.   I'm  going  to  look  at  that.  You  see  that  it's  the  JSL  Boolean  expression  of  my  dis allowed combinations,  and  this  is  what  the  profiler  will  read  in.  I've  saved  the  response  surface  model  to  the  table.   
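The actual expression for the tablet example lives on the slide and in the saved table script, so the sketch below only shows the scripting pattern, with placeholder constants a and b standing in for the real coefficients of the nonlinear spray rate / coating viscosity constraint.

```jsl
// Placeholder constants; the real coefficients are on the slide, not reproduced here
a = 100;
b = 0.05;

// Example factor settings to test the expression against
Spray Rate = 20;
Coating Viscosity = 150;
Screen Size = 3;
Mill Time = 18;

// Pattern only: the nonlinear region is written as disallowed (inequality flipped),
// OR'd with the screen size / mill time rule described in the talk
disallowed = (Coating Viscosity > a + b * Spray Rate ^ 2) |
	(Screen Size == 3 & Mill Time > 16);
Show( disallowed );
```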
When  we  go  to  the  profiler  to  explore  the  model,  you  can  see  that  it  obeys  my   disallowed combinations  constraint.  When  screen  size  is  3,   mill time cannot  be  above  16. Also,  spray  rate  and  coating  viscosity  obey  that  nonlinear  inequality  constraint.   When  I  maximize  the  solution,  I  end  up  with  an  optimal  solution  that's  feasible  and  notice  that  it's  actually  on  the  constraint  boundary.  T hat  tells  me  that  if  I  had  not  been  recognizing  the  constraints,  I  almost  certainly  would  have  ended  up  with  an  optimal  solution  that  wasn't  feasible,  and  I  would  have  had  to  try  to  manually  find  it,  which  would  have  been  very  difficult,  if  not  impossible. All  right .  Let's  move  on  to  the  next  topic.  Here  we  go.  Back  to  PowerPoint.  Okay .  Our  next  topic  is  operating  region  optimization  for  critical  quality  attributes.   This  is  where  I'm  going  to  introduce  the  new   Design Space Profiler  that's  new  to  JMP  17. What  do  we  mean  by  design  space  when  we're  talking  about  the   Design Space Profiler?  Well,  this  is  an  important  concept  that's  used  in  pharmaceutical  development  that  identifies  the  optimal  operating  region  that  gives  maximal  flexibility  of  your  production  while  still  assuring  quality.  This  concept  was  introduced  by  the  FDA  and  the  International  Conference  on  Harmonization  when  those  agencies  decided  to  adopt   Quality  by  Design  principles  for  development,  manufacturing,  and  regulation  of  drugs.  W hen  they  did  that,  they  put  out  some  really  important  guideline  documents,   ICH Q8-Q12,  that  most  drug  companies  follow. Specifically,  we  want  to  look  at   ICH Q8 ( R2) ,  which  covers  design  space.  It  defines  design  space  as  the  multidimensional  combination  and  interaction  of  material  attributes  and  process  parameters  that  have  been  demonstrated  to  provide  assurance  of  quality. Now,  there  are  a  number  of  steps  that  need  to  be  taken  to  determine  design  space  for  a  product,  and  several  of  them  need  to  be  done  before  you  can  get  to  the   Design Space Profiler  and  JMP.   One  of  the  first  things  that  you  need  to  do  is  you  need  to  determine  what  your  critical  quality  attributes  are  and  what  the  appropriate  spec  limits  are  to  maintain  quality.   We'll  refer  to  these  critical  quality  attributes  as  CQAS.  The   ICH document  defines  a  critical  quality  attribute  as  a  physical,  chemical,  biological,  or  microbiological  property  or  characteristic  that  should  be  within  an  appropriate  limit,  range,  or  distribution  to  ensure  the  desired  product  quality.   This  is  the  important  first  step. Next,  we  want  to  use  designed  experiments  to  determine  what  are  our  critical  manufacturing  process  parameters  that  affect  those  critical  quality  attributes.   We'll  refer  to  these  as  CPPs,  critical  process  parameter,  because   ICH Q8  defines  a  critical  process  parameter  as  a  process  parameter  whose  variability  has  an  impact  on  a  critical  quality  attribute  and  therefore  should  be  monitored  or  controlled  to  ensure  the  process  produces  the  desired  quality.  
Then,  once  you've  determined  your  CQAs  and  your  CPPs,  then  you  want  to  find  a  really  good  prediction  model  for  your  CQAs  in  terms  of  your  critical  process  parameters.  Once  you've  done  all  of  that,  you  can  use  the   Design Space Profiler  to  determine  a  good  design  space  for  your  product. Let's  talk  a  little  more  specifically  about  the   Design Space Profiler  and  JMP.  The  goal  of  the   Design Space Profiler  is  to  determine  a  good  design  space  by  trying  to  find  the  largest  hyper rectangle  that  fits  into  the  acceptable  region  that's  defined  by  your  critical  quality  attribute  specifications  applied  to  that  prediction  model  that  you  found.  Once  you  found  that  hyper rectangle,  it  will  give  the  lower  and  upper  limits  of  your  critical  process  parameters  that  determine  a  good  design  space. The  problem  is  that  that  acceptable  region  is  usually  non linear,  and  finding  the  largest  hyper rectangle  in  a  non linear  region  is  a  very, very  difficult  mathematical  problem.   Because  of  that,  we  wonder  how  does  the   Design Space Profiler  actually  determine  Design  Space  then?  Well,  instead  of  trying  to  find  the  largest  hyper rectangle  mathematically,  we  use  a  simulated  approach.  What  it  does  is  it  generates  thousands  of  uniformly  distributed  points  throughout  the  space  defined  by  your  initial  CPP  limits.  Then  it  uses  that  prediction  model  that  you  found  to  simulate  responses  for  your  CQAs.   Note,  because  your  prediction  model  is  not  without  error,  you  should  always  add  response  error  to  your  simulations. Once  you've  got  your  simulated  set,  it  calculates  an  in-spec  portion,  accounting  the  total  number  of  points  in  that  set  that  are  in-spec  for  all  your  CQAs  from  all  the  points  that  are  within  the  current  CPP  factor  limits.   This  is  easiest  to  see  by  actually  looking  at  an  example  and  going  to  JMP  and  looking  at  the   Design Space Profiler.  That's  what  we're  going  to  do  next. We're  going  to  look  at  an  example  of  a  pain  cream  study.  The  goal  of  this  study  was  to  repurpose  a  habit- forming  oral  opioid  drug  into  a  cream  that  provides  the  same  relief  as  the  oral  drug.  T he  first  thing  that  we  needed  to  do  was  determine  our  critical  quality  attributes  for  this  drug.   We  determined  that  there  were  three  of  them  entrapment  efficiency,  vesicle  size,  and  in- vitro  release.  We  also  needed  to  determine  what  are  the  spec  limits  that  assure  quality.   That's  what  these  numbers  are. Next,  we  ran  experiments  to  determine  which  of  our  manufacturing  process  factors  affect  these  critical  quality  attributes.   It  turns  out  there  were  three  of  them.  They  are  emulsifier,  lipid,  and  lecithin,  and  these  are  the  initial  factor  limits  for  these  CPPs. Next,  we  used  custom  design  and   Fit Least Squares  to  find  response  surface  models  for  our  three  critical  quality  attributes  in  terms  of  our  three  critical  process  parameters.   Once  we  did  all  of  that,  now  we're  able  to  go  to  the   Design Space Profiler  and  JMP  to  determine  a  design  space  for  this  pain  cream.   Let's  go  back  to  JMP. I'm  going  to  open  up  my  pain  cream  study.  
This was my response surface design created in JMP's DOE platform. I've got my design in terms of my three critical process parameters here, and these are my three critical quality attribute responses here. The important thing I want to point out is that for each of these critical quality attribute responses, I've saved spec limits as column properties. That is because the Design Space Profiler has to know what the spec limits are for your critical quality attributes. If you don't enter them as column properties, you'll be prompted to enter them once you launch the Design Space Profiler.

I've already saved my response surface models as a script. I'm going to run that script. It launches Fit Least Squares, and I have it set up to automatically show the Prediction Profiler. This is the same Prediction Profiler that you're probably used to seeing. I have my three responses, my critical quality attributes, here; my three critical process parameters, my factors, here; and I can explore the model as usual. But now I want to try to figure out a design space for my manufacturing process. I can easily do that by going to the Prediction Profiler's little red triangle menu, and several options down I see there's a new option for the Design Space Profiler. If I select it, the Design Space Profiler appears right below the Prediction Profiler.

As I noted, if I hadn't already had spec limits attached to my responses, it would prompt me for them. But now I can see that it's brought them in from my column properties; you can see them right down here. It's also brought in an error standard deviation. These values are coming from the root mean squared error of my Least Squares models. You can see here that the RMSE is the same value for in-vitro release as the error standard deviation here. Of course, you can change these; you can even delete them. But we highly recommend that you include some error for your predictions, since your predictive models are not perfect, not without error.

Okay. The first thing you might notice about this profiler is that it looks a little different, in that each factor cell has two curves instead of the usual one curve. That's because we're trying to find factor limits. We're trying to find an interval, an operating region, the design space where we're optimizing our operating region. The blue curve (we have a legend to help us) represents the in-spec portion as the lower limit changes, and the red curve represents the in-spec portion as the upper limit changes. You can see how, if I were to change the upper limit of emulsifier, it would increase my in-spec portion. That would be a good thing. That's how that works. Also, the in-spec portion: you don't see the value over here on the left like you usually do, but it's right over here to the right of the cells. Initially, 79.21% of my points are in-spec, and that's in-spec for all of the responses, all of the CQAs.
If  you  want  to  see  the  individual  in-spec  portions,  you  can  find  them  down  here  next  to  the  specific  response. Also,  you  can  notice  this  volume  portion  is  telling  me  that  I  am  currently  using  all  of  my  simulated  data  and  that's  because  the  factor  limits  are  set  at  their  full  range  initially.   To  be  able  to  change  the  factor  limits  or  try  to  change  the  operating  region,  you  can  either  move  the  markers  as  usual  or  you  can  enter  different  factor  limit  values  here  in  this  table  or  right  here  below  the  cells  or  you  can  use  these  buttons .  I  really  like  these  buttons.  If  I  click  on  Move  Inward,  it's  going  to  find  the  biggest  increase  in  in-spec  portion.  It's  going  to  find  the  move  that  gives  me  the  biggest  increase.   It's  going  to  find  the  steepest  upward  path .  Move  Outward  would  do  the  opposite.  It  would  find  the  steepest  path  downward. If  I  click  Move  Inward,  notice  that  my  emulsifier  lower  limit  has  increased  from  700  to  705,  and  my  in-spec  portion  has  increased  to  81.95.  If  I  click  it  again,  now  my  lecithin  lower  limit  has  increased  from  30  to  31,  and  my  in-spec  portion  has  gone  up  to  84.5.   I  can  keep  doing  this. But  before  I  keep  doing  this  until  I  find  the  desired  in-spec  portion  that  I  like— and  I'm  happy  with  the  factor  limits,  I  think  it's  a  reasonable  operating  region— there  are  several  options  in  the  Design  Space   Profiler  menu  that  I  like  to  look  at.  The  first  one  is  make  and  connect to  random  table.  W hat  this  does  is  it  creates  a  new  random  table  of  uniformly  distributed  points.  You  always  want  to  add  random  noise.  It's  going  to  use  the  same  random  errors  we  used  before.  I'm  going  to  click  Okay.   Now,  I  get  this  table  of  10,000  new  random  points,  and  they  are  color- coded.  The  ones  that  are  marked  as  green  are  in- spec,  the  ones  that  are  red  are  out  of  spec,  and  the  ones  that  are  selected  are  within  my  current  factor  limits,  my  current  operating  region. It's  useful  to  look  at  the  table,  but  I  really  like  to  look  at  these  graphs  that  are  produced  by  some  of  these  saved  scripts.   If  I  run  Scatterplot  Matrix  Y,  it  will  give  me  a  response  view  of  all  my  data .  The  shaded  region  that's  green  here  is  the  spec  limits. T hen  I  also  like  to  look  at  the   Scatterplot Matrix  X,  which  gives  me  the  factor  space  view.   It's  nice  if  I  can  look  at  them  both  at  the  same  time.  While  I'm  altering  my  factor  limits,  if  I  click  on  Move  Inward  again,  you  can  see  how  the  points  change .  I  find  it  even  more  useful.  You  also  see  how  the  factor  space  changes.  I  find  it  even  more  useful  to  hide  all  the  points  that  are  not  in  my  current  operating  region,   then  I  don't  even  have  to  look  at  them. Now,  as  I  keep  clicking  on  Move  Inward,  you  can  see  how  that  operating  region  is  shrinking.   If  you  only  want  to  be  concerned  with  the  out- of- spec  points,  you  can  click  on  Y  Out  of  Spec,  and  that  will  only  show  the  out -of- spec  points  that  are  occurring.  Notice  that  my  in-spec  portion,  as  I  keep  moving  my  factor  limits  in,  is  increasing . 
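The in-spec portion being tracked here is just the fraction of the simulated points that fall inside the current factor limits and meet every CQA specification. A stripped-down JSL sketch of that idea is below; the factor limits, prediction formula, error size, and spec limits are all placeholders, not the real pain cream model.

```jsl
// Monte Carlo sketch of an in-spec portion for a single hypothetical CQA
nSim = 10000;
nIn = 0;
For( i = 1, i <= nSim, i++,
	emulsifier = Random Uniform( 700, 760 ); // placeholder CPP limits
	lipid = Random Uniform( 10, 20 );
	lecithin = Random Uniform( 30, 50 );
	// placeholder prediction formula; in JMP this comes from the saved model
	pred = 40 + 0.05 * emulsifier + 0.8 * lipid + 0.3 * lecithin;
	y = pred + Random Normal( 0, 2 );        // add response error (the RMSE)
	If( y >= 80 & y <= 120, nIn++ );         // placeholder spec limits
);
Show( nIn / nSim );                          // the in-spec portion
```

Tightening the factor limits in the Design Space Profiler amounts to restricting which of these simulated points are counted, which is why the in-spec portion changes as you click Move Inward.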
I'm going to keep going until I either hit 100% or my operating region starts to look like something that isn't feasible, something I just won't be able to attain. I'm going to keep clicking Move Inward. Things still look good. Move Inward; I'm just going to keep clicking it. Okay, I hit 100%, and I still think that these factor limits represent an operating region that I should be able to attain. To look at that further, I can send the midpoints of these factor limits to the original profiler and see what that looks like. I think that looks pretty good.

I can also send the limits to the simulator in the Prediction Profiler, and I can decide to use different distributions. I actually think that my critical process parameters follow normal distributions, so I'm going to select Normal with Limits at 3 Sigma. It turns on the simulator, sets my distributions to normal, and figures out the means and standard deviations so that these limits sit at 3 sigma. Of course, you can change all these values as you see fit for your own situation, for your own manufacturing process. You can change the distributions, and you can change the means and standard deviations. I'm just going to leave them, and I'm going to see what simulating from the normal distributions looks like. It looks really good. You can see my defect rate: when I keep hitting Simulate, it's often 0.

I also like to simulate to a table to get a view of what my capability analysis would look like, just as a sanity check. If you come down here, you can simulate to a table, and it's going to use these normal distributions for the critical process parameters. It's going to use the same errors for the predictions as we used before. I'm going to click Make Table, and when I do that, it automatically creates some scripts. One of them is Distribution. If I run that, I can very easily look at my capability reports, because I saved my spec limits as column properties. I see that the capability, at least for the simulated data, looks really quite good. So I'm pretty happy with this, even though this is just simulated data. Of course, I need to check the real data, but I'm really happy with what I'm seeing so far. I think I'm going to use these limits as my design space.

Now, just to note before I go further: I have a good situation here, but let's say that you didn't, and your in-spec portion wasn't where you wanted it to be and you really couldn't adjust your factor limits anymore. You could do what-if scenarios by changing your spec limits or your errors, if you think that is something that could reasonably happen. But I have a good situation, and I'm happy. I am going to use this option, Save X Spec Limits, and that's going to save these factor limits back to my original data table, to my critical process parameters.
When I do that and go back to my original table, you can see that those factor limit settings have been saved as spec limits on my critical process parameters. I find it really helpful to be able to look at this design space in terms of the response contours and the acceptable region. I've already saved my predictions as formulas, and I have a script saved to run the Contour Profiler, so I'm going to run that. This gives me the response contours for all combinations of my factors, my critical process parameters. I don't know if you can see the faint rectangles, but that is my design space as defined by those factor limits that got saved as spec limits on my critical process parameters. The shaded, colored areas are my spec limit response contours. You can see that my design space is nicely within the acceptable region for all these contours. It's even further in; it's not touching them. That's because we added that error in for our predictions. I'm really happy with this.

Okay, let's get back to PowerPoint. I just want to give you a few takeaways about the Design Space Profiler before we wrap up. First of all, the in-spec portion that we saw in the Design Space Profiler shouldn't be taken as a probability statement unless you believe that your critical process parameter factors actually follow a uniform distribution, because that's what was used to distribute them. Also, the Design Space Profiler is not meant for models that have a large number of factors or very small factor ranges, because of the simulated approach that it takes. It's also recommended, as I've mentioned several times, to always add random error to your responses, because your prediction models are not without error. And finally, even though this was motivated by the pharmaceutical industry, it really is applicable much more broadly. In any case where you want to find an optimal operating region and you want to maintain flexibility and quality, this can be helpful. There were many things about the Design Space Profiler I didn't have time to show. I really hope that you will check it out. Any questions?
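A small inference about the Normal with Limits at 3 Sigma option used in the demo (this is a reading of the option's name, not something stated in the talk): each factor's normal distribution appears to be centered at the midpoint of its limits, with the limits placed three standard deviations out on each side, i.e.

$$\mu = \frac{L+U}{2}, \qquad \sigma = \frac{U-L}{6}.$$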
Saturday, March 4, 2023
JMP 17 introduces the Easy DOE platform, providing both flexible and guided modes to users, aiding their design choices. In addition, Easy DOE supports the full DOE workflow from design through data collection and modeling. This presentation offers a preview of the new Easy DOE platform, including insights from a 7-year-old using the new platform on a DOE problem of her choosing.

Hello. I'm Ryan Lekivetz, Manager of the DOE and Reliability team at JMP. And I'm Rory Lekivetz. We're here today to talk to you about Easy DOE. The question is: is it easy enough for a seven-year-old? Now that you're eight years old, you're a lot wiser to answer that question. For those of you who don't know about Easy DOE, it's a new platform in JMP 17. The idea with Easy DOE is that it's a new file type that encompasses everything from the design through the analysis of a designed experiment. No more do you need to worry about splitting things up, going from the DOE platform to a data table and then running the analysis separately. The idea with Easy DOE is that we're trying to aid novice users through that entire workflow. There's a guided mode, where we've tried to add hints and useful defaults to guide those users, while at the same time having a flexible mode for those who are more comfortable with DOE.

Now, before we started with Easy DOE and running our experiment, I did talk to Rory about the DOE workflow. If you open up the DOE documentation, we outline this idea of a DOE workflow, which goes through the describe phase, where we identify the goal, the responses, and the different factors; the specify phase, where we're looking at our model; then we create the design, collect the data, fit a model to that data, and then use that model to predict. If you think about the way we've traditionally done this in JMP, the design phase is where we create the data table. Using that data table, the experimenter goes and collects the data and then performs the remaining steps. What you'll see in Easy DOE is a tabbed interface where each tab represents one of these steps in the DOE workflow.

Now, what was the experiment that we did? Paper airplanes. Rory had found a website that talked about different ways to create paper airplanes. You want to tell them what the response was? What were you trying to measure? We were trying to measure the distance, which was in inches. What factors did you end up deciding that we could change? For factors we decided on airplane type, paper type, throwing force, and paperclip. Yeah. Now do you want to tell them about some of these different tabs? Okay, let's start. What was the Define tab? The Define tab was where you got to choose your factors and your responses. That's right. I should mention here as well that when we were using Easy DOE, I left Rory in control of the entire platform. She launched it. She was the one entering everything and clicking between tabs and all of that. I think after the Define tab, we moved to the next.
What was that next tab? Model. For the Model tab, you had to choose which one of these four was the best for your experiment. Now, I'll say too, this is where we had to talk a little bit more about what these different model types mean. Of course, for a seven-year-old, and even an eight-year-old now, the idea of understanding interactions can be a difficult thing. Now, the main effect versus the interaction: one of the nice things was the website that we had found about creating paper airplanes. It talked about how some of the different types of paper airplanes do better when you throw them hard versus light. It had already discussed that idea of interactions, so that's why, ultimately, I helped her decide on picking the two-factor interaction model with the main effects.

Once we had that model, then what happened? Then was the Design. The design shows you what you're going to be making. Since we were doing paper airplanes and we entered the factors for plane type, paper type, throwing force, and paperclip, it shows the different types, different papers, different throwing forces, and paperclip or no paperclip. Yes. I think we made the 16 different paper airplanes, and each one was a different one. I think we put a number on each one, is that right? We labeled each one with a number. Yeah. Then what happens after we have that design? What do we do with that? Then we go to Data Entry. Data Entry is where you enter in how many inches it flew. Yeah. I think we went outside, took those paper airplanes, flew them, and just measured that. Yeah, that's right. What happened after we had that data entry? Then we go to Analyze. Analyze is where you figure out which ones were the best.

Yeah, which ones really were impacting that distance flown. Now, I should mention here, this is a novel thing in Easy DOE: the confidence intervals for each of our different effects are clickable. I had actually thought that the Analyze tab was going to be a really difficult thing to talk about. But as soon as we got there, it starts with all of the terms in the model, and she very quickly figured out that you could click on them. She looked at the ones that were close to zero and just removed them. On the top was actually her model. She picked a much simpler model than even the best model; you'll notice there is a best-model option. But one could argue that I might even prefer her model to the one that was picked as the best model. Again, it's still a very nice way to play around with your model and see what happens as terms are entered or removed, just by clicking on those confidence intervals. Then after we moved through the analysis, what was the tab we had? The Predict tab was where you could see which types or which things were the best.
The best looked like it would probably be the dart made with construction paper, and for the throwing force it was like, the hard and the light were alike... Did it matter or not really? Not really. It would be like you could do hard or light with your paper airplane. I should mention here, it was interesting to see; she hadn't really seen the prediction profiler much before. I mean, she wasn't familiar with it. She did have to be told to click in there to see what happens. But even for a seven-year-old, it's interesting to see that once they have that sense that they can click within the prediction profiler, she was really able to get the hang of it.

Just some final thoughts on Easy DOE. I asked Rory a few questions ahead of time. What would you like to tell people about Easy DOE? It was really fun. Yeah. If you were to do this experiment again, what would you change? The factors; maybe different days, for the weather. You think it might be windy on some days and not on others. From my own perspective, she was actually able to complete this with minimal help from me. I mean, she was in control of the Easy DOE platform the entire time. A lot of these different choices she was making on her own, even when it came to the factors that she picked. As well, she actually did help us find some usability issues. There were pieces, like in the Design tab, that I think we improved throughout because of users trying this out, not just her, but other users that we had in the DOE program. The Model tab she definitely needed help with, but the analysis was easier than expected.

Just some references and acknowledgments. Really, I just want to thank all the members of JMP that helped in the development of Easy DOE. There's a huge list that you'll actually see. We have a Discovery Americas presentation there as well, where we talk about this in a little bit more detail. Again, thanks for all the feedback from external and internal users who saw this before the release of 17 and since it's been released. Thank you for your time and for joining us today. We hope you'll join us during Discovery, where we can discuss this poster. Not sure yet if you'll be able to join us, but I definitely will be, and hopefully you as well. Thank you. Thank you.
The powerful Reliability platform in JMP® is often overlooked and underutilized. This talk demonstrates some of the basics of the Reliability platform in JMP by answering the seemingly simple question, "How long will my retirement endure?" Why guess at this extremely important figure when planning for retirement? Use JMP to explore this question! This was accomplished using the Reliability and Survival platform's Fit Life by X and historical data from retirees. Uncertainties of this prediction were also quantified. The optimum retirement age was addressed, considering that retiring earlier draws less income for a longer period. In addition, JMP was used to visualize the employer's retirement tool to optimize the most financially desirable retirement age and explore the most enjoyable chapter of life — retirement!

Hello, my name is Don Lifke, and I'm with Sandia National Labs. I'll be presenting on the Reliability platform in JMP. The example I'll be using has to do with jumping into retirement. Sit back, enjoy, and let's see where this leads us. A little bit about Sandia: we are a federally funded research and development facility, government-owned and contractor-operated. I've been at Sandia since about 2005, roughly. I'll tell you a little bit about what we do at Sandia. We work primarily on nuclear deterrence. There are six major programs that we're working on right now. I won't go into the details of these, but all of the programs that you see on my screen are programs that I've worked on, and I have actually applied JMP to all of them as well. The reason I'm presenting data on retirement is that a lot of the stuff I do is just too sensitive, and I can't present it in this particular environment. So we're going to use some fun data that everybody can relate to. Naturally, this is a recording, so hold your questions to the end. But this is the same presentation I'll be giving live at the conference.

Let's talk about retirement. Why is it something that is so certain, yet has so much uncertainty in it that it makes it a little bit hard to plan? I'm not sure I would want to be without that uncertainty; I prefer a little uncertainty there. It does make planning for retirement a little bit difficult. But I guess on the bright side, if I were a cat, it would be even more complicated. By the way, the gato here is a repairable system, and that's a whole other topic in general. We're just going to be looking at systems that are not repairable, unlike cats. What we can do is use the Reliability and Survival tools in JMP. Some of the screenshots that you see may be from an older version of JMP; this was created pre-COVID. Now that we are on newer versions of JMP, there might be some slight differences between what you see and the screenshots, but it should be fairly similar anyway. We're going to be using Reliability and Survival, specifically the Life Distribution and Fit Life by X platforms in JMP, to do some analysis of the data. Now let me tell you a little bit about where the data is coming from.
At Sandia, we have a Lab News that comes out every couple of weeks, and in that Lab News they like to post the deaths of retirees, the people we worked with for a long time, just to let us know that they've moved on and gone to a big R&D facility in the sky. So I thought I could use that data to help me plan for my retirement. I grabbed some of that data, taking it from four different periods of Lab News archives. I pulled data from 2001, 2007, 2013, and 2018. The number of data points that I grabbed from each of those is shown here. We'll look at some of these data and see how things are changing through time as well. But anyway, the number of data points is pretty significant. This little picture on the right is a bit of trivia: if you look real closely at the violinist on the left, some of you might recognize her, or let's leave that up to you to figure out. Maybe I'll sing it. It's actually my partner, Claire. She's a pharmacist, but also a beautiful opera singer.

The nice thing about using the retiree death data, for me, is that that population is a better representation of my lifestyle. They tend to take fewer risks, lead a little more conservative lifestyle, and have a similar income and education level. Of course, that's where the best and the brightest are; I'm probably at the bottom of that list, but at least I'm in that group. It's an honor to be working with the folks at Sandia. Another nice thing about using the retiree death data is that it only includes those who actually made it to retirement. I don't really care about the data for not making it to retirement. And I apologize to my kids. I'm in Albuquerque, New Mexico right now. For those of you who love Breaking Bad and know that Breaking Bad was primarily filmed here in Albuquerque: I'm a big fan of the show, so I've got to put a little bit of fun stuff in the presentation. That's why I threw a little bit of Albuquerque reference in there for those of you who are Breaking Bad fans, and some Better Call Saul stuff, too.

Let's go right into JMP and start doing some of the analysis. I'm going to bail out of that and open up my data file. Let me move some screens around here. It'll take a second to load on your screen. You should be seeing my data file. What you see are the columns of age in the Lab News (that's the newspaper I got the data from), the bi-weekly date, and I broke that down to year. The other columns I'll talk about in a little bit. Right now, let's just look at the age and the year. Let's just look at the distribution of the data and see what it says. If we analyze the distribution of age, I'm just going to analyze the distribution of age in general. If we look at that, this is what the distribution looks like. You can see a sort of skewed-left distribution, and we can see a little more detail on this if we go into some of the display options in JMP.
This data actually best fits a Weibull distribution, a two-parameter Weibull, which I'll show on some of the later screens. What we want to do is take this data and look at it with Life Distribution. I'm going to drag my PowerPoint over and show you that I saved these distribution fits in the interest of time rather than redoing them live. Looking at the distributions for the four year categories, 2001, 2007, 2013, and 2018, all of them fit a Weibull distribution fairly well; I wanted to check that assumption before analyzing the data in Life Distribution, and then I fit them to a Weibull.

Let me give you a little background on the Weibull distribution, which will help you understand the data I'm going to show. These are Weibull distributions generated using JMP's formulas: I've created distributions with different alphas and different betas just to show what those parameters do. Let me run my script, which is simply a Fit Life by X saved to the data table, and turn on a local data filter to show what happens with different alphas and betas.

I'm going to choose the three curves that have the same beta. The beta of a Weibull distribution essentially determines the spread, what you would think of in terms of a standard deviation. When you plot the data on a Weibull probability plot, curves with the same beta are parallel lines that just scoot across the x-axis; the beta is the slope of the line, which reflects the spread of the data. If I instead look at three curves with the same alpha but different betas, they are all centered at about the same point: the beta changes the spread, and the alpha changes the location. They all cross at the same point, which is at 0.632. The characteristic lifetime is the alpha, the point where your line crosses 63.2%, which comes out to 1 - 1/e if you want to get into the proof. So the beta is the spread of the data and the alpha is its location on the plot; formally, alpha is the scale parameter and beta is the shape parameter.

I'll swing the PowerPoint back over and summarize what I mean on a slide. These plots have the same alpha, so they're all centered at the same place; the three with the smaller rectangle around them have the same beta, so they have the same spread, but they're located differently because they have different alphas.
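To make that 63.2% figure concrete, here is a minimal sketch (not from the talk) using scipy's Weibull parameterization, where the scale plays the role of JMP's alpha and the shape plays the role of beta: the CDF evaluated at x = alpha is always 1 - 1/e ≈ 0.632, no matter what the beta is. The parameter values are made up for illustration.

```python
import numpy as np
from scipy.stats import weibull_min

alpha = 85.0                        # scale ("characteristic life"), JMP's alpha
for beta in (4.0, 8.0, 12.0):       # shape, JMP's beta, controls the spread
    # CDF of a Weibull(shape=beta, scale=alpha) evaluated at x = alpha
    p = weibull_min.cdf(alpha, c=beta, scale=alpha)
    print(f"beta = {beta:4.1f}   F(alpha) = {p:.3f}")   # 0.632 every time

# The 63.2% crossing point is simply 1 - exp(-1):
print(f"1 - 1/e = {1 - np.exp(-1):.3f}")
```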
That's a brief tutorial on what the Weibull parameters do to the curve. Now let's look at all of the data lumped together and just consider age. If we go to Analyze, Reliability and Survival, and look at the Life Distribution of age, we first see linear scales on the x- and y-axes, but we can determine which distribution fits best, and what we find, of course, is that the Weibull fits best. This is what the Weibull distribution looks like with all of the data crammed together. Using the distribution profiler, I can manually slide the probability over and say I want to be 90% sure I don't run out of money; in that case I'd better plan on living to roughly 92. That was really the focus of this study, helping me plan my retirement: they always ask how long you want to plan your retirement for, and of course we don't know, so usually we just take a wild guess, but here there is a little data-driven decision making going on.

What if we look at this across the four different years? We can use the same platform, Reliability and Survival with Fit Life by X, and look at age versus year. Let me turn on the density curves so you can see where they fall; I want this to be the Weibull distribution. We can also turn on quantile lines to see how things change through time. It looks like from 2001 to 2007 things didn't change much, but by about 2013 people seem to be living a little longer, based on the 2013 and 2018 data. If we go down and look at the details of the Fit Life by X report, you can see the plots separately, and these little profilers are fun to play with: you can look at age and the probability of failure. I'll just put 90 in here, and we can see that it does look like we're getting a little healthier, because the probability is going down through time, at least at 90 years old.

There's a lot in this report you can tinker with, and I don't have time to show all of it, but what I want to get down to is the location and scale tests. This test asks whether the locations are different, assuming we have the same failure mechanism, which in theory we do: the physics of human failure should be constant, so the spread shouldn't change much. In other words, assuming the betas are the same, are the alphas changing; is the location shifting? The data say yes: we reject the null hypothesis that the locations are equal, so the data are scooting across the x-axis through the years. There is a change.
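Going back to the distribution-profiler step above, here is a small sketch of the same idea in code: fit a two-parameter Weibull to the ages and read off the 90% quantile as the planning age. The simulated ages and parameter values are made up; they only stand in for the Lab News data.

```python
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(1)
# Simulated ages at death standing in for the Lab News data (made up).
ages = weibull_min.rvs(c=9.0, scale=85.0, size=400, random_state=rng)

# Fit a two-parameter Weibull (location fixed at 0), as Life Distribution does.
beta_hat, _, alpha_hat = weibull_min.fit(ages, floc=0)

# "Slide the profiler to 90%": the age by which 90% of retirees have died.
age_90 = weibull_min.ppf(0.90, c=beta_hat, scale=alpha_hat)
print(f"alpha ~ {alpha_hat:.1f}, beta ~ {beta_hat:.1f}, plan to about age {age_90:.0f}")
```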
Looking at the location-and-scale test, that one appears marginal; I'll say more when we get down to the Weibull results. The Weibull test asks whether there is a difference in the betas, that is, in the slopes or the spread of the data, from year to year, and it's right on the edge of rejecting the null hypothesis that the slopes are the same. So if you want to compare distributions across years, you can run a statistical test to see whether the distributions of your reliability data are changing through time.

Now, what am I forgetting? Let me slide back to the PowerPoint. What I'm not doing is considering censored data. I don't have data for everybody who's still alive; those retirees are out there, but I just don't have access to that information. So what does that do to my analysis? Well, I can play with the data I do have and go back in time. For example, I can go back to 2007 and treat the 2013 and 2018 data as censored, because I know those retirees were still alive then, just to see how not including censored data affects my analysis. I did that, and I'll show you a little of it, but I didn't really find much. As our great friend George Box put it, all models are wrong; the practical question is how wrong they have to be before they are no longer useful. I didn't find anything that changed my conclusions, but I'll show you what I did to convince myself that not having the data for retirees who are still alive is not a problem.

Let me tabulate the data real quick: age and year versus whether or not I suspended the observation. What I did was take the 2018 data, suspend it, and work my way back to 2013. I ran an analysis as of 2013 using the three earlier years of data plus the 2018 data as suspensions. Of course, I had to subtract five years from the 2018 ages, because that would be each person's age in 2013; to include suspended data, the ages have to be adjusted accordingly. I did the same thing for 2007: I created a column called Suspend 2007 in which the 2013 and 2018 data are suspended, so I kept my known deaths and treated the rest as suspended observations, people who are still alive. Let me show you what that did to the data. For the 2013 analysis, the still-alive ages are basically the 2018 ages minus five years; if you look at the mean age, it is the mean of the 2018 data shifted down by five years and treated as suspended. The same goes for the 2007 data.
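As a sketch of how suspended observations enter a fit, here is a generic right-censored Weibull maximum-likelihood estimate in Python: deaths contribute the density and suspensions contribute the survival function. The data and starting values are made up, and this is ordinary maximum likelihood, not JMP's exact implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

rng = np.random.default_rng(7)
# Made-up ages: observed deaths plus "suspended" retirees known to be alive.
deaths = weibull_min.rvs(c=9.0, scale=85.0, size=200, random_state=rng)
alive  = rng.uniform(60, 80, size=100)          # current ages of living retirees
age    = np.concatenate([deaths, alive])
event  = np.concatenate([np.ones(len(deaths)), np.zeros(len(alive))])  # 1 = death

def neg_log_lik(params):
    a, b = np.exp(params)                        # optimize on the log scale
    ll = np.where(event == 1,
                  weibull_min.logpdf(age, c=b, scale=a),   # deaths: density
                  weibull_min.logsf(age, c=b, scale=a))    # suspensions: survival
    return -ll.sum()

fit = minimize(neg_log_lik, x0=np.log([80.0, 5.0]), method="Nelder-Mead")
alpha_hat, beta_hat = np.exp(fit.x)
print(f"alpha ~ {alpha_hat:.1f}, beta ~ {beta_hat:.1f}")
```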
For the 2007 analysis, I believe I simply subtracted off nine years, and five years for the 2018-to-2013 case. When I ran those analyses, I'm just going to show the PowerPoint slides, because it's easier to jump back and forth between the two to show the difference. What I found was this: here is my original analysis, and here is the analysis treating the 2018 data as suspended and evaluating as of 2013. In the area I really care about, where the curve crosses roughly 90% probability, it didn't change much at all. And here is the analysis treating both the 2018 and 2013 data as suspended, based on the 2007 and 2001 data; again, it really doesn't change the curve much in the region I care about. That gave me comfort that I'm not missing much by not having the censored data.

How did I calculate my 90% confidence intervals? I'll show this in PowerPoint in the interest of time. I can use the quantile profiler, look at the 5% and 95% probabilities, and read off a 90% interval; in this case it's 66 to 96 years old. But really what I care about is the upper limit, the 90% probability point, and in this case that was age 94. Based on this analysis, if I want to be 90% sure I don't run out of money, I should plan to live to 94, based on my historical data. That's quantifying the uncertainty, with a little humor thrown in.

The next phase was to throw in some bonus material beyond the life fitting. We have a tool called the Pension Tool: you put in the year you're going to retire, an assumed salary increase, and a non-base award, which is essentially what private industry would call a bonus, and it spits out your estimated monthly pension. I thought, that doesn't help me much; what I want is to reverse-engineer that tool so I have a profiler to tinker with. So I went into JMP and created a response surface model, using three different ages at retirement, 62, 65, and 68, three different salary increases, and three different non-base awards. Let me pull the data over. This is what the data looked like when I ran the experiment; I'm just going to show screenshots from JMP rather than redo the analysis, because I'm at about 20 minutes already and I want to make sure I have time to cover everything. When I created this experiment, you'll notice that the runs are sorted.
That's okay here because I'm running a computer simulation, so I don't have to worry about lurking variables like the temperature in the room, the humidity, or the operators; I'll get the same answer regardless of run order, and it was simply easier to run the tool in order. But when you do a real designed experiment, you want to randomize your runs. Do not run them in order out of convenience; if some factors really are hard to change, your statistician will tell you that you can use the factor Changes setting (easy or hard to change) in JMP to handle that properly with blocking.

My inputs were all orthogonal: I covered all combinations of the three retirement ages, the three non-base awards, and the three salary increase percentages. This is the kind of evaluation you can do when setting up an experiment in JMP: you can check that your main effects are only correlated with themselves and that there is minimal correlation with the other variables, the two-factor interactions, and the squared terms, since I used a response surface model.

The net result was a nice prediction profiler, and I'm going to run it and show you. When you set up an experiment in JMP, it conveniently saves the scripts in your data table, so I just run the Model script; JMP already knows what I want to model because it was there when I set up the experiment, it fills in my Y and my Xs, and it runs the analysis for me. Now, rather than putting an age, a salary increase, and a non-base award into the Pension Tool and getting one number out, I have a profiler; I've essentially reverse-engineered their tool. I can see my monthly benefit versus salary increase and non-base award, and you can see that the non-base award doesn't have much effect, the salary increase doesn't have much effect, and of course age, as we'd expect, has the most effect.

What do I do with this information? What I really care about is my lifetime benefit, and I'm also concerned about inflation, so I added some formulas. Lifetime benefit is just my monthly benefit times how long I'm going to live; if you open the formula, you can see it is exactly that. That's what I really care about: how much money I will get over my lifetime. I know that if I retire earlier I'll get less per month; that's a no-brainer. What I want to know is whether there is a point of diminishing returns. So I looked at age at retirement and computed my lifetime benefit assuming I live to 80, 84, and 98, in three separate columns.
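Going back to the reverse-engineering step, here is a minimal sketch of the same idea: push a 3 x 3 x 3 grid of inputs through a black-box calculator and fit a full quadratic (response surface) model to the outputs with ordinary least squares. The `pension_tool` formula, the salary-increase levels, and the award levels below are invented purely so there is something to fit; only the three ages match the talk.

```python
import itertools
import numpy as np

# Stand-in "pension tool": any black-box function of the three inputs.
# This formula is invented for illustration; it is not the real tool.
def pension_tool(age, raise_pct, bonus_pct):
    return 40 * (age - 55) * (1 + raise_pct / 100) * (1 + 0.1 * bonus_pct / 100)

# Run the 3 x 3 x 3 grid of inputs through the tool, like the experiment.
ages, raises, bonuses = [62, 65, 68], [1.0, 2.5, 4.0], [0.0, 2.0, 4.0]
X_raw = np.array(list(itertools.product(ages, raises, bonuses)), dtype=float)
y = np.array([pension_tool(a, r, b) for a, r, b in X_raw])

# Full quadratic model: intercept, main effects, two-factor interactions, squares.
a, r, b = X_raw.T
X = np.column_stack([np.ones_like(a), a, r, b, a*r, a*b, r*b, a*a, r*r, b*b])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# The fitted surface now acts like a profiler: predict any setting instantly.
def predict(age, raise_pct, bonus_pct):
    x = np.array([1, age, raise_pct, bonus_pct, age*raise_pct, age*bonus_pct,
                  raise_pct*bonus_pct, age**2, raise_pct**2, bonus_pct**2])
    return float(x @ coef)

print(predict(64, 2.0, 1.0), pension_tool(64, 2.0, 1.0))   # fit vs. actual tool
```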
All of these data will be provided to you as well, if you want to tinker with them. What you can see is the lifetime benefit by retirement age, and there actually is a point of diminishing returns: if I only live to 80, I might as well retire at 65, because I'll get less per month but I'll collect it for a longer period. If I live to 84, it turns out my lifetime benefit would have been a little better had I hung on longer, and the longer I live, the more I'm better off waiting as long as I can to retire. But I don't know that number. My best guess is 84, which was the fiftieth percentile of my fit; if I want to be really conservative, I'm looking at 90.

I also wanted to look at how inflation matters, so I looked at present value. I put a formula into JMP; there is a present value formula available, similar to what Excel has, and it adjusts for inflation. It tells you what your money is worth now, given what you predict the future annual inflation rate will be. At 0% inflation, the present value is just the number of payments over your lifetime times the payment amount, but as inflation goes up you get penalized and it becomes less and less.

If I model just the inflation-adjusted present value data, what I noticed, and I'll make these a little bigger so you can see them, is a two-factor interaction: the slopes change with the salary increase. With small salary increases, which are probably not unrealistic in the coming years, there is a point of diminishing returns once we penalize for inflation; once I hit about 65 or so, the curve starts to flatten out, and the difference really isn't that big. If we look at the predicted values (these are not my actual numbers; they are based on a notional pension of about $4,000 a month, a typical retiree income), I noticed that as I wait, the benefit flattens out, so it doesn't really pay to keep waiting to retire. I'm using this information to help me decide when I really want to retire. I'll be 62 in May, so I'm starting to approach the ability to do this; I'll probably wait until 64 or 65-ish, where the difference in present value is not that big.

That's a quick bonus on how I used the design of experiments features in JMP to reverse-engineer the tool. In summary, I looked at the retiree death data with Life Distribution and Fit Life by X across four time periods, and noticed that maybe we did get a little healthier through 2013, but by 2018 that had flattened out.
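For reference, the present-value step described above boils down to the standard present value of an annuity, the same idea as a spreadsheet PV function. A minimal sketch with made-up numbers (the $4,000 per month only echoes the notional figure in the talk):

```python
def present_value(monthly_payment, years, annual_inflation):
    """Present value of a level monthly payment stream, discounted by inflation.
    Standard ordinary-annuity formula: PV = PMT * (1 - (1 + r)**-n) / r."""
    n = years * 12
    r = annual_inflation / 12            # simple monthly discount rate
    if r == 0:
        return monthly_payment * n       # no discounting: payments times count
    return monthly_payment * (1 - (1 + r) ** (-n)) / r

# Illustrative numbers only: a 20-year retirement at three inflation rates.
for infl in (0.00, 0.02, 0.04):
    pv = present_value(4000, 20, infl)
    print(f"inflation {infl:4.0%}: PV of $4,000/month ~ ${pv:,.0f}")
```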
I do have some 2021 data that isn't in this presentation, but next year at the conference in Spain in March I can show that it's actually pretty flat from 2013 to 2018 to 2021; we're really not getting healthier or living longer, and that's going to help with my decision. As bonus material, I also ran a custom designed experiment and reverse-engineered the web-based applet we have, replacing that one-data-point-at-a-time tool with a profiler. It's a really cool thing you can do in JMP. I would take questions here if we were live.

With that, I want to say one last thing. I want to dedicate this presentation to my brother, whom I lost a couple of years ago to brain cancer. He was only 15 months younger than me, a fellow Sandian and a fellow JMP user; some of you may have met him at a JMP conference. I'm dedicating this to him. Thank you very much.
This presentation demonstrates how to access different production plant data to look into all corners of a production plant for quality assurance. JMP was used to access production, process, and quality data with one click. The presentation shows how data accessed via REST API from an existing application was imported, cleaned up with JSL, and visualized in automated dashboards. Specification and control limit management for automated reporting was critical. Where data import possibilities were limited, a combination of Python and JMP was used to import process factors and responses for each production step of interest in mass production plants. A full range of quality assurance tools is demonstrated, which can be used to start helpful discussions with production teams for continuous improvement plans and PDCA cycles.

Hello, everybody. My name is Ole Lühn and I am a member of Global Quality Assurance at BASF. Today at the JMP Discovery Summit Europe 2023 in Sitges, Spain, I want to talk about my topic, "Don't Lose Your Time Anymore: Automated Access, Visualization, and Evaluation of the Production Environment", in other words, how to get fast access to your production sites. In the background you see the site I work at, BASF Schwarzheide in Germany.

On this slide you see all the BASF sites worldwide. There are Verbund sites, which combine different production environments, as well as R&D centers, production sites, and regional centers. For Europe, I am located in Schwarzheide, and on the next slide I will show in a little more detail where that is. I work closely with several sites, including Ludwigshafen, the main site of our company with approximately 39,000 employees; Schwarzheide is located roughly 140 kilometres south of Berlin and has about 2,000 employees. Today we are at the JMP Discovery Summit in Sitges, which is a little north of Tarragona and south of Barcelona. I explicitly show Tarragona on the slide because I have also worked with the site there, where we formulate the end-use product we sell to the market. In the upper right corner you see an overview of the European BASF sites; for the color code, please see the previous slide.

I work in Schwarzheide, and here you can see a picture of the site as of today. The plant I work at and work for is in the orange rectangle on this picture, and the photo on the lower left shows the view from my former office: I could look out of the window and see the plant I was working for, although at that time I was still located in a different office. Of course, I want to know more about what is happening in the plant, and that is what my talk is about: having access to plant data when you are not physically working there.
To introduce BASF and myself: I am a member of the Agricultural Solutions business unit, in the Global Quality Assurance team, and I have been a JMP user since roughly JMP 9, so since about 2010. My task is to assure production processes in terms of quality, so quality assurance. I also work in quality management, I am an auditor for different ISO norms, and I am involved in nonconformance and deviation management. For the final product I am responsible for the release, and I have to sign each COA for the product shipments we send around the world. So I want to know what is happening in production. From a statistical point of view, I want to know whether there are differences in the production process and the production environment; from a practical point of view, I need to evaluate whether those differences really matter. I want the details of production as fast and as easily as possible, and that is why I started this topic a while ago. My goal is proper root cause analysis and preventive and corrective actions in nonconformance management.

The work started around 2020, when we all realized the corona pandemic was developing. As in many companies, access to the plant was restricted, so while I was sitting at home I asked myself how to get access to the plant data without being on site: how can I see what happens in the plant when I am not sitting in it, and how can I be part of the production teams when I am not around? The idea is to go to the plant digitally and see what happens, without the Ctrl-C and Ctrl-V kind of data access that, as everybody in our community knows, takes hours or even days before a real evaluation is done. The idea came up while I was participating in a seminar on the tablet production process in the JMP lecture courses, and my goal was the highest possible degree of automation in my data access, in seconds.

Here is what it is all about. On this picture you see a general overview of our plants. In fact, we have three plants producing our product in parallel, and the very general value stream is shown at the top of the graph: we start with raw materials, we have intermediates in our process, and we go to the final product. The data for the last three process steps is available via a REST API interface, where I can get the data from. All the data is prepared in one table: I have the time, my final lot, the QC data of my final product, and for my intermediates I have different factors and responses available. What I mean by factors and responses is shown in the upper right picture, which I took from the JMP course. This work was started together with a colleague, Bernd Heinen, who helped me write the scripts and prepare my data access.
This is what I want to show you now in JMP. We prepared our scripts in a very simple way: we put the REST API address in the script and add some scripting instructions. I had a problem with the date-time format when the table was imported into JMP; I asked in the community and someone gave me the solution to the time formatting problem. I add some specs to my columns and save the table to a drive, and I always save it as an Excel file as well. The data shown here is for one of my plants; I open the table, it's table number two, and here I have the times of the different processes in the plant, the lot of my final product, the responses (the QC data) of the final product, and the responses and factors of the previous processes.

First, I'm interested in the final product, so I started by plotting one of its responses in a control chart. However, it comes out as an X-bar and R chart. The reason is that five lots of my last intermediate are combined in the final reactor, so JMP creates an X-bar and R chart rather than the individuals and moving range chart I'm interested in, which means I need a little data preparation. The outcome of that preparation is shown on the lower part of the slide, where I have access to all three plants in parallel: I do the data preparation, use the concatenate function in JMP to combine the data tables, and can easily plot an overview of my final QC data. In this video I use the Fit Y-by-X platform for the three plants with the column switcher; everything is possible within seconds.

The idea for this work started when I looked at the tablet production process example from the JMP lecture. There, a red triangle marks suspicious values in a production trend. My idea was to create a column in my own data table that flags suspicious results in a binary format: I changed my continuous variable into a binary one, good or bad. For me it is easier to discuss this with the team, and everybody in our community knows there are more powerful ways to analyze such data from a statistical point of view, regression for example; nevertheless, I chose this route. I will show you what I mean; I prepared it in my slides. I wanted these indicators, good or bad, suspicious or not suspicious, in my data table (the table here is taken from the JMP library, the tablet production process), and in the end my data were prepared like this. I added a column to my data table. Because I had five lots of my intermediates, I first needed to split the data and evaluate only the final product, and then I added a column where I evaluated, on the individuals and moving range charts, how the data are distributed. Here we can see that my internal specification is far above my data.
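The talk does this import with a JSL script; as a rough illustration of the same pattern in Python, here is a sketch that pulls JSON from a REST endpoint, fixes the date-time column, and keeps an Excel copy. The URL, the column names, and the absence of authentication are assumptions, not details from the talk.

```python
import pandas as pd
import requests

# Hypothetical endpoint; the real REST API address, columns, and authentication
# are site-specific and not shown in the talk.
URL = "https://example.internal/api/plant1/qc-data"

resp = requests.get(URL, timeout=30)
resp.raise_for_status()
df = pd.DataFrame(resp.json())            # one row per lot / process step

# Fix the date-time format on import ("timestamp" is an assumed column name;
# an explicit format string can be passed if the default parsing fails).
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Keep an Excel copy as well, for colleagues who do not use JMP.
df.to_excel("plant1_qc_data.xlsx", index=False)
print(df.head())
```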
Even though the specification is comfortably far away, there are suspicious events in the production process, and I can plot this data in a dashboard where everything above the upper control limit is cut out and flagged. With the data prepared this way, I can have easy discussions with my team and ask, "Look, what happened here? Around this time we had a suspicious event in our factory. What happened?"

Back to the presentation. I created this good-or-bad column, and from a statistical point of view we know regression is better. The script that evaluates all of this at once, within about 20 seconds, contains actions such as splitting the data, making a summary, analyzing the control limits, summarizing the control limits in an extra table, and adding the flag column. At first I did this manually, but now I have a nice script that does it automatically, merges the summary back into the data table extracted from the REST API interface, and it's done. Everything runs in seconds; Florian from the JMP team helped me with this.

Here is a proposal, for me and for people who work in quality departments, for how to discuss these graphs with the team. Of course, there are different ways to analyze data in JMP. I like the Fit Y-by-X platform most, and I also like the hypothesis-testing view shown in the video on slide seven, where I plot the three plants next to each other, go through all the QC data of interest, and compare them. I also like the process performance graph available in JMP. Another option is to plot Pareto charts of my out-of-control events: with the data prepared, I go to the table, select only the values that are above the control limit, click on Pareto Plot, and can easily tell the team that these two variables are out of control most of the time, so what do we do? I can also compare the three plants at once; it's more or less the same, but it lets me set up meetings and discuss the suspicious findings between the three factories. Obviously, the factory colored red produces the fewest out-of-control events. That is part of my story: having the data available fast so it can be discussed with the teams.

Of course, I know there are other options in JMP: we can do regression, and we can do predictor screening. The data table for the predictor screening is shown here; it is my final data table extracted from the REST API interface with the out-of-control column added, and I can run a predictor screening to see which factors in the plant influence the outcome of my final product. Okay, it's doing the calculation at the moment.
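As a sketch of the kind of preparation the script automates, here is a generic individuals-and-moving-range calculation in Python that derives control limits and adds the binary out-of-control flag. The data frame, column names, and values are made up; the talk's actual script is written in JSL.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Made-up QC results for one response at one plant.
df = pd.DataFrame({"lot": range(60), "response": rng.normal(100, 3, 60)})

# Individuals & moving range control limits (d2 = 1.128 for subgroups of 2).
mr = df["response"].diff().abs()
center = df["response"].mean()
sigma_est = mr.mean() / 1.128
ucl, lcl = center + 3 * sigma_est, center - 3 * sigma_est

# The binary "good or bad" flag used for the team discussions.
df["out_of_control"] = (df["response"] > ucl) | (df["response"] < lcl)

# A Pareto-style count per variable would repeat this over many response
# columns and tally the flags; here, just the total for this one response.
print(df["out_of_control"].sum(), "out-of-control lots out of", len(df))
```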
Of course, these kinds of things are difficult to discuss, and it is not part of my job to debate process anomalies here. My topic is quality, and I want to reduce the number of off-spec events in the plant; now I'm talking about off-spec, not just out-of-control. We had a period in the plant where we created off-spec events. I ran a partition on the data, and it told me: look, below roughly 12,000 or 13,000 it is most probable that we create off-spec events. In the Graph Builder those events are marked red. Obviously this variable is important; it is the amount of chemicals we put on a filter unit in the plant, and I will show you the video shortly. The second step was that the partition led us to the conclusion that above a certain solvent dosing it is likely that we produce off-spec lots. So there are two results from this evaluation: watch the amount put on the filter, and be careful with the solvent dosing.

Now the video. During that time we sent a colleague to the plant and he took videos of this reactor. The red rectangle is more or less the amount we put on the filter unit; the green rectangle is the amount of solvent dosed afterwards. If too little is put on the filter and too much solvent is added, we obviously create a problem. After we discussed this with the team, about one and a half years ago, we have not had these events in the plant anymore; for me it was quite a success story, how we concluded that we can be more careful with our processes.

My primary result from this first part of the talk is that, via the scripts and the way I import data into JMP through the REST API interface, it is possible to sharpen awareness in production with respect to deviations. Of course, there are more powerful regression possibilities in the tool we are using, but that is not my job as a quality manager; the goal is to hand over a powerful tool that we can use in our factory for deviation management and root cause analysis. And if someone does not want to use JMP, or prefers a different tool, I can always save the results as Excel tables.

However, only the last three process steps are available this way, and it is limited, because the project to prepare this data correlation and timing between the three processes was started back in 2017. So what do I do if I am interested in earlier processes, like the raw materials or my first intermediate? How can I get that data into JMP from outside the plant? That is why I started investigating the possibilities for importing data via a connection between Python and Azure. It all began when our manufacturing execution system was changed at the end of 2020.
The route I had originally tried was closed off because that manufacturing execution system was retired. I thought, was everything for nothing now? No. I found a paper about the synergies of JMP and JMP Pro with Python. The concept is shown on the left side of the slide: I have a JMP script, I have a Python part, and I have a query in Azure where I do my data access. Again, getting the data out of the plant is possible within seconds. It is all described in the scripting guide, on page 786 and the few pages that follow, and when questions came up, Emanuel from the JMP team helped me.

Here is the concept; let me move this Zoom picture. The left part is what everybody in the chemical industry knows: raw materials are delivered by truck and stored in a tank farm, production consumes raw material, new raw material is delivered, and the tanks are refilled, so we have a certain level and everything is fine. The way to investigate this and get the data into JMP is to look up the individual process point of interest. Every point of interest in the plant has a certain number, and that number needs to be found: you can ask colleagues, you can ask SAP, you can ask the automation team, or you can look it up yourself; you need to find out who can help you in your company. Once you have found the number, the unique ID, you make the request in Azure: I type in how many days back I want to see the data, test it in Azure, it creates a table, and then I put this code into my JMP script and click execute. This is what I will show you now; it also takes only a few seconds. The first part you need to get from colleagues or look up on the internet, as I did, and that part is not covered by JMP support. Here you can see that within seconds I can look at the filling level of some of our raw material tanks in the tank farm. That is the concept: any process point of interest can be imported into JMP like this within seconds.
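Here is a rough Python sketch of the query pattern just described: ask a historian service for one tag's history going back a given number of days. The endpoint, parameter names, JSON shape, and missing authentication are placeholders; the real Azure query and unique tag IDs are specific to the plant.

```python
from datetime import datetime, timedelta

import pandas as pd
import requests

# Placeholder endpoint; the real Azure query, tag IDs, and authentication
# are specific to the plant historian and are not shown in the talk.
BASE_URL = "https://example.internal/historian/api/tagdata"

def fetch_tag_history(tag_id: str, days_back: int) -> pd.DataFrame:
    """Pull the time series for one point of interest (e.g., a tank level)."""
    start = datetime.now() - timedelta(days=days_back)
    resp = requests.get(BASE_URL,
                        params={"tag": tag_id, "start": start.isoformat()},
                        timeout=30)
    resp.raise_for_status()
    df = pd.DataFrame(resp.json())
    df["timestamp"] = pd.to_datetime(df["timestamp"])   # assumed column name
    return df

# Example: filling level of a raw-material tank over the last 30 days.
levels = fetch_tag_history("TANK_FARM_LEVEL_01", days_back=30)
print(levels.tail())
```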
I wrote a manual, a journal, describing how this can be done, and I will publish it with my work in the community. If you don't know how to do it, it is written there: where to find things in the scripting index, how to test your system, a few recommendations for the help in the scripting index, and some answers I had previously posted in the community, so the feedback I received is documented there too. I showed you how to extract data from anywhere in the fab or the plant, and that brings me to the end of my talk.

The summary is that product deviations happen to all of us. I showed you some quality assurance tools, and with one click I have all the necessary graphs and information prepared to discuss with the team. My message is: don't be afraid of the JSL scripting language, as I was before. Once you start, you learn fast and you get the job done; you can prepare good discussions with your production teams and start continuous improvement plans and PDCA cycles, and everything can be available within a few seconds.

I also have some ideas for how to continue, and there I need help from the colleagues attending the conference. Maybe you can help me automate and schedule my evaluations; I know this can be done with JMP Live in version 17, and without it I need to use the Windows Scheduler, for example. I also want to improve things by creating add-ins, and the data cleaning can still be improved a bit. My main goal was to get people talking more about suspicious and out-of-control events in our factory, which is the lower right part of the slide, and less about off-spec events. From a quality management point of view, I am a real fan of turtle diagrams and turtle tools for documenting your improvements.

This brings me to the end of my talk. I hope you enjoyed it as much as I enjoyed preparing this work, and I look forward to your questions and to seeing you in Spain. Thank you.
Most measurement systems have detection limits above or below which one cannot accurately measure the quantity of interest. Although detection-limited responses are common in many application areas, such as the pharma, chemical, and consumer products industries, they are often ignored in the analysis. Ignoring detection limits biases the results and can drastically lower the power to detect active effects. Fortunately, the Custom Designer and Generalized Regression in JMP® make incorporating detection limits easy and automatic. In this presentation, we use simulated versions of real designed experiments to show how to get the analysis right in JMP® Pro 17 and the pitfalls that occur if detection limits are ignored in the analysis. We also show how simple graphical tools can identify parts of the design region that could be problematic or even make it impossible to estimate certain model terms or interactions. Our examples include an experiment designed to maximize the yield of a chemical product and experiments where the response is the reduction in the number of microorganisms in microbial susceptibility testing of consumer cleaning products.

Hi, I'm Chris Gotwalt with JMP, and I'm presenting with Fangyi Luo of Procter & Gamble and her colleague, Beatrice Blum, who will be joining us for the in-person presentation at the Discovery Conference in Spain. Today we are talking about how to model data from designed experiments when the response is detection limited. This is an important topic because, on the one hand, detection limits are very common, especially in industries that do a lot of chemistry, like the pharmaceutical and consumer products industries, while on the other hand, the consequences of ignoring detection limits are seriously inaccurate conclusions that will not generalize, which means lost R&D time and inefficient use of resources. The good news we are here to show today is that getting the analysis right is trivially easy if you are using Generalized Regression in JMP Pro and know how to set up the Detection Limits column property.

In this talk, we're going to give a brief introduction to censored data: what it is, what it looks like in histograms, and a brief description of how you analyze it a little differently. Then Fangyi will go into the analysis of some designed experiments from Procter & Gamble, I'll go through the analysis of a larger data set than the one Fangyi introduces, and we'll wrap up with a summary and some conclusions.

What are detection limits? Detection limits occur when the measurement system is unable to measure, at least reliably, when the actual value is above or below a particular value. If the actual value is above an upper detection limit, the measured value will be recorded as being at that limit. For example, if the speedometer in a vehicle only goes to 180 kilometres an hour but you are driving 200 kilometres an hour, the speedometer will just read 180. In the graphs above, we see another example: five histograms of the same data.
The true or actual values are on the left, and moving to the right we see what results when an increasingly severe detection limit is applied to the data. What happens is the characteristic bunching at the detection limit. When you see this pattern, it's a really good sign that you may need to take detection limits into account in your distributional or regression analysis.

Why should we care about detection limits in a data analysis? If you don't take detection limits into account properly, you end up with heavily biased results, and that leads to poor model generalization: the regression coefficients will be way off, you'll have an incorrect response surface, and matching targets with the Profiler will be way off. The situation is a little less dire when you are simply maximizing a response, but there is still a lot of opportunity for things to go wrong. In particular, sigma, your variance estimate, will still be way off, which leads to much lower power and completely unreliable p-values. The tendency is for variable selection methods to heavily under-select important factors, and the actual impact a factor has on your response will be dramatically understated if the detection limits are ignored.

The two tables of parameter estimates here illustrate this nicely. On the left are the parameter estimates from a detection-limited LogNormal analysis of a regression problem; on the right are the estimates when we ignore the detection limit. The model on the left is a lot richer: many of the main effects, interactions, and quadratic terms have been admitted into the model. On the right, ignoring the detection limit, we are only able to get one main effect and its quadratic term into the model, and that quadratic term is heavily overstated, at about negative 11.5, relative to the proper analysis, where it is just negative 3. We are really missing a lot of the other parameters as well. Looking at this in the Profiler makes it really apparent: on the left we have the Profiler for the model correctly analyzed with the Limit of Detection, with all the factors present and an overall rich-looking response surface, while on the right only the one factor, dichloromethane, has been included. The solution to the problem that you would get from the model on the left is likely rather different from the one you would get on the right.
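To make the bias concrete, here is a minimal Tobit-style simulation sketch of the effect just described: a response with an upper detection limit is fit once by naive least squares and once by censored-normal maximum likelihood, where censored rows contribute the probability of exceeding the limit rather than the density. The numbers are made up, and this is a generic illustration, not JMP's Generalized Regression implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(3)
n, limit = 200, 6.0                          # upper detection limit
x = rng.uniform(-1, 1, n)
y_true = 4.0 + 3.0 * x + rng.normal(0, 1.0, n)
y = np.minimum(y_true, limit)                # measurements pile up at the limit
censored = y_true > limit

# Naive least squares on the censored response: the slope shrinks toward zero.
b_naive = np.polyfit(x, y, 1)[0]

# Censored-normal (Tobit-style) maximum likelihood.
def nll(p):
    b0, b1, log_s = p
    s = np.exp(log_s)
    mu = b0 + b1 * x
    ll = np.where(censored,
                  norm.logsf(limit, loc=mu, scale=s),   # censored: P(Y > limit)
                  norm.logpdf(y, loc=mu, scale=s))      # observed: density
    return -ll.sum()

fit = minimize(nll, x0=[np.mean(y), 0.0, 0.0], method="Nelder-Mead")
print(f"true slope 3.0 | naive OLS {b_naive:.2f} | censored MLE {fit.x[1]:.2f}")
```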
Thanks, Chris. Now I'm going to share a little background on the experiment behind the data Chris mentioned, the time to bacterial detection. The objective of that experiment was to understand the impact of our formulation ingredients, or factors, on the microbial hostility of a liquid consumer cleaning formulation. The experiment was a micro-hostility design of experiments with 36 samples and five key formulation factors, A, B, C, D, and E, and it has two responses, both from microbial testing. The first is the one Chris mentioned: time to bacteria detection within two days, measured in hours. If we are not able to detect the bacteria within two days, the time to bacteria detection is right-censored at 48 hours, so the Limit of Detection for this endpoint is 48 hours. The other endpoint is the log reduction in mold from micro susceptibility testing: we add a certain amount of mold to the formulation, wait for two weeks, measure the amount of mold remaining in the product, and calculate the reduction on the log scale. The Limit of Detection for this endpoint is six units.

This slide shows the detailed data from the experiment, the first 15 samples. You can see the formulation factors A through E, which come from a response surface design, and the two endpoints, the bacteria detection time in hours and the log reduction in mold; the values highlighted in red are right-censored.

We can use histograms and scatterplots to visualize the data as well as the factor-versus-censoring relationship. As the histogram shows, more than 50% of the samples are right-censored at 48 hours, and when an observation is not censored, it is mostly below 15 hours. On the right is the scatterplot, where the red circles indicate censored data points. We see censoring at all levels of the factors except factor C: there is no censoring at the higher levels of C, but censoring occurs at all levels of the other factors.

In JMP Pro 16 and higher, we can specify the Detection Limits column property. Under Column Properties you find Detection Limits, and you can specify a lower and an upper detection limit: a value below the lower detection limit is left-censored there, and a value above the upper detection limit is right-censored there. For the bacterial detection time we have an upper detection limit of 48 hours, so we put 48 in the upper detection limit box. After specifying the detection limit column property, we can use Generalized Regression in JMP to analyze the data while taking the Limit of Detection into account; this is a feature of JMP Pro 16 and higher. For this type of analysis, we first specify the distribution for the response and the estimation method. We tried different distributions with the forward selection method and found that the Normal distribution fits the data best, because it has the lowest AICc. We can also analyze the data ignoring the detection limit; in that case we end up with a much smaller model, with five factors left in the final model.
The model ignoring the Limit of Detection has much less power to detect significant factors. This slide shows the factors left in the final Generalized Regression model when we take the Limit of Detection into account versus when we ignore it. When we account for the Limit of Detection, many more significant factors remain in the model; when we ignore it, we can only detect the effects of C and D and their quadratic effects. This is also a comparison of the parameter estimates from the two models, and ignoring the Limit of Detection gives biased parameter estimates as well.

This slide shows the Prediction Profiler for the response when the model considers the Limit of Detection versus when it ignores it. When we consider the Limit of Detection, we get a model with the main effects as well as some interaction and quadratic terms, and this model makes much more sense to our collaborators. Remember that at lower levels of C and higher levels of D we have more censoring, which means longer detection times, and the Prediction Profiler indeed shows longer predicted detection times there; also, because there is more censored data in that region, the confidence interval in the Prediction Profiler is wider. If we ignore the Limit of Detection, we get far fewer significant factors, only C and D show up in the model, and the parameter estimates are biased. This slide shows the diagnostic plot of observed data on the y-axis versus predicted data on the x-axis: considering the Limit of Detection in Generalized Regression gives correct predictions, while ignoring it gives incorrect predictions for your data.

In addition to the Prediction Profiler, Generalized Regression also gives you two profilers similar to those in the Parametric Survival platform: the Distribution Profiler and the Quantile Profiler. The Distribution Profiler gives the failure probability at a given combination of formulation factors and a given detection time; the Quantile Profiler gives the quantile of the detection time at a given combination of formulation factors and a specified failure probability. One advantage of using Generalized Regression to analyze time-to-failure data is that it also provides the Prediction Profiler, which is much easier for our collaborators to understand; it is much harder to explain the Distribution Profiler and the Quantile Profiler to them.
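For intuition, here is a small sketch of the arithmetic behind those two profilers for a lognormal time-to-detection model evaluated at one factor setting. The location and scale values are made up; in the real model the location comes from the fitted linear predictor at the chosen factor combination.

```python
import numpy as np
from scipy.stats import lognorm

# Made-up lognormal parameters at one combination of the formulation factors;
# mu would come from the fitted linear predictor, sigma from the scale estimate.
mu, sigma = np.log(20.0), 0.6
dist = lognorm(s=sigma, scale=np.exp(mu))

# Distribution Profiler idea: probability that bacteria are detected by 48 hours.
print(f"P(detection by 48 h) = {dist.cdf(48):.3f}")

# Quantile Profiler idea: the detection time exceeded by only 10% of samples.
print(f"90th percentile of detection time = {dist.ppf(0.90):.1f} h")
```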
Now we come to the analysis of the second endpoint, the log reduction in mold. Again, we can use a histogram and a scatterplot to visualize the data and the factor-versus-censoring relationship. From the histogram on the left, you can see that a lot of the data are right censored at six units. We see censoring at all levels of the formulation factors except at the higher level of C and the lower level of E, where there is no censoring; that is the region of concern. We see a lot of censoring at the lower level of C and the higher level of E, which means that region is good for the product: the log mold reduction is higher there.

Again, we use the Detection Limits column property to specify the Limit of Detection for this endpoint, with an upper detection limit of six. The next step is to analyze the data using Generalized Regression while taking the Limit of Detection into account, using the LogNormal distribution and forward selection. Interestingly, we found that the RSquare was one, which is very suspicious, and we saw other red flags: the AICc dropped severely after step 17, the standard errors of the estimates and the estimate of the scale parameter were extremely small, and the diagnostic plot showed perfect prediction from the model. We knew the model was overfit. The Prediction Profiler showed very narrow confidence intervals for the prediction, which confirmed the overfit. So we tried a simpler model by removing the quadratic terms from the initial response surface model.

We found that the LogNormal distribution with forward selection fits the data best, with the lowest AICc and BIC. This time the solution path looks much more reasonable, as do the standard errors of the parameter estimates and the estimate of the LogNormal scale parameter, and the diagnostic plot looks reasonable as well.

This is the Prediction Profiler of the final model after removing the quadratic terms, and it makes a lot more sense. Recall that at the lower level of C and the higher level of E we have more censored data, which means the log mold reduction is higher there. The Prediction Profiler shows this, and because there are more censored data in that region, the confidence interval for the prediction is wider. We can also compare this against the final model when the Limit of Detection is ignored. Ignoring the Limit of Detection gives fewer significant factors and biased results: that incorrect model tries to use the quadratic term to push the predicted values close to the Limit of Detection at the lower level of C and the higher level of E, and we know that result is biased.
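Since the overfitting here was flagged by AICc behavior, here is a small hedged sketch of the corrected-AIC formula that drives those comparisons; JMP computes this for you, and the log-likelihoods and parameter counts below are made up purely to show that a larger model can fit better yet score worse once the small-sample penalty kicks in.

```python
import numpy as np

def aicc(loglik: float, k: int, n: int) -> float:
    """Corrected AIC: AIC plus a small-sample penalty that grows sharply as the
    parameter count k (including any scale parameter) approaches the sample size n."""
    aic = -2.0 * loglik + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)

# Illustrative (made-up) numbers for a 36-run design: the full quadratic model
# has a higher log-likelihood but a worse (larger) AICc than the reduced model.
print(aicc(loglik=-20.0, k=22, n=36))   # full response surface model
print(aicc(loglik=-35.0, k=12, n=36))   # model with the quadratic terms removed
```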
Fangyi has nicely shown that the incorrect analysis, ignoring the Limit of Detection, leads to some seriously biased results, and that getting the analysis right is easy if you set up the detection limits either in the Custom Designer or as a column property. I'm going to go through one more example that has measurements at different times, which adds a little more complexity to the model setup and, in our case, required some table manipulation to get the data into the right format.

Here is the data table of the second DOE in essentially the form it originally came to us. We have eight factors, A through H, and measurements at one day, two days, and seven days. Originally our intent was to analyze the three days separately, but when we fit the day-7 data, the confidence intervals on the predictions were huge. It was apparent that there was so much censoring that we were unable to fit the model, so we were either going to have to come up with another strategy or back away from some of our modeling goals. What we ended up doing was using the Stack operation under the Tables menu, so that the responses from the different days would be combined into a single column, and we added day as a column that we could use as a regression term. In the histogram of log reduction, we see the characteristic bunching at the detection limit of five. Combining the data like this lessened the impact of censoring on the design and hopefully lets us make more effective use of all the data we have.

As in the previous examples, we start by fitting a full RSM model, but because we now have day as a term, we add day and interact all of the RSM terms with day in the Fit Model launch dialog before bringing up the Generalized Regression platform. Again, we use the LogNormal distribution as our initial response distribution. Because this is a large model, we can't use best subset selection, so we used pruned forward selection. We tried the LogNormal, Gamma, and Normal distributions, and the LogNormal clearly comes out best: its AICc is 205.3, which is more than 10 lower than the second-best distribution, the Normal, whose AICc was 257.

Here the model fit looks very reasonable, with nothing suspicious: the solution path, standard errors, scale parameter, and actual-by-predicted plots all look realistic. There is a little bunching at the low end of the responses, but our thinking is that this wasn't due to a detection limit, just the discreteness of the measurement system at lower levels of reduction. If we repeat the analysis ignoring the detection limit, it guides us toward the Normal distribution. Here we see the profilers for the model that incorporated the detection limit on the top and the model that ignored the detection limit on the bottom.
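Outside JMP, the same reshaping step looks roughly like the pandas sketch below; the column names, the two example runs, and the detection limit of five are stand-ins for the structure described above, not the actual table.

```python
import pandas as pd

# Hypothetical wide-format table mirroring the structure described above:
# one row per run, factors A..H, and a log-reduction measurement per day.
wide = pd.DataFrame({
    "Run": [1, 2],
    "A": [0.1, 0.9], "B": [1.0, 0.2],            # ... factors C..H omitted
    "LogRed_Day1": [2.1, 5.0],
    "LogRed_Day2": [3.4, 5.0],
    "LogRed_Day7": [5.0, 5.0],                   # 5.0 is the detection limit
})

# Equivalent of JMP's Tables > Stack: gather the three day columns into a
# single response column and keep day as a factor for the regression model.
long = wide.melt(
    id_vars=["Run", "A", "B"],
    value_vars=["LogRed_Day1", "LogRed_Day2", "LogRed_Day7"],
    var_name="Day", value_name="LogReduction",
)
long["Day"] = long["Day"].str.extract(r"(\d+)", expand=False).astype(int)
long["Censored"] = long["LogReduction"] >= 5.0   # right-censored at the limit
print(long)
```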
As in the other examples, the sizes of the effects are dramatically muted when we ignore the detection limit, and we get quite a different story: there is a strong relationship between log reduction and factor E when the detection limit is taken into account properly, and that effect is seriously muted when we ignore it. If we compare the actual-by-predicted plots for the two models, the model with the Limit of Detection taken into account properly is tighter around the 45-degree line for the uncensored observations, while the model ignoring the detection limit is generally less accurate, with the observations more spread out around the 45-degree line.

Those are our two case studies. In summary, I want to reiterate that detection limits are very common in chemical and biological studies. As we've seen in our case studies, ignoring detection limits introduces severe model biases. The most important message is that using the column property, or setting up the detection limits in the Custom Designer, makes analyzing detection-limited data much easier to get correct. There are some pitfalls to watch out for: if you see standard errors that are unrealistically small, or models that are unrealistically accurate, you may need to back off from the quadratic terms or possibly even the interaction terms.

We've shown how histograms can be used to identify when we have a detection-limit situation. It's also useful to look at the censoring relationship against the different factors, because if there are big corners of the factor space where all the observations are censored, we may not be able to fit interactions in that region of the design space. Again, if the model looks too good to be true, go ahead and try a simpler model and back off a bit. That's all we have for you today. I want to thank you for your attention.
With version 17, JMP Clinical is now a fully JSL-implemented product. This presentation will demonstrate the reimagined JMP Clinical and how it uses new JMP 17 features. Three new features in Tabulate format the tables produced by clinical reports to be publication-ready. Pack combines counts and percentages (or other statistics) into one column, while Stack allows multiple grouping variables to be combined into one column. Tables displaying event counts also take advantage of the new Unique ID feature in Tabulate to count events only once per subject identifier. With these three features, tables can be copied and pasted into any publication or report. JMP Clinical's risk reports also use JMP's new Response Screening platform to identify safety signals by calculating risk difference, relative risk, and odds ratio faster than previous versions. With all these new JMP features, JMP Clinical produces publication-ready reports quickly and effectively.

Hi. Thank you for joining me today. My name is Rebecca Lyzinski, and I'm a senior software developer for JMP Statistical Discovery. Today I'll be talking about how JMP Clinical uses some of the new JMP 17 features, such as Pack, Stack, and Response Screening. First I'll talk a little bit about what JMP Clinical is, then I'll go into what has changed in JMP Clinical 17 compared to previous versions, and then I'll show a demo of JMP Clinical and how it uses the new Tabulate features of Stack, Pack, and Unique IDs, as well as the new Response Screening platform.

First, what is JMP Clinical? JMP Clinical is a JMP product used to analyze clinical trial data. It works with the CDISC standard formats, SDTM and ADaM. Once the data are loaded, JMP Clinical runs interactive reports for events, findings, interventions, and more. JMP Clinical is used in a variety of fields, including by medical doctors, medical writers, clinical operations, and statisticians. In addition, JMP Clinical works with JMP Live to share reports across your organization.

With JMP Clinical 17 there is a big change: JMP Clinical no longer uses SAS as the basis for the code underlying the reports. Starting with JMP Clinical 17, it is built completely on JMP. This means a faster installation, because the installer is more compact than before, and all of the reports have been redesigned using JSL as the underlying code. Another change is that reports now auto-run; there's no longer a need to click a button to get a report to run. JMP Clinical 17 also includes some new reports, including the FDA Medical Queries and the Algorithmic FDA Medical Queries. One additional change is that all the study preferences are now in one location: you only have to go to one place to change a preference, and it will take effect across all of your reports.

Now I'm going to switch over to JMP Clinical for a quick demo. When you first open JMP Clinical, a main window appears with three tabs: one for Studies, one for Reviews, and one for Settings. The Studies tab is where all your study data is located.
Here you'll see that I have the Nicardipine study loaded. You'll see paths for the SDTM and ADaM locations of your data, as well as which domains from those folders have been loaded for the study. This is also where you can add a new study, and you can refresh the study metadata for an existing study: if you add data, add variables, or change variable names, you can refresh the metadata and all those changes will take effect. You can also set study preferences or set the value order and color for a given study from this tab.

Set Study Preferences is new in JMP Clinical 17. It opens a new dialog where you can change any of these widgets, and the new values will take effect across all of your reports. For example, if you didn't want your reports to run on the safety population and instead wanted them to run on all subjects, you can change that here; once you click OK, all your reports will run on all subjects instead of the safety population.

The next tab is for Reviews. When you click Start New Review, the Review Builder opens and you can select which reports you want to see. For this example, I'm going to look at Demographics Distribution, AE Distribution, the AE Risk Report, and the two FDA Medical Query reports. If you want additional reports, you can click Add Report; a new window opens with all the possible reports you can run on this study, and you can make additional selections.

Demographics Distribution is usually a good place to start in any clinical trial. Here there are tables and distributions for each demographic characteristic, such as sex, race, and age. Tabulate is used to create the tables at the top, and you can see that the Counts and Percents are combined into one column using Tabulate's new Pack Columns feature. Underneath is a distribution for each of the demographic characteristics. On the side, there's an option to add additional distributions if there are other characteristics you would like to see: by clicking the Add button, you can add any variable from either the ADSL or DM data set, and it will show up under Distributions. There's also an option to perform treatment comparison analyses. When that button is clicked, the report automatically reruns, and at the bottom of the report there is now a one-way analysis for age and contingency analyses for sex and race. This allows comparisons between treatment groups, to see if there are any differences between them.

Typically, an important safety analysis in any clinical trial is to analyze the adverse events that occur throughout the trial. In Adverse Events Distribution, there's a graph and a table showing the distribution of adverse events across treatment groups.
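To make the packed Count-and-Percent idea concrete outside of Tabulate, here is a small pandas sketch; the CDISC-style variable names (USUBJID, SEX, ARM) and the values are made-up placeholders, and this only approximates what a packed "n (%)" column conveys, not JMP Clinical code.

```python
import pandas as pd

# Hypothetical demographics data with made-up subjects and treatment arms.
adsl = pd.DataFrame({
    "USUBJID": [f"S{i:03d}" for i in range(1, 11)],
    "SEX": ["F", "M", "F", "F", "M", "M", "F", "M", "F", "M"],
    "ARM": ["Placebo"] * 5 + ["Nicardipine"] * 5,
})

# Roughly what a packed "n (%)" Tabulate column conveys: the count and the
# column percentage combined into a single, publication-ready cell per group.
counts = adsl.groupby(["ARM", "SEX"]).size().rename("n").reset_index()
counts["pct"] = counts.groupby("ARM")["n"].transform(lambda s: 100 * s / s.sum())
counts["n (%)"] = counts.apply(lambda r: f"{r['n']} ({r['pct']:.1f}%)", axis=1)
print(counts.pivot(index="SEX", columns="ARM", values="n (%)"))
```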
At the top is a bar chart with the count of adverse events split out by Nicardipine and Placebo, the two treatment groups for the Nicardipine study, shown in descending order for each treatment group. Under the graph is a Tabulate. Here the first column is Body System or Organ Class and Dictionary-Derived Term, two different MedDRA terms used to classify adverse events, and they are stacked on top of each other in the Tabulate. The other columns are Counts and Percents split out by the planned treatment group, as well as a total Count and Percent.

This table uses a lot of the new JMP 17 Tabulate features. The first is Stack Grouping Columns. If you right-click on the column, you can see that the Stack Grouping Columns option is checked. If we uncheck it, the column gets split back out into two separate columns; this is how Tabulate worked for JMP Clinical 8.1 and previous versions, where we had to have two separate columns for the two different variables. By selecting both columns, right-clicking, and choosing Stack Grouping Columns, we can combine them back into one column. This makes the table publication-ready for any PowerPoint or journal article it might be used in.

Somewhat similarly, we have the Count and Percent in one column, which did not exist before. If you right-click on one of these columns, you'll see the new Pack Columns option. If we unpack the columns, they are separated into two columns, one for the Count and one for the Percent. By selecting both columns, right-clicking, and choosing Pack Columns, we can pack them back into one column so that the Count and Percent show up together.

The other option this table uses is visible if you open the control panel from the red triangle: an ID variable has been added that didn't exist before. Here, Unique Subject Identifier has been entered as the ID variable for this table. What that option does is count each subject only once on each row of the table. For example, if a subject had both a vasoconstriction event and a hypertension event, they would only get counted once within Vascular Disorders. Previously, before the ID variable existed, the Vascular Disorders row would have been a sum of all the events underneath it, which may overestimate the number of subjects that had a vascular disorder event. You can also see at the bottom of the table that this option adds a row called All, which represents the number of subjects with any adverse event; that's another nice feature added through the ID variable.

With these three changes, we now have a very nice publication-ready table to paste into whatever Word document or PowerPoint you want to include it in. A couple of other features to mention on this report before moving on: there are some options listed under Data.
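As an illustration of what the ID variable changes, here is a small pandas sketch, not JMP Clinical code; the CDISC-style column names (USUBJID, AEBODSYS, AEDECOD) and the five example events are assumptions made only for the example.

```python
import pandas as pd

# Hypothetical adverse-event records: one row per event.
ae = pd.DataFrame({
    "USUBJID": ["S001", "S001", "S002", "S003", "S003"],
    "AEBODSYS": ["Vascular disorders"] * 3 + ["Cardiac disorders"] * 2,
    "AEDECOD":  ["Vasoconstriction", "Hypertension", "Hypertension",
                 "Bradycardia", "Bradycardia"],
})

# Naive count: sums events, so S001 is counted twice under Vascular disorders.
events_per_soc = ae.groupby("AEBODSYS").size()

# What Tabulate's ID variable does conceptually: count each subject at most
# once per row of the table (here, once per body system organ class).
subjects_per_soc = ae.groupby("AEBODSYS")["USUBJID"].nunique()
any_event = ae["USUBJID"].nunique()        # the "All" row: subjects with any AE

print(events_per_soc, subjects_per_soc, any_event, sep="\n")
```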
For example, if you want to look at a different MedDRA term than the ones presented automatically, you can change it here to Reported Term, High-Level Term, and so on. You can also change the report to run on pre-treatment, on-treatment, or off-treatment events. The Demographic Grouping widget changes the variable on the y-axis of the Graph Builder, as well as the variable used in the Tabulate, to whichever variable is selected from Demographic Grouping. There's also an option to stack both the table and the graph. For example, if you want to see the adverse events split out by severity, you can select Severity; now the bar chart is stacked by mild, moderate, and severe events, and the table is also split into columns for mild, moderate, and severe.

The report also uses a local data filter to filter both the bar chart and the Tabulate. You can filter on things such as whether or not the event is serious, or whether or not the event is related to the study treatment. We can also filter on the overall percent occurrence of the adverse events. For example, if we only want to see adverse events that occur in 5% or more of the population, we can change this filter, and now the bar chart and the table are both filtered down to only the adverse events with at least a 5% occurrence in the population.

Another way to analyze adverse events is through the Risk Report. This report uses the new JMP 17 Response Screening platform to create both a risk plot and a Tabulate. The risk plot shows the percent occurrence of subjects within both treatment groups, Placebo and Nicardipine, and it also shows the risk difference comparing Nicardipine to Placebo, along with a 95% confidence interval. The table repeats this information in tabular form, with columns for the Counts and Percents in each treatment group, as well as a column for the risk difference and the 95% confidence interval.

The Response Screening platform works off a table like this one, where Unique Subject Identifier is the first column and there is an indicator column for each adverse event, with zero representing no event and one representing an event. If we pop out this table, the Response Screening platform is located under Analyze > Screening > Response Screening. It opens a dialog where you can select the variables you want to compare. Because there are 202 different adverse event columns, we've combined them into a group of columns; this allows you to select just one variable and automatically put all 202 columns into the Y, Response role. Using Planned Treatment as our X, we click OK. Response Screening then brings up this window. The default view shows the FDR p-values and a table of those values. JMP Clinical uses the two-by-M results table; this is where Response Screening calculates the relative risk, risk difference, and odds ratio.
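For reference, here is a conceptual sketch of the 2x2 quantities behind the risk report; this is plain Python with made-up counts, not the Response Screening platform's implementation, and the Wald interval is just one common way to form the 95% CI for the risk difference.

```python
import numpy as np

# Made-up 2x2 summary: subjects and events in each treatment group.
n_trt, x_trt = 100, 18      # treated subjects, subjects with the event
n_pbo, x_pbo = 100,  9      # placebo subjects, subjects with the event

p_trt, p_pbo = x_trt / n_trt, x_pbo / n_pbo
risk_diff  = p_trt - p_pbo
rel_risk   = p_trt / p_pbo
odds_ratio = (p_trt / (1 - p_trt)) / (p_pbo / (1 - p_pbo))

# Wald 95% confidence interval for the risk difference.
se = np.sqrt(p_trt * (1 - p_trt) / n_trt + p_pbo * (1 - p_pbo) / n_pbo)
ci = (risk_diff - 1.96 * se, risk_diff + 1.96 * se)
print(f"RD={risk_diff:.3f} (95% CI {ci[0]:.3f}, {ci[1]:.3f}), "
      f"RR={rel_risk:.2f}, OR={odds_ratio:.2f}")
```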
JMP Clinical works by making this table into a data table and then using Graph Builder and Tabulate to format it into the view shown in the report. To get the additional columns needed, you can right-click on the table, go to Columns, and select the different 95% confidence interval variables as well as a total count and the counts for the positive versus negative comparisons. Once Response Screening is run and its results are made into a data table, the bar chart and the Tabulate are created. The Tabulate again uses the Pack Columns option to put the Counts and Percents into one column, but it also uses it to put the risk difference and the 95% confidence interval into one column. If we unpack this group of columns, you can see that it originally started as three different columns; even with three columns, we can still pack them together into one.

If you don't like the format in which they are automatically packed together, you can right-click on the column and go to Pack Columns > Template. Here you can change how the column appears. For example, if you wanted brackets instead of parentheses, you could change that here. You can also change how the columns are delimited: the default is a comma, but you could use a semicolon or any other character to separate your columns.

Similar to the AE Distribution report, this report has a few options. Some that are different are that you can change the risk measurement, so you can look at either the risk difference, the relative risk, or the odds ratio. You can also display the risk difference as either a percent or a proportion, and you can sort the plot and the tables by risk measurement, count, or alphabetically. This report again uses a local data filter to filter both the plot and the table by either a dictionary-derived term, the risk difference, or the absolute risk difference. Here you can see that I filtered the risk difference down to two or greater so that we can see the plot and the table a little more clearly.

Another view of the risk plot and the Response Screening output is the FDA Medical Query Risk Report. This starts out as just the Medical Query Risk Report, and there's an option to analyze it either by FDA Medical Queries or Standardized Medical Queries. Medical queries are a way to group adverse events into different medical conditions, and these are the two different standards. Standardized Medical Queries are created by MedDRA and usually come as SD files. In September 2022, the FDA released their own medical queries as an Excel file that can be downloaded from the web. JMP Clinical handles both standards, and you can switch back and forth on this report by selecting either FDA Medical Queries or Standardized Medical Queries.
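To show the packing-template idea in a neutral way, here is a tiny Python sketch; it only mimics the effect of combining an estimate and its confidence limits into one display string with a configurable template and delimiter, and is not the Tabulate feature itself.

```python
# Illustrative string "packing": combine an estimate and its confidence limits
# into one display column with a configurable format (plain Python, not JMP).
def pack(estimate: float, lower: float, upper: float,
         template: str = "{est:.1f} ({lo:.1f}, {hi:.1f})") -> str:
    return template.format(est=estimate, lo=lower, hi=upper)

print(pack(4.2, 1.1, 7.3))                                    # parentheses, comma
print(pack(4.2, 1.1, 7.3, "{est:.1f} [{lo:.1f}; {hi:.1f}]"))  # brackets, semicolon
```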
Just like on the AE Risk Report, there is a risk plot with the percent occurrence for each treatment group and the risk difference between Nicardipine and Placebo. The difference is that on this report, the risk plot is split out by scope: either a broad medical query or a narrow medical query. Underneath, some custom scripting is used to create tables that stack the medical queries with the preferred terms that contribute to them. Just like in the AE Risk Report, we have columns for the Counts and Percents, as well as a column for the risk difference between Nicardipine and Placebo. Here you can see, for example, that for Arrhythmia the dictionary-derived terms that contribute to that medical query are Atrial Flutter, Atrial Fibrillation, Arrhythmia, Bradycardia, and a few others. Underneath that is the same table for broad medical queries split out by preferred terms, a table for medical queries split out by broad or narrow scope, and a table showing which medical queries are contained in each system organ class; for example, Gastrointestinal Disorders includes Abdominal Pain.

The last report I'm going to show is brand new in JMP Clinical 17.1 and versions beyond that. Within the FDA Medical Query Excel file, there are text boxes describing algorithms for a few of the medical queries. The algorithms include criteria that are not limited to adverse events. For example, a subject could be categorized as having Hyperglycemia if they have an adverse event that falls into the Hyperglycemia FMQ category, but they could also be classified as having Hyperglycemia if, within the lab data set, they have more than two plasma glucose values over 180 milligrams per deciliter. This report uses the adverse event data set, the lab data set, and the concomitant medications data set to determine whether subjects have a given medical query, rather than just mapping adverse events to a medical query. Similar to the other risk reports, this report uses a local data filter that lets you filter on the medical queries, the risk difference, and the absolute risk difference. Again, we have the same options to switch the event type, to choose the risk measurement (risk difference, relative risk, or odds ratio), and to sort the table by risk measurement, count, or alphabetically.

That was a quick overview of some of the JMP Clinical features and how JMP Clinical uses the new JMP 17 features in Tabulate and Response Screening to build our reports. However, JMP Clinical is a much bigger product than just those five reports; we actually have over 30 interactive reports. Some commonly used ones that I didn't mention are the Adverse Event Narratives, the Patient Profiles, a Study Flow Diagram (like the figure below) that shows how subjects progress through the study, and the ability to analyze Hy's Law cases. JMP Clinical also works with JMP Live.
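As a rough illustration of the algorithmic rule just described, here is a pandas sketch; the column names (USUBJID, FMQ, LBTESTCD, LBSTRESN), the example records, and the way the data sets are combined are assumptions for the example, not JMP Clinical's implementation.

```python
import pandas as pd

# Hypothetical rule: flag a subject for Hyperglycemia if they have an AE in the
# Hyperglycemia FMQ, or if the lab data show more than two plasma glucose
# results above 180 mg/dL.
ae = pd.DataFrame({"USUBJID": ["S001", "S002"],
                   "FMQ": ["Hyperglycemia", "Headache"]})
lb = pd.DataFrame({"USUBJID": ["S002"] * 3 + ["S003"] * 2,
                   "LBTESTCD": ["GLUC"] * 5,
                   "LBSTRESN": [190, 205, 186, 150, 200]})   # mg/dL

ae_flag = set(ae.loc[ae["FMQ"] == "Hyperglycemia", "USUBJID"])
high_gluc = (lb[(lb["LBTESTCD"] == "GLUC") & (lb["LBSTRESN"] > 180)]
             .groupby("USUBJID").size())
lab_flag = set(high_gluc[high_gluc > 2].index)

flagged = sorted(ae_flag | lab_flag)
print(flagged)   # S001 via the adverse event, S002 via three high glucose values
```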
At the top of each report there's a button that, when clicked, publishes and shares the report across your organization. There are also more features coming in 17.1 and future versions, such as support for analyzing crossover studies, and even more reports being added, including a couple of oncology reports. Thank you so much for your time. I would appreciate any comments or feedback if you want to leave them here or email me directly. Again, thank you for your time, and I hope you have a wonderful day.
ABSTRACT Stress and lameness negatively affect the health, production, and welfare of animals. The following physiological and non-invasive measures of stress and lameness were measured: core body temperature, corticosterone (CORT) concentrations in serum and feathers, surface temperatures of the head (eye and beak) and leg (hock, shank, and foot) regions by infrared thermography (IRT), and leg blood oxygen saturation (leg O2). The JMP Pro 17 Model Screening platform was used to fit several parametric and machine learning models to the binary response variable (Lame = 1) of the 256 study birds on the nine health and stress indicators mentioned above. We selected K-fold cross-validation with K = 5 and repeated the process twice (N Trials x Folds = 10). The best models were the Neural Boosted (mean AUC = .985 and zero misclassifications in 50 validation birds) and the Generalized Regression Lasso (mean AUC = .975 and 3 misclassifications in 50 validation birds). The Stepwise Logistic Regression, the most interesting model for explaining BCO, required only seven of the nine indicators and had overall fit performance similar to the other two. Both regression models agreed on the significant predictor effects, and applying Model Comparison to compare them further (by their AUCs) found no significant difference between them.

Hello, everyone, and welcome to our presentation. My name is Dr. Andy Mauromoustakos. I'm a professor at the Agricultural Statistics Lab at the University of Arkansas. My co-presenter is Dr. Shawna Weimer. She is an Assistant Professor in the Poultry Science Department at the University of Arkansas and the Animal Welfare chairperson at the university. We're going to talk to you today about predictive models for BCO lameness using health, stress, and leg health parameters in broilers. Our presentation is going to be short: we will discuss the models we fit and the champion models (the ones that get the medals), evaluate them, and close with some conclusions. Shawna?

All right. This study compared physiological and non-invasive measures of stress and lameness in clinically healthy broilers and broilers lame with BCO, or bacterial chondronecrosis with osteomyelitis. BCO is a leading cause of infectious lameness in broiler chickens, and flock diagnosis requires euthanasia. Thus, there is a need for technological innovations that detect health and lameness status in animals and project disease likelihood prior to clinical onset. In this study, birds were raised in separate environmental chambers with either wood shavings (litter) on the floor or a wire-flooring model that is validated to induce BCO lameness. Nine non-invasive measures of stress and lameness were collected from 256 male broilers over several weeks, including core body temperature, the stress hormone, and surface temperatures of the head (the eye and the beak) and the legs (the hock, the shank, and the foot) by infrared thermography. Leg blood oxygen saturation was also measured with a pulse oximeter. Of these measures, we sought to validate two. The first was the extraction of corticosterone from feathers.
Corticosterone is the primary stress hormone, and the gold standard measure is the blood serum concentration, which requires capturing and restraining the bird to collect it. So if feather corticosterone could be validated, we could simply clip a feather and not put the bird through the stress of restraint and a blood draw. The second was the thermal images, in which each pixel has its own recorded temperature and can be used to quantify external changes in skin temperature related to blood flow, offering a non-invasive tool to measure health and welfare. During stress, peripheral blood is shunted to the core, and we expected the average pixel temperatures of the eye and the beak (Eavg and Bavg) to be lower in lame than in sound birds, which we correlated with serum corticosterone. For the thermal images of the legs, we expected the average pixel temperatures of the hock, the shank, and the foot to be lower in lame birds, both for the stress reasons and because bacterial colonization slows the blood flow, which we correlated with the leg blood oxygen saturation. There were marked differences between lame and sound birds.

Our objective is to identify which of these nine health and stress indicators are important for lameness. We want to build models for prediction purposes, but we are in agriculture, and we like to publish papers that try to explain how our inputs affect the response, so we are hoping that some of the traditional regression models will do fairly well and that we can interpret those.

On methods: this is a balanced study, a matched-pair experiment in which every time a sound bird was observed, a lame bird was also observed on the same indicators, so the disease incidence is 0.5. We take advantage of the Model Screening platform in JMP Pro 17. We select Lame, a categorical variable coded 1 for lame and 0 for sound, as the response, and we have our nine predictors. We use the defaults and fit all of the machine learning models that are checked. We select cross-validation with five folds; with our approximately 250 birds, that gives about 50 birds per fold, and we repeat the process twice. We set the random seed so we can reproduce the results. Notice that I did not add quadratic or interaction terms, hoping for easier interpretation.

When we select this and click Run, JMP takes about a minute and comes up with the ranking of the models we fit. This is what we call the beginning of the end. We have ten different validation sets that were tried. These are the average fit criteria (the higher the RSquare, the better) and the ranking of the best, second best, and third best models.
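As an aside, here is a conceptual analogue of that validation scheme outside JMP; it is not the Model Screening platform, and the simulated data, the logistic model, and the seed are stand-ins chosen only to show 5-fold cross-validation repeated twice with AUC as the score.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Simulated stand-in for the 256 birds and nine indicators (balanced classes).
X, y = make_classification(n_samples=256, n_features=9, n_informative=7,
                           weights=[0.5, 0.5], random_state=123)

# 5-fold cross-validation repeated twice, with a fixed seed, scored by AUC.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=123)
auc = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                      cv=cv, scoring="roc_auc")
print(f"mean AUC over {len(auc)} validation folds: {auc.mean():.3f}")
```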
We expected that the Neural Boosted model might do a little better than the traditional regression models, such as the penalized regression and logistic regression models, but we were happy to see that those were a close second.

If we look at how our best model did, the Neural Boosted, which is the best model for prediction purposes, had a misclassification of zero. You can see, both in training and in validation, that the receiver operating characteristic curve rises to 1 very quickly and stays at 1, which is extremely good. The confusion matrices show that, of the roughly 49 validation birds, none are misclassified, so this is a very good model with a very high generalized RSquare and strong values on all the other fit criteria produced here.

But we are more interested in the regression type of models. For the Generalized Regression with Lasso as the estimation method, the model included all of the variables, and there are a couple of non-significant variables in it. You can see these non-significant factors, such as FCORT, which does not cross the horizontal line, and the eye average (Eavg), which is not significant but is included. This model had approximately three misclassifications, a misclassification rate of about 7%.

Here is the third model. If we had to give gold, silver, and bronze, these two would share second place. The Logistic Regression model, fit with the stepwise procedure, chose not to include the two highly non-significant factors. In the Logistic Regression model, SCORT is the most important variable. This model has a similar misclassification of three birds, the same as the Generalized Regression.

Here is how the seven indicator variables are used to predict the probability of lameness. What we like about the regression type of models is that we get odds ratios that help us interpret the results. For example, for our most important variable, the odds of lameness roughly double with a one-unit increase in serum corticosterone.

Overall, the Logistic Regression model used only seven of the nine indicators, while the Neural Boosted model and the Generalized Regression both used all nine indicators for lameness. All of the models have an area under the curve greater than 0.9. The regression type of models had the 7% misclassification on the validation set, and the Neural Boosted had none. We can compare our three winning models using the Model Comparison platform in JMP. In terms of strictly predicting, the Neural Boosted model is significantly better than both the Generalized Regression and the Logistic Regression models.
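Going back to the odds-ratio interpretation mentioned above, here is a tiny numeric sketch; the coefficient value is hypothetical (chosen to give an odds ratio of 2), not the fitted estimate from the talk, and it only shows how a logistic coefficient converts to an odds ratio and how that shifts a predicted probability.

```python
import numpy as np

# In a logistic regression, exponentiating a coefficient gives the odds ratio
# for a one-unit increase in that predictor (other predictors held fixed).
beta_scort = np.log(2.0)           # hypothetical coefficient: ln(2) gives OR = 2
odds_ratio = np.exp(beta_scort)
print(odds_ratio)                  # 2.0: the odds of lameness double per unit SCORT

# For a bird with current lameness probability p, the one-unit increase
# multiplies the odds p/(1-p) by odds_ratio.
p = 0.30
new_odds = (p / (1 - p)) * odds_ratio
print(new_odds / (1 + new_odds))   # updated probability, roughly 0.46
```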
The two regression models, the one with all nine variables and the one with seven, are not significantly different from each other: they had very similar areas under the curve and the same misclassification of three birds. That is our presentation. We'd like to thank you for your attention. We have listed some references where you can find related material on the techniques we used, including the JMP documentation, which will help you through this process. Thank you for your attention.