Postpartum hemorrhage (PPH) is a major cause of maternal death in low-resource countries, accounting for 661,000 deaths worldwide between 2003 and 2009. To assess this burden, the WHO conducted studies to find methods for the prevention and treatment of PPH. Three large clinical trials were conducted in the past two decades, collecting blood loss volume data (V) for more than 70,000 deliveries. The outcomes were PPH (V > 500 mL) and severe PPH (V > 1000 mL), and the parameters under comparison were the proportions of these events. Comparing such small proportions led to very large trial sizes (20,000 to 30,000). Using data from the large trials, the Survival platform in JMP Pro showed clearly that the distribution of V is very close to the lognormal distribution. This finding improved the efficiency of the estimates of probabilities and relative risks and permitted a substantial reduction of the sample size needed for treatment comparisons (typically fewer than 4,000), compared with the size required by the binomial outcome. Quicker and less expensive trials are very welcome, as they speed up obtaining results, and they have become common practice.

Hello. I am Jose Carvalho, a statistician at Statistic Consulting in Campinas, Brazil. Thank you for the opportunity to show an application of JMP to clinical trials in which a major improvement came from a statistical discovery. As a result of that discovery, one trial ended with the expected and very much desired results, and subsequent trials on the same syndrome will be much cheaper and faster.

The problem is bleeding after birth, or postpartum haemorrhage, PPH for short. PPH accounts for 125,000 deaths per year. Even in developed countries like the United States, it is the cause of 11% of maternal deaths. PPH is defined, just for classification, as blood loss in excess of 500 mL in the 24 hours after delivery; if the volume exceeds 1,000 mL, it is severe PPH. It is interesting to know the main cause of PPH: 90% of cases are due to uterine atony, a failure of the uterus to contract after the delivery. If the uterus fails to contract, the bleeding continues. We can treat that by giving drugs to contract the uterus or by some physical action. The remaining causes are trauma, retained placental tissue, and failure of the coagulation system. We will be dealing with uterine atony and its prevention.

PPH can pose a serious threat to a woman's life and health. Its onset must be quickly diagnosed during the delivery and treated. Treatments include, as I said, drug treatment with additional uterotonics and, as a last resort, artery ligation or hysterectomy, the removal of the uterus. New drugs and devices are being developed to prevent PPH, and every one of them must be tested in clinical trials before it is allowed for use in actual deliveries.

We have data on three very large trials. The first and oldest, published in 2001, was the Misoprostol trial; misoprostol is a drug that was compared with the standard treatment, and the trial enrolled 18,000 women. The second, published in 2012, was Active Management: not a drug, but a physical procedure of pulling the umbilical cord.
Now, misoprostol did not prove to be as effective as the standard drug of treatment, which is oxytocin, and Active Management did not show any improvement on PPH either. Here we are going to deal with the Carbetocin trial, published in 2018, the largest of all, which enrolled 29,000 women. In all these trials, the primary outcomes were severe PPH (sPPH) and/or PPH. To diagnose sPPH and PPH we need to know the blood volume, and the observations were indeed volumes, numbers in mL; but only the indicators of sPPH and PPH were considered in the statistical analyses, that is, binomial variables: zero or one, yes or no. This in spite of having the full information about the blood volume.

Before we proceed, a small explanation about the two drugs we will be dealing with. The standard drug used in deliveries is oxytocin. It is given routinely at every delivery, in every part of the world: as soon as the baby is delivered, the woman receives a shot of oxytocin. It is a standard procedure. Oxytocin is very good: it reduces the severe PPH rate from 3.84% to 2%, so it helps the incidence of sPPH. But there is a problem: it is a heat-labile substance and must be kept in a cold chain at seven degrees Celsius at all times. In countries with low resources this can be a problem. If you do not keep it in that cold-chain logistic, the drug loses its efficacy; sometimes you may be applying a drug that is not effective at all.

Carbetocin is a new drug with the same active principle as oxytocin and just a change in the excipients that makes it heat-stable. Carbetocin can be kept for six months at 30 degrees Celsius, which is about room temperature in most places in the world. So there were very high hopes that carbetocin would be a good replacement for oxytocin, above all for use in those low-resource countries.

A clinical trial was devised for PPH; it was run by the WHO and it was a non-inferiority trial. The parameters for this trial are in the objective: the investigators decided that, to declare carbetocin non-inferior to oxytocin, it should preserve 75% of the benefit. The benefit is 3.84% minus 2%, which gives a non-inferiority margin of 0.46%; we are talking about very low rates and a relative risk of 1.23. Carbetocin would be declared non-inferior to oxytocin if the trial could provide evidence that the relative risk is less than 1.23. This resulted in an amazing computation: a sample size of over 30,000 people. We ended up with a trial of about 29,000 women, spread over several countries and many centres, as we saw in that table before. It was a very expensive trial; data collection alone took almost two years. It is a very serious undertaking.

Why are the trials so large? Well, the obvious answer is that the proportions being compared are small, and the effects are necessarily even smaller.
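In numbers, the margin arithmetic just described works out as follows (simply restating the figures above):

$$\text{benefit} = 3.84\% - 2\% = 1.84\%, \qquad \text{margin} = 0.25 \times 1.84\% \approx 0.46\%, \qquad \text{RR margin} = \frac{2\% + 0.46\%}{2\%} = 1.23.$$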
Less obvious, but still true, is that the trial needs to be so large because we are losing a great deal of information by mapping V, the volume, into two categories. On this histogram we have the actual distribution of the blood loss volume for the 29,000 subjects of the trial, and then the cut-off point at 1,000. Just imagine, looking at the histogram, how much information is lost by collapsing all the detail of those frequencies into zero or one: left of the 1,000 line, right of the 1,000 line. But that is the way it was done, because for some reason people like this dichotomization: if it is over 1,000 it is severe PPH, if not it is not. I do not even know whether that cut-off is closely associated with any further consequence for the health of the women; but that is how it is done, that is the classification.

Now, JMP helped us to discover that the distribution of the blood loss volume is lognormal, and there is a story behind it. We set out to analyze the experiment as decided by the investigators, using the binomial distribution. But we saw very easily that the two blood loss volume distributions, carbetocin and oxytocin, were pretty much the same. We were not very happy with the dichotomization to begin with, but we had to do it; that is what the protocol said. Then we, the statisticians of the trial, found beyond any doubt that the distribution was lognormal. When I say the distribution of blood loss volume is lognormal, I mean a big "it is": it is not an approximation, a nice fit, the kind of thing we statisticians like. No; we had 29,000 points, and the fit you are going to see was perfect.

Then we did some homework and found, from physics, that the volume of fluid flowing in pipes has a lognormal distribution, and that has been known since the 19th century. Of course, we realized that our pipes are blood vessels, so they are elastic, and the viscosity of the blood changes because of coagulation. But still, we have a sort of model: fluid flowing in pipes, and the data showed it. We were very excited about that. We went further to see the consequences of using V for the estimation of the risk, and we got nice results.

Then we had to convince the investigators. Such a large trial has lots of investigators, big shots. The physicians own the problem, so they have the last word on everything, and they frowned at the idea. Some of them really did not like it. They said, "Well, we use no hypothesis, since it is just a binomial variable; it has no model." It has one, but they think it does not; people think it is too simple. And what if the lognormal distribution is not correct? We could get wrong results. So we did exactly what we are going to do right here: we did the analysis in front of them, and with JMP that was very compelling, and I hope you will agree. JMP also helped to communicate the discovery to the investigators in a very compelling way.
Just to anticipate the result: using the lognormal distribution saved the results of the experiment. That is part of the story. We went on to publish those results after the publication of the experiment itself, because the experiment had failed; you will see that it is a nice story. We published the lognormal results as a secondary analysis, and that touched the hearts of the European authorities, the EMA. Right now, carbetocin is very happily being used in low-resource countries, where it is needed, and we are very pleased with that.

Let me show you how it went. First of all, the measurement. You see on the left a sort of collector used to collect the blood; it is used in many deliveries. As I told you, sometimes you have to take very fast action when the woman is bleeding too much. People can evaluate the blood loss just by looking at the stain on the bed or the floor, but in many cases they use this collector. The collector has a scale, which I have enlarged on the right. In the first two trials, the blood loss volume was evaluated with that scale. Then it was changed, because it was not good enough, not precise enough, for our experiments, the three of them that have been running for about 20 years now.

Let me show you how it goes with JMP; I feel more comfortable in JMP. Here is a data table with all 71,000 cases from the three trials: Misoprostol, Active Management, and Carbetocin. Let's see the distribution of the blood loss volume for the three of them, by trial, not by treatment; the difference by treatment is so small that it will not matter for this short demonstration, and I am not analyzing the experiment yet. Here is the distribution for Misoprostol. You see that it is a very nice lognormal, isn't it? It could be something else, but it is lognormal. It looks like a nice distribution, but it has problems; it is hiding them, actually. Not problems for fitting a lognormal, but for analyzing the data the way it was analyzed, with the binomial variables.

Let's use the grabber tool and change the bins of the histogram, make them thinner. There we go. What do we see? We see spikes in the distribution, at regular intervals; you can see them here. Let me adjust a little. Now you might say there is no problem: it is like numerical integration, you lose on one bin, you gain on the next, they alternate, and you end up with a nice integral. Well, that is not the case here, because we have a cut-off at 1,000. Let me zoom in on the distribution around 1,000, which matters most for us. See, here is the spike at 1,000. But part of this frequency comes from the left, from the 900s: because the reading of that scale was rough, people tended to round the numbers, so there is a sort of digit preference here. It is very clear that some of the non-cases of severe PPH were moved into severe PPH, and it is no trivial quantity compared with the small frequency here.
That means that, in spite of having "no model," as my colleague said, for the binomial variables, we probably have a positive bias in this estimation. This problem was taken care of by weighing the collector device before the procedure and then weighing it again afterwards. That was done starting with the carbetocin trial, and only for the carbetocin trial. If we apply the same trick here and change the bins, now you see a nice distribution, with no spikes anymore; weighing solved that problem.

Now, let me tell you, this collector is not for the experiment; it is for actual clinical use. The evaluation of the blood loss, and of how fast it is accumulating, during the delivery is perfect with that scale. We cannot remove it and weigh it later to decide whether you have to go to, say, a hysterectomy or something like that. It is still in place and still used like that; we only changed it for the trial, where we weigh at the end. That is just a curiosity, but an interesting one, and it also came from how easily we can do this sort of analysis with JMP; that is more important than we might think.

Now let's go to the real problem, which is also easy with JMP. I am going to analyze the results of the carbetocin trial, but, so that I do not get mixed up in front of you, I prepared a subset data table with just the carbetocin trial. Here it is, 29,000 cases only, a subset of that other table. Let me take the opportunity to tell you what data I have here. Of course, this is not the full data of the trial; in clinical trials you collect hundreds of columns of [inaudible 00:20:56], for many reasons, for controlling and so on and so forth. Here we have just the center, because the experiment was randomized by center, so I have to keep it. Then the arm: it is coded one and two here, but since the trial is over, of course, I also have it open as treatment and control. Then the volume; that is all the data we need. These two columns here are derivations, the sPPH indicator and the PPH indicator, so they are very easy to make: just an indicator of [inaudible 00:21:48] PPH in this case.

Let's start by analyzing the way the protocol says, perhaps in a simplified way, not doing the complete analysis, but analyzing the sPPH response. I have not said this yet: in the actual trial analysis we came to a relative risk of 1.26, and the maximum, as I told you, for non-inferiority was 1.23. So it was a near-miss situation. We could not declare non-inferiority, and if you go to the publication of the experiment (you can find the reference on the last slide), we had to publish that we did not prove non-inferiority, much to our regret. Let's go and reproduce that here, as a sort of showing off for JMP. All we need now is Fit Y by X; it is so simple after all that work. We have treatment for X, and we use block for centers, just to respect the randomization. And there we can explore the results.
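As a minimal sketch (not the trial's actual script), that launch looks roughly like this in JSL. The column names are assumed from the subset table just described, and the Block role for center and the Relative Risk option are requested in the launch dialog and the red-triangle menu:

// Minimal sketch of the binomial-endpoint analysis (column names assumed)
dt = Current Data Table();
dt << Contingency(
	Y( :sPPH ),       // 0/1 indicator of severe PPH
	X( :treatment )   // carbetocin vs oxytocin
);
// Center is added through the Block role; Relative Risk comes from the red-triangle menu.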
But I am looking just for the relative risk, which is one item on the menu here: Relative Risk. The category 1 is our response, and treatment must go in the numerator; that is our choice. There we go. We have down here 1.255; that is essentially the 1.26 we got with those nicer models, random effects for center and things like that. So it is about 1.26, a near-miss situation, and we did not prove non-inferiority.

Instead of just weeping over the results, we went on and tried an analysis that was not planned; we later published it as a secondary analysis. Let's analyze the distribution of V. To do that, I am not going to the Distribution platform. Rather, I am going to use Reliability and Survival, Life Distribution, because it is a much richer platform for studying distributions; the only restriction is that the column must be non-negative, which is the case for volume. So I can use volume in place of a time here. I do not need censoring or anything like that; there is no such thing here. It is just a tool for fitting distributions.

Now let's get down to business. I have the distributions of both treatment and control, that is, carbetocin and oxytocin, so let's separate them. You can do that with a local data filter on treatment; I will choose treatment here, that is, carbetocin. On the right we have the data points, those black dots. There are so many of them, 15,000 treated with carbetocin, that they look like a continuous line, but those are the points. The ones in blue are nonparametric estimates [inaudible 00:25:41]; they are the same as the binomial estimates, pointwise, because there is no censoring.

Then, where is the lognormal? There is no lognormal in the menu of distributions. That is because there are zeros in the data, so we cannot fit a two-parameter lognormal. Some women were lucky enough to have [inaudible 00:26:06] zero millilitres of blood loss; probably that was some mistake. And there were women who went almost to 4,000 mL in the control arm, probably in shock. This whole span is what the binomial variable was collapsing into just two values. OK, let's fit the threshold lognormal, the lognormal with a shift, so that we can accommodate the zeros. Now we have three lines here; the red one is the threshold lognormal. All three are there, hiding one another. Then people may say, "Well, OK, the fit is very nice, perfect." It is not always like that: if I fit a normal, or a smallest extreme value, things like that, you can see how they come out; but there is no need for that here.

We can find the risk in several places in this result. The risk is one minus 0.985, and if you do not want to do that subtraction, we can show the survival curve: the risk at 1,000 is 1.47% for carbetocin. If I look at the risk at 1,000 for oxytocin, it is again the same, 1.47%. Wow. We also have confidence intervals here. People will challenge us and say, "Those distributions only look the same because of the scale of the graph." Well, let's take up that challenge.
Let's zoom in around 1,000, just because that is what we care about. Look how close the fit is; it is very close. I can go even further, like this, and now we can see even more: the point estimates, these black dots, are almost the same as the red line, which is the lognormal fit. My fellow investigators could see that; I did not need formulas or a table. A table would not say anything; they could even (I do not know) think the statisticians were cheating. This is the easy way to show it.

But there is more to see here. If you look at the confidence interval for the lognormal distribution, it is about one third of that of the nonparametric estimate. Since precision increases with the square root of the experiment size, we can guess that with a trial one ninth of this size I would get, for the lognormal, the same confidence interval that I get here for the nonparametric estimate. That is interesting: instead of using 30,000 women, essentially I could use 3,000 and get this result. That was very good for the investigators; they had planned on the large trial, and this reduction in the [inaudible 00:29:55] of the confidence interval came from the lognormal, which was not planned.

So there is something else to address. Well, OK, you are doing fine for the risk: you get the risk from the lognormal, which is the same as the binomial rate, and you have a tighter confidence interval, provided the lognormal assumption holds, and it does. Now, what about the relative risk? We could take the logarithm of V; that has a normal distribution, so we have the standard apparatus to do some regressions and find the relative risk. But I remember John Sall talking at this same meeting last year. His talk had a nice title, "Delicate Brute Force." Let's use the same thing, delicate brute force; if it is good for John Sall, it is good for us too.

Here are the estimated parameters of the lognormal that we get. If we take a bootstrap sample of these, we can compute the risk for each sample, so we have a bootstrap sample for the risk. We can do that for carbetocin and for oxytocin, and that is good. Then you say, "Well, I have to program this, I have to program the bootstrap," and it is not difficult, but you have to program it, and then you have to compute the lognormal fit 1,000 times, 2,000 times, whatever. But no, JMP is nice twice: if you right-click on this table, you have Bootstrap on the menu. The suggestion is to take 2,500 samples; we could take 5,000 or whatever, but it takes a long time. We did it with 1,000 samples and were very happy with that. It takes about 10 minutes for each of carbetocin and oxytocin; I am not going to make you wait 10 minutes, and I did not want to wait longer either, so we did it beforehand. Here is the bootstrap sample for the control, that is, oxytocin. The output is the parameters. The first row is the actual result of the experiment, and all the rest are the 1,000 bootstrap samples; that is why we have 1,001 rows here.
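In JSL terms, the risk column described next amounts to a formula over the bootstrapped parameters, roughly like this (a sketch; the parameter column names Location, Scale, and Threshold are assumptions about the bootstrap output table):

// Add the tail probability P(V > 1000) as a formula column (names assumed)
boot = Current Data Table();
boot << New Column( "risk",
	Formula(
		1 - Normal Distribution(
			(Log( 1000 - :Threshold ) - :Location) / :Scale
		)
	)
);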
This column here came from the parameters; it is just the risk estimate: one minus the lognormal distribution at the point 1,000, given the threshold, location, and scale. Fine, easy. And here is the same thing for carbetocin. Now I use a result that I read in the book by [inaudible 00:33:18], the man who knows everything about the bootstrap: to get a bootstrap sample of the relative risk, all I have to do is take those two bootstrap samples and join the tables row-wise. That is a Mickey Mouse operation for JMP, like everything we do with tables. Here is the result; I kept just the risk columns, one for carbetocin and one for oxytocin. If you do not want to use that extra first point, we can exclude it and use just the bootstrap samples. We compute the relative risk, just the quotient of those two columns, and we are done.

Take the distribution of this bootstrap sample of relative risks, and here we are, almost ready to celebrate. Here is the distribution. You do not see 1.23 here... yes, you do, but what we need now is a one-sided confidence interval with 95% coverage, so I need the 95% quantile, which is not shown by default. OK, so we kindly ask JMP to compute it: you go to Display Options, Custom Quantiles, and request the 0.95 quantile, which turns out to be 1.11. We even get a bonus result, a confidence interval for that quantile estimate. If you want to be really safe, you can use the upper confidence limit of that quantile (too involved to say out loud); anyway, it is far below 1.23. So we have shown, in some sense, we have produced evidence, that carbetocin is non-inferior to oxytocin. That is the result we published. As I told you, that publication, with some work by the investigators, warmed the hearts of the EMA, the European authority overseeing this trial, and carbetocin is now being used in places where no cold chain is assured.

Let me use your time, if I may, to show the efficiency we gain. Let me go back to the presentation and look at the relative efficiency of the binomial versus the lognormal. Take, not non-inferiority, but the simpler problem of testing the superiority of a new drug over oxytocin. The new drug would be declared superior if its risk of sPPH is less than 1.5%, compared with the 2% of oxytocin. With that, we have all we need for a binomial test. For the lognormal test, we need to convert these proportions into means. Let's do it. For the [inaudible 00:37:15] you have this; for the lognormal, we just do this: we want the risk, the probability of being larger than 1,000, so we take logs on both sides, subtract, and standardize, which gives a standard normal variable. And for s, the standard deviation, we use 0.7. In every inference we did with those three trials, and a few more for which we have data, the standard deviation came out at about 0.7 every time; we jokingly call it the universal constant of PPH.
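Writing that conversion out (a reconstruction of the arithmetic being described, using the 1,000 mL cut-off and s = 0.7; it yields the difference quoted in the next step):

$$\Pr(V > 1000) = p \;\Longrightarrow\; \frac{\ln 1000 - \mu}{0.7} = z_{1-p} \;\Longrightarrow\; \mu_p = \ln 1000 - 0.7\, z_{1-p},$$
$$\mu_{2\%} = 6.908 - 0.7(2.054) \approx 5.470, \qquad \mu_{1.5\%} = 6.908 - 0.7(2.170) \approx 5.389,$$
$$\Delta\mu = 0.7\,(2.170 - 2.054) \approx 0.0814.$$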
The standard deviation of the logarithm of the blood loss volume is 0.7, so we replace s by 0.7, compute the quantiles at 2% and at 1.5%, solve the equations for the means, and we have two means to compare with a normal distribution. The difference is 0.0814.

OK, now we go back to JMP. I feel almost ashamed of showing this, but it is fun, and we did it for our medical team there and it was very compelling, as I told you. Sorry about all the windows open here. It is very simple: you go to DOE, Sample Size Explorers, Power; it is Mickey Mouse stuff. Let's compute the sample size for two independent sample proportions. We have a one-sided test, since it is a superiority test. The proportion under the null is 2%, and under the alternative it is 1.5%. We want 80% power, and the sample size comes out at about 17,000. So, 17,000 to do it the old-fashioned way.

Now let's compute the sample size for the lognormal, using Sample Size Explorers from DOE: Power, then Power for Two Independent Sample Means. We have a one-sided test, we enter the standard deviations, which are 0.7 for both groups, the difference to detect, which we computed as 0.0814, and we ask for 80% power. We come to the result: the sample size, the experiment size, is 1,831. That is about one ninth of the 17,000 we had computed for the binomial [inaudible 00:40:31], which is what we anticipated by just inspecting the widths of the confidence intervals in the reliability platform. That is how much more efficient using the lognormal is compared with the binomial [inaudible 00:40:48].

Just to finish, the wrap-up: the lognormal distribution fits the blood loss volume distribution very well, so why not use it? Using this fact, the estimates of the risks are much more precise. We even showed that our big trial was saved, in some sense, by demonstrating the non-inferiority of carbetocin using the lognormal. And we are very happy to tell you that a new trial is already underway using the lognormal. This trial would not have come to life otherwise, because we do not have money for 30,000 people; but since we need fewer than 4,000, that made the trial possible. It is underway now, and it is for treatment, not for prevention like the others. That is what I had to tell you. Thank you very much.
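(For reference, both Explorer results above can be checked against the standard one-sided formulas; this is a back-of-the-envelope reconstruction, assuming alpha = 0.05 and 80% power, so z_0.95 = 1.645 and z_0.80 = 0.842:

$$n_{\text{binomial}} \approx \frac{(1.645 + 0.842)^2\,[\,0.02(0.98) + 0.015(0.985)\,]}{(0.005)^2} \approx 8{,}500 \text{ per arm} \;(\approx 17{,}000 \text{ in total}),$$
$$n_{\text{lognormal}} \approx \frac{2(0.7)^2 (1.645 + 0.842)^2}{(0.0814)^2} \approx 915 \text{ per arm} \;(\approx 1{,}830 \text{ in total}),$$

about one ninth of the binomial size, in line with the 17,000 and 1,831 reported by the Explorers.)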
What if you could save time in your process of collecting data, cleaning it, and readying it to begin your analysis? Accessing data and getting it prepared for review is often the most time-consuming part of creating a new data analysis or project. With that in mind, we would like to introduce the Workflow Builder in JMP 17. With this exciting new feature, JMP users can now record their entire process from beginning to end, starting with accessing data from multiple sources. Working with the action recorder (added in JMP 16 to track steps and provide scripts that can be saved and reused), Workflow Builder tracks all your changes in data prep and cleanup, data analysis, and reporting. In this presentation, we will show how to operate the Workflow Builder, save each action, and then replay and share the steps in a polished report. This is sure to become your new favorite feature in JMP 17. No manual clean-up means extra time in your day!

Hi, thanks for joining us today. I'm Mandy Chambers, and I'm a principal test engineer in the JMP development group. I'm going to talk to you today about a brand-new feature for JMP 17, the Workflow Builder: how to navigate a data workflow, and the idea that the Workflow Builder sort of grants all your wishes for data clean-up. What is the Workflow Builder? It's a new utility that records your data preparation, clean-up, and analysis steps, and it makes it easy to create a workflow that you can use over and over again.

During the Early Adopter (EA) cycle, we did a survey where we asked our users how they might be interested in using the Workflow Builder. We got a smattering of answers, but a couple of the top suggestions were sharing work with others who need to do the same actions, reusing the entire sequence of steps again and again, and taking those steps and applying them to new data. Some people said they would use it for archiving their work and documenting past work so they wouldn't forget how they did things. There were lots of other answers in there too, such as educational purposes and teaching JSL. Today I'm going to introduce you to the Workflow Builder and then show you several sample workflows that I created to demonstrate some of these actions.

Let's get started, and let me share a couple of example workflows. I'm going to open up this workflow and just show you what it looks like. This is one I created using Big Class Families; I just made value labels and ran a Graph Builder. Instead of running this one, though, we're going to create one from scratch. The way you open the Workflow Builder is you go to File, New, and select New Workflow, and you'll see that the workflow opens up completely empty; there's nothing in it. In order to activate the workflow, you need an action, such as opening a table or importing your data. We're going to start by opening... The steps will go in here as you open things.
This is a recording log history that is really fed from the enhanced log; the script that is built as you run JMP is fed into here. If I leave this open, which I will today, I'll have different workflows up, and the history of what I'm doing all day will be captured in this lower part of the Workflow Builder.

Let's go over and open up Big Class Families. I want you to notice that as I open it, a little prompt pops up in your window that says, "Hey, do you want to start recording?" There's an option here you can check to say, "Don't ask me this again," but I want to be asked, because I want to know what's happening, so I'm going to leave it the way it is and say yes. You'll see that here's Big Class Families; notice our new pet column out here that we've added for JMP 17. This has now recorded the step over here that opened Big Class Families, and it has also put it down here in the history. The button up here that is recording is the very first button. I don't know if you noticed, but at first it was solid red, and now that I'm recording it's hollowed out; it's white in the middle.

Let's do a couple of things that we would normally do when we open a JMP data table. I'm going to go into this column and recode it; I'm just going to make it title case, because that's something simple and I don't want an extra column, so I'll say recode this column in place, in the name column, and then say Recode. You can see right here that it's been added. Then I want to go to this column, go to Column Properties, and add my value labels. If I can type, I'll add some labels here; you can see the labels have changed, and there's the step for it there. Then I'm just going to grab a Graph Builder that's out here and run it. There's the Graph Builder.

Now, you'll notice that when I ran the Graph Builder, it did not get recorded in the Workflow Builder steps, nor did it get put down here. Platforms do not get recorded until you close them; then they're recorded as actions. We also had feedback during the cycle that it would be better to be able to save them earlier if you wanted to, so under the platform's red triangle menu you can go to Save Script and choose To Workflow to add it here. I'm not going to do it that way right now; I'm just going to close Graph Builder, and you can see it's been added. I'm done with the workflow I wanted to demonstrate, so I'm going to stop recording. The second button resets everything: it closes everything up and resets the workflow so I can rerun it, which I will do.

Before I do that, I want to talk about a couple of other places where you can capture your actions. The enhanced log, the log file here, is another place where you can get actions. As I said, all of this has been tracked in here: we opened the table, we created a recode.
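To give a feel for what the recorder captures, steps like these boil down to JSL along the following lines (a rough sketch, not the exact generated script; the value-label column and the Graph Builder roles are assumptions):

// Open the sample table
dt = Open( "$SAMPLE_DATA/Big Class Families.jmp" );

// Add value labels to a column (assumed here to be sex)
Column( dt, "sex" ) << Set Property(
	"Value Labels",
	{"F" = "Female", "M" = "Male"}
);

// Run a Graph Builder (roles assumed)
dt << Graph Builder(
	Variables( X( :sex ), Y( :height ) ),
	Elements( Points( X, Y ) )
);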
We did value labels, we ran our Graph Builder. Under this red triangle menu we also added a Save Script item that lets you save the script and add it to the workflow. I could click this and it will add it to the workflow; notice that it added it even though I'm not in record mode. You can go and grab things: say you're working throughout the day, you do several things, and you forget something. You can come back and find it here, or find it in the log down here, and push it up. I could grab this recode and push it up here. Again, I'm not recording; it's just adding things to my workflow. Now, I don't need these duplicate steps, so I'm going to go to this one, right-click, and remove it; and I don't need a second recode, so I'll remove that too.

I have four steps in my workflow, so let's click the third button, which executes all the steps. These other buttons step forward or backward one step at a time, but I'm just going to execute the whole workflow. And there you have it: it ran the workflow. You can see my column here with the names changed to title case, and the labels are here for males and females. So we have our first workflow; we've built it. Yay, that's a success. Let's close this, and let's close the workflow; I'm not going to save it right now.

Let's go on to my second workflow. I'm a big fan of Virtual Join, so I created this little workflow here. What I did was open three of the pizza tables that we have in the samples, Pizza Profiles, Pizza Subjects, and Pizza Responses, then create my own link IDs and link references, and run a Graph Builder. I'm going to run this workflow, and then we're going to do a couple of things to change it. Here's the workflow. You can see that I've opened the tables. If you're familiar with Virtual Join, which most people are at this point: here's where I created a link ID, here's another link ID, and then here's Pizza Responses, the table that's actually driving the Graph Builder. This is where I created my link references for those columns, and this is the table used for the platform that I ran.

Now, I could turn this into a presentation and make it a little cleaner; I don't really need to see the tables, it would be nice to see just the report. Let me show you a couple more things about the Workflow Builder so we can do that. If you go to this panel over here, it says Step Settings. When you open it up, this is where all the magic is happening; this is where all the JSL is being captured. As I hover over these steps you've probably seen things popping up: there are tooltips under here that show you the script. But there are a couple of other things under here, too. If I don't want to see this table, we have a couple of actions built in here.
There's a thing called Show Message, which I'll talk about later. You can create subsets and random seeds, and you can add custom JSL. But what I want to do is hide this table. All of a sudden, here's my step where my table is opened, and here's my hide step. I'm going to hide all three of these tables, adding this action to all three steps. Then there's the JSL that was captured when I created one link ID and another, here are my link references and so forth, and there's my Graph Builder. I'm going to close this right panel and run the workflow again, and this time you'll see it run through. There you have it: it runs through, we don't see any tables, it hid them for me. You can see them down at the bottom of my JMP Home Window; they're there if I want to open something and run another report. But this is a much cleaner presentation.

Let me point out a couple of other things I might want to do in here. I might want to slow this down a little, so another action I could add is a custom action; let's add a wait statement. I'm going to type, just like you would normally type a JSL step, right there, to say wait. I want that to come after this step, so I'll push it down so it follows the step that sets the link reference. There's the wait. Let's run this again and pay attention for a second; see if you notice it hesitate before it runs the Graph Builder. There's the hesitation, and there's the Graph Builder.

There are a couple of other menu items; I think it's easier to show you these along the way. If you want to save your workflow, you go to the File menu and choose Save or Save As, and it will save the workflow locally for you, with .jmpflow as the file extension. If you want to add this to a journal, one of the things that's been put together for us is the ability to create a journal out of your workflows. You'll see here: here's the open, here's your code so you can run it, and here's the report at the bottom. There's a thumbnail, and a full-size graph if I want to see it there. That's a really nice feature, because journals are sometimes hard to build; I did create the one on the right, and I've created and saved and reused a lot of them over time, so it's nice that this is built in.

The other thing you can do is go up here and save your script to the script window. Just so you're clear, this creates a script that does all the JSL we've been doing, but it does not regenerate the workflow dialog; there is no script that will recreate that window for you. It would run just straight script. It has the hide function in here that was created to hide the tables, so it would run the same thing and just run the Graph Builder at the end, but it will not rebuild the workflow window.
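For what it's worth, the hide and wait actions added here correspond to one-line JSL statements, roughly like these (a sketch; the pause length is an assumption, and the built-in hide action may be implemented differently):

// Hide a data table's window without closing the table
dt << Show Window( 0 );

// Pause for a few seconds before the next workflow step
Wait( 3 );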
I think that's mostly what I wanted to show you in this one. One other thing you'll see is the ability to group some of these steps; I have more workflows where I'll show you where I've done that. The way you do it is you right-click and say Group. You might want to do that because, say, these steps are all opens, these are steps where I'm changing things about the columns, and then I'm running a report. You can have groups within groups, and grouping parts of your workflow together makes it a little cleaner and easier to see what you're doing. Let's close this workflow.

The next workflow I want to show you is one that Peter Hersh actually designed; I titled it Distribution Education Type. He had done this cool thing where he opened a data table, ran, I believe, a Oneway, and then went through and selectively picked different areas of the platform output and attached a definition of what each one was. I cloned his idea and picked my own distribution, so I'm just running a distribution on this pain column here. What he's utilizing, and I'll show you in a minute, is the Show Message window: he selected an area of the report by using a report script, and then he grabbed a definition for quantiles, and the workflow freezes for a moment; it won't go on until I say OK. Then it moves to the next section and pulls up the definition there, and I say OK again. Notice the little running man over here as well; I didn't point him out before. He's hesitating right now: the first step has been completed, so it gets a green check, but the little running man is waiting for me to finish, and when I click this, he turns into a green check too. Let's close this.

Then I want to show you another feature: you can go down here under the red triangle and duplicate a workflow. I'm going to go into this workflow and into the side panel over here, which is kind of magical. I'll close this one because I don't need it anymore, and then I'll show you how you can just edit in here. Now, you could recapture all of this by bringing up another table and doing every step again, or you could just go in here and do a little typing. I'm not the best typist, but I'm going to do it this way, because I want to show you how this was built. There's a distribution running here, and the distribution was captured, but he went in and added a line that assigns it to a report; he's just calling it Report OW. I'm going to go in now, change the table to the Body Fat table, and pick a different column, percent body fat (and I need to type it right). Then this is the part down here where he selects the quantiles and provides their definition; that's the first part. And then this action here is where he added Show Message; the Show Message step is right here.
He typed in "quantiles," typed in a definition, and selected it to be a modal window. Then he went to the next step, which is a custom action, the clear action. He did that by selecting Custom here and naming it "clear selection": he takes the report and tells it to deselect, just straight-up JSL. The next step selects the next part of the output, which I need to change to percent body fat; it selects the summary statistics, and the Show Message for that is "summary statistics" plus its definition. So if I typed everything correctly, and hopefully I did, we should be able to close this side panel, go over here, and run it again. There's the Body Fat table, a different table, open. Here's the body fat distribution. The quantiles are selected, and there's my quantiles definition; I say OK. Now the summary statistics are selected, with their definition. This is kind of a cool thing: it's a neat way to use a workflow as a teaching tool, some kind of educational piece. I just wanted to show that quickly, because it's another way to use the Workflow Builder.

Next, I decided it would be a little more interesting to show you a real-life example. I was talking with my husband, actually, about what kind of data I could go and find, and we landed on the Paycheck Protection Program (PPP), a real data example. This is government data; it's all public knowledge. I was able to get in there and really drill into court orders and court proceedings. You can see people's names, and there's a way to search every state for any kind of company; there's a lot of data out there. I took a smaller table, which you'll get with the abbreviated version of the journal I'm sharing (I didn't give you the whole thing, but I believe this table is in there). It's a smaller list of alleged PPP fraud cases, where they went through and actually tried to call and track people down.

A couple of data points here: the accused were seeking about $250 million in loans, and they actually obtained about $113 million, which, to me, are not small amounts of money. Joshua Bellamy was an ex-NFL football player who played for the New York Jets, and he conspired with this guy, Phillip Augustin, of Drip Entertainment, which is in the music industry; they connected between Florida and Ohio, and I have a map where I can show how I found the two of them. I think they came up with about $17 or $24 million, all of it fraudulent. This place called Papillon Air, I believe, also got a large amount of money, but they took about $2.5 million and purchased luxury cars and a private plane. And there was one individual in Houston who filed about 80 different loan applications, working with various people, with fake companies and fake licenses and agreements, and he purchased a Lamborghini and a lake property.
I've got a little note here that says you need to be really careful with your text messages. I read some of the correspondence in that particular court order, and it's right there: he's texting these different people all over the country saying, "Hey, it's time we go and file our tax form," for this and that and everything else, and it's all in the record, so just be careful what you do.

This data is read in from the government; it's a straight-up Excel file. I created the whole workflow: reading it in, capturing things, and cleaning it up. Here's an example of where I imported the data. You can see groups here: I grouped the columns where I changed column names and formats, and changed things to multiple response so they would work better in a mapping situation. I've got labels in here, I selected and deleted rows, hid and excluded certain things, and selected rows to create markers and colors for another map. And then there are several reports that we'll run. I want to thank Lisa Grossman (I'll thank her at the end, too): she helped me with some of these Graph Builders. She's on my team, and I appreciated her help so I could show some of the JMP 17 features in Graph Builder.

You'll notice down here that there are several reports, and this particular one is lighter and italicized. That's a report I'm not showing you; I didn't want to delete it, and the way you enable and disable a step is you right-click and choose Step Enabled or not, so I've turned it off. I might want to use it some other time, so this is a nice way to say, "Hey, I'm running some stuff, but I only need to show somebody one or two things," and keep it in there without losing what you've already captured.

Let's run this workflow. You can see it runs fairly quickly; it took me a while to build it, though. Let's go through and talk about a few of these things. This is a Text Explorer. I'm a big fan of Text Explorer because I like word clouds. This is from the DOJ records; they had a column describing what happened in that particular aspect of the charge against these different people. It's just interesting to see the words: "according to allegations," "allegedly," the millions, the companies; you can see "PPP funds" and that kind of thing. That was really just fun for me, and I wanted to show it to you.

Here's a Graph Builder that we designed, and I made myself some notes; I used this part of the workflow so I could remember what I wanted to say about some of these things. There's a little notes section right here. The comment we made was that this Hawaii guy really cleaned up: he tried to get about $18 million and got almost $13 million. I guess that was good for him at some point, but not in the end.
It said he ended up falsifying how many employees he had, and all of that. You can see in the map that the red is honed in on him; that's where he was located. That is just a regular map using the colors here. Then this particular map is a map of the states with what I called red flags. This is where I went into the data and was actually able to find (and I'll pull the table over here) these two guys pretty easily. This guy, Phillip Augustin, is right here, and Joshua Bellamy is down here. As I looked at them, I was able to go across and see what they actually did. Joshua played for the New York Jets and lived up near New Jersey. When you link these together, you can see that some of the addresses fall in Florida while others fall in Ohio, and the address he used for the company was Cross River Bank, up in New Jersey. The notes written here say that the guy up here was using the Clear Vision Music Company, they got $17 million in funds, and they filed about 90 different fraudulent applications for the millions they got. Joshua is down here with similar comments, but the company here was Utilization Review Pros, while the company there was Clear Vision Music Company. Just an interesting story. The map Lisa made me here is kind of cool, because the one I saw from the government was very flat with little red flags on it, so we designed it this way instead.

Then this is just a distribution I ran, because it's an easy way to see the states. You can see Florida here, and Georgia, and it looks like New York and Texas were hot spots for fraud. We had this other field out here that said, "Did they plead guilty?" We were looking at that, wondering in how many places they pleaded guilty and whether it made any difference. It didn't seem to; I have another graph on that in the next set of data. I just wondered if maybe they got off easier; I don't really know. If you want to look at it later, I do have the links in there for all of this. So that's that particular workflow.

Now I want to go into the bigger workflow, so let me go down to this one. This is the table of single entries for every one of the PPP loans, and I think there are about 1.6 million rows, 1.6 million unique entries. I created a smaller workflow that goes through, imports the data, concatenates the tables together, and saves the result, and then a second workflow where I did all my cleanup. You could do them all together, but I wanted to split it up. This is also an example of a custom step: when I created this, I realized I already had the table out on my desktop, and I didn't want to redo the same things over and over, so I added a JSL step, a custom action of my own, to say, "Hey, go look for this; if it's there, delete it."
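That custom step needs only a couple of lines of JSL, something like this (a sketch; the file name and location are assumptions):

// Hypothetical path to the previously saved combined table
combined = "$DESKTOP/ppp_loans_combined.jmp";

// If an old copy is already there, delete it before rebuilding
If( File Exists( combined ),
	Delete File( combined )
);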
I'm going to run this workflow, and pay attention to the little running guy right here. He's working, he's running out to go get the data. It's kind of big, so he's importing it. The little check marks appear as we run through this: we concatenate the tables, get rid of the individual tables, and then save the one big table. All I did was open that, run it, and create my data table. I'm going to close this up and go to the next workflow. Now I have that table and I'm ready to run my bigger workflow here. Again, I created formats and standardized attributes, and I did some recoding. I wanted to get latitude and longitude for different cities, so there was a table that got opened with those coordinates, updated into this table, and then closed because I didn't need it anymore. I've also got a Tabulate report, a couple more Graph Builders, and a distribution. Let's just run this through. Obviously, it takes a little while to build these things, but once you've built it, you've got your reports and you're ready to roll, so you can see how quickly it runs. This is a distribution that I ran. Because this data table is a single entry for everything, I went in and read up to figure out what's fraud and what's not. You'll notice in this loan status that the Exemption 4s were the fraudulent ones, so those are the interesting ones. If I click on that, you can see that almost all the states had something in there. Not quite all of them, but most. And just to be fair, there was a fairly large amount of money here that was paid back in full. People did pay back some of the loans. I was trying to see whether it made a difference if they were corporations, limited liability companies, sole proprietorships, or anything like that, and I couldn't really see that it fell one way or the other, fraud or not fraud. This particular graph here, and I'm going to open up my little side panel that helps me cheat a little bit, is a Graph Builder that's showing a new feature in JMP 17 that I want to point out here at the bottom: the tabular summations. It's just a bar chart that shows the total amounts for each loan status, overlaid with the business type. Again, I was curious whether it made any difference if it was a new or existing business, younger or older, or a startup, and I'm not sure that really mattered. But the cool feature down here is these summations you can now get in Graph Builder, and I can show you how to do that. If you open up the control panel under here, there's something called a caption box, and the location here is an axis table. If I drill down on this, there's an Axis Table option, and that's been selected. That's what allows you to do that. It's showing the sum for each of those different business age descriptions.
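As a rough illustration of that caption box feature, here is a minimal Graph Builder sketch in JSL. The column names (:Loan Status, :Current Approval Amount, :Business Age Description) are assumptions based on the demo, and the Axis Table location itself is something the presenter selects interactively rather than something scripted here.

```
// Bar chart of total amount by loan status, overlaid by business age,
// with a caption box element supplying the summed values.
Graph Builder(
	Variables(
		X( :Loan Status ),
		Y( :Current Approval Amount ),
		Overlay( :Business Age Description )
	),
	Elements(
		Bar( X, Y, Legend( 1 ), Summary Statistic( "Sum" ) ),
		Caption Box( X, Y, Legend( 2 ), Summary Statistic( "Sum" ) )
		// The "Axis Table" location for the caption box is chosen from its
		// options in the control panel, as shown in the demo.
	)
);
```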
And then my graph behind here, which I'm going to show you in a minute, uses the Axis Reference Line, which is another part of that. Let's look at that one. This graph was done with Axis Reference Lines, and it's showing the average current approval amount per business type, split by rural or urban. The comment Lisa made here was, "I don't know what it is to be an employee with stock options, but they really cleaned up or racked up here," if you look at that bar chart. Again, I was looking at these, trying to figure out whether it had anything to do with being a limited liability company. Partnership shows up here twice, because I think you can be an individual and you can also have a partnership. I didn't see any big differences with that. But here is the axis reference line showing the mean rural and the mean urban values. That's just a marker you can now add so you have a bit of a measuring stick when you're reading those graphs. This next graph uses the latitude and longitude we brought in, and it's showing Hawaii. It also shows a new feature: if you go in here and look at the graph's background maps, it's doing street maps, and some things have been added underneath the selections for URLs. This one is using the Mapbox satellite imagery, so it's a nice look, a nice graph. Again, this is Hawaii, and I think our guy from my first map was actually in this part of the islands. The reddest dot up here is this one. I'm not completely sure about that, but I do remember Hawaii was all red, so it seemed like a lot went on in Hawaii. This Tabulate I wanted to show you is a feature in JMP 17 that was introduced earlier in the 17 cycle. You may have seen it if you use the Early Adopter releases: we created the ability to pack analysis columns on the right-hand side. The way you do that is, I've used the current approval amount with the forgiveness amount, and they're packed together by right-clicking and saying pack the columns; you can unpack them too. Then there's a template. I went into the template with the first and the other value, and you can set a name separator using a comma or something else. I added a little spacing in here and changed it to brackets; I believe the default comes up with no spaces and parentheses. It makes a nice report. Again, I was looking at the exemption part of this, which is the fraud. Here's the paid in full, which is maybe a little bit more money. But if you look at the exemption part, I honed in here: corporations are about 30% of the total, and the limited liability companies, the LLCs, are about 29%. Just some interesting data points.
Like  I  said,  I  give  you  the  references to  these  and  if  you  want  to  go  and  dig  in and  look  yourself,  you  feel  free. That  wraps  up the  Workflow  Builder  demonstration. I  want  to  close  and  I  just  want  to  say thanks  to  the  development  staff that  worked  really  hard  on  designing the  Workflow  Builder,  Ernest  Vasseur, Dave  White,  Evan  McCorkel, just  to  name  a  few. There  were  a  lot  of  people that  worked  on  this. Julian  Paris  was  also  really  key in  the  design  phase  and  prototyping and   helping  a  little  bit with  initial  testing. Again,  I   thank Lisa for  the   Graph Builder  assistance  as  well. There  are  references  here,  like  I  said, included  for  the  PPP  data, so  you  can  look  at  that. And  I  just  want  to  close  with  saying that  I  think  Workflow  Builder  will  be the  best  new  feature  probably  in   JMP 17. I'm  probably  a  little  biased, but  I  think  it's  going  to  save  you  time with  your  data  clean  up  and  prep. I  think  you're  going  to  get  more out of  reusing  recorded and  repetitive  steps that  you  find  yourself doing  maybe  every  day. It  should  simplify  your  work  efforts and  maybe  accelerate  your  daily  processes, but  it's  going  to  leave  you  a  lot  more  time  in  your  day  for  other  stuff. So  try  it  out  and  we'd  look  forward to  talking  with  you  about  it and  good  luck  and  thank  you for  letting  me  share  with  you  today.
If you work with data, you have probably heard the adage that preparing your data for analysis makes up most of the time spent on analysis -- often as much as 80% or more! This talk focuses on using tools in JMP to gain knowledge from messy historic oil and gas drill rig count data. Though specific to a rig count use case, the example applies broadly to anyone who needs to gain insights from data sources, such as Excel files with questionable formatting, structure and cleanliness.   The first part of the talk covers importing raw data obtained from a website, restructuring data tables, identifying errors and recoding errors with Recode. This is the 80% that must be done to get to the more exciting 20% where we glean insights. Next, I demonstrate how to use Graph Builder to gain insights from the data. The talk wraps up with using dashboards to share the insights.     Thanks for joining, everybody. My name is Jason Wiggins. I'm a senior systems engineer for JMP. I come from a fairly long career in oil and gas and manufacturing, in R&D and quality. What we're here to talk about today is messy data. I really believe anyone who analyzes data has encountered a data set somewhere along the way that needed a lot of work before it could be analyzed. Cleaning or shaping data can be difficult, and beyond "wow, that's a mess," I find it can actually be quite frustrating, especially when we have to do it manually. Some of my messiest data problems have come from Excel spreadsheets, and I believe there are a couple of reasons for that. Excel is great for many things, but for analysis, it's just not that great. Part of the reason is that it doesn't impose a format for the data. To my mind, data formats are as varied as the imaginations of the people using them. Excel files also tend to be hand curated, so misspellings and inconsistent naming conventions are really quite common. The example I'm presenting today comes from my career in oil and gas, but I believe the problem and the solution I'm going to show can be found in many of your data sets. My goal is for everybody to see a few possibilities for simplifying the front end of our analytical workflow. Let's take the exclamation point out of "wow, that's a mess!" and just say, "Yeah, but no problem, I understand how to deal with this." I'm also using an example where the data are available for download off the web, and I'm going to upload my presentation materials in case anyone would like to practice some of the concepts I'm going to work through today. All right, so let's get to this. Our problem: Baker Hughes has been publishing rotary rig count data. These are the rigs that drill for either oil or gas all around the world, and they've been posting active rotary rig counts for generations. The rig count is a very important business barometer for the oil and gas industry. If you're in the industry, you don't need the explanation; you're consuming these data on a daily, if not weekly, basis. It's used broadly, all the way from downstream to upstream and exploration.
As I laid the groundwork, we are going to be dealing with Excel data and some of the problems that come with it. One, many of the worksheets that Baker Hughes makes available are not in the right format for analysis in JMP. I also found many errors. This certainly isn't the most error-prone data set I've worked with coming from Excel, but there are a few doozies in there that we'll focus on today, and they're mainly around spelling and inconsistent terms and abbreviations. Again, in terms of the overall analytical workflow, in order for us to even get to the analysis, knowledge generation, and knowledge sharing, we have to get our data set into a format where we can begin to do that work. We're going to focus on getting data into JMP, blending and cleaning, and then at the end we'll do a little bit of data exploration and visualization. Ultimately, what we're shooting for is a data set where we might look at rig count trends by state, for instance. These trends might be very telling about the economics of the industry over time. We may also want to look at more of a time-series-based analysis, like a bubble plot, where we can see the change in rigs over time for the different states. Again, in order to get to that point... let's see, JMP is pausing on me for a second. There we go. Okay, in order to get to that point, we really need to get the data into something we can work with. This is the final data set; this is what we're going to be pushing toward. I have a couple of ways of accounting for date in the data set. This is what comes from the Baker Hughes worksheet, but I'm going to create a couple of other variables in case we want to take a closer look at time: a month-year variable and a year variable. We're going to fix some spelling errors in the state names, and ultimately we're going to join this with another data set that has the lat-long coordinates for the capital city of every state. I don't know the best way to show that time-series growth and contraction, but I have to choose a point for each state, and capital cities are available. We have to do a few things in order to make those data sets connect. That's where we're going. All right, that's essentially what I outlined verbally, but I'll pause for a second and just let everybody take a look at our analysis workflow, which is that data shaping, data blending, data visualization component. All right, let's talk about importing data. For those of you who want to try this, I want to point out that the data Baker Hughes publishes is in a binary Excel format. At this point, JMP does not have a way to directly import these data. If this were XLSX, which it used to be (I'm not sure when the binary file format was adopted), you could do a File > Internet Open, ping the URL, and automatically download the data. But we can't do that, so we have to do an intermediate step. It's pretty simple. If we go to the website... let me pull that back up again real quick.
If we click on any of these links, it'll download the data. We open that data up in Excel and then save it as an XLSX. There are ways to do this automatically, but they're going to happen outside of JMP, so for those who really want to explore this and make it automatic, there's a bit of a challenge up front. I will point out some ways to automate in JMP after we get the data in. All right, so this is what we're looking at; that's, in fact, the Excel sheet that we're going to load. First things first, let's get our data in. The column headers start on row five, and I have a couple of rows of column headers. This is common in Excel: people use merged cells to combine text from more than one cell into a single label, and we want to make sure we capture that. So we tell the wizard that the column headers start on row five and that we have two rows of them, and then JMP works out where the data starts and adjusts for it. This is good. I always like to take a look at the preview and just make sure I'm getting what I'm asking for, and this looks right; we're importing the correct worksheet. Let's just import that. All right. This is in a wide data format. That's not typically the format that JMP likes for doing an analysis; almost always we want to be in a tall data format. What I'd like to have is a column that has these column headers as rows, and then another column that has the rig count for each one of the column headers. The operation we need for that is a stack operation. Let's just talk through this. I'm actually doing the presentation in JMP 17, and the reason is that there's a cool new feature in 17 that I find to be so handy for this type of work. All right, I forgot to do one thing; let me back up real quick, just to keep the dialogue simple. What I'd like to do is get rid of these summary statistic columns. These certainly make sense in a wide data context, but they aren't going to make sense if we stack them, so we're just going to delete those. We could deal with that in a lot of different ways, but to keep it simple we'll just delete them out of the data table, go back to our stack, turn on a preview, and select all of those columns. This is great. One thing I love about the preview is, first off, I get to see that, yes, this is in the shape I need for my analysis, but I can also make some changes and see how the data table is going to look before I actually commit to making it. If you remember, we wanted the count to be rig count, so that data column we want to be Rig Count. The source column we're going to name State and Type, and then we're going to separate those in the end. That's another example of creating new variables: we're actually going to split those apart so we can use them as two separate variables. But for right now, I think that's pretty descriptive.
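For anyone who wants to script this step later, here is a minimal JSL sketch of the stack the presenter just set up. The file path and the wide column names are assumptions for illustration, not the exact names in the Baker Hughes workbook, and the header rows are set interactively in the Excel Import Wizard rather than in this snippet.

```
// Open the saved .xlsx (hypothetical path); header rows were chosen in the wizard.
dtWide = Open( "$DESKTOP/Worldwide Rig Counts.xlsx" );

// Stack the wide state/type columns into a label column and a data column.
dtTall = dtWide << Stack(
	Columns( :Name( "Alabama-Land" ), :Name( "Alaska-Land" ) /* ...remaining state/type columns... */ ),
	Source Label Column( "State and Type" ),
	Stacked Data Column( "Rig Count" )
);
```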
If I hit Enter or click anywhere else, I get to see the update, and yes, indeed, this is something that I want. That's data shaping, step one. We may have to do several steps of data shaping; in this case it's a simple example, a stack is appropriate, and that's all we have to do. All right, before I show this slide, let's go over to the data set. One of the first things I do when I'm manipulating data tables and working with new data sets is graph the data, plot the data, or show the data in some way graphically to help me see whether there are issues I need to resolve. Distribution is a great way of doing that. I'll just pause and let everybody look. It's probably small print; let me make that much bigger. I think even in the first few bars, half a dozen bars, hopefully everybody is recognizing some potential issues. One, what's "Wash"? I'm not sure why Washington was abbreviated; there's probably some reason historically, since these data sets are quite old. We have abbreviations for West. We have abbreviations for North and South, and it turns out that is the exact same abbreviation as for New. So we have several issues. Let me scroll down. There's another doozy here; fortunately, there's only one of them. Let's see if I can find it. We're looking for Tennessee. There we go, Tennessee. Everybody take a look at Tennessee there and see if you notice the problem. We're missing an S, right? That's something that I would do; I'm a horrible speller when I'm typing, especially when I'm typing fast. So we found one spelling error there. Now the trick is, how do I parse that out and fix all of these errors? More importantly, how do I do that in a way that doesn't involve a thousand steps for renaming these different abbreviations and misspellings? That's where regular expressions come in. In Recode, there are a variety of different tools to deal with data cleaning problems. I always like to choose the simplest one, but often the problems aren't that simple. What are regular expressions? Well, they're sequences of characters that specify a search pattern. As a simple example, I've got a cats and dogs data set. Each one of these characters represents a search pattern for the text, and then a command for what it is we're going to return. Why do we use them? Well, they're very compact and flexible. I can solve really complex character data problems with a small handful of regular expressions, and I just can't imagine how else I might do that. It definitely takes messy problems and makes them simpler, but you've got to learn a new concept. If regular expressions are brand new to you, I have a resource that I like. It's free, it's online: it's Debuggex. One of my favorite parts of this site is the cheat sheet, or quick reference guide. If I want to understand what those search characters mean, I can look at this quick reference and start to piece together regular expressions. I don't often use the test part of the website, but if you had some example text and wanted to test your regular expression against it, you could do it here.
I prefer to do that in JMP; it just saves me time, and JMP actually has something, as you'll see, that we can use in a similar way. All right, so that's what they are and a good place to go learn about them; let's take a look at how we use them. I'm only going to fix a couple of these, and we'll fix a couple of them together. I'll speak through what the regular expression means as I type it. Before we even get there, I'd like to recommend that when we're doing this type of work in Recode, we dump the results into a formula column. The reason is that if we decide to add on to this data table, those fixes will persist in the future. If this is the only time we're really going to use it, maybe we don't use the formula column, but I prefer it. In fact, I really like to see that right up top for my personal use. How do we get to the regular expressions? We have a Replace String utility. Again, there are many other utilities; always choose the simplest one. If we're just trying to pluck the first and last word, for instance, I don't want to write a regular expression for that, but in this case I've got some mess I need to clean up, so we're going to use a Replace String to do that. A couple of necessary ingredients: you have to make sure the Use Regular Expressions check box is turned on. Remember that preview Debuggex shows? Well, here's our preview. We only have to type once and adjust once, hopefully, and then we get to see the results. Let's try that out on a couple of these examples. Again, I'm going to speak through what's happening as I type. Let's work with the "New" cases first. Remember, "N." can be either New or North. If I look at that data set, it seems logical to fix the New ones first and then work on North. I'm going to look for the character N in the string in the row of the column I'm analyzing, and then I'm going to look for a period. The reason I'm putting this in brackets is that a period is also a search character; it means any character. To be honest, you could leave the brackets out of this and it's still going to work, but it's a little bit confusing. If we're using a specific character that also exists as a search character, sometimes it's nice to bracket it out; it makes it a little more interpretable. All right, after the "N." we have a space; this backslash-s is the whitespace character. And I'm actually going to type out "Mexico". Now, I could use a search pattern here, a search character like \w* or something like that; I'll explain that a little bit more as I go. But sometimes when you're writing regular expressions, it's handy to have something that's a little bit more readable. I'm choosing to type out the words here, and since there aren't any problems with those words, I'm just going to reference them directly. Now, with a regular expression I can use logic, so I can deal with all the "New" issues but one in a single line. I'll tell you why we're going to deal with the New Hampshire one a little bit differently. Let's do the Mexico, then York. That pipe again is our logical OR, and then we'll do Jersey.
The reason I'm putting parentheses around this is that it allows me to return the result of everything inside the parentheses, and each set of parentheses, left to right, is referenced by 1, 2, 3, in numerical order. I think that is probably good, except, oh, we do need to have that last part, the "-Land" (or, if there were offshore rigs — I don't think there are in New York — we'd want to capture that too). So there's my any-character, and I'm looking for any character occurring one or more times, and I put parentheses around that. Here's where the magic is. Now I can type "New". I have a whitespace in there, so I don't need to actually put a whitespace in the replacement text, but the \1 and \2 reference what's inside the parentheses: the \1 is going to be the logical result of that search inside the first parentheses, and the second is going to capture all the characters that come behind it. Let's scroll down and see how we did. We get a star for anything that was recoded, and it looks good: New Jersey, New Mexico, New York. I think we've done our job here. Again, another classic reason why you want to use regular expressions: in a single statement, I was able to fix three problems. Let's do one more; let's just do a couple more with our "N." and then we'll move forward with this. All right, so we'll go back to Replace String. You can have as many regular expression Replace String commands as you want within Recode. Sometimes, again, that's nice. You could get really clever and fix a lot of issues with a single regular expression, but sometimes it's a little more readable if we tackle them a few at a time, or in the unique cases one at a time. So we use regular expressions; let's deal with the New Hampshire problem. Same thing: we're going to do the N character followed by a period. Then, actually, let's not use parentheses here, because we don't want the "Hamp." itself captured; we just want to look for "Hamp" followed by a period, and then we want to capture all the text that comes behind that. Now we type out New Hampshire with the "-Land" part behind it, scroll down, and it fixed it. That's great. I could do something similar, but we're running short on time, and I want to make sure we at least get to some of the graphing part of this. We could do a similar set of steps to deal with the North and the South and the West and the many others, and this is what it looks like in the final data table. If I look at the recoded formula column, once again it's very nice to have this because it is portable: those are the regular expressions I used to fix all the data problems in that column. Again, the benefit of doing this is, if you're working with a really huge data set, could you imagine going through and hand-typing those fixes every time you want to do an analysis on new data, for instance? It's pretty arduous. We've saved ourselves a lot of time with just a little bit of creativity. That's regular expressions.
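As a rough illustration of the same pattern outside the Recode dialog, here is a minimal JSL sketch that applies the "N." fix in a formula column. The column names are assumptions, and the pattern simply mirrors the one typed into Replace String above.

```
// Hypothetical column names; the pattern is "N." + whitespace +
// (Mexico|York|Jersey) + the rest of the string (e.g., "-Land").
dt = Current Data Table();
dt << New Column( "State and Type Recoded", Character,
	Formula(
		s = :Name( "State and Type" );
		r = Regex( s, "N[.]\s(Mexico|York|Jersey)(.+)", "New \1\2" );
		If( Is Missing( r ), s, r );  // rows that don't match the pattern stay unchanged
	)
);
```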
Again, my intent wasn't to teach regular expressions, but really to show them as an opportunity for folks to investigate, something that can help deal with many of your messy data problems. Let's play around with these new variables. I'm actually going to show this in the same table that has the results, and I'll just show you how I got there. When we finally got all that data cleaned up, we had a date column; if we look back here, hey, there's our date column. I'll use this table. I'm being a little wishy-washy here today, but just so you can see what this looks like if we're starting from scratch, let's just use this one, and I'll reference back to the complete table. The New Formula Column menu has a Date Time group in it, which is really handy. Now, you noticed that we are in a day-month-year format, and that's just when the record was made. In an analysis context, we may really want to look at just month and year, so we're combining some of those and making the time component a little more coarse. That can be helpful in seeing the trends we want to see. So we're just going to choose Month Year, and JMP automatically dumps that formula into a new column; if we look at it, that's what we did. We can do the same thing with year: New Formula Column, Date Time, Year. Now we have two other variables that we can use in an analysis context. Remember the fact that we have land and offshore tags on every state? We probably want to split those out, and to do that, we just create a formula column. I actually used a regular expression for it, but you could play around with other ways of dealing with that. I have a column with a regular expression that's plucking off the "Alabama". If these were just plain state names, we could use the First Word transform, but if I remember right, the dash doesn't work with that. Same thing with type: look at that formula, and I just have a regular expression that's looking for the last piece of that string and returning only the last piece. So I've created four different variables that I want to use for analysis; a JSL sketch of these derived columns follows at the end of this segment. Let's do one more thing. Part of the reason we cared so much about the state names is, one, if we have multiple representations of a state, that's going to complicate our analysis. But the other rationale is that if we need to join it up with something else, like lat-long coordinates for each state, we need to have the state names consistent. In the data set that I have, they could be either all caps or title case, and they need to match to do the join. Joining, that's cool, we can do that. In fact, I think I have an intermediate table that we can open and look at here before we do the join. Let's do a little housekeeping here; I'll leave that one up in case we need to go back to it. Okay, so we have our state; it's been fixed and recoded, and we've created that state variable. Now I want to join it with my US state capitals table, so I'll reopen that file. I'm going to use State 1 and match it with State in the capitals table. Joins in JMP.
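Here is the promised sketch of those derived columns in JSL. It is a minimal illustration assuming the column names from the demo (:Date and the recoded :State and Type column), and it uses Word() with a dash delimiter as a simple stand-in for the regular expressions the presenter actually wrote.

```
dt = Current Data Table();

// Coarser time variables derived from the Date column
dt << New Column( "Year", Numeric, Formula( Year( :Date ) ) );
dt << New Column( "Month Year", Numeric, Format( "m/y" ),
	Formula( Date MDY( Month( :Date ), 1, Year( :Date ) ) )
);

// Split "Alabama-Land" style labels into separate State and Type columns
dt << New Column( "State", Character,
	Formula( Word( 1, :Name( "State and Type Recoded" ), "-" ) )
);
dt << New Column( "Type", Character,
	Formula( Word( -1, :Name( "State and Type Recoded" ), "-" ) )
);
```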
I always like to invoke the join from the table I want on the left, and then I choose the one I want on the right. We're going to match on State and State. An inner join is going to be appropriate here; I mean, if we had any missing state names, I'm not sure how informative those would be for our analysis, so I really don't want them (and in fact, they don't exist). An inner join is appropriate: it's just going to take anything that matches, and any non-matches are left out. Again, hey, I've got this great preview in 17, so we can look at that a little bit. Let me get down to the scroll bar, and right at the very end I should see the capital and the latitude and longitude. Now, I ended up with this extra State column too. If I don't want that, I can select the columns for the joined table. If ultimately this is going to a script, maybe you want to do that just so you don't have to go in and tweak columns after the fact. But that's it for a join. We just glued those two data sets together by matching state names, and we have latitude and longitude. Capital is not necessarily important, so we can just get rid of that column. Back to the automation component: we did several steps along the way, but for each step we do in the Tables menu, there's a source script associated with it. If we have each of the intermediate tables that get us to the final result, we could steal those scripts, glue them together, and automate that workflow; a JSL sketch of the join step follows this segment. All right, that is it for joins. Let's play a little bit in Bubble Plot and Graph Builder. Let's do Graph Builder first, and then with whatever time we have left, we'll get to a Bubble Plot. Again, we're going to enjoy the fruits of our labors here. We exercised a lot of creativity and got a data set that is analyzable and graphable, so let's see what we can learn from it. All right, because we have built-in shape files in JMP, it'll recognize state names and build state boundaries in a map in the graph frame. We're going to drop that state into the Map Shape role. Now, what I may be interested in is the rig count by state, so I'm going to color by that. To me, blue and red aren't all that informative for what I'm trying to get across; I really prefer spectral colors, but no problem, we can change the gradient with a simple right mouse click. I like white to red; it seems to work for me, and that's great. We have the rig count by state, but again, this is really coarse, right? We have a lot of years of data, 22 years of data here, so we may want to drill into these a little bit deeper. I'll build another graph, and we'll look at date and rig count. We're going to turn off that smoother, since it just averaged everything. There we go. What I'm doing this for is that I want to steal the script behind this graph and use it to get those cool hover graphs. Save script to the clipboard, go back to this map, right mouse click, Hover Label, and I'm going to paste the graphlet. Let me see how I did.
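And here is the promised minimal JSL sketch of the join step, along the lines of the source script JMP writes for Tables > Join. The capitals file path and the matching column names are assumptions for illustration.

```
// The cleaned rig count table is assumed to be the current data table;
// the capitals table path is hypothetical.
dt = Current Data Table();
dtCaps = Open( "$DOCUMENTS/US State Capitals.jmp" );

dtJoined = dt << Join(
	With( dtCaps ),
	By Matching Columns( :State = :State ),
	Drop Multiples( 0, 0 ),
	Include Nonmatches( 0, 0 ),  // inner join: keep only rows that match on State
	Preserve Main Table Order( 1 )
);
```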
Let's just click down here, because I think we're getting close. JMP knows that I'm only looking at Texas, so it's only going to give me the rig count trend over time for Texas. I believe that is fairly handy. If we look at Louisiana, for instance... well, let's look at North Dakota. North Dakota is interesting because of the shale, unconventional-reservoir boom that happened up there. They had a pretty big spike, they had a pretty big drop-off, and then they're building again in terms of rigs. So we are drilling into this. But hey, this actually has land and offshore in here, so let's split those out. I'll turn the control panel back on and drop Type on Page, and we'll get a different graph for each of those. Maybe we'll make this a little bit smaller, a little bit smaller still. Maybe I want to show missing data: now I have boundaries for all the states, but only the ones that have offshore rigs are colored. Now if I hover over Louisiana, interestingly, you can notice a decline in rigs off the coast of Louisiana in the Gulf. That coincides a lot with what we see in these unconventional wells that are increasing in number quite rapidly in states across the US. So we were able to drill into a few trends, and we were able to do that because we massaged the data; we got it into a format that we can actually use. I think we're up on time. I will post these materials out onto the web, and you'll have example scripts in each one of the data tables that you can run to get the final visualization I was working on; then you can try to recreate it yourself. I'll also have two example data files, but I would recommend, if you're going to try this, actually downloading the data so you get the full experience. With that, I really hope I've highlighted a few different things that could potentially save you a lot of frustration in your data cleaning efforts. Thank you very much.
Anudeep Maripi, Magic Leap   This paper presents how Magic Leap’s Eyepiece Manufacturing team leverages the changepoint detection and correlation matrix functions in the multivariate control charts to easily detect process drift, diagnose issues, and detect the exact moment when an issue occurred -- a previously impossible functionality. Magic Leap’s lithography double-side imprinting process requires precise wafer placement, highly accurate photoresist dispension, and master-template to wafer alignment. Process drifts such as robot wafer placement errors, wafer alignment variation, master-template alignment errors, and measurement metrology variability can cause excursions that result in significant yield loss. These process drifts are often subtle, gradual and interdependent on other parameters that traditional control charts fail to detect.   Our team used changepoint detection and correlation matrix to create an interactive dashboard that collects changepoint time stamps for output parameters and creates a phase in the existing control charts for input parameters. Multiple changepoints are handled using phase columns and correlations with input parameters, template changeovers, and PM or hardware upgrade activities. The dashboard’s correlation coefficient matrix between inputs and outputs compares the correlation before and after a detected changepoint. This dashboard became our daily driver to quickly find faults/process drifts and achieve high yield standards.     Hello,  everyone. This  is  Anudeep  Maripi. I'm  a  senior  process  engineer for  optics  lithography  at  Magic Leap. Today,  I'm  going  to  demonstrate fault  detection by  upgrading  control  charts with  change  point   detection  in  JMP, and  show  you  how the  change  point  detection  integration with  control  charts is  highly  useful on  our  manufacturing  shop  floor. I'll  start  the  presentation with  a  brief  introduction about  our  company,  Magic Leap, and  the  optics  manufacturing, the  DMAIC  problem- solving  approach and  the  JMP  tools  that  we  use in  the  manufacturing, our  challenges  that  we  face in  the  control  phase  of  DMAIC, upgrading  existing  control  charts with  change  point, and  by  doing  so,  detecting and  diagnosing  the  faults  in  a  process some  case  studies  that  we  regularly  use at  Magic Leap  and  a  JMP  demonstration on  how  to  create a  change  point  dashboard and  integrate  change  point into  your  control  charts, and  few  takeaways. To  start  with,  at  Magic Leap, we  envision  a  world with  physical  and  digital  are one. Our  mission  is  to  amplify  human  potential by  delivering a  most  immersive  AR wearable  platform, so  people  can  intuitively  see,  hear, and  touch  digital  content in  the  physical  world. Our wearable  device transforms  enterprise  productivity by  providing  AR  solutions. For  example,  collaboration  and  co- presence between  remote  teams  to  work  together as  if  they  are  in  the  same  room, 3D  visualizations  to  optimize  processes, augmented  workforce that  helps  train  and  upskill  workers using  see- what- I- see  capability. These  are  only  a  few  examples. The  device  has  industry's  leading  optics with  large  field  of  view, best  image  quality, high  color  uniformity, and  dynamic  dimming which  helps  display a  high- quality  virtual  content in  our  real  world. 
The  optical  lens that  is  used  in  this  device is  a  small  component but  a  very  critical  component of  Magic  Leap  2, enabling  best- in- class  image  performance. The  optical  lens goes  through  25  plus  complex manufacturing  process and  metrology  stations during  the  manufacturing. It  is  tested  over  100  plus critical- to- quality  parameters. Each  individual  process  needs  to  reach high  99%  yield  targets to  attain  90% Rolled  Throughput  Yield  targets. We  cannot  achieve  these  high  yields and  process  capability without  the  help  of  JMP  statistic  tools. One  of  these  high- yielding  processes is  imprint  lithography where  I' m  responsible for  quality  and  delivery  tools. This  process  has  many  steps, and  that  is  visualized on  the  right- hand  side  of  your… The first  step  is  where  we  align the  mask  to  the  template and  then  dispense fo r  resist, lower  the  template  and  EUV  cure to  replicate  the  exact  pattern from  the  mask  template  to  the  wafer. This  imprint  process  alone has  22  critical- to- quality  parameters that  operates  between  tight  control  limits such  as  just  20  nanometers  film  thickness and  100  microns  drop  accuracy. As  you  could  see, these  tolerances  are  extremely  small. For  reference,  human  hair is  just  20  microns  to  100  microns. Thus,  the  imprint  lithography  requires a  precision  placement  and  alignment that  are  indistinguishable to  the  human  eye. Meaning  we  rely  a  lot  upon onboard  metrology  and  vision  systems that  generate  tons  of  data from  large  numbers  of  input  parameters. JMP  helps  us  to  easily  analyze these  large  data  sets in  our  problem- solving  processes. We  use   DMAIC  approach in  our  problem- solving  process. Following  are  the  JMP  tools that  we  use  in  every  phase of  our DMAIC  processors,  respectively. Like  many  manufacturing  teams, we  found  out  Control  phase as  most  challenging  and  overlooked  phase, because  at  a  start up  like  Magic  Leap, we  continuously  evolve,  make  improvements, and  iterate several  new  designs  and  revisions. The  control  phase  is  overlooked because  it  takes  longer  time for  the  confirmation. For  example, we  find  a  problem  or  an  issue, say  like  process  exceeded  control  limits and  giving  low  CPK  values, resulting  in  low  CPK during  our  Control  phase. We  define  the  problem, we  measure  the  historical  data, analyze  the  problem by  multivariate  methods and  correlation  matrices, improve  the  problem by  implementing  the  corrective  actions, by  conducting  DOEs and  response  screening. But  once  this  is  resolved, we  move  on  to  our  next  severe  problem and  move  on  to  severity  3  problem once  you  resolve  the  severity, severity  2  problem, and  the  cycle  continues, overlooking  the  most important  phase  of   DMAIC, which  is  the  control  phase. This  was  the  reason  we  came  up with  the  change  point  detection  dashboard, which  is  fast  and  efficient. It  pinpoints  issues  faster, easier  to  visualize  faults  and  changes. Even  operators  who  are  not  proficient with  the  statistics can  use  this  dashboard to  diagnose  the  issues. 
For example, before introducing this change point dashboard on our manufacturing shop floor, the escalation process looked like the left side: the operator monitors the yield dashboard and identifies the yield losses and faults. The operator reports to a technician to look into the faults. If the technician is unable to resolve it, it escalates to the engineer. The engineer analyzes the control chart data using control chart alarms and trends, manually joins these outputs to the inputs, runs the correlation, diagnoses the fault, and then implements the corrective actions. But after introducing our change point dashboard on the manufacturing floor, the operators, technicians, and engineers all monitor the change point dashboard on the shop floor. If a change is found, they look into the correlation of inputs before the change and after the change, easily diagnose the issue, and then easily implement the corrective actions. This helped us transition easily into our TPM model, where operators, technicians, and engineers all come together to minimize the faults. The change point dashboard makes our control phase very efficient. I'm going to demonstrate in the following slides how this is efficient, as well as how to make these change point dashboards. Before showing the change point dashboard, I want to show you our journey from traditional control chart monitoring to the change point detection dashboard. While traditional control charts are very helpful for monitoring excursions in our process using Westgard rules, and they show the immediate out-of-control points, they often miss the subtle drifts in the process, which are very gradual but still significant. These control charts are very helpful for monitoring over time and also tell us what parts are impacted. But as I mentioned before, we have 22 critical parameters in this process alone, and there are tons of input parameters that impact those 22 critical parameters, which means we would need many control charts monitored at the same time to have a stable process. So then we moved on to model-driven multivariate control charts. With these control charts, we monitor both inputs and outputs together, and we immediately get a correlation between them. But unfortunately, they don't have a time series or part IDs to help deploy this on the manufacturing shop floor. The multivariate correlation is very important for us, giving the relationship between our outputs and inputs in the process, but again it doesn't have the time series data, like the control charts, to identify when an issue started. The change point detection, on the other hand, we found very helpful for easily detecting when a fault, change, or significant abrupt drift happened in our process. But it only gives a value for when a change point occurred; it doesn't have the time series and is not very valuable as a monitoring tool on the manufacturing shop floor on its own. Each one has its own advantages and disadvantages, so we combined all of these control charts into one dashboard, leveraging the change point detection function's values inside it.
Our next logical step is regression, which is cause-and-effect analysis: find the cause and effect and interlock the tool inputs, stopping the failure on your CTQs, the critical-to-quality parameters, or on your outputs, and also move to prediction analysis. This is one example to show you how the control charts look different when you integrate the change point detection. On the left, I have a control chart without the change point detection. You can see there are a few ups and downs between the control limits. But when I take this value of 256, which the change point detection gives you when you run it on the same data, and integrate it into the same control chart, you can see the visualization got better. In this case, before the change point you have a good amount of variation, you have a baseline, and the mean is below your target line; after the change point, your mean is above the target line and your variation has reduced. In this way, you can easily find and visualize what the change is in your existing control charts. Once we found out how valuable this change point is, quickly telling us where the change or the fault happened in the process, we started with some use cases, such as this one where we introduced a change and wanted to validate it. In this example, we have a process giving a low Cpk value because the population is mostly distributed toward the positive side. We identified that it needed a mean shift, and we introduced a known change, a mean shift. When we introduced this mean shift, you could see the process became more capable, with capability around 2.3 versus 1.5. And you can see, when we deploy the change point detection function on the same data, the change point gives you the value and the direction, whether your change has taken place in your process or not. This is highly visual for us to evaluate whether the known change I just introduced is impacting my process positively or negatively. The second use case: as I mentioned, we have a lot of onboard metrology tools in our process, which generate a lot of data. We do periodic measurements and testing on these metrology tools, but we always rely upon the repeatability and reproducibility data, the Gauge R&R data, from these charts. The Gauge R&R data does not find any faults in the process or tell us when the metrology tools started drifting, either positive or negative. But for the same data, when we deployed the change point function, it immediately tells us when subtle changes and abrupt, significant drifts happened in the metrology tool, too. That is highly helpful for making sure our metrology tools are still stable and the data are not drifting. The third and most important one is to detect and diagnose faults with ease. This is a dashboard that we worked on to deploy on our manufacturing shop floor.
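To make that integration concrete, here is a minimal JSL sketch, assuming a detected change point at row 256 and hypothetical column names (:Film Thickness as the output, :Input 1 and :Input 2 as inputs). It builds the phase column, a phased control chart, and before/after correlation reports of the kind described above.

```
dt = Current Data Table();

// Phase column: rows up to the detected change point vs. rows after it
dt << New Column( "Phase", Character,
	Formula( If( Row() <= 256, "Before change", "After change" ) )
);

// Existing control chart upgraded with the change point as a phase
Control Chart Builder(
	Variables( Y( :Film Thickness ), Phase( :Phase ) )
);

// Correlations of the output with the inputs, computed separately per phase
Multivariate(
	Y( :Film Thickness, :Input 1, :Input 2 ),
	By( :Phase )
);
```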
Our workflow would be: we monitor multiple response parameters, or critical-to-quality parameters, as outputs, and then we identify whether there are any changes using the change point dashboard. At the same time, we also look into the model-driven multivariate control chart dashboard, which tells us what the excursions are and easily correlates those excursions to the input parameters. But the true value came when we took these change point values and integrated them into our existing control charts. In this example, we have baseline data before the change and data after the change, which is when the fault occurred; you can see there is a slight mean shift and a huge variation in our process. At this point, we go to our correlation matrix. It correlates our output parameters to our input parameters in the process. The correlation matrix will immediately tell you which factor has a strong positive or negative correlation during this phase compared to during your normal phase. That will immediately tell you what input parameter has gone wrong or might have contributed a lot to this fault area. That is how we are able to quickly find the faults and diagnose the issues. From here on, I'm going to demonstrate in JMP how to create these kinds of dashboards and, at the same time, how to handle multiple change points; as you could see in our previous slides, the change point function gives only one value. Let me turn to JMP. In this example, I'm using the Boston Housing data, which is from 1978. It's a good example for a beginner learning correlation and regression, and many tutorials use it, so I thought I could use this example, which is readily available in everybody's JMP sample data sets, see what the correlations are, see whether there are multiple change points in the data, and at the same time correlate them to the factors. The sample data table looks like this. In our case, I want to see the median market value, the median price of the owner-occupied homes in Boston, and see how it varies with time. At the same time, there are corresponding factors, like the crime rate in the town, the pollution rate in the town, or other factors such as the age of the homes or the distance measures in these regions. You can read about these column names in the column notes that are presented over here. I'm going to run the script, which gives us... Before running the script, I also want to tell you that I'm attaching the entire script as part of my presentation, where you can go through the script or run it, and it will give you a nice dashboard about the variation in Boston housing values and also the correlations. You can download the script, look into my code, and use the same methodology and logic for how I'm able to detect multiple change points from the dashboard using the JSL script. When I run the script, it runs and I get a nice dashboard like this. As I mentioned, our problem-solving framework starts with looking at the change points in the median value of the homes in Boston.
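If you want to follow along, a minimal sketch of the starting point might look like this; it assumes the standard Boston Housing sample table and just a handful of its columns (mvalue, crim, nox, rooms), not the full dashboard script the presenter attaches.

```
// Open the sample table that ships with JMP
dt = Open( "$SAMPLE_DATA/Boston Housing.jmp" );

// Correlation matrix of median home value against a few candidate factors
Multivariate( Y( :mvalue, :crim, :nox, :rooms ) );
```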
You can see that for the median value it found a big change; it tells you the change point appears to be at row 373, and there is also a second, smaller change. When you look at the model-driven control chart data, you can see there are high excursions in this median value of the housing market, and you can easily tell what caused an excursion by looking into your input parameters, such as the crime rate. You can hover on any of these and find out what is driving these huge excursions in your monitoring variable, which is the median value of the houses. As I mentioned, the real value comes out when we take these and integrate them into our control charts. See this one: this control chart has those change points integrated into it, and it gives you a good visual. You can see now, in the Boston housing market example, we have a baseline of median value changes, a baseline value of around 20 (the values are in thousands of dollars). Then, when the housing boom happened, when change point 1 occurred, you can see there is a good amount of variation in the housing market and, at the same time, a mean shift too. But soon there is another change, and the housing market crashes, resulting in your median value sitting below your baseline. This easily gives us a visual, in phases, of what the baseline data is, when the first change occurred and how large it is, and when the second change occurred and how large that is, and you can compare it with the previous two phases. If you're a prospective homebuyer in Boston and you're looking at this kind of data, you want to know what might have happened to drive these kinds of changes. In your data set, you have some nice factors: the crime rate in those same areas over the same period of time, the pollution rates, et cetera. If you look at the before-change phase, you can see how the market value correlates with your factors. In this case, there's a good amount of negative correlation with the crime rate, negative correlation with the industrial area and also with pollution, and there is a strong positive correlation, about 0.8, with rooms in this phase. Now look at the change point 1 phase, where the housing boom occurred [inaudible 00:22:46] and the mean shifted upwards. You can see that in this time frame the crime rate has low or no correlation with these changes, pollution is low again, but rooms keep the same strong positive correlation in both phases. If you ask me what might have happened after this, I would say maybe the crime or the pollution increased in the area, the kinds of dominant, visible factors that are not favorable for a prospective home, and that leads into the housing crash when it happened. Compare this phase to that phase at the change point to see how the correlation changed.
You  can  immediately  tell, "Okay,  my  comparison with  the  median  market  value from  this  phase, my  correlation with  the  client  is  pretty  low, my  rooms  has  very  positive  correlation, and  there  is  lstat  and  other  factors." But  if  you  come to  the  change  point 2  time, you  could  see  the  crime  rate has  increased  again, your  pollution  has  increased  again, which  resulted  in  the  negative linear  correlation  of  this  change. When  you  look  at  your  rooms, which  has  been  positively  contributing for  your  change  has  fell  down  completely. In  this  way,  this  is  telling  you  a  lot of  information  data  about  these  changes that  occurred  in  1978. Also,  easy  correlationship  data with  your   factors that  might  have  impacted  these  changes. I'll  repeat. Our  framework  is  detecting, for  any  time  series  data, we  first  detect  number  of  changes, if  it  is  one  change  or  two  change or  more  changes, and  then  integrate  these  change  points into  our  control  chart  to  get a  better  visualization  about  the  changes. Is  it  a  mean  shift  or  variation shift, or  what  is  happening? And  then  look  into  your  factors  that  might have  contributed  to  these  changes and  see  what  factors a  strong  contribution  for  this  change or   weak  contribution  or  no  contribution or  negative  contribution. Now  that  you  know  the  crime  rate, your  market  value —the  median  market  value is  impacted  by  the  crime  rate, pollution,  and  the  rooms— let's  look  at  a  simple  visualization. When  you  go  here,  you  can  see this  is  a  simple  graph  plot. I'm  taking  my  change  point and  visualizing  it  in  a  better  way. I'm  comparing  my  median  value with  the  nox,  the  pollution  value in  the  same  towns, in  the  same  time  period and  the  rooms  and  the  crime  rate. You  could  see  when my  housing  median  market  values  increase or  the  housing  booming  phase, you  could  see  the  pollution  is  very  low and  the  rooms  for  dwelling  has  increased and  your  crime  rate is  at  least  possible  state. As  a  prospective  homebuyer, these  are  all  favorable  factors  for  me, no  pollution  area, no  crime  area or safe  area, and  then  I'm  getting  a  bigger  home with a  convenient  number  of  rooms. But  let's  see,  in  this  phase, when  the  housing  market  crashed, at  the  same  time, your  pollution  rate  has  increased. The  number  of  rooms  for  dwelling has  decreased. And  then  there's an  astronomically  high  crime  rate which  is  impacting  the  same  thing. This  is  a  clear  visual, but  you  landed  up  in  this  problem or  with  this  kind  of  conclusion with  the  help of  your  correlation  matrices. Correlation  matrix and  then  a  change  point. Your  correlation  matrix and  then  the  change  point and  then  a  better  resolution  like  this. Let  me  go  and  show  you how  the  JSL  script is  pooling  multiple  change  points. You  could  find  the  change  point  data in  your  control  charts, multivariate  control  charts. And  again, this  is  a  multi variate  control  chart. You  could  use  as  many  as  variables as  your  multivariate  control  chart input  parameters. But  in  my  case, I'm  using  median  market  value, and  I  get  a  multivariate  correlation and  a  change  point. Clearly,  see,  it  gives  me  373  as  a  number. 
How  I  am  taking or  extracting  this  373  number is  using  this  kind  of  JMP  or  JSL can  extract  any  of  the  text  box  columns from  any  of  your  reports and  then  put  it  into  your  data  table. In  this  case, I  collected  this  data  point. You  could  see   the  equation  1  holds the  phrase  call  or  the  text  box  call. The  change  point  appears  to  be  373. This  one  simple  code  will  hold or  extract  that  value  from  the  report. And  then  I  take  that  into  my  columns in  the  change  point  1  column, and  I  extract the  numerical  value  out  of  it. I  wrote  the  JSL  as  simple  as  possible. I'm  very  beginner with  scripting  and  coding. You  could  see   I  used  simple  formulas and  simple  methods  to  get this. You  can  see  the  change  point  report is  only  giving  you one  change  point  value. But  you  could  see  in  your  process, there  are  other  changes,  too, that  you  might  be  interested  in and  we  want  to  find  them  out. What  I  do  is once  you  get  your  first  value, you  hide  and  exclude  everything  after your  change  point. After  this  373,  you  exclude  everything. Now  you're  left  with  this  phase. You  want  to  see what  is  my  change  in  this  phase, and  the  cycle  continues. For  example, in  my  equation  2  from  my  report  2, I'm  pulling  the  text  box  phrase called  the  change  point. Sorry,  this  is  hard  to  show. When  you  hover  on  this, you  can  see  that  variable has  the  phrase  [inaudible 00:29:49]. Variable  has  the  phase  of  the  change  point appears  at  row  157. You  take  the  value and  then  put  it  in  your  column  call, CP  Text,  something  like  that, where  you  could  extract your  second  change  point. Once  you  get  a  data  like  that, it  is  very  easy  to  construct a  dashboard  like  this, where  you  create your  control  chart  dashboard, each  individual  components, model- driven  multivariate  control  chart, and  create  the  dashboard  using the  new   dashboard  application. Once  you  create a  nice  dashboard  like  this, you  just  go  here  and  you  just take  everything  into  the  script. I'm  switching  down  to  my  presentation. I  hope  the   Boston Housing  data  example is  very  clear, because  how  easily  you  can  detect changes  in  any  time  series  data and  correlate  those with  your  input  parameters. In  real  life  or  manufacturing  scenario, you  would  have several  input  parameters  like  this. In  this  case,  in  my  dashboard, we  look  into  around  50  input  parameters comparing  with  our  output  parameters. We  also  monitor the  multivariate  control  charge. You  could  see   I'm  monitoring multiple  CTQs  together instead  of  each  individual traditional  control  chart. I'm  exporting  these  values into  my  existing  control  chart and  then  immediately  diagnosing  my  faults. This  dashboard,  as  you  can  see, it's  pretty  self- explanatory, easy  to  read,  easy  to  deploy on  your  manufacturing  floor. Technicians  even  not  comfortable with  statistical  analysis, and  doing  this  kind  of  building control  charts  and  everything, they'll  just  look  at  the  process saying  that, "O kay,  I  have  two  changes  in  my  process, and  let  me  see what  are  the  input  parameters that  are  correlating  these  changes and  see  if  I  can  change  them or  manipulate  them  so  that  my  process back  to  stable  state like this. 
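Going back to the scripting detail from earlier in this section, here is a compact sketch of the extract-and-repeat loop, continuing from the launch sketch above. It assumes mcc is the open multivariate control chart with change point detection on, that Text Box( 1 ) is the report box containing the phrase "Change point appears at row NNN" (the index can differ in your report), and it reuses the change rows quoted in the demo (373 and 157) for the phase column.

```jsl
// Sketch of the multiple-change-point extraction described in the talk.
dt     = Current Data Table();
rpt    = mcc << Report;
phrase = rpt[Text Box( 1 )] << Get Text;   // assumed text box index
cp1    = Num( Regex( phrase, "\d+" ) );    // first run of digits, e.g. 373

// Hide and exclude everything from the first change point onward, then
// relaunch (or Redo Analysis) so a second change point is estimated on the
// remaining rows; parse its report the same way to get cp2 (157 in the demo).
For( i = cp1, i <= N Rows( dt ), i++,
	Excluded( Row State( i ) ) = 1;
	Hidden( Row State( i ) ) = 1;
);

// Restore all rows, then build a phase column from the two change rows to
// support the before/after correlation comparison shown in the dashboard.
dt << Clear Row States;
dt << New Column( "Phase", Character, Nominal,
	Formula( If( Row() < 157, "Baseline", Row() < 373, "Change 1", "Change 2" ) )
);
dt << Multivariate( Y( :mvalue, :crim, :nox, :rooms, :lstat ), By( :Phase ) );
```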
Again,  the  correlation  coefficients are  pretty  easy  to  explain  to  anyone on  the  manufacturing  shop floor. For  example,  a  strong  positive  correlation would  give  you  a  blue  number  closer  to  1, or  a  strong  negative  correlation would  be  something  like  -0.7, but  the  variation would  be  something  like  this when  you  increase  something. With  subtle  bit  of  variation, you  would  see a  negative  correlation  there, and  then  a  weak  negative  correlation would  be  something  like  this, a  lot  of  variation instead  of  a  straight  line like this. And  then  there's  no  correlation. The  number  would  be pretty  much  close  to  0. You  would  see, even  if  you  increase  one, the  other  might  be  decreasing. It  is  very  easy  to  explain. It  is  very  visual on  the  manufacturing  shop  floor to  find  out  these  kind  of  numbers, looking  to  see  what  input  has  changed and  what  input  has  a  strong  correlation. There  is  an  added  bonus with  the  change  point  integration. As  you  can  see, these  are  all  numerical  data, which  is  getting a  numerical  correlation  like  this. But  the  change  point  is  telling  you when  the  fault  has  occurred and  when  you  know when  the  fault  has  occurred, you  just  look  into  your  data  such  as template  change   or  your  load  port  IDs, and  you  will  get  a  correlation  with  your non- numerical  data   [inaudible 00:34:10] . You  would  start  a  discussion on  your  manufacturing floor. Hey,  when  this  time  occurred, I  did  this  change  over  PM or  the  tool  is  setting  idle, or  I  did  a  prevent  day to  maintenance  at  this  exact  time. Maybe  that  prevent  day to  maintenance  time, some  issue  has  occurred. This  is  giving  you  a  full  picture. You're  looking  into   tons of  50  plus  input  parameters and  easily  getting  a  correlation with  your  CTQs and  easily  detecting  the  faults. At  the  same  time,  you  have non- numerical  data  such  as  like  this, and  you  find  out  and  you  relate  to like, what  was  I  doing  when  this  change or  fault  has  occurred  in  my  process? The  final  takeaway is  by  integrating  change  point  function to  our  existing  control  charts. We're  quickly  finding  faults and  diagnosing  them  faster, saving  millions  of  dollars in  our  fast- paced  manufacturing. With  JSL  scripting, we're  able  to  find  out  there  are multiple  change  points  in  our  process and  integrate  them to  our  control  chart  dashboards. This  change  point  and  multivariate  methods can  be  used  in  any  time  series, as  I  was  mentioning. For  example, when  I'm  preparing  for  this  presentation, I  gave  a  small  set  of  our  monthly  savings data  set  to  my  wife who  is  not  familiar with  statistics  and  control  charts. Upon  tracking  this  dashboard, my  wife  easily  found there  are  change  points  in  our  savings. But  at  the  same  time, she  did  not  like  the  fact that  when  the  dashboard  pointed  out that  our  Amazon  spending is  well  correlating for  our  low  saving  months. As  you  can  see,  like  this  time, this  change  point, you  could  use  it  on  any  time  series  data. You  could  find  out  your  changes in  your  time  series and  when  the  change  has  occurred or  when  the  significant  change has  occurred, and  then  you  can  correlate to  your  input  parameters. 
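For reference, the number being read off each cell of these matrices, here and in the Boston Housing example, is the Pearson correlation coefficient, which always falls between -1 and +1 (a standard definition, not anything specific to JMP):

```latex
r_{xy} \;=\; \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}\;\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}},
\qquad -1 \le r_{xy} \le 1 .
```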
In our savings example, we were simply correlating Amazon spending, take-out dinners, and several other factors [inaudible 00:36:18]. That's pretty much it. I hope you found this presentation useful, and that you learned how easily you can find multiple change points through JSL scripting and create simple dashboards to detect faults and diagnose them quickly. Thank you.
This STEM paper studies the time series Antarctic glacier mass from April 2002 to March 2021. The objective of this paper is to forecast the Antarctic glacier mass level for 2021-2041. Among four STEM components: Science is geoscience of the glacier; Technology is using the GRACE-FO satellites to collect glacier ice sheet mass data; Engineering would focus on the COVID-19 factor on the glacier melting rate, and mathematics is mainly on time series ARIMA models. Both non-seasonal and seasonal ARIMA models were studied and compared. Both the 12-month seasonal pattern and long-term year-to-year trend were significantly observed. The glacier melting rate was 2% faster based on the seasonal ARIMA model. Smoothing models were also significantly identified in the seasonal ARIMA model to smooth out the random noise component to enhance the time series trend and the seasonal component to enhance the forecasting model. Forecasting glacier melting for 2021-2041 would be a challenging task to address both seasonal and trend components for a longer horizontal time from today. The prediction interval would become too wide to predict the future glacier melting rate, if more than 5 years away. The seasonal ARIMA model could provide a better fit than the non-seasonal ARIMA model. STEM methodology is a powerful and holistic way of conducting scientific research projects by modern GRACE-FO Technology in a practical engineering sense through a mathematical ARIMA forecasting analysis.      Hi,  everyone. I'm  Mason. T oday, I'll  be  presenting  a  study on  Antarctic  glacier  melting rate using  time  series  platforms. The  motivation  behind  this  project was  that  we  wanted  to  investigate the  long- term  effects  of  climate  change, and  we  targeted  places most  affected  by  global  warming, which  are  Antarctica  and  Greenland. Previously,  we  tried  using smoothing  and  decomposition  techniques to  study  and  forecast the  glacier  melting  rate. But  many  of  those  models  had quite  important  limitations. For  example,  they  were  unable to  consider  the  seasonal  or  trend  pattern. To  improve the  forecasting  accuracy  and  precision, we  wanted  to  try  other  methods, such  as  the  ARIMA  model, which  is  our  main  focus  for  today. The  objective  of  this  presentation is  to  utilize  time  series  platforms   in JMP to  examine  the  glacier  melting  mass  data from  2002  to  2021, and  to  forecast  the  glacier  melting  rates for  the  next  20  years. Why  are  we  studying  glacier  melting  rates instead  of,  for  example, atmospheric  temperature? Should  we  study  the  Greenland  ice  sheet or  the  Antarctic  ice  sheet? We'll  be  studying  the  Antarctic  data because  the  rate a t  which  the Thwaites   glacier in  Antarctica  is  melting has  been  rapidly  increasing in  the  past  years in  terms  of  the  surface  height. The   Thwaites glacier  is  significant because  it  is  the  broadest  glacier in  the  world and  already  contributes  to  4% of  global  sea  level  rises. But  what's  more  concerning is  that  recently,  in  2021, scientists  found that  there  was  more  warm  water underneath  the  glacier than  previously  thought, which  could  have even  more  dire  consequences in  terms  of  further  contributing to  sea  level  rises. 
We  want  to  help  forecast the  Antarctic  glacier melting rate to  inform  the  public about  the  effects  of  global  warming and  bring  more  awareness to  the  problem that  climate  change  can  cause. We  got  our  data  from  the  NASA  website, as  shown  on  the  right, and  we  transformed  the  data  into  JMP, as  you  can  see  on  the  left  side. Now,  the  Antarctic  mass  is  measured in   giga metric tons, and  1  metric  ton  is  equal to  1,000  kilograms. When  you  get  metric  ton, it's  equal  to  10¹²  kilograms. The   GRACE-FO mission, which  is  where  th is data  was  collected, measures  the  mass  variation. It's  not  the  total  mass, which  is  practically impossible  to  measure, but  the  change  in  mass relative  to  April  2002, when  the  GRACE  mission  started tracking  glacier  mass  variation. Previously,  as  I  said, we  use  the  smoothing anti- composition  models, but  these  techniques  either  fail to  consider the  nonlinear  downward  trend in  glacial  mass since  the  glacier  melting  rate is  increasing  over  the  years, or  these  models  failed  to  consider the  seasonal  variations. Warm  months  are  going to  have  a  faster  melting  rate. We  wanted  to  use  the ARIMA  model because  we  hope  to  improve  the  trend so  that  it  is  nonlinear and  also  incorporates the  seasonal  component  at  the  same  time. We  also  hope  that  the   ARIMA model can  help  further  narrow the  prediction  interval so  that  our  forecasts  are  more  precise. There's  two  types  of  ARIMA  models. There's  nonseasonal  and  seasonal. The  nonseasonal   ARIMA model does  not  consider that  there  is  a  seasonal  pattern, while  the  seasonal   ARIMA model  emphasizes that  there  is  a  seasonal  component before  the  model  is  generated. The  nonseasonal ARIMA  model, it does  implement  decomposition and  searches  for a  seasonal  component, but  it  has  no  knowledge of  the  seasonal  lag  period which  should  be  12  months before  it  generates  a  model. Now,  glacier  mass   variation should  have a  seasonal  pattern because  we  expect  glaciers to melt  faster  during  the  summer  months and  accumulate  during  the  winter  months. But  from  a  previous preliminary  time  series  analysis, we  do  not  see an  obvious  lag  period  of  12  months. We  aren't  really  sure what  the  optimal  seasonal  width  is because  of  growing  weather  inconsistencies as  a  result  of  global  warming. Without  specifying what  our  seasonal  lag  is, we  can't  use  the  seasonal  ARIMA  model. It's  also  common  practice to  use  the  non seasonal  ARIMA  model to  verify  that  lag  period, and  then  run  the  seasonal  ARIMA model once  we  know what  the  seasonal  lag  would  be. First,  we'll  run the  non seasonal ARIMA  model to  confirm  the  seasonal  lag  period is  indeed  12  months. Then,  we'll  implement the  seasonal  ARIMA  model based  on  the  optimal  seasonal  lag to  better  forecast the  glacier  melting rate in  the  next  20  years. If  you  look at  the  model  results  for (0, 1, 0) which  is  the  best  nonseasonal  model, you  can  see  that  the  slope is  not  significant. The  p-value  is  0.18, and  the  parameter  estimate  is   -10.42. Every  year,  the  glacier  mass  is  forecasted to  decrease  at  about  10 giga metric  tons. However,  this  model  is  not  significant, which  may  indicate that  we  need  to  use  a  seasonal  model. 
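To make the two model families concrete, the quoted best nonseasonal fit, ARIMA(0,1,0) with drift (the slope of about -10 gigatonnes per year), and the general seasonal form ARIMA(p,d,q)x(P,D,Q) with period 12 can be written in standard backshift notation; the specific seasonal orders JMP selects are not restated here.

```latex
(1-B)\,y_t \;=\; \mu + \varepsilon_t
\qquad \text{(nonseasonal ARIMA(0,1,0) with drift)}
```

```latex
\phi_p(B)\,\Phi_P(B^{12})\,(1-B)^{d}\,(1-B^{12})^{D}\,y_t
 \;=\; \theta_q(B)\,\Theta_Q(B^{12})\,\varepsilon_t
\qquad \text{(seasonal ARIMA}(p,d,q)\times(P,D,Q)_{12}\text{)}
```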
We see that lag 12 has the highest autocorrelation among lags greater than zero in the graph on the right. The autocorrelation plot further confirms that we should be using a seasonal lag of 12. After running the nonseasonal ARIMA model, we wanted to compare the (0, 1, 0) nonseasonal model and the best seasonal model. The nonseasonal ARIMA model is shown in dark pink and the seasonal ARIMA model is shown in light pink. The colors are a bit similar here, but you can see that the prediction interval for the seasonal model is much wider than for the nonseasonal model, and the prediction interval from the seasonal model reflects the seasonal pattern. Interestingly, the overall trend for the seasonal model is much steeper than for the nonseasonal model, which may indicate that if we do not decompose the seasonal component, the seasonal pattern ends up acting as random noise that dilutes the signal and makes the slope less steep than it should be. The prediction interval for the seasonal ARIMA model is larger, most likely because it considers the seasonal variation, which is another source of uncertainty. However, we do not want to lose the seasonal pattern in the forecasts, since we want to predict the glacier mass variation for each month, not just each year. If you look at the ACF graphs on the bottom left, the seasonal ARIMA model has a much smaller peak at a seasonal lag of 12, which is right over here, than the nonseasonal ARIMA model, which is on the right. It's hard to see because the graphs are overlapping, but for the seasonal ARIMA model, the residual autocorrelation is approximately zero for lags greater than zero, which shows that we chose a good lag period. Also, from the table on the right, the MA2,12 term is significant, which once again shows that 12 is a good choice for the seasonal lag. MA2,12 is the seasonal moving average term at a seasonal lag of 12 months. In conclusion, we applied nonseasonal and seasonal ARIMA models to forecast the Antarctic glacier melting rate for the next 20 years. While the nonseasonal ARIMA model can predict the general downward trajectory of glacier mass variation, it fails to consider the seasonal pattern in the forecasts. The seasonal ARIMA model can forecast both the seasonal and trend behaviors, but its prediction interval is much larger. The seasonal ARIMA model also has a slope that is 20% steeper than the slope found from the nonseasonal ARIMA model. That's all I have for today. Thanks for listening.
This paper describes an approach for driving yield improvements by analyzing process performance data with JMP. Analysis of performance data -- including process long-term and short-term capability, stability and statistical control -- is particularly useful when monitoring hundreds of process KPIs retrospectively. Modern manufacturing requires many process and metrology steps to ensure healthy product lines and high-quality products. During the production ramp-up phase, identifying the processes of most concern is highly challenging. Using JMP scripting and quality data analysis, Magic Leap’s Eyepiece Manufacturing factory implements an automated process that can pull, analyze, visualize, correlate, predict and verify factory yield improvement based on a variety of performance metrics.      Hello  everyone. My  name  is  Harry  Dong. I'm  the  director  of  the  Optical  Process Engineering  Group  at  Magic  Leap. I'm  so  excited  to  be  here to  present  my  topic: Factory  Yield  Ramp- up  Approach through P rocess  Performance  Metrics Guided  Improvement  Activities. Today  I'm  going  to  cover Factory  Process  Performance  Overview, the   Process Screening  Tool and  the  Statistical  Process  Control using  JMP Script  to  automate  the  process capability  analysis  and  the  conclusions. People  have  been  talking about  process  performance, S uch  as  capability  stability, but  what  are  they? Basically,  process  capability  is  a  measure of  the  ability  of  the  process to  meet  the  specifications. While,  the  process  stability  refers to  the  consistency  of  KPIs  over  time. It's  very  important  to  realize there  is  no  inherent  relationship between  process stability  and  process  capability. Thus  it is both  extremely  important aspects  of  any  manufacturing  process. As  you  can  see from  the  bottom- left  picture. It's  showing, process  can  be  both  capable and  stable  meantime, this  is  a  perfect  world. Basically  your  process  is  super  tight against  your  lower  spec  limit, upper  spec  limit. But  on  the  other  side. When  you  started  seeing  instability, when  your  variation  is  small, you'll  still  be  able  to  meet the  process  specifications. But  over  time, you  will  see  a  lot  of  variations. The  other  scenarios  can  be your  process  is  super  stable. Over  time  you  don't  see a  lot  of  up  and  downs, But  because  it  could  be  like  a  small a three  Sigma, a lower  spec  limit,  upper  spec  limit, you  have  large  variation, then  you'll  not  be  able to  meet  your  specification against  a  high  yield  target, high  process  capability  target. The  worst  case  can  be your  process  is  not  capable, which  means  you  have  large  variation and  over  time  you  also see  a  lot  of  stability  issues. It's  everywhere, Across  the  whole  industry, monitoring  factory  process  using the  process  performance  plot becomes  very  useful. It's  including  capability,  stability in  the  combined  metrics. As  you  can  see, this  is  a  JMP- generated process  performance  plot. On  the  x- axis  you  see  the  stability  index and  the  y- axis  is  indicating the  capability  overall  which  is  a  Ppk. Eventually  this is  a  four  quadrant  plot. We  want  to  push  everything  low  stability and  high  process  capability which  is  in  the  green  zone. 
Often  you'll  see  some  process not  capable  which  is  under  this  line or  not  stable,  which  is  across the  vertical  line  showing  in  this  graph. I'm  going  to  talk  about the   Process Screening  function, JMP  provided. This  is  a  very  powerful  tool, if  you  are  talking  about quickly  identify  unstable  process or  some  incapable  process,  meanwhile. Basically  this  is  a  JMP- generated  report, as  you  can  see, I  have  24  processes  listed  in  this  report. Very  quickly  you  can  see stability  index  is  showing  up. How  do  you  define  the  process  stability? It's  calculated  using the   Within Sigma  and   Overall Sigma, This  is  the  ratio  using  the   Overall Sigma divided  by   Within Sigma, What  do  they  mean? Overall Sigma  usually  treated as  a  longterm  process  variation and  the  Within Sigma  treated as a  shortened  process  variation. So  JMP  has  certain  rules that  you  can  refer  to for  this  calculation. But  basically  you  can  define  color  codes. How  do  you  see  your  process  stable? You  can  use 1,  1.3, whatever  the  number  you  want, to  color  code  that. For  me,  as  you  can  see, I color coded  process  greater  than  1.7 as a  process  not  being  stable. Yellow  as  a  process is  kind  of  marginal, And  the  green  zone  meaning the  process  is  super  stable. This  is  telling  us  some  process can  be  stable  but  not  capable, Because  you  can  see the  Ppk  Cpk  are  kind  of  low, But  on  the  other  side, you  can  see  a  bunch  of  process, they  were  not  stable, but  some  of  them  are  very  capable, Again,  this  is  aligned with  what  I  went  through  earlier. Basically,  utilizing the   Process Screening  tool, you  can  quickly  identify the  unstable  processes as  I  marked  in  this  graph. Meanwhile,  as  you  can  see in  this  Process  Screening  tool, you  can  also  see  the  control  chart  alarms based  on  the  samples that  the  raw  data  you  put  in  the  report. We  finally  deploy  the  control  chart, called  statistical  process  control  chart, be  able  to  monitor and  improve  process  capability. Stability  is  very  important, This  is  a  JMP- generated  control  chart, You  can  use  Windows  Scheduler  or  JMP  Live to  automate  those  charts to be able to  pull  the  data  real  time. Meanwhile, you  can  also  use  certain  web  API to  generate  the  control  chart. JMP  is  very  useful  in  this  case. It can  send  out email  notification  to  the  group, to  individual  engineers. Meanwhile, if  you  can  connect  us with  your  internal  system , I  think  that's  going to  give  you  additional  power to  be  able  to  communicate with  your  process tool, be  able  to  pause  the  tool, put  it  on  hold  for  the  engineers to  react  to  the  variations either  process  shift or  out  of  control  data  points. Process  Capability Analysis  is  a  standard, Basically  a  few  rules that  we  need  to  follow, First  of  all,  you  want  to  make  sure your  data  set  is  following the  normal  distribution. This  can  be  done  using  JMP  tool, I  won't  go  through. But  on  the  other  side  we  realize Cpk  calculation  changes  dramatically due to  the  outliers,  especially when  sample  size  is  small. The  outliers  can be  driven  by  special  causes, excursions  during  the  process. So  this  can  add  bias to  Process  Capability  Analysis. 
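To tie the Process Screening columns above to formulas: the within (short-term) sigma drives Cpk, the overall (long-term) sigma drives Ppk, and their ratio is the stability index described in the talk (standard definitions):

```latex
\text{Stability Index} \;=\; \frac{\sigma_{\text{overall}}}{\sigma_{\text{within}}},
\qquad
C_{pk} \;=\; \min\!\left(\frac{USL-\mu}{3\,\sigma_{\text{within}}},\;
\frac{\mu-LSL}{3\,\sigma_{\text{within}}}\right),
\qquad
P_{pk} \;=\; \min\!\left(\frac{USL-\mu}{3\,\sigma_{\text{overall}}},\;
\frac{\mu-LSL}{3\,\sigma_{\text{overall}}}\right).
```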
Sometimes  if  you  do  see  some  outliers, you  can  drag  your  Cpk  down, but  it's  not  representative to  your  standard  process  variation. Apply  outlier  removal  method to  remove  outliers. We'll  help  you  get  rid  of  those  noises to  better  understand your  true  process  capability. Within  the  JMP, they offer  different  methodology to  remove  outliers. I  won't  go  through  each  of  them, but  they're  all  very  powerful. You  can  read  through  the  instructions which  method  is  the  best  in  your  case. Basically,  they  can  be  found  under Analyze/S creening /E xplore  Outliers. Basically,  I'm  going  to  show the  basic  method  we  are  using to  exclude  the  extreme values  for  our  process, which  is  a quantile  range  outlier  removal. Basically  this  tool we  found  is  very  useful, Because  when  you  try to  pull  the  data  through  the  database, often  you'll  find  some  outliers. Some  data  are  very  extreme, you  know  they're  outliers, some  are  not  as  obvious, So  basically this   quantile range  outlier  method offer  you  flexibility. Basically,  as  you  can  see, this  is  the  distribution  of  our  process. We  have  upper  spec  limit,  target and  you  also  have  the  mean. And  follow  this  Box  Plot, you  can  find the 10th percentile  value, or  the  90th percentile  value. Basically  the  inter-percentile  range is  calculated  using  90th percentile minus 10th percentile  value and  this  is  defined as  your  inter-percentile  range. The  lower  threshold  value  is  calculated, using  the  10th percentile  value minus  three  times the  calculated  inter-percentile  range. So  this  becomes  your  low  threshold for  each  individual  processes. On  the  other  side  the  high  threshold is  defined  by  90th percentile plus  three  times  inter-percentile  range. This  becomes  your  high  threshold. Basically  all  the  extreme value   outside  of  this  range will  be  treated  as  outliers and  they  can  be  colored. They  can  be  marked  as  missing . They  can  be  excluded from  your  data  analysis. As  you  can  see, JMP  did  provide  the  flexibility. How  do  you  define the  inter-percentile  range? You  can  do  0.1, you  can  do  other  values  as  needed. And  also  the  Q  value, which  is  this  value  I  highlighted  here, can  be  changed  as  well. It  really  depending  on  how  much  noise you  want  to  get  rid  of from  your  data  analysis. A s  you  can  see, This  is  an  example that  showing  some  value  is  being  colored and  also  being  changed  to  missing  value, so  they  will  be  excluded from  the  data  analysis. This  is  a  quick  demonstration to  show  you  how  the  outlier is  going  to  impact  your  Cpk  calculation. As  you  can  see,  Cpk  value remains  equal  or  better post  outlier  removal. It  really  depends on  your  sample  size. Sometimes  if  your  sample  size  is  small, the  change  can  be  more  dramatic. But  in  my  case,  I  believe my  sample  size  is  quite  large. This  is  why  they're  not showing  very  big  differences. Yeah,  the  quantile  range  o utliers parameters  can  be  tuned if  necessary  as  I  mentioned  earlier. Other  things  I  want to  highlight  in  this  page, it's  the  process  capability  box  plot. We  figure  out  this  is  very  useful  tool because  you  can  be  monitoring many  process  parameters. 
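Before moving on to the capability box plots, the quantile range outlier thresholds just described can be written compactly; the tail quantile 0.1 and the multiplier of 3 are the values used in the talk, and both are adjustable in the platform:

```latex
IPR \;=\; Q_{0.9} - Q_{0.1},
\qquad
\text{low threshold} \;=\; Q_{0.1} - 3\,IPR,
\qquad
\text{high threshold} \;=\; Q_{0.9} + 3\,IPR .
```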
So  be  able  to  put  them  together to  visualize  how  tight  they  are, which  direction  they're  shifting and  how  much  variation  is  being  counted to  calculate  our  process  capability. Process  stability  is  very  useful, as  you  can  see. They  use  a  standardize s pec  limit to  be  able  to  combine  everything  together. It's  super  useful  for  data  visualization. I  want  to  quickly  show  you, before  and  after  we  automate the  process  performance  analysis. At  the  beginning  team  are  not  using the  JMP  scripting to  automate  this  process, as  you  can  see. We  have  to  collect  the  quality  data by  individual  process  owners  per module, so data, sometimes, becomes  not  standard. They  use  different  formats and  then  they  often do  manual  outlier  removal and  then  they  have to  grab  all  the  data  together to  be  able  to  merge  them. It's  also  a manual  process,  very  tedious. After  that  they  have to  do  the  manual  process  performance because  the  variation they  see  through  different  people, different  format and  then  generate  report. They have  to  eventually repeat  everything  they  did on  like  weekly  basis, monthly  basis  or  per  PEQ build. But  on  the  other  side for  the  automated  process  flow, basically  the  quality  data can  be  queried  all  in  one  step. All  the  raw  data, all  the  specification  data can  be  put  from  the  database using  the  SQL  JMP  scripting, Of  course  we  apply  standard  outlier removal  methodology  across  all  the  data and  then  all  the  sorting, split,  spec  assignment, all  the  different  visualization can  be  done  automatically. Standard  report  will  be  generated. Then,  when  we  are  talking  about  over  time or  per  PEQ  summary , you  can  simply  modify your  SQL query  filter  to  update  the  data and  on  top  of  that, you  have  basically  all  the  raw  data. You  can  apply  local  data  filter. You  can  add  additional  functions to  make  your  filter  data  analysis  easy. Basically,  we  figure  out  manual  process is  very  time  consuming, the  feedback  is  slow and  it's  not  very  efficient  regarding the  yield  improvement. On  the  other  side, we  figure  out  the  automated  scripting  process  using  JMP. Anybody  can  perform  this  complicated process  performance  analysis  in  minutes. Some  highlights  I want  to  share for  the  process  capability  analysis. As  you  can  see, I didn't  include the  JMP  query  portion. But I basically  put different  process  name, different  test  label, all  the  raw  data  into  the  JMP  table and  after  that  we  figure out  to  split  this d ata  table because  table  comes  in  everything  combined process,  part  ID, process  time,  test  label. So  we  have  to  eventually split  the  data  table to  be  able  to  perform this  process  capability  analysis, the  box  plots  or  the  Cpk Ppk  analysis. So  very  useful  function  for  JMP is  after  we  split,  group  them. So  we  have  some  missing  value because  we  sort the  data  by  data  time. Some  process  leave  blank but  JMP  is  smart  enough not  counting  those  missing  value into  the   capability  analysis which  is  very  useful, So  I  do  want  to  mention  that. Please  don't  forget to  sort  your  data  by  date  time because  that's  super  important because  sometimes the Cpk  Ppk  calculation is  really  depending on  the  process  sequencing. 
If you do not sort the data by date-time, the results can be biased. We made that mistake earlier, so this turned out to be a useful tip. The other highlight is automated outlier removal using the quantile range outlier method. As you can see, it is a very simple process. You get all the column names using this bit of scripting, with the condition that they are numeric and continuous, then you launch the quantile range outlier platform, assign the columns to the report, and go through the report. This For loop is used to exclude all the outliers that were identified, and it can be repeated as needed. Then you can also launch the process capability analysis. If you don't know how to script it, you can simply do it manually and then grab the code from the log; this is a new capability JMP provides, and we found it super helpful. Something else I want to highlight here is the spec limit assignment, which is very powerful and very useful. You can assign specifications for multiple process variables using another data table that is also generated with a SQL query. Conclusions: analyzing process performance data using JMP is critical to driving yield improvement in modern factories, especially with the many process and metrology steps needed to ensure healthy production lines and deliver high-quality products. Analysis of performance data, including long-term and short-term process capability, stability, and statistical process control, is particularly useful when monitoring hundreds of process KPIs. During the production ramp-up phase, identifying the processes of most concern is highly challenging. Using JMP scripting and the quality data analysis platforms, our Eyepiece factory implemented an automated process that can pull, analyze, visualize, correlate, predict, and verify factory yield improvement based on a variety of performance metrics. At Magic Leap's Eyepiece factory, we demonstrated greater than 90% RTY, which covers hundreds of process KPIs; eventually you have to multiply them all together to get this RTY number, the rolled throughput yield. We were able to demonstrate greater than 90% RTY six months ahead of our next-generation product launch, driven by continuous process improvement activities guided by automated process performance analysis using JMP scripting and the quality platform tools. And thank you for your time.
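As a footnote to the scripting highlights above, here is a minimal sketch of the spec-limit assignment step. The limits table, its name, and its column names (Process, LSL, Target, USL) are hypothetical stand-ins for the SQL query output described in the talk.

```jsl
// Sketch: assign spec limits to many process columns from a limits table.
dt     = Current Data Table();
limits = Data Table( "Spec Limits" );   // hypothetical table from the SQL query
cols   = dt << Get Column Names( String, Numeric, Continuous );

For( i = 1, i <= N Rows( limits ), i++,
	cname = limits:Process[i];
	If( Contains( cols, cname ),
		Column( dt, cname ) << Set Property( "Spec Limits",
			{LSL( limits:LSL[i] ), Target( limits:Target[i] ), USL( limits:USL[i] )}
		)
	)
);
// With the properties in place, Process Capability or Distribution picks the
// limits up automatically when launched on those columns.
```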
The quality and SPC platforms in JMP 17 have many new features and capabilities that make quality analysis easier and more effective than ever. The measurement systems analysis platforms—Evaluating the Measurement Process (EMP) MSA and Variability Chart—have been reorganized and improved and a new Type 1 Gauge Analysis platform has been added. The Manage Limits utility (previously called Manage Spec Limits) has been generalized and expanded to handle many types of quality related limits that are needed to work easily with many processes in various quality platforms. The Distribution platform has added the ability to adjust for limits of detection when fitting distributions and performing process capability analysis. Control Chart Builder has several new features including a label role, a row legend, a new button to switch an XBar/R chart to an IMR chart, new dialog options and Connect Thru Missing. Both the EWMA and the Cusum Control Charts have several new features including the abilities to save and read from a limits file and save additional information to the summary table.     Hello,  my  name  is  Laura  Lancaster and  I'm  here  with  my  colleague, Annie   Dudley Zangi, to  talk  about  recent  developments in   JMP quality  and  SPC. The  first  thing  I  want  to  talk  about is some  improvements  that  we've  made to  the  distribution  platform specifically  related to  limits  of  detection. So  limited  detection   is  when  we're  unable  to  measure above  or  below  a  certain  threshold. And  in  JMP  Pro  16,  some  functionality was  added  for  limits  of  detection. Specifically  in  the  DOE  platform, we  added  the  ability to  account  for  limits  of  detection   and  a  Detection  Limits  column  property was  added  that's  used  by  the  Generalized  Regression  platform to  specify  censoring  for  responses. However,  what  was  left  unaddressed was  a  problem  with  process  capability and  limits  of  detection. The  problem  is  that when  you  ignore  limits  of  detection when  analyzing  process  capability, it  can  give  misleading  results. And  there  was  no  way  to  do process  capability  with  censored  data. But  in   JMP Pro  17,   and  I  just  wanted  to  note that  this  is  the  only  feature that  we're  going  to  talk  about that's  JMP Pro. Everything  else is  regular  JMP in  this  talk. So  in  JMP  Pro  17, now  in  the  Distribution  Platform, we  recognize  that  Detection  Limits column  property and  we  can  adjust  the  fitters for   censored data. That  means  that the  Process  Capability  report that's  within  those  fitters that  use  the  adjusted  fit to  account  for  censored  data will  give  more  accurate  results. And  the  available adjusted  distribution  fitters are  Normal,  Log normal,  Gamma, Weibull,  Exponential,  and  Beta. And  before  I  go  to  the  example, I  just  wanted  to  give  a  shout  out to  check  out  the  poster  session Introducing  Limits  of  Detection   in  the  Distribution  Platform that  Clay  Barker  and  I  worked  on if  you  want  to  learn  more  about  this. Let's  go  ahead  and  go  to  JMP. Here  I  have  some  drug  impurity  data where  I  have  an  issue   with  being  able  to  detect  impurities below  a  value  of  one. And  this  data  that  I've  recorded   is  actually  in  the  second  column and  anywhere  that I  wasn't  able  to  record  an  impurity because  it  was  below  one, I've  simply  recorded  it  as  a  one. So  this  is  censored  data. 
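As an aside before the rest of the demo: conceptually, the "adjusted fit" handles this by maximizing the standard censored-data likelihood, in which observations at or below the lower detection limit D contribute the probability of falling below D rather than a density value. This is a general statistical statement, not a description of JMP's internal code; f and F are the density and CDF of the fitted distribution (lognormal in this example).

```latex
L(\theta) \;=\; \prod_{y_i > D} f(y_i \mid \theta)\;
\prod_{y_i \le D} F(D \mid \theta).
```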
This  first  column   is  really  the  true  impurity  values that  I'm  unable  to  know, unable  to  detect  with  my  detection. So  let's  go  ahead   and  compare  both  of  these  columns using  distribution. So  if  I  go  to  Analyze,  Distribution, and  I  look  at  both  of  these  columns, you  can  clearly  see there's  a  pretty  big  difference between  having  true  impurity  values   which  I'm  unable  to  know, and  the  censored  data. Ultimately,  what  I  want  to  do is  I  want  to  do  a  log normal  fit and  run  a  process  capability  analysis on  this  data. So  I'm  going  to  go  ahead  and  do  that for  both  of  these  distributions. So  I'm  going  to  do  log normal  fit   for  both  of  them. You  can  see  that  I  get... Obviously  the  histograms look  pretty  different and  my  fits  look  pretty  different  too, which  isn't  surprising. Now,  I  want  to  do  Process  Capability on  both  of  these. I've  already  added  an  upper  spec  limit   as  a  column  property, and  you  can  see  that when  I  have  my  true  data, which  I'm  unable  to  know, my  capability  analysis looks  pretty  different from  having  the  censored  data. With  the  true  data, my  capability  looks  pretty  bad. There's  probably  something I  need  to  address. But  because  I'm  not  able  to  see the  true  data, and  I  only  have  the  censored  data that  I  can  analyze  in  JMP, the  PPK  value  is  a  lot  better. It's  above  one, and  I  may  blissfully  move  along thinking  that  my  process  is  capable when  in  actuality, it  really  isn't  so  good. But  thankfully,  in  JMP Pro  17, I  can  add  a  detection  limits column  property  in my  data. So  this  third  column   is  the  same  as  my  second  column, except  that  I've  added a  detection  limits  column  property. So  I've  added  that  I  have a  lower  detection  limit  of  one. And  now  when  I  run  Distribution  platform on  this  third  column with  a  detection  limit  column  property, and  I  do  my  logn ormal  fit   and  notice  because  I  have  censored  data, I  have  a  limited  number of  distributions  available, I'm  going  to  do  my  log normal  fit, and  it's  telling  me  it  detected that  detection  limit  column  property, and  it  knows  I  have   a  lower  detection  limit  of  one. And  when  I  do  Process  Capability, you  can  see  that  my  capability  analysis is more  in  line  with  when  I  had the  true  data   because  my  PPK is 0.546, doesn't  look  so  good. And  I  realize  that   there's  probably  something that  I  need  to  address  with  this  process. It's  not  very  capable. All  right,  so  let's  move along  to  the  next  topic. The  next  thing  I  want  to  talk  about is some  improvements in  Measurement  Systems  Analysis, specifically  the  Type 1 Gauge  Analysis  platform. A  Type  1  Gauge  Analysis  platform is  a  basic  measurement  study that  analyzes  the  repeatability and  bias  of  a  gauge to  measure  one  part relative  to  a  reference  standard. It's  usually  performed   before  more  complex  types  of  MSA  studies such  as  EMP  or  Gauge  R&R that  are  already  in  JMP. It's  required   by  some  standard  organizations such  as  VDA  in  Germany, and  this  has  been  requested by  our  customers for  quite  a  while, but  we  believe  it's  useful  for  anyone, whether  it's  required by  a  standard  organization  or  not. 
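The headline metrics of a Type 1 study, Cg and Cgk, which appear in the demo below, are commonly defined as follows, with s_g the repeatability standard deviation of the repeated measurements, the bar quantity their mean, x_ref the reference standard value, and K the percent of tolerance used (20% here). The 6 s_g study variation is the conventional default; JMP's own settings may be configured differently, so treat the constants as assumptions.

```latex
C_g \;=\; \frac{(K/100)\,\text{Tol}}{6\,s_g},
\qquad
C_{gk} \;=\; \frac{(K/200)\,\text{Tol} \;-\; \lvert \bar{x}_g - x_{\text{ref}} \rvert}{3\,s_g}.
```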
It's  located  in  JMP  17 in  the  Measurement  Systems  Analysis launch  dialog as  an  MSA  Method  type. It  requires  a  reference  standard  value to  compare  your  measurements  against, and  a  tolerance  range where  you  want  your  measurements   to  be  within  20%  of  your  tolerance  range. Produces  a  run  chart, metrics  such  as  Cg, Cgk, which  are  comparable   to  capability  statistics,  bias  analysis, and  a  histogram  for  analyzing  normality. Let's  go  ahead and  look  at  this  new  platform  in  JMP. Here  is  my  Type  1  Gauge  data. It's  simply  measurements of  one  part  with  one  gauge. So  to  get  to  the  platform, I  go  to  Analyze,  Quality  and  Process, Measurement  Systems  Analysis. And  the  first  thing  I  want  to  do is  change  the  method from  EMP  to  Type  1  Gauge. I'm  going  to  move  my  measurements as  the  response I'm going to  leave  everything  else at  default,  and  I'm  going  to  click  OK. But  before  I  can  proceed  to  get  my  report, I  have  to  enter  that  metadata that I  mentioned  earlier, the  reference  value and  the  tolerance  range. So  I'm  going  to  go  ahead and  enter  that  information. I'm  going  to  enter  it as  a  tolerance  range and my  reference  value. I'm going to skip  resolution because  that's  optional. Click  OK. And  this  is  the  default  report  that  I  get. I  get  a  run  chart  on  my  measurements graphed  against  my  reference  line. And  I  also  get  the  20% tolerance  range  lines. One's  10%  tolerance  range  above  reference and  one  is  10%  below. So  you  get  some  default  capability  statistics. Now  notice  that  my  measurements are  well  within  the  20% of  my  tolerance  range, which  is  really  good. I  could  also  do  a  Bias  Test to  see  if  my  measurements  are  biased. It  looks  okay. And  I  could  also  turn  on  a  histogram to  test  for  normality. And  before  we  move  on, I  wanted  to  find  out  one  more  thing, and  that's  that  in  this  top  outline  menu, if  I  click  on  that, there's  an  option  to  save  that  metadata that  I  had  to  enter to  be  able  to  get  this  report. Remember,  I  had  to  enter   the  reference  value and  the  tolerance  range. So  I  could  either  save  this  metadata as a  column  property  and  we've  introduced a  new  column  property  called  MSA, or  I  could  save  it  to  a  table. I'm  going  to  go  ahead  and  save  it   as  a  column  property so  I  can  show  you the  new  MSA  column  properties. If  I  go  back  to  the  data  table, this  is  the  new  MSA  column  property. You  can  see  it's  storing my  tolerance  range and  my  reference  value, and  it  also  can  hold  other  metadata for  other  types  of  MSA  analysis. Let's  move  along  to  the  next  topic. I  also  want  to  talk  about some  improvements to  existing  MSA  platforms, the  EMP  MSA  platform. EMP  stands  for   Evaluating  the  Measurement  Process, and  this  is  the  platform   based  on  Don Wheeler's  approach and  a  variability  chart  platform. So  in  both  of  these  platforms, we've  improved  the  usability when  analyzing  multiple  measurements at  one  time. We  have  better  handling  of  the  metadata, such  as   [inaudible 00:10:16]  or  tolerance  values or  process  Sigma. So  this  has  been  improved in  variability  charts and it's added  to  the  EMP  MSA  platform. We've  also  reorganized  the  reports so that  they  work  better with  data  filters. 
In  addition,  we've  filled  in  the  gaps between  the  EMP  MSA  platform and  the  Variability  Chart. We've  done  this   by  adding  some  reports to  the  EMP  MSA  platform. We've  added   the  Misclassification P robability  Report, the  AIAG  Gauge  R&R  report, and  a  Linearity  and  Bias  report. In  addition, we've  modernized  the  Linearity  report and  the  Variability  Chart to  match  the  new  Linearity  report in  the  EMP  MSA  platform. So  let's  go  ahead and  look  at  some  of  these  changes. So  here  I  have  some  measurement  systems analysis  data  for  some  tablets where  I've  measured   two  different  attributes with  multiple  operators. And  I  want  to  analyze  this   using  the EMP  platform. So  I'm  going  to  go  to  Analyze,   Quality  and  Process Measurement  Systems  Analysis. First  thing  I  want  to  do is  change  the  method  back  to  EMP, take  my  measurements  as  response, Tablet as  Part,  Operator  as Grouping. Notice  there's  now  this  standard  role, if  I  were  doing  a  linearity  and  bias  study, I  would  use  that, but  I'm  not  in  this  example. Also some  new  options  down  here in  the  dialogue, but  the  one  I  want  to  point  out is the  Show  EMP  Metadata  Entry  Dialogue. I  want  to  set  that  to  Yes so  I  can  enter  tolerance  values and a  historical  Sigma   for  the  AIAG  Gauge  R&R  report. So  I'm  going  to  click  OK and  this  dialogue  pops  up. I  don't  have  to  enter  this  data   during  the  launch, but  I'm  going  to   because  I  think  it's  easier. So  I'm  going  to  go  ahead and  enter  the  data, and  when  I  click  OK, my  report  looks  similar  to  how  it's  always  looked when  I've  had  multiple  measurements. But  I  also  have   an  additional  outline  at  the  top, and  we'll  look  at  that  in  a  minute. But  the  first  thing  I  want  to  do is  I  want  to  turn  on the  Misclassification  Probabilities  report for  both  of  these  analyses. So  I'm  going  to  choose Misclassification  Probabilities, and  you  can  see,  I  get  a  new misclassification  probability  report for  both  of  these  and  it's  available without  a  prompt because  I've  already  entered   my  lower  and  upper  tolerance  values. Now,  if  I  had  not  already  entered  that  information, I  would  have  been  prompted. Or  I  could  use  the  new  option, Edit  MSA  Metadata, to  either  enter  or  edit any  of  that  information, which  would  automatically  update any  of  the  corresponding  reports. Let's  go  ahead  and  turn  on   the  AIAG Gauge R&R  report for  both  of  these  as  well. And  you  can  see  I  get   an  AI AG  Gauge  R&R  report that  looks  a  lot  like what's  in  the  Variability  C hart  platform, and  it  includes   that  percent  tolerance  column because  I  entered  tolerance  values and  percent  process, because  I  entered  historical  Sigma. I  could  also  turn  on   the  discrimination  ratio  if  I  desired. And  before  we  move  on, I  just  want  to  point  out at  this  top  outline  menu,  once  again, we  have  an  option  to  save  the  metadata. I  can  save  the  metadata, which  includes  not  only  the  MSA  metadata, but  also  I  can  save  out  measurement  Sigma, which  is  a  result  of  my  MSA  analysis, which  can  be  consumed by  the  Process  Screening  platform. So  it's  going  to  be  considered process  screening  metadata, and  there's  actually  a  new process  screening column  property  for  that. 
But  I'm  going  to  save  this as a  table  just  so  we  can  look  at  it. I can  see  I  have  my  MSA  metadata, plus  I've  saved  out  the  measurement  Sigma once  I've  computed  those variance  components. So  let's  go  on  to  the  next  topic, my  final  topic before  I  hand  this  over  to  Annie, the  last  thing  I  wanted  to  talk  about was some  improvements to  the  Manage  Spec  Limits  utility. In  fact,   the  name  has  been  changed to  the  Manage  Limits  utility  because  now it  handles  more  than  just  spec  limits. It  still  handles  spec  limits and  anything  related to  process  capability. But  now  it  also  can  handle   Process  Screening  metadata, which  includes  centerline, specified  Sigma, and  measurement  Sigma, MSA  metadata,  and  Detection  Limits. So  now  I'm  going  to   hand  this  over  to  Annie. Hi,  everyone. I  am  Annie  Dudley  Zengi, and  I  am  the  developer  responsible for  control  charts  in  JMP. I'm  here  to  talk  with  you about  some  of  the  new  features that  I  added  for  Control   Chart Builder in  version  17. So  I  added  a  Label  Role   in  addition  to  the  Y, the  subgroup,  and  the  phase  role, there's  now  a  label  role. I've  added  a  button  so  that  you  can  switch an   XBar  and  R  chart  to  an  IMR  chart. I  added  a  row  legend, a  Connect Thru  Missing  Command, and  I've  done  some  Dialog U pdates. I'll  start  with  this  data  table  diameter, which  you  can  find  in  the  sample  data. And  let's  start  with  the  label  role. So  I'm  going  to  alternate   between  using  the  interface and  using  the  dialogs so that  everybody  can  get  a  feel  for  both. If  I  start  with  the  interface,   and  I  drag D iameter  in to  the  graph, we  immediately  see  we  get an  Individual and  Moving  Range  chart. Now,  one  thing  that  you'll  notice that's new  here is  this  new  role   in  the  lower  left- hand  corner  of  the  chart for  the  label. Now  I  can  drag  Day  in. Now  I  want  to  take  a  look at  Day  here  in  the  data  table. So  you  might  notice  that there  are  six  different  rows that  are  associated  with  May  1st,  1998. There  are  six  rows  associated with  every  date in  this  particular  data  table. And  we  know  that  if  we  were  to  drag  that to  the  Subgroup  role, then   Control  Chart  Builder will  automatically  aggregate. But  sometimes  we  don't  want  that. So  for  this  example,  I'm  going to  drag  this  to  the  Label  role. You  notice  we  still  have an  Individual  and Moving  Range  chart. It  did  not  switch and  it  did  not  aggregate  the  data. We  can  see  that  it's  a  regular  axis. We  currently  have  an  increment  of  24. We  can  change  the  increment  to  six. We  can  see  every  date  on  the  x- axis and  we  can  still  see  that we  have  an  Individual   and Moving  Range  chart  of  Diameter. So  there's  the  Label  role. Now  the  next  option  is the  switch  to  the  IMR  chart. This  option  was  made  available because  there's  now  a  Label  role. To  switch  to  an  IMR  chart,  we  first have  to  have  an   XBar  on  our  chart. So  I  will  create  an   XBar  on  our  chart through  the  dialog. You can choose  Control  Chart and  then   XBar  Control  Chart. Again  I'll  move  Diameter  to  the  Y. And  this  time  I'm  going to  put  Day  in  as  a  Subgroup. You  can  see  here  it's  aggregated  the  data because  we  have  Day   as actually  the  subgroup. 
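For reference, the two launches shown so far can be scripted as below; the demo continues with this XBar chart. The path and column names are as they appear in the shipped Diameter sample table, and the new Label role is left as an interactive drag here because its JSL role name is not quoted in the talk.

```jsl
// Sketch of the two Control Chart Builder launches from the demo.
dt = Open( "$SAMPLE_DATA/Quality Control/Diameter.jmp" );

// Individuals & Moving Range chart of DIAMETER
// (drag DAY to the new Label role by hand)
Control Chart Builder( Variables( Y( :DIAMETER ) ) );

// XBar & R chart with DAY as the subgroup, as produced by the dialog
Control Chart Builder( Variables( Y( :DIAMETER ), Subgroup( :DAY ) ) );
```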
But  if  I  show  the  control  panel and  I  scroll  down, you'll  notice  there's  a  new  button  here underneath  the  old  button of the  Three  Way  Control  Chart. And  when  I  click  that  button, it  moves  the  variable from  the  Subgroup  role into  the  Label  role. So  you  see  we  now  have an  Individual and  Moving  Range  Chart of  Diameter. Now,  the  next  option  is a  Row  Legend. Row  Legend  is  new for  Control  Chart  Builder. And  I  have  a  little  note  here. The  Row  Legend  option   is  only  going  to  appear when  there's  only  one  row  per  subgroup. So  if  you  right- click  like  you  do  in  a  lot of  other  graphs  in  many  other  platforms in  JMP,  you'll  now  see  a  Row  Legend  here, but  only  if  you  have  one  row  per  subgroup. And  the  Row  Legend   acts  like  a  row  legend  does  anywhere  else. I  can  choose,  say,  for  example,  Operator, and  it  will  color  by  Operator  by  default. And  now  you  have  your points  colored  accordingly. The  next  option— I'm  going  to  close  this— is   Connect Thru  Missing. Now, Connect Thru  Missing is going  to  involve  some  missing  data. So  let's  open  up  Coding, which  happens  to  have  the  Weight that you might  normally  be  measuring, but  it  also  has  Weight  2 that  has  missing  data. If  I  go  through  the  interface and  create  two  control  charts, you  notice  we  have   a  good- looking  control  chart  here. Everything  is  connected  and  so  forth. But  if  we  scroll  down  to  the  second  one, we  see  some  gaps. And  sometimes  management   doesn't  want  to  see  the  gaps, so  we  need  to  connect  those. So  there's  a  new  option under  the  red  triangle  menu called  Connect Thru  Missing. You  can  see  the  little  caption  there. It  says,  "This  item  is  new as  of  version  17." This  was  in  the  old  Legacy  platform. And  so  I've  been  bringing  more  options into  Control  Chart  Builder that  were  available in  the  old  Legacy  platform. So  there's  your   Connect Thru  missing. Now,  the  next  option— I'm  going  to  switch back  to  my  slides  here  for  a  moment— so  the  next  option  is  the  Laney and  P  prime  control  charts. This  is  a  bigger  option. So  let's  think  about Control C harts  for  a  moment. The  purpose  of  Control  Charts is to  show  the  stability  of  your  process. If  your  process  is  not  stable, then  you  cannot  reliably  make the  same  sized  part, which  is  going  to  be  a  problem for  all  of  your  customers. And  so  there's  lots  of  tests  involved in  making  sure  that  you  are  stable, that  you're  reliably  able to  make  the  same  part. Now,  if  you're  looking  at  attribute control  charts,  those  are  based  on  either the  Binomial  or  the  Poisson  distribution, and  those  assume  a  constant  variance. Now,  what  happens  if the  variance  changes  over  time? Maybe  there's  humidity   or  there's  temperature  problems or  there's  wear  and  tear  on  a  gear. This  is  what  statisticians  refer  to as  over dispersion, or  in  rare  instances,  under  dispersion. And  one  parameter  distribution cannot  model  this. So  Laney  proposes that  we  normalize  the  data in  order  to  account  for  the  variant and  account  for  varying  subgroup  sizes. And  David  Laney  wrote  a  paper  in  2002, Improved  Control  Charts  for  Quality. So  let's  take  a  look  at  the  Laney  charts. 
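Before the demo, here is the normalization Laney proposed, written out; the walkthrough below describes the same steps verbally. Here p_i is the subgroup proportion, n_i the subgroup size, and the average moving range is taken on the standardized z values.

```latex
z_i \;=\; \frac{p_i - \bar{p}}{\sigma_{p_i}},
\qquad
\sigma_{p_i} \;=\; \sqrt{\frac{\bar{p}\,(1-\bar{p})}{n_i}},
\qquad
\sigma_z \;=\; \frac{\overline{MR}(z)}{1.128},
\qquad
\text{Laney } P' \text{ limits:}\;\; \bar{p} \;\pm\; 3\,\sigma_{p_i}\,\sigma_z .
```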
Here I have some data also found in the sample data. This is not a terribly large lot size, but here we have a column for teaching purposes that has a varying lot size. So let's explore how this works. Suppose we were to look at, say, a P chart of the number of defective out of this varying lot size. I'm going to use the menus, the dialogs, again; I'm going to create a P chart to start with, and let's see how that performs. Okay, we're going to look at our number defective, and then we have the lot as our Subgroup identifier. Now, I'm going to use Lot Size 2 because that's the varying lot size, and click OK. All right, so on first glance, yes, we expected the non-constant limits because we have the varying subgroup sizes. But we also notice immediately that our chart, this process, is out of control, because we have these points that are beyond the limits, and we can turn on the Test Beyond Limits and they're flagged. And so this process would probably raise all kinds of alarms, and people would be trying to retool things. Now, I'm going to show the control panel while I have the statistic set to proportion, because Laney, in his paper, only gave formulas for the P and the U charts, so currently this is only implemented for the proportion statistic. But when you have your statistic set to proportion, you now have four choices instead of just two for your Sigma. So we could switch to the Laney P prime chart and see what that difference is going to be. And suddenly you see your process is not nearly as problematic. It's not out of control at all. It looks like this process is actually stable, which is great news. Now, is this really okay, you might ask, or is this cheating? Let's take a look at the formulas to help us figure this out (they're also recapped in standard notation at the end of this talk). So Laney suggested that we compute a moving-range sigma on the standardized values. So these Z's, those are our standardized values. We compute an average moving range on that, and we have a Sigma sub z, which is the average moving range divided by 1.128. Then we take that Sigma sub z and we insert it into the exact same formula that we saw for our P limits. And so what you can see from this is, if you actually do have a constant variance, this Sigma sub z is going to approach one. Many argue, including Laney, that it is generally safe to use this instead of the P chart, since Sigma sub z approaches one and the limits are going to be essentially the same anytime you actually do have constant variance. So there's the Laney P prime chart. I wanted to show you also that there are a few dialog updates. Let me show you some of those right here. I hinted a little bit at it: you can see the Laney P prime and U prime. Those are two new dialogs that you can see there. The IMR chart now has a Label role on the dialog. The XBar and R Control Chart now has a Constant Subgroup Size option in case you don't have a subgroup that you want to specify. There's a little more work that was done on the Three Way Control Charts.
So  that  now,  not  only  can  you  specify the  constant  subgroup  size if  you  don't  have  a  subgroup  already  identified, you  can  also  choose  your  Grouping  Method, your  Between  and  Within  Sigmas for  your  control  chart. So  there's  different  options that  are  added on  the  Three W ay  Control  Chart  dialog. And  I  want  to  thank  you very  much  for  your  time. If  you  have  any  questions, please  feel  free  to  ask. Thank  you.
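As a recap of the Laney P prime limits described in this talk, here they are in standard notation. This is a sketch for orientation only; the exact symbols used in Laney's paper and in JMP's documentation may differ.

\[
z_i = \frac{\hat{p}_i - \bar{p}}{\sigma_{\hat{p}_i}}, \qquad
\sigma_{\hat{p}_i} = \sqrt{\frac{\bar{p}\,(1-\bar{p})}{n_i}}, \qquad
\sigma_z = \frac{\overline{MR}(z)}{1.128},
\]
\[
\text{Laney } P' \text{ limits for subgroup } i:\quad \bar{p} \;\pm\; 3\,\sigma_{\hat{p}_i}\,\sigma_z .
\]

When the binomial variance assumption actually holds, sigma sub z is close to one and the limits reduce to the ordinary P chart limits, which is the point made in the talk.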
Degradation data analysis is used to assess the reliability, failure-time distribution, or shelf-life distribution of many different kinds of products, including lasers, LEDs, batteries, and chemical and pharmaceutical products. Modeling degradation processes shines a light on the underlying physical-chemical failure mechanisms, providing better justification for the extrapolation that is needed in accelerated testing. Additionally, degradation data provide much richer information about reliability compared to time-to-event data. Indeed, by using appropriate degradation data, it is possible to make reliability inferences even if no failures have been observed.   Degradation data, however, bring special challenges to modeling and inference. This talk describes the new Repeated Measures Degradation platform in JMP 17, which uses state-of-the-art Bayesian hierarchical modeling to estimate failure-time distribution probabilities and quantiles. The methods we present are better grounded theoretically when compared to other existing approaches. Besides advantages, Bayesian methods do pose special challenges, such as the need to specify prior distributions. We outline our recommendations for this important step in the Bayesian analysis workflow. In this talk, we guide the audience through this exciting and challenging new approach, from theory to model specifications and, finally, the interpretation of results. The presenters conclude the talk with a live demonstration.     In this talk, we're going to describe repeated measures degradation and its implementation in JMP. I'm going to present the background, motivation, some technical ideas, and examples. Then I'll turn it over to Peng, who will do a demonstration showing just how easy it is to apply these methods in JMP 17. Here's an overview of my talk. I'm going to start out with some motivating examples, and then explain the relationship between degradation and failure and the advantages of using degradation modeling in certain applications. Then I'll describe the motivation for our use of Bayesian methods to do the estimation. To use Bayes' methods for estimation, you need to have prior distributions, so I'll talk about the commonly used noninformative and weakly informative prior distributions. Also, in some applications we will have informative prior distributions, and I'll explain how those can be used. Then I'll go through two examples, and at the end I'll have some concluding remarks. Our first example is crack growth. We have 21 notched specimens. The notches were 0.9 inches deep, and that's like a starter crack. Then the specimens are subjected to cyclic loading, and in each cycle, the crack grows a little bit. When the crack gets to be 1.6 inches long, that's the definition of a failure. We can see that quite a few of the cracks have already exceeded that level, but many of them have not. Traditionally, you could treat those as right-censored observations. But if you have the degradation information, you can use that to provide additional information to give you a better analysis of your data. The basic idea is to fit a model to describe the degradation paths, and then to use that model to induce a failure time distribution. Our second example is what we call Device B, a radio frequency power amplifier.
Over time the power output will decrease because of an internal degradation mechanism. This was an accelerated test where units were subjected to higher levels of temperature: 237, 195, and 150 degrees C. The engineers needed information about the reliability of this device so that they could determine how much redundancy to build into the satellite. The failure definition in this case was when the power output dropped to -0.5 decibels. We can see that all of the units had already failed at the higher levels of temperature, but at 150 degrees C, there were no failures yet. Still, there is lots of information about how close these units were to failure by looking at the degradation paths. Again, we want to build a model for the degradation paths, and then we use that to induce a failure time distribution. In this case, the use condition is 80 degrees C. We want to know the time at which units operating at 80 degrees C would reach this failure definition. Once again, we build a model for the degradation paths, we fit that model, and then from that we can get a failure time distribution. Many failures result from an underlying degradation process. In some applications, degradation is the natural response. In those situations, it makes sense to fit a model to the degradation and then use the induced failure time distribution. In such applications, we need a definition for failure, which we call a soft failure because the unit doesn't actually stop operating when it reaches that level of degradation, but it's close enough to failure that engineers say we would like to replace that unit at that point, just to be safe. Now, in general, there are two different kinds of degradation data: repeated measures degradation, like the two examples that I've shown you, and destructive degradation, where you have to destroy the unit in order to make the degradation measurement. For many years, JMP has had very good tools for handling degradation data. I'm focused in this talk on the repeated measures degradation methods that are being implemented in JMP 17. There are many other applications of repeated measures degradation, for example, LED or laser output, the loss of gloss in an automobile coating, and degradation of a chemical compound, which can be measured with techniques such as FTIR, or any other measured quality characteristic that's going to degrade over time. There are many applications for repeated measures degradation. There are many advantages to analyzing degradation data if you can get them. In particular, there is much more information in the degradation data relative to turning those degradation data into failure time data. This is especially true if there is heavy censoring. Indeed, it's possible to make inferences about reliability from degradation data in situations where there aren't any failures at all. Also, direct observation of the degradation process allows us to build better models for the failure time distribution because we're closer to the physics of failure.
Now, several years ago, when we were planning the second edition of our reliability book, we made a decision to use more Bayesian methods in many different areas of application. One of those is repeated measures degradation. Why is that? What was the motivation for using Bayes' methods in these applications? I used to think that the main motivation for Bayesian methods was to bring prior information into the analysis. Sometimes that's true, but over the years, I've learned that there are many other reasons why we want to use Bayesian methods. For example, Bayesian methods do not rely on large-sample theory to get confidence intervals; they rely on probability theory. Also, it turns out that when you use Bayes' methods with carefully chosen noninformative or weakly informative prior distributions, you have credible interval procedures that have very good coverage properties. That is, if you ask for a 95% interval, they tend to cover what they're supposed to cover with 95% probability. In many applications, there are many non-Bayesian approximations that can be used to set confidence intervals; when you do Bayesian inference, it's very straightforward, and there's really only one way to do it. Also, Bayesian methods can handle, with relative ease, complicated model-data combinations for which there's no maximum likelihood software available. For example, with complicated combinations of nonlinear relationships, random parameters, and censored data, Bayes' methods are relatively straightforward to apply. Finally, last but certainly not least, Bayesian methods do allow an analyst to incorporate prior information into the data analysis. But I want to point out that in the revolution we've had in the world of data analysis toward using more Bayesian methods, most analysts are not bringing informative prior information into their analysis. Instead, they use weakly informative or noninformative priors so that they don't have to defend the prior distribution. But in many applications, we really do have solid prior information that will help us get better answers. I will illustrate that in one of the examples. Bayesian methods require the specification of a prior distribution. As I said, in many applications, analysts do not want to bring informative prior information into the modeling and analysis. What that requires is some default prior that's noninformative or weakly informative. There's been a large amount of theoretical research on this subject over the past 40 years, leading to such tools as reference priors, Jeffreys priors, and independent Jeffreys priors that have been shown to have good frequentist coverage properties. One of my recent and current research areas is to try to make these ideas operational in practical problems, particularly in the area of reliability. A simple example of this is that if you want to estimate a location parameter and the log of a scale parameter, a flat prior distribution leads to credible intervals that have exact coverage properties. That's very powerful.
Also, flat prior distributions can be well approximated by a normal distribution with a very large variance, and that leads to weakly informative priors. Again, it's somewhat informative, but because the variance is very large, we call it weakly informative. The approach that I've been taking to specify prior distributions is to find an unconstrained parameterization, like the location parameter and the log of the scale parameter that I mentioned above, and then use a noninformative or weakly informative flat or normal distribution with very large variances as the default prior. Then it's always a good idea to use some sensitivity analysis to make sure that the priors are approximately noninformative. That is, as you perturb the prior parameters, it doesn't affect the bottom-line results. JMP uses very sensible, well-performing methods to specify default prior distributions that are roughly in line with what I've described here. Having those default prior distributions makes the software user-friendly, because then the user only has to specify prior distributions where they have informative information that they want to bring in. Here's just an illustration to show that as the standard deviation of a normal distribution gets larger, you approach a flat prior distribution. Now, as I said, in some applications we really do have prior information that we want to bring in, that is, informative prior information. When we have such information, we will typically describe it with a symmetric distribution, like a normal distribution, although some people prefer to use what we call a location-scale t distribution because it has longer tails. In most applications where we have this informative prior information, it's only on one of the parameters. Then we're going to use noninformative or weakly informative prior distributions for all of the other parameters. Let's go back to Alloy A. What we're going to do is fit a model to the degradation paths, and then use that model to induce a failure time distribution. Now, if you look in an engineering textbook on fatigue or materials behavior, you'll learn about the Paris crack-growth model. It's always nice to have a model that agrees with engineering knowledge. JMP has implemented this Paris crack-growth model. Here's the way it would appear in a textbook, and on the right here we have the JMP implementation of that. It's one of the many models that you can choose to fit to your degradation data. Now, c and m here, which in JMP are c and b2, are materials parameters, and they are random from unit to unit. The K function here is known as a stress intensity function. For the crack we're studying here, the stress intensity function has this representation. Now, this is a differential equation, because we've got a(t) here and also a here; you can solve that differential equation and get this nice closed form. This is the model that's being fit within JMP. Again, the parameters b1 and b2 will be random from unit to unit (the textbook form of this model is recapped in the note after this passage). Now here's the specification of the prior distribution.
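To make the verbal description above concrete, here is the textbook form of the Paris crack-growth relationship and the kind of closed form being referred to. This is a sketch for orientation: the stress-intensity expression below assumes the common form with a constant stress range, and JMP's b1 and b2 are a reparameterization of the quantities shown here, so the exact expressions in the platform may differ.

\[
\frac{da}{dN} = C\,\big[\Delta K(a)\big]^{m}, \qquad \Delta K(a) = \Delta\sigma\,\sqrt{\pi a},
\]
\[
a(N) = \Big[\, a_0^{\,1-m/2} \;+\; \big(1-\tfrac{m}{2}\big)\, C\,\big(\Delta\sigma\sqrt{\pi}\,\big)^{m}\, N \,\Big]^{\tfrac{2}{2-m}}, \qquad m \neq 2,
\]

where a(N) is the crack length after N cycles and a_0 is the initial notch length. Solving the differential equation and working with the closed form is what lets the platform use the whole degradation path, rather than just the crossing time, as data.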
I've illustrated here two different prior distributions: the default prior in JMP, and the prior distribution that we used in the second edition of our reliability book, which we call SMRD2, Statistical Methods for Reliability Data, second edition. Now, the way we specify prior distributions in SMRD2, and JMP does this as well, is with what I call a 99% range. For example, we say that we're going to describe the mean of the b1 parameter by a normal distribution that has 99% of the probability between -15 and 22. That's a huge range. That is weakly informative (the note after this passage shows how a 99% range translates into a normal prior). Then we have similar, very wide ranges for the other mean parameter here. Then for the sigma parameters, JMP, following the usual procedures for these, uses a half-Cauchy distribution, which has a long upper tail and therefore, again, is weakly informative. Now, in our reliability book, we used much tighter ranges. But interestingly, the two different prior distributions here give just about the same answer, because both are weakly informative. That is, the ranges are large relative to, let us say, the confidence interval that you would get using non-Bayes' methods. Now, in addition to specifying the prior distributions, which again JMP makes very easy because it has these nice default priors, you also have the ability to control the Markov chain Monte Carlo algorithm. The only default that I would change here is that typically I would run more than one chain; I changed the one here to four. The reason for doing that is twofold. First of all, in most setups, including JMP, you can run those simultaneously, so it doesn't take any more computer time. The other thing is we want to compare those four different Markov chains to make sure that they're giving about the same answers. We call that mixing well. If you see a situation where one of those chains is different from the others, that's an indication of a problem. If you have such a problem, then the usual remedy is to increase the number of warmup laps, which is set to 10 by default, but you can increase that. What that does is allow JMP to tune the MCMC algorithm to the particular problem so that it will sample correctly to get draws from the joint posterior distribution. In all of my experience using JMP, and Peng has suggested that he's had similar experience, by increasing that number high enough, JMP has worked well on any example that we've tried. But for most applications, 10 is a sufficiently large number. Here's the results. Here's a table of the parameter estimates. Well, typically in reliability applications, we're not so much interested in the estimates themselves. We're going to be interested in things like failure distributions, which we look at in a moment. Then in this plot, we have estimates of the sample paths for each of the cracks. Again, you can see the failure definition here. As Peng will show you, JMP makes it easy to look at MCMC diagnostics. It's always a good idea to look at diagnostics to make sure everything turned out okay.
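As a quick side note on the 99% ranges used above (this is my own arithmetic, not something quoted from JMP's documentation): if 99% of the prior probability is to fall between a and b, the corresponding normal prior has

\[
\mu = \frac{a+b}{2}, \qquad \sigma = \frac{b-a}{2\,z_{0.995}} \approx \frac{b-a}{2 \times 2.576},
\]

so the range (-15, 22) quoted for the mean of b1 corresponds roughly to a normal prior with mean 3.5 and standard deviation about 7.2, which is extremely diffuse on that scale.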
To look at those diagnostics, you export posterior draws from JMP, and the exported table comes with scripts set up to create these various diagnostics. For example, there's a script to make a trace plot, or time series plot; I always like to compare those for the different chains. Then there's another one to make what we call a pairs plot, or scatterplot matrix, of the draws. That's what we see here. Then, as I said, we can use those draws to generate estimates of the failure time distribution. JMP implements that by using the distribution profiler here. We can estimate the fraction failing as a function of time for any given number of cycles. Now let's go to the Device B RF power amplifier, again an accelerated repeated measures degradation application. We're going to need a model that describes the shape of the paths and the effect that temperature has on the rates of degradation. Again, the use condition in this application is 80 degrees C, and we're going to want to estimate the failure time distribution at 80 degrees C. In SMRD2, this is the way we would describe the path model that fits Device B. We call this an asymptotic model, because as time gets large, we eventually reach an asymptote. In this equation, X is the transformed temperature; we call it an Arrhenius transformation of temperature. X0 is an arbitrary centering value. Beta one is a rate constant for the underlying degradation process. Beta three is the random asymptote. Those two parameters, the rate constant and the asymptote, are random from unit to unit, and we're going to describe that randomness with a joint, or bivariate, lognormal distribution. Beta two, on the other hand, is a fixed, unknown parameter, the effect of activation energy, that controls how temperature affects the degradation rate. This is where the X0 comes in. Typically we choose X0 to be somewhere in the range of the data or at a particular temperature of interest, because beta one would be the rate constant at X0 (a sketch of this path model in equation form appears in the note after this passage). Again, there's a large number of different models that are available. Here is how you would choose this particular asymptotic model. This corresponds to the same equation we have in SMRD2; the only difference is that JMP uses a slightly different numbering convention for the parameters. That was done to be consistent with other things that are already in JMP elsewhere. Again, we have to specify prior distributions, but JMP makes that easy because it provides these defaults, these weakly informative defaults. Here I have the default that JMP would use if we did not have any prior information to bring in. I'm going to do that analysis, but I'm also going to bring in the information that engineers have. In particular, we only have information for b3, and so that's being specified here. But all the other entries in the table are exactly the same as the JMP default, again making it really easy to implement these kinds of analyses. Here's the results. Once again, here we have a table giving the parameter estimates, credible intervals, and so forth.
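Putting the verbal description above into a formula, a degradation-path model consistent with what is described here, written in my own notation, is sketched below. The exact parameterization and sign conventions in SMRD2 and in JMP differ in details (JMP renumbers the parameters), so treat this as orientation rather than the platform's definition.

\[
\mathcal{D}(t; x) \;=\; \beta_3 \Big[\, 1 - \exp\!\big( -\beta_1\, e^{\,\beta_2 (x - x_0)}\, t \big) \Big],
\qquad x = \frac{11605}{\text{temp °C} + 273.15},
\]

where x is the Arrhenius-transformed temperature, x_0 is the centering value, beta_1 is the degradation rate constant at x_0, beta_3 is the asymptote (with beta_1 and beta_3 random from unit to unit), and beta_2 is the fixed activation-energy parameter that controls how temperature changes the rate.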
In this plot, again, we have estimates of the sample paths for all of the individual units. Again, we have the failure definition here, but what we really want are estimates of the failure time distribution at 80 degrees C, and again, we're going to do that by using a profiler. On the left here we have the estimate of the fraction failing as a function of time at 80 degrees C for the default priors. On the right we have the same thing, except that we've used the informative prior distribution for b3. Immediately you can see that the prior information has allowed us to get much better estimation precision: the confidence interval is much narrower. Interestingly, the point estimate of the fraction failing actually increased when we brought in that prior information. In this case, the prior information would allow the engineers to get a much better estimate of the fraction failing as a function of time, and then to make that important decision about how much redundancy to build into the satellite. Let me end with some concluding remarks. Repeated measures degradation analysis is important in many reliability applications. It is also important in many other areas of application, like determining expiration dates for products like pharmaceuticals and foodstuffs: when will the quality be at such a low level that the customer is no longer happy? Also, in certain circumstances, we can bring prior information into the analysis, potentially allowing us to lower the cost of our degradation experiments. JMP 17 has powerful, easy-to-use methods for making lifetime inferences from repeated measures degradation data. Now I'm going to turn it over to Peng, and he's going to illustrate these methods to you using a demonstration in JMP. Thank you, Professor. Now, demo time. The purpose of the demo is to help you begin exploring this state-of-the-art approach to analyzing repeated measures degradation data. I will show you how to locate the sample data tables in JMP and how the information is organized in the report, and I will highlight some important information that you need to know. First, there are three repeated measures degradation samples among the JMP sample data tables. Alloy A and Device B are two examples with embedded scripts; the GaAs Laser example does not have an embedded script. Alloy A is an example without an x variable; Device B is an example with an x variable. To find them, go to Help and click Sample Index, then find the outline node called Reliability/Survival. Unfold it and you should see Alloy A here and Device B here. To find GaAs Laser, you need to go to the sample data folder on your computer by clicking this button to open the sample data folder. Now I'm going to open Alloy A. Then I go to the Analyze menu, Reliability and Survival, and choose Repeated Measures Degradation. Now we see the launch dialog. I assign Crack Length to Y, Specimen to Label, System ID, and Million Cycles to Time, and then I click OK. This is the initial report. It fits linear models to the individual specimens. I'm going to select the third model here by clicking this radio button.
This fits an initial Paris model to the Alloy A data. I'm going to click this Go to Bayesian Estimation button, which generates a configuration interface. Here we see the model formula, and here are the default settings for the priors. We are not going to change anything right now; we are just going to use the default priors to fit our model. Now I'm going to click this Fit Model button here and let it run. Then I'm going to explain what is in the report and, in the end, how to get the failure distribution profiler. Now I click the button, the algorithm starts to run, and the progress dialog says that the first step is tuning. The underlying algorithm goes through some rounds of warmup laps in this procedure. The algorithm is trying to learn the shape of the posterior distribution, for example, where the peak is, how wide the span is, et cetera. In the end, it tries to figure out a good thinning value for drawing samples from the posterior distribution, such that the samples are as little autocorrelated as possible. Then the algorithm enters the second step. The dialog says this step is collecting posterior samples. In this step, an automatic thinning of 80 is applied. The dialog shows how much time in total the algorithm has been running. In the second stage, the dialog also shows the expected completion time; I hope this helps users adjust their expectations accordingly. Sometimes an excessively long expected completion time is a sign of a problem. Then we wait a little bit, and the algorithm should finish soon. Okay, now the algorithm has finished. Let's see what's in the report. First is the completion time. If you left your computer running overnight, you may want to know this the next morning. The second part is a copy of your settings, including priors, number of iterations, random seed, and other settings. The third part is the posterior estimates. Beneath the summary of the posterior estimates, there are two links on the side that allow you to export posterior samples. I'm going to emphasize the first link. One purpose of using this first link to export posterior samples is to inspect potential problems. The two main concerns are convergence and effective sample size. Let's look at it. The table has the parameters in individual columns, and each row is a posterior sample. There are several embedded scripts. The most important one is the first one. I'm going to click this green triangle to run the script. The script simply runs the Time Series platform on the individual columns and shows their time series plots. In the context of MCMC, this plot is known as the trace plot. What do we see here? What I call good results: the series are stationary, with no significant autocorrelation. Loosely speaking, when I say stationary in this context, I specifically mean plots that look like these: straight, equal-width bands of random dots. Okay, let me close the report and the table. We are seeing good results here, and the plot of the data with the fitted model also shows that the results are good. Now we are ready to ask for the failure time distribution profiler.
To do that, go to the report outline node menu and select the option to show the failure-time distribution profilers. Most entries in this dialog have sensible default values, and we only need to supply one of the failure definitions. I'm going to enter 1.6 into the upper failure definition. Before I click OK, I'm going to reduce the number of realizations to 5,000 to save some time. Then I click OK. This is also a computationally intensive procedure, but not as expensive as MCMC in general, so it should finish quickly. You can use the profilers to get the failure probability and the quantile estimates. I'm not going to elaborate further, because the profiler is a very common and important feature in JMP. Okay, this is the end of the demonstration, and you are ready to explore by yourself. But here are a couple of tips that might be useful. First, before you save the script to the table, check the save-posterior-to-script option. That way, the next time you run the saved script, the software will bring back the fitted model instead of going through the lengthy MCMC procedure once again. The second thing that I want to bring to your attention is that we have seen good examples and good results, but there are bad ones. Here are some bad examples. A bad example means either a failure to converge or high autocorrelation. To address them, my first suggestion is to increase the number of warmup laps. My second suggestion is to turn off auto-thinning and apply a larger thinning value manually. If those suggestions don't work, it's likely that the model or its configuration is not appropriate for the data, and you may need help. Okay, this is all we would like to illustrate this time, and I hope you can use this information to start exploring this state-of-the-art approach to analyzing repeated measures degradation data. Thank you.
JMP Live is a secure collaboration platform for JMP content. The first step to collaborating is publishing your content to JMP Live. We start with the simple case of publishing a report, and then we dive into the flexibility that is available to you if you need it.   In this presentation, we demonstrate: How to publish and replace reports. How to publish data by itself (so that others can use it in their analyses). How to make use of existing data that's already on JMP Live. How to select (or create) a place to put your published content. How to do all of this with JSL.     Hello.  I'm  Michael  Goff and  I'm  joined  today   by  Aurora  Tiffany- Davis. We're  both  software  developers on  the  JMP  Live  team. For  those  of  you  who  don't  know,  JMP  Live is  a  relatively  new  product  from  JMP. It  allows  you  to  securely  share   your  JMP  insights  with  your  colleagues , even  if  they  don't  use  JMP  themselves. Additionally,  JMP  Live   enables  collaboration with  colleagues  who  do  use  JMP. You  can  view,  talk  about,  build  upon, and  improve  each  other's  JMP  content. Of  course,  the  first  step  to  collaborating is  getting  your  content  up  to  JMP  Live. So  today  we're  going  to  focus on  that  publishing  step. It's  really  straightforward   to  do  most  publishes, but  we're  going  to  explore  some  of   the  more  advanced  options  today  as  well. Let's  go  ahead  and  get  started. All  right,  let's  pull  up  JMP  here. Go  ahead  and  minimize  this. All  right,  first  up,  I'm  going to  open  up  a  data  table  here. This  is  a  data  table   of  college  financial  data that  I'd  like  to  share  with  my  colleagues. For  those  who  don't  know,   many  US  universities participate  in  athletic  events  and  joined  together  into  conferences for  shared  negotiating  power and  scheduling  stability. College  athletics  is   a  huge  revenue  generator for  all  these  US- based  universities. And  lately,  that  landscape  has  been  changing  significantly. Various  media  providers   like  ESPN  and  Fox  Sports have  been  signing  deals  with  these  conferences to  get  their  games  on  TV. And  of  course,  once  you   bring  money  into  the  equation, things  start  to  get  a  little  bit  wacky. I  have  my  data  set  here. Let's  go  ahead  and  create a  graph  builder  report. I  have  one  saved  on  the  table to  save  some  time. Here's  a  graph  builder. Each  bar  represents   one  of  these  conferences. You  can  already  see  that   there's  some  revenue  leaders  here. We  have  the  SEC  and  the  Big  Ten. They  have  the  largest  slice  of  the  pie. And  we  also  have  some  other  major  players: the  PAC-12 ,  the  Big 12,  and  the  ACC. Our  local  universities  in  this  area   are  all  members  of  the  ACC. That  would  be  NC  State  University,   UNC,  and  Duke. One  other  thing  to  note  about  this  graph: this  is  only  the  public  universities. Private  institutions  don't  have to  share  their  data. So  this  isn't  quite  a  complete  picture, but  it's  enough  to  get  the  point  across. Anyway,  I'm  going  to  go  ahead  and   publish  this  and  share  this  with  my  peers. To  do  that,  all  I  need  to  do  is  hit F ile,   Publish,  and  Publish R eports  to  JMP  Live. Okay.  We  have  our  intro  screen  here. We  select  from  available  reports. I  only  have  one  report  open, and  that's  this  graph  builder. 
And  I  want  to  publish  the  new  report. I'm  going  to  go  ahead  and  hit  Next  here. The  next  step  is   to  select  a  publish  location and  we  choose  a  space  to  publish  to. I'm  going  to  choose  the  Discovery Americas  2022  space  to  begin  with. S paces  are  a  new  concept  in  JMP  Live  17. They're  an  organizational  tool  to  help  you keep  your  related  content  together and  shared  with  the  right  people. We're  going  to  take   a  little  bit  of  a  closer  look at  the  rest  of  the  screen   in  a  couple  of  minutes. I'm  just  going  to  take the  defaults  here  and  move  on. Next  up  is  the  Configure  Reports  page. This  is  very  similar  to   the  previous  version  of  JMP  Live. We  have  our  title,  description, and  some  advanced  options  here  as  well. I'm  just  going  to  take  the  defaults  here  and  go  ahead  and  publish. I've  successfully  published  one  report. I've  published  to  the Discovery  Americas  2022  space. I've  created  a  new  report,   this  graph  builder  here, and  some  new  data  as  well, the  College  Finances  table. We  have  some  options  at  the  bottom. I  can  choose  to   close  reports  after  running, or  I  can  save  the  published  script to  the  script  window. Let's  go  ahead  and  check that  box  and  close  this  window. Now,  this  is  new  in  JMP  17. We  have  the  ability   to  automatically  generate  scripts from  your  interactive  publishes. Let's  go  over  what  the  script  is  doing. First,  we  have  our  new   JMP  Live  connection, and  we're  connecting  to  devl ive 17. That's  one  of  the  JMP  Live  instances. Next,  we  create  some  new  JMP  Live  content and  we  pass  in  some  options  here. The  first  option  is   passing  in  a  report  from  JMP, and  to  do  that,  I'm  grabbing the  window  titled  the  Graph  Builder. Then  we  have  a  bunch   of  default  options  here: title,  description,  and  a  couple  of  other  advanced  options  that  are  set  for  us that  we're  going  to   talk  about  in  a  little  bit. Finally,  we  choose  to  Publish. We  pass  that  content  in   and  we  choose  the  space  to  publish  to. We  chose   the  Discovery  Americas  2022  space. This  here  that  I  have highlighted  is  a  space  key. Think  of  that  as an  identifier  for  the  space. I'm  going  to  go  ahead  and  close  this   and  get  this  out  of  the  way. Then  let's  open  up a  new  data  table  here. This  is  a  new  data  table, C ollege  Finances  with  New C onference  Affiliations. As  I  was  saying  earlier,  there's  been a  lot  of  changes  to  the  landscape  lately. Schools  have  been  changing  which  conference  they're  affiliated  with, and  it's  all  about  the  money. Let's  go  ahead  and  open  up  that  graph  builder  again. Here's  a  graph  builder   with  the  new  conference  alignments from  some  of  our  schools. You  can  see  here,   this  is  a  case  of  the  rich  getting  richer. We  have  the  SEC  and  the  Big  Ten  pulling  away  from  the  other  conferences. And  it  came  at  a  cost   to  the   Pac-12  and  the  Big  12. They've  lost  some  of  their  major  financial players  to  these  other  conferences. Let's  go  ahead  and  share   this  one  up  to  JMP  Live  as  well. Again,  we're  going  to  hit  File,  Publish, and  we're  going  to   Publish  Reports  to  JMP  Live. All  right.  We  can  select from  available  reports  again. 
I  have  just  my  graph  builder  here and  I'm  going  to  publish  new  again. Let's  go  ahead  and  hit  Next. Next,  we  still  have  our Discovery  Americas  2022  space, that  I  selected  previously. But  this  time  I'm  going  to  do a  little  bit  of  organizing. In  JMP  Live  17,  we've  created the  concept  of  hierarchical  folders. We  have  a  folder  here  inside  of  our  space, Deep  Dive:  Publishing to J MP  Live , that's  the  name  of  our  talk. We're  going  to  be  publishing all  of  our  stuff  into  this  folder. But  I  want  to  keep  things a  little  bit  more  organized. So  I'm  going  to  create another  folder  within. I'm  going  to  call  this  College  Finances. I've  created  the  College  Finances folder  inside  of  our  talk  folder, and  all  of  this  is  contained  within the  Discovery  Americas  2022  space. I  think  this  all  looks  good. I'm  going  to  go  ahead and  hit  Next  to  move  on. We're  back  on  the  Configure  Post  page. I'm  going  to  go  ahead and  name  this  something  else. Let's  call  it  New  Conferences. I  think  I'd  like  a  visual  aid to  help  communicate  the  absurdity of  some  of  these  college  landscape  changes. To  do  that,  I'm  going  to  add  an  image. I'm  going  to  choose  this image  here,  the  Big  ten  map. This  is  just  a  map  of   the  geographic  locations of  the  members  of  the  Big  Ten  conference. Historically,  they've  been  a  sort  of   Midwestern  conference  up  here, but  recently  they've  added  two  new  members,  UCLA and  the  University  of  Southern  California, both  over  here  on   the  West  Coast  in  California. Now  this  doesn't  really  make  sense  as  a  geographic  pairing, but  it's  all  about  the  money. So  they've  joined  up  to  get some  of  that  TV  money. Let  me  go  ahead  and  add  this, and  let's  name  this  something  else, Big  Ten  Map. I  think  I  want  to  reorder this  image  to  be  upfront in  front  of  my  report. To  do  that,  I  can  just click  it  and  drag  it and  pick  it  up  and  move  it  up  to  the  top. We  can  drag  and  drop   all  of  our  reports  in  a  single  publish to  order  them  in  any  way  we'd  like. I  think  this  looks  great, so  I'm  going  to  go  ahead  and  publish. Okay. I've  successfully  published  two  reports. I've  published  to   the  College  Finances  folder, and  that's  within our  Discovery  Americas  space. This  time  I've  created  two  new  reports: our  New  Conferences  graph  builder and  our  Big  Ten  Map  image. Finally,  I've  created   another  new  data  table, the  New  Conference A lignments  table  here. Let's  go  ahead  and  take  a  look at  this  folder  in  JMP  Live. I'm  going to  go  ahead  and  click  this  link here  to  the  College  Finances  folder. We've  opened  up  JMP  Live  now. We're  in  our Discovery  Americas  2022  space. We  are  in  our  College  Finances  folder, and  there's  our  two  reports. Now,  if  you  remember   the  first  publish  I  did, I  just  took  the  defaults  on  everything and  put  that  thing in  the  root  of  the  space  here. So  if  I  JMP  over  to  the  root, there's  my  first  report, along  with  some  pictures  of  some  of  our  presenters. I   want  to  move  this  report into  the  College  Finances  folder to  keep  things  organized. To  do  that,  I'm  going  to  jump  over  to  the  Files view  here. 
I'm  going  to  select   both  the  report  and  the  data, and  then  I'm  going  to  hit   the  Move  posts  button  over  here. When  I  do  that,  I  have   the  root  of  the  space  preselected  for  me because  that's  where we're  currently  located. But  I'd  like  to  move  this into  the  College  Finances  folder. Again,  we're  going  from  the  root  of  the  space  here, and  then  we're  moving  a  couple  of  levels  down  into  the  College  Finances  folder. So  when  I  hit  move, the  data  and  report  disappear, and  if  I  click  over   to  the  College  Finances, we  see  they've  shown  up  here. If  I  go  back  to  the  Reports  tab, I  have  my  three  reports  ready  to  go. With  that,  I'm  going  to   pass  things  over  to  Aurora to  take  a  look  at  some  more publishing  scenarios. Thank  you,  Michael. Well,  while  Michael  was   looking  at  College  Finances, I've  been  reviewing   some  internal  data  that  we  have on  the  development  of  the software  for  JMP  Live  version  17. Every  row  of  this  data  table  contains  a  brief  description of  a  code  change  that  one  of  us  made. I've  done  some  really  basic  text  analysis. First,  I  have  just  a  word  cloud. "What  are  developers  talking  about?" During  the  development  of  the  software for  JMP Live  version  17, we've  been  talking   an  awful  lot  about  data, and  we've  also  been  talking an  awful  lot  about  collaboration  spaces. But  every  developer  on  the  JMP Live  team has  had  a  slightly  different  focus. For  example,  Aaron  has  been  talking a  lot  about  downloading  projects. I've  been  talking  a  lot  about  access,   in  other  words,  collaboration  permissions. Chris  has  been  talking  a  lot about  user  groups  and  so  on. I  think  my  coworkers  might  find  this  interesting, so  I'd  like  to  publish  this  to  JMP  Live. So  I'll  go  to  File,  Publish, Publish  Reports  to  JMP  Live. And  of  course,  the  first  choice  I  have  is among  those  reports   that  are  currently  open, which  ones  do  I  want  to  publish? I  would  like  both  of  them. I  need  to  pick  a  space  to  put  that  in. Just  like  Michael,  I'll  pick the  Discovery A mericas  2022  space the  Deep  Dive  folder,  and  I  see  there's a  College  Finances  folder  within, but  that's  not  really  good  for  my  reports, so  I'm  going  to  just  create  a  new  one on  the  fly  here,  Software  Insights. Now,  Michael  has  shown  you   how  simple  publishing  can  be, but  we  do  also  want  to  show  you   some  of  the  more  advanced  options that  are  available  in  case  you  need  them. One  of  these  advanced options  is  publish  data. By  default,  this  is  true because  the  normal  thing  to  do  is  to  publish  not  only  your  reports, but  also  the  data   that  those  reports  rely  upon. The  reason  that's   the  normal  thing  to  do  is: first,  if  you  want  your  reports to  still  be  interactive  on  JMP  Live, you've  got  to  publish  the  data, because  the  data  is  what drives  that  interactivity. For  example,  column  switchers, local  data  filters,  stuff  like  that. In  my  case,  I  don't really  care  about  that. These  are  just  word  clouds. I  don't  need  them to  really  be  interactive. The  second  reason  why  it's  a  normal  thing  to  do  to  publish  your  data is  maybe  you  want  to  update   that  data  later  on. 
When  you  do  that,  you  want   your  reports  to  automatically  regenerate to  reflect  that  latest  information. I  don't  care  about  that. I  just  want  these  word  clouds to  be  like  a  snapshot  in  time. The  third  reason  that  it's  the  normal thing  to  do  to  publish  your  data is  maybe  you  would  like  your  colleagues  to  be  able  to  download  your  data so  that  they  can  also  run  analyses  on  it, create  some  new  reports,   that  kind  of  thing. Let's  say,  I  don't  care  about  that  either. So  there's  really  just  no  reason for  me  to  publish  my  data. I'm  going  to  turn  this option  off  for  both  reports. Another  reason  you  might  choose  to  turn this  option  off  and  not  publish  your  data, although  this  isn't  applicable  in  my  case, is  maybe  your  data   is  just  extraordinarily  large and  you  don't  want  to  wait  for  it   to  be  uploaded  to  JMP  Live. On  the  other  hand, if  you're  perfectly  fine   with  uploading  your  data  to  JMP  Live, you  just  don't  want  anybody else  to  be  able  to  download  it. In  a  case  like  that, you  would  just  come  down   and  uncheck  this  checkbox  that  says "Allow  data  and  scripts  to  be  downloaded". But  in  my  case,  I  just  don't want  to  publish  the  data  at  all. So I'm  happy  with  what  I've  set  up  here, and  I'm  going  to  click  Publish. We  can  see  here  that  we've  published to  the  Software  Insights  folder two  new  reports,  and  it  doesn't say  anything  about  data. That's  because  neither  of  these new  reports  use  data  at  all. They're  just  static  reports. If  we  want  to  confirm  that, we  can  follow  the  link  by  clicking on  either  one  of  these  reports. That  will  open  up  JMP  Live  and  take  us   directly  to  the  newly  published  report. We'll  click  on  the  Details  to  open  the  Details  pane. Scroll  down  to  the  Data  section,   and  we  can  confirm. Yes,  zero  data  sources are  used  by  this  report. Back  in  JMP, I'd  like  to  show  you  the  published  script that's  been  generated  for  us based  on  the  choices  we've  made, just  like  Michael  did. This  JSL  looks  pretty  similar  to  the  JSL that  Michael  showed  you  a  moment  ago. You're  creating  some  new  JMP  Live  content. You're  publishing  that  content. In  my  case,   I'm  publishing  directly  to  a  folder. I  think  the  JSL  you  saw  before   was  publishing  just  directly  to  a  space. And  there's  one  option, of  course,  that's  different, where  I  said,  "Publish  data?  No,  thanks." I  wonder  if  Michael  has  also  been  thinking about  his  data  and  how  best  to  use  it. Yeah, so  let's  go  ahead  and  look  at  JMP. I  have  a  data  table  here. This  is  a  data  set  of  the   New  York  Times puzzle  game,   Spelling  Bee . Lately,  I've  been  really  into   Wordle , but  the   New  York  Times   Spelling B ee  game  is  a  new  one  for  me. I'm  not  quite  sure   what  I  want  to  do  with  this  data, but  I  know  Aurora  really enjoys  playing  these  puzzle  games, so  I'm  going  to  publish   this  data  set  for  her  to  explore. Now,  in  prior  versions  of  JMP  Live, to  get  your  data  up  to  JMP  Live, you  had  to  publish  reports  along  with  it. But  in  version  17,  we've  been  prioritizing  data  a  little  bit  more, making  it  more  of  a  first- class  citizen. So  I  can  publish  this  data  by  itself. 
To  do  that,  I'm  going  to  hit  File, Publish,  and  Publish  Data  to  JMP  Live. Okay,  I  have  my  data  here. I  have  a  list  of  data. I  only  have  one  data  table  open,   the  New  York  Times  Bee  data, and  I  want  to  publish a  new  data  table  here. Just  like  Reports,  we  choose  a  space and  a  folder  within  to  publish  to. In  this  case,  I  want  to  create  another new  folder  for  this  new  type  of  data. I'm  going  to  call  this  Spelling  Bee. Okay. I'm  going  to  create  that. We  have  our  space  in  our  folder. But  unlike  reports,   when  you're  publishing  data, there's  not  much  more  configuration. So  I  can  just  go  ahead and  immediately  publish  right  now. I'm  going  to  hit  Publish. I've  created   the  New  York  Times  Bee  data  table, and  I've  published  it  to  the Spelling  Bee  folder  within  our  space. Let's  see  what  Aurora is  going  to  do  with  this  data. I've  heard  a  rumor  that  Michael   has  published  some  data about  the  Spelling  Bee  puzzle. I  like  to  play  that. So  I  want  to  take  a  look  at  this  data. I'm  browsing  around  on  my  organization's JMP  Live  site,  and  I'm  on  the  home  page. But  the  home  page   really  shows  you  just  reports, because  reports  is  what  most  people  want  to  see  most  of  the  time. I  just  want  to  find  some  data. So  I'm  going  to  use  the  search  field  up  here  in  this  blue  navigation  bar, and  I'm  going  to  hope  that  Michael has  named  his  data  table  well. I'm  going  to  see  if  I  can  search for  it  just  by  typing  in  the  word   bee . Great.  There  it  is,  "nytbee". But  maybe,  what  if  Michael  was  having  a  bad  day, and  he  wasn't  thinking  about  how  to  make  his  data  easily  findable and  he  called  it  just   cool  data  or  something. Well,  in  that  case, I  could  just  search  for  Michael. Let  me  search  for   his  last  name  here,  Goff. There  he  is.  Michael  Goff. I  can  open  his  profile  page. This  is  going  to  show  me  all of  the  reports  that  Michael  has  published, at  least  those  reports that  I'm  allowed  to  see. Under  this  Reports  tab, there's  a  Data  tab. This  shows  me  all  of  the  data that  Michael  has  published that  I'm  allowed  to  see. And  there  it  is,  "nytbee". However  you  find  it, once  you  find  it,  click  on  it. That  will  open  the  data  post. Okay,   I  can  see  that   there  is  some  data  called  "nytbee" and  it  was  published   a  minute  ago  by  Michael  Goff and  it's  not  used  by  any  reports  yet. But  is  it  actually  useful  data  for  me? I  don't  really  know. It's  possible  that  this  data  is  just   humongous and  so  maybe  I  don't  want to  take  the  time  to  download  it just  to  find  out  that  that's  not  what  I  want  after  all. So  I'm  going  to  use  this   View D ata  feature  to  get  a  sneak  peek at  the  shape  of  the  data and  see  what's  in  there, see  if  it's  useful  to  me. It  looks  like  it's  got   a  bunch  of  information  for  every  day that  a  new  puzzle  came  out. Some  of  this  looks  like  not  super  helpful, so  let  me  get  that   out  of  the  way. All  right.  Oh,  cool. It  looks  like  for  every  date,  we  have the  letters  that  were  used  in  the  puzzle, stuff  like  the  maximum  score  you  could  achieve, the  number  of  pan grams. 
The pangram count is the number of solutions that use all of the letters in the puzzle. Pangrams are few and far between, which is what makes it really exciting when you get one. This is definitely data I want to analyze, so I need to get it from JMP Live down to my local machine. I can do that by going to the menu and clicking Download data table, or, because I already have a connection set up between JMP on my machine and my organization's JMP Live site, I can use a shortcut called Open in JMP. What this is going to do is just download it in the background for me and then open it for me in JMP. There we go. This has opened for me in JMP on my machine. To save some time in the demo, we have some analyses ready to run. We've got Number of Pangrams versus Date; there are very few pangrams, typically. Distribution of Letters, not too surprising: Q is not a very common letter to see in this puzzle, but A is a very common letter. And some difficulty metrics: what's the maximum score you can achieve, and what is the maximum number of words you can make, based on what letter you're required to use. So if you're required to use Z in all of your solutions, you're pretty darn constrained on the maximum score that you can achieve. I'd like to get these reports up to JMP Live, so I'll go to File, Publish, Publish Reports to JMP Live. As always, I need to choose, among those reports that are currently open, which ones I want to publish. I want all of them. Then, of course, I need to choose where to put them. Spelling Bee, obviously. Now, remember when I told you earlier that the normal thing to do is to publish not only your reports, but also the data that those reports rely upon? In this case, I don't want to publish the data up to JMP Live because I know it's already on JMP Live. I mean, I just downloaded it. But I also don't want to say, "Don't use data," because then my reports would be non-interactive. They'd just be static. What I want to do is say, "Please publish these reports, but make them use the data that's already up there." To do that, I click on Data Options and I switch from Publish new data to Select existing data. Of course, the next question is, "What data do you want me to use?" I click in there and it makes a recommendation to me. That's because the software recognizes that I just downloaded this from JMP Live and then made some reports with it, so this is probably the JMP Live data that I want to use. In my case, that's a great recommendation; that's exactly the data I want to use. But if I didn't get a good recommendation, I can always just start typing in this field, and it will show me data tables up on JMP Live that match what I've typed in. Once I find the data on JMP Live that I want these new reports to use, I just save that option and click Publish. We can see here that we've published to the Spelling Bee folder three new reports and zero new data tables.
Hopefully,  that's  because  these  new  reports are  successfully  using Michael's  existing  data. If  we  need  to  confirm  that  we  can  just click  on  any  one  of  these  reports, it'll  open  that  newly  published report  in  JMP  Live. Go  to  the  Details  pane, scroll  down  to  the  Data  section, and  sure  enough,   one  data  source  is  used  by  this  report, and  it's  the  "nytbee"  table  that  was  published  by  Michael  six  minutes  ago. I'm  going  to  follow  this  link, open  the  data  post. Here,  we  can  just  double- check  that, yes,  it's  not  only  this  report, but  it's  all  three  of  the  reports  I  just  published are  now  successfully associated  with  Michael's  data. Let's  go  back  to  JMP and  again  show  you  the  published  script  that  was  generated from  the  choices  that  we  made. These  are  probably  looking very  familiar  to  you  by  now. In  this  case,  we  have   three  new  pieces  of  JMP  Live  content for  these  three  reports. We  are  publishing  them  to  a  folder. Here  we're  exercising  a  new  option,   Use E xisting  Data that's  already  on  JMP  Live. Since  I  am  using  Michael's  data, I'd  like  to  let  him  know how  useful  it  was  to  me. Back  on  JMP  Live, I'm  going  to  click  on  Comments  here. Just  let  Michael  know  that  I  found  this  to  be  really  interesting  stuff, "I have done some  basic  analysis. Let  me  know  what  you  think." Let's  see  what  Michael  thinks  of  that. Okay.  I'm  back  on  JMP  Live  here, and  I  just  got  a  notification. It  looks  like  Aurora  has left  a  comment  on  my  data. If  I  go  over  here   and  check  out  her  comment, looks  like  she  thinks it's  really  interesting. And  look  at  that. She's  created  some  reports to  go  with  that  data. Let's  jump  over  to  the  folder  and   take  a  look  at  everything  together. I  have  the  Spelling  Bee  folder  here, Aurora's  three  reports. If I  go  to  the  Files  tab, I  can  see  my  data  table  here  as  well. Let's  go  ahead  and  open  this  entire folder  up  in  JMP  and  take  a  look  at  it. Just  like  Aurora  did  with  her  data  table, I  can  open  a  folder   in JMP using  the  Open  in  JMP  button. I'm  going  to  get  a  couple of  warnings  here. This  is  saying  that  it's  going  to  open  JMP. That's  okay. I'm  also  going  to  get  a  warning  that   this  is  downloaded  from  the  Internet. This  is  a  Mac  thing. Let's  go  ahead  and  open. Here  I  have  a  JMP  project. If  you  want  to  learn  a  little  bit  more about  how  JMP  projects  work  with  JMP  Live, you  can  tune  into  Aurora's  other  talk   with  Erin  Anderson  about  JMP  projects. For  now,  I'm  just  going  to  go over  things  really  quickly  here. First,  we  have  a  journal. This  is  a  manifest  of  the  files that  were  included  with  this  project. Of  course,  that's  going  to  be  the  reports that  Aurora  just  talked  about, our  difficulty  metrics,  our  pan grams, and  the  distribution  of  letters. I'm  looking  at  this  distribution  of  letters, and  I  think  there's  an  enhancement  we  can  make  to  this  report. I  think  it'd  be  interesting  to  look  at just  the  vowels  and   just  the  consonants  together to   get  a  better  comparison  of  the  two. To  do  that,  we  need  to  add   a  new  column  to  the  table to  identify  which  is  a  vowel  and  which  is  a  consonant. 
Let's  go  ahead  and  open the  New  York  Times  table  here. I  can  hit  the  red  triangle  menu  and  add  a  column. Let's  go  ahead  and  name  this  Vowel. Okay,  that  looks  good. And  let's  create  a  formula  here. I  already  know  what I'm  going  to  type  here. I'm  going  to  go  ahead  and  type  this  out, and  then  we  can  talk  about  it. If contains a, e, i, o, u… Okay.  What  this  is  saying  is  if the  letter  is  contained  within  the  set of  vowels  here,  that  means  it's  a  vowel. Otherwise,  it's  just  a  consonant. Hit  OK  here. And  I've  created  my  new  column. This  looks  great. Let's  JMP  back  over to  the  distribution  of  letters and  add  a  local  data  filter. So  I  can  hit  the  red  triangle  menu,   hit Local  Data  Filter. Here's  a  list  of  columns  to  choose  from. I'm  going  to  choose  my  Vowel  column, and  I'm  going  to  go  ahead and  display  this  as  a  list. So  now  if  I  select  consonant, I  can  see  only  the  consonants. If  I  select  vowels,  I  see  the  vowels. This  looks  great. I  want  to  go  ahead  and  update  this  report   on  JMP  Live  with  this  new  enhancement. To  do  that,  I  can  do  a  replace. To  replace  this  report,  I  go  to  File, Publish,  and  Publish  Reports  to  JMP  Live. This  time,  I  have  a  list of  three  different  reports. These  are  the  reports  that  Aurora  created. They're  coming  out  of  the  project. Since  I  had  the  distribution  focused, it's  already  preselected  for  me. And  instead  of  publish  new, I  want  to  replace   an  existing  post  this  time. Let's  hit  Next. When  you  replace  a  report, you  need  to  find  the  report that  you'd  like  to  replace. I  hit  this  drop- down  here  and  I  see that  distribution  of  letters  already. I'm  going  to  go  ahead  and  select  that. This  is  Aurora's  distribution  here. I'm  going  to  go  ahead  and  modify  the  title and  save  with  Local  Data  Filter. I  think  everything  else  looks  good  here, so  I'm  going  to  go  ahead  and  move  on. Next  up  is  the  Match  Data  step. You  have  to  match   the  data   that's  on  the  existing  report  in  JMP  Live to  the  data  you  have  locally. In  this  case,  there's  only  one  table, the  New  York  Times  Bee  table. I  have  three  options  here. I  can  choose  to  publish  new  data, that  would  create   a  new  copy  of  the  data  table for  use  only  with  this  report. I  can  choose  to  use   the  existing  data  on  JMP  Live, or  I  can  choose  to  update  that  data. In  this  case,  I  want  to  update  the  data  since  we  added  a  new  column. When  I  choose  update  here, I  get  a  warning  that  updating will  affect  two  other  reports. That  would  be  the  other  reports  that Aurora  created  that's  using  this  data. In  this  case,  I  know  that's  okay   because  I'm  just  adding  a  new  column  here. That  won't  affect   the  other  visualizations. Everything  else  here  looks  great, so  I'm  going  to  go  ahead  and  hit  Replace. I've  successfully  replaced the  Distribution  of  Letters with  Distribution  of  Letters with  Local  Data  Filter, and  we've  also  updated  the  data  here. I'm  going  to  go  ahead  and  hit the  script  button  here and  close  the  window, and  let's  take  a  look  at  this  script. Just  like  before  we're  creating that  connection,  the  content, grabbing  that  window, and  all  these  options. 
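For reference, the same step can be scripted. The sketch below is only an illustration of the formula described in the demo: it assumes the downloaded table is the current data table and that the letters sit in a character column named :Letter (both assumptions, since the actual column names were not shown), and it attaches the local data filter to a freshly launched Distribution report rather than the existing one.

```jsl
// Sketch of the Vowel column and local data filter described above.
// The column name :Letter is an assumption; adjust to the actual table.
dt = Current Data Table();

dt << New Column( "Vowel",
	Character,
	Nominal,
	Formula(
		// Contains() returns 0 (false) when the letter is not one of a, e, i, o, u
		If( Contains( "aeiou", Lowercase( :Letter ) ), "Vowel", "Consonant" )
	)
);

// Launch a distribution of letters and add a local data filter on the new column
dist = dt << Distribution( Nominal Distribution( Column( :Letter ) ) );
dist << Local Data Filter( Add Filter( Columns( :Vowel ) ) );
```

Selecting Vowel or Consonant in the local data filter then switches the report between the two groups, or shows both, just as in the demo.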
The  difference  is  here, instead  of  publish,  I'm  going  to  replace. To  replace,  I  pass  in  that  content, I  have  to  specify  which  report I'd  like  to  replace. I  did  that  interactively,  but  this is  the  ID  of  that  report  I  selected. Finally,  I  have  an  option of  what  to  do  with  that  data. I'm  choosing  to  update  that  existing  data. So  I'm  identifying  that  data  by  its  ID, and  I'm  updating  it  with the  New  York  Times  Bee  table  data  here. Let's  see  what  Aurora  thinks of  my  new  update  to  her  report. Thank  you,  Michael. All  right.  I've  seen  that  Michael has  made  an  improvement  to  this  report. He's  put  a  local  data  filter  on  here. It  looks  like  if  I  select   both  vowels  and  consonants, then  basically  this  is  the  same   report  that  I  published  initially. So  I  haven't  really  lost  anything. I've  only  gained  in  interactivity. I  think  this  is  going  to  be  great   for  me,  for  Michael, and  for  anybody  else   who  might  look  at  this  report, including  people  who  don't  even  use  JMP. They  can  still use  this  interactive  feature. So  I  think  that's  fantastic. I  love  this  improvement, and  I'm  going  to  comment  and   let  Michael  know  that  I  love  it. "Love  the  new  data  filter.  Thanks." But  why  is  it  that  Michael  was  allowed to  improve  and  replace  my  report? Should  I  be  worried  about  that? No,  it's  simply  because  I  published  this  report  to  a  collaboration  space, Discovery  Americas  2022,   where  Michael i s  a  key  contributor. To  show  you  more  about  what  I  mean, I'm  going  to  switch  over  to  a  browser where  I'm  logged  in  as  an  administrator. As  an  administrator,  I  have  access to  this  Permissions  tab  in  the  space. In  this  Permissions  tab, I  can  turn  on  and  off   collaboration  permissions for  different  individual  users   and  for  groups  of  users. So y ou  can  see  here  that  the  administrator has  given  Michael   replace  permission  on  this  space. When  your  JMP  Live  admin  creates  a  new  space  for  your  organization, they're  the  only  ones   with  access  to  that  space, then  they  have  to  turn  on  collaboration  permissions  for  other  people. So  if  one  of  your  co  workers  has collaboration  permission  on  a  space, it's  because  an  admin  trusts  them to  use  that  responsibility  wisely. While  I've  been   going  on  about  collaboration, I  think  that  Michael  has  been  looking  at  all  of  the  pieces  of  JSL that  we've  been  generating throughout  this  demonstration. Let's  see  where  he's  at  with  that. Yes,  that's  right. Let's  go  ahead  and  look  at  JMP  here. Let me get  some  stuff  out  of  the  way, and  open  up  this  script   that  I've  been  working  on. I've  taken  all  of  our  scripts  that  were generated  by  our  interactive  publishes, and  I've  put  them  together   into  this  one  script to  recreate  our  entire  demo  in  one  go. Let's  go  ahead  and  take  a  look at  what  this  script  is  doing. First,  we're  creating   a  new  JMP  Live  connection  here to  our  instance  that  we're  publishing  to, followed  by  saving  off   some  variables  to  use  later. I  have  the  space  key  here  for   the  Discovery  Americas  2022  space. We're  going  to  use  that  for  our  publishes, and  I  have  a  path to  all  of  our  data  here  as  well. 
First,  I  need  to  recreate  that  folder structure  that  we  created  interactively. To  do  that,  I'm  going  to  use some  of  the  JMP  Live  JSL  here, Create  Folder. The  first  folder  I'm  going  to  create  is  that  container  folder, our  Deep  Dive:  Publish, but  this  is  our  script  version. I'm  publishing  this   to  our  space  using  the  space  key. When  I  create  that  folder, I'm  going  to  go  ahead  and  save  off that  ID  of  the  folder  to  use  later. Next,  I'm  going  to  create   the  three  subfolders: the  College  Finances  folder, the  Software  Insights  folder, and  the  Spelling  Bee  Puzzle  folder. I'm  going  to  use the  folder  ID  to  create  them. I  pass  the  folder  ID  in  here, and  that  makes  this  folder  here a  child  of  the  parent  folder  there. So  I've  got  those  folders  created, and  of  course,  I'm  also   saving  off  these  IDs  as  well to  use  later  in  my  Publish  operations. Next,  I'm  opening  up   data  tables  and  running  reports. I've  opened  up   the  College  Finances  folder  here. I'm  running  that  table  script  to  create  a  report. And  here's  that  first  section of  interactive  Publish  code  here. I'm  creating  that  new   JMP  Live  content  like  before, taking  all  the  defaults   and  calling  Publish. The  only  edit  I've  made  here   is  I've  substituted  out  the  ID with  the  ID  of  the  folder   I  just  created  above. Next,  I'm  going  to  create   the  new  conference  alignments  table, run  that  script,   and  then  create  that  content. Do  the  same  thing  I  did  before to  that  college  folder  ID. Next,  I'm  going  to  get  Aurora's word  cloud  data  table  and  reports. I'm  opening  that  table, writing  the  scripts, and  just  like  Aurora  showed, we're  creating  that  new  JMP  Live  content, and  we're  not  publishing  the  data  here so  publish  data  is  zero. A gain,  we're  publishing, and  I'm  passing  in   the  word  cloud  folder I D  here. We're  putting  that  in  our new  folder  we  created. Then  I'm  going  to  create that  New  York  Times  Bee  data  table and  three  more  reports,   get  those  opened  up. Then  we're  going  to  create   that  new  JMP  Live  content. Finally,  publish  all  that  up   to  JMP  Live  to  the  bee  folder  ID. If  you  remember,  I  did  some  edits to  that  report  after  we  published  it. I'm  going  to  save  off the  published  results  here. This  is  just  a  list  of  the created  reports  and  data. I  want  to  iterate   through  this  list  of  posts to  identify  the  report  called Distribution  of Letters  and  also  the  data. Then  I'm  going  to  save  off  these  IDs. I  have  my  report  ID  and  my  data  ID. I'm  saving  these  off  so  I  can replace  that  post  in  a  second. Next,  we're  going  to  create   that  new Vowel  column  that  I  created and  also  add  that  Local  Data  Filter. Then  finally,  we're  doing that  last  Replace  operation. We're  creating  that  new  JMP  Live  content, and  then  we're  calling  Replace. We're  passing  the  content  into  Replace. We're  identifying  the  report   we'd  like  to  replace  with  that  report  ID. Then  we're  identifying  the  data   we  want  to  update  with  that  data  ID  here. Then  finally,  we're  closing   everything  to  clean  it  up and  opening  up  the  web  browser  to  the  folder  here. Let's  go  ahead  and  run  this  script. We're  getting  started  here. 
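Before the run, here is a condensed sketch of the skeleton that script follows: connect, create folders, publish content into them, then replace one report while updating its data. It uses only the calls named in the walkthrough (New JMP Live, Create Folder, New JMP Live Content, Publish, Replace), but the URL, space key, IDs, and the exact argument wrappers are assumptions reconstructed from the narration, so confirm every signature against the JMP 17 Scripting Index before relying on it.

```jsl
// Skeleton only -- argument wrappers and id handling are assumptions.
live = New JMP Live( URL( "https://your-jmp-live-site.example.com" ) );
spaceKey = "discoveryamericas2022";                  // assumed key of the target space

// Create a parent folder in the space, then a child folder inside it
parent = live << Create Folder( "Deep Dive: Publish (script version)", Space( spaceKey ) );
parentID = parent << Get ID;                         // assumed message for retrieving the id
beeFolder = live << Create Folder( "Spelling Bee Puzzle", Parent( parentID ) );
beeFolderID = beeFolder << Get ID;

// Publish open reports into the child folder without uploading local copies of the data
// ("publish data is zero"), because the table already lives on JMP Live
content = New JMP Live Content(
	Window( {rpt1, rpt2, rpt3} ),                    // hypothetical report window references
	Publish Data( 0 )
);
result = live << Publish( content, Folder( beeFolderID ) );

// Replace one published report and update the shared data it uses
newContent = New JMP Live Content( Window( rptWithFilter ) );
live << Replace( newContent,
	Report ID( reportID ),                           // id found by iterating the publish results
	Update Data( dataID )                            // id of the existing table on JMP Live
);
```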
Michael, I noticed that a semicolon is selected with your cursor. Is it possible that only that selection is running? -Yes, thank you. There we go. Now we have our reports opening: our second report, Aurora's word clouds, the New York Times Bee table, and the script here, and there's our folder. We created that folder on JMP Live. Here's the script version: our Spelling Bee Puzzle report, Software Development Insights with the word clouds, and our College Finance reports. With that, that about wraps up our demo of publishing to JMP Live in JMP 17. Be sure to check out our other talks on JMP Live. We have a general updates talk, a talk on using projects with JMP Live, and a talk on our biggest feature in JMP Live 17: refreshable data. Thanks again for joining us today.
JMP Live is a secure collaboration platform for sharing JMP insights with your colleagues, even if they are not JMP users. JMP Projects are self-contained files that help you organize JMP data tables, reports, scripts and more.  This presentation walks through some strategies for staying organized in JMP Live, and in JMP Projects, and for moving smoothly between the two.     Hi,  I'm  Aurora  Tiffany- Davis, and  I'm  joined  today  by  Aaron  Andersen. We're  software  developers on  the  JMP  Live  team. We'd  like  to  talk  to  you  today about  staying  organized on  JMP  Live  and  in   JMP Projects. As  a  reminder, JMP  Live  is  a  secure  platform for  sharing  your  JMP  insights with  your  colleagues, even  if  they  don't  use  JMP  themselves. It  also  offers  deeper  collaboration with  your  colleagues  who  do  use  JMP. JMP Projects  are  self- contained  files which  can  help  you  to  organize your  data  tables,  your  reports, your  scripts,  and  more. To  get  started,  Aaron  is  going  to  talk a  little  bit  more  about those   JMP Projects. Aaron? Thanks,  Aurora. I  am  JMP  Developer  Aaron  Andersen, and  I'm  going  to  show  you  how  to  organize your  work  using   JMP Projects. To  do  this,  I'm  going  to  use  some  data from  the  JMP  sample  data  directory. If  you  have  JMP  open while  you  watch this video, you  can  follow  along  with  us. Seventeen  samples,  data. The data I'm  going  to  use   is called  Airline Delays .jmp. I'd  like  to  do  some  analysis  of  this, and hopefully,  get  some  insights. Because  I  know  that  I'm  going to be  producing  several  reports, and  I'm  not  sure  what  else with  this project, I  would  like  to  keep  all  of  those  things organized  and  together in JMP. To  do  that,  I'm  going to  use  a   JMP Project. I  will  go to  File,  New, P roject, which  creates  a  new  project and  opens  the   JMP Project  window. JMP Project  window  is  a  container  window into  which  all  of  the  data  tables and  reports that  I'm  going  to  create  or open will  live  throughout  this  project. Let's  drag  Airline  Delays  in. We  can  see  this  JMP  data  table   opened here  in  the  project  window. Let  me  make  this  bigger. The  Airline  Delays  data  table  contains information from  almost  30,000  airline flights that  took  place  in  the  United  States over  the  course  of  a  year. For  each  such  flight, we  have  information  about how  long  the  flight  was, whether  the  flight  arrived  on  time  or  not and  by  how  much, and  what  airline  flew  that  flight. To  get  a  better  visual  picture   of this information, let's  open   Graph Builder. Let's  start  by  getting  an  overview of  what  a  typical  week  looks  like. Typically,  I  want  to  know, is  there  a  day  of  the  week that  is  more  or  less  likely   to have its flight  delayed  than  others? Now,  of  course,  all  I'm  really  learning from  this  is,  was  there  a  day  of  the  week in  the  particular  year this  data  was  taken? But  I  can  reasonably  extrapolate some  of  this  information to  airline  flights  today. We'll  start  with  Day  of  the  Week, put  that  in  the  Y  column, and  Arrival  Delay  in  the  X  column. That's Arrival Delay. It's not... Let's  switch  these  around,  Order  by Swap. It  isn't  liking  the  day  for  some  reason. Move Day of the Week down to here. Put  Arrival  Delay  on  the  Y  axis. There  we  go. 
Now,  I  have  pretty  good  graph  showing  me the  mean  arrival  delay for  any  given  day  of  the  week. I  can  already  see  that  Friday is  the  biggest  day  most  likely or  statistically  expected to  have  the  longest  delays and  Saturday  is  the  shortest. To  get  a  little  bit  better  view  of  this, let's  group  this  by  airline. Drag  airline  to  Group  Y. Then  let's  flip  these  back like  I  wanted  to  do  the  first  time when  I  couldn't  quite  get  it  right. There  we  go. Finally,  to  help  see the  days  of  the  week  better, we'll  drag  Day  of  the  Week into  the  Color  column. Then  I'm  going  to  change  the  color scheme  on  the  Day  of  the  Week. Double  click  on  this  label  here, hit  color  scheme, and  get  a  color  scheme that's  not  quite  so  bold for  this  particular  graph. Now, I  think  I'm  finished. What  I  have  is  a  graph  showing for  each  airline and  each  day  of  the  week, what  the  mean  arrival  delay was  for  the  year. The  colors  allow  me  to  follow a  particular  day from  one  airline  to  the  next. The  first  thing  I  noticed  in  this  graph, which  is   funny, is  that  there's  only  one  of  these that's  negative. If  I  flew  Southwest  on  a  Saturday, my  expected  delay  would  be  negative, which  is  to  say I  respect  to  arrive on time, whereas  every  other  row in  the  whole  graph  is  positive. On  average, the  flights  were  late  every  other  day of  the  week  for  every  other  airline. That's  not  what  you  want   if you're  ringing  an  airline. But  at  least  they're  not  too  bad, 15 minutes, 10, 15  minutes  appears  to  be  typical for  the  average  anyway. To  try  to  get  a  better  picture of  this data, let's  create  one  more  graph. Open   Graph Builder  a  second  time, and  this  time, let's  try  to  get  an  overview of  an  entire  year's  worth of  airline flights to  see  if  there  are  clusters of  higher  and  lower  delays throughout  the  year. To  do  that,  I'm  going  to  drag  Month to  the  Y  column and  Day  of  Month  to  the  X  axis. Graph Builder  will  automatically create  a  heat  map  for  me. Then  I'm  going  to  make  sure that  Arrival  Delay  is  the  color  source. Finally,  let's  go  into  the  Y  axis, and  reverse  the  order so  that  January' s  at  the  top and  December's  at  the  bottom. Now  I  have  a  graph  showing   an entire year's  worth  of  airline  flights. I  can  already  see where  the  dark  red  is. There  are  certain  clusters  of  delays. There's  a  cluster  here   right around the  Christmas  holidays in  the  United  States  that  drops  off once  the  holidays  actually  start. There's  an  oddly  delay  filled  day  here right  in  the  middle  of  November, and  there's  a  lot  more in  the  summer  months than  the  winter  months. I  can  speculate  that maybe  these  delays are  correlating  with  flight  volume, but  the  more  people  who  fly,   the more likely  a  flight  is  to  be  delayed. Because  airports  would  be  busier, loading and unloading  a  plane takes long if there  are  more  people  on  it. It's  a  pretty  good  hypothesis. I  don't  have  that  data in  this  table,  though, so  I  can't  confirm  it  yet. But  I  have  a  pretty  good  start. If  I  want  to  see  the  two  graphs that  I  made  side  by  side in  the  product  window, I  just  go  up  to  Airline  Delays, and  I  drag  it  out,  and  I  drop  it in  this  dock  right,  drop  down. 
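For readers following along in JSL rather than interactively, here is a rough sketch of the two graphs just built. The column names are written as spoken in the demo and may differ slightly from the headers in the Airline Delays sample table, and the element options (mean bars, heat map coloring) are a best guess at the interactive choices rather than a saved script.

```jsl
// Sketch of the two Graph Builder views described above; column names are
// taken from the narration and may need adjusting to the sample table.
dt = Open( "$SAMPLE_DATA/Airline Delays.jmp" );

// Mean arrival delay by day of week, grouped by airline, colored by day
dt << Graph Builder(
	Variables(
		X( :Day of Week ),
		Y( :Arrival Delay ),
		Group Y( :Airline ),
		Color( :Day of Week )
	),
	Elements( Bar( X, Y, Legend( 1 ), Summary Statistic( "Mean" ) ) )
);

// Heat map of arrival delay across the calendar year
dt << Graph Builder(
	Variables( X( :Day of Month ), Y( :Month ), Color( :Arrival Delay ) ),
	Elements( Heatmap( X, Y, Legend( 2 ) ) )
);
```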
Now  I  have  my  two  graphs  side- by- side, so  I  can  see  them  both  simultaneously. If  I  wanted  to, I  can  actually  take  the  data  table, I  can  drag  that  down  to  the  bottom, so  that  I  can  see  all  three  graphs that  is to  say  all  three  items, two reports, and  the  graph  at  the  same  time. This  is  particularly  useful  if  I  want  to modify  this  data  table, and  watch  the  graphs  update  as  I  do. But  before  I  do  that, let's  save  this  project. I've  made  a  lot  of  progress  here. I  like  to  save  my  work so  that  I  don't lose it if  something  goes  wrong or  I  mess  something  up. Let's  go  to  File,  Save  Project  As, put it on  the  Desktop and  call  it  Airlines.jmpprj which  I  pronounce   JMP Project. You can imagine  not  any  vowels. JMP Project. That  will  save  the  project  file here  on  my  Desktop, and  I  can  now  close  it. A ll  my  reports  that  I  created and  the  layout  that  I  use are  saved  in  that  file. If  I  reopen  that  file, everything  comes  back right  the  way  I  left it, which  is  the  second  useful  feature of   JMP Projects. Not  only  can  you  organize  your  data and  your  reports  in  the  project  window in  a  very  convenient  way, however  you  want, you  can  also  save  the  project at  any point, close  it,  and  resume where  you  left  off  later. In  fact,  you  can  open more  than  one  project  file at  the  same  time if  you  want  to  work on  more  than  one  JMP analysis or  more  than  one  project  any  given  day. Now  that  I  have  this  project  back  open, I'm  looking  at  the  Distance and  the  Elapsed  Time  columns  here, and  I  can  see  that there  is  some  huge  variation in  the  length  of  these  flights. This  flight  is  327  minutes  long. That's  five  and  a  half  hours of flight,   which  makes  sense;  it's  2,200  miles. Whereas  this  flight is  a  little  bit  less  than  an  hour. Some  of  them,  if  I  keep  scrolling, aren't  significantly  less  than  that. Let's  say  that  I'd  like  to  exclude   shorter flights  from  my  analysis, under  the  idea  that I   only  want  to  look at  substantial  flights. Maybe  if  your  flight is  only  half  an  hour long then  small  delays and  getting  a  runway  position change  things  more  than  in  a  large  flight, where  you  have  a  chance  to  make  up  time. What  I'd  like  to  do  is  exclude from  these reports all  flights  that  were  less  than say  an  hour  and  a  half  worth  of  length. To  do  that,  I  will  go  to  Rows, Row  Selection, Select  Where. I'm  going  to  select  Distance and  set  distance  is... actually,  Elapsed  Time. You  can  do  with  mile, let's  do  it  with  minutes. Any  flight  where  e lapsed  time is  less  than  90  minutes, I  am  going  to  select  this in  the  data  table. I  can  now  see  that there  were  9,338  such  flights out  of  29,000  total  flights, so  a significant  number  of  them. To  exclude  them  from  the  analysis, I  can  go  up  here  to  Rows, select H ide  and  Exclude, and  all  of  these  are  now  hidden. You  can  see  that  the  data changed  a  little  bit. It  didn't  change  a  lot,  but  it  did  change. There  is  a  difference  in  longer  flights versus  shorter  flights in  what  the  mean  delays  turn  out  to  be. Having  done  that, I'll  save  the  project again so  that  I  can  save  my  progress and  come  back  to  this  point  later. 
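The exclusion step can also be scripted. The sketch below assumes the elapsed-time column is literally named Elapsed Time; substitute the actual header if the sample table differs.

```jsl
// Hide and exclude flights shorter than 90 minutes, as done interactively above
dt = Open( "$SAMPLE_DATA/Airline Delays.jmp" );
Current Data Table( dt );

For Each Row(
	If( :Elapsed Time < 90,
		Hidden( Row State() ) = 1;     // removed from graphs
		Excluded( Row State() ) = 1;   // removed from analyses and summary statistics
	)
);
```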
Before  I  do  that,  notice  that  I  modified the Airline delay data table to  hide  and  exclude  all  of  these  rows. I  would  like  when  I  resume  this  work for  those  modifications   to restore  with  the  project. But  what  I  don't  want  to  do is  overwrite the copy that  is  in  my  sample  data  folder because  I  would  keep  these pristine  and  fresh the  way  they  sit  with  JMP  for  future  use. What  I'm  going  to  do  is  I'm  going to  save  a  copy  of  this  data  table. I'll  Save  As. But  because  I'm  in  a  project, I  have  the  option  to  save  it to  a  place  called  the  Project  Contents, which  is  about  what  it  sounds  like. It  is  how  I  can  save  this  table to be contained inside  the  project  file  itself. The  project  is  essentially a  miniature  file  system that  can  contain  files  and  folders relevant  to  your  JMP  analysis that  live  inside  the  project  file. If  I  hit  Save  here, we  can  now  see  that  Airline  Delays, a  copy  of  it is  saved  inside  of  this  project. If  I  go  back  to  my  Desktop... I  got  to  save  the  project  first, save the  project, then  go  back  to  my  Desktop. We  can  see  that  when  I  save  the  project, the  size  gets  quite  a  bit  larger   because now  this  file  itself  contains, not  just  two  reports, but  also  the  data  table that  I  use  to  generate  those  reports. Because  this  is  a  self- contained  file, I  can  do  things  like  copy  and  paste to  create  a  backup  copy  of  the  file. Now  my  backup  copy  also  contains its  own  copy  of  airlinedelays.jmp safely  secure  here in  case  I  mess  up  the  other copy  in  my  main  project. Because  this  is  a  single  file, it's  easy  for  me  to  email  this  file to  one  of  my  colleagues, if  they  also  are  a  JMP  user, and  allow  them  to  open  this  project and see the  results of  the  work  that  I  did. However,  if  I  want  an  easy  way to  share  this project with  non- JMP  users, if  I  want  an  easy  way  for  me and  my  colleagues  to  collaborate on  this  work  together, I  can  upload  these  reports to  my  organization's  JMP  Live  Instance, where  my  colleagues  can  see  them. To  do  that, and I  put  these  back  into  Tabs  first, to  publish  these  reports  to  JMP  Live, I'm  going  to  go  File,  Publish, Publish  Reports  to  JMP  Live. This  loads  the  JMP  Live  Publish page  from  my  organization's J MP  Live  Instance. I  want  to  publish  both  of  these  reports, and  I  want  to  publish  them  to  a  space called  Discovery  Americas  2022, and  a  folder called  Staying  Organized  on  JMP  Live and  in  JMP Projects, title  of  this  presentation, where  we'll  explain  shortly what  a  space  is and  how  full  of  JMP  Live  work. But  for  now,  this  is  where I  want  to  put  this  stuff. Let's  go  ahead  and  hit  Next. This last  string  gives  me  a  chance to  customize  the  titles  of  these  reports. These  are  generic. Let's  rewrite  this  to  be Airline  Delays  by  Weekday  and  Airline, or  say,  Day  of  Week, to be  less  ambiguous. Down  here,  let's  call  this  one Airline  Delays  by  Month  and  Day  of  Month. Now  I  have  two  reports  ready  to  go. I  hit  Publish. JMP  is  going  to  upload  these  reports and  the  data  that  I  use  to  create  them to  our  JMP  Live  Instance. Now  we  see  Success  page. It's  already  finished. 
Showing  me  that  I  published  two  reports and  one  data  table to  a  folder  called  Staying  Organized on  JMP  Live  and  in   JMP Projects. I  can  click  on  this  link to  actually  load  it  in  JMP  Live and  see  that  it  is  there, largely  the  same  as  it  was  on  my  system. To  show  off  JMP  Live  and  demonstrate the  value  and  able  to  collaborate  and  work with  reports  in  this  way, and  pass  it  to  Aurora. Yeah. Thank  you,  Aaron. All  right,  so  I'm  browsing  around on  the  homepage of  our  organization's  JMP  Live  site, and  I  see  that  Aaron  has  published some  new  reports that  look  pretty  interesting, having  to  do  with  airline  delays. I  see  that  he  put  both  of  these in  the  same  folder. Let's  take  a  look  at  that  folder. One  of  the  easiest  ways  to  stay organized  when  you're  working  with  JMP  Live is  whenever  you're  publishing  the  reports,  just put them  somewhere  reasonable. Easy  enough. And  Aaron  has  done  that  here. He's  put  his  reports  into  a  folder called  Staying  Organized  on  JMP  Live. Of  course,  that's  the  title of  this  talk  that  we're  giving. But  if  you  recall, even  before  he  chose  a  folder, he  was  asked  to  choose  a  space to  publish his content to, and  he  chose the  Discovery  Americas  2022  space. This  is  a  place  for  Aaron  and  I  and  a  few of  our  other  colleagues  in  JMP  Live to  work  on  content  related to  this  Discovery  conference. It  contains  interesting  reports not only  in  the  Staying  Organized on  JMP  Live  talk, but  also  we've  got  another  talk in  this  conference that  takes  a  deep  dive into  publishing, another  one  about  automatically refreshing  your  data,  and  so  on. It  makes  sense  that  we  would all be  working  in  the  same  space. But  what  is  a  space? Well,  I  like  to really  call  them collaboration spaces, because  that's  really  what  they  are. They're  just  a  place for  multiple  JMP  users to  work  together  on  the  same  content. To  show  you  more  about  what  I  mean, I  will  switch  over  to  a  browser, where I'm logged  in  as  an  administrator. As  an  admin,  I  have  access to  this  Permissions  tab. When  I  click  on  this  tab, I  can  easily  turn  on  and  off collaboration permissions for  individual  users and  for  groups  of  users. We  can  see  here  that  in  this  space, all  of  the  users  in  my  organization have  permission  to  view  the  content in  this  space  and  to  download  it. But  Aaron  and  I, we  have  some  extra  permissions, so  we  have  the  permission  to  create new  content  in  the  space, in  other  words,  to  publish, like  Aaron  just  did  a  moment  ago. We  also  have  permission to  edit  content  and  so  on. We  are  fairly  well  trusted  members of  the  space. Let  me  switch  back to  my  normal  browser  now. Of  course,  Discovery  Americas  2022 isn't  the  only  space that  my  organization has set up, and  I'd  like  to  show  you how  to  find  additional  spaces. But  before  I  do,  I  know  I'm  going  to want to  find  this  folder  again, so  I'm  going  to  bookmark  it to  make  it  really  easy for  myself  later  on. Now  if  I  go  up  to  this  blue  navigation  bar and  click  on  the  word  Spaces, it  opens  up  the  Space  Directory, and  we  can  see  here  that  I  have access  to  some  other  spaces  as  well. Discovery  Americas  2022   is the  one  we  are  just  looking  at. 
We  also  have  one for  Discovery  Europe 2023, a  conference  coming  up  in  the  spring. I  see  that  there's  also  a  space  here with  my  name  on  it. That  is  my  own  personal  space. In  JMP  Live  version  17, every  user  gets  their  own  personal  space to  do  with  whatever  they  want. There's  also  a  shortcut to  your  personal  space. If  you  go  all  the  way  to  the  top and  all the way to the right and  click  on  your  profile  picture, you'll  see  this  shortcut My  Personal  Space, My space  doesn't  really   have that much in  it, but  what  it  does  have is  this P ermissions  tab, even  though  I'm  not  an  admin. The  reason  being this  is  my  own  personal space, so  I  should  get  a  say   on who has access  to  it. Of  course,  by  default, I'm  the  only  one  with  access  to  it, but  I  can  invite  more  people  in  if  I  want. I  have  chosen  to  let  Michael  Goff  in to  see  the  content  in  my  space, although,  I  don't  really let  him  do  much  else. Now  that  we've  had that  brief  tour of spaces, let's  go  back  to  the  folder we  were  working  in. I'm  going  to  use  the  bookmark  I  made to  get  there  quickly. All  right,  here  we  are. I  can  see  these reports  that  Aaron  has  published, but  I'm  thinking  ahead, and  I  think  we're  going  to  want   a lot more content  in  here  in  the  future, maybe  some  content  that  doesn't  have anything  to  do  with  airlines. To  stay  organized, I'm  going  to  create  a  new  folder by  going  up  here and  finding  the  New  Folder  icon, click  that,  and  let's  say  airlines. Now  that  I  think  about  it, I  actually  have  some  airlines, at  least  one  airline  report that  I  want  to  publish  as  well. But  whereas  Aaron's  reports are  entirely  related  to  airline  delays, my  report  has  nothing  to  do  with  that. It's  more  to  do  with  the  flow  of  traffic of  airplanes  over  the  continental  US. I'm  going  to  add  another  layer of  organization  in  here  under  Airlines. I'm  going  to  create  a  folder called  Delays  for  Aaron's  stuff and  a  folder  called  Traffic  Flow for  my  stuff. Now,  I  just  want  to  move Aaron's  content  into  the  right  place. The  easiest  way  for  me  to  do  that   is to  click  over  to  the  Files  tab, and  I  will  select  all  of  Aaron's  files, that  being  the  two  reports that  he published and  the  data that  those  reports  rely  upon. I'll  come  over  here  to  the  upper  right and  select  Move  Posts, and  I'll  find  that  Delays  folder I  just  created  a  second  ago, and  move  all  of  Aaron's  content  in  there. Now  we've  got  Airlines  with  two  folders: Delays  that's  got  Aaron's  stuff, and  Traffic  Flow  that's  got  nothing  in  it, because  I'm  just  about to  publish  something  to  it  right  now. Let  me  switch  over  to  JMP  on  my  machine. I  have  here  a  bubble  plot with  a  local  data  filter. This  shows  the  flow  of  flights that  are  taking  place over  the  continental  US. It  also  has  a  local  data  filter. I  can  filter  this to  just  show  certain  airlines. I've  chosen  Delta  and  Southwest. We  can  see  here  that  Delta has  a  hub  in  Atlanta,  Georgia, and  we  can  see,  rather  unsurprisingly, that  Southwest  Airlines concentrates   its flight  patterns in  the  Southwest  region of  the  United  States. Let's  publish  this  to  JMP  live. File. 
It  works  just  the  same  as  when  Aaron was  publishing  from  his  project, even  though  I'm  publishing outside  of  a  project. File, P ublish, Publish  Reports  to  JMP  Live. The  first  thing  you  do  is  choose   among those  reports  that  you  have  open which  ones  do  you  want  to  publish. It's  a  really  easy  decision  for  me   because I  only  have  one  report  open. Next. Now  I  need  to  choose,  of  course, where  to  put  it. I'm  going  to  stay in  the  Discovery   Americas  2022  space. I'm  going  to  stay in  the  Staying  Organized  folder. But  under  that, I  want  to  drill  down  a  little  bit, go  inside  Airlines and  inside T raffic  Flow, and  that's  where  I  want   my reports to  be  published. I'll  click  Next. Just  publish  that. We  can  see  here  on  the  results  screen that  we  have  published to  the  Traffic  Flow  folder one  new  report  as  well  as  the  data that  the  report  relies  upon. It's  this  data  that  allows  the  report to  remain interactive once  it  goes  on  JMP  Live. Let  me  follow  the  link  here, and  this  will  open  up my  organization's  JMP  Live  site and  take  me  right to  this  newly  published report and  we  can  see that  it  is  still  interactive. I  can  speed  it  up, slow  it  down, maybe  I  want  to  find  out what's  going  on  with  Express  Jet, a  much  smaller  airline. You  can  see the  interactivity  is  still  here I  want  to  let  Aaron  know  that  I've  done a  little  bit  of  reorganization so  that  he  can  see  what  he  thinks  of  it. I'm  going  to  move  back  up our  folder  hierarchy  a  little  bit. My  report  is  in  the folder  Traffic  Flow, of  course,  so  I'll  move  up  there, then  I'll  move  up  one  more to  this  Airlines. I  want  to  let  Aaron  know  what's  going  on . Let  me  actually  make  a  comment   on one of  his  reports. That's  going  to  make  sure that he gets a notification about it. Just  open  one  of  his  reports and  click  on  Comments  here, and  I'll  just  let  him  know. "Aaron,  I  did  a  bit  of  reorganization. Let  me  know  what  you  think." Let's  see  what  Aaron  thinks  about it. Thanks, A urora. If  I want  my  JMP Live Instance, I'm  going  to  see a  pop  up  here in  the  upper r ight- hand  corner just  to  say  I'm  logged  in  on  my  computer  to the same  JMP  Live  Instance, a  little  alert. When  I  click  on  this, I  can  see  that  Aurora  Tiffany- Davis added  a  new  comment to  a  report  that  I  uploaded. I  can  click  here  to  go  to  the  report, and  then  view  the  comment that  Aurora  made. "I  did  a  bit  of  reorganization. Let  me  know  what  you  think." I'm  just  going  to  say, "This is great, thanks." I  appreciate  her  helping  me  out  with  this. I  can  now  go  in  and  take  a  look   at the report  that  she  added I  said  it  was  great  before  I  saw  it because  we're  recording  a  video, I  got  a  sneak  preview. I  suppose  in  real  life, I  want  to  see  it  first, so  I  can  know  if  I'm  saying this  is  great  or  this  is  crap, depending  on  what  I  think of   Aurora's  work, but it is great. It's  uploaded. It's  airline  flights going  across  the  country . Depending  on  whether  the  data  table that  she  used  for  this  has  dates  in  it, I  might  actually  be  able  to  use  it to  answer  the  question  I  had  earlier, which  is,  are  the  delays correlated  with  volume? 
Even  if  it  doesn't, it's  data  that  I  would  like  to  add to  the  project  that  I  created with  the  airline  delays. What  I'd  like  to  do  then is  create  a  new project that  contains  the  delay  reports that I made, plus  the  traffic  flow  reports that  Aurora  made. On  JMP  Live, I  can  do  this  automatically. I  just  go  to  the  Airlines  folder. I  go  up  here  to  the  Menu  bar. I  hit  Download  as   JMP Project, and  JMP  Live is  going  to  create  a  project  for  me with  this  information. Let's put  that  on  the  Desktop, and  let's  call  it  Airlines  Updated. When  I  open  this  project  in  JMP, I'm  going  to  see  this  file, the  project  manifest that JMP  Live  has had  to  tell  me  everything I put  in  the  project, and  in  the  case  that it  went  wrong, what  it  couldn't  put  in  the  project that's  empty  today,  which  means  everything that  should  have  been  there  was. I  can  see  the  list  of  reports  included. This  is  one  that  I  made. If  I  click  on  that, it  will  open  the  project. I  can  also  get  them  down  here because  all of these reports and the data is  saved  inside  the  Project  Contents. In  fact,  it's saved  in  the  exact same  folder structure that  Aurora  organized  it  into on  JMP  Live, which  is  useful  for  me because   now it's  in  two  neat little  subfolders. I  can  open  the  air  traffic  report that  she  made  and  return  to  here. I  can  swap  out  the  airlines  interactive, just  like  it  was  before. I  can  add  all  of  this  stuff to  the  work  that  I  did. I  have  essentially  round  trips  to  Data. It  started  on  my  machine   when I  made  my  first  two  reports. I  upload  it  to  my  organization's JMP  Live i nstance. A colleague, Aurora, was  able  to  see  the  report that  I created, add  to  them  herself, reorganize  the  structure of  my end  or  hers, and  then  I  was  able  to,  in  one  step, download  the  resulting  folder as  a  JMP  project that  I  can  then  continue  to  work  with, analyze,  explore,  and  discover. Pass  it  back  to  Aurora  to  finish  up. Yeah. Thank  you,  Aaron. I  hope  that  the  features that  we  showed  you  today can  help  you  to  stay  organized. Mostly,  I  hope  that  you and  your  colleagues are  creating  so  much  content  in JMP that  staying  organized becomes  absolutely  crucial  for  you. There  are  actually  several  other  JMP  Live focused  talks  during  this  conference, so  if  you're  interested  in  JMP  Live, we  encourage  you  to  check  those  out. Either  way,  thank  you  so  much for  joining  us  today, and  we  hope  you  have  a  fantastic rest  of  your  conference. Bye  now.
This presentation explores the use of JMP Pro to analyze gene expression of single-cell RNA sequencing data from murine melanoma samples. Single-cell RNA sequencing is a next-generation sequencing technology that reveals the heterogeneity between individual cells and permits comparison of the transcriptores of different cell types. While there are many statistical tools to analyze scRNA-seq data, JMP provides many streamlined methods for initial analysis and visualization of data. The objectives for this study were to determine cell subtypes from a sample of T cells extracted from cancer-infiltrated lymph nodes and adjacent tissue and find differentially expressed genes. PCA and clustering methods were used for preliminary exploration of cell heterogeneity. Predictive modeling was used to determine if cells can be accurately classified as cancerous by gene expression profiles. Next, T cells taken from different time points were analyzed to study the trajectory of gene expression associated with functional changes in these cells. Significant gene signatures were explored with pathway enrichment analysis and other downstream tools.       Hello,  everyone. Today  I'll  be  presenting about  the  application  of   JMP to  analyze  gene  expression single- cell  RNA  data  in  melanoma  cells. My  name  is  Catherine  Zhou, and  I'm  a  high  school  student at  Lynbrook  High  School. As  introduction, melanoma  develops  in the  cells or  melanocytes  that  produce  melanin and  is  the  most  aggressive  type of  skin  cancer, representing  65%  of  all  deaths from  skin  cancer. T  cells  are  a  type  of  white  blood  cells that  develop  from  stem  cells in  the  bone  marrow and  mature  in  the  thymus, where  they  multiply  and  differentiate  into helper,  regulatory,  or  cytotoxic  T  cells or  become  memory  T  cells. Cytotoxic  T  cells, which  are  activated  by  various  cytokines, bind  to  and  kill infected  cells  and  cancer  cells. In  melanoma  patients,  T  cells  mount an  immune  response  against  the  tumor, but  at  some  point, the  responder  T   cells   become  ineffective due  to  a  local  immunosuppressive  process occurring  at  the  tumor  sites. We'd  like  to  identify  the  causes behind  T  cell  dysfunction. Alternative  splicing is  a  regulatory  process essential  to  generate transcriptome  diversity. Misregulation  contributes to  disease  and  cancer. Eukaryotic  genes  are  composed of  int ronic  and  exonic  sequences, as  you  can  see  in  this  diagram. In  the  alternative  splicing  process, non-coding  introns  of  a  gene are  selectively  removed and  the  included  exons are  combined  in  the  final  messenger  RNA, and translation  of  these  different isoforms,  where  different  combinations result  in  different  proteins and  different  cellular  functions. Next,  traditional  bulk  sequencing examines  the  genome  of  a  cell  population such  as  the  cell  culture  or  tissue, and  its  output  is  the  average  gene expression  of  the  cell  population. On  the  other  hand,  single  cell  sequencing measures  the  genomes of  individual  cells  from  the  population. Single-cell  RNA  sequencing, or  scRNA-seq, measures  the  transcriptomes of  each  cell  in  the  sample, which  reveals  the  heterogeneity of  thousands  of  cells and  provides  insight  into cellular  differences  at  high  resolution. 
In  this  study,  I'll  be  analyzing alternative  splicing in  scRNA-seq data, which  has  rarely  been  studied due  to  several  issues. ScRNA-seq  results  in  a  very  large number  of  cells  and  sparse  data. However,  with  the  proper  statistical  tools like  JMP  and  R, we  can  take  steps  to  solving  these  issues. In  this  presentation, I'll  be  going  over  my  method  to  find the  most  significant  alternative  spicing or  AS  events  differentiating  cancerous T  cells  from  healthy  lymph  node  T cells. I'll  explain  my  data  processing  pipeline, and  then  I'll  go  over  how  to  use  JMP  Pro for  predictive  modeling, clustering,  and  visualization. And  then  I'll  go  over  the  results of  the  analysis  of   scRNA-seq   dataset of  T cells  in murine  melanoma. This  is  a  diagram  of  the  presentation. First,  I'd  like  to  explain  my   dataset preparation  and  processing  process. First,  I  prepared and  processed  the   dataset. I  took  the   scRNA-seq  dataset  of  cells in  lymph nodes  with  melanoma  in  mice from  this  paper  over  here, and  then  performed  read  alignment of  the  F ASTQ  files  with  STAR, and  detected  AS  events with  the  pipeline  derived  from  rMATS, which  uses  a  generalized linear  mixed  model. Instead  of  using  the  pairwise  comparison, I  ran  it  on  a  single  sample with  each  individual  cell, effectively  quantifying  AS  in  each  cell. Then  I  used  R  to  quantify exon  skipping  events  with  IJC, which  stands  for  Inclusion  Junction  Count, and  SJC,  which  stands for  Skipping  Junction  Count. As  you  can  see  below  in  this  diagram, an  inclusion  junction  is  detected when  the  read  includes both  the  flanking  sequence, which  is  this  black  sequence  over  here, and  the  exon  sequence, which  is  this  E  sequence  over  here, while  a  skipping  junction  count is  detected when  the  read  only  includes the  two  flanking  sequences and  just  entirely  skips  the  exon. Then  I  create  a  matrix  of  this  data. Next,  I  subsetted  the  T  cells using  cell  labels  previously  defined by  the  authors  of  the   dataset. To  filter  AS  events, I  applied  the  following  rules, and  I  did  this  in  R. There  must  be  at  least ten  reads  per  junction. The  event  has  to  be  detected in  more  than  ten  cells, and  the  event  cannot  have no  variability  across  the  cells. This  moved  around  75%  of  the  exons. This  shows  that  filtering  is  important because  it  removed  exons with  low  coverage and  were  basically  unimportant. This  reduced  the  feature  size, which  will  improve  the  performance of  predictive  modeling  and  analysis further  in  the  future. Then  I  calculate  the  PSI or  percent  spliced  in  value with  this  equation: IJC  over  IJC  plus  SJC. Before  I  go  into  the  next  step of  predictive  modeling, I  wanted  to  explain  the  reasons  I  decided to  apply  predictive  modeling in  JMP for  this  specific  problem. There  have  been  several  studies  that successfully  used  machine  learning  models for  analyzing  RNA-seq  data, which  achieved  high  accuracy. I  hypothesized  that  it  would  also be  accurate  for   scRNA-seq  data. There  are  many  advantages of  using  predictive  modeling  as  well. 
It  can  extract  meaningful  features from  huge   datasets, classify  data  and  predict  outcomes with  supervised  learning, and  recognize  underlying  relationships of  data,  like  in  neural  networks. JMP Pro  also  provides  a  great  interface for  exploring  these  different  models, allowing  you  to  easily apply  sophisticated  algorithms to  large   datasets  quickly  and  accurately. It  also  allows for  model  comparison  and  screening. Finally,  it  generates some  visual  and  interactive  reports that  reduce  the  black  box  effect of  machine  learning  models, which  means  that  you  don't  really  know what  happens  in  the  model, you  just  know  the  output, which  is  not  very  useful. Our   dataset  had  8,044  variables  or  exons, and  1,014  rows  or  cells. It  was  difficult  to  run  analysis with  such  a  large  number  of  columns, so  I  used  a  bootstrap  forest for  variable  selection. A  bootstrap  forest is  very  computationally  efficient and  it  can  operate quickly  over  wide   datasets, and  it  uses  random  sampling with  replacement, causing  it  to  be  very  robust  and  accurate. I  could  have  also  used predictor  screening, which  is  another  function  in  JMP, but  I  found  that  manually  running  it using  a  tuning  design  table was  more  computationally  efficient and  allowed  me  to  tune the  number  of  trees  as  well. But  I  found  that  the  number  of  trees doesn't  make  a  huge  difference. In  general,  the  more  trees  you  use leads  to  better  results, but  their  improvement  decreases as  the  number  of  trees  increases. At  a  certain  point, the  benefit  in  prediction  performance from  learning  more  trees  will  be  lower than  the  cost  in  computation  time. For  the  8,044  exons, I   tuned the  number  of  trees, and  I  found  that  around  50  trees resulted  in  the  highest  accuracy. And  then  I  took  the  top  100  exons from  the  column  contributions, and  then  I  ran  it  again with  the  top  50  exons, and  then  again  with  the  50  exons, comparing  the  accuracy  as  you  go. This  shows  the  JMP  reports for  tuning  trees  and  variable  selection. As  you  can  see,  I  use  the  tuning  design table  and  change  the  number  of  trees. I  looked  at  the  entropy R  square and  these  different  metrics. I  believe  50  trees  was  the  best  result. I  took  the  top  100  exons  and  then I  ran  the   bootstrap  forest  with  30  trees because  I  did  another  tuning  design  table. For  each  iteration, I  did  a  separate  tuning  design  table to  find  the  best  number  of  trees. As  you  can  see, the  entropy  R  square  increases as  the  number  of  exons  was  decreased. As  I  selected the  top  three  different  exons, it  shows  that  it  doesn't  really  matter if  it's   the  top  10  or  top  50  or  top  100, it  will  still  result  in  a  high  accuracy. The  misclassification  rate   varied. The  RMS  error  also  decreased, and  the  average  absolute  error also  decreased. With  these  metrics, I  decided  to  use  the  top  10  exons to  run  the  next  steps  of  my  analysis. I  used  the  Model  Screening function  in  JMP  Pro to  run  the  top  10  exons  on  all the  different  models  that  were  possible. It  resulted  in  a  high  accuracy across  the  board. I  looked  at  the  prediction  profiler to  determine  how  the  exons caused  tumor  or  cancerous  cells to  be  different. 
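To make these steps concrete, here is a condensed JSL sketch covering the PSI calculation, the coverage filter, and the two platform launches. It is an illustration, not the author's script: the quantification and filtering were actually done in R, the column names (IJC, SJC, Cell Label, the exon PSI columns, Validation) are placeholders, and the tuning-design-table step for the number of trees is omitted so the platforms run with their defaults.

```jsl
// Condensed sketch of the analysis steps described above (placeholder column names).
dt = Current Data Table();   // assumed: one row per exon-skipping event per cell

// Percent Spliced In: PSI = IJC / (IJC + SJC)
dt << New Column( "PSI", Numeric, Continuous,
	Formula( :IJC / (:IJC + :SJC) )
);

// One of the filtering rules, read here as total junction coverage of at least 10 reads
// (the talk applied the filters in R before bringing the matrix into JMP)
For Each Row(
	If( :IJC + :SJC < 10, Excluded( Row State() ) = 1 )
);

// In practice the per-cell PSI values are then reshaped so that each exon event
// becomes its own column before modeling (cells as rows, exons as columns).

// Bootstrap Forest for variable selection; keep the top exons by column contribution
dt << Bootstrap Forest(
	Y( :Cell Label ),                  // hypothetical tumor vs. normal label
	X( :Exon PSI 1, :Exon PSI 2 ),     // hypothetical PSI columns (8,044 in the study)
	Validation( :Validation )          // column made with the Make Validation utility
);

// Model Screening on the reduced predictor set (JMP Pro)
dt << Model Screening(
	Y( :Cell Label ),
	X( :Exon PSI 1, :Exon PSI 2 ),
	Validation( :Validation )
);
```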
Now, I'll  be  performing a  demo  of  this  process. Let  me  open  the  top  10  exons. In JMP  Pro,  you  can  go  to  Analyze, Predictive  Modeling, and  then  Model  Screening. I'll just hit  Recall because  I  used  this  before and  it  automatically  updates with  your  previous  settings. And  then  it also  did K-fold  cross v alidation. This  process  is  very  quick,  it  only  takes around  20  seconds  or  10  seconds. All  right. As  you  can  see,  the  neural  boosted had  the  highest  accuracy. Next  was  generalized  regression lasso. It  split  the   dataset into  training  and  validation. You  can  see  that  boosted  tree had  the  highest  accuracy  for  training, which  makes  sense  because  it basically  fits  the  data  really  well when  you  use  a  tree. But  for  validation,  let's  see, the  most  accurate  was  in  neural  boosted. With  this  information, I  decided  to  use  a  neural  boosted to  do  the  final  classification  of  cells for  tumor  cells  or  normal  cells, because  it  would  result in  the  highest  accuracy. I  clicked  Run  Selected, and  then  I  used  the  validation  column that  I  previously  created with  the  Make V alidation  function in  the  Predictive  Modeling  tab. Here  are  the  results of  the  neural  network. It  was  very  quick,  again. You  can  see  that  there  are  the  ROC  graphs. It  basically  has  a  very  high  AUC. If  you  take  a  look  at the  prediction  profiling, the profiler, this  shows  the  different  ways... Let  me  explain  this  better. As  you  move  your  cursor  along, if  you  look  at  A POBR,  this  gene, when  the  inclusion  of  this  exon  increases, then  there's  a  higher  probability that  the  cell  will  be  a  tumor  cell. Over  here,   for  this, this is  also  the  same  trend. For  this  one,  this  is  the  inverse  trend. For   PCDH9,  as  the  inclusion of  this  exon  increases, the  probability  that  it  will  result in  a  tumor  cell  decreases. Sorry. This  allows  you  to  look at  the  role  of  these  exons in  causing  the  function of  tumor  and   normal   cells  to  be  different. This  is  a  very  powerful  visualizer, and  it  can  show  the  different relationships  between  these  cells. You  can  also  just  save the  formulas  to  the... Sorry,  one  second. You  do  Publish  Prediction  Formula, and  that  saves  it  in  the  formula  depot. Okay. Next,  I'd  like  to  look at  the  difference  in  expression as  it  goes  from  5  days to  8  days  to  11  days. This   dataset  also  provided the  cells  at  different  time  points. If  you  look  at  here, this  column  shows  that  the  different times  are  available  for  us  to  analyze. I  subsetted  the  tumor  cells. Let  me  find  that  over  here. Tumor  only. I  compared  the  distribution of  the  different  exons. Let  me  show  you  the  graph. As  you  can  see, there  are  several  exons that  increase  in  expression  over  time. Over   here ,  SLAMF9   or  T OMM20 increases  dramatically, and  these  also  increase. There  are  also  exons that  increase  and  then  decrease, which  show  that  they  have an  optimal  period  of  time that  they're  the  most  active. I'll  be  going  over  these  specific  genes and  functions  later  in  my  presentation. Let  me  go  back  to  my  presentation. All  right. Here,  I  also  ran  clustering on  the  top  10  exons  dataset. 
I  did  K- means  clustering, and  this  more  accurately  classified the  tumor  versus  normal  cells. As  you  can  see  here, the  blue  is  normal  cell and  then  the  red  is  the  tumor  cell. This  shows  that  there  are  possibly different  subsets  of  tumor  cells that  have  different  functions. I  also  ran  UMAP  on  all  the  cells. UMAP  is  another dimensional  reduction  formula. This  was  ran  with  a  JMP  add- in that  was  created  by  this  author. I  will  link  it  in  my  presentation in  the  community. Here  is  the  gene  enrichment analysis  that  I  performed. As  you  can  see,  the  top  function of  all  of  these  genes was  RNA  import into   the  mitochondrion. The  second  most  important  one  was regulation  of  leukocyte  degranulation. T cells  are  a  type  of  leukocyte. It  can  show  that  possibly  these  leukocytes or  T  cells  are  degranulated, and  that  causes  the  cells to  become  tumors. Also,  malignant  tumors  selectively  retain mitochondrial  genome  and  ETC  function. That  can  also  explain why  RNA  import into  the  mitochondrion is  an  important  function. This  is  another  graph of  the  top  GO  biological  processes. Over  here, the  top  one  is  cellular  process. There  are  also  biological  adhesion, regulation,  and  immune  system  processes that  can  be  further  explored. I'd  like  to  explain the  most  notable  genes. These  genes  are  basically  the  top  genes that  were  in   the   column  contributions. S TPBX2  is  involved in  intracellular  trafficking, control  of  SNARE  complex  assembly, and  the  release  of  cytotoxic  granules by  natural  killer  cells. This  can  show  that  the  T  cells are  involved  in  some  sort  of  trafficking. Or  the  cytotoxic  T  cells  can  use  this  exon to  regulate  the  tumor versus  normal  functions. SLAMF9  nine  encodes  a  member of  the  signaling   lymphocytic  activation molecule  family  and  its  transmembrane. WARS is  a  tryptophanyl- tRNA  synthetase that  catalyzes  the  aminoacylation of  tRNA  with  tryptophan and  is  induced  by  interferons. All  of  these  genes  have  a  role in  the  immune  system  and  T  cell  function. We  can  study  these  further and  determine  if  they  actually  have a  important  function  with  in vitro  tests. Next,  I  explained  this  earlier, but  I  analyzed the  tumor  gene  expression  over  time and  T OMM20,  which  was  the  one that  had  the  most  dramatic  inclusion, is  actually  implicated  in  the  translocase of  outer  mitochondrial  membrane  complex that  facilitates  cancer  aggressiveness and  therapeutic  resistance in   chondrosarcoma, which  is  a  type  of  cancer. You  can  see  that  TOMM 20  causes cancers  to  be  more  aggressive. It's  really  interesting how  our  basic  analysis  with   JMP Pro resulted  in  this  exon being  the  most  important. This  shows  the  effectiveness of  JMP  in  these  types  of  studies. In  conclusion,  the  exploration of  this   dataset  facilitated  by  JMP demonstrates  the  role  of  genes and  exons  in  T  cells  in  melanoma. A  good  thing  is  that  the  code can  be  replicated  in  Python. JMP  allows  that for  more  robust  or  detailed  analysis. These  potential  genes like  TOMM 20  or   STPX9 can  aid  in  the discovery of  novel  and  personalized  approaches to  cancer  treatment. And  then  we  can  perform in  vitro  testing  on  the  top  genes. Thank  you  for  listening to  my  presentation. 
I will provide all the code and R scripts in my presentation link.
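As a rough illustration of the kind of outside-JMP replication mentioned above, here is a hedged Python sketch of K-means clustering plus a UMAP embedding on a cell-by-exon matrix. It assumes the umap-learn package (the JMP add-in used in the demo is a separate tool), and the data are random placeholders.

```python
# Hypothetical sketch: K-means clusters visualized in a UMAP embedding.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
import umap  # pip install umap-learn

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                     # placeholder: cells x exon features
Xs = StandardScaler().fit_transform(X)

clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Xs)
embedding = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=0).fit_transform(Xs)

plt.scatter(embedding[:, 0], embedding[:, 1], c=clusters, s=8, cmap="viridis")
plt.xlabel("UMAP 1"); plt.ylabel("UMAP 2")
plt.title("K-means clusters in UMAP space")
plt.show()
```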
Through JMP 16 outlier and quantile box plots (distribution), together with quantile range outlier and robust fit outlier detection (screening), we present comprehensive strategies to powerfully separate signal from noise in the presence of univariate response(s). We also propose that through practical analysis with the box plot, we can connect the Gauge R&R noise impact with the location of the points most adjacent to the upper and lower fences. We use Monte Carlo sampling (random() function and instant column formulas) to produce multiple distribution types (normal, uniform, peaked, bimodal) to validate the impact on the box plot and histogram together to detect normality violation failure modes.    We demonstrate that the box plot is a powerful visualization tool to judge the data distribution, uniquely able to separate skewness from outliers. Graph Builder, one-way, GoF, and nonparametric hypothesis testing show that – since the box plot is very weak at detecting bimodality or kurtosis, or at supporting hypothesis test decisions (it misses the sample size effect) – both the histogram and box plot are needed to visualize normality. Together with descriptive statistics, the most powerful discrimination between different candidate distributions is presented. Finally, we synthesize and demonstrate our learning experience by formulating 17 thought-provoking quiz questions and answers to maximize the utility of the box plot for data-driven problem solving.     Well, thank you everyone for joining me. This is a Discovery Summit 2022 presentation, courtesy of my co-presenters, Charles Chen and Mason Chen. My name is Patrick Giuliano. The title of this talk is Box Plot Analysis: Blending Scientific and Artistic Enquiry in Univariate Response Characterization. Here's the abstract. You can find this on the JMP User Community, on the US Discovery 2022 community page; I'm putting it here for reference, and I will provide a link in the slides to the community page where the project will live. What's the motivation for this project? The box plot is one of the most popular graphical tools for visualizing a univariate distribution of data, and this project studies how to use the box plot to analyze data effectively. Most people who use the box plot don't necessarily use it to determine the shape of the distribution of the data. In fact, many people use it wrongly to draw mean-comparison decisions, and they may assume normality based on symmetry when, in fact, the normality assumption would not be reasonable if they were to take a closer look at the shape of the data on a histogram, for example. The objective of this project is to demonstrate how to use JMP, specifically JMP 16, to interpret the information in a box plot and to improve proficiency in a global community of scientists and engineers working under a DMAIC, APS, or Lean-type Six Sigma methodology, which has obviously been very popular over the last few decades. The interesting thing about this project is that we framed it in the context of 17 quiz questions; this is a question-and-answer slide deck. I'm not going to go into too much detail about each and every question, which I will show you here.
But what I'd like to do is show you a little bit about how you can use JMP to explore the answers to these questions, because I think that's really the most interesting and fun part. The first thing I want to do is quickly go over what a box plot is. What is the anatomy of a box plot? As a refresher for some of you, or an introduction for others: the median is indicated by the midline, and it's referred to as the second quartile, Q2, or the 50th percentile. Q1 is the 25th percentile and Q3 is the 75th percentile, as you can see here. The interquartile range, or IQR, is the difference between Q3 and Q1. The other important elements are the whiskers on the lower and upper sides. At the end of each whisker, sometimes referred to as the fence, JMP draws a vertical line to indicate that edge. What's important is that the lower fence sits at Q1 minus 1.5 times the IQR, where the IQR is represented by the distance between this edge and this edge, and the upper fence sits at Q3 plus 1.5 times the IQR. Any points beyond these fences are considered potential outliers, and they show up individually as points, whereas the rest of the data in the middle of the distribution is not drawn, to put the emphasis on the points beyond the fences. I'm going to jump right in. How did we explore and develop the answers to these questions, and in some cases even refine the questions themselves? We created a simulated data table in JMP 16 with 100 rows, constructing data first from a normal distribution and then applying transformations to it. We have normally distributed data drawn from a population with a mean of zero and a standard deviation of one; uniformly distributed data; data that is peaked, i.e., has a positive kurtosis; data that is right-skewed; data with two modes; data with some outliers, about 3% on average; and integers. In all cases except the bimodal one, we based the simulation formula on the original normal column. To put all the data on the same scale, we used the column standardize function so that we could compare the columns relative to each other in the Distribution platform. This is just a preview of that; I'll jump over to JMP and show you. Again, all of this data is centered at a mean of approximately zero with a standard deviation of approximately one. We covered the first question: why is a box plot sometimes referred to as a five-point plot? Well, there are five main points: Q1, Q2 in the middle, Q3, and then the upper and lower whiskers. Next question: what are the two ways that the box plot can show whether the distribution is skewed? We can look at the width of the box itself, and we can also look at the lengths of the whiskers.
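As a quick aside before the skewed example, here is a minimal Python sketch of the fence arithmetic just described; note that NumPy's default quantile interpolation may differ slightly from JMP's quantile method, so the numbers are illustrative.

```python
# Minimal sketch of box plot arithmetic: quartiles, IQR, fences at
# Q1 - 1.5*IQR and Q3 + 1.5*IQR, and the points flagged beyond them.
import numpy as np

def box_plot_summary(x, k=1.5):
    x = np.asarray(x, dtype=float)
    q1, q2, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = x[(x < lower_fence) | (x > upper_fence)]
    return {"Q1": q1, "median": q2, "Q3": q3, "IQR": iqr,
            "lower fence": lower_fence, "upper fence": upper_fence,
            "potential outliers": outliers}

print(box_plot_summary(np.random.default_rng(0).normal(size=100)))
```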
In this right-skewed example, you can see that the upper whisker is much longer than the lower one, which implies that the data is right-skewed; in other words, the tail of the distribution points to the right. Third, why does the box plot include the median and not the mean? A box plot uses the median to gauge skewness. If the distribution is normal, then the mean is equal to the median, and what you would see here is that the median line would line up exactly with the middle of this diamond. In that case you effectively don't lose any information, because the distribution is symmetric. The median, then, might be considered the better choice in general, regardless of whether the distribution is normal or non-normal. Fourth, why is the box plot one of the most powerful visualization tools for separating skewness and outlier problems? Because the box plot uses the Q1 minus 1.5 times IQR and Q3 plus 1.5 times IQR methodology, it really allows us to separate potential outliers from the main body of the data. It also gives us a framework for judging whether the upper whisker is longer or shorter than the lower whisker. Those two components of the plot help us discern skewness from potential outlyingness, and that's a unique feature of the box plot. The fifth question is a little more interesting. What's the relationship between the interquartile range, that distance between Q1 and Q3, and the standard deviation, which we can calculate for any dataset regardless of how it's distributed? What if the data is normal, and what if it's skewed, non-normal, peaked, or any other shape? Based on theory, the ratio of the IQR to the standard deviation is 1.35 for normal data. What would that ratio look like if the data weren't normal? We can explore that in JMP, and I'm going to show you really quickly. Here's the dataset; I'm also going to post it on the community. The first thing I'll do is show you how I get to a visual state where we can see all the box plots together, without the distributions. This is interesting, but I'm going to start from the beginning with Analyze, Distribution, and show you how I got there. I'm going to select everything, and JMP gives me a histogram and a box plot together; at the end of the presentation we'll summarize why that's important. What I'm going to do here is turn off the histograms and customize the line widths. I can copy this customization over, which is really nice: I hold down the Control key because I'm on a PC, right-click, and then hit Edit, Copy, Paste Customizations, and that brings them all over.
I'm going to hold down the Control key again and resize this so that I can resize them all together. Now I'll minimize the Quantiles section, because I'm going to get the information I need from the Summary Statistics section: I have the IQR and the standard deviation shown here, although I could customize this either here or in the preferences, which I can access under File, Preferences, under the Distribution platform group. What I'm going to do is turn this information into a data table by right-clicking and selecting Make Combined Data Table. Now, I only need the IQR and the standard deviation, so I select one of the standard deviations and one of the IQRs, move my cursor over here, and select Matching to select all of the rows that have these values in them. Then I invert the selection, delete the rows that I don't want, and I'm left with this. Now I'll restructure the data so that I can calculate the ratio of the IQR to the standard deviation; I'll use Tables, Split for that. I split by column 1, put column 2 here, put these in a group, and click OK. Now I have the data how I want it, and it shows me which distribution each statistic came from. I'll do a New Formula Column, Combine, Ratio. There you go. This is a little hard to read, so I'll change it to show only two decimals. I've got numbers which should be very similar to what I have in the slide detailing the ratio of the IQR to the standard deviation. Of course they'll differ somewhat, because there's sampling error; this table is only one sampling experiment. But this is how I can quickly and interactively extract this information and really understand what the ratio looks like if my data is non-normal in a particular way. We can see that the ratios that tend to be lower than the theoretical 1.35 for normal data come from the peaked distribution and the one with outliers, while the ratios that tend to be higher come from the uniform, the right-skewed, and the bimodal distributions. Next question: what's the ideal outlier percentage if the distribution is perfectly normal? It turns out that if we look in the textbooks or run a simulation, on average we should see about 0.7% of the points beyond the fences for a normal distribution. Equivalently, if we were to build a control chart under the assumption of normality, for example an individuals and moving range chart, we would see around 0.7% of the points outside the limits on average. For practical purposes, though, if we saw about 3% or less of the distribution beyond the limits, we could consider it approximately normal.
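As a cross-check outside JMP, here is a hedged simulation of both quantities just discussed, the IQR-to-standard-deviation ratio and the fraction of points beyond the 1.5×IQR fences, for a few illustrative shapes. The distribution parameters are my own stand-ins, not the formulas used in the demo table.

```python
# Large-sample simulation: IQR/SD ratio (about 1.35 for normal data) and the
# percentage of points beyond the 1.5*IQR fences (roughly 0.7% for normal data).
import numpy as np

rng = np.random.default_rng(42)
samples = {
    "normal":  rng.normal(size=100_000),
    "uniform": rng.uniform(-1.73, 1.73, size=100_000),
    "peaked":  rng.standard_t(df=3, size=100_000),
    "skewed":  rng.f(5, 15, size=100_000),
}
for name, x in samples.items():
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    beyond = np.mean((x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr))
    print(f"{name:8s} IQR/SD = {iqr / x.std():.2f}   beyond fences = {100 * beyond:.2f}%")
```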
Why is that question important? Well, when the sample size is small, we can use the proportion of points beyond the fences in a box plot to judge whether we have some evidence of normality on the basis of outliers. If our sample size is too big, though, we're going to see lots and lots of points beyond those fences, so it's really important that we consider a "reasonable" sample size; that's part of the reason why we only used 100 rows in our project. Next question: what's the difference between a quartile range and a quantile range box plot? In a practical context, we can talk about the Explore Outliers utility in JMP 16, which allows us to adjust Q, the multiplier on the IQR, and the tail quantile, which is essentially how the data is divided up; we can customize that range. I'm just going to show you what that looks like really quickly. I go to Analyze, Screening, Explore Outliers, on my raw data, so let me close this and go back to the raw data table. I'll pick a couple of columns, the ones I have in my slides: the peaked one and the one with outliers. I'll use Quantile Range Outliers and adjust the settings to what the box plot uses, a 0.25 tail quantile and Q equal to 1.5, click Rescan, and JMP identifies the potential outliers. How does this connect to the Distribution platform? If we go over here and look, there are a number of outliers. I'll select those rows and go back over; lo and behold, it's these same values. You've got one, two, three, four, five, six, seven: seven outliers here, and seven there. That squares up; it's exactly what we'd expect. Similarly, we've got one, two, three, four here, and if we scroll over to the other column's outliers, there are four over there as well. Great. Going back to the slides, we can customize this, and that's actually what we get into in the subsequent question, Question 10. How do we determine whether outliers are marginal or extreme, and why is that important? We can adjust the sensitivity of the outlier detection through the multiplier Q on the IQR while keeping the tail quantile the same. You might intuitively expect that if you take Q3 plus a larger number times the IQR, it's going to extend the whisker length, and similarly on the lower side, so more points fall inside and fewer outliers are detected. We should be able to see and test that in JMP. If I increase Q to two and click Rescan, we see a few former outliers become part of the box plot, or we can imagine a situation where that's the case. If I increase it to three and hit Rescan, even fewer outliers are identified. And as I go up to Q equal to five, now I only have one outlier detected, in the peaked column of data.
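Here is a minimal Python sketch of that kind of quantile-range rule with an adjustable tail quantile and multiplier Q. It is written in the spirit of those settings, not as JMP's exact implementation, and the heavy-tailed sample is a placeholder.

```python
# Quantile-range outlier rule: a 0.25 tail quantile with Q = 1.5 mirrors the box plot;
# larger Q flags only more extreme points.
import numpy as np

def quantile_range_outliers(x, tail=0.25, Q=1.5):
    x = np.asarray(x, dtype=float)
    low_q, high_q = np.quantile(x, [tail, 1 - tail])
    spread = high_q - low_q
    low_lim, high_lim = low_q - Q * spread, high_q + Q * spread
    return x[(x < low_lim) | (x > high_lim)]

x = np.random.default_rng(7).standard_t(df=3, size=100)   # a peaked, heavy-tailed sample
for Q in (1.5, 2, 3, 5):
    print(f"Q = {Q}: {len(quantile_range_outliers(x, Q=Q))} potential outliers")
```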
So the idea here is that we can develop criteria for Q. For example, we might set Q equal to three for a situation where a point might be considered a typographical error or an extreme, or more extreme, outlier, and we might set Q equal to 1.5 if, for example, we think the potential outlier might be associated with variability due to the measurement system or special-cause process variation. We can do some simulation based on our application and decide what the value of Q should be in these particular scenarios. In connection with that, Question 10 touched a little bit on GRR, or measurement system variability, and Question 8 goes a little deeper into this and brings together some ideas. The idea is that we might consider the distance between the upper fence and the first potential outlier, or series of outliers, and extend that upper fence by a distance of two times the sigma due to the measurement system variability. In this way we're accounting for the variability due to the measurement system, and we're asking ourselves: is this potential outlier within the noise of the measurement system or not? We're creating a graphical, blended means of determining whether the value is reasonable under the expectation that there's measurement system variability. As I have it here, the distance between the marginal outlier and the whisker should be compared to the GRR noise standard deviation; if it's within two standard deviations, we don't have 95% confidence to conclude that the marginal outlier is different from the whisker. This is just a graphical version of a one-sample t-test, in effect: we could construct a one-sample t-test using this red line as our target and the observed value, or rather the assumed series of values around this black dot, as our distribution relative to that target. The next question: how many points do we really need to produce a box plot if we're sample-size limited? Well, we might need at least seven points, and our simulation in this particular sampling experiment shows that. What's happening here? Each of these three datasets has the same median. In this dataset there are six observations, in this one there are seven, and in this one there are eight. Let's start on the left, where we have eight observations, one of them out here around 15. What if we reduce the number of observations to seven, keeping that same observation but removing one of the others? And what if we reduce it further while maintaining the same median? What we see is that the outlier at 15, which is still in the dataset, no longer shows up as an outlier; in essence, it becomes absorbed into the whisker itself. The other interesting thing about this simple experiment is that the IQR becomes inflated when we go from seven observations to six. We can see that visually, in that the width of the box from Q1 to Q3 becomes much wider, and we can also see it numerically here.
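Circling back to the Question 8 idea for a moment, here is a hedged sketch of that comparison in Python: the gap between a marginal outlier and the upper fence versus twice an assumed measurement-system (GRR) standard deviation. The data, the candidate value, and sigma_grr are all hypothetical placeholders.

```python
# Is the gap between a marginal outlier and the upper fence within ~2 standard
# deviations of measurement-system noise? If so, we can't confidently call it distinct.
import numpy as np

def marginal_outlier_check(x, candidate, sigma_grr, k=1.5):
    q1, q3 = np.percentile(x, [25, 75])
    upper_fence = q3 + k * (q3 - q1)
    gap = candidate - upper_fence
    return upper_fence, gap, gap <= 2 * sigma_grr

x = np.random.default_rng(3).normal(loc=10, scale=1, size=50)   # placeholder data
fence, gap, within_noise = marginal_outlier_check(x, candidate=13.2, sigma_grr=0.4)
print(f"upper fence = {fence:.2f}, gap = {gap:.2f}, within 2*sigma_GRR: {within_noise}")
```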
I actually want to show you how we might explore that in JMP. Here's some data; it's not the same data, but it's some data. I just created a column that ranks the data, again using an instant column formula; I can do that by selecting one of these options, I believe under Distributional. Now I'm going to plot this data and turn the histogram on its side. I'll invoke the local data filter and bring in that rank column, making it ordinal first so that I can select observations individually rather than under the assumption of a continuous distribution, and I'll select everything. Now, if I go back to the data table, I know that rank 8 represents the largest value. I'll keep 8 in there and start removing some of the lower values by holding down my Control key and clicking, which effectively removes each point dynamically from the analysis. With the Control key down, I click again, and again. You saw it there: that one outlier, on the low side in this case, just disappeared. There's a relationship between the fences and the points, because the median and the quartiles are calculated from only the data that's in the analysis. This gives you a better means of appreciating how the box plot changes as a function of the data that's in or out of the analysis; it's a really cool feature that I like to use a lot in many contexts. What's the advantage of the Robust Fit Outliers algorithm, available in JMP 16? It gives us another means of detecting outlyingness. We can use a Cauchy method, which often avoids the impact of skewness and can be useful in practical situations, and we can also use a 3-sigma or K-sigma multiplier to help detect outlyingness. All of these methods help us separate potential outliers from real outliers and help us create a reasonable signal-detection methodology, much as we might do if we were to build a control chart with limits for our particular experimental or manufacturing application. Thirteen: can we include the sample size information in the box plot? This is where the box plot starts to present a clear limitation: there isn't any sample size information explicitly in the box plot. We do have the ability in Graph Builder to create a notched box plot, where the notch edges indicate something like a confidence interval on the median, and we also have the ability in Graph Builder to invoke the caption box, which is a very useful feature for summarizing data graphically without needing an additional tabular output. But of course, that information is completely hidden in the box plot itself.
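Since the notch is essentially the only way sample size sneaks into this display, here is a hedged sketch of a notched box plot outside JMP, using matplotlib's notch option; the notch approximates a confidence interval on the median, and the three groups and sample sizes below are made up.

```python
# Notched box plots: the notch approximates a CI on the median, so it narrows
# as the sample size grows.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
groups = [rng.normal(loc=m, scale=1, size=n) for m, n in [(0.0, 20), (0.3, 60), (0.6, 200)]]

fig, ax = plt.subplots()
ax.boxplot(groups, notch=True)
ax.set_xticks([1, 2, 3])
ax.set_xticklabels(["n=20", "n=60", "n=200"])
ax.set_ylabel("response")
ax.set_title("Notches narrow as sample size grows")
plt.show()
```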
Connected to that: can we make any decision with any level of statistical confidence if we're just looking at the box plot? The answer is no. In this particular example, we actually designed the data so that the medians were slightly different on average, so we're getting some separation among the medians between the groups. We used Fit Y by X in this context. What this shows is that the mean [inaudible 00:29:27] represents the mean, and the mean diamonds are non-overlapping, it looks like, across all four groups being compared, which indicates that there's some evidence of a difference in the means between the groups. We can also see the difference in the medians and do a nonparametric test; in this case we're using a nonparametric Steel test with control, where the control is just the Z normal column. We're seeing some evidence of statistical separation among the medians in this particular instance. It's hard to detect that and see it in the box plot; in fact, it really isn't clear at all. How can we tell if we have any concern with respect to kurtosis, and what is kurtosis? Kurtosis is basically the idea that, with positive kurtosis, your data is concentrated, squished together, in the middle of the distribution; that's the example on the right. In the idealized case of extreme negative kurtosis, you'd have a uniform distribution, where the data is really spread out. What you can see in these graphs, relative to the normal distribution, is that the 50% densest zone, indicated by this red bar, is about as long as the distance between Q1 and Q3, but it sits on one side of the median in the uniform case; it's about as long as the box width and also on one side of the median. That's a characteristic feature of the uniform distribution shape. If we look at the peaked situation, the box width is much more compressed, and the shortest half, the densest region, is about the same width as the box and centered on the median. That's similar to what you would see in the normal case, where the 50% densest region is centered on the mean or median and about as wide as the box; the differentiator is that the distances are reduced quite a bit. The real takeaway, though, is that this type of interpretation is difficult, and it would be easier to rely on the shape shown by the histogram than to try to read this from the box plots alone. Question 16 is very similar to 15: what about data that has more than one mode, a bimodal distribution? I took the box plots on the left and pulled them out from the pictures on the right. We can't really see a whole lot of difference among them; it's difficult to interpret. But once we add the histograms, and fit a two-peak distribution, we can see clearly that there are two modes in this data, while the data on the left has essentially one mode, maybe with a small second one. The box plot isn't particularly good at detecting the presence of multiple modes.
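Because the box plot hides multimodality, a quick numeric cross-check can help alongside the histogram. The sketch below is a stand-in I'm adding, not the two-peak fit used in the demo: fit one- and two-component Gaussian mixtures with scikit-learn and compare BIC, where a lower BIC for two components suggests bimodality.

```python
# Compare 1- vs 2-component Gaussian mixtures by BIC (lower is better) on a
# deliberately bimodal placeholder sample.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
bimodal = np.concatenate([rng.normal(-1.5, 0.5, 50),
                          rng.normal(1.5, 0.5, 50)]).reshape(-1, 1)

for k in (1, 2):
    gm = GaussianMixture(n_components=k, random_state=0).fit(bimodal)
    print(f"{k} component(s): BIC = {gm.bic(bimodal):.1f}")
```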
The last question is: how many "normality violation failure modes" can we detect with the box plot? This question brings all the other ones together. If we have skewness, we've shown that we have a strong ability to detect it. If we have potential outliers, we definitely have a strong ability to detect them. If we have kurtosis, which is really a matter of shape, as is the presence of multiple modes, then we don't have a strong ability to detect it. And if we're considering hypothesis testing, we definitely don't have the ability to make those decisions with the box plot either. So what are the takeaways? The box plot is definitely a powerful visualization tool. It's a great introductory tool, and it has a wonderful ability to separate skewness from potential outlyingness. But it has its limitations: in cases where we're looking at a kurtotic shape, or bimodality or multimodality, the histogram is definitely the better choice. That's probably why JMP shows both the histogram and the box plot together in the Distribution platform to visualize how the data is behaving. Of course, adding descriptive statistics helps round out the picture, where we take a graphics-first approach. This is just a summary of what we've discussed. In the last couple of minutes, I want to show you a few more things about the dataset itself, because I think this is perhaps the most useful aspect of the project. How might we set up a data table like this? All we really have to do to simulate data in JMP is create some rows and then create a random normal column formula. Let me just show you one way to do this: you double-click into a column, click Formula to edit the formula, go over to the random functions, click one in, and specify a population mean and sigma, zero and one by default; click OK, and then add a bunch of rows. I'll go ahead and add 100 rows. What about the other distributions? For a uniform distribution, we can use the Random Uniform function and specify minimum and maximum values; in this case I specified the minimum and maximum of the normally distributed data column. And finally, as I mentioned, I standardized each column so that it's on the same numeric scale; this standardize-column step is common to all of these columns. The last thing I want to talk about quickly is: what about the peaked, right-skewed, and even bimodal columns? One of the things we can do, which I really think is cool, is use the distribution calculator in JMP to help us understand what certain distribution types look like. I'm going to go into it here and drill down, and I'll share the location of this script with you. It's under... Calculator... no, under Distribution... the Distribution Calculator, under the calculators, yes.
How might I create a distribution that's right-skewed, and which random function would I use? Well, I can look at some of these distributions and see, for example, that if I specify a random F with these parameters, I get a distribution with this kind of skewness. Then I can ask: what happens if I change these parameters a little bit? How does that change the distribution? I can use that insight to choose the parameters for the random distributions I specify in my data table, and in fact that's what I did here. What did I do for the peaked one? If I look at the t distribution and reduce the degrees of freedom, I get a distribution that's relatively peaked, with positive kurtosis. That's one way I can understand the shape of these distributions so that I can use them to my advantage for different what-if analyses in JMP. I'm just going to quickly go back to my slides. Thank you very much for listening. If you have any questions, I look forward to receiving them on the User Community. As I mentioned, this project will be posted there, and the summary abstract is posted at this link here. Thank you again.
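For anyone who wants to rebuild a simulated table like this outside JMP, here is a hedged Python sketch. The distribution parameters (t with 2 degrees of freedom for the peaked column, an F distribution for the right-skewed one, a two-normal mixture for the bimodal one) are my own illustrative choices, not necessarily the ones used in the slides.

```python
# Simulated table: normal, uniform (spanning the normal column's range), peaked,
# right-skewed, and bimodal columns, each standardized to mean 0 and SD 1.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2022)
n = 100
normal = rng.normal(0, 1, n)
table = pd.DataFrame({
    "normal":  normal,
    "uniform": rng.uniform(normal.min(), normal.max(), n),
    "peaked":  rng.standard_t(df=2, size=n),
    "skewed":  rng.f(6, 20, size=n),
    "bimodal": np.concatenate([rng.normal(-2, 0.7, n // 2), rng.normal(2, 0.7, n // 2)]),
})
standardized = (table - table.mean()) / table.std()   # same idea as JMP's column standardize
print(standardized.describe().round(2))
```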
You don't need an elegant predictive model to tell you what's going on with coral reefs these days; as seawater temperatures continue to rise as a result of climate change, thermo-sensitive corals continue to decline in abundance across the globe. Where data science comes in handy, however, is in what I refer to as "coral reef triage;" by that I mean, in the likely event that we can't save everything, which corals and coral reefs do we prioritize for preservation? Where are the refugia characterized by the unusually hardy corals that may have a chance of weathering the storm? Historically, we answered these questions by randomly stumbling upon corals and coral reefs that, for whatever reason (e.g., environmental factors or unique adaptations of the corals themselves), were unusually robust. Given the incredibly small percentage of coral reefs characterized to date (<0.0001%), however, simply hoping to come upon climate change-resistant corals by chance alone is neither a time- nor cost-effective conservation strategy.   In this presentation, I use JMP Pro to demonstrate how we can instead leverage data from laboratory experiments and coral reef surveys to predict where we will find the most stress-prone corals, as well as those that display a marked capacity for resilience. By triaging coral reefs across a spectrum of climate resilience, we can not only make more informed management decisions, but we can actually use machine learning and other predictive modeling tools to dictate the optimal mitigation and/or bioremediation ("coral rescue") approach(es) for particular reefs.     Hi everybody, thanks for tuning in. My name is Anderson Mayfield, and I'm a coral reef scientist working in South Florida. Over the next 45 minutes or so, I'm going to talk to you about some exciting research I've been doing, entirely in the JMP Pro suite, on attempting to enable coral reef triage with machine learning. To give you an outline of what I'll be discussing today: I'm going to give you a brief overview of some problems facing coral reefs, the ecosystem I study, and a little bit of a recap of the talks I gave at the JMP Discovery Summits in 2021 and 2020. That is what I'll refer to as the coral veterinarian's approach, in which I was trying to make predictions about the fates of individual coral colonies. About halfway through the talk, I'm going to segue to what I've been working on more recently, which is attempting to find the resilient reefs: which reefs out there are going to be the ones that can weather the storm with respect to climate change and be around in future millennia. That approach I'll refer to as the coral epidemiologist's approach. I think most of you are probably already aware of the motivation, or the need, for this research: coral reefs are in bad shape. The reason is that the simple coral animals that build these amazing structures have a delicate, intricate association with dinoflagellates of the family Symbiodiniaceae. This is what allows them to build these massive structures that can be seen from space.
The problem is that as seawater temperatures get warmer and warmer, the symbiosis breaks down: the algae, the dinoflagellates, are no longer able to photosynthesize, and they either leave the coral, are digested, or simply cease to photosynthesize. What this means is that the corals slowly begin starving to death, and they perish. Certainly we're worried about other stressors as well, things like seawater pollution, disease, eutrophication, and overdevelopment of coastal regions, but on a truly global scale, climate change is what we're most concerned with, particularly these rising seawater temperatures. But for sure, certain corals fare better than others. There are hardier species, and there are more resistant genotypes within a species. You might even have clone mates in close proximity to one another, one of which dies due to high temperatures while the other remains resilient. What drives this resilience in these more robust corals is something I've been working on for about 20 years now. What I've been trying to do more recently is not just explain what makes corals resilient, but predict which corals that we haven't studied yet will be the ones that might inherit the Earth. So I'm going to give a brief overview of my former approach. I don't want to make it seem like I've completely abandoned this line of research, but as you'll see, there are some issues with it in terms of its cost. The goal today is to show you the old way I was doing it and then the transition to this newer, cheaper, potentially more global way. What I was doing before (I'm a molecular biologist by training) was using molecular and physiological data from corals nearly exclusively to build predictive models that would then give me a prediction about the fate, the longevity, the lifespan of the coral. This is what I call the coral veterinarian's approach, because I was basically doing what your own physician would do. I would check in on my patients every now and then, take biopsies, profile them using molecular stress tests I've developed over the years, and then attempt to predict whether or not these corals would bleach as temperatures became warmer. It's important to note that the molecular components are particularly important because subcellular biology reflects aberrant, stress-indicative behavior before you observe changes with the naked eye. I don't want to wait for the corals to bleach, or become diseased, or start to slough off their tissues; I want to look at sublethal indications of stress that happen weeks or months before these catastrophic manifestations. Analogously, this is why we have our annual physicals. You want to know, for instance, if your cholesterol levels are high before you have a heart attack, because if you know you have high cholesterol, you might be able to change your diet, take medication, change your lifestyle; you might be able to thwart the more severe signs of health decline, like a cardiac arrest. It's the same idea with coral.
We want to look at something at sublethal scales so that we can do something proactive. If we know a coral is stressed based on its molecular signatures, we might be able to mitigate something at the local scale. We may not be able to slow the rate of climate change for the sake of that coral, but we could do something locally that would give it a chance. What I was doing a few years ago, in a project I carried out at NOAA's marine lab in Miami, AOML, was building thousands of neural networks in JMP Pro 16, taking laboratory corals and field corals and using data on their protein levels; this was a proteomics project. Then we had our field test samples: corals out in the field in the Florida Keys, where we didn't know whether they were going to bleach, become diseased, or perish. We would routinely take biopsies, enter the proteomic data into these neural network models I built in JMP Pro, and the models would give a prediction. The beauty of working with adult corals is that they don't move (which is actually also a bad thing for them, because it means they can't simply move away when conditions deteriorate), but it means I know where to find them, and I can go back out and see if the neural networks' predictions were correct. They actually worked really well. One particular species we used for this proof of concept was Orbicella faveolata; it looks like this. With these neural network models, trained with lab and field protein data, the accuracy was about 92%, roughly 11 out of 12. So about 90-95% of the time, I can use the protein data exclusively to tell you whether or not a coral colony will bleach as temperatures get really warm. Typically in South Florida we see our highest seawater temperatures in August or September. In 2019, I took some samples from different reefs throughout the Keys. For instance, we have this sample here, 6745 from Crocker Reef. We entered the proteomic data from that sample months in advance of bleaching, sometime in the winter, and the neural network from JMP Pro 16 flagged it as bleaching-sensitive. We went out there as temperatures reached 32 or 33 C, which is very stressful for corals, and we saw the colony looking like this. This is bad news; it might recover from this, but it probably won't. There was another coral from a site we know is typically more resilient, a huge, ancient, several-hundred-year-old Orbicella colony. Based on its protein biomarkers input into the neural network from JMP Pro, it was deemed bleaching-resistant. Lo and behold, when we went out there during the high-temperature event that was killing other corals, it looked pretty good; you don't see any signs of paling or bleaching. Similarly, we have another site that's also known for having more resilient corals, called the Rocks.
Its protein biomarker signatures were input into the neural network model, and it was also deemed bleaching-resistant, and that indeed appeared to be the case. This is a map of the Florida Keys; our marine lab is up in Miami, so not too far away. This is something I've wanted to do for a long time: using molecular signatures to assign a level of health, or stress as the case may be, because this could enable coral reef triage, in which we prioritize our conservation efforts. Maybe this example reef down here that I gave an A-plus, with lots of resilient corals that don't seem in jeopardy of bleaching or disease, we leave alone for now and focus our efforts on the reef that was given a grade of C. Maybe the one that was given an F is too far gone and not even worth our efforts to try to save. I think these kinds of triage data are going to be important for prioritizing management decisions, and I was really excited about this project. But there's a huge issue: it's really expensive and it's slow. That was one coral species in a relatively small area of the Earth, and it took three years of my time, working 80 hours a week, and a quarter of a million dollars to build those neural network models. Most of the world's coral reefs are in the Indo-Pacific, and the most beautiful ones are found in the region I've highlighted at the bottom, known as the Coral Triangle. These are areas that do not fund coral reef research to any great extent; they simply don't have the people or the funding. Even if they did, there are hundreds, up to six or seven hundred, coral species you can find on these reefs. I will have passed away long before I could do this sort of analysis with all of these corals, even with a couple of helpers. It's too expensive and it's too slow. Is there something else we could do that would help us know something about the resilience, the longevity, and the stress loads of these reefs without having to do these fancy, expensive molecular analyses that require well-trained personnel? That's what I'm going to talk about for the rest of the time. This is what I call transitioning from being a coral veterinarian, with a handful of patients whose health I know in great detail, to thinking of myself more as an epidemiologist: I'm trying to look for more global trends in coral health that I can use to make models about their future persistence on the Earth as temperatures warm. If you remember, before, I only used the physiological data to make a predictive model. Now I'm going to integrate three disparate data types into a predictive model. We're going to look at environmental data, by which I mean things like seawater quality, the type of reef, whether the reef is exposed to the elements, the shape of the reef, those kinds of physical properties; and ecological data, which is essentially what's living on the reef: the corals present, how much algae there is, how many fish live on the reef.
These are all things that could be important for reef health; and then there is also the physiological data from the corals themselves. This actually has never been done before. Most people monitor the health of reefs based on only two properties, temperature and the abundance of coral, which is a good start, but as I'll show you, I think models that are more comprehensive and holistic are going to give you much higher predictive power. In this case, we're not simply trying to predict the resilience of individual coral colonies; we're working at a habitat or entire-ecosystem scale, and that's what we're trying to predict. As a proof of concept, I've got a nice dataset I've been playing with from the Solomon Islands. It's in the southeastern part of the Coral Triangle I mentioned; this is where you see the most biodiverse reefs, the reefs with the most coral, and, in my subjective opinion, the most beautiful reefs on the planet. I had an amazing opportunity to dive all over this region and beyond with the Khaled bin Sultan Living Oceans Foundation. A couple of years back, they carried out what was known as the Global Reef Expedition, the largest coral reef survey ever undertaken. We had a whole team of scientists monitoring the reefs from the satellite level, from space, all the way down to the molecules of the organisms residing on these reefs, so it's a really rich dataset. We have nice reef maps we've been developing, we have scuba surveys, with divers collecting information about what's living on the reefs, and we're looking at our environmental data, our seawater quality, which is obviously going to be important for coral health. My role, as you can see in this image, was sampling corals, taking tiny little biopsies to profile with some molecular assays I've developed over the last 20 years. We used a different species from the Caribbean one, a coral called Pocillopora acuta. It's intermediately sensitive, kind of in the middle, a fairly typical coral, but more importantly, it's the model coral for research; this is the coral whose physiology we know the most about. I would encourage you to check out my personal website, coralreefdagnostics.com, to really see how incredible the Solomon Islands and the other places we visited were. For people more interested in the data, the Living Oceans Foundation has an interactive map web server loaded with high-resolution maps and all manner of data we collected; it's all open access, it's a really nice resource, and I was really happy to have been a part of it. So finally, 15 minutes in, let's start doing something in JMP. I mentioned we have all these different data types: what's living on the benthos, the ecological data, and the coral health data. If I talk to my marine biologist friends, the first thing they're going to want to know is: what's the coral cover on the reefs?
Ecologists are admittedly a little bit too focused on abundance, as you may see later in the talk, depending on how the models run. Coral cover alone, coral abundance, is not actually a good predictor of reef resilience. A reef with tons of coral doesn't actually do any better than a reef with only a few corals. One reason might be that on a reef that's been decimated, the few stragglers that are left are inherently adapted or acclimatized to whatever killed off their brethren, so they're actually more resilient. The reef might be gross and ugly, and no tourist may want to go there, but it doesn't actually have lower resilience. So for me, I'm more interested in what's going on within the corals. Most people in the field are more focused on coral cover, which is still important even if it's not a good metric for resilience; you still want to know where to find the reefs with the most coral. Maybe that's where you want to start [inaudible 00:16:11]. How would you go about doing this in JMP Pro? For this demo, I'm actually going to use JMP Pro 17, a beta version that I've been demoing for a few months, but you could just as easily do this analysis in JMP Pro. Just to familiarize you with what the dataset looks like: there are 272 rows, which are what we call transects, swaths of the reef that we surveyed. You can see we looked at different depths. These are the environmental data I mentioned before: spatial data such as coordinates, the type of reef, and seawater quality. You don't need to worry too much about these abbreviations; they're just the abbreviations for the genera of organisms living on the reef. We binned them into 54 different coral bins, six algae bins, barren substrate (where nothing is living — this will be important to remember), and then other invertebrates. These are the main things that occupy the reef environment. I've excluded the fish data because I don't have a nicely curated dataset at the moment, but I definitely want to factor that in later. Let's look at live coral cover, which is all the different coral genera summed together. This is a simple univariate analysis: I want to know what contributes most to the variation in coral cover in the Solomon Islands. A really good way to get at this simply, as a first pass, is Predictor Screening. In this analysis, the Y is my live coral cover, and I want to look at these eleven environmental parameters that I think might influence coral cover in the Solomon Islands, so I put them in as my Xs. Right off the bat, you can see depth: it contributes about 40% of the variation in coral cover. To a marine biologist, or a coral biologist, this is not a surprising finding; we know that in different parts of the world corals prefer different depths, and most of the most lush coral reefs you'll see are from about 2 meters down to about 30 meters. Let's see where we find the most corals in the Solomon Islands.
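As an aside before the depth comparison, here is a rough Python analogue of that screening step: ranking environmental predictors of live coral cover by random forest importance. The column names and data are synthetic placeholders shaped like the survey table described above, and this is a stand-in rather than JMP's Predictor Screening algorithm.

```python
# Rank candidate environmental predictors of live coral cover by importance.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 272  # one row per surveyed transect, as in the data table described above
surveys = pd.DataFrame({
    "depth_m":   rng.uniform(2, 30, n),
    "latitude":  rng.uniform(-11.5, -6.5, n),
    "reef_type": rng.choice(["barrier", "fringing", "patch", "other"], n),
    "exposure":  rng.choice(["exposed", "intermediate", "sheltered"], n),
})
# placeholder response that loosely peaks at mid depths, just so the sketch runs
y = 50 - 0.15 * (surveys["depth_m"] - 10) ** 2 + rng.normal(0, 5, n)

X = pd.get_dummies(surveys)
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
ranking = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(ranking.head(8))
```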
With this selected, I don't even have to go back to my columns; I can go directly into Fit Y by X, move live coral cover into the Y, and do a simple ANOVA. I have my depth as bins here, although I've got the continuous data somewhere as well. We see from this analysis of variance a really strong effect across these four depth bins, with significantly higher coral cover in the 8 to 12 meter window. We can look at the Tukey post hoc test and see that 8 to 12 meters has over 50% coral cover. A healthy reef can range from 20 to 40%, so 50% is astonishingly impressive coral cover; you're not going to see that in much of the world. For now, it's important to know that in the Solomon Islands, 8 to 12 meters is where you find the most coral. That might be good for a publication, but it's not really that interesting on its own. If I've got colleagues or marine park managers working in the Solomon Islands, they'll say: we can't go out there and survey all these reefs; this is a huge area, and what we surveyed was a drop in the bucket. We want to make predictions about reefs we didn't visit that might also have a lot of coral and might be important for conservation. High-coral-cover reefs are also where you see more fish and other invertebrates, which might be important for people who want to bioprospect, for instance. So now I'm going to do something similar, but rather than just a simple predictor screen of coral cover, I'm going to do a model screen, in which I try to build a simple predictive model of coral cover. Let's go back into JMP Pro 17. Model Screening was a newly available feature in JMP Pro 16, I believe, and it is arguably my favorite feature in the entire package. I'm going to set this up exactly the same way I did before: live coral cover is my Y, and we've got our 11 potential environmental predictors here. I had JMP make me a validation column ahead of time, because it's going to be important to validate this. You see down here a list of all the different predictive models you can test; I want to include all of them, and I want to look at two-way interactions as well as quadratics. I'm not going to do K-fold cross-validation because I have a validation column. Let's let this run. It's going to work through this fairly large dataset (it's not huge; for many of you working in industry, this would actually be a pretty puny dataset), test it with all these different modeling types, and give me a nice summary output. I can see right here which model won this particular battle: a generalized regression with forward selection, using a fairly involved specification that considers quadratics and factorial combinations, evaluated on the 68 samples that were flagged as validation. We don't even have to go into Fit Model and rerun this; we can run it right out of Model Screening.
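Looping back to the Fit Y by X step for a moment, the ANOVA-plus-Tukey comparison of coral cover across depth bins could be sketched in Python as below; the group means and spreads are synthetic placeholders that merely mimic the pattern described above.

```python
# One-way ANOVA and Tukey-style pairwise comparisons of coral cover by depth bin.
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(1)
bins = ["0-4 m", "4-8 m", "8-12 m", "12-30 m"]
means = [25, 35, 52, 30]                      # placeholder means; 8-12 m highest
cover = np.concatenate([rng.normal(m, 8, 40) for m in means])
depth = np.repeat(bins, 40)

print(f_oneway(*[cover[depth == b] for b in bins]))
print(pairwise_tukeyhsd(cover, depth))
```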
There's a lot of output here, and we're not going to sift through all of it because, to be honest, this was something I did on the fly by design. I've never run this particular model before, which I think really emphasizes how easy it is to dive in and start interpreting. There are other ways to get at this, but I'm lazy, so I want to see what the most important predictors are that this generalized regression model found. Depth: we're not surprised to see depth there, because we just saw from the predictor screening that it's important in driving trends in coral cover in the Solomon Islands. The reef type by latitude interaction is maybe a little bit harder to wrap our heads around, so let's go into the Profiler and see what we can learn in more detail. The Profiler is here; let me close some of these things so we get a little more room and enlarge this. The Profiler is not showing me the reef type by latitude interaction on the same plot per se, but watch: if you just look at reef type in isolation, we have barrier reefs, fringing reefs, patch reefs, and these "other" reefs, which tend to be pinnacles that come up out of the ocean depths. We don't see much difference in coral cover, but look how the latitude line shifts. That is the latitude by reef type interaction. Over here, we're seeing a very similar plot to the ANOVA we did in Fit Y by X: 8-12 meters is the sweet spot for finding the most coral. But what I think is cool is to go one step further and do a desirability analysis. It's probably going to remember my presets, but let's just start from scratch. I want to tell JMP to give me the scenario that would result in the highest live coral cover, because this is what a marine biologist is going to want to know. Right here, my response goal is to maximize live coral cover, so I want high desirability values for my high coral cover levels. I hit OK, then I go back in here and say Maximize Desirability. Unsurprisingly, things stay the same: 8-12 meters is where we want to hone in on our search. But this might be more interesting to people who are embarking on a field trip: "Hey, we've got a week in the country and we want to find rich, high-coral-cover reefs. Where should we go?" Well, I think you should go to these farther-flung islands, farther away from the equator. As you'll see later, these are the more remote, sparsely populated parts of the country, which is probably where you'd expect to find more coral. And although it's very similar to the barrier reefs, you'd probably want to focus on these other types of reefs and barrier reefs, if you have the choice, versus fringing reefs and patch reefs. I think doing this kind of analysis could be important for conservation and for planning field trips. But arguably this is a little bit of an aside, and we have not yet reached the actual goal; that's coming up. All right, we've done these two demos; let's go back into PowerPoint.
I really wish I had more time for this, but I know I don't, and I feel bad for all the developers and people who worked so hard on it, but I take full advantage of the multivariate platform. This is really important because, even though in those past demos I just looked at live coral cover, a single Y, in reality that completely belittles the complexity of these ecosystems. There are hundreds of things living on the sea floor, so you really need a multivariate analysis where you've got multiple Ys and multiple Xs. We're talking about things like principal components analysis and multidimensional scaling; I do these daily in JMP Pro. I really like discriminant analysis. For instance, right here, this took me one minute: I can quickly see that the reefs of Tinakula, on a multivariate scale, are very different from those of the rest of the country. If you were to go to the Solomon Islands, you would know why: these are reefs growing at the base of an active volcano. They look very different and they behave very differently, and the multivariate benthic data corroborate this. Similarly, we see this nice effect where I've color-coded the reef sites by exposure, whether they were sheltered, exposed to the waves, or intermediate, and you can see pretty nice parsing by exposure in this discriminant analysis. I'm a big fan of these algorithms, and of partial least squares in particular, and I've got some hidden slides and some scripts in the data table that I'll make publicly available, so if you want more detail about the multivariate analysis, you're definitely welcome to download it. But what I want to spend the rest of the talk on is the health of the corals themselves. That was looking at the benthos, the reef as a whole; I'm a physiologist, and I want to know what's going on in the corals. I've measured so many different things in these corals over the years that I recently created what I call the Coral Health Index. This is basically an amalgamation of a bunch of different response variables that I know from my past research scale with coral resilience. What I've done is tried to simplify things to where, if your Coral Health Index score is zero, you're about to kick the bucket, and five means you're immortal. Trivia: [inaudible 00:28:25] like corals and jellyfish are technically immortal; left to their own devices with no stress, they can continue to regenerate forever. Of course, in reality there's always going to be some limitation: they're going to reach the surface, or the water is going to get too cold. But in principle they can live forever. Anyway, we're not going to see any corals that are fives. This basically follows a bell curve, so we're going to find most of our corals with health indices in this 2-3 window. With the help of John Powell, I made these really nice customized pie graphs. I adapted this from what are called coral reef report cards, developed by an NGO called AGRA. I said, I love that visual; I want to adapt it but focus on the coral scale.
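As a rough companion to the multivariate workflow just described (ordination of the benthic community matrix plus a discriminant analysis on exposure), here is a minimal outside-JMP sketch. It assumes hypothetical column names and is not the speaker's actual script or the scripts mentioned in the data table.

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

df = pd.read_csv("solomon_transects.csv")                          # hypothetical benthic-cover table by transect
benthic_cols = [c for c in df.columns if c.startswith("genus_")]   # the coral, algae, and invertebrate bins

Z = StandardScaler().fit_transform(df[benthic_cols])
pca = PCA(n_components=2)
scores = pca.fit_transform(Z)                                      # ordination of the transects
print("Variance explained by PC1, PC2:", pca.explained_variance_ratio_)

lda = LinearDiscriminantAnalysis().fit(Z, df["exposure"])          # sheltered / intermediate / exposed
print("Resubstitution accuracy by exposure class:", round(lda.score(Z, df["exposure"]), 2))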
What this is: each of these outer four widgets, whose details you can see here, feeds the interior, which is basically showing you the average of the four widgets. As you can see, we're seeing values as low as 1.5; corals in Nono Lagoon seem to be the least resilient. Most of the people in the Solomon Islands live close to the capital of Honiara, so we would probably expect this kind of west-east gradient. We tend to see higher Coral Health Index values over here in the provinces and the Reef Islands and Monte Carlo. This is not surprising. This map was made with Graph Builder. Let's see, I think I have enough time. I'm not going to try to reproduce this map, because even though I love it, I think it's still too complicated for a manager. They don't want to see all these pie widgets; they want a single number. I want to show you a really cool trick. There are great webinars about how to plot data onto a map on the JMP website, but I'm going to do something that was new to me and might actually be useful to a lot of you; it's taking it one step further. We're going to do it in JMP Pro 16 because I want to be able to publish this online, which is not yet a feature in JMP Pro 17 since it's still the beta version. I want to plot the Coral Health Index on a map, and this is going to be shockingly easy in Graph Builder. I'm just going to drag my latitude and longitude over; JMP knows to treat these as such. I don't want this line. Right now it's just showing me essentially the location of my dive sites, so I want to add a background map; this is the detailed Earth. Let's make it bigger. We see the Solomon Islands now; getting closer. I want to overlay my Coral Health Index as color. Still not there yet. I want to convert this to a heat map, but with a finer scale of resolution, and this is the trick that I learned that I think is going to be really useful, because I was actually doing this in ArcGIS before, which is a PC-only program; I'm on a Mac, and it costs thousands of dollars. I said, why can't I do this in JMP? It turns out that I can. What I want to do is force a smaller grid onto this map, because I want these cells to be much smaller: 0.5 by 0.5 degrees. As long as you turn the grid lines on, it's going to give you an average of the Coral Health Index in each of these 0.5 by 0.5 decimal-degree boxes. That's what I want. I actually prefer a green-to-red scale, and the default is to have red be high. If you remember the image of the Coral Health Index, I have green as the high value, so I'm going to switch it. I also want it to span the entire range, even though I don't have many zeros or fives. I'm going to drag this here, and now I think it's looking good, but it's still too busy, so I'm going to turn the grid lines back off. It will keep the cell shapes that I want. Voila: in my opinion, this is exactly how I want to see these Coral Health data portrayed. But I'm going to take it yet another step further.
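The Graph Builder trick of forcing a 0.5 x 0.5 degree grid amounts to binning the coordinates and averaging the Coral Health Index within each cell. A small pandas sketch of that aggregation, with hypothetical file and column names, is below; the result could then be mapped with any plotting tool.

import numpy as np
import pandas as pd

df = pd.read_csv("coral_health_sites.csv")                    # hypothetical site table with lat/long and CHI
cell = 0.5                                                    # cell size in decimal degrees
df["lat_cell"] = np.floor(df["latitude"] / cell) * cell + cell / 2    # cell-center coordinates
df["lon_cell"] = np.floor(df["longitude"] / cell) * cell + cell / 2

grid_means = (df.groupby(["lat_cell", "lon_cell"])["coral_health_index"]
                .mean()
                .reset_index())
print(grid_means.head())                                      # one averaged CHI value per 0.5-degree cell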
I'm going to say, "Hey look, my friends who have never seen these data may want to play around with the different environmental variables and see how things change depending on the type of reef, the temperature, and whatnot." So I'm going to add this local data filter and give it a name. Still not done yet, though; I want to actually share this with my friends. What I'm going to do is publish to JMP Public. This may take a minute because I may not be logged in, but let's just see. I'm going to create a new post and share it with everyone. I can add an image if I want, but I'm just going to leave all these defaults as they are for now, and we'll publish it. It's going to take a few seconds; hopefully it works well. It's going to migrate me over to the website, and I'll show you, as it's working, what you can then do once it publishes. All right, here we go. Let's check it out online first. This is what I can share with my friends so they can say, "Hey, look, I'm only going to be able to go to the western part of the country for my field trip; I don't care about those reefs in the east, so let me just turn them off." Then it refreshes, and you can hone in your search here. You could look at the different reef types. Another thing you can do, which I do all the time, is take the embed code or the embed card, copy it, and put it in your personal website. Because of the way my website is set up, I have so much padding here that it's not actually going to show the map very well; it's better for me to simply use what they call a card, where I've got a schematic of it, and then if people want more details they can click on it and go back to JMP Public. This is a super cool feature that I think people with access to JMP should be taking advantage of. This is just showing you how you can even embed it within your website or within a presentation, but I don't think we need to go into that. Again, that's another aside; we're finally getting to the good stuff. This is what I've been wanting to do; this is the goal of this whole analysis, so we're almost at the finish line. This is using the JMP Pro suite to try to find the climate-resilient corals that we haven't stumbled upon yet. We usually find climate-resilient corals either through experiments or through surveys. We've lost that time window: we don't have time to do all these experiments, and we don't have the money. Coral reefs are in bad shape. We need a way to speed up the search for the resilient corals that we may want to use for restoration, the ones we may want to protect, buy, or preserve. What we're going to do is make a predictive model of the Coral Health Index where we factor in all the different survey data we've collected. It sounds daunting, but I think you'll see this is actually something that can be done relatively quickly. In this case, I'm going to go to another data table that's got my coral physiological data, and that is somewhere here.
This is 110 rows. Instead of dive sites, these are now coral samples. This is the ecological data, and the Coral Health Index is here. We're going to go over to my beloved Model Screening again. I probably could use Recall, but just to be safe, we're going to take the 50 benthic categories, the bins of things that live on the reef, and move them here. The Coral Health Index is what we want to predict, and we're going to use this validation column here, with the same settings as last time. It looks a little bit different because I'm now doing this in a different version of JMP Pro, but it works very similarly. I want to do the additional methods with quadratics. I think this will run fairly quickly, and indeed it did. In this case, a boosted neural network rose to the top, with a validation R squared of about 0.49. That's not bad; let's run it. It's going to come out differently because of the way neural networks work: they can vary quite dramatically from run to run, especially when you have relatively small data sets like mine. But we're still in the ballpark, 0.52. If you know about neural networks, you know there are tons of different modeling parameters that you can tinker with and tweak. That's why this really brilliant add-in from Diedrich Schmidt has been an absolute game changer for my research. He created a nice GUI that lets me look at potentially thousands of different factorial combinations of modeling parameters, but today, in the interest of time, I'm just going to do four. I input the model exactly like I did in Model Screening, but now you'll see these options that are specific to the Neural platform. You know what, I'm going to explain this while it's running, because it might take a second to run and we're running low on time. It's going to build four models for me. I think everything's in there like I want. All right, now let me explain this while it's running. Hmm, I think I input something wrong; apologies for that. Let me restart and input this here. This is all correct; I want these to vary. I think maybe this was too low. Let's try it again. It's basically going to start running these models, and it's going to use the JMP defaults as well. Basically, he leveraged the power of design of experiments to have the number of sigmoidal, linear, and radial activation nodes span 0-4. We can have up to 20 boosts. I'm allowing the covariates to be either transformed or untransformed, either with or without a robust fit. Because I want to go with the minimum number of potential factors, I want to use a weight decay penalty. It gives me this nice output. Let's see if the R squared of the validation models did any better than the JMP default; most of the time they do. In this case, it's not too much different, about 0.55. We can run it; it will ask me to save the output, and in the meantime it's going to run this model, which may end up being very similar to the JMP default one.
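The auto-tuning add-in mentioned here uses a designed grid of neural-network settings inside JMP and compares validation R squared; the sketch below is not a reimplementation of that add-in, just an illustration of the same idea with scikit-learn, using a handful of hypothetical settings and a validation column assumed to be coded Training/Validation. File and column names are made up.

import pandas as pd
from sklearn.model_selection import GridSearchCV, PredefinedSplit
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("coral_physiology.csv")                      # hypothetical 110-sample table
X = pd.get_dummies(df.drop(columns=["coral_health_index", "validation"]))
y = df["coral_health_index"]
# Rows flagged "Validation" form the single held-out fold; -1 keeps training rows out of testing.
fold = PredefinedSplit(df["validation"].map({"Training": -1, "Validation": 0}))

pipe = make_pipeline(StandardScaler(), MLPRegressor(max_iter=5000, random_state=1))
grid = {"mlpregressor__hidden_layer_sizes": [(2,), (4,), (8,), (4, 4)],
        "mlpregressor__alpha": [1e-4, 1e-2, 1e-1]}            # alpha acts as a weight-decay-style penalty
search = GridSearchCV(pipe, grid, cv=fold, scoring="r2").fit(X, y)
print(search.best_params_, round(search.best_score_, 2))      # analogous to picking the best validation R squared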
Once it spits it out, whatever it gives us, we're going to go with it. I'm going to show you, assuming it has an R squared or another modeling benchmark that you're happy with, what you could then do with the analysis, and that's going back into the desirability analysis. If you just bear with me another few seconds, it should finish. What we're going to do is go into the Profiler, and I'm going to tell the Profiler that I want to find the conditions, the environmental conditions and the benthic conditions, that lead to the highest Coral Health Index scores, because that's where I might want to focus my efforts for conservation, for trying to find resilient corals. You can see that in this case we got a fair bit higher R squared. Let's go into the Profiler. It's probably going to remember my settings, but just to be safe, let's go set the desirability. I want to maximize the Coral Health Index, and it remembered that. Now I maximize desirability, and it's going to tell me the conditions in which I'm going to find the corals with the highest Coral Health Index scores. We don't have time to go into all of these, but this is going to be super useful for people who are embarking on field trips, and for managers. They're going to say, look, if I want to find the most resilient corals in the Solomon Islands, I'm best sticking to intermediately exposed fringing reefs, within the lagoon, and submerged reef types. Some of these may not make as much sense; the time of day and temperature, you may not have that luxury. Things like depth: you want to focus on shallow corals, in this example. These are going to be super useful data that will allow us to find resilient corals on a much faster time scale. One thing to note is that these aren't necessarily the conditions in which you find the most corals, because, remember, more is not necessarily healthier. But these are things that are cheap to measure. Latitude and longitude, you just need a smartphone; temperature, you need a thermometer. You don't need these fancy, expensive molecular analyses run by PhD scientists; you can train a high school student to go out there and collect these data, which are going to be really informative for coral health. My idea is that I have all these similar data sets from all over the world, so I can start building what I'm calling this Coral Health Atlas. I can use Graph Builder to make these nice plots where I'm showing people where resilient corals are likely to be found. This is going to help us, in concert with these temperature-based models from NOAA, envision what the future reefs are going to look like, where we're going to find corals in the future, and which corals are going to live there. Since we're running out of time, don't worry, I'm not going to read off this list. But this was not done completely in isolation. I obviously benefited greatly from the JMP Pro software itself, but a lot of these people behind the scenes lent their support.
Some of you won't be surprised to see your name there; some of you might be surprised, and that's probably because you gave a webinar or wrote a blog or something that was really inspiring to me. I hope you're happy to see your name up there. I really want to give a shout-out to Diedrich Schmidt, if he's on, for developing that really excellent auto-tuning add-in that has greatly benefited my research. I also want to give a shout-out to John Powell, not just for helping me make those figures, but because he was the person who really convinced me that JMP is more than just a software package. You've got this network of really talented individuals behind the scenes who are willing and able to help you along the way. I really appreciate John's and everybody else's support. With that, I'll end my talk, and I'm probably over here furiously answering questions. If we have any time left, I'm happy to field more. All right, thanks a lot.
Attribute gauge analysis is typically applied to compare agreement, or lack thereof, between two rating approaches to a problem. For example, two inspectors may have differences of opinion as to whether a part is conforming (Pass) or non-conforming (Fail) based on consideration of specific quality indicators in individual parts. How do we quantitatively measure the degree of agreement? In more complicated situations, attribute gauge analysis may be applied to compare agreement among multiple raters for multiple responses, including agreement to a standard. We describe a personal consulting case involving the use of drones flying in warehouses to read labels of stacked inventory shelves in place of manual efforts by humans. We illustrate the application of JMP's attribute gauge analysis platform to provide graphical and quantitative assessments, such as the Kappa statistic and effectiveness measures, to analyze such data.     Hi, I'm Dave Trindade, founder and owner of Stat-Tech, a consulting firm specializing in the use of JMP software for solving industrial problems. Today I'm going to talk about a consulting project that I worked on over the last year with a robotics company. We're going to be talking about Drones Flying in Warehouses: An Application of Attribute Gauge Analysis. Attribute gauge analysis is typically applied to compare agreement, or lack thereof, between two or more rating approaches to a problem. For example, two inspectors may have differences of opinion as to whether a part is conforming, call it pass, or nonconforming, call it fail, based on consideration of specific quality indicators for individual parts. How do we quantitatively measure the degree of agreement? Let's start off with an example. Say we have two inspectors, Inspector 1 and Inspector 2, and they are presented with a list of 100 parts, with the critical characteristics of those 100 parts, and asked to determine whether each part should be classified as a pass or a fail. I've summarized the results in the partial table to the right. There are 100 rows in the table, and all variables are nominal: the first column is the part, 1-100; the second column is the rating by Inspector 1, pass or fail; and the third column is Inspector 2's pass or fail rating. Now, if we were not familiar with JMP's attribute gauge analysis platform, a first step we could take would be to look at the two classification distributions and use dynamic linking to compare them. What I will do is show you the slides and then go and demonstrate the results in JMP after I've gone through a certain amount of material. For example, if we generate distributions of the two columns for Inspector 1 and Inspector 2, we can click, say, on the fail bar for Inspector 1. You see mostly matches for Inspector 2, but there are a few disagreements over here: there are some parts that Inspector 1 classified as a fail but Inspector 2 classified as a pass. When you click on that bar, JMP highlights the actual rows that correspond to it.
You can see over here, for example, that in row four Inspector 1 called the part a fail and Inspector 2 called it a pass. Generally, though, they're mostly in agreement: fail/fail, fail/fail, and so forth. We could also do this by clicking on the Inspector 2 fail bar and seeing how it compares to Inspector 1. There are five instances where Inspector 1 classified a part as a fail and Inspector 2 classified it as a pass. Now we can also visualize the inspector comparison data. To do that, we can use Graph Builder with Tabulate to view agree and disagree counts between the two inspectors. Here's one way of visualizing it: we put Inspector 1 on the horizontal axis and Inspector 2 on the vertical axis, and then with color coding we see whether they agree or disagree; agreement is green and the rows that disagree are color-coded with red markers. Now we can see the actual distribution, and we can use Tabulate to total the numbers involved. Inspector 1 and Inspector 2 agreed on the fail categorization for 42 of the parts, and they agreed on 44 of the pass parts. They disagreed in nine instances where Inspector 2 called the part a fail and Inspector 1 called it a pass, and in five instances where Inspector 2 called it a pass and Inspector 1 called it a fail, so those total 14. The inspectors agreed on a classification for 86% of the parts and disagreed on 14%. From there, we can do attribute gauge analysis and see what JMP can do for this analysis. To get there, we go to Analyze > Quality and Process > Variability / Attribute Gauge Chart, and then we cast the columns into roles. Here I've shown both inspectors listed under the Y response, and the column for the part is listed as the grouping; these are required entries. We notice the chart type is Attribute, and we click OK. Now JMP provides us with this attribute gauge analysis report. The first chart shown here is the percent agreement for each part: we have the 100 parts on the horizontal axis, and 100% means the two inspectors agreed while 0% means they disagreed. The left chart shows the overall percent agreement, 86%, by inspector; since the comparison is between only two inspectors, both have the same 86% agreement value. The agreement report includes a numerical summary of the overall 86% agreement: 86 matches out of 100 inspected. The individual values are the same, since there is only one question, pass or fail, for a given part, and 95% confidence limits are provided for the results, both for each inspector and for the overall agreement. Now, the agreement comparisons report includes a statistic that perhaps many people are not familiar with, called the Kappa statistic, devised by Cohen (the paper is given in the references).
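The Tabulate counts and the 86% agreement figure are easy to reproduce outside JMP as a plain cross tabulation. A minimal sketch, assuming a hypothetical export of the 100-part table with columns named Part, Inspector 1, and Inspector 2:

import pandas as pd

df = pd.read_csv("inspectors.csv")                            # hypothetical export of the 100-part table
xtab = pd.crosstab(df["Inspector 1"], df["Inspector 2"], margins=True)
print(xtab)                                                   # 42 Fail/Fail, 44 Pass/Pass, 9 + 5 disagreements

agreement = (df["Inspector 1"] == df["Inspector 2"]).mean()
print(f"Percent agreement: {agreement:.0%}")                  # 86% for this data set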
The Cohen Kappa statistic, which in this case is 0.7203, is designed to correct for agreement by chance alone. This was very interesting to me when I first read about it: what do we mean by agreement by chance alone? Let's go into a little explanation of agreement by chance and how we can estimate it. Consider two raters, R1 and R2, and assume totally random choices for each rater on each sample, that is, each part. We further assume that the probability a rater selects either choice, pass or fail, is 50%, so it's 50/50. One hundred samples or trials are then categorized pass/fail by each rater, similar to flipping a coin for each choice. Just visualize two inspectors each flipping a coin and counting how often they match: head/head or tail/tail versus head/tail or tail/head. What's the expected fraction of agreements by chance? Well, it's a simple problem in probability. Similar to tossing two coins, there are only four possible and equally likely chance outcomes between the two inspectors for each part. Rater 1 could call it a fail and Rater 2 could call it a fail; they would agree. Rater 1 could call it a pass and Rater 2 could call it a pass; there'd be agreement there too. The disagreements are the two cases where they don't agree on pass versus fail. Of these four equally likely outcomes, two are agreements and two are disagreements, so the probability of agreement by chance alone is two out of four, or 50%. It's a simple probability problem. Now, how do we calculate the Kappa statistic? As I said, it's meant to correct for this expected probability of agreement by chance alone. The simple formula for the Kappa statistic is the percent agreement, in this case 86%, minus the expected agreement by chance estimated from the data, which we know is going to be around 50%, divided by one minus the expected agreement by chance. So how do we use the data itself to estimate the expected agreement by chance? The estimation for the Cohen Kappa statistic is shown below, and this is basically how it's done. This is the tabulated value we saw earlier: agreement on fail/fail between Inspector 1 and Inspector 2 in 42 instances, and agreement on the pass criterion in 44 instances. Add those up and that's 86%; I show that over here in the Excel format, 42 plus 44 divided by 100. Then disagree is one minus that, or just the five plus nine divided by 100. Now, to calculate the expected agreement by chance for the Cohen Kappa statistic, we take the sum of the products of the marginal fractions for each pass/fail category. Here are the marginal fractions. For fail, the marginal fractions are 51 divided by 100 and 47 divided by 100, so we form the product of those two, 51/100 times 47/100; that takes care of the fail criterion.
For the pass criterion, we take 49 out of 100 and 53 out of 100, multiply those two together, and add the result. That calculates out to 0.4994, which is obviously very close to 0.5. Then the Kappa statistic is the percent agreement minus the expected agreement by chance, divided by one minus the expected agreement by chance, which comes out to 0.7203 in this case. Here are some guidelines for interpreting Kappa; going back up, remember the Kappa was 0.7203. If Kappa is greater than 0.75, agreement is excellent; if it's between 0.40 and 0.75, it's good; and if it's less than 0.40, it's called marginal or poor. These dividing lines are just guidelines. Total 100% agreement would give a Kappa of one, and we could actually get a negative Kappa, which would indicate agreement that's less than expected by chance alone. The books covering this are given in the references. All right, let me stop here and go into JMP to show what we've done so far. The data file we've got over here is the inspector data that I talked about, and I said we could take a look at the distributions. If you're familiar with JMP, you just go to Analyze > Distribution and put in the two inspector columns. Then we can put this next to the rows, and if we click fail, we see the comparison between the fails for Inspector 1 and the fails for Inspector 2, plus some disagreements, and you can see which rows disagree, for example row four. Similarly, we can compare Inspector 2 to Inspector 1 and get a different set of numbers over here. The other option I mentioned for visualization is Graph Builder, where we can put Inspector 1 on the horizontal axis and Inspector 2 on the vertical axis. Now we have a comparison, and we can see the number of times, for example, that Inspector 1 rated something as a pass where Inspector 2 rated it as a fail. If we go back to the data table, we see that there are nine instances of that, and they're highlighted in the rows. This is a very quick way of seeing what the numbers are for the different categories we're working with. Let's click Done over here. The other thing we can use, as I mentioned, is the Tabulate feature. We can go to Tabulate and put Inspector 1 across the top and Inspector 2 down here this way, and then we can add another row for the marginal totals down here and so forth. Now we have a summary that we can put next to the Graph Builder chart to see the actual tabulations. That's something we might do if we were not familiar with JMP's attribute gauge platform. But let's use JMP now to do the analysis. We're going to come over here to Analyze > Quality and Process > Variability / Attribute Gauge Chart. We're going to put in the inspectors and the part, and over here you notice the required chart type, Attribute.
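To make the hand calculation above concrete, here is the same Cohen's kappa worked out directly from the tabulated counts; it reproduces the chance agreement of about 0.4994 and the kappa of about 0.7203.

# Counts from the Tabulate summary of the two inspectors.
fail_fail, pass_pass = 42, 44            # agreements
i2fail_i1pass, i2pass_i1fail = 9, 5      # disagreements
n = fail_fail + pass_pass + i2fail_i1pass + i2pass_i1fail     # 100 parts

p_agree = (fail_fail + pass_pass) / n                         # 0.86
# Marginal fractions for each inspector.
i1_fail = (fail_fail + i2pass_i1fail) / n                     # 0.47
i1_pass = (pass_pass + i2fail_i1pass) / n                     # 0.53
i2_fail = (fail_fail + i2fail_i1pass) / n                     # 0.51
i2_pass = (pass_pass + i2pass_i1fail) / n                     # 0.49
p_chance = i1_fail * i2_fail + i1_pass * i2_pass              # 0.4994

kappa = (p_agree - p_chance) / (1 - p_chance)
print(round(p_chance, 4), round(kappa, 4))                    # 0.4994, 0.7203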
We click OK, and now we have our output. This output again shows the agreement for each part: 0% means the inspectors disagreed, 100% means they agreed. This shows the rating between the two inspectors, 86%, and this is the summary, 86%. Here's our Kappa index over here, and we have the agreement within raters, which is somewhat redundant here because we're only looking at one binary comparison. Then, further down, we can look at the agreement by category: we can calculate the agreement on fail individually, or on pass individually. So that's how we would do it for a simple comparison. But what if we now consider that the actual diagnosis of the part was known or confirmed? Let's go back into the PowerPoint presentation. I introduce a standard. This gives us a measure of what we call effectiveness. We're going to assume the correct part classification was either known or subsequently confirmed; this is the true, correct diagnosis that should have been made for that part. How accurate are the inspectors' choices? In other words, how do we measure how well each inspector matched the true standard? We set up that column in JMP, and now we can go through the same process of looking at distributions. For example, this time we include the standard in the distributions; now we can click on pass and see the agreement of Inspector 1 and Inspector 2 with the standard on pass classifications. You can see both of them had some misclassifications, some wrong diagnoses. We can click on fail and do the same thing for the other category, and JMP will highlight those rows in the data table. To do the attribute gauge analysis in JMP using the standard, all we have to do now is enter the standard into the dialog box as we did before; it's an additional column. The big difference now, and this is not highlighted in the manual, is that under the attribute gauge chart we can get a display that applies specifically to effectiveness. What we're going to do is unclick the agreement points on the chart and click instead the effectiveness points. When we do that, we get another chart that measures effectiveness, and this effectiveness has three possible values. The chart now shows the percent agreement, 0%, 50%, or 100%, of the two inspectors to the standard for each part. A 0% implies both inspectors misdiagnosed the part; seven instances of that occurred. A 50% signifies that one of the inspectors got the correct classification, and obviously 100% means they both got it right. The left chart shows the overall percent agreement to the standard for each inspector, and we notice a slight difference between the two inspectors. We then generate the effectiveness report, which incorporates the pass/fail comparisons to the standard for each inspector. You can see Inspector 1 got 42 of the fails correct and 43 of the passes correct, but he got 10 of the fails incorrect, calling them passes, and he got five of the passes incorrect.
I find this notation a little bit confusing, so I put it down at the bottom. When we say incorrect fail, that means a fail was incorrectly classified as a pass. When we say incorrect pass, it means a pass was incorrectly classified as a fail. You can get your mind going in crazy ways just trying to interpret what's in there, so I created my own chart to simplify things. The misclassifications show that 17 actual fail parts were classified as pass, and eleven pass parts were classified as fail. That's in the JMP output, but what I've put over here, and I'd love to see JMP include something similar as a clear explanation of what's going on, is this: for Inspector 1 and Inspector 2, when the standard is pass, the correct classifications as pass are 43 and 42, and the parts misclassified as fail are five and six. When the standard is fail, the correct choices by Inspector 1 and Inspector 2 are 42 and 45, and the misclassified parts, fails called passes, are 10 and seven. Now, understand, a fail part classified as a pass is a defective part going out; that's called a miss. On the other hand, a pass part classified as a fail is basically producer's risk; that's a false alarm. JMP uses those terms, false alarm and miss, and I'll explain them later on. I like this chart because it seems to give a clear explanation of what's going on. Using Graph Builder, again, we can view the classifications by each inspector, as shown over here, and again you can highlight specific issues there. JMP also allows you to define conformance; in other words, we said non-conforming is a fail and conforming is a pass. That way we can take a look at the rate of false alarms and misses in the data itself, as determined by the inspectors. We can see that the probability of a false alarm for Inspector 1 was 0.1042, and for Inspector 2 it was 0.125. The probability of a miss, which means we let a defective part go out, was higher for Inspector 1 than for Inspector 2. I'll show how these calculations are done. To emphasize this: a false alarm occurs when a part is incorrectly classified as a fail when it is actually a pass; that's a false positive. The false alarm rate is the number of parts that have been incorrectly judged to be fails, divided by the total number of parts that actually are passes according to the standard. That's where this calculation comes from: here are the passes misclassified as fail, so if I take five out of 48, I end up with that number, 0.1042. The next thing is a miss: a part is incorrectly classified as a pass when it actually is a fail. That's a false negative; in this case, we're sending out a defective part. The miss rate is the number of parts that have been incorrectly judged to be passes, divided by the total number of parts that actually are fails according to the standard, which is 42 plus 10, or 52. Again, going back to this table, these are the parts that are fails, but 10 of them were misclassified as a pass, so the number of parts that should have been classified as fail is 52.
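The false alarm and miss rates quoted here follow directly from the counts in the table, and the escape rate discussed next is just the miss rate multiplied by an assumed nonconformance rate. A short numeric check for Inspector 1:

true_pass, pass_called_fail = 48, 5      # parts that are passes per the standard; 5 were called fail
true_fail, fail_called_pass = 52, 10     # parts that are fails per the standard; 10 were called pass

p_false_alarm = pass_called_fail / true_pass       # 0.1042, the producer's risk
p_miss = fail_called_pass / true_fail              # 0.1923, a defective part goes out
print(round(p_false_alarm, 4), round(p_miss, 4))

p_nonconforming = 0.10                             # the assumed 10% defective rate entered in JMP
print(round(p_nonconforming * p_miss, 4))          # escape rate, about 0.019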
Ten divided by 52 gives you that number, 0.1923. So I like that table; it's easier to interpret. The final thing about the conformance report is that you can change your conformance category, switching conform to non-conform. You can also calculate an escape rate, which is the probability that a non-conforming part is produced and not detected. To do that, we have to provide some estimate of the probability of non-conformance to JMP. I put in 10%; let's say 10% of the time we produce a defective part. Given that we've produced a defective part, what's the probability that it's going to be a miss and then escape? That's the escape rate: the multiplication of the two, the probability that the process produces a fail part times the probability of a miss. Now let's go into JMP again, and we're going to use the inspection data with the standard. I'm going to go through this quickly. We do Analyze > Distribution, put in the three columns, and now we can click on the standard down here and then highlight and compare Inspector 1 and Inspector 2. Another way to visualize it is Graph Builder, as we've done before. We can put Inspector 1 over here, Inspector 2 on this side, and then enter the standard over here on this side. Now we have a way of clicking and seeing what the classifications were relative to the standard. That's a very nice little graph: if you want to know how many times Inspector 1 classified a part as a pass when it was actually a fail, now we can see that at a glance, and the rows are highlighted too. Let's go into the platform now: Analyze > Quality and Process > Variability / Attribute Gauge Chart, Recall, and I'm going to add in the standard over here. We click the standard. Now here's the issue: JMP gives us the attribute gauge chart, but this is for the agreement, and what we'd like to measure is performance against the standard. We come up to the attribute gauge chart options, unclick anything that says agreement, and click anything that says effectiveness. There might be a simpler way to do this eventually in the [inaudible 00:23:31] programming in JMP. Now we have the effectiveness chart; again, as I said, 50% means that one of the inspectors got it right, and 0% means they both got it wrong. We have the agreement report showing the 86% that we've seen before, but what we want to get down to is the effectiveness report. Now we see that Inspector 1 was 85% effective, Inspector 2 was 87% effective, and overall it was 86% effective. Here's the summary of the misclassifications, and these are the ones listed over here. As I said, with this terminology you need to understand that incorrect fails are fails that were classified as passes, and incorrect passes are passes that were classified as fails. Then the conformance report is down here; we showed you how to do the calculation, and we can change the conforming category by doing that over here.
Or we can calculate the probability of escape, the escape rate, by putting in a number that estimates how often we'd expect to see a defective part. I put in 0.1 over here and click OK, and then JMP gives us the probability of non-conformance and the escape rate for each inspector, as shown over here. Let's go back now to my PowerPoint presentation. Now that we have a feeling for these concepts of agreement, effectiveness, and the Kappa index, let's see how we can apply the approach to a more complex problem in gauge analysis: inventory tracking. This was part of a consulting project with the robotics company Vimaan Robotics; by the way, there are some wonderful videos if you click over here that show the drones flying in the warehouse, doing the readings, and some of the results from the analysis. As part of the consulting project, I was introduced to the problem of drones flying in a warehouse using optical character recognition (OCR) to read inventory labels on boxes and shelves. In measurement system analysis (MSA), the purpose is to determine if the variability in the measurement system is low enough to accurately detect differences in product-to-product variability. A further objective is to verify that the measurement system is accurate, precise, and stable. In this study, the product to be measured via OCR on drones is the label on a container stored in racks in a warehouse. The measurement system must read the labels accurately. Furthermore, the measurement system will also validate the ability to detect, for example, empty bins, damaged items, counts of items in a given location, dimensions, and so forth, all being done by the drones. In gauge R&R studies, one concern addresses pure error, that is, the repeatability of repeated measurements on the same label. Repeatability is a measure of precision. A second concern in gauge R&R studies is the bias associated with differences in the tools, that is, differences among the drones reading the same labels. This aspect is called reproducibility, and it's a measure of accuracy. The design that I proposed was a crossed study in which the same locations in the warehouse bins are measured multiple times (that's for repeatability) across different bias factors, the drones (that's for reproducibility). The proposal defined several standards for the drones to measure. The comparisons involve within-drone repeatability, drone-to-drone agreement consistency, and drone-to-standard accuracy. The plan was to measure 50 locations, 1-50; three drones would be used to measure reproducibility, the drone-to-drone comparisons, and there would be three passes for each location by each drone to measure repeatability. Now, multiple responses can be measured against each specific standard, so we don't have to have just one item and one standard; we can have different characteristics. The reading can be binary, that is, classified as either correct or incorrect, and the reading can also provide status reporting for a location, like the number of units, any damaged units, and so forth.
Examples of different responses are: how accurately can the drones read a standard label? Are there any missing or inverted labels? Are the inventory items in the correct location? Is the quantity of boxes in a location correct? Are any of the boxes damaged? These are things a human inspector would be checking as part of inventory control, but now we're doing it all with drones. Here's the proposal. I have 50 locations over here, 150 rows actually, because each location is read three times by each drone, with columns for drone A, drone B, and drone C. These are the results of a comparison to the standard. We're classifying five standards, A, B, C, D, and E, randomly arranged across the locations, with one characteristic specified for each of the 50 locations. Since we're doing three readings, it's 150 rows. Three drones gives reproducibility, three passes for each location by each drone gives repeatability, and the standards are specified for each location. I'm going to make an important statement here: the data I'm using for illustration are made-up data, not actual experimental results from the company. We can start off with distributions and dynamic linking to compare the classification of the drones by standard. We generate the distributions and then click on, say, standard A, and we can see how many drones got standard A right, or whether any drones misclassified it. Same thing if we click on standard E: we can see drone A had a higher propensity for misclassifying standard E, and the same with drone C. Now, the chart below shows how well the drones agreed with each other for each location. Here are the 50 locations, and we're comparing the drones. When you're comparing a drone to other drones, you've got a lot of comparisons: you're comparing drone one to itself across its three passes, and you're comparing drone one to drone two for each of the measurements. You could have passes 1, 2, 3 for drone one and passes 1, 2, 3 for drone two, and you're comparing all possible combinations of those readings. That's why the calculations get a little bit complex when you have multiple drones, but it's still just comparison. This chart shows the agreement among all the comparisons. We noticed that for locations between five and 10 the accuracy dropped quite significantly, which prompted further investigation as to why: it could have been the lighting, it could have been the location, it could have been something else interfering with the proper reading. You see most of the drones are reading accurately, 100%. This is agreement between the drones, so they were agreeing roughly 90 to 91% of the time, and these are the confidence intervals for each drone. This told us how well the drones compared to each other. Next we get the agreement comparisons; the tables show the agreement values comparing pairs of drones and drones to the standard.
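To make the "all possible combinations" idea concrete, here is a small sketch of how a per-location agreement percentage can be built up when three drones each make three passes: every reading at a location is compared with every other reading, and agreement is the fraction of matching pairs. The data layout, file name, and column names are hypothetical.

from itertools import combinations
import pandas as pd

df = pd.read_csv("drone_reads.csv")                # hypothetical columns: Location, Pass, Drone A, Drone B, Drone C

def location_agreement(group):
    readings = group[["Drone A", "Drone B", "Drone C"]].to_numpy().ravel()   # the 9 readings at this location
    pairs = list(combinations(readings, 2))                                  # 36 pairwise comparisons
    return sum(a == b for a, b in pairs) / len(pairs)

per_location = df.groupby("Location").apply(location_agreement)
print(per_location.head())                         # 1.0 where all nine readings match, lower where they disagree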
The Kappa index is given against the standard, and repeatability within drones and reproducibility are all excellent based on the Kappa statistics; agreement across the categories is also excellent. We're comparing drone A to drone B, drone A to drone C, and drone B to drone C, all showing excellent agreement, and we're comparing the drones to the standards, all in excellent agreement as well. Then this is the agreement summary, and this is the agreement by the different categories. Again, we can look at the attribute chart for effectiveness: the same way as before, we unclick all the agreement check boxes and click the effectiveness boxes. We see again that locations seven and eight had the lowest agreement to the standard; that could have been something associated with the lighting or some other issue there. Then, for the overall agreement to the standard by drone, you can see they're at about 95%. The drones are pretty accurate, they were pretty reproducible, and the repeatability was excellent. This is the effectiveness report. It's a little more elaborate now because we're comparing against each of the five characteristic standards, and these are the incorrect choices that were made for each one. Out of 150 possible measurements, drone A measured 142 correctly, drone B 145, and drone C 140. Effectiveness is the important one: how accurate were the drones? We can see the drones are all running around an average of about 95%, which appears to be highly effective. Then we have a detailed analysis by level provided in the misclassification report, so we can see individually how each of the different characterizations was measured correctly or incorrectly by each drone; these are the misclassifications. Again, let me go into JMP. Oh, one further example I meant to show over here: using Graph Builder, we can view the classifications and misclassifications by each drone. This is a really neat way of showing it; I wish JMP would include this as part of the output. You can see where the misclassifications occur: for example, for drone A and for drone C, most readings were classified correctly, but there are a few that were not. I like that kind of representation in Graph Builder. Now let's go back into JMP and do the attribute gauge analysis with multiple raters on the actual experiment that was run. We analyze distributions so we can compare the drones to the standard; again, we can just click on a standard and see how it compares across the drones. We can also use Graph Builder, putting drone A, drone B, and drone C over here and then adding the standard, and it shows very clearly what's happening. But we can also go into JMP and use Analyze > Quality and Process > Variability / Attribute Gauge Chart.
So we add the three drones here, we add the standard, we put in the location, and we get our attribute gauge chart report showing that the drones are at about 90% agreement with each other. This location group has the most difficult locations to characterize. Here are the agreement reports I've shown you: drone A, drone B, and drone C agreement with the other drones and with themselves. Drone A to drone B, and so on; these are the Kappa values. This is the agreement with the standard, all very high, and then these are the agreements across categories. Then, for the effectiveness, to get the graph we like to see in the effectiveness report, we clear the agreement boxes and click on effectiveness. We now have the effectiveness plot on the tab that shows how the drones agreed with the standard. Now let's go back to the PowerPoint presentation. To summarize what we've done here: the use of attribute gauge analysis allowed the company to provide solid data on the agreement and effectiveness of drones for inventory management, and the results are very impressive. Subsequent results reported on the company's website show inventory counts that are 35% faster, inventory costs reduced by 40%, and missed shipments and damage claims reduced by 50% compared to the previous methods. In addition, the system generates what we call actionable data for more accurate, effective, safer, more cost-effective, and faster inventory control. Some excellent references are Cohen's original paper, the book by Fleiss, which is excellent and has a lot of detail, and the book by Le, which is also well done. I thank you very much for listening. Have a good day.
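For readers who want to see the agreement statistic itself, here is a minimal sketch, independent of JMP and not part of the original talk, of how Cohen's kappa can be computed for one drone against the standard. The label sequences below are hypothetical stand-ins for one drone's classifications.

```python
from collections import Counter

def cohens_kappa(rater, standard):
    """Cohen's kappa for two label sequences of equal length."""
    assert len(rater) == len(standard)
    n = len(rater)
    observed = sum(r == s for r, s in zip(rater, standard)) / n  # observed agreement
    rc, sc = Counter(rater), Counter(standard)
    labels = set(rc) | set(sc)
    expected = sum((rc[l] / n) * (sc[l] / n) for l in labels)    # agreement expected by chance
    return (observed - expected) / (1 - expected)

# Hypothetical readings: one drone's classifications vs. the known standard labels.
standard = ["A", "B", "C", "D", "E", "A", "B", "C", "D", "E"]
drone_a  = ["A", "B", "C", "D", "E", "A", "B", "E", "D", "E"]
print(round(cohens_kappa(drone_a, standard), 3))  # prints 0.875; 1 would be perfect agreement
```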
Micol Federica Tresoldi, Senior Research Statistician, Dow Chemical Xinjie Tong, Senior Research Statistician, Dow Chemical   This case study investigates chemical mixtures to achieve optimal properties using design of experiment (DOE) data. The formulation space consists of four input variables: Chemical A Type, Chemical B Type, Chemical C Type, and Chemical D Content. The first three variables represent different compositions for making Chemical A, B and C, respectively, and as such can be coded both as categorical factors and as continuous mixture variables.   We created the DOE treating them as categorical due to the experimental constraints. However, at the data analysis stage, even after considering thousands of simulated hypothetical formulations, none of them was predicted to meet the desired properties. At that point, to be able to identify promising subregions, we needed to overcome the discreteness of the space. So, we recoded those factors as continuous and mixture variables, derived the equivalent regression model, and reran the simulations. Indeed, under certain assumptions, this coding strategy enables one to interpolate and consider compositions not present in the original DOE.   In this presentation, we demonstrate how to use the JMP Pro 16 Profiler Simulation feature with Graph Builder to achieve an extensive and insightful exploration of the formulation space, applicable to diverse fields.     Hello everyone. My name is Micol Tresoldi. Today my talk will be about coding with continuous and mixture variables to explore more of the input space. Before I jump into the topic, I'd like to give you a brief outline of what my presentation will look like. I'll start by presenting a general idea of the project and the objective that was driving it, and then I'll present the initial approach that we took to pursue this objective. I'll then show you that, following this initial approach, we encounter a problem. At that point, we'll need to go back to the beginning of the problem setting and look at it from a slightly different perspective, so that we can figure out an alternative way of looking at our input variables. In doing this, we'll be altering our data structure. But I'll show you how we can actually build an equivalent statistical model, so that we will not need to go and collect any additional data; we'll be able to re-analyze the exact same data and still, hopefully, overcome our initial problem and find some useful directions to go. This is the overview of the presentation. Let me start by giving you the general idea of the project. When the client first reached out to us, they had something in mind in terms of having some ingredients they needed to mix together so that the final formulation exhibited some optimal properties. More specifically, any formulation was going to be judged on two properties, and each of these properties had to meet certain optimality criteria. As I just stated, the problem itself is pretty general in nature.
We'll have some ingredients; using a common analogy, we can think of ourselves in the kitchen, having some ingredients and having to figure out a way to mix them so that, at the end, our cake will look nice and also taste good. This is the general framework. Now, let me give you some more details about this specific cake. The recipe calls for four ingredients: Factor A, Factor B, Factor C, and Factor D. For Factors A, B, and C, the amount to put in the recipe is predetermined; we don't have freedom there. What we need to decide, though, is how we're going to make those ingredients. There are multiple ways of making them, because we have multiple raw materials that we can employ to arrive at those ingredients, and only after having these ingredients ready can we actually employ them in the final recipe. This is for Factors A, B, and C. For Factor D, on the other hand, there is only one raw material we can use, only one way of making it; what we need to decide is how much of Factor D we're going to put in the final recipe. Just to recap, in terms of the decision-making problem, we'll need to decide four things: how we make Factor A, how we make Factor B, how we make Factor C, and how much of Factor D we're going to put in the recipe. Now I need to be a little more specific about these ways of making Factors A, B, and C. The client, when they came to us, had relatively few options in mind. For Factor A, they wanted to consider two raw materials: either only using raw material A1, or only using raw material A2. For Factor B, once again there are only two raw materials, B1 and B2, and the possible ways of making Factor B were either the two pure blends of B1 and B2, or a 50-50 blend of B1 and B2. For Factor C, three raw materials are available: either the three pure blends, C1, C2, C3, or, as a fourth option, a 50-50 blend of C1 and C2. With respect to the Factor D quantity, which I'm going to denote from now on by X1, they wanted to test four possible levels, four possible amounts: 5, 10, 15, and 20. The response variables are slightly more straightforward, in the sense that there are only two of them, both continuous. Each of them, as I mentioned in the beginning, had to meet a certain optimality criterion: Y1 had to be above 17, and Y2 had to be above 2.6. So now we have, on our left, the input variables that we need to decide how to maneuver and vary in making the recipe, and on the right side, the properties that we're interested in.
What we decided to do was to propose to our client a designed experiment: we would go out and make some of these recipes, some of these formulations, and from the collected data, after recording the properties for the [inaudible 00:06:32] actual observed formulations, understand and infer the relationships linking the input variables, how we were making our recipe, and the response variables, how the properties behaved for different combinations of the inputs. Ultimately, the objective of the project was to figure out whether there was an optimal recipe, meaning a recipe whose properties both met their respective optimality criteria. Given this setting, it's pretty clear that X1 is going to be a quantitative variable. But how about Factors A, B, and C? Given that we can mix these raw materials, are we going to treat them as categorical or as numeric? At this stage, because the client was particularly interested in observing the performance of these specific compositions of the raw materials for making Factors A, B, and C, we decided to accommodate their request and coded them as categorical variables, so that we were sure those specific compositions would show up in the design of experiment. Categorical coding here means that each level of the categorical variable corresponds to a possible way of making the ingredient or factor. We end up with three categorical variables with two, three, and four levels, respectively. It turns out that this categorical coding was also pretty helpful in the discussion of how we wanted to specify the statistical model that, in principle, was assumed to be comprehensive enough to describe and capture the relationships between the factors and the responses. With the categorical coding, it was particularly easy for the client to identify and specify which interaction terms they expected to be relevant in explaining the relationships. The final statistical model that we specified for the design of experiment comprised the main effects, all two-way interactions, quadratic and cubic terms for the continuous variable, and, in addition, the interaction of the quadratic term with one of the factors. Of course, we also had a constraint on the number of experiments available, because we don't have an infinite amount of resources, so we put a constraint of 51 runs. This is the DOE that JMP gave us, able to estimate the statistical model we just specified while staying within the constraints on our resources. With this, the only thing left to do was go and make these 51 formulations. Imagine that we're super quick, and everything is magic, and we have already gone and made all of our formulations and collected the data. Now we are in good shape for estimating the Gaussian model that we specified.
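As a reference point, the model just described can be written schematically as follows. This is my own notation rather than the speaker's (the talk gives no explicit equation): $A$, $B$, $C$ denote the categorical factors, $x_1$ the Factor D content, and the last term stands for the one quadratic-by-factor interaction that was included (the talk does not say which factor it involves).

$$
y \;=\; \beta_0 + f_A(A) + f_B(B) + f_C(C) + \beta_1 x_1 \;+\; \sum_{\text{pairs}} (\text{two-way interactions}) \;+\; \beta_2 x_1^2 + \beta_3 x_1^3 + g(\cdot)\,x_1^2 \;+\; \varepsilon
$$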
These are the results for the first property, Y1. We can see that there is a pretty good fit between predicted and actual values, and the metrics reported in the model summary look pretty satisfactory. The same is true if we look at the second property, Y2: again, a pretty good fit. We are happy with our models, and we think we did a good job in capturing the relationships. Now remember that what we really want to discover is whether there is any optimal recipe that can meet both criteria for our properties. How are we going to establish whether such an optimal recipe exists or not? Well, in JMP Pro 16 this is a super easy task, because we can simulate thousands of potential alternative recipes by using the Profiler feature. For each of these hypothetical recipes, we automatically have in the same table the predicted mean value for the two properties, so it becomes very natural and easy to see if there is any optimal recipe. Just to give you an idea of how quick that is, I want to show you live how we can do this. This is my DOE categorical table, where I have my Factors A, B, and C; X1 is my only quantitative input variable; and I have my recorded values for the two properties, Y1 and Y2. Imagine that we have already run the model, estimated it, and saved the prediction formulas for the two variables here. We can highlight these two columns, go to Graph, select Profiler, put those two prediction formulas in the Y, Prediction Formula box, and click OK. This is the usual way we get a Profiler dialog, and we can easily play around, changing the levels of the inputs and seeing how this impacts our predictions for the two properties. However, what I want to show you today is how we can go to the red triangle and ask JMP to output a random table, and we can make it as big as we like. I'm going to start with 30,000 rows. You'll see it took no time for JMP to give us this table of 30,000 rows, where each row corresponds to a hypothetical recipe that we haven't necessarily seen in the DOE. This is the power of having this feature in JMP: we can explore the input space in literally no time. Now, if we are interested in seeing whether there is one recipe that is optimal, we can go to Graph Builder and plot the predicted values for Y1 against the predicted values for Y2. Then, just to aid our visualization, I'm going to put a vertical line at the optimal threshold for Y2, and likewise a horizontal line marking the optimal threshold for Y1. The upper quadrant denotes the optimal region, because there both properties satisfy the optimality criteria. Unfortunately, we can see that we don't find any recipe that is able to satisfy both criteria. This is not very good news. Now let me go back to my presentation very quickly.
We can see, in fact, that we don't have any recipes lying in the quadrant with the happy green smiley. What do we do at this point? Do we give up? Of course not. What we can do is go back to the beginning of the problem and see if we can change any of the initial choices we made in approaching it. In particular, you might remember that we were undecided whether to treat Factors A, B, and C as categorical or as numeric. So far we have treated them as categorical: Factor A has been a categorical variable with two levels, either only using A1 or only using A2. However, the client was in fact open to mixing the raw materials to make Factor A; that was an option. So what we can think of is substituting Factor A with a variable that I now call A1 Content, a quantitative variable which represents how much A1 I'm going to put into the mixture of A1 and A2 for making Factor A. The translation between categorical levels and numerical values is almost immediate. If I'm only using A1, I'm using 100% A1 in my mixture, so I can code A1 Content as equal to one. On the opposite side, if I'm only using A2, this means I have zero A1 Content in my mixture, and therefore A1 Content is equal to zero. You might have guessed that, implicitly, we are also defining A2 Content to be equal to 1 - A1 Content, but we don't really need that variable because we are only looking at two mixture components. Why are we doing this? Well, the advantage is clear. With Factor A, we were constrained to A1 Content equal to zero or one. Now that we're considering a continuous coding, A1 Content can take any value between zero and one. This represents an enormous jump in the flexibility of our model, in the sense that we are now open to literally infinitely more mixtures and infinitely more ways of making Factor A. Likewise, Factor B is categorical with three levels: only B1, only B2, or a 50-50 blend. Following similar logic, we can now introduce a B1 Content continuous variable. Again, the conversion is exactly the same; a 50-50 blend of B1 and B2 is converted to 0.5, because I'm using 50% B1 and 50% B2, and again B2 Content is 1 - B1 Content. The advantage, again, is that we're not bound to jump from zero to 0.5 or from zero to one; we can explore the whole spectrum of values from zero to one. Factor C is slightly more tricky, because we have three possible raw materials to mix. At this stage, we need to introduce not just one, but three continuous variables that, besides being continuous, also carry the mixture constraint, meaning that at all times they need to sum to one. But the conversion between the levels of Factor C and the three new mixture variables follows exactly the same logic; that's super easy. This is just a visualization of how we convert the levels.
This is how the DOE points that we already have data on, so we don't need anything else, sit within the continuous coding space. Now, the only more involved step in passing from the categorical coding to the continuous coding is how we convert the statistical model that we used to design the experiment and then to analyze the data. How are we going to do this? The easiest way is to do it in many small steps: we start with our main effects model and add the different factors little by little. We start with Factor A, which had only two levels. In the continuous coding, what we're going to put in is A1 Content, and only its linear term. In fact, we only had one coefficient for Factor A in the categorical coding model, and likewise we now have one single coefficient for A1 Content. If you don't believe me that this is an equivalent model, I'm going to show you a couple of examples. Imagine we want to figure out the impact of using only A2 for making Factor A; that means A1 Content is zero. From the categorical coding model, we just look at the intercept term, because the extra term refers to when we use A1. On the other side, for the continuous coding model, we take the intercept and then the A1 Content coefficient, but multiplied by zero because A1 Content is zero. Without even doing any math, you can see that these two numbers are exactly the same. Similarly, if we want to see the impact of using only A1, then A1 Content is equal to one. For the categorical coding, I sum the intercept term plus the Factor A coefficient accounting for the difference between the levels of the factor. On the other side, we again include the intercept, and this time we multiply the A1 Content coefficient by one because the content is one. Again, without any math, the two numbers here are the same as the two numbers there: exactly equivalent. Now, Factor B had three levels. How are we going to handle that? Because it has three levels, we can't just add the linear term; we also need to add the quadratic term. We had two coefficients before, and we're going to have two coefficients now with the continuous coding. Again, if you don't believe this is an equivalent model, we can work out at least one example, which works exactly as before. If I only have B2, B1 Content is zero, which means the two coefficients get zero weight in computing the impact, and therefore the two numbers are, in fact, the same. I'm not going to go through it again, but only B1 is equivalent to B1 Content equal to one. The most interesting case is the one that requires you to do some summation, where B1 Content is 0.5, because we are considering the 50-50 blend.
You can verify easily that these two numbers summed up are equivalent to the other side of the equation, where we put in 0.5 and 0.5 squared, because now our B1 Content is equal to 0.5. Now, for Factor C we had four levels, and as you remember, we had three possible raw materials, so we had to introduce three mixture variables. Every time we deal with mixture variables, things get slightly more complicated, because they become perfectly collinear with any constant term: putting in C1, C2, and C3, whose sum is constant, requires us to delete the intercept. But other than that, everything follows pretty much the same way. We had three coefficients before, and we still have three coefficients now, because we have four terms but we are getting rid of the intercept; still the same balance. Again, I'm not going to go through all of the examples, but you're more than welcome to look at the slides offline and check that they do, in fact, always give the same answers. These are all the examples. With this much work, we have found the conversion of the main effects: how we convert each separate factor into the new continuous variables. Our original model, though, included more than just main effects; we had the two-way interactions. The idea here is that every time Factor A appears, I substitute it with A1 Content; every time Factor B appears, I substitute it with the two terms B1 Content and B1 Content squared; and likewise for Factor C, I substitute it with the terms I've put here. The same holds when interacting with X1; everything follows the same scheme. The only caution you want to be aware of, and be particularly attentive about, is that every time you interact a term with the three mixture variables, the main effects that you originally had now need to be excluded from the model; otherwise, the model won't be estimable. That's the only caution you need to be careful about. Other than that, we're ready to go: we've got our equivalent continuous model. Now we can verify that everything is still the same: I get exactly the same predictions whether I use the categorical coding or the continuous coding. You might ask yourself: why go to so much trouble if things are exactly the same? Well, the advantage is immediate to see, and you can really appreciate it if you start looking at the profilers. This is how the profiler looks when you use the categorical coding: you have to jump between the different levels, and you don't have the faintest idea what can happen in between. With the continuous coding, on the other hand, that's exactly what you can do: you can explore many more of the possible ways of making the various ingredients, Factors A, B, and C, in a way that before was just out of bounds. In technical terms, it means that we have much more power of interpolation.
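To make the conversion concrete, here is one way to write the Factor A and Factor B pieces in both codings. The notation is mine, not the speaker's: $x_A$ is A1 Content, $x_B$ is B1 Content, and the categorical models use the only-A2 / only-B2 level as the baseline, which matches the intercept-only comparisons described above.

$$
\begin{aligned}
\text{Factor A:}\quad & \beta_0 + \beta_A\,\mathbf{1}[A=\text{A1}] \;\equiv\; \beta_0 + \gamma_1 x_A, && x_A \in \{0,1\},\ \ \gamma_1 = \beta_A,\\
\text{Factor B:}\quad & \beta_0 + \beta_{B1}\,\mathbf{1}[B=\text{B1}] + \beta_{50}\,\mathbf{1}[B=50\text{-}50] \;\equiv\; \beta_0 + \delta_1 x_B + \delta_2 x_B^2, && x_B \in \{0,\ 0.5,\ 1\},
\end{aligned}
$$

with $\delta_1,\delta_2$ chosen so the quadratic passes through the three level means ($\delta_1 + \delta_2 = \beta_{B1}$ and $0.5\,\delta_1 + 0.25\,\delta_2 = \beta_{50}$). For Factor C, the intercept is dropped and Scheffé-type mixture terms in $(C1, C2, C3)$ take its place.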
This extra interpolation power doesn't come free, of course. The price you pay is that you are implicitly making some assumptions. The assumptions regard the way the new continuous variables we have introduced are related to the responses: we are implicitly assuming that the relationship between A1 Content and our properties is linear, that the relationship for B1 Content is quadratic, and so forth. If you think those assumptions don't hold in your case, then of course the whole procedure is questionable and you don't want to pursue it. But if you don't have any reason to disbelieve them, or at least no reason not to explore the possibility, then we can go back and do the same exercise, exploring the input space again but with much more flexibility. Again, let's see if we can find an optimal recipe with this new continuous mixture coding. How are we going to do it? Exactly the same way: I'm going to use the JMP Profiler feature, use the simulation, and see if we can find anything. Now let me go here. This is my DOE continuous table; continuous, because now you can see that these are all coded as continuous variables, with the blue triangle next to them, and C1, C2, and C3 also have the stars indicating that they're coded as mixture variables in JMP. Imagine again that we have already fitted our model with the Fit Model platform and saved our prediction formulas, now with the continuous coding. We do the same thing: Graph, Profiler, select those formulas, and here is our prediction profiler. Now we can play much more with the profiler and see all the different combinations without having to jump between discrete options. Once again: red triangle, output random table. Just to make things fair, I'm going to ask for 30,000 rows. Again, in literally the blink of an eye, JMP gives you a 30,000-row table where every row is again a potential hypothetical recipe that we haven't necessarily seen in our DOE, but that is still feasible, because it still respects the constraints we had at the beginning. Once again, to figure out whether something good is happening, or at least whether within these 30,000 formulations we find something optimal, I'm going to construct the same graph. Now you can see that our points are all dispersed and not aligned anymore. Again, I add the axes just to aid our visualization. And this is the nice thing: with this way of coding and looking at more of the input space, we do find a few formulations that seem to be promising. Of course, we need to keep in mind that these are predicted values; everything still relies on our data and on our statistical model, but it is still more promising than before. We do find something in the optimal region defined by these two axes.
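The random-table-plus-filter idea is easy to reproduce outside JMP. Here is a minimal sketch, assuming a hypothetical pair of prediction functions predict_y1 and predict_y2 that stand in for the saved prediction formulas (the coefficients below are invented for illustration) and using the thresholds Y1 > 17 and Y2 > 2.6 from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30_000

# Sample hypothetical formulations in the continuous/mixture coding:
a1 = rng.uniform(0, 1, n)                 # A1 Content in [0, 1]
b1 = rng.uniform(0, 1, n)                 # B1 Content in [0, 1]
c = rng.dirichlet([1, 1, 1], n)           # (C1, C2, C3) sum to 1 (mixture constraint)
x1 = rng.uniform(5, 20, n)                # Factor D content in [5, 20]

# Invented stand-ins for the saved prediction formulas (not the real fitted models):
def predict_y1(a1, b1, c, x1):
    return 12 + 3*a1 + 4*b1 - 2*b1**2 + 2*c[:, 0] + 0.3*x1 - 0.01*x1**2

def predict_y2(a1, b1, c, x1):
    return 1.5 + 0.8*a1 + 0.5*c[:, 1] + 0.05*x1

y1, y2 = predict_y1(a1, b1, c, x1), predict_y2(a1, b1, c, x1)
optimal = (y1 > 17) & (y2 > 2.6)          # the "upper quadrant" region
print(f"{optimal.sum()} of {n} simulated recipes fall in the optimal region")
```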
Quickly going back to my presentation, I want to draw a final conclusion, which is that with the categorical coding, we couldn't find any recipe that, at least on the predicted values, could meet both optimality criteria. But once we figured out how to recode these categorical variables as continuous and mixture variables, and exploited JMP's power of giving us thousands and thousands of simulated formulations, we do find a few that meet the specs. We were happy that we could at least go back to our client and say: instead of giving up on your project, try making these formulations and see whether the actual properties meet your criteria or not; at the very least, it gives us some directions of improvement. With this, I'd like to end my presentation. I thank my colleague, Xinjie Tong, and all of my collaborators at Dow Chemical. Thank you, all of you, for watching my presentation. I'll be more than happy to answer any questions you might have at this point. Thank you.
Monday, September 12, 2022
It is common to need to compare two populations with only a sample of each population. Statistical inference is often used to help the comparison. Our presentation is limited to statistical inference that involves two hypotheses: the null hypothesis and the alternative hypothesis. Sometimes the goal of the comparison is to provide sufficient evidence to decide that there is a significant difference between two populations. At other times, the goal is to provide sufficient evidence that there is significant equivalence, non-inferiority, or superiority between two populations. Both situations can be assisted with a hypothesis test, but they require different tests. We review these situations, the appropriate hypotheses, and the appropriate tests using common examples.   Another common comparison is between two measurements of the same quantity. This situation is broadly covered by Measurement System Analysis. Our presentation focuses instead on the Method Comparison protocol for chemical and biological assays used by pharmaceutical and biotechnology development and manufacturing. We present two methods that are available in JMP 17 to assess the accuracy of a new test method against an established reference method. One method is known as Deming regression or Fit Orthogonal in JMP. The second method is known as Passing-Bablok regression. We review the background of assessing accuracy and the unique nature of data from method comparisons, and demonstrate both regression methods with examples.     Hello. My name is Mark Bailey. I'm a Senior Analytics Software Tester at JMP. My co-presenter today is Jianfeng Ding, a Senior Research Statistician Developer. I'm going to start the presentation about some new approaches to comparisons that will be available in JMP 17, beginning with an introduction to our topic. Before we talk about specific comparisons, we'd like to introduce some fundamental concepts. All of this has to do with comparing populations. Comparing populations is a very common task, and the comparison, we hope, will lead to a decision between two hypotheses. Samples from these populations are often collected for the comparison, and statistical inference can provide valuable information about our samples; in particular, is there sufficient evidence to reject one hypothesis about these populations? A clear statement of the hypotheses is essential to making the correct choice of a test for your comparison. These hypotheses represent two mutually exclusive ideas that together include the only possibilities. They're called the alternative and null hypotheses. The alternative hypothesis is a statement about the conclusion that we want to claim. It serves to represent the populations, and it will require sufficient evidence to overthrow the other hypothesis, the null hypothesis. The null hypothesis states the opposing conclusion that must be overcome with strong evidence. It serves as a reference for comparison, and it's assumed to be true.
It is important that we sort this out today because, historically, statistical training has presented only one way of using these hypotheses. The most often taught statistical tests are used to demonstrate a difference between the populations. But that's not the only possibility, and a lack of understanding of this distinction can lead to misusing these tests. The choice of a test is not a matter of the data that's collected or how the data is collected; it's strictly a matter of the stated hypotheses for the purpose of your comparison. Let's look at two similar examples that are actually fundamentally different, starting with the case where the purpose is demonstrating a difference. In this example, I would like to demonstrate that a change in temperature will cause a new outcome, an improvement perhaps. We want to claim that a new level of our response will result from changing the process temperature. We'll use a designed experiment to randomly sample from a population for the low temperature condition and the high temperature condition. The two hypotheses are: the null states that the temperature does not affect the outcome; this will be our reference. The alternative states our claim, which is that the temperature affects the outcome, but we accept it only if the evidence is strong enough to reject the null hypothesis. The second example is going to sound very similar, but it's exactly the opposite. In example two, we need to demonstrate equivalence. Here we want to demonstrate that a temperature change does not cause a new outcome; that is, after the change, we have the same outcome. For example, this might be the case where we are planning to change the process temperature to improve the yield, but we want to make sure that it doesn't change the level of an impurity in our product. We design the same experiment to collect the same data, and we have the same two hypotheses, but now they're reversed. It's the null that states that the temperature affects the outcome, that is, there's a difference, while the alternative states that our change in temperature will not affect the outcome. Are we testing for a difference or for equivalence? We see from these examples that it's not the data; the data are identical, but the tests are different. The choice is not about the data; it's about our claim, or in other words, how we state our hypotheses. Also remember that hypothesis tests are unidirectional: they serve only to reject a null hypothesis, with high probability when it's false. In our presentation today, we'd like to introduce some new equivalence tests as well as some additional methods that are used when comparing two measurement systems. I'm now going to hand it over to Jianfeng to talk about equivalence tests. Thanks, Mark. Hello. I'm Jianfeng Ding. I'm a Research Statistician Developer at JMP. In this video I'm going to talk about the equivalence, non-inferiority, and superiority tests in JMP 17. The classical hypothesis test on the left is the test that most quality professionals are familiar with.
It is often used to compare two or more groups of data to determine whether they are statistically different. The parameter theta can be a mean response for a continuous outcome, or a proportion when the outcome variable is binary. Theta t represents the response from the treatment group and theta zero represents the response from a control group. There are three types of classical hypothesis tests: the first is the two-sided test, and the rest are one-sided tests. If you look at the two-sided test on the left, the null hypothesis is that the treatment means are the same, and the alternative hypothesis is that the treatment means are different. Sometimes we really need to establish that things are substantially the same, and the machinery to do that is called an equivalence test. An equivalence test shows that the difference between theta t and theta zero is within a prespecified margin delta, allowing us to conclude equivalence with a specified confidence level. If you look at the equivalence test, the null hypothesis is that the treatment means are different, and the alternative hypothesis is that the treatment means are within a fixed delta of one another. This is different from the two-sided hypothesis test on the left. Another alternative testing scenario is the non-inferiority test, which aims to demonstrate that results are not substantially worse. There is also a scenario called superiority testing, which is similar to non-inferiority testing except that the goal is to demonstrate that results are substantially better. There are five different types of equivalence-type tests, depending on the situation; when to use each test is discussed next. These tests are very important in industry, especially in the biotech and pharma industries. Here are some examples. If the goal is to show that the new treatment does not differ significantly from the standard one by more than some small margin, then an equivalence test should be used. For example, consider a generic drug that is less expensive and causes fewer side effects than a popular name-brand drug: you would like to prove it has the same efficacy as the name-brand one. The typical goal in non-inferiority testing is to conclude that a new treatment or process is not significantly worse than the standard one. For example, a new manufacturing process is faster; you would want to make sure it creates no more product defects than the standard process. A superiority test tries to prove that the new treatment is substantially better than the standard one. For example, a new fertilizer has been developed with several improvements, and the researcher wants to show that the new fertilizer is better than the current fertilizer. How do we set up the hypotheses? The graph on the left summarizes these five different types of equivalence-type tests very nicely. This graph was created by the SAS STAT colleagues John Castelloe and Donna Watts; you can find their white paper easily on the web. Choosing which test to use depends on the situation. For each situation, the blue region is the region that you are trying to establish with the test.
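As a rough illustration of the machinery behind an equivalence comparison of two means, here is a minimal two one-sided tests (TOST) sketch in Python, assuming normally distributed data and a pooled-variance t statistic. The arrays and the margin delta are made up for illustration and are not the drug data used in the demonstration that follows.

```python
import numpy as np
from scipy import stats

def tost_two_sample(x, y, delta, alpha=0.05):
    """Two one-sided tests for equivalence of two means within +/- delta."""
    nx, ny = len(x), len(y)
    diff = np.mean(x) - np.mean(y)
    sp2 = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    se = np.sqrt(sp2 * (1 / nx + 1 / ny))
    df = nx + ny - 2
    # H0: diff <= -delta  vs  Ha: diff > -delta
    p_lower = 1 - stats.t.cdf((diff + delta) / se, df)
    # H0: diff >= +delta  vs  Ha: diff < +delta
    p_upper = stats.t.cdf((diff - delta) / se, df)
    p_tost = max(p_lower, p_upper)          # equivalence requires rejecting both one-sided nulls
    ci = diff + se * stats.t.ppf([alpha, 1 - alpha], df)  # 90% CI when alpha = 0.05
    return diff, ci, p_tost

# Made-up measurements for two treatments and an equivalence margin of 3:
x = np.array([10.1, 11.3, 9.8, 10.6, 10.9, 11.0])
y = np.array([10.4, 10.0, 11.1, 10.7, 9.9, 10.8])
diff, ci, p = tost_two_sample(x, y, delta=3)
print(f"difference {diff:.2f}, CI {ci.round(2)}, TOST p-value {p:.4f}")
```

Concluding equivalence when this confidence interval lies entirely inside (-delta, +delta) is the same decision rule described next for the blue equivalence region.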
For an equivalence analysis, you can construct an equivalence region with upper bound theta zero plus delta and lower bound theta zero minus delta, and conduct the equivalence test by checking whether the confidence interval of theta lies entirely in the blue equivalence region. Likewise, you can conduct a non-inferiority test by checking whether the confidence interval of theta lies entirely above the lower bound if larger theta is better, or below the upper bound if smaller theta is better. These tests are available in Oneway for comparing normal means and in Contingency for comparing response rates. The graphical user interface of the equivalence test launch dialog makes it easy for you to find the type of test that corresponds to what you are trying to establish, and a [inaudible 00:12:00] in the report summarizes the comparison very nicely and makes it easy to interpret the results. Next, I'm going to demonstrate equivalence tests. As my first example, I'm going to use the data set called Drug Measurements from the JMP sample data. Twelve different subjects were given three different drugs, A, B, and C, and 32 continuous measurements were collected. We go to Fit Y by X and load the response and the treatment. This brings up the Oneway analysis. Under the red triangle, find Equivalence Test. There are two options, Means and Standard Deviations; we are going to focus on means in this talk. We bring up the dialog, and you can select the test that you would like to conduct; the graph will represent the selected test. For the superiority or non-inferiority test there are two scenarios, larger difference is better or smaller difference is better; choose the option depending on the situation. You also need to specify the margin delta here, and the significance level alpha as well. You can choose to use the pooled variance or unequal variances to run the test, and you can do all pairwise comparisons or a comparison with a control group. We're going to run an equivalence test first and specify 3 as the margin for the difference. We click the OK button, and here is the result of the equivalence test. From this forest plot you can see that the confidence interval for the mean difference between drug A and drug C is completely contained in the blue equivalence region. The maximum p-value is essentially zero, less than .05, so we can conclude, at the .05 significance level, that drug A and drug C are equivalent. But if we look at drug A and B, and drug B and C, we can see that the confidence intervals of the mean differences both extend beyond the blue region, so at the .05 significance level we cannot conclude that drug A and B, or drug B and C, are equivalent. Now assume drug C is our standard drug and we would like to find out if the measurements of drug A or B are much better than drug C. We can run a superiority test to prove that. Let me close this outline node first and bring up the launch dialog again. This time we're going to do a superiority test. For this test we believe a larger difference is better, so we keep this selection.
Also, for this study, we want to set drug C as our control group. We plug in the delta, a margin of .04 for this case, and click the OK button. Here is the result of the superiority test. From the forest plot you can easily see that the confidence interval of the mean difference between drug B and C is completely contained in the superiority region, and the p-value is less than .05, so we conclude that drug B is superior to drug C. The confidence interval of the mean difference between drug A and C extends beyond the blue region, and the p-value here is much bigger than .05, so at the .05 significance level we cannot conclude that drug A is superior to drug C. This concludes my first example. Now I'm going to use a second example, based on the relative risk between two proportions, to show you how to conduct a non-inferiority test. Bring up the data table. The trial compares a drug called FIDAX as an alternative to the drug VANCO for the treatment of colon infections. Both drugs have similar efficacy and safety. 221 out of 224 patients treated with FIDAX achieved clinical cure by the end of the study, compared to 223 out of 257 patients treated with VANCO. We launch Fit Y by X again and put in our response and treatment variables, with the count as Freq. Since the response variable is categorical, a contingency analysis is produced, and all the tests here are based on the classical hypothesis test. The p-value suggests that we cannot conclude that clinical cure is different across the drugs. But for this study we really want to find out if drug FIDAX is not inferior to drug VANCO. We go to the red triangle menu and find Equivalence Test; there are Risk Difference and Relative Risk options, and we choose Relative Risk for this case. In the launch dialog we choose the non-inferiority test, and a larger ratio is preferred for this study. We also need to define the category of interest; here we select Yes as the category of interest, and we plug in our ratio margin, specifying 0.9. We click the OK button, and here is the result of the non-inferiority test. From the forest plot you can easily see that the confidence interval for the relative risk between drug FIDAX and drug VANCO is completely contained in the non-inferiority region. We conclude, at the .05 significance level, that drug FIDAX is not inferior to drug VANCO. This concludes my talk, and I will give it back to Mark. Thank you, Jianfeng. I'm now going to talk about a very common procedure called method comparison. It's a standard practice whenever new measurement methods are being developed. We assume that a standard method already exists to measure the level of some quantity, perhaps the temperature or the potency of a drug. A new method has been developed for some reason, and we want to make sure that its performance is comparable to the standard method. Today there are many standards that have been developed over many years by various organizations to make sure that this is done properly. What we would hope is that the new test method ideally returns the same value as the standard method.
A scatter plot of the test method versus the standard method would show that the data agree with the identity line Y = X. Of course, the data points won't perfectly agree, because of measurement error in both the standard method and the new test method. Regression analysis can determine the best fit line for this data, and the estimated model parameters can be compared to that identity line. This ends up being stated in the two hypotheses as follows. The null hypothesis says that the methods are not comparable; another way of saying that is that the intercept is not zero and the slope is not one. The alternative represents our claim that the new method is comparable, so we would expect the intercept to be zero and the slope to be one. We'll compare by using regression. Ordinary least squares regression assumes a few things: that Y and X are linearly related; that there are statistical errors in Y but not in X; that these statistical errors are independent of Y, that is, they're constant for all Y; and that no data exert excessive influence on the estimates. But in the case of a method comparison, the data often violate one or more of these assumptions. There are measurement errors in the standard method as well. Also, the errors are not always constant; we might instead observe that the coefficient of variation is constant, that is, the errors are proportional, but the standard deviation is not constant. Finally, there are often outliers present that can strongly influence the estimation of these parameters. Other regression methods can help. Deming regression simultaneously minimizes the least squared error in both Y and X, and Passing-Bablok regression is a non-parametric method: it's based on the median of all possible pairwise slopes, and because of that it's resistant to outliers and non-constant errors. Deming regression is available in JMP through the Bivariate platform using the Fit Orthogonal command. It can estimate the regression several ways: it can estimate the error in both Y and X, it can assume that the errors in Y and X are equal, or it can use a given ratio of the error of Y to X. Passing-Bablok is now available in JMP 17, again through the Bivariate platform, using the Fit Passing-Bablok command. It also includes checks for the assumptions that the measurements are highly positively correlated and exhibit a linear relationship. There's also a comparison by differences. The Bland-Altman analysis compares the pairwise differences to the pairwise means to assess the bias between the two measurements. The results are presented in a scatter plot of Y versus X for your examination, and also to see if there are any anomalies in the differences. This is all provided through the Matched Pairs platform, along with several hypothesis tests. I'll now demonstrate these features. I'm going to show you Deming regression for completeness; it has actually been available in JMP for many years.
I'm going to use a data table that has measurements for 20 samples by the standard method and then four different test methods; I'm just going to use method 1. I start by selecting the Analyze menu, Fit Y by X. The standard goes in the X role, while method 1 goes in the Y role. Here we have the scatter plot to begin with. I'll click the red triangle and select Fit Orthogonal, and you can see the different choices I mentioned a moment ago. I'm going to have JMP estimate the errors in Y and X. There's the best fit line using Deming regression, along with information about the fit. We can see that the intercept for the estimated line is close to zero, our slope is close to one, and in fact our confidence interval includes one. Now I'm going to show you Passing-Bablok. I return to the same red triangle, select Fit Passing-Bablok, and a new fitted line is added to my scatter plot. It looks very much like the result from the Deming regression, but remember that Passing-Bablok is resistant to outliers and non-constant variance. First we have Kendall's test, which tells us about the correlation: the positive correlation is statistically significant. We then have a check for linearity, and we have a high p-value here, indicating we cannot reject linearity. Finally, we have the regression results. I see that I have an intercept close to one, but the interval includes zero, so I can't reject zero. The slope is close to one, and my interval includes one, so I can't reject that the slope is one. Finally, from the Passing-Bablok fit's menu, I'll click the red triangle and select Bland-Altman analysis. This launches the Matched Pairs platform in a separate window. Here we are looking at the pairwise differences between method 1 and the standard versus the mean of those two values; we're using this to assess bias. The Bland-Altman analysis is reported at the bottom. The bias is the average difference, and we hope that it's zero. The estimate is not exactly zero, but we can see that the confidence interval includes zero, so we would not reject zero. We also have the limits of agreement, and we see that they include zero as well. The standard methods that are used when comparing two measurement methods are now available in JMP 17. That concludes our presentation. Thank you for watching.
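For readers who want to see what these fits compute, here is a minimal sketch, outside JMP, of Deming regression with a known error-variance ratio and of the core Passing-Bablok idea (the median of all pairwise slopes; this simplified version omits the offset correction for ties and negative slopes used in the full published procedure). The paired x and y values are made up for illustration.

```python
import numpy as np
from itertools import combinations

def deming(x, y, delta=1.0):
    """Deming regression; delta is the ratio of the y-error variance to the x-error variance."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxx, syy = np.var(x, ddof=1), np.var(y, ddof=1)
    sxy = np.cov(x, y, ddof=1)[0, 1]
    slope = (syy - delta * sxx + np.sqrt((syy - delta * sxx) ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    return np.mean(y) - slope * np.mean(x), slope

def passing_bablok_simplified(x, y):
    """Median of all pairwise slopes, then median intercept (no tie/offset correction)."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2) if x[j] != x[i]]
    slope = np.median(slopes)
    intercept = np.median(np.asarray(y) - slope * np.asarray(x))
    return intercept, slope

# Made-up paired measurements: standard method (x) vs. new test method (y).
x = [1.0, 2.1, 3.0, 4.2, 5.1, 6.0, 7.2, 8.1]
y = [1.1, 2.0, 3.2, 4.1, 5.3, 6.1, 7.0, 8.3]
print("Deming:         intercept %.3f, slope %.3f" % deming(x, y))
print("Passing-Bablok: intercept %.3f, slope %.3f" % passing_bablok_simplified(x, y))
```

With delta = 1, the Deming fit reduces to orthogonal regression, which is the "errors in Y and X are equal" option described in the demonstration.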
This presentation demonstrates how to design a HIIT (High-Intensity Interval Training) profile to help a type 2 diabetes patient avoid insulin glargine injections. In addition to meal control and taking metformin and/or insulin, diabetes patients should exercise at a higher heart rate to burn sugar faster.   A full factorial DOE of treadmill settings (incline and speed) was conducted to build a heart rate RSM model to design the optimal HIIT profile. Based on the RSM model, interaction effects were all very small, which may indicate the treadmill heart rate model is not coupled (complicated). Heart rate is linearly proportional to incline level (potential energy when the incline angle is small) and quadratic in speed (kinetic energy). To avoid injury to the knee/foot and ACL (anterior cruciate ligament), jumping patterns were studied using 3D motion biomechanics modeling. The fatigued muscles could not hold the knee stable or provide sufficient knee cushion during the shorter soft landing, which could increase the risk of an ACL injury during the second hard landing period.   By using Model Driven SPC, the injury mechanism was studied to determine the treadmill's highest speed limit for this diabetes patient. Through these ACL risk studies, the HIIT profile has been further optimized to consider these ACL design constraints. By following the HIIT profile that was designed with JMP, this diabetic patient has seen a significant reduction of blood glucose levels and serum readings (falling from over 200 mg/dL to near 75 mg/dL) in four months.     Hi everyone. I'm Mason. Today I'll be presenting a project on designing a treadmill exercise plan for diabetes patients. To give a bit of background as to why I did this project: in the spring of 2021, one of my family members was told that he had type 2 diabetes. A follow-up report in the summer of 2021 showed that his glucose level was higher than 200 milligrams per deciliter, which is much higher than the normal glucose range of 65 to 99 milligrams per deciliter. At the same time, I wanted to conduct a project to analyze exercise data, especially because diabetes is so common across the human race. So we did this project on designing a treadmill program for my family member, and after following the plan for a few months, his glucose level went back to the normal range in the fall of 2021. To define our project, we want to listen to the voice of the customer, which is our doctor, who provides advice on what my family member should do. The doctor suggests controlling his meals, taking metformin and insulin, and also exercising more intensely to burn sugar. In this project we focus on the last piece of advice, because the other three are quite easy to follow, but we don't quite know yet how to exercise most efficiently. We need to translate this advice into what we will do, which is critical to quality. Our goal is to design a treadmill program, specifically focusing on the legs, to strengthen the lower body muscles, prevent injury, and also help cure the diabetes. In more quantitative terms, we want to lower his glucose level to below 100 mg per deciliter and also reduce his resting heart rate.
Healthier individuals usually have a lower resting heart rate, since it takes more rigorous exercise for them to need the same amount of oxygen and reach a higher heart rate. Just to introduce the team: the project leader is me. We have a 52.5-year-old diabetes patient as the experimental subject, who will monitor his daily blood glucose level. Our family doctor is also on the team and will follow up with the diabetes patient every three months. We also have two advisors: a Six Sigma advisor to assist with the DMAIC framework, and a JMP advisor who will help with the analysis. To design our treadmill program, we wanted to know how intensely we should exercise. For normal individuals, it's recommended to reach 50 to 85% of your maximum heart rate for the exercise to be effective. However, our patient is at a moderate to higher risk of having a heart attack, because his calcium score was 131 from the coronary artery heart attack risk assessment, which is at the 72nd percentile for his age. The family doctor advised limiting the upper bound of the target heart rate to just 80%, because exercise that is too vigorous and intense can lead to a heart attack. To accommodate that drop in the upper limit of the target heart rate, we also increased the lower limit from 50% to 65%. Now for the exercise specifically, we chose brisk walking, since one leg will always be on the ground and [inaudible 00:03:52], so brisk walking helps protect the knee and lower injury risk. Also, choosing brisk walking over running helps prevent heart attacks, because if we run, we may accidentally go over 80% of the maximum heart rate. To determine the upper and lower bounds of our target heart rate, we first have to calculate the maximum heart rate, which is 220 minus the age in years. For a 52.5-year-old, the maximum heart rate would be 167.5 beats per minute, the upper limit of the target heart rate would be 134 beats per minute, and the lower limit would be 109 beats per minute. As you might recall, one of our goals is to reduce the resting heart rate. We want to lower the resting heart rate because doing so makes the heart muscle stronger and, as a result, helps prevent heart attacks. When we strengthen the heart muscle, the heart pumps more blood and more oxygen is available. Now that we've set our goals and what we'll be measuring, we can design our treadmill program. We'll be considering three control variables. The first is walking uphill, so whether we want to add incline or speed. The second is HIIT, or High-Intensity Interval Training, which involves a short period of intense exercise followed by a recovery period; we need to design this HIIT workout so that the heart rate does not go below 65% or above 80% of the maximum heart rate. The third variable is frequency: how many times we will do this exercise every week, and how long each time. To set up our experimental design, we chose to alter two variables, incline and speed.
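Before going through the design levels, the target-zone arithmetic just described is simple enough to write down directly; a minimal Python sketch (the function name is just for illustration):

```python
def target_heart_rate_zone(age_years, lower_pct=0.65, upper_pct=0.80):
    """Age-predicted maximum heart rate (220 - age) and a target zone as fractions of it."""
    hr_max = 220 - age_years
    return hr_max, lower_pct * hr_max, upper_pct * hr_max

hr_max, lower, upper = target_heart_rate_zone(52.5)
print(hr_max, round(lower), round(upper))   # 167.5, 109, 134 -- the bounds used in the talk
```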
Incline has two levels, zero or five degrees, and speed has nine levels, from zero to 3.6 mph. The most vigorous setting would be an incline of five degrees and a speed of 3.6 mph. We don't want to increase the speed past 3.6 mph because then we'd be transitioning to running, which we do not want, since we want to focus on brisk walking, and we also don't want to exceed 80% of the maximum heart rate. I also wanted to add rests after each exercise, so that the patient returns to his resting heart rate before undergoing another treatment. We ran stepwise regression on the response surface design. Our model has a pretty high R-square of 97% and a p-value less than 0.05. For the studentized residuals, which are the residuals that [inaudible 00:06:39], only about one point goes over the green line, which is two studentized deviations from the mean, and the red line represents three studentized deviations. JMP chose the most significant variables, which include the two main effects, both incline and speed, the interaction term between incline and speed, and the quadratic term for speed. Why did the model include the quadratic term for speed, but not incline? Well, if we look at the interaction profiles on the right, we can see that heart rate has a linear relationship with incline and a curved relationship with speed. We can explain the linear relationship between heart rate and incline as due to potential energy, which is mgh: mass times gravity times height. Height is a linear term, and when the angle of the incline is small enough, we can use the mgh approximation, so the relationship is linear based on physics. On the other hand, speed is connected to kinetic energy, which is one half times mass times velocity squared, so kinetic energy has a quadratic speed term. From the bottom two profilers, we see that we can reach the lower bound of the target heart rate, 109 beats per minute, at an incline of zero degrees and a speed of 2.9 mph. In the improve phase [inaudible 00:08:06], we won't need to include easier settings than these levels, since the heart rate would then be too low; we don't want to go under the 65% lower bound. Also, the upper bound of the target heart rate is reached at an incline of five degrees and a speed of 3.5 mph, so 3.5 mph is a good maximum level for speed. We also want to prevent injury risk in addition to managing diabetes, which is our second objective. More than 80% of runners are injured each year, and some of the most common injuries include Achilles tendonitis, shin splints, and hamstring injuries. We wanted to avoid injuries, so we made sure the patient was using correct form while brisk walking by keeping his head up, neck relaxed, and back straight. In addition to posture, muscle coordination is also really important for preventing lower-body injuries. 3D motion biomechanics studies the correct angles of joints relative to each other in order to lower injury risk. Sensors allow us to measure and monitor the angles of joints relative to each other.
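Stepping back to the heart-rate model for a moment: as a rough illustration of the model form described above (main effects for incline and speed, their interaction, and a quadratic term in speed), here is a least-squares sketch on made-up treadmill readings. It is not the presenter's data or JMP's stepwise fit, just the same model structure.

```python
import numpy as np

# Hypothetical (incline deg, speed mph, heart rate bpm) readings -- NOT the presenter's data
incline = np.array([0, 0, 0, 0, 5, 5, 5, 5, 0, 5], dtype=float)
speed = np.array([1.0, 2.0, 3.0, 3.6, 1.0, 2.0, 3.0, 3.6, 2.5, 2.5])
hr = np.array([78, 90, 108, 122, 86, 101, 124, 136, 99, 112], dtype=float)

# RSM-style design matrix: intercept, incline, speed, incline*speed, speed^2
X = np.column_stack([np.ones_like(hr), incline, speed, incline * speed, speed ** 2])
beta, *_ = np.linalg.lstsq(X, hr, rcond=None)

def predict_hr(inc, spd):
    return float(beta @ np.array([1.0, inc, spd, inc * spd, spd ** 2]))

print(np.round(beta, 2))
print(predict_hr(0, 2.9), predict_hr(5, 3.5))   # settings near the lower and upper target bounds
```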
We can also conduct the exercise on [inaudible 00:09:19] to look at which places on the feet hit the ground [inaudible 00:09:] and, based on that, whether the runner is using correct form or not. Just to take a quick detour into a broader study of injury risk: the first thing we did to study injury risk was variable clustering, which groups the different sports together. You can see that every sport has different injury areas. For example, cluster one and cluster three have different patterns. Cluster one targets the lower body, which makes sense, as it consists of basketball, soccer, skating, and tennis, which all use the lower body extensively. Cluster three has more upper-body injuries, as it consists of golf, volleyball, and weightlifting. You can see that these clusters are differentiated quite well by the principal component analysis. Any exercise plan that is used for running, for example, can be modified for the other sports, so it's an efficient way of both designing an exercise plan and studying injury risk. The specific injury risk that we looked at as a result of running was the anterior cruciate ligament, because it is a common injury in a lot of sports that use the lower-body muscles, such as basketball. The ACL is located at the center of the knee joint, running from the backside of the thigh bone, or femur, to the front of the shinbone, or tibia. The image shows the three other important ligaments of the knee: the LCL, the MCL, and the PCL. These four ligaments are crucial to protecting the ACL from injury, especially the lateral collateral ligament, as well as the lateral and medial menisci, which are pieces of cartilage that further cushion the ACL. ACL injuries occur when the tibia, or shinbone, moves too far forward and is hyperextended, in other words, straining too much; that causes the ACL to tear. This can happen in a variety of ways, such as sudden deceleration, pivoting in place, or when the foot is planted and the body changes direction suddenly. These movements are common in basketball, as I said, but also in football, soccer, and downhill skiing, and most of these sports, of course, involve a lot of running. So we want to understand how ACL injury risk changes before and after fatigue, specifically in the context of running, as part of this project that focuses on running and injury risk. To understand the connection between fatigue and ACL injury, we wanted to conduct an experiment to measure how fatigue and ACL injury risk are related. We needed to choose an exercise that lets us compare flexion and forces before and after fatigue, and choosing the right exercise that can accurately measure ACL injury risk is really important. After we consulted with a local physical therapist, we found that the countermovement jump can assess ACL injury risk quite well through the force and flexion of different body parts. Before I go into what exactly a countermovement jump is, let me tell you why we chose this exercise specifically. The countermovement jump is a jump, so it can assess how much force your knee puts on the ground.
Once again, Newton's third law from physics comes into play here: the same amount of force from your knee into the ground is experienced by the knee from the ground. Too much force onto the ground can increase ACL injury risk, as your knee experiences too much force, and this is how you can land awkwardly and [inaudible 00:13:33] the ACL. In addition to force, the coordination between flexion and extension of the hips, knees, and ankles is really important when doing this exercise. Both force and joint flexion are connected, as how well the test subject transitions from flexion to extension during the exercise is reflected in the amount of force they put on the ground. This is why we chose the countermovement jump: it enables us to compare the before- and after-fatigue states for both flexion and force, which are the two most important factors related to ACL injury. How does the countermovement jump work? There are five main phases, as you can see here: the unweighted, braking, propulsive, flight, and landing phases. The five images on the top are an example of where the test subject is at each of these phases in the exercise. The bottom graph shows time versus force exerted on the ground. For the graph on the bottom, I'll focus on the top curve, the darkest blue curve, as that is the total force, whereas the two curves below it are the left and right forces. The first phase of a countermovement jump is the unweighted phase, when the person is standing upright; it is the orange portion of the graph. The force briefly decreases before coming back up as the person continues bending their knees. When they reach maximum knee and hip flexion at the bottom of their prejump, which is the braking phase, they start extending their body, which is the propulsive phase. A smooth transition from braking to propulsive is reflected in a smooth curve over here; the smoother the curve, the better the knee and hip are coordinated. The flight time is when the force is zero, before the landing phase. As you can see, there's a huge spike in the amount of force in the landing phase; that is when the subject lands. The first major peak is the soft landing, the light blue dots, when the person lands on their toes first, before the hard landing, which is when the soles of the feet touch the ground; that's the light gray dot. During the soft landing period, hip and knee flexion can help balance the force across different body parts so that the knee isn't the only one experiencing all of the force, and that can help reduce ACL injuries. But if the hard landing, the second peak, has too much force, that's when there can be a greater risk of ACL injury, as that's when the whole foot lands on the ground. In addition to the general flexion and force patterns, we'll be looking to see if there's any difference in the soft and hard landings before and after fatigue. This brings me to the experimental design.
We wanted to measure the flexion of the different joints, such as the ankles, hips, and knees, to study them in further detail, as they reflect how fatigued the muscles are. The more the muscles are fatigued, the greater the ACL injury risk. To measure those joints, we used several different sensors that can measure all of these joints together, and we attached them to the test subject, as seen on the right: two on the bilateral thighs, two on the bilateral shanks, and two on the bilateral dorsum, four on the front side, and one on the pelvis for the backside. After calibrating our sensors, the test subject did ten trials of countermovement jumps; he jumped ten times on a force plate to measure the force. Afterwards he ran, squatted, did basketball jumps, did some cone drills, anything to get fatigued, for an hour. We decided one hour would be enough fatigue because it was pretty hot outside when we did this experiment. After fatigue, we put the sensors back on and he conducted the ten trials of the countermovement jump once again. We collected our data through a biomedical software, Meloxicam, that enabled us to simulate the different degrees and angles of bending for several different joints, as well as the forces on the ground. When we look into the individual force profiles, comparing before and after fatigue, we can observe even more differences in the two behaviors. The prejump, which is the transition from the braking phase to the propulsive phase, is a lot smoother before fatigue than after fatigue. We can see a minor plateau after fatigue, which could indicate that the different body parts are not coordinated as well after fatigue. Also, for the landing period, the contrast between the hard landing and the soft landing is quite large before fatigue, but the contrast isn't as large after fatigue. The soft landing is important, once again, because only the toe touches the ground, so it doesn't increase ACL injury risk as much as the greater force during the hard landing. The hard and soft landing contrast isn't as great after fatigue, which may increase the ACL injury risk during the [inaudible 00:18:59]. This may have been because the muscles were not able to hold the knee as stable after fatigue, so the force for the soft landing wasn't that different from the force for the hard landing. We wanted to know if there are any other platforms, besides a multivariate control chart, that we can use to help us find at what time the difference between before and after fatigue is largest. The multivariate SPC control chart helps us visualize the differences. The top right corner is a screenshot of the different trials, and all six of the variables we study are considered in that graph. We use the T-square chart because it can help us detect the relationships between the six variables that we chose: hip, ankle, and knee flexion for both the right and left sides. The red line is the T-square upper control limit, and outliers are points that exceed this upper control limit. That is a good thing here, because it means there's more contrast between the jumping and the landing behavior.
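The T-square chart described here can be outlined generically: compute Hotelling's T-squared for each sample of the six flexion signals against a reference mean and covariance, and compare it to an F-based upper control limit. The sketch below uses synthetic data and a textbook phase-II limit; it is not the Model Driven SPC platform itself.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins: rows = time samples, columns = six flexion angles
# (left/right hip, knee, ankle)
baseline = rng.normal(0.0, 1.0, size=(200, 6))     # reference (before-fatigue) period
monitor = rng.normal(0.3, 1.2, size=(100, 6))      # samples to chart

mu = baseline.mean(axis=0)
S_inv = np.linalg.inv(np.cov(baseline, rowvar=False))

# Hotelling's T-squared for each monitored sample
centered = monitor - mu
t2 = np.einsum("ij,jk,ik->i", centered, S_inv, centered)

# Upper control limit from the F distribution (phase II, individual observations)
n, p = baseline.shape
alpha = 0.0027
ucl = p * (n + 1) * (n - 1) / (n * (n - p)) * stats.f.ppf(1 - alpha, p, n - p)

print("points above the UCL:", int((t2 > ucl).sum()), "; UCL =", round(ucl, 2))
```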
If you look more in depth at one of these specific [inaudible 00:20:13] spikes, which each represent one trial for before and after fatigue, you can see that we outlined five main points for one trial, before fatigue and after fatigue, to help visualize the differences. The biggest differences should be at points two and four, since two is right before the test subject leaves the ground, so it should have one of the highest flexions because the knees are bent the most there. Four should also be similarly high, because that's when the subject [inaudible 00:20:45] the ground and lands on the ground, and the knees are bent the most. You can see that before fatigue, points two and four are way above the upper control limit and quite different from one, three, and five, but after fatigue the contrast is much less obvious. We'll try to understand why that is and connect it back to our research on ACL injury discussed previously, by looking at the specific contributions of each of these joint flexions at each of these five points. For now, the multivariate control chart tells us specifically that points two and four are when before and after fatigue differ the most. As I said, we are going to look into the specific flexion components. The top row is before fatigue; the bottom row is after fatigue. These three joints can really detect the difference between before and after fatigue, because during the countermovement jump the lower body is fatigued, so the muscle fatigue and the different angles of flexion for the different joints are evident when we compare the contributions. If you look at the graphs starting at point one, you can see that the ankle has the greatest contribution after fatigue, while the hip and knee contribute much less. This may be because some of the muscles are already fatigued and only some muscles contribute to the overall flexion. If we move from point one to point two for before fatigue, we see a very clear transition from an even distribution across all the different joints to focusing on just ankle joint flexion. But for after fatigue at point two, the knee and hip components are still somewhat flexed and haven't been able to reach full extension; those bars are still providing some contribution. You can see that at point three, the contrast between the knee, hip, and ankle is also not as large for after fatigue; again, there's not a full extension of the hips and knee. Then the force distribution isn't as good for after fatigue, as the knee and hips are already bending at the same time as the ankle, so the soft landing, which is at point four, isn't as effective. And finally, at point five, the ankle is still flexed. It seems that the knee and hip aren't able to support the body now and rely only on the ankle. This may indicate the lower body in general is really fatigued, and the hip and knee are the most fatigued, as we don't see much contribution from them at the fifth point. There's less flexion for these two joints, causing a greater reliance on the ankle, which increases ACL injury risk. Now back to our treadmill program.
With  information  from  injury  risk, as  well  as  the  previous  research  on  HIIT, we  can  set  up  a  HIIT  workout  plan. We  designed  a  15  minute  workout  with, the  first 2  minute  for  warm  up, the  next  12  minutes for  three  cycles  of  exercises, consisting  of  2  minute  at  the  lower  bound of  the  target  heart  rate at  zero  inclined and  2.9  speed, and  2  minute  at  the  upper  bound of  the  heart  target rate at  five  inclined  and  3.5  speed. So  total  of  12  minutes and  then  one minute  cool  down. We  chose  relatively  short  time  period for  each  exercise so  that  the  patient  can work  out  for  a  longer  period  of  time without  getting  tired  too  quickly, which  may  have  happened  if, the  exercise  at  the  upper  bound  of  the target  heart  rate  was  done  for  too  long. To  prove  that  our  project  is  successful, we  will  need  to  validate  our  results. We  want  to  see  if  the  workout  plan  helped lower  the  diabetes  risk, which  can  be  seen  through  the  glucose reading,  and  the  resting  heart  rate. Heart disease  risk  as  well,  which  can  be measured  reducing  the  calcium  score. All  these  values  to  decrease  if the  treadmill  exercise  helps, we  may  also  want  to  revise  the  treadmill settings  every  three  to  six  months, because  the  resting  heart  rate  may  have decreased  due  to  stronger  heart  muscles. In  that  case,  we  may  want to  increase  incline  and  speed to  achieve  the  same  target  heart  rate, since  the  resting  heart rate is  now  lower  due  to  a  stronger  heart. So  in  conclusion, we  utilize  the  DMAIC  approach and   [inaudible 00:25:36]   methods to  help  the  patient  with  type  two diabetes  reduce  their glucose  levels while  preventing  them  from  getting a  heart  attack  or  getting  injured. We  also  designed  an  experimental  plan to  study  injury  risk, but  looking  at  joint  flexion as  well  as  force. We  used  the  DOE  to designed  as  transplant and  from  the  model  results, we  selected  the  settings at  109  beats  per  minute and  134  beats  per  minute  to  be  included in  a  15  minute  High Intensity  Interval  Training  workout. So  we're  currently  finishing the  improvement  control  phases and which we  hope  to  present at  a  future  conference. Yeah,  that's  all  I  have  for  today. Thanks  for  tuning  in.
JMP Pro 17 is a new standalone platform of choice for modern molecular-level data arising in such fields as genomics, metabalomics, and proteomics. Our previous product, JMP Genomics, relied on SAS for data import, processing, and analysis of the large data tables that are associated with -omic problems. New improvements in JMP Pro 17 provide an advanced level of capability and performance that allows it to stand on its own without the need for SAS.    However, the move from JMP Genomics to JMP Pro for Genomics revealed many aspects of JMP Pro that needed to improve. These improvements have pushed the boundaries of what the product can do so that it can now handle these large problems. As a result , JMP Pro 17 is one of the only advanced analytics software packages to provide a combination of interactive and engaging user experience that allows for rapid point-and-click exploration of -omics data, advanced multivariate and predictive modeling tools, and a flexible and adaptive platform (through JMP Scripting and integration with other data science tools).   After defining -omics, this presentation examines the types of data used for these problems, the technical challenges that come with preparing and analyzing large wide data tables, and how JMP Pro 17 addresses these challenges. Examples of just how easy it is to do -omic data analysis in JMP Pro 17 are also demonstrated.     Hi,  this  is  Sam  Gardner  with  JMP. I'm  a  Product  Manager  at   JMP. We're  here  to  talk  today about  introducing  JMP Pro  for  Genomics, pushing  the  boundaries  of   JMP Pro to  enable  data  science  on  the  desktop. I  am  one  of  the  presenters. I'll  be  doing  the  introduction to  this  topic. I'm  S enior  Product  Manager for  Health  and  Life  Sciences in the  Product  Management  team  at  JMP. Our co-presenter  today is  Russ   Wolfinger, who's  a  Distinguished  Research  Fellow and  our  Director  of  Scientific Discovery  and  Genomics  at  JMP. We'll  talk  a  little  bit about  the  background of  genetics  and  genomics, functional  genomics, and  then  talk  about  what  we're  doing to  transition  from  our  former  product, JMP  genomics, to  using  JMP  Pro  for  genomics. Russ  will  demonstrate  some of  the  new  capabilities  in  the  product. A  little  bit  about  classical  genetics. This  is  where  a  lot  of  this  got  started. People  have  been  doing classical  genetics  for  a  long  time. They've  been  breeding  plants  and  animals to  get  desired  traits for  those  plants  and  animals. They've  seen  that  they  can  do  that to  get,  stronger  animals,  better  plants, plants with  desired  properties  and  so  on. You  probably  studied  a  long  time  ago, when  you  were  young  in  school, about  Gregor  Mendel,  the  monk, who  spent  many  years  studying  garden  peas. He  actually  measured  seven  distinct characteristics  of  these  peas— their  height,  their  pod  shape  and  color, seed  shape  and  color, flower  position  and  color— and  observed  that  as  these  peas were  crossbred  with  each  other, that  the  traits  were  passed  on from  the  parent  plants to  the  progeny  plants following  some  rather specific  mathematical  ratios who have made  it probabilistically  possible to  make  predictions about  what  the  progeny  would  look  like based  on  the  traits  of  the parents. His and  later  work  established the  principles  of  genetic  inheritance. What  is  genomics? 
Genomics is  more than  just  classical  genetics. Genomics  uses  a  combination of  DNA  measurement  methods and  recombinant  DNA  methods to  sequence  and  assemble  and  analyze the  structure  and  function  of  genomes. It  differs  from  classical  genetics in that  it  looks at  the  organism's  full  complement of  genetic  or  hereditary  material. It  focuses  on  the  interactions between  the  loci  or  the  location of  different  genes  on  the  genome, and  the  alleles, the  variation  in  the  genes  in  the  genome, so  that  you  can  understand  things like  epistasis,  pleiotropic  heterosis, which  are  things  like,  okay, one  gene  affects  many  things. That's  pleiotropy. Epistasis is  that  sometimes, one  gene  impacts  the  output or  the  effect  of  another  gene. Heterosis  is  sometimes  you  get synergistic  effects  by  combining  the  genes from  two  different  parents or  two  different  organisms. This  all  relies  upon  the  use of  the  central  dogma  of  genomics. That  dogma  is  that  DNA,  which is  the  code  for  our  biological  systems, is  transcribed  into  RNA, which  is  the  code that's  used  to  make  things and  make  proteins  in  the  body. The  proteins are  the  little  chemical  engines that  do  things  inside  the  body and  give it  its  function. From  that,  you  can  actually  then measure  things  like  metabolites, what  actually  happens, what  do  those  proteins  actually  do inside  the  cells  and  inside  the  body. The  path  is  DNA  creates  RNA creates  protein, and  the  protein  regulates how  things  function  in  the  body, and  that  produces  metabolites. Data  is  really  enabling a  genomics  revolution. Modern  measurement  techniques are  really  helping  us  understand the  structure  and  function of  the  genome and  how  it  works  inside  the  cells in  biological  system. We  can  sequence  the  genome now. We've  got  next- generation  sequencing. Many  years  ago, when  JMP  first moved  into  this  area, helping  customers to be able to  analyze  this  type  of  data, the  way  to  measure  it  was  microwaves, which  was  much  more  focused on  very  specific  parts  of  the  genome, and  oftentimes  a  very  limited  set of  genes  in  the  genome. Now,  you  can  sequence the  whole  genome  of  an  organism. Also,  you  can  look  at  things like  expression  and  regulation. We're  talking  about  the  metabolites. What  is the  output  into  the  biological system  that  you  can  measure? You  can  look  at how  the  proteins  are  produced or  what   those  proteins  are  doing. You  can  also  look  at how  the  structure  of  the  DNA  itself, what's  called  epigenetics, impacts  the  function  of how  DNA  works and  how  the  genes  work  inside  the  body. There  are  typically three  main  stages  of  analysis  that  happen when  you're  doing  this  type  of  work. One  is  you  just  generate  the  raw  data. You  do  the  sequencing  work, generate  the  genome- sequencing  data, or  measure  the  metabolites or  the  protein  expression or  the  RNA  expression. And  then  that  generates pretty  large  data  sets that  have  to  be  filtered and de- multiplexed and  trimmed  and  scored and  cleaned  up. This  is  typically  handled in  a  automated  or  semiautomated  workflow on  computer  systems  that  can process  very  large  data  files. 
Then  it  typically  goes  into  a  second  stage where  you  start  to  do  sequence  alignment and  basically  lining  things  up, and  being  able  to  do  things  like  counts. How  many  times  did  I  see  the  expression of  a  particular  RNA  fragment or  RNA  sequence? Or  how  many  times  did  I  see a  particular  protein? Or  all  this  raw  data,  how  does  it  line  up to  actually  make  a  picture of  what  the  structure of  the  whole  genome  is? That's  a  pretty  big mathematical  computational  process. That  typically  also  gets  done on  pretty  large  computational  systems with  a  lot  of  computational  resources. And  then  the  third stage, which  is  the  stage where  JMP  really  has  played  in, and   where JMP  Pro will  continue  to  play  in, is  the  determining  genotype  associations and  genotype-to- phenotype  relationships. A  phenotype  is  just  a  trait  of   organisms, the  relationship between  the  genes  and  the  traits. And  also  looking at  correlations  and  associations of  the  different  genetic  markers inside  the  genome, or  the   variance  of  the  genetic  markers. Oftentimes,  what you   want to  do is  you  want  to  characterize  those and  then  correlate  them to  physical,  biological, or  maybe  disease  state  characteristics. All  of  this  can  actually  be  done with  desktop  software. JMP  Pro  is  our  solution  to  do  that going  forward  in  the  future. We've  had  a  product  called  JMP  Genomics for  14  years,  up  until  this  year, that  we  were  providing  the  customers. It  was  a  combination  product of  JMP  and  SAS. SAS was  really  needed  back  early when  we  first put  this  out to  do  a  lot  of  the  data  processing, because  the  size and  the  types  of  data  we  looked  at was  very  difficult  to  do with  a  desktop  software  package  like  JMP. SAS did  the  data  processing, some  of  the  statistical  methods, but  JMP  was  used for  further  statistical  analysis and  visualizing the  results  of  those  analysis. JMP  Genomics  has  been used in research  and  industry for  a  wide  variety  of  genomics  problems for  many  years. But  we  made  a  strategic  decision  this  year to  discontinue  selling  products that  contain  SAS with  them. That's  part  of  the  decision  that  was  made for  JMP  to  become  an independent  company. We're  a  wholly-owned  subsidiary of  SAS  now, and  are  moving  down that  road  of  independence. We  are  not  going  to  be  selling  anything but  JMP  products  going  forward. Because  of  that, we  have  looked  now to  move  the  functions for  genomic  data  analysis  into  JMP  Pro. In  JMP  Pro  17,  which  will be  available  this  fall  in  2022, has  been  and  will  be  optimized for   big  and  wide  data  problems. It's  going  to  have  capabilities to  meet  the  needs of  genomic  data  science and  genomic  data  scientists. It's  going  to  utilize  the  strength of   JMP Pro's  predictive  analytics and  interactive  visualization to  help  enable  discoveries in  this  area  of  work. Some  of  the  enhancements  that  we've  made to  push  the  boundaries  of  JMP  Pro include  just  removing  barriers and  bottlenecks  in  the  software. It's  one  thing  to  do  analysis on  tens  or  hundreds  or  even  thousands of  columns  in  a  data  table. 
But  when  you  have  a  data  table which  maybe  has  many  thousands or  hundreds  of  thousands  of  columns, you  start  to  reveal  limitations sometimes  in  your  software. By  doing  this  work, we've  uncovered  places where  we  just  need  to  streamline how  operations  happen  inside  the  program. We've  done  that. An  example  would  be if  I  wanted  to  do  a  transformation on  hundreds  of  thousands  of  columns, we've  significantly  improved  that  process. It happens  much  faster on  the  data  tables. Also  being  able  to  do  very  fast  and efficient   multivariate  analysis  methods like  principal  component  analysis and  clustering, when  you  have  these really  wide  genomic  data  tables. And  then  being  able  to  do  models over  and  over  again on  thousands  and  thousands of  response  columns, and  to  do  that efficiently  and  effectively. The  second  goal that  we  have  in  this  transition is  that  bring  in  some  capabilities in  the  JMP  Pro that  are  very  specific for  genetic  and  genomic  analysis. For  instance,  being  able to  import  different  formats that  are  commonly  used  in  this  area. Also,  being  able  to  do genetic  marker  analysis  and  simulation, as  well  as  bringing  in  some newer  popular  data  reduction  methods such  as  t-SNE  and  Unimap. Overall,  what  we're  getting  to is  a  product  that's  going  to  be  lean. It installs  very  quickly. You  can  use  it  on  your  desktop, but  you  can  use  it  to  do this  very  powerful  analysis on  these  large, complex,  wide  data  tables. To  illustrate  that, I'm  going  to  turn  it  over  to  Russ. Russ  is  going  to  show  us  actually  how you  can  do  some  realistic  analysis and  some  real  study  analysis  here on  some  genomic  and  genetic  data. Well,  thank  you,  Sam. It's  a  real  exciting  time  for  us. I  know  I've  actually  been with   the  genomics  analysis  revolution within  SAS  for  over  20  years  now. We  actually   [inaudible 00:11:46]  in  the  early  2000s called  Scientific  Solutions, where  we  were  starting  to  look  at some  of  the  early  micro array  data. It's  been  a  really  fun  20  years. Now,  I  would  say,  almost  one of  the  most  exciting  times  ever  for  us, where  we're  now  able  to  code some  of  these  routines directly  in  JMP  pro  using  C++. A  lot  of  them  are  running much  faster  than  we  had in  the  previous  JMP  Genomics  product. I  want  to  give  you  a  little f lavor of  that  today  with  an  example. This  is  a  data  set  on   loblolly pines, which  for  those  of  you from  the  Southeast might  know  it  as  probably  one of  the  most  popular  species  of  pine. Typically,  if  you  go into  Home Depot  or Lowe's and  buy  some  two- by- fours  or  plywood, it's  going  to  be  made  of  l oblolly. When  you  fly  into  the  area, you   happen to see  a  lot  of  tree  cover. Many  of  those, I'd  say  a  good  chunk  of  those  trees, especially  towards  the  Eastern  part of   North  Carolina,  are  lobl ollies. It's  a  very  important  species,  one that  we  really  want  to  understand  well. It's  been  studied  very  thoroughly, and  even  more so  now that  we've  got  some  crunches  going  on with  home  building  and  what  have  you, it's  critical  to  understand  it inside  and  out. Genomic  technology  is  fantastic for  revealing  some  things that  we  just  never  knew  before. This  data  is  actually  still  10  years  old. 
It was from a paper in the journal Genetics by Resende et al., from a group of researchers at the University of Florida, Embrapa in Brazil, and the University of Iowa, I believe, if I recall correctly. Here's the reference if you want to look it up. The data are also freely available; I went ahead and downloaded them from the supplemental information and loaded them into the JMP table you see here. As Sam was mentioning, the format in JMP Pro is what we typically like to call a wide format, where we've got everything in one table. Here we've got some genotype indicator numbers indicating the lines, as well as the mother and father that the trees came from. In this specific data set, we've got six traits that we've measured; I believe there are actually more, I think 17, if you want to check the reference. Our key focus of interest is these genetic markers. This data set is small by today's standards; we've only got about 4,800. I say "only 4,800," but that's still quite a few. As you can see as I scroll through here, they're all coded as either zero, one, or two. These are so-called SNP markers, single nucleotide polymorphisms, where the number indicates the count of the major allele that we have in the data. Zero would be little a, little a, if you're familiar with the old genetics notation; the twos would be big A, big A; and the ones would be all the heterozygotes. So roughly 4,800 of these markers. The basic goal in the end, typically, is prediction; in fact, that was what the paper this came from was about, comparing several of the popular predictive methods. But before we get to prediction, there are a lot of really good things you want to do just to make sure the data are as expected, and also to learn and discover structure and other interesting characteristics. Let's dive in and see what we can do with a typical workflow here in JMP Pro. I would typically just like to look at the data in JMP first; we can use the basic platforms. For example, let me bring up the Multivariate platform and just check out basic plots of the data against one another. You can see, for example, that rootnum and rootnumbin are fairly highly correlated with each other; others, not so much. You can also do distributions, for example with the Distribution platform. These traits have actually already been centered, I think; I believe all of them have a mean of around zero. They've gone through a little bit of preprocessing that we won't go into today; that's the way they came from the paper. Our basic goal is to use the genetic information to predict these traits. They represent various characteristics of the loblolly trees. For example, CWAC, I believe, is crown width across the planting beds; it's a measure of tree size. We've got other measurements of density, characteristics of the roots, etc., all important things to know about when studying these trees. Let me walk you through what we might consider a basic workflow once you have your data set up like this.
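Because each SNP column is just a 0/1/2 count of the major allele, basic marker summaries reduce to column arithmetic. A small Python sketch on a synthetic genotype matrix of the same shape (not the loblolly data):

```python
import numpy as np

rng = np.random.default_rng(7)
p_true = rng.uniform(0.1, 0.9, 4853)                             # per-marker major-allele frequency
geno = rng.binomial(2, p_true, size=(926, 4853)).astype(float)   # stand-in 0/1/2 genotype matrix

major_freq = geno.mean(axis=0) / 2          # estimated major-allele frequency p
minor_freq = 1 - major_freq
observed_het = (geno == 1).mean(axis=0)     # fraction of heterozygotes per marker
expected_het = 2 * major_freq * minor_freq  # 2pq expected under Hardy-Weinberg

print(np.round(major_freq[:3], 3), np.round(observed_het[:3], 3), np.round(expected_het[:3], 3))
```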
Now,  before  doing  that, though, I  do  want  to  mention  too that  we  have  put  in  a  fair  bit  of  work to  helping  and  aiding with  importing  such  data. This  particular  data  came  as just  standard  comma  separated  value  files, so  no  big  deal  to  import  it. But  often,  genetic  data  like  this come  in  so-called  VCF  files. We now  have  new  routines to  be  able  to  import  those  directly, as  well  as  import  files from  the  popular  database, and  then  a  few  other  formats, IDAT  and  what  have  you. Trying  to  make  it  really  easy to  get  your  data  into  JMP. As you know,  once  you've  got your  data  set  up  in  a  JMP  table, there's  just  all  kinds of  great  things  you  can  do. Many  of  the  things  that  you  hear  about... Give  you  some  more  ideas, as  well  as  some  new  things that  we've  put  into  place. To  start  out,  we've  got a  brand  new  couple  of  platforms under  the  Analyze  menu  here  at  the  bottom. Genetics. Analyze,  Genetics. We've  got  Marker  Statistics and  Marker  Simulation. Let's  run  the  first one, Marker  Statistics. This  is  just  a  basic  platform  for  looking at  characteristics  of  a  set  of  markers. You  can  see  here,  I'm  loading. We've  got  4,853  SNPs  organized in  a  group  here  in  the  JMP  table. I  just  move  them  over  into  the  markers. If everything else  is  okay, we'll just  click  OK. It  runs  quite  quickly. What  this  basically  does is  it takes  each  marker and  computes  a  variety  of  standard statistical  genetic  statistics that  you  can  look  across  here and  see  what's  going  on. A  key  thing  to  check  for  a  so-called Hardy- Weinberg  Equilibrium. You  can  do  a  statistical  test  of  that and  get  p- values  from  it, and  even  plot  these  along in  a  graph  like  this. On  the  Y  axis,  we  actually  use the  log 10  p-value, which  we  also  call  the  log worth. To  go  once  step  further,  you  can  make a  false  discovery  rate  adjustment to  avoid  the  multiple  testing  problem. You  can  see  here, we've  actually  plotted  both: the  raw   p-value,  the  raw  log worth, as  well  as  their  FDR  adjusted   p-value. They  tend  to  be  quite  similar, especially  for  the  large  ones. These  markers  up  here  are  ones that  would  be  out  of  equilibrium, very  likely  due to  the  cloning  of  the  trees. These would be  markers that  might  tend  to  drift  or  stabilize over  time  with  future  crosses. It would be good  to  check  these  out and  make  sure  the  distributions of  the  alleles  are  as  expected. Arcing  all  the  way  back to  the Gregor  Mendel  days, things  that  we  learned  about how   alleles  like this  should  behave. That's  a  good  place  to  start, just  to  get  an  idea  for  the  markers. Let's  move  next and  do  some  pattern  discovery. Here,  there's  several  nice  things we  can  try. A  very  basic  one  that's  also  been  popular for  decades  with  gene  expression  data is  just  to  do  hierarchical  clustering. Again,  I'm  just  going  to  put the  SNPs  in  here. You  typically  will  want  to  use one  of  these  faster  methods. Let's  use  fast  ward. We  do  have  some  missing  values, so  let's  do  imputation. We'll  go  ahead  and  cluster it  two  ways. Let's  click  OK  here. I'm  going  to  go ahead. I'm running  everything  live  today. A  few  of  these  things will  take  seconds  to  run. 
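Backing up to the marker statistics for a moment: the Hardy-Weinberg check and FDR adjustment just described can be outlined as a per-marker chi-square test against the p², 2pq, q² expectation, converted to logworths and adjusted with Benjamini-Hochberg. This is a generic sketch on synthetic genotypes, not JMP's internal code.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
geno = rng.binomial(2, rng.uniform(0.1, 0.9, 4853), size=(926, 4853)).astype(float)

def hwe_logworths(geno):
    """Per-marker chi-square Hardy-Weinberg test for a 0/1/2 genotype matrix."""
    n = geno.shape[0]
    obs = np.stack([(geno == g).sum(axis=0) for g in (0, 1, 2)])   # counts of aa, Aa, AA
    p = geno.mean(axis=0) / 2
    q = 1 - p
    exp = np.stack([n * q ** 2, 2 * n * p * q, n * p ** 2])
    chi2 = ((obs - exp) ** 2 / np.clip(exp, 1e-12, None)).sum(axis=0)
    pvals = np.clip(stats.chi2.sf(chi2, df=1), 1e-300, 1.0)
    return -np.log10(pvals), pvals                                 # logworths and raw p-values

def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted p-values."""
    m = len(pvals)
    order = np.argsort(pvals)
    scaled = pvals[order] * m / np.arange(1, m + 1)
    adj = np.empty(m)
    adj[order] = np.minimum.accumulate(scaled[::-1])[::-1]
    return np.clip(adj, 0, 1)

logworth, pval = hwe_logworths(geno)
print(int((bh_fdr(pval) < 0.05).sum()), "markers flagged after the FDR adjustment")
```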
A nalyses  I've  got  that actually  will  take  a  few  minutes that  I  won't  run  live just  for  sake  of  time. But  you  can  see  here, this  scale  of  data, JMP  Pro  can handle  fairly  readily. This  one,  you  can  see that  the  progress  bar  here will  take  probably   30 seconds to  a  minute  to  finish. But  not  too  bad for  a  medium- sized  data  set  like  this. Again,  we're  clustering  around  926  rows and  4,800  columns. But  before  actually the  performance  enhancements, this  kind of analysis would  take  several  minutes. In  many  cases,  we've  been  able to  achieve   orders of  magnitude  speed  up. I'm  able,  basically,  to  enable  you  to do  analyses  like  this  close  to  real  time. A  little  bit  of  waiting  might  be  required as  here,  but  in  general, it's pretty  nice  to  be  able  to  quickly get  answers  to  fairly  difficult  questions. For  example,  here,  we're  trying  to  see how  other  rows  of  our  data cluster  with  each  other. Now  here,  a  very  interesting  thing  occurs. You  can  see  I've  got  colorings that  I  did  to  the  data. I  colored   the  mother  and  father, or maternal  and  paternal  alleles. If  we  look  at  this  variable  here, there's  around  71  unique  levels. And  then  within  each  cross, there's   up  to  17  or  20  individuals. The  data  have  very  nice,  tight  clusters. The  clustering  algorithm actually  found  those. You can  see  the  colors indicate  the  coloring. This  color  theme  is  a  bit  jarring. Let's  move  it  to black  and  white. We  can  see  the  structure a  little  more  cleanly. Here,  we  can  see  the  areas  of  white or  where  we've  got  some  of  those minor  alleles  starting  to  cluster and  identifying  the  key  places in  the  genome that  distinguish  these  unique  crosses. This  is  a  nice  plot  just  to  get an  overall  feel  for  the  various  lines and  how  they  compare  with  one  another. But  the  main  lesson are  these  tight  clusters that  are  mapping  up  exactly  like we  would  expect  with  the  initial  crosses, basically  like  very  close  siblings to  one  another compared  to  cousins,  or  second  cousins, third cousins,  etc . Now,  another  way  to  go  about  this would  be  more  of  a  dimension reduction  type  approach. Here,  the  number  one  analysis is  principal  components. Let's  try  that  on  our  steps and  see  what  that  reveals. Here,   let's  just   use  the  defaults. Sorry,  actually, I  wanted  to  show  off... There's  a  brand  new  method  for  wide  data that's  called  fast  approximate. It's  a  nice   addition  in  software. It  actually  uses, if  you're  familiar  with  the  method called  a  randomized  SVD  approach. You  can  see  a little  message. Let's  see  what's  in  the  log. It  turned  out  this  was  actually  one  case where  an  error  message was  quite  beneficial. The  software  actually  indicated which  markers... There  were  some  markers, they  were  non-numeric  or  constant. It  turned  out  that  a  handful  of  these markers  in  the  table  were  constant. This  would  be  a  case  where  we  could go  back  and  actually  clean  those  out, since  they're  not  really  contributing  much to  the  analysis,  they're  just  constant. But  the  PCA  platform found  them  as  a  byproduct. But  if  you  look  at  the  scores, first two  principal  components, we  again  have  this nice  clustering  of  families. 
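The "fast approximate" option is based on a randomized SVD, and the same idea is available outside JMP, for example in scikit-learn. A sketch on stand-in genotypes, dropping the constant markers first, as the log message suggested:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
geno = rng.binomial(2, rng.uniform(0.1, 0.9, 4853), size=(926, 4853)).astype(float)

keep = geno.std(axis=0) > 0               # drop constant markers flagged in the log
X = geno[:, keep]
X = X - X.mean(axis=0)                    # center; impute first if values are missing

pca = PCA(n_components=5, svd_solver="randomized", random_state=0)
scores = pca.fit_transform(X)             # principal component scores per tree
print(np.round(pca.explained_variance_ratio_, 3))
print(scores[:3, :2])                     # first two PCs, the axes of the family plot
```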
As usual with JMP, all these plots are interactive and connected to one another. We can, for example, click on one of the branches of the tree over here, and it will highlight that cluster in the PCA, so we can map these two graphs to one another. In fact, let's add a third one. This is another brand new platform that's just coming out in JMP 17, called Multivariate Embedding. Here we're going to compute the popular t-SNE algorithm, which stands for t-distributed Stochastic Neighbor Embedding. It has been quite popular in the machine learning world, and it has trickled its way into the genomics field, especially with single-cell RNA. It does a somewhat different dimensional projection than PCA: it tries to identify local structure, whereas PCA is looking for the dimensions of largest variability across all markers. t-SNE tries to find tight local clusters, so it's actually perfect for this kind of data, just to reveal these families. You can see the nice little groups of clusters and, maybe more importantly, which clusters are near each other. You can take a picture here; it kind of looks like a butterfly, something t-SNE will often produce. I'd encourage you to try it on your data once you get your hands on JMP 17. So that's revealing some nice structure in the data. Let's move on now and add some more statistically oriented modeling. Here, the basic thing to start with is what we would call a genome-wide association study, where we'll basically take our trait, or our traits in this case, and screen them against all the markers. The workhorse platform here is Response Screening: I'm going to Analyze, Screening, Response Screening. We've done quite a bit of work on this, thanks especially to John Sall, who has implemented some nice performance improvements. What this does is basically a big Y-by-X analysis. I'm going to move our six targets, or responses, into the Y field and our SNPs into X, and then all you do is hit Go. I think it does the imputation automatically. Let's see. Yeah. This one runs lightning fast: it basically just did six times 4,800 quick regressions and plotted all the p-values at once. Again, we're focusing on the false discovery rate; you've got to be very careful about over-fishing data like this. You want to make sure any lead you chase is significant, even after a false discovery adjustment. Here we see that this crown width trait is the one that's popping out with the most hits, and then there's one for rustbin. These are sorted by significance, and then some of the other traits start to pop in. But clearly, it looks like we've got the most genetic action with this crown width trait. Now, to go a little further and illustrate the things we can do, and this is very JMP Pro-like, let's save the table of p-values. We've now got everything in a new JMP table, which is effectively all the results, and they're nicely colored for us. You could just browse the table, but I'm going to go ahead and use Graph Builder now.
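The Response Screening step amounts to a large batch of simple Y-by-X regressions, one per trait-marker pair, with a false-discovery-rate adjustment afterwards. A compact sketch of the per-marker piece for a single synthetic trait (JMP's platform handles all six traits at once and far faster):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
geno = rng.binomial(2, rng.uniform(0.1, 0.9, 4853), size=(926, 4853)).astype(float)
y = 0.4 * geno[:, 10] + rng.normal(size=926)       # synthetic trait with one planted signal

def marker_screen(y, geno):
    """Simple regression of one trait on each 0/1/2 marker; returns slopes and logworths."""
    m = geno.shape[1]
    slopes = np.empty(m)
    logworths = np.empty(m)
    for j in range(m):
        res = stats.linregress(geno[:, j], y)
        slopes[j] = res.slope
        logworths[j] = -np.log10(max(res.pvalue, 1e-300))
    return slopes, logworths

slopes, logworth = marker_screen(y, geno)
top = int(np.argmax(logworth))
print(top, round(slopes[top], 3), round(logworth[top], 1))   # the planted marker should surface
# A volcano plot is then just slope (x) versus logworth (y), one point per marker.
```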
Let's make some volcano plots by hand. For these, we want to put the slope on the X axis and the logworth on the Y. Let's go ahead and make a separate one for each of our traits; I'm dragging that onto the Wrap zone. This is the kind of thing that JMP is really good at: it often will find outliers in the data. Here's one that's way out here; we've got a slope estimate of nearly negative 2,000. It turns out that this variable is nearly constant, so the regression just blows up with a nearly vertical, highly negative slope. This is more of an anomaly than an actual significant hit, and it would make sense just to ignore it, but it's nice to find it in the table and be able to identify it. This is the kind of thing that JMP is often really good at, finding weird patterns. But to hone in on the key results, let's go ahead and narrow our axes down. I just hit the axis button, and we're going to zoom in; let's go minus 10 to 10. You can see that you get this characteristic V shape, where again we're plotting the slope of the regression versus its negative log p-value. For CWAC, as we expected before, we actually got more hits than anywhere else: a bunch of markers with positive and negative slopes, which would indicate an additive genetic relationship going one way or the other. The plots for the other traits are also V-shaped, but many of them are just a lot less significant and often squished in with one another. The slope also depends on the scale of the measurement, so this is maybe not quite as meaningful unless we put all of these on the same exact scale, but I just wanted to show it for illustration, as a way to compare everything side by side. That's a GWAS. Moving forward, let's get to what our main objective would probably be, which is to predict these traits as a function of the markers. Here we have access to all the great predictive modeling platforms that are in JMP. Some of these you have to be a little careful with: with missing data, you may need to do the imputation first, and some might become quite slow given the size of the problem. For today, I just want to show probably my favorite one, XGBoost, using the XGBoost platform. This is a case where I actually ran the analysis beforehand, because it takes a few minutes to run. I loaded all six traits into XGBoost and did ten-fold cross validation, automatically leaving out each of the ten folds. Here you can see the results of that run: the solid lines in these graphs are the validation curves over the iterations, and the dotted lines are the training curves. You can see that with these wide problems there's a severe risk of overfitting, especially with a powerful approach like XGBoost, so you have to be very careful. As you can see, I actually [inaudible 00:32:18] parameters; I could tweak them down for one, and you can see the other parameters here. Within each model fit, we've got both the training observed-versus-predicted and the validation.
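The cross-validated boosting workflow just described follows a familiar pattern in any machine-learning stack. A hedged sketch using the open-source xgboost and scikit-learn packages on synthetic data; the tuning parameters are placeholders, not the values used in the talk:

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_predict
from xgboost import XGBRegressor      # assumes the open-source xgboost package is installed

rng = np.random.default_rng(3)
X = rng.binomial(2, rng.uniform(0.1, 0.9, 4853), size=(926, 4853)).astype(float)
y = X[:, :20] @ rng.normal(0, 0.3, 20) + rng.normal(0, 1, 926)   # synthetic polygenic trait

model = XGBRegressor(n_estimators=150, learning_rate=0.05, max_depth=3,
                     subsample=0.8, colsample_bytree=0.3, random_state=0)
pred = cross_val_predict(model, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=0))
print("10-fold CV correlation:", round(float(np.corrcoef(y, pred)[0, 1]), 3))
```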
You  can  see  here  for  C WAC we  got  a  correlation  of  around  0.43. Correlation  is  a  typical  measure used  to  assess  performance. This  is  competitive  with  what  was published  in  the  paper  before, without  hardly  much  tuning  at  all. But  then  there's  a  lot  of  other interesting  things  you  can  dive  into, the  most  important  features,  etc. We  even  got  some  new  things  for   instance, one  thing  called  Shapley  values that  I'd  encourage  you  to  check  out. There's  going  to  be  another  talk on  this  topic  by  Peter  Hirsch, Florian  Laura  Lancaster and myself  on  that  here  at  the   conference, I  would  encourage  you  to  check  that  out. It's  a  way  to  break  down predictions  into  their  components. That  gets  another  level you  can  go  into  with  predicting. That's  just  one  example  of  some nice  predictive  modeling  you  can  do. To  wrap  up  the  demo, I  wanted  to  return  back where  we  started  here in  this  Genetics  menu. We've  got  a  marker, a  brand  new  marker  simulation  platform. This  is  some  pretty advanced  genetic  modeling carried  out  by  our  internal  expert, Luciano  Silva. What  this  does  is  it  actually  will  do virtual  crossing  by  the  genotypes. The  idea  is  you'd  load  the  markers  in. The  really  interesting  thing is  you  can  put  a  predictor  formula  here. For  example,  I  save  the  predictor  formula from  the   XGBoost  model  of   CWAC. What  this  will  do  is  both simulate  the  crosses and  predict  their  performance. This  is  what  modern virtual  breeding  does. You  can  actually  virtually  cross different  loblolly  pine  trees and  predict  what  will  happen  with  them without  having  to  wait  10,  20,  30  years to  grow  them  in  the  field. Extremely  powerful,  interesting  approach that  revolutionized  the  way modern  breeding  is  done, and  why  so-called  genomic  selection, or  predictive  modeling with  genetic  markers  is  so  popular. I'll  go  ahead  and  conclude  there. I  hope  that  whetted  your  appetite with  some  of  the  new  things we've  got  going. A  lot  of  the  things  I  showed  today would  also  work  with  gene  expression  data, although  that's  a  little  bit different  ballgame in  terms  of  what  you're  trying  to  do. But  for  sake  of  time,  I  thought  it  would be  good  just  to  look  at  this  one  example and   dive  somewhat  deep. Thank  you  very  much  for  your  attention. Let  us  know  if  you've  got  questions as  you  have  them. We're  really  e xcited  about the  new  things  coming  in  JMP  17  Pro. We've  got  a  lot  more  things coming  in  the  works. Thank  you very much. We  recognize  that  lot  of  people that  come  to  discovery, this  may  not  be  their  area  of  expertise. But you  may  know  somebody who's  doing  this  work, and  we  would  love  to  get  them  connected with  what  we're  doing  here  at  JMP  Pro, because  we  are  going  to  continue to  invest  in  adding  capabilities and  improving  the  software  so  it  can do  work  like  this  better  and  better to  meet  the  needs  of  scientists across  the  life  sciences and  this  industry. Thanks  for  listening  in.
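As a footnote on the marker simulation idea mentioned in this talk: virtual crossing with a saved prediction formula can be caricatured as Mendelian sampling from two parents followed by scoring each simulated offspring with a fitted model. The sketch below ignores linkage (real simulators model recombination along chromosomes) and uses a toy additive predictor in place of a saved formula, so it illustrates the concept only, not the platform's algorithm.

```python
import numpy as np

rng = np.random.default_rng(11)

def gametes(genotype_counts):
    """Sample one allele (0/1) per locus from a parent coded as 0/1/2 major-allele counts."""
    return rng.random(genotype_counts.shape) < genotype_counts / 2.0

def virtual_cross(mother, father, n_offspring, predict_trait):
    """Simulate offspring under independent Mendelian segregation (no linkage) and score them."""
    preds = []
    for _ in range(n_offspring):
        child = gametes(mother).astype(int) + gametes(father).astype(int)
        preds.append(predict_trait(child))
    return np.array(preds)

# Stand-in parents and a toy additive predictor in place of a saved prediction formula
n_snps = 4853
mother = rng.integers(0, 3, n_snps)
father = rng.integers(0, 3, n_snps)
effects = rng.normal(0, 0.05, n_snps)
preds = virtual_cross(mother, father, 500, lambda g: float(g @ effects))
print("predicted trait for this cross: mean %.2f, best %.2f" % (preds.mean(), preds.max()))
```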
A long time ago in a galaxy far far away…   Actually, it was 1986 in Rochester, NY. Eastman Kodak had 60,000 employees in the community. Sales of photographic film (that stuff your grandparents used to take pictures before digital cameras) were expanding. Waste was too high and the product was too variable. After trying everything else, the corporate quality finally obtained a green light for an SPC program. Within four years, the variance for several key measures dropped by a factor of 100. Products that had averaged six formula changes per event went for six months without a change.   Photographic film manufacturing is no longer important for most of us, but the quality improvement processes used are as relevant today as ever. They are also enabled by JMP. In 1985 we used pencil and paper and mainframe SAS. Data collection sheets, cause and effect diagrams, regression analysis and SPC charts are all facilitated today with JMP.     Well, thanks  for  being  here  today. My  name  is  Ron  Andrews. I've  got  contact  information  listed  here, so  if  there  are any  questions after  the  fact, you  can  reach  me  at  these  addresses. Going to be  talking  about quality  improvement, a  very, very  general  term, but  this  is  specific  with  specific  results from  a  project  I  worked  on  many  years  ago. This  goes  a  long  time  ago in  a  galaxy  far,  far  away… Or  maybe  it  was  1986 in  Rochester,  New  York, at  Eastman  Kodak  dealing with  photographic  emulsions. A  little  history. Kodak  had  a  corporate  quality  council that  had  known  for  years that  we  really  needed  a  robust statistical  process  control  program. Management  wasn't  buying  it. They  didn't  want  to  pay  for  it. They  promoted  some  less  expensive  options like  slogan  contests  and  pep  rallies, and a  lot  of  you  know  about how  effective  they  are. By  1985,  sales  were  hitting  records, but  so  was  waste. So  the  council  finally  got approval  for  an  SPC  program. Though the  improvements I'm  going  to  talk  about are  a  small  part  of  the  total  effort. Within  the  emulsion  manufacturing at  Eastman  Kodak  that  I  was  working  with, I  was  one  of  several  engineers  and  a number  of  operators  working  on  this, so  I  contributed  to the  results  I'm  showing, but  I  was  by  no  means the  leader  for  the  whole  effort. So  why  light-sensitive silver  halide  emulsions? It's  kind  of  obsolete  technology, isn't  it? Well,  yeah,  probably. But  there's  still  three  companies that  do  this  on  a  regular  basis, and  there  are  still a  few  million  people  who  shoot  film. Most  of  all,  this  is  familiar  to  me, and  I  have  some  results  I  can  share. I'll  talk  about  the  basic  process and  what  do  the  chemists  tell  us? We'll  talk  about  several  different quality  improvement  tools like  data  sheets, and  cause  and  effect  diagrams, trend  charts, and  statistical  process  control  charts. Then  got  to  deal  with the  people  side  of  SPC. It's  probably  more  important than  the  statistics. And  then  I'll  deal  with  a  question  that I had  to  deal  with  directly  way  back  then: How  do  you  do  SPC  when  you only  make  six  batches  a  year? Before  I  really  get  started, I  need  to  acknowledge  the  leadership of  two  people. In  our  group  of  engineers, there  was  no  appointed  leader, but  Carl  Eldridge was  clearly  the  point  man. 
He  had  this  nice,  easy- going  manner and  could  talk  production  supervisors  into making  changes that  they  really  didn't  want  to. But   he'd come in, "W e're  just  going to  try  this  out  and  see  if  it  works. "And  if  it  works,  we'll  probably  keep doing  it  and  it'll  reduce  your  waste." He  would  talk  them  into  it. Kevin  Hurley  was  also  a  key  person. He  was  2nd-floor Emulsion M aking  group  leader. He  was  a  very  capable  leader and  had  the  trust  of  all the  people  who  worked  in  his  group. They  decided  they  really  wanted to  have  control  of  the  process. Engineers  could  decide  the  specs, but  they  wanted  to  control  the  process. Turned  out  to  be  a  very  good  decision. Overview  of photographic  film  manufacturing, and  this  is  the  50,000 -foot  level. We  weigh  out  the  ingredients. We  precipitate the  silver  halide emulsions. We  wash  them. We  take  samples  of  each  batch and  sensitize  them  at  three  different temperatures, choose  the  best  temperature, and  then  sensitize the  balance  of  each  batch. Then  we  assemble  all  the ingredients  necessary for a coating event, and  test  each  melt. A  melt  is  a  kettle ful. You  got   to melt the  gel. That's  where  that  term  comes  from. Then  make  corrections for  the  layers  out  of  spec, and  there  will  be  some. In  those  days,  it  was  a  given. Then  we  coat  a  short  pilot, and  then  we  adjust  the  formulas, and  then  we  coat a  short  re- pilot about  a  week  later and  adjust  the  formulas  again. And  then  if  things  are  looking  good, we   coat the  remaining  emulsions in one  or  two  large  runs and  test  the  results. And if  necessary,  take  the   coated rolls back  to  the  coating  ally and  apply  filter  dyes to  correct  the  color  balance. If  it's  not  already  obvious, everything  in  red isn't  an  adjustment  step. These  are  things  we  did  because  we  didn't always  get  it  right  the  first  time. It's  basic  product  control. Kodak  has  some  of  the  most  extensive and  elaborate  product  control  methods I've  ever  seen  or  heard  about. It's  not  necessarily  a  market  distinction. I'm  focusing  on  emulsions  because the  products  that  I  was  dealing  with, basically  Kodachrome and  Ektachrome  slides, the  light -sensitive  silver  halide emulsions  were  by  far the  biggest  contributors  to  variability. In  the  emulsion  manufacturing  process, we  were  still  using the  old  school  equipment. There  were  some computer- controlled  systems, but  we  were  dealing  with open  kettles  and  gravity  flow from  jars  into  the  main  kettle. The  main  kettle  started  with  water, phthalated  gel,  sodium  bromide, and  potassium  iodide. We  had  three  jars: one  prepped  with  silver  nitrate, another  with  ammonium  hydroxide, another  with  sulfuric  acid. We  start  by  running  the  silver  nitrate through  disc  orifices. There  would  be  a  set  of  discs  with calibrated  holes  drilled  in  them. That  was  basically  our  flow  control. Now,  gravity  flow  is  extremely  consistent if  you  keep  the  geometry  consistent. Big  "if"  there. Once  we  had  all  of the silver  nitrate  in  there, we  formed  a  number of  silver   halide crystals. We  pour  in  the  ammonium  hydroxide. Ammonia  is  a  silver  solvent. It  dissolves  the  little  crystals, and  they  plate  out  on  the  big  crystals, so that's  our  growth  step. 
Then  we  go  into  the  washing  step. We  need  to  remove  the  salts, the  nitrate  and  the  sodium  and  the  iodide… Not the sodium, the  potassium.  Excuse  me. We  add  acid,  which,  first  of  all, quenches  the  ammonia  reactions, and  second  of  all, it gets  the  pH  low  enough so  that  the  phthalated  gel  coagulates and  drops  to  the  bottom of  the  kettle  with  the  silver. At  this  point, we  siphon  the  supernatant  liquid  off and  complete  the  washing  step. Some  effects  we  knew  about. We  knew  grain size  was  proportional to  the  silver  run  time. That's  the  total  time  it  takes for  the  silver  to  run  into  the  kettle. If  the  silver  is  running  longer, that  means  it  was a  lower  flow  rate  initially, where  the  individual  grains  are  formed. If  you  have  fewer  grains and  add  the  same  amount  of  silver, you're  going  to  grow  them  larger. Temperature  is  also proportional  to  run  time, as  is  the  amount  of  ammonia. That's  not  directly  proportional. It's  very  nonlinear. It's  a  very  steep  slope  to  start  with,  and  then  it  levels  out. In  addition  to  grain size, we  had  to  deal  with  fog. Fog  is  what  you  get  when a  silver   halide  crystal  develops without  having  been  exposed  to  light. We  don't  form  images  that  way, so  we  need  to  minimize  this. That's  proportional  to  the free  ionic  silver  concentration and  to  some  extent,  the  temperature. Now,  for  any  chemists  in  the  group, the  solubility  coefficient for  silver  bromide is  something  like  5  times   10⁻¹³ . The  free  ionic  silver  concentration is  extremely  low, but  it  still  makes  a  difference. Variation  in  this  level  makes  a  difference in  the  photographic  properties. We  prepared c ause  and  effect  diagrams on  paper,  hand- drawn. I  really  wish  we  had a  tool  like  the  one  in  JMP, where  you  list  the  key p arent  parameters. In  this  case,  we're  looking  at  grain size, and  then we  have  materials,  methods,  etc., t hat  might  affect  that. And  then  you  move  these  child  parameters over  to  the  parent  side and  list  the  things that  might  affect  that. As  far  as  I  know,  there's  no  limit to  how  many  branches you  have  on  your  diagram. Once  you  have  this  table  made  up, you  identify  the  child  column and  the  parent  column and  hit  the  OK  button, and  out  pops  the  diagram. I  don't  know  of  another  way that's  as  easy, and  I'm  pretty  sure  there's  nothing  else as  easy  when  you  have  to  modify  something. Instead  of  moving  boxes  around on  a  graphic  chart, you  just  edit  one  or  two  of  the  lines, or  maybe  delete  one,  add  one,  and  hit  the  button  again. That's  all  there  is  to  it. Now,  all  of  these items  listed  on  this  chart can  potentially  affect  the  grain size. But  when  it  came  down  to  it, the  run time  and  the  variation from  one  disc  orifice  to  another, and  the  variation  from  kettle  to  kettle were  the  most  important  things. We  also  did  this  for  the  vAg. vAg is a  measurement  which  is as  close  as  we  can  get to  measuring  the  actual free  ionic  silver  concentration. We  have  basically  the  same  things listed  here,  but  in  this  case, it's  the  percent phthalation which  affects  the  washing, and  the  siphon  level  which is  directly  related  to  the  washing. These  are  the  two  critical things  in  controlling  the  vAg. 
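To make the parent/child table idea behind the cause-and-effect diagram concrete, here is a minimal JSL sketch of the kind of two-column table the Diagram platform expects. The rows are purely illustrative, not the original Kodak diagram, and the platform launch itself is left as the manual step described above.

// Each row says "Child is a potential cause of Parent"; rows are illustrative only.
dt = New Table( "Grain Size Causes",
	New Column( "Parent", Character,
		Values( {"Grain Size", "Grain Size", "Materials", "Methods"} ) ),
	New Column( "Child", Character,
		Values( {"Materials", "Methods", "Phthalated Gel", "Run Time"} ) )
);
// In JMP, launch the Diagram platform (under Quality and Process in recent
// versions), assign the Parent and Child columns, and click OK to draw it.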
Going  through  some  of  the  conventional quality  improvement  tools, we  had  data  sheets. We  had  14x 17  ledger  books, about  six  inches  thick. They  had  years  worth  of  data of  several  hundred  emulsion kinds , and  they  were  in  a  lab that  was  hard  to  get  to. You  had  to  go  through  a  dark hallway  to  get  there. When  we  learned   where it  was and  how  to  get  there, we  started  borrowing  the  pages and  transcribed  the  data  on  the  emulsion kinds  of  interest  into  SAS  datasets. It's  a  lot  easier  to  use things  in  digital  form. If  we'd  had  JMP,  the  data  tables would  have  looked  something  like  this. Each  of  the  emulsion  kinds   had a  four- digit  number  identifying  it. We  had  sequential  batch  numbers. We  recorded  the  date. We  recorded  the  kettle  used, and  then  we  recorded a  number  of  parameters. This  is  the  run time  in  seconds. pHs  after  several  different  process  steps, and  the   vAg at  the  end. This  is  an  early  trend  chart. We  hadn't  put  control  limits  on  it  yet. This  is  the  run time. Significant  variability  here. We  could' ve  done  extensive regression  analyses to  try  to  determine what's  really  influencing  this. The  first  step  was  easy. We  overlaid  the  kettle  designations. It's  pretty  obvious. You  don't  need  any  special  analysis to  know  these  kettles  are  different. These  kettles  have been  there  for  a  long  time, and  it  wasn't  really  possible to  completely  rework  them, so  we  restricted  each  emulsion  kind to  a  particular  kettle. Kind 6001  was  restricted  to  kettle  602. I'll  get  into  more  details on  the  control  charts  later, but  just  to  show  the  data. This  early  unrestricted  phase. We were not  using control  limits  at  the  time, but  this  was  our  initial  variability. And  then  we  restricted  the  kettle, and  we  got  a  large  reduction in  the  variability. And  then  one  of the  other  engineers  got  the  idea that  maybe  all  those  disc orifices weren't  created  equal. He  set  up  some  experiments  and  ran  some water  batches  and  timed  them  all, and  found  there  were consistent  differences with  different  sets  of  disc  orifices. We  restricted  a  given  set  of   disc orifices to  a  given  emulsion  kind. We  had  a  file  drawer  with  a  folder for  each  emulsion  kind, and  there  were  envelopes  in  there that  had  the  disc  orifices  in  there. We  had  to  make  more  of  them, but  it's  just  a  little  disc  of  metal with  a  hole  drilled  in  it, so  it  was  not  expensive. That  also  gave  us  another big  drop  in  variability. A  number  of  things  we learned  in  next  few  months. I  mentioned  the  phthalated  gel that  coagulates  when  the  pH  gets  low. We  needed  the  percent phthalation  to  be  correct. The  gel  plant  couldn't  hit  it exactly  with  a  single  batch. They  had  to  blend  batches  together to  hit  the  4.5%, plus  or minus  the  of tenth of  a  percent aim  that  we  were  shooting  for. 
That  worked  if  the  batches  were  not  too far  apart  in  their  percent  phthalation, but  if  you  had  a  batch  that  was very  high  in  its  percent  phthalation and  a  batch  that  was  rather  low in  its  percent phthalation, when  you  mix  them  together and  go  through  the  wash  process, that  high-phthalation  gel  is  all  going  to drop  out  to  the  bottom  of  the  kettle, but  only  part  of  the  low -phthalation  gel is  going  to  fall  out. So  we  had  variable  amounts  of  gel being  transferred  to the  next  step  in  the  process, depending  on  the  decisions they  made  in mixing  gel  batches. We  came  up  with  a  rule  that  mixed batches  had  to  be  within  1%  of  each  other. It's  not  perfect, but  it  was  a  big  improvement. We  mentioned  run time  and  our  restriction on kettles and  disc orifices. We  also  improved  our measuring  of  the  run time. We  used  to  rely  on  operators  watching the  clock  as  they  opened  the  valve, and  watching  the  clock  as  the  last little  bit  of  silver  nitrate  ran  out. We  put  a  switch  on  the  valve so that  the  clock  started  then, and  we  had  a  sensor  in  the  line so  that  when  the  last  little  bit  ran  out, it  stopped  the  clock. Better  data  always  helps. We  learned,  quite  by  accident, that  if  you  have  a  delay  when  you're setting  up in  the  process and  you  cook  the  gel  a  little  bit  longer than  usual,  it  loses  buffering  capacity. With  less  buffering,  when  you  add  acid to  coagulate  the  gel, that  pH  is  going  to  drop  farther than  what  you  really  wanted. We  discovered  this  during  the trend chart phase  in  our  emulsions. One  of  the  operators looked  at  the  data  and  said, "This  lot 's  different.  All  the  pHs are  different o n  this  particular  batch." Looked  at  it  and  agreed, "Yeah,  that's  different.  There's something  really  unique  about  this  batch." And  conversations with  the  operator, " Do  you  know  of  anything  that  happened different  on  this  particular  batch?" He  volunteered, "W ell,  I  had  a  problem with  the  ammonia  jar, "and  I  had  to  dump  it and  start  over  again, "so  there was  a  delay  in  getting  started." Another  operator  chimed  in, "I  had  a  batch  that  looked  like that  in  terms  of  the  pHs  a  while  back. "Let's  go  look  at  that." And  we  dug  out  the  data  for  that  one, and  the  timestamps  said,  yeah, there  was  a  delay  in  starting  that  one. The  pHs all were  more  variable. They  were  farther  off. The  higher pHs were higher and the  low  pHs  were  lower. So  we  did  more  experiments on  the  bench  scale and  found, yeah, there  was  a  real  effect  there. And  the  chemist  volunteered  that,  yeah, they  knew  it  could  happen, but  they  had  no  idea that it  happened  this  fast. So  we  put  a  limit  on  the  gel  prep,  a  time  limit. If  you  haven't  started  using  it within  a  given  time  frame, you  dump  it  and  start  over. It  really  does  make  sense  to  dump a  couple   hundred dollars worth of  gel  and  salt rather  than  adding  tens  of  thousands of  dollars  worth  of  silver  to  that  kettle and  running  a  risk  of  dumping  that. We  also  learned  in  the  washing  process, it  was  better  to  be consistent  and  imperfect than  strive  for  perfection and  getting  greater  variability. 
That  is,  our  operators  had  long  been  told in  that  washing  process, the  good  stuff' s  in the  bottom  of  the  kettle. That  silver  and  gel  down  there at  the  bottom,  that's  the  good  stuff. Don't  you  dare  suck  any  of  that  out in  the  siphon  wand, but  get  all  of  the  supernatant  liquid you  possibly  can  out. The  only  problem  was  the  coagulation didn't  always  have  the  same  density. Sometimes  it was  nice  and  compact in  the  bottom  of  the  kettle, and  sometimes  it  was  a  little  fluffy and  took  up  more  space, and  you  couldn't  siphon  down  as  far. Rather  than  siphoning down  as  far  as  possible, we  got  more  consistent  results  when  we specified  exactly  how  far  to  siphon. For  kind  6001, we  went  down  to  number  23 on  the  siphon  wand. We  put  markers. Basically,  we  put  a  measuring  stick along  the  siphon  wand and  had  different designations  for  different  kinds. If  we  really  needed  to  get  that free  ionic  silver  concentration  lower, we  added  on  an  extra  washing  step. We  re dispersed  the  gel  by  adjusting the  pH  and  then  recoagulated  it. Looking  at  the  vAg  chart, this  was  the  initial  area, and  this  is  when  we started  restricting  the  kettle. Not  much  change. It  looks  like  there  might  be a  slight  reduction, but  I  wouldn't  brag  about  that. In  this  last  phase… Well,  okay,  we  restricted  DOs  here, but  the  real  change is when  we   add a  standard  siphon  level rather  than  siphoning  as  far  as  we  can. That  made  a  real  difference. We  had  reduced  variability, so  we  continued  that. Consistency  is  worth  more  than the  ultimate  performance, especially  if  you  can't  repeat  that ultimate  performance  every  time. Early  successes  like  these were  worth  their  weight  in  gold. The  enthusiasm  and  increase  in  morale  that that  brought  about was  possibly  worth  more  than  gold. It  was  priceless. Few  things  get  people  more  excited  than having  them  have  their  own  results result  in  dramatic  improvements in  the  product. How  do  you  sustain  improvements, and  how  do  you  keep  learning? Well,  I've  already  showed  you some  control  charts, but  SPC  charts  are  really  the  way  to  go. As  I  indicated, we  decided  to  make  them  operator -centered, as in  put  the  operators in  control  of  the  process. Now,  the  people  side  of  SPC is  probably  more  important than th e  statistics. Some  people  take to  SPC like  ducks  to  water, and  some  people, it's  more  like  cats  to  water. Now,  I  know  there  are  some  cats who  actually  can  swim, but  most  cats  are  going  to  react more  like  this  one  does. They're  going  to  get  out  of  that  water as  fast  as  they  possibly  can. Now,  that  2nd- floor  Making  group, they  were  in  the  ducks  to  water  category. The  6th- floor  Making  group, which  is  what  I  dealt  with  more often  with  the   Kodachrome products, I  won't  call  them  cats  to  water, but  they  were  skeptical. I  had  to  prove  it  to  them that  this  was  going  to  work before  they  really  bought  into  it. It  took  longer,  but  we  did  get  there. I  hope  most  of  you  are  familiar  with the  work  of  W.  Edwards  Deming. I  was  fortunate  to  attend  one  of  his four -day  seminars back in 1992. Happened  to  be  the  last  year  of  his  life. He  was  92  at  the  time. 
He  was  one  of  the  preeminent quality  control  and  quality  improvement experts  in  the  world  at  the  time. The  Deming  Award  in  Japan is  named  for  him. They  still  give  that  award  every  year to  the  company  showing  a considerable  improvement  in  quality. If  you  are  not  familiar  with  him, first  of  all, look  up  Deming's  14  points  and  read  them. Second  of  all,  get  his  book. Well, he  wrote  several  books. I  think   Out  of  the  Fear   was  the  last  one. Read  that  as  well. But  point  number   8 of  his  14  points  says, "Eliminate  fear." Allow  people  to  perform  at  their  best by  ensuring  that  they  are  not  afraid to  express  ideas  or  concerns. Think  about  that  operator  that  volunteered that  he  had  made  a  mistake and  that  caused  a  problem with  that  particular  batch. He  volunteered  that  freely. I've  been  other  places  where  operators are  often  punished  for  making  mistakes, at  least  reprimanded. When  that  happens, they  don't  admit  mistakes. They  cover  them  up, and  you  don't  learn  things. You  got  to  work  against  that. Everybody  has  to  be  able  to freely  express  what  happened, what   good happened, what  bad  things  happened, and  to  communicate  freely. It  opens  up  a  whole  world  of possible  improvements when  you  have a  free  exchange   of  information  like  that. Getting  down  to  the  SPC  charts. As  I  mentioned,  we  started  with  the  charts in  control  of  the  operators. To  do  this,  you  got  to  keep  it  simple. Not  that  operators  can't  learn to deal  with  complicated  charts  eventually, but it's  going  to  take  longer and  the  training  process  will be  longer  for  new  employees. It's  worth  something t o  keep  it  simple. We  used  a  chart  of  individuals. We  omitted  the moving  range  part  of  the  chart. I  know  this  may  be  heresy  for  some quality  control  purists, but  we  looked  at  that  and  said it  doubles  the  complexity  of  the  chart. We  know  it  adds additional  useful  information, but  it  doesn't  double the  amount  of  useful  information, so  we're  going  to  forgo  that  for  now. We  also  use  only  two  run  rules. A  point  was  out  of  control  if  one  point was  beyond  three  sigma, or  two  out  of  three  were  beyond  two  sigma. That was the  only  criteria. Obviously,  there  are  six  more traditional  rules, and  other  sets have  even  more  run  rules. We  kept  it  simple, and  this  kept  us  busy. We  still  had  a  number  of out- of- control  events  to  investigate, so  it  kept  us  hopping. It  was  about  all  we  could  handle. It's  also  necessary  to  think  about what  limits  you're  going  to  set. I  think  that's  actually  on  the  next  slide, so  I'll  get to that in a  second. I'm  getting  ahead  of  myself. We  had  daily  meetings to  assess  the  charts. Operators  would  present  them. They  would  indicate  points that  were  out  of  control, and  engineers  were  there to  comment  about  what  we  know  about  it and  help  investigations. Most  importantly,  we  had  celebrations for  out- of -control  situations. Literally. When  an  operator  indicated  that  something was  out  of  control,  we'd  say  thank  you. Thank  you  for  sharing  that  with  us. Let's  see  what  we  can  do working  together  to  find  out  what  happened and  maybe  fix  something. Here's  that  slide  that  I  was getting  ahead  of  myself  with. 
How  do  you  set  the  limits? Purists  insist  that  the  control  limits must  be  based  on  short-term  variability. That's  the  definition  of  control. The  process  is  in  control when short -term  variability matches  long- term  variability. Pragmatists  know  that  even  if  you set  the  limits  a  little  bit  wider, say  maybe  take  the  first  30  points, take  the  standard  deviations, set  the  limits  of  three  sigma, even  at  that  point,  you're  still  going  to have  out -of -control  points  to  deal  with. If  alarms  happen  too  often, they're  going  to  be  ignored. Set  the  limits  that  are a  challenge  and  achievable. You  got  to  walk  that  tightrope. Now,  I  would  suggest  deciding how  you're  going  to  set  the  limits and  then  stick  with  that  method  until you decide  you  have  to  make  a  change. Don't  just  do  it  totally  on  a  whim, but  set  a  definition  that's comfortable for  your  situation,  and  run  with  it. Most  of  all,  you  got  to  keep striving  for  continuous  improvement. Looking  at  the  results. Now,  so  far,  I've  just  been  talking  about the  emulsion  making  operation. The  next  operation,  the  sensitizing, is  where  there's  a  considerable  boost of  the  photographic  properties. We  test  the  photographic  properties after  the  sensitizing  step. The  lot -to -lot  standard  deviation for  the  photographic  speed dropped  from  about   10 units, that's  about  a  third of a  stop  for  those familiar  with  that  photographic  term, to  about   1 unit. Actually, it  was  lower  than  that because  the  standard  deviation of  the  test  process  was  about  one  unit. We had  more  than  a  ten fold  reduction in  the  standard  deviation. If  you  want  a  more  impressive  statistic, we  had  more  than   a hundredfold  reduction in  the  variance. The  formula  adjustments from  one  coating  event to  the  next  dropped  drastically. We  had  some  products that  went  from  six  changes  per  event to  zero  changes  over  a span  of  six  months. When  we  started  this,  we  had  no  idea  that we  could  possibly  get  anything  that  good. Now,   I  want  to  get  back to  that  question  I  posed  earlier. How  do  you  implement  SPC  when  you only  produce  six  batches  per  year? One  of  my  particular  products was  Kodachrome 25. That  was  a  old  and  venerable  product that  had  once  been  quite  popular and  had  been assigned  to  the  larger  kettles. But  a  lot  of  the  market had  switched  to  higher- speed  products like  the   Kodachrome 64 or  the  even  higher -speed Ektachrome  slide  films. It  was  a  rather  small  runner by  the  time  I  was  responsible  for  it. A  couple  of   emulsion  kinds, we only produced  six  batches  a  year . n equals 6 is  not  very  good  for  statistics. My  answer  to  the  question is  what  I  call  creative  swiping Simply  copy  the  procedures that  were  found  to  be  useful on  the  large -running  constituents and  copy  the  same  ones for  the  small  runners. Now,  they're  the  same  class  of  emulsions. Same  basic  technology, gravity  flow  containers,  ammonia  digest, phthalated g el,  coagulation  for  washing. We're  using  the  same  basic  process. You  find  out  what  works in  the  large  runners, apply  it  to  the  small  runners, and  we  got  similar  improvements. 
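Circling back to the limit-setting question above, here is a minimal JSL sketch of the pragmatic approach: take the first 30 points and set individuals-chart limits at three sigma. The table and the Run Time column are assumptions for illustration, and the sketch assumes at least 30 rows of baseline data.

// Pragmatic baseline limits: mean +/- 3 * SD of the first 30 batches.
// "Run Time" is a hypothetical column name.
dt = Current Data Table();
vals = Column( dt, "Run Time" ) << Get Values;
baseline = vals[1 :: 30];
center = Mean( baseline );
sigma = Std Dev( baseline );
lcl = center - 3 * sigma;
ucl = center + 3 * sigma;
Show( center, lcl, ucl );
// A purist individuals chart would instead estimate sigma from the average
// moving range divided by 1.128 (short-term variability), giving tighter limits.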
By  the  way, these  charts  with  the  blue  background, these  are  actually  scans  of 35 millimeter  slides that  I  used  in   an  internal  presentation at  Kodak  back  in  1988. They  were  computer -generated by  a  firm  called  Genigraphics. I  think  they  charged  $6  a  slide. A  lot  of  things  have  changed  since  then. This  is  looking  at  the  vAg   in finishing. Previous  data  I'd  shown  was in  the  making  operation. Finishing  is  the  sensitizing  step. This  is  the  last  step  before  you put  the  emulsions  into  a  coating event. We  got  a  significant reduction  in  the  variability. Now,  contrast  balance. I  got  to  explain  this. One  of  the  most  important  things of  a  color  film is you  have  three  different  color  records: red,  green,  and  blue. You  got  to  keep  the  contrast  of  those three  different  records  the  same. They  got  to  match  each  other. If  they're  all  a  little  bit  off, it's  not  too  bad, but  they  got  to  match  each  other, so  the  contrast  balance is  the  most  important  parameter. If  it's  off,  you  could  end  up  with green  highlights  and  pink  shadows, There's no  way  people  can  correct  that in  these  pre -Photoshop  days. In 1987, we  had  a  pretty  wide  spread  of  results in  this  two- dimensional  plot. The  hexagon h ere  are  the  spec  limits. This  95%  confidence  ellipse  indicates there  will  be  more  outside  of  spec. There's  one  here,  but  there are going  to  be  more  over  time. By  1988, we'd  collapsed  the  variability  down  to this  nice,  tight  little  group centered  pretty  close  to the  center  of  this  hexagon. This  made  my  work,  my  job,  so  much  easier, especially  in  terms  of  adjusting  things from  one  coating   event  to  the  next. They  became smaller  and  smaller  adjustments, and  eventually  not  having  to  adjust. In  summation, there  are  many  standard quality  improvement  tools. You  don't  have  to  use  all  of  them. Pick  the  ones  that  fit  your particular  situation  and  use  them. Technical  staff  should  define the  formulas  and  specifications. We  found  a  huge  benefit  to  having the  operators  in  control  of  the  process. They're  going  to  need  plenty  of  support, but  this  is  the  only  way  to  get  the really  rapid  feedback on  what's  actually  going  on. You  got  to  keep  it  on the  simple  side  to  make  this  work. And  most  important  of  all, you  got  to  celebrate  those  opportunities to  learn  and  make  improvements. That's  the  end  of  my  presentation. Repeat  the  contact  information. If  anybody  has  questions, I'll  be  glad  to  answer  them. Thank  you  very  much.
One of JMP’s strengths is the ability to read and write to a variety of data sources.  At Janssen we store much of our data in Oracle databases. This talk compiles some tips and tricks for getting the two applications to talk to each other. These tips and tricks were compiled over a 15-year period, developed using JMP versions 7 through 16 and Oracle versions 10 through 19. Topics include: Finding that connection string to Oracle Oracle ODBC connections without a Data Source Name Pulling data from Oracle Inserting data into Oracle Fast data loading into Oracle Faster data loading into Oracle Executing Oracle PL/SQL procedures Error trapping and handling Building Oracle IN lists Miscellaneous SQL tips and tricks     Hello  everybody. My  name  is  Peter  Mroz. I'm  with  Janssen  Pharmaceutical, and  today  I'm  going  to  talk  to  you about  how  combining  JMP  and  Oracle can  lead  to  a  happy  marriage. I  work  for  Janssen  R&D. We're  a  wholly- owned subsidiary of  Johnson  and  Johnson, and  our  charter  is to  discover  and  develop innovative  medicines  and  solutions that  transform  individuals  lives and  solve  the  most  important unmet medical  needs  of  our  time. Within  that  world, I'm  in the  Global  Medical  Safety  Department, and  our  charter  is to  protect  patients  by  driving robust  medical  safety  excellence and  benefit -risk  assessment. And  then  within  that  department, I'm  in  a  group called  Methods  and  Analysis, and  our  aim  is to  develop  and  implement  analytic  tools to  increase  efficiency and  analytical  capability to  detect  and  evaluate  safety  signals. Again,  my  name  is  Peter  Mroz and  I've  been  a  JMP  user  since  2007. Standard  Disclaimer. These  are  my  views and  do  not  imply  any  endorsement of  any  product  by  Janssen  or  J &J. Here's  our  agenda  for  today. I'm  going  to  give  an  introduction and  then  we'll  jump right  into  Oracle  things. We'll  talk  about  ODBC,  configuring  the  Oracle  client, the  ODBC  connection  string, bringing  data  from  Oracle  into  JMP, and  then  writing  data back  to  Oracle  from  JMP, then  fast  data  insertion, faster  data  loading, then  executing an  Oracle  PL /SQL  procedures, hiding  the  passwords, error  trapping, building  IN lists, and  sprinkled  throughout, I've  got  some  miscellaneous  tips. So  my  department is  called  Global  Medical  Safety, and  we  collect,  process, and  report  and  analyze  adverse  event  data for  the  medicinal  products that  we  produce. They  are  mostly  spontaneous  cases, although  there  are  some clinical  trial  cases. These  are  called  post -marketing. They're  a  man  or  woman  in  the  street, walking  down  the  street, and  you  experience some  sort  of  drug  side  effect, and  you  call  it  into  our  call  center and  we  run  it  through  our  process and  store  it  in  Oracle. Our  volume  is  about  5,000  cases  a  day, and  a  case  consists  of  a  person... They're  not  a  patient, they're  not  in  a  clinical  trial. It's  a  person, the  drugs  they  took, the  events  they  experienced, the  side  effects  they  experienced, maybe  some  medical  history, and  we  store  something  called  a  narrative. And  here's  an  example, patient  narrative. S ubject  had  cancer, which  was  diagnosed  in  June  1998, et  cetera,  et  cetera. 
It's  a  lengthy  story  about the  patient  and  the  side  effects, and  this  is  very  important for  our  surveillance  physicians and  other  scientists  to  look  at. This  is  stored  as  a   [inaudible 00:02:52]   in  Oracle,  by  the  way. JMP,  as  we  all  know, is  great  at  statistical  analysis and  visualization. Oracle  is  great  at  data  storage and  transactional  processing. Our  users  want  to  analyze  and  visualize data  from  Oracle  using  JMP. Primarily,  we  look at  tabular  reports  of  safety  data, summary  information, patient  narrative  drill downs, and  we  do  some  visualization of  safety  data via  trending  or  forest  plots. With  all  that, JMP  and  Oracle  together  make a  happy  marriage. We'll  start  by  talking  about  ODBC, which  stands for  Open  Database  Connectivity. ODBC  drivers  access the  database  using  SQL. SQL  stands  for  Structured  Query  language. So  the  ODBC  driver  allows a  JMP  client  software to  communicate with  the  Oracle  database. The  first  thing  you  have  to  do is  install  the  Oracle  client  on  the  PC. This  is  an  exercise  left  to  the  reader. It's  not  a  tutorial  on  installing  this, so  you  can  Google  it. However,  once  you've  installed  it, there  are  a  couple  of  things  you  need to  supply  for  the  Oracle  client. You  need  to  define two  environment  variables. One  is  ORACLE_ HOME, the  other  is  TNS _ADMIN. So   ORACLE_HOME  points  to  a  folder where  the  client  is  actually  installed. Here,  it's  in   C, Oracle , 19 , client _1. And  then  you  want to  include  the  bin  directory in  the  path  environment  variable. So  in  this  case, it's  the   ORACLE_HOME   with the  slash bin. The   TNS_ADMIN  points to  the  location  of  TNSNAMES. ORA, —I'll  explain  what  that  is  in  a  second— and  that's  typically  located in  the  network  admin  path underneath  the   ORACLE_HOME . Here's  a  hint; you  can  point  this   TNS_ADMIN  variable to  a  file  share  location so  multiple  users  can all  point  to  this  file and  it's  easier  to  maintain one  version  of  TNSNAMES.ORA. What  exactly  is   TNSNAMES.ORA? It's  a  configuration  file. It's  like  a  secret  decoder  ring, it  translates  between  a  database  alias and  information  needed by  the  Oracle  client to  talk  to  your  database. Here's  my  example; my  alias  is   MYDEVDB, and  then  here's  my  description for  how  to  connect   MYDEVDB to  my  Oracle  database. With  that  completed, now  we  need  to  determine an  ODBC  connection  string, and  the  easiest  way  to  do  this  is  to  click the  Windows  button  and  type  ODBC, and  we  want  to  match  the  hatch. For  64 -bit  JMP, we  want  the   64-bit  ODBC  data  sources. For  32 -bit  JMP, we  want  the  32- bit   ODBC  data  sources. I  have   64-bit  JMP, so  I  clicked  that and  I  bring  up  this  screen  here, and  I  click  on  the  drivers  tab, and  here  are  the  drivers  for  ODBC that  are  installed  on  my  system. I  have  three  Oracle  clients and  one  SQL  Server. I  have  the  Oracle  client version  11, 12,  and  19, so  I'll click on  this  Oracle  version  19  driver, and  you  want  to  make  note  of  this; Oracle  in  Ora Client19 Home 1. That's  all  you  need  to  know. Now  we  have  our  ODBC  connection  string. We  combine  that  like  so with  driver  equals  that  string, DBQ  equals  our  database  alias. UID  equals  your  username, PWD  equals  your  password. 
So  here's  a  fully -formed ODBC  connection  string. Driver  equals  my  Oracle  19  driver. DBQ  is   MYDEVDB, username  is  MYUSERNAME, password  is   MYPASSWORD. Okay,  now  that  we're  all  configured, we  can  bring  data  into  JMP. There  are  several  ways to  get  Oracle  data  into  JMP. Ther e's   Open  Database,  Execute  SQL , New  SQL  Query,  and  Query  Builder, which  is  under  the  File  Database  menu. This  talk  will  focus  exclusively on  Execute  SQL because  you  can  create a  database  connection and  then  you  can  execute several  SQL  commands  with  Execute  SQL and  then  close  the  database  connection. If  you  compare   that  to  Open  Database, Open  Database  in   one  call, opens  the  connection, runs  a  SQL  command, closes  the  connection. So  if  you  have  20  SQL  statements, you're  opening  and  closing the  connection  20  times, whereas  with  Execute  SQL, you  only  open  it  once, execute  your  20  commands with  20  execute SQL  commands, and  then  close  the  database  connection, so  it  speeds  things  up. In  the  scripting  index, this  is  what  Execute  SQL  looks  like. It  takes  the  following  arguments; there's  a  database  connection  handle which   is here, defined  by  your  connection  string. There's  either  a  SELECT  statement or a  SQLFILE equals  statement, or  a SQL FILE  equals  a  pointer  to  a  file containing  your  SQL  commands. An  invisible  keyword, if  you  supply  a  table Name, that's  equivalent  to  saying SELECT  star  from  table Name, and  then  an  output Table Name  provides the  name  for  the  JMP  data  set and  Execute  SQL  returns, pointer  to  a  table if  you're  issuing  a  SELECT  statement , which  returns  a  data  set. Here's  an  example; I  have  my  connection  string, driver  equals  Oracle  in  Ora Client19 Home1, MYDEVDB,  my  username, my  password. I'm  calling it  create  database  connection with  this  string. I've  got  my  SQL  statement  here, I'm  selecting  some  columns from  a  table  called   eba_sales _salesreps, and  I'm  passing  my  connection, my  SQL  statement, and  then  a  title for  the  table  to  execute  SQL, then  I'm  closing  the  database  connection. Here's  my  table  I  rendered  from  Oracle. That's seven  columns,  20  rows. Here's  the  first  tip, and  that  is  if  you  have  a  string with  a  single  quote  inside  it, in  order  to  use  it  with  Oracle, you  have  to  replace  that  single  quote with  two  single  quotes  like  this. Here,  my  SQL  statement  is  SELECT  star from  my _table  m, where  m  name  equals  O'Malley, and  since  this  is  inside  the  string, I  have  to  replace  it  with  two  quotes. Here's  the  second  tip; use  column  aliases  for  readability. These  are  my  column  aliases, so  here's  my  column, and  then  in  double  quotes, I've  got  an  alias. You  notice  it's  mixed  case, there  are  spaces  in  there, it  makes  it  more  readable. Here's  my  table with  the  more  readable  column  headers. The  other  thing  about  this  is, I'm  using  backslash  open  square  bracket, close  square  bracket  backslash to  avoid  the  need to  escape  my  double  quotes. You  notice  I've  got  double  quote  here, then  I've  got my  backslash open  square  bracket, and  then  I've  got  double  quotes, and  then  I  close  it  out  here so  it  looks  a  lot  cleaner. What  if  you  want to  write  data  back  to  Oracle? 
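Before turning to writes, here is a minimal JSL sketch of the read pattern just described: one connection, one or more queries, one close. The driver name, alias, credentials, table, and column names are placeholders rather than a working connection, and the square-bracket quoting shown is the same trick used above to avoid escaping double quotes.

// Open one connection, run a query, close the connection.
// Every connection detail below is a placeholder.
dbc = Create Database Connection(
	"Driver={Oracle in OraClient19Home1};DBQ=MYDEVDB;UID=MYUSERNAME;PWD=MYPASSWORD;"
);
// Column aliases in double quotes for readable headers; columns are hypothetical.
sql = "\[SELECT rep_id "Rep ID", last_name "Last Name" FROM eba_sales_salesreps]\";
dt = Execute SQL( dbc, sql, "Sales Reps" );   // returns a JMP data table
Close Database Connection( dbc );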
You  can  issue  an  UPDATE  statement or  an  INSERT  statement. Here  is  my  UPDATE  statement, and  I'm  simply  defining  that, passing  to  a  variable, and  then  passing  to  Execute  SQL. I'm updating  the  table, setting  the  last  name  to  Smith where  the  first  name  is  Sweed. Or  here  I'm  inserting a  new  record  into  this  table and  I'm  setting  the  value  of  these  fields to  the  values  shown  here. One  thing  to  note about  UPDATE  and  INSERT is  Execute  SQL  does an  implicit  COMMIT  for  these  commands, so  you  don't  need  to  do a  COMMIT  yourself. One  thing  about  INSERT, if  you  have  multiple  INSERT  statements to  execute, they  can  be  slow, so  I  found  an  alternative which  is   INSERT ALL. Here's  an  example where  I'm  inserting  into  this  table. The  column  is  called  sample _number and  I'm  inserting  ten  values  all at  once  with  one  statement. The  only  weird  thing  is, you  have  to  put  something like  SELECT  1  FROM  DUAL at  the  end  of  it and  then  it  works. Let's  see  that  in  action. I  want  to  insert  100  rows  one  by  one and  compare it  to  100  rows  all  at  once. I  have  a  little  example  here, let's  go  ahead  and  run  it. So  it  took  8  seconds  to  do  one  at  a  time versus  0.15  for   INSERT ALL, and  so  it  was  52  times  faster. Let's  look  at  the  code  a  little  bit. We  have  making  a  connection, truncating  a  table. Here's  my  one  at  a  time. I  have my Execute SQL  inside  my  loop, and  by  the  way, I'm  looping  a  hundred  times. Here,  Execute  SQL  is  inside  the  loop, and  down  here  for   INSERT ALL, I'm  starting  with   INSERT ALL and  I  keep  adding into  the  table,  fields,  values. Keep  adding  that, and  then  I  only  run one  SQL  command. And  if  I  look  at  this  command, you can see it's  pretty  hefty. Here's  my  SQL  statement. It's  very  long, but  it  took  0 .15  seconds  to  run. INSERT ALL into  TEST  IMPORT, field  names,  values, into, into, into. Okay. Let's  go  back  to  slide  mode. What  if  you  have more  than  a thousand  rows to  insert  into  your  database? What  if  you  have  10,000  rows or  50,000  rows? You  can  use  a  tool  from  Oracle  called SQL Loader  for  faster  data  loading. SQL Loader  requires  a  data  file which  can  be  comma  separated, tab -delimited,  fixed  format, and  a  control  file. The  control  file  describes the  structure  of  the  data  file and  the  target  table, and  we're  going  to  add another  layer  on  this because  we're  going to  do  all  this  from  within  JSL. I'm  going  to  create  a  command  file which  runs   SQL Loader in  a  command  window. It  also  generates the  control  and  command  files  using  JSL, and  I'll  use  run program to  execute  the  file. Here's  my  file. It's  very  exciting, it's  six  columns,  four  rows and  it's  tab  delimited. Here  I'm  showing  the  reveal  code, so  this  is  my  tab  character, and  here's  my  example. I've  got  setting  some  variables, and  here  I'm  creating  my  control  file and  I'm  using  eval insert to  make  these  variables, —surrounded  by  the  little  carets— convert  to  their  values  up  here. Import _file name  will  be  test_import.txt, dest_table  will  be  TEST_IMPORT. Fields  are  terminated  by  tab, actually  enclosed  by  double  quotes. Here's  my  six  fields, and  I'm  adding  a  couple  of  other  fields, date _loaded  and  username _loaded. 
Date _loaded  will  be  the  system  date. The  username _loaded  will  be the  account  name of  the  person  running  it. I'm  saving  this  file  out to  the  directory and  I'm  creating  a  command  file to  run   SQL Loader. Setting  my  drive  to  the  C  drive, seeding  into  this  directory, and then  here's  my  command; SQL Loader  user  ID  equals  my  credentials at  my  database  name, and  here's  my  control  file, my  log  file. And  if  you  notice, I  had  to  add  these  backslash exclamation mark  capital  N. These  are  hard  returns. For  some  reason, it  didn't  work  without  these  in  there, so  I  had  to  add  those  in for  the  command  file  to  work. I'm saving  it, and  then  I'm  running  it  here with  run program. Then  I'm  checking  the  results. If  it  does  not  contain row  successfully  loaded  in  the  output, then  I  display  an  error  message and  display  the  output  from  run SQL  load. If  it  was  successful, then  I  load  the  log  file  in and  display  that. Here's  my  log  file. I've got about four  rows successfully  loaded  in  1.5  seconds. I  want  to  do  a  demo  of  six  columns with  30,000  records, and  let's  run  that  one. I'm  loading  into  the  same  table, I'm truncating  the  table, and  I'm  running  SQL Loader,   load  the  data  file. It's  the  same  control  file. When  it's  all  done, it's  going  to  display the  output  from  the  log  file. Here's  the  output. I've  got  30,000  rows  successfully  loaded. No  rows  were  not  loaded  due  to  data  errors and  it  took  16  seconds. That  was  30,000  rows, and  we  can  look  at  that  data. Here  I  am  in  a  tool called  PL/SQL  Developer. SELECT  star  from  the  table and  here's  my  values, here's  my  date  loaded, which  is  today, my  username, and  it's  going  to  select  all  the  rows. I'll let  that  run. Okay,  so  here's  a  look  at  the  data  file, and  here's  the  log  or  previous  run. It  took  25  seconds. All  right,  moving  on. Now,  what  if  you  want  to  execute an  Oracle  PL /SQL  procedure? PL /SQL  stands  for Procedural  Language  Extensions  to  SQL, and  it's  a  sort  of  a   3GL, 4GL  language  Oracle  uses to  do  functions, procedures  and  the  like. If  you  have  a  procedure, you  simply  surround  it  with  begin  and  end and  then  pass  it  to  Execute  SQL. Here is  BEGIN, that's my  schema  name. This  is  a  package  called  package  util, and  then  inside  there, there's  a  procedure  called  send  email with  an  argument   success , and  then  I  add  the  END  at  the  end. So  when  you  do  this, it  runs  it  and  control  will  return to  JMP  when  the  procedure  is  done, so   we'll  wait. Okay,  let's  talk  about some  security  things. When  you  pull  data  from  Oracle, by  default,  there's  a  source  property in  the  data  set, and  that  will  show  you the  username  and  password and  connection  string, and  you  might  not  want to  show  that  to  all  your  users. If  you  run  this  command, it  will  hide  the  connection  string in  the  data  set  that's  returned. I  go  a  couple  of  steps  beyond  that. I  create  a  connection in  an  encrypted  JSL  function. This  function  contains  a  database  name, the  username,  and  the  password, and  it  returns  a  database  connection and  default  local  ensures that  function  variables  are  not  visible. Let's  have  a  look  at  that. 
Here's  a  little  function  called  my _dbc, and  this  is  the  unencrypted  version. Here's  my  default  local. I  check  environment  variables. Here's  my  connection  string. This  is  the  one that  you  don't  want  people  to  see. Here's  my  driver,  my  database, my  username , my  password. I  create  a  database  connection with  that  string and  then  just  return my  database  connection. Go  to  encrypt  the  script, you  click  on  edit,  encrypt  script, enter  a  decrypt  password. —I  don't  use  run  passwords— and  then  click  yes here. Here's  my  encrypted  script, and  then  I'm  going  to  save  it as  my_dbc .jsl. To  use  it, I  include  that  script, that  encrypted  script  in  my  JSL  code, and  that  defines  that  function  for  me. Then  I  call  my _dbc to  get  a  database  connection, SELECT  star  from  this  table execute  SQL, close  database  connection, so  here's  my  table. And  if  you  look  at  DBC  in  the  log, all  it  shows  is  database and  then  your  Oracle  client  driver. Many  times  when  you  run SQL  commands  in  JMP, you  run  into  errors, and  it's  not  very  easy  to  debug  this, so  I  wrote  a  function called   log_execute_ sql, which  executes  SQL  commands  and  traps any  ODBC  errors  found  in  the  log. If it  finds  errors, it  displays  a  warning  message  to  the  user along  with  the  SQL, and along  with  the  error. If  you  set  a  global  variable  to  one, it  displays  a  SQL  before  executing  it. This  has  become  very  handy for  developing  and  debugging  SQL. The  function  uses  log  capture to  inspect  the  log  for  errors. This  is  the  syntax  for  log  capture, string  equals  log  capture  expression, and  this  is  whatever  commands you  want  executed  and  captured, and  anything  that  normally  go  to  log will  go  into  string, and  then  you  can  inspect  the string. Log _execute _sql  takes  five  arguments: the  name  of  the  calling  program, a  database  connection, a  SQL  statement, an  invisible  flag, and  a  table  name  to  return, and  here  are  two  examples. One  works  and  one  doesn't  work. This  has  SELECT   SYSDATE  FROM  DUAL, which  is  a  standard  Oracle  command to  get  the   current  system  date, and  this  has  an  intentional  error  in  it, dual  X, which  I  know  doesn't  exist. When  we  run  the  first  SQL  statement, we  get  the  system date. Very  good. When  we  run  the  second  statement, we  get  an  error  message. Calling  program is  listed  here, the  error  message is  here. This  is  very  important, along  with  this  code, ORA-00942, and  then  here's  your  SQL  statement, and  this  whole  message is  inside  of  a  text  edit  box, so  you  can  copy  and  paste  it. Here's  an  example  for  debug  output. I  turn  on  my  debug  flag and  when  I  run  my  statement, I  get  an  informational  message —It's  not  an  error, it's  just  informational— showing  the  calling  program, database  connection, whether  it's   invisible  or  not, the  table  name,  and  the  SQL  state, and  then  I  can  click  this  checkbox if  I  want  to  turn  off subsequent  debug  output. So  here's  log _execute _sql, there's  a  description, a description  of  the  arguments, a  couple  of  example  calls, and  then  here's  the  function  itself. Here's  my  arguments, I  check  the  database  connection, I  check  the  SQL  statement, I check  the  debug  flag. 
If  the  debug  flag  is  on, I  make  a  little  window and  I  display  the  current  SQL in  a  text  edit  box, and  then  I  give  the  user  the  option to  turn  off  subsequent  debug  output. If  they  click  that, I  reset  the  flag  to  zero. Then  here's  the  meat  of  this  function I  force  all  errors  to  go  to  the  log with  batch  interactive  one, then  I  call  log  capture with  either  an  invisible  flag  on or  non- invisible  flag on for  execute  SQL, and  then  I  turn  batch  interactive, set  it  back  to  zero, then  I  check  the  log  window for  ODBC  errors, I  look  for  Oracle  ODBC or  the  word  error  and  I  set  a  flag, and  then  I  use  words and  the   [inaudible 00:25:02] to remind me  to  parse the  output  of  the  log  into  separate  lines. Then  the  error  message  is  always on  line  one  of  this  message. If  we  found  an  error, then  it  displays  the  error. I  have  an  example  here  yet. So  here's  an  example  where  it  says FROM  keyword  not  found  where  expected. If  we  look  at  the  SQL, I  happen  to  know there's  a  comma  missing  here. Let's  talk  about  building  IN  lists. The  Oracle  IN  operator  determines  whether a  value  matches  any  values  in  a  list. Here's  an  example. SELECT  star, FROM  EBA _SALES_ SALES REPS, where  the  last  name is  in  one  of  these  values, and  it's  similar to  the  JSL  contains  function where  here  I'm  saying does  this  list  contain  the  word  Raj? In  this  case  it  was  down  at  position  two. There's  a  caveat  with  the  IN  operator. There's  a  limit  of  a thousand  values. Of  course,  I  wrote  my  own  function, get_sql _in _list, which  gets  around  that  limitation. And  what  it  does, it  builds  an  inlist from  the  list  provided. If  there  are more  than  1000  items  on  the  list, it  separates  them into  1,000  element  chunks connected  via  union to  avoid  the  limit  of  a thousand  items. And  if  the  elements  are  of  type  string, any  single  quotes  inside  the  strings will  be  replaced  with  two  single  quotes, and  single  quotes  will  be  put around  each  item. So  there's  two  arguments. First  one  is  item  list. It's  the  list  of  items  to  create an  endless  FROM, and  then  a  preamble, which  is  a  SQL  string to  preface  the  IN list  with, so  we'll  see  what  these  mean in  this  example. Here's  an  example  where  I  have a  numeric  list, and  my  preamble  is  select  this  ID from  schema  info, where  the  ID's  in  open  parentheses. So  here's  my  call  to  get _sql_in _list, my   id_list,  preamble and  the  output  looks  like  this; SELECT  star  from  my  table  m. The  ID  is  in  here. Select  ID  from  schema  info where  ID  is  in  one  of  these  numbers. If  you  look  at  a  string  example, here's  four  elements  in  this  list. The  first  one  and  the  third  one has  a  single  quote  inside  them, and  here's  my  preamble. When  I  call  get _sql_in_list and  combine  it  in  my  SQL  statement, this  is  my  result. SELECT  star  from  my table  m, where  the  product  name  is  IN and  here  it's  my  preamble, where  alert  name  is  IN, A, B, C,  or  D. And  you'll  notice  for  A  and  C it  replaced  the  single  quotes with  two  quotes, and  it  also  converted  these  double  quotes to  single  quotes  here. 
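A rough JSL sketch of that chunk-and-UNION idea is shown below. It is not the author's actual get_sql_in_list; it handles only numeric IDs, and the table, column, and preamble names are hypothetical.

// Build an Oracle IN list in 1,000-item chunks joined by UNION to dodge the
// 1,000-value limit. Numeric ids only; all names below are placeholders.
build_in_list = Function( {ids, preamble},
	{Default Local},
	chunks = {};
	For( i = 1, i <= N Items( ids ), i += 1000,
		j = Min( i + 999, N Items( ids ) );
		piece = {};
		For( k = i, k <= j, k++,
			Insert Into( piece, Char( ids[k] ) )
		);
		Insert Into( chunks, preamble || Concat Items( piece, ", " ) || ")" );
	);
	Concat Items( chunks, " UNION " );   // returned value
);

sub = build_in_list( {101, 102, 103}, "SELECT id FROM schema_info WHERE id IN (" );
sql = "SELECT * FROM my_table m WHERE m.id IN (" || sub || ")";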
Here's  an  extract  of  a  long  example where  I  had  a thousand  of  these  ID  values, and  so  here's  the  first  thousand, and  then  a  union  statement, and  then  the  next  thousand and  so  on  and  so  forth. All  right, here's  tip  number  three. That  is  to  use an  integrated  development  environment for  developing  your  SQL  statements. These  have  a  GUI  front  end and  they  let  you  develop and  debug  SQL  and   PL/SQL. These  are  some  popular  tools that  I'm  aware  of. I  use   PL/SQL  Developer. It's from  All round  Automations. There's  a  tool  called  SQL  Developer from  Oracle  that's  a  freebie, and  TOAD  comes  from  Quest. Many  people  are  familiar  with  TOAD. Let's  have  a  look  at  PL/SQL  Developer. We  saw  it  earlier  when  I  selected from  this  table. I  can  do  things  like  highlight  these, copy  with  header, back  to  JMP, create  an  empty  data  set and  click  on  Edit, paste  with  column  names. Boom,  there's  my  data. Okay. And  I  can  browse  functions,  procedures, packages,  tables,  et  cetera,  et  cetera. Here's  a  little  more on  that  debugging  example that  we  saw  earlier, and  again, a  comma  missing  here. It's just  another  more  explicit  showing of  that  error  message. So  you  take  this,  copy  it, look  at  it,  and  rework  it. Another  tip, this  is   a  soft  tip, and  that  is  to  avoid  inline  comments  using  dash- dash as  it  can  confuse  the  parser. These  are  comments where  the  dash -dash  says everything  after  this  on  this  line is  a  comment. Sometimes, some  situations, the  parser  gets  confused and  doesn't  treat these  properties  as  comments, so  it's  better  to  use slash-star-star-slash. I've  just  seen  a  couple  of  times where  it  didn't  work and  I  traced  it  down to  these  comments. One  more  tip, and  that  is  to  use this  Oracle  SYS _CONTEXT  function to  get  useful  information. There's  a  namespace  called U SERENV  in  Oracle and  you  can  get  the  IP  address, the  client  computer, the  program  making  the  ODBC  call, the  operating  system  identifier for  the  client, the  current  session, operating  system  username, and  the  database  name. There's  many  more. If  you  Google  it, there 's  many  more  parameters, but  these  are  the  ones  that  I  use. Here  I'm  saying  select  IP  address, module,  terminal, operating  system  user and  service  name. Here's  my  call  to   SYS_CONTEXT, and  I'm  just  selecting  it  from  DUAL, unioning  these  together, and  the  results  look  like  this  down  here. Here's  my  IP  address. I'm  calling  from  JMP .exe, no  surprise  there. My  username,  my  database, and  then  my  computer  name. That's  all  I  have  today. The  conclusions  I'll  draw, or  if  you  configure the  Oracle  client  properly, get  the  ODBC  connection  string and  use  Execute  SQL  JMP  in  Oracle can  do  great  things. So  once  again, JMP and  Oracle  equals  a  happy  marriage. Thank  you, are there  any  questions?
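As a closing sketch of the error-trapping idea from this talk, the JSL below shows the bare Log Capture pattern. It is a stripped-down illustration rather than the author's log_execute_sql, and it assumes dbc and sql already exist from the earlier examples.

// Capture whatever Execute SQL writes to the log and scan it for Oracle errors.
// The original function also calls Batch Interactive(1) first so errors go to
// the log instead of popping up dialogs.
log_txt = Log Capture( dt = Execute SQL( dbc, sql, "Results" ) );
If( Contains( log_txt, "ORA-" ) | Contains( Uppercase( log_txt ), "ERROR" ),
	New Window( "SQL Error",
		Text Edit Box( "Error:\!N" || log_txt || "\!NSQL:\!N" || sql )
	),
	Show( "SQL ran cleanly" )
);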
The poster summarizes data exploration and machine learning modeling techniques applied to Consumer Assessment of Healthcare Providers and Systems (CAHPS) response data. Through the use of JMP unsupervised machine learning techniques, the presenters identify patterns in responses. These patterns are summarized as patient groups/profiles, which can inform the design of tailored care delivery models. Hello, I'm Renita Washburn, a PhD student in the Modeling and Simulation program at the University of Central Florida. And I'm Dr. Mary Jean Amon, an assistant professor at the University of Central Florida in the School of Modeling, Simulation, & Training. Today, we will co-present our poster on identifying patterns in patient experience ratings with machine learning clustering techniques. In this poster session, we'll summarize the objectives, method, and results from the exploration of these patterns in patient experience ratings. Patient experience ratings were obtained from the 2019 Consumer Assessment of Healthcare Providers and Systems (from here referred to as CAHPS) response data and limited to patients seen by a primary care provider. We used JMP's machine learning clustering and data preparation tools to identify four patient groups based on their survey responses. Cluster analysis is a machine learning technique used in many industries for customer segmentation. The goal is placing customers into groups based on similarities within each group and differences between the groups. In healthcare, exploration of customer segments provides insights on possible differences in care journeys and experiences, such as disparities between race, gender, culture, or health status. Identification of distinct groups can inform the design of tailored care delivery models. The project's three objectives were to, first, conduct a hierarchical cluster analysis on categorical survey response data; second, identify clusters through visual inspection of the dendrogram and color map partitions based on their journeys, which were measured by survey questions related to length of relationship with the provider, utilization of services, and level of care management; and lastly, conduct post-hoc analyses to explore differences among clusters in their ratings of the provider, their overall health, and their overall mental health.
Before we dive into details of the methods and findings, we'd like to acknowledge the US Agency for Healthcare Research and Quality and Westat for providing the de-identified CAHPS data for this effort. The CAHPS data is used to gain insight into the healthcare experience from the patient's perspective. The 12 selected questions are intended to capture a patient's journey and interaction with their primary care provider over the last six months. The questions focus on length of relationship with the physician, how the patient interacts with the physician's office for routine and urgent care needs, and the level of care coordination for ancillary services requested. Prior to initiating JMP's clustering tool, data preparation included assigning the appropriate data modeling type for the survey questions. The modeling type was either nominal (yes, no, or not applicable) or ordinal (a Likert-type scale). Another data preparation task was reformatting select questions. The CAHPS survey uses skip logic. For example, one question asks: in the last six months, did you make any appointments for a checkup or routine care with this provider? If no, skip to the next question. It was determined that the skipped questions were relevant to the exploratory analysis; therefore, values were recoded from missing to zero, which JMP refers to as missing not at random. The last preparation step we highlight relates to the missing values. Instead of addressing this prior to modeling, we used JMP's built-in missing value feature to impute, that is, to replace with estimates, those missing values; this is an option selected from the clustering menu. Given the Likert-scale questions, we hypothesized that the data was hierarchical, with likely subgroups within the data. Hierarchical clustering with the Ward distance method was applied, and the output was limited to four clusters for ease of interpretation. The Ward method was appropriate for the categorical data as it did not require a pure measure of distance; instead, it builds clusters based on an analysis of variance, like an ANOVA. A color map was added to the dendrogram output to aid visual comparison of response differences across the groups. Unique patterns within and differences between clusters were summarized based on low, medium, or high maintenance, meaning how much access to care was used by the patient, such as frequency of routine and urgent office visits or contacting the office during or outside of regular hours, as well as how well patients believed the office was managing their care, ranging from weak to sufficient, as defined by ratings of follow-up for lab and prescription needs. The cluster output was saved and assigned to each response for the post-hoc analysis. Now, the primary focus of the project was comparing clusters on three key ratings related to the provider, their overall health, and their overall mental health. However, with JMP, you can use the cluster assignments to explore the distribution of demographic data as well as other question responses between the groups.
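As a rough JSL sketch of the clustering launch just described (the survey column names are placeholders, the missing value imputation mentioned above is a launch-dialog option, and the color map is added from the red-triangle menu):

dt = Current Data Table();
hc = dt << Hierarchical Cluster(
	Y( :Q1, :Q2, :Q3, :Q4 ),     // the selected CAHPS survey questions (placeholder names)
	Method( "Ward" ),
	Number of Clusters( 4 )
);
hc << Save Clusters;              // writes the cluster assignment back for the post-hoc analysis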
For  future  analysis,  we  recommend exploring  differences  in  age or  race  distributions between  the  maintenance management- based  clusters. We  were  interested  in  understanding if  there  is  a  relationship between  the  cluster  assignment and  the  patient's  ratings  of  the  provider, their  overall  health,  and  mental  health. Visual  inspection  of  the  mean  scores for  each  of  these  three  variables suggested  that  there  may  be significant  differences  based  on  cluster. For  example, high  health  maintenance  patients who  utilize  more  healthcare  services but  also  have  satisfactory  ratings  for  lab and  prescription  management also  appear  to  have  higher  ratings of  overall  health  and  mental  health. If  we  go  to  the  next  slide, these  observations  were  further  examined using  JMP's  contingency  analysis, which  is  a  method for  examining  the  relationship between  two  categorical  variables. We  identify  statistically significant  differences in  provider,  overall  health, and  mental  health  ratings based  on  the  patient  cluster, which  further  highlights  the  utility of  our  clustering  approach in  identifying  meaningful  patient  groups. Overall,  understanding  the  relationship between  each  group's  care  journey and  overall  experience  and  health  ratings can  inform  the  design of  health  care  practices such  as  enhanced  communication  channels during  non- regular  office  hours or  care  navigation  services to  aid  with  follow- up  of  lab and  prescription  management. Thank  you  for  viewing  today's  session. We  welcome  your  questions  and  comments.
Parametric survival models are generally effective for describing personnel movements both within and external to an organization. The State of Florida has published employee data on a weekly basis for several years, enabling analysis of job changes and separations for approximately 100,000 employees representing a wide variety of professions across the major Standard Occupational Classification (SOC) codes. Further, data collected over the past five years also incorporates the advent of the COVID-19 pandemic, capturing the varying influence of this major event across the professions. JMP Scripting Language (JSL) was used to prepare and analyze this large data set to visualize the divergence in employee behavior between roles and under the influence of the pandemic. Due to the unusually close registration between Florida's job codes and the federal SOC system, which is linked to Department of Labor salary profiles, these data and analyses provide an open-source and broadly relevant view of personnel behavior in both periods of stability and crisis. Hello. My name is Thor Osborn. I work at Sandia National Laboratories as a systems research analyst; that's basically a combination of operations research and investigative reporting. I'm going to present an analysis of personnel movements pre- and post-COVID for a large organization. In this case, the large organization is the State of Florida's government employees. I'll say that that's fortuitous, because publishing the data is part of their transparency policy, so anyone can look at it, and I'll give you the link for that. The data I'm going to show analysis of runs from August of 2017 through July of this year. Why do this? The COVID-19 pandemic, and the mitigations instituted to address it, have been cited as catalysts for substantive changes in the workplace. For example, 5 or 10 years ago, work from home was considered an unusual opportunity, or a temporary arrangement to address some kind of temporary issue. It wasn't considered something that most companies would do a lot of on an ongoing basis. Now it's considered a thing that job seekers may rate companies on, if you take a look at the web, for example. Also, the nursing profession was especially impacted by the COVID-19 response: extreme hours and burnout. That has, in a lot of cases, led to an exodus from the profession. For many who've stayed in the profession, it has meant departure from typical long-term employment, like working in a hospital, in favor of traveling-nurse or concierge contracts, where they get paid a lot more, have more flexible hours, and aren't committed to, say, 12-hour shifts one after the other in a hospital. I'll note that hospital RN turnover went up 8.4 percentage points from 2020 to 2021, to about 27% per year, according to the NSI National Health Care Retention Report. RN vacancies, meaning slots that hospitals want to fill, have gone up from about 8% in 2018 to 17% this year. So basically, one in every six nurses who's supposed to be there to help give care isn't. Now, the motivation for doing it the way I'm doing it.
Periodic  tabulation  of  movements  or  rates is  a  typical  business  approach to  business  reporting, and  almost  every  company  does  this. But  it  may  obscure underlying  behavior  patterns because  tallies  don't  tell  you the   micro behavior, and  time- to- event  analysis will  enable  a  deeper  look. I  like  to  use  parametric time- to- event  analysis  for  this because  the  parameters  can  be  informative. But  to  do  that, you  have  to  have  a  lot  of  events, and  to  get  a  lot  of  events unless  events  are  extremely  frequent and  need  an  even  larger  population. This  is  fortuitous that  the  State  of  Florida makes  weekly  employee  data  available for  about  100,000  people. Quick  synopsis  of  the  show for  those  with  no  attention  span or just  to  help  me  and  you, the   COVID-19  pandemic  was  implicated as  catalyst  for  many  changes. I  went  over  that. A  longitudinal  examination  of  behavior based  on  the  evidence from  a  large  organization  seems  timely. We  need  to  look  at  these  things. This  is  a  natural  experiment of  magnificent  or  awful  proportions. The  data  available  on  a  weekly  basis, as  I  said, straddle  the  beginning of  the   COVID-19  pandemic. This  is  a  fortuitous  collection. I  started  it  on  Intuition  back  in  2017, but  then  things  happened. The  State  of  Florida's  decision to  build  its  broadband  structure around  a  categorization  system that  mirrors  and  links  up with  the  federal  SOC, or  Standard  Occupation Classification  code  structure, also  provides  a  well  established and  readily  available  frame  of  reference, meaning  you  can  get  it  and  it's  free, and  it's  reasonably  well  worked  out, documented. You  can  look  at  employee  populations, you  can  look  at  hiring,  separations, all  longitudinally  within  that  framework at  varying  levels  of  specificity, and  that's  pre- framed by  the   SOC structure. The  fact  that  they've  melded with  that  structure provides  an  easy  window into  that  level  of  analysis. I  finish  off  with  an  analysis of  the  nursing  profession as  represented  by  registered  nurses. That  demonstrates  what  I'm  calling a  substantive  difference. It's  definitely  visible  in  personnel  flows between  the  pre- and  post-COVID  timeframes. This  is  an  example. I  haven't  been  able  to  go into  the  level  of  depth that  I  would  like  to with  this  analysis  and  this  data  set, but  I  wanted  to  show  at  least  an  example of  what  I  was  talking  about [inaudible 00:05:36] . Again,  just  to  really  beat  this  one  down, an  unusual  opportunity. Typical  practice  in  HR is  to  frame  salary  structures  in  context with  other  similar  organizations. Salary  information  is  generally  compiled by  a  consulting  firm  in  HR from  a  collection  of  organizations that  chose  to  participate in  a  defined  pool for  survey  and  referencing  purposes. They  don't  do  that  for  free. It  costs  a  substantial  amount  of  money. Now,  the  BLS  also  compiles  salary  surveys of  its  own  on  a  national  and  state  basis with  jobs  categorized by  a  standard  structure, which  they  call  the  SOC. That  data  can  be  downloaded  for  free, and  the  State  of  Florida has  referenced  its  structure  to  that. 
It's  fortuitous  in  a  way  for  them, because  they  don't  have  to  pay for  seller  surveys  if  they  don't  want  to, because  it's  all  referenced  against the  federally  established  free  data  set. Now,  I'm  going  to  show  if  I  can. Here's  just  a  table. Apologies  if  it's  a  little  small. A  table  showing  broadband  code  right  here, 10, that  means  Executive,  1011-03, Chief  Executives. The  point  is, everything  in  the  Florida  set, this  is  about  3,000  codes. Except  for  a  few  recent  ones, everything  in  the  Florida  set is  referenced  against  this. The  first  six  digits  are the  six  digits  in  the  SOC  code. The  first  two  are  the  major  code. It's the job  family, like  10  is  Executive s,  11  is  Management, 13  is  Business  jobs, and  then  a  four- digit  code for  more  specificity. In  the  case  of  Florida, they  have  an  extra  two  digits which  denotes  a  job  level within  their  salary  structure. But  this  framing  allows  you  to  link  things back  to  the  SOC. What  did  they  give  you? They  give  you  an  agency  name of  which  there  are  33  state  agencies; budget  entity, an  office  within  the  agency; a  position  number, that's  a  position  within  the  agency; employee  names. I'm  not  showing  you  that because  I  feel  uncomfortable even  though  when  you  download  it, you  can  obviously  see  who's  who, whether  the  person is  salaried  or  exempt  hourly, or other  personal  services they  call  them, full-  or  part- time. A  class  code  which  is  a  code that  indicates both  the  profession  and  the  level. A  class  title  which  is  essentially the  same  thing  in  words. State  hire  date,  which  is  the  first  date that  the  individuals  hired  by  the  state. They  could  have  had  many  terms of  employment,  come and  left , but  the  state  hire date  is  a  fixed  point in  time  for  each  person salary  or  hourly  rate if  the  person  is  doing  an  hourly  job. Again,  this  is  freely  available at  the  link  noted  on  the  screen. Just  for  a  bit  more  framing, long- term  view  of  wages in  the  State  of  Florida, looking  at  BLS, Bureau  of  Labor  Standards  data  for   SOC, is 00-000,  just  a  weighted all  occupations  number. It  covers  everything. These  are  a  lot  of  people, so  I  don't  have  error  bars, 130 million  people  nationally and  7  million  employees  in  Florida. What  you  see  is  that  Florida's  salaries , the  blue  line, are  typically  less  than  national, but  they've  been  tracking  pretty  closely. There's  not  been  much  relative  change in  a  long  time  except  for  this  past  year. Sometimes  there  are  revisions. I'm  not  going  to  say this  is  necessarily  meaningful. If  it  is  a  real  difference  then  obviously be  interesting  to  know  about  that. I  haven't  seen  anything reported  about  that  though, so  I  can't  give  you any  further  insight  on  that. If  you  look  at  Florida  State  employees versus  typical  Floridians, I  don't  have  enough  data  in  the  set to  really  say  very  much, except  for  it  looks  like  being a  state  employee  is  fairly  attractive, at  least  if  the  jobs are  typically  comparable. There's  no  overriding  incentive for  people  who  work  for  the  state  to  leave to  go  into  the  private  sector  there based  on  this. These  are  for  median  salaries, annual  salaries. 
Looking  at  the  Florida  State  employee population  totals  in  the  green  line  here starting  at  around  100,000 for  exempt  staff, doesn't  include  the  hourly  folks in  either  case  here  or  here. Looking  at  separation  rates and  hiring  rates as  nine- week  moving  averages to  be  about  two  months as  a  centralized  moving  average, with  JMP's  usual  capability for  handling  the  endpoints. What  you  see  is  that for  a  fairly  long  time, except  for  this  spike, which  again,  I  haven't  found  anything to  explain  in  the  literature, nor  in  HR  reports  published  by  Florida. Pretty  constant. After  the  pandemic  hit, there  was  a  long  time  period where  the  hiring  rate was  below  the  separation  rate. So  people  were  slowly  leaving  Florida. You  can  see  that  here in  a  downward  slope  on  the  green  line. And  then  just  this  year, that  stopped  and  began  to  reverse. Now,   to  be  clear, the  population  is  only  salaried  workers, only  those  holding one  salaried  state  position  at  all  times. Anybody  with  two  salaried  positions was  removed because  it  could  be  a flawed  data or  it  could  be  a  very  ambitious  person. But  I  can't  handle  that with  the  time- to- event  data because  it's  hard  to  understand  exactly what  a  separation  means when  you  still  have  a  job at  the  same  place. But  it's  only  less  than  half  a  percent of  the  total  people, so  it  shouldn't  be  a  huge  perturbation. Now,  when  I  show  this  is a  bit  of  a  demo  as  well. Florida  State  personnel flows  by  SOC major  code. But  you  can  see  on  the  right  table… Here's  the  population by  SOC  major  code, every  individual  grouping  over  time. This  is  code  43. That's  Office  Administrative  Assistants. What  I've  done  is  I've  used the  hide  and  exclude  capability to  remove  everything  except  for  six  codes, which  are  the  largest  codes. You  see, the  Administrative  Assistants  is  43. And  also  down  here,  19  for  Life and  Physical  Sciences  is  included, Business  is  included,  Manager is included. What  I'm  trying  to  say  here  is  simply that  this  is  only  including  six out  of  something  like  20  or  so major   SOC codes. But  these  are  the  largest. Using  graph  builder,  it  only  shows  those. That's  really  all  it  amounts  to. Business  and  Finance, Community  and  Social  Services, that  we  code  21,  code  19,  code  11. Now,  one  thing  you'll  see  with  Manager is  that  the  hiring  rate  is  always quite  a  bit  less  than  the  separation  rate, and  yet  the  net  number  of  managers is  roughly  the  same, and  that's  because  only about  half  of  the  managers come  from  external  sources, a  lot  of  them  come from  internal  promotions. You  see  this  population  over  time, despite  the  vast  difference in  separations  versus  hiring, that's  simply  because  about  half  of  them come  from  internal. Now,  you  can  also,  again, as  I  was  saying  earlier, you  can  do  detailed  codes and  the  same  principle  applies. All  I've  done  here  is  I've  only  included three   SOC detail  codes. The  29  major  code, which  is  Medical  Professionals, the  31  which  is Support  Folks  in  Medical  Work, and  then  back  to  29  again for  Registered  Nurses, but  this  is  the  Nurses and  Nursing  Assistants  taken  together. This  code  is  no  longer  used and  hasn't  been  for  a  while. 
But  Florida  set  up  its  code  system about  two  decades  ago and  so  it's  been  kept  in and  they  use  it  even  though  it  isn't  part of  the  standard  SOC  now. But  the  bottom  line  you'll  see  from  here is  that  Florida  is  not  attracting enough  nurses  to  compensate  for  attrition. If  you  look at  the  State  of  Florida  HR R eports, what  you'll  see  there  is  that they  think  most  separations are  voluntary,  about  92%. The  number  of  authorized  positions in  the  health  agency has  only  been  reduced  by  about  5% in  the  last  several  years, and  yet  the  number  of  RNs has  dropped  by  about  a  fourth. You  can  see  that  the  number  of  nurses is  falling  rapidly compared  to  the  allocation of  nursing  spots. If  you  go  to  the  State  of  Florida  website and  look  for  a  job  in  nursing, you'll  see  that there's  plenty  of  opportunity. They've been trying to hire. Now,  I  am  going  to  show some  time- to-e vent  analysis. I'm  not  going  to  show  the  script  work that  generated  the  data  set  for  this, because  although  I  find  it  fascinating, I  know  that  a  lot  of  folks don't  do  scripting. It's  essentially  an  inference between  who's  there  and  who  wasn't. If  you  go  from  one  week  to  the  next and  people  disappear and  you've  allowed  for  the  fact that  people  do  name  changes  sometimes, which  requires  coming  up with  a  different  way  of  IDing  people to  straddle  the  difference. Once  you've  accounted  for  that, then  they  must  have  left. Having  left,  that's  a  separation. They  can  also  get  promoted, and  you  can  see  that because  one  week  they  have  a  job, and  then  the  next  week they  have  a  job  that  pays  better, often  the  same  general  line, but  with  a  different  title. Capturing  those  movements is  a  bit  of  work but  it's  pretty  straightforward,  really. What  you  see  here,  I  tried  to  capture four  different  kinds  of  events:  demotions, a lateral  to  another   SOC, could  be  moving  out of  the  nursing  profession, but  nevertheless haven't  changed  their  salary  much, promotion,  or  separations. Separations  is  obviously the  dominant  factor  here in  terms  of  total  counts. I'm  using  the  Weibull  typically because  I  find  it  more  informative and  it's  not  a  bad  fit. Post-COVID,  you  see  a  very  similar  curve, more  promotions, relatively  speaking. That's  interesting. Now,  here  is  the  detail  in  tabular  form so  that  you  can  see all  the  different  pre-  and  post- cases for  the  major  movements,  lateral  movement, promotion,  and  separation. What  I'm  talking  about  though, let's  just  go  back  to  pre-COVID. Here's  a  subset  of  Exit  Events, essentially  exit  from  the  status  as  an  RN to  whatever  they  moved  to. Just  to  make  it  clear  what  was  done  here. If  you  relaunch,  what  you  see  is  that  I have  a  Censor  column, just  ones  and  zeros. The  Exit  Event  is  however  they  exit, or  if  they  didn't  exit,  then  it's  just an  active  person  in  the  field and  they  are  not  marked with  a  censor  code. The  Employment  Segment  Span, which  is  how  long  have  they  been  employed in  that  particular  segment  of  employment. Now, see  that  the  number  of  laterals  is really  small  compared  to  everything  else. Promotions  is  definitely  visible. 
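(As a hedged aside on what that relaunch looks like in JSL: the choice of the Life Distribution platform and the use of JMP's default censor code of 1 for the active, non-exiting records are my assumptions; the Weibull fit is then requested from the report's menu.)

dt = Current Data Table();
dt << Life Distribution(
	Y( :Employment Segment Span ),   // time employed in the current employment segment
	Censor( :Censor )                // censored rows = still active, no exit event observed
);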
Another  thing  you  can  see  if  you  go  down and  look  at  separations is  that  the  Weibull  beta, which  you  can  think  of as  the  acceleration  factor, even  at  the  high  end  of  the  95%  limits, it's  still  below  unity, and  below  unity  means  that people  are  less  likely  to  go  through that  transition  as  time  goes  on, less  likely  to  separate the  longer  they've  done  there. That's  straightforward. You'll  see  it  here. In  fact,  that's  also  through  post-COVID, same  basic  beta  factor or  parameter, rather. Now,  I'm  going  to  show  the  post-COVID. Again,  this  is  the  same  basic  analysis. This  is  what  happens when  you  do  live  demos. Something  goofy  with  this  one. Now  it's  giving  me  grief. Here,  you  see it's  basically  the  same  thing. Lateral  is  distorting  because  the  lateral, there's  only  two  counts. If  you  just  get  rid  of  that , you  can  see  a  much  more  clear  picture. What  you  see  is  that the  Promotions  piece  is  moving  up  faster, 50  versus  744,  whereas  pre over  a  longer  time  spent, it  was  about  50 for  about  1,200  separations. There's  a  predominance  is  shifting  there. Going  back to  the  more  convenient  layout  here. Pre-COVID  promotions  were  in  this  range where  beta  was  a  little  over  unity. But  the  95%  limits  basically  tell  you that  that's   ambiguous. It  could  be  really  anywhere between  a  bit  below  and  a  bit  above  unity. Post-COVID,  it's  about  1.29, and  within  these  95%  limits, always  above  unity. In  other  words, it's  accelerating  with  time. The  longer  you  go,  the  more  likely you  are  to  go  through  a  promotion if  you  stay  in  that  job. Here  with  the  lateral  movement, there  really  was  never  enough  counts to  do  much  of  anything  with  that. The  limits  are  very  broad. I  wouldn't  put too  much tal k  in  that,  regardless. Now,  if  you  put  these  on  a  common  scale just  to  make  sure that  this  isn't  too  confusing,  I  hope. You  see  very  similar. I've  shifted  the  color for  the  post-COVID  case  a  little  bit. On  a  similar  scale, if  you  didn't  superimpose  these, promotions  are  clearly  accelerated, p ost -COVID. Clearly  a  bigger  impact, they're  more  opportunity. We  do  know from  many  news  reports  that  people who  are  closest  to  retirement, often  within  the  COVID complications  and  changes, simply  moved  forward with  retirement  more  quickly because  they  wanted  to  get  out other  than  deal  with  things. There  is  a  shift  here with  the  separations, and  it  does  look  real, but  it's  also  small  enough compared  to  the  overall  magnitude that  it  isn't  quite as  obviously  different. In  either  case, the  separation  rate  is  similar and  not  changed  overly  much. This  is  a  factor  of  two. This  is  a  factor  of  a  few  percent. To  conclude, wages  in  Florida  have  run  lower than  national  values  typically over  the  last  decade, but  haven't  proportionally  changed  much. There  certainly  doesn't  seem to  be  any  obvious  change in  Florida  salaries that  would  cause  people  to  suddenly  leave. The  State  of  Florida's  registered  nurses have  enjoyed  greater and  earlier  promotion opportunities  post-COVID. But I  think  it's  also  worth  noting  here that  they  work  in  a  health  organization for  the  State. This  is  not  a  State  hospital. 
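(For reference, the interpretation of beta used above follows from the Weibull hazard in the usual scale/shape parameterization:

h(t) = \frac{\beta}{\alpha} \left( \frac{t}{\alpha} \right)^{\beta - 1}

so beta below one means the risk of the event falls the longer the person has stayed, beta equal to one reduces to the constant-hazard exponential case, and beta above one means the event becomes more likely with tenure, which is the post-COVID promotion pattern described here.)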
This  is  a  health  management, health  support, health  education   activity. It's  not  24/7  in  a  hospital. That   moderates  expectations. But  you  might  expect the  separation  behavior among  their  RNs  would  change because  opportunities  have  changed in  the  private  sector. There's  a  lot  of  demand. On  the  other  hand,  state  employees, they  might  be  thought  to  be  comfortable. I  was  expecting  my  hypothesis  was  that they  would  be  more  likely  to  separate, but  that  didn't  happen. There  really  is  no  apparent  difference. Now,  this  is  not  a  complete  bibliography of  everything  that  I  read in  the  last  five  years, before  and  after  COVID that  may  have  influenced  things. This  is  just  a  handful  of  things. I  thought  they  were  fairly  telling. The  National  Healthcare  Retention & RN  Staffing  Report is  a  fairly  thorough  assessment of  what  people  expect in  hospital  administration and  what's  actually  been  happening in  terms  of  the  employment and  the  separations, the  turnover  behavior  of  nurses. Three  State  of  Florida  annual  reports. They  do  an  annual  report  on  a  fiscal  year that  straddles  two  calendar  years. The  last  one  available  is  2020. But  essentially, they're  simply  reiterating  that, yes,  they  have  a  number  of  open  slots, they  don't  have  them  all  full. Employment  and  nursing  is  dropping. They  don't  have  any  explanations for  these  things. I  also  don't  have  any  explanation for  the  spikes and  activity  earlier,  pre-COVID. I  have  a  question  into  someone in  the  Governor's  office  there, but  I  haven't  heard  back  yet. That's  basically  all  I  have for  this  presentation. I  would  be  happy  to  entertain  questions. The  slides  show  at  the  beginning, there's  my  email. You  can  contact  me  if  you  want. Thanks.
We don’t live in a static world. Dynamic visualization and visual management are essential elements of Lean Six Sigma; they link data and problem solving. As with detective work, it is important to be able to spot clues and patterns of behavior in a situation. Establishing a visual environment enables rapid processing of large data sets, which leads to quick detection of trends and outliers. The goal of Lean is elimination of waste. Waste is present in many forms, such as waiting for information, moving data to multiple sources, and over-processing data. Data visualization allows for reduction of these waste streams.   This presentation provides a real-life case study where JMP is utilized to help “move the data to a story” in a visual way that aids in communicating information, eliminating waste, and driving continuous improvement. This case highlights the use of JMP tools, such as Excel import, Query Builder, Graph Builder, data filters, control charts, basic modeling, reporting, and dashboards. The presentation also explains how visual management helped engage and empower employees throughout the organization.     Hi,  everyone. Thank  you  for  joining  us   for  our  presentation of  From  Data  to  Story: Using  visualization   to  drive  continuous  improvement. My  name  is  Allison  Bankaitis, and  my  co- presenter  is  Scott  W ise. A  little  bit  about  myself. I  currently  supervise  a  small  team of process engineers at  Coherent  Incorporated, but  I'm  still  very  involved  in  the  daily  process  engineering  efforts. Previously,  I  held  various  process  engineering  roles at  Corney  Incorporated, and  I'm  very  excited  to  show  a  case  study of  how  we  view  some  of  these  JMP  tools  in  our  process  engineering  work. A  little  bit  for  Scott. Thank  you,  Allison. I'm Scott  Wise. I'm  from  JMP  in  support Allison's  JMP  usage, as  well  as  other  customers in  the  Northern  California  area. And  I'm  just  real  excited  to  be  a  part   of  this  really  cool  case  study. Hopefully,  you'll  pick  up  a  lot  of  best  practices and  tips  from  some  of  the  things  that  helped  us. All  right. I  placed  the  abstract  here   for  future  reference, but  just  wanted  to  highlight  a  few  things. Coherent  has  placed a  recent  focus  on  Lean, which  aims  to  eliminate  waste. Tools  from  JMP   have  aided  data  visualization, which  in  turn has  enabled  reduction  of  waste. Another  advantage  of  these  tools  is  the  ability  to  engage and  empower  employees  throughout  the  organization. These  areas  will  be  the  focus  of  this  presentation. Our  first  section  is  about  eliminating   waste  in  the  data  collection  process. In  this  case  study, we  had  a  data  collection  process with  unnecessary  complexity. It  used  to  take  20  minutes  to  process  one  part. So  to  do  this,  we  had  built  a  data  query  and  access. This  is  just  a  screenshot  here  showing  an  example  of  a  few  data  tables where  we  combined   variables  from  various  tables to  get  the  output   that  we  are  looking  for. And  then  we  used  a  macro  to  pull  data for  an  individual  part  into  Excel. This is again, just  a  screenshot  of  an  example  database  connection  in  Excel and  the  code that  we  would  write  in  Excel. This  was,  again,  done  for  each  individual  part. From  that  data,   then  we  could   attribute  data  in  Excel. 
We  could  then  calculate  average  values  of  each  attribute. We  would  then  pull  additional  data  from  our  MES  website, such  as  part  type  or  other  items  listed  here. Then  all  that  data  was  copied  into  an  Excel  summary  log. So  we  maintain  the  log, but  we  weren't  really  doing  anything  to  track  or  analyze  the  data. With  JMP,  I  was  able  to  streamline  the  process. I  built  this  framework  in  about  an  hour and  reduced  process  time  to  five  minutes  per  part. And  in  this  case, this  is  just  for  one  engineer  myself, on  one  product  that  I've  worked  on. But  if  we  can  extend  this  to  multiple  products and  multiple  engineers, we  could  really  gain  a  lot  large  savings  of  time. So  to  do  this,  I  built  a  data  query in  JMP, which  included  both  the  attribute  and  MES  data  in  one  location. Just  a  screenshot  of  there'd  be several tables  here  pulling  in  the  data, the  different  variables  here, we  can  do  some  initial  filtering  in  the  data  query. So  in  this  case  I  selected  a  time  frame  that  I  wanted  to  focus  on and  then  a  subset   of  variable  that  I  down  selected, so  I  don't  manage  both  data  set. And  then  I  can  build  the  data  table  here and  always  clean  up   more  of  the  data  later  on  as  necessary. After  I  had  the  data, I  replicated  some  of  the  charts  that  we  already  had  in  Excel, just  made  them  very  similar  so  that  people  could  see what  they're  used to  dealing  with  for  the  time  being. After  that,  again, I  built  the  summary  table  to  replicate   what  they  were  used  to  seeing. I  calculated  the  average  attribute  data and  merged  the  original  table and  the  tabulated  table  into  one  summary  table so  that  they  could  have  an  output  what  they  are  used  to  seeing. The  next  thing   is  to  take  this  data and  move  from  data  to  story. So  to  do  this,  the  first  thing  I  was  curious  to  know was  what  does  the  data tell  us  about  current  performance. So  I  plotted  the  data  over  time  as  my  first  aspect and  I  will  show  that  to  you  in  JMP. So  just  using  this  graph  builder and  the  timestamp  that  I  chose and  then  the  main  output  that  I  started  looking  at, added  that  to  the  chart  here. What  I  used  to  do  is  manually  go  in  here and  add  reference  lines  using  this  field  here. But  what  Scott  showed  me,  which  is  really  neat and  then  extends  to  all  of  the  graphs, is  that  you  can  add  it   directly  to  the  data  table, you  can  add  the  spec  limits. You  just  go  into  the  Variable  of  Interest, Column  Properties and  go  down  to  Spec  Limits  and  add  the  values  in  here. This  is  checked  so  that  you  can  see the  graph  reference  lines  on  each  graph. So  once  I  had  that  output, I  could  see  that  there  is  a  large  amount   of  variation  in  the  data and  many  of  the  values   were  outside  of  the  spec  limit. So  the  next  thing  I  wanted  to  do was  compare  additional  variables. So  to  do  that  pretty  quickly,  I  was  able  to  just  add the  column  switcher   to  this  graph  I  already  had by  going  here  and  selecting  the  variable  I  wanted  to  change along  with  the  other variables  that  I  trust. Then  from  here  I  can  quickly  click  through  all  these  variables and  see  the  variation  in  each  one. 
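A small JSL sketch of the spec-limit and Graph Builder steps just shown, with made-up column names and limit values standing in for the real ones:

dt = Current Data Table();

// 1) Store spec limits on the output column so every graph can show them
Column( dt, "X21" ) << Set Property(
	"Spec Limits",
	{LSL( 12 ), USL( 14 ), Show Limits( 1 )}    // limit values are illustrative only
);

// 2) Plot the output over time in Graph Builder
dt << Graph Builder(
	Variables( X( :Timestamp ), Y( :X21 ) ),
	Elements( Points( X, Y ) )
);

The Column Switcher used to step through the other outputs can then be added from Graph Builder's red-triangle menu.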
Next  for  me, I  have  some  process  knowledge and  I'm  sure  you  would  have  process  knowledge  of  your  situation  as  well. Based  on  this  process  knowledge, I  was  able  to  select  a  variable that  I  thought   might  be  driving  some  of  the  variation. In  my  case, I  thought  that  X3   might  be  responsible for  driving  some  of  these  trends  that  I  was  seeing. I  put  that  into  our  graph  here. The  other  piece   of  process  knowledge  that  I  have is  that  our  spec is  based  on  the  average  value  for  each  part. I  changed  this  to  mean  and  then  I  was  curious  to  see the  line  of  fit  over  time, so  I  put  that  and  added  that  here  as  well. The  next  thing  I  was  interested  in  seeing is  a  little  bit  more  about  performance by  looking  at  a  control  chart. To  see  the  control  chart,  analyze  quality   and  Process  Control  chart  builder, and  I  was  curious  to  see  it  against  X2 , which  is  a  part  number and  that  same  variable  that  I've  been  looking  at. A gain,  I  was  going  to  split  it  out  by X3. From  here,  we  can  see  there's  a  shift  in  the  average based  on  which  subset  of  X3 . A lso  the  thing  that  was  obvious  to  me is  that  the  sample  sizes  were  uneven. To  me,  knowing  the  process, I  know  they  should  each have  10  collections  of  data  for  each  part. So  based  on  our  process,  I  said, well, to  get  an  initial  look  at  the  performance, I'm  going  to  limit  to  only   parts  that  have  10  measurements  per. To  do  that,   we  made  a  new  data  table, cleaned  up  the  data  again, and  once  I  had  this, I  recreated  the  control  chart with  just  a  small  change. Then  here  I  added  a  local  data  filter to  have  X3 split  out  on  two  separate  graphs. That  was  my  learning. Now  I  can  see  these  upper  and  lower  control  limits and  this  process  capability  chart, since  now  I  have   the  even  subgroup  sample  size  of  tech. That's  where  I  will  hand  it  over  to  Scott. Thank  you  very  much. All  right. I'm  going  to  pick  up  with  the  rest  of  the  story. Allison  has  done  a  great  job  of  understanding where  the  current  performance  of  her  process  was. But  we  also  thought  there  might  be some   other  key  variables  within  her  data that  could  be  useful  for  explaining  these   differences  we're  seeing  in  the  output. One  of  the  things   that  we  tried  was  actually  a  modeling  tool that's  very  simple  and  often  used  to  screen  for  important  variables. It's  called  a  partition. In  this  partition, all  you  have  to  do  is  of  course, you're  going  to  pull  up  your  data and  then  it  is  under  predictive  modeling. People  call  this  a  decision  tree, and  I'll  show  you  why  when  we  start  to  fill  it  out. But  all  you  got  to  do  right  now  is  give  it  an  output  that's  our  21 there, X 21, and  get  it  the  inputs  we  want. I'm  going  to  put  all  the  inputs  in  except  for  X2 , which  was  the  kind  of  a  part  ID. I'm  going  to  remove  that  one. There  was  another  one  that  Allison  recommended  I  removed, given  her  process  knowledge,   and  that  was  X8. But  we'll  leave  all  the  others  in. When  I  say  Okay, it  brings  up  the  start  of  a  decision  tree. 
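(Stepping back to the control chart piece for a moment, a hedged JSL sketch of that launch, with the part number as the subgroup and a local data filter on X3; column names are assumed from the talk.)

dt = Current Data Table();
dt << Control Chart Builder(
	Variables( Subgroup( :X2 ), Y( :X21 ) ),               // part number as the subgroup
	Local Data Filter( Add Filter( Columns( :X3 ) ) )      // split the view by X3
);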
What  it's  doing  is  saying,  I  can  make  a  bunch  of  splits and  I'm  going  to  look  at  all  the  inputs and  I'm  going  to  try  to  find  a  cut  point. We'll  breaking  basically   any  of  those  variables  into  two  groups. Will  that  give  any  explanatory  value toward  the  differences I'm  seeing  in  the  output? In  this  case,  X 21. So  if  you  make  the  first  split, it's  saying  that  I've  explained  27% of  all  the  difference you're  seeing  in  the  output via  just  splitting  X 19  at  500. If  it's  greater  or  equal  to  500, I'm  going  to  have  a  much  lower  mean  of  12. 67. If  it's  less  than  500, watch  out,  it  jumps  up  to  13. This  is  really  cool  to  find  other  things   I  might  want  to  split, break,  view  on  my  graphs. You  can  continue  splitting  and  it  will  look  at  other  variables like  X3  came  into  place  here and  Allison  already  knew that  was  going  to  be an  important  variable. A s  you  keep  splitting, you  can  see  it  starts  to  add in  terms  of  the  predictability. This   RSquare, the  closer  to  one,  the  more  predictable. So it's  like  56%  predictability  here. I've  gone  ahead  and  done  that, I'll  show  you  what  that  view  looks  like. Here's  the  finished  view   I  came  up  with. I've  got  these  nice  big  column  contribution  bars  here  at  the  bottom. You  can  see  that  X 19  got  split. Actually  found  five  cut  points  for  X 19, but  52%  of  all  the  splits  it  was  doing  involved  X 19, so  it  gave  it  a  nice  big  bar. The  next  three  would  be  next. Then  everybody  else   was  very  small  contribution or  no  contribution. It leads  us  to  say, "Hey,  X 19  might  be  important   and  it  reinforces  X3  being  important." Now  that  we  have  that  information, well,  how  confident  are  we  that  these  things  do  belong in  our  study? Here  it  would  be  nice to  look  at  X 21  by  X 19  broken  out. This  one,  of  course, is  going  to  be  just  simply  going  back  into  our graph  builder. This  chance  we  can  put the  X 19  down  on  the  bottom  axis so that would be the only X. Let's  go  ahead  and  put  our  X 21  right  there  on  the  Y. We  can  break  that  out  by  the  X3  variable,  which  is  pretty  cool. Now,  one  thing  we  might  want  to  do, X2 was  the  part  ID. We  can  give  it  some  color  or  some  overlay. Either  way,  I  think  I  will  just  go  ahead and  give  it  some  color  here and  I  will  turn  off  the  line. That's  helpful. But  what  would  be  helpful is  to  use  that  local  data  filter that  Allison  showed, in  case  they  want  to  really  look   at  a  specific   sequence  of  parts. I'll  go  under  the  red  hotspot  there, that  red  triangle. I'll  go  local  data  filter  and  then  we'll  add  the  X2, and  beautiful. Now  we  can  go  and  just  change  up our  view  by  that  local  data  filter. That  was  a  cool  view  that  we've  got. I  can  see  that  it's  making  a  lot  of  differences  there. Now  one  thing  you  might  ask  is  could  we even  model  this? Before  I  even  go  and  model  it   so  we  can  make  some  predictions, how  sure  am  I  that  X3 and  X 19  really  are  affecting  X 21? Well,  we  can  actually do  a  statistical  test. We  can  test  means. The  way  we're  going  to  do  that  here is  we  are  going  to  go  back  to  our  data. 
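Before moving on to the means test, here is a hedged sketch of the partition launch itself; the X list is an illustrative subset (everything except the part ID X2 and the excluded X8 in the real launch), the Split Best message stands in for clicking Split, and Column Contributions is turned on from the red-triangle menu.

dt = Current Data Table();
pt = dt << Partition(
	Y( :X21 ),
	X( :X3, :X19, :X5, :X7 )     // placeholder subset of the candidate inputs
);
// Make several splits, as done interactively with the Split button in the demo
For( i = 1, i <= 5, i++, pt << Split Best );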
We're just going to go to Analyze, Fit Y by X, and now we're going to look at our output. We want to look at the effect on X21 of those things we care about, X3 and X19. I'm going to put them both in here and it's going to give me some different views. It's going to enable me to compare means in this one-way analysis. I'm going to right-click and turn on the means test, and I even like this All Pairs, Tukey option. I'm going to adjust our axis here. It's got these cool means diamonds. The middle of the diamond is the mean, and the edges are the 95% confidence interval around the mean. The way it works is: if you would slide these diamonds over, would they overlap? It looks like they would pass like ships in the night; there's no overlap. As well, you've got these comparison circles; you can click on one and see if the other one turns a different color. All of this is based off a 0.05 alpha. What does that mean? That's your confidence level, so we're 95% confident. We'd be right 95 times out of 100 in saying that the level of input X3 is having an effect on the observed measurements of X21. Given that, before I go and try to fit a line or a curved line, I can go under this red triangle hotspot and say, you know what, let's go ahead and group by X3. Now when I go back under this triangle option and fit a line, or in this case, since I know there's a little curvature, fit a quadratic or polynomial line, it breaks it out by X3, so I'm really excited about that one. The blue line, which is the first version, 3_0, there's the formula for it. It only has 20% explainability; it's not a great fit. But you can see that jumped up to near 70% predictability for 3_1. It's telling me that not only is X3 significantly different, and not only am I seeing a difference in X21 when it comes to X19, but it matters for X19 what level of X3 we're talking about. That's why the red line and the blue line are not on top of each other. Therefore, that's an interaction, and if I'm going to try to predict something, I need to include it. So at this point, I think I have all we're going to need to put into the hands of Allison and her peers a really cool tool that can help them predict what the output is going to be based on settings of X3 and X19. You're seeing on the screen a profiler that comes off our modeling platform, and it's very easy to set up. If we go back, I'm going to go to Fit Model here. We'll do our output, X21, again. Under my inputs, I know X3 and X19 are very important. I told myself X3 and X19 might need to be crossed; I might need to see that interaction. I know for X19 there's some curvature. The way I would add this is I'd select X19, go under this Macros menu, and say Polynomial to Degree; I have it set at two, so I get this curve term, the polynomial term, here.
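Before the model itself is run, the two Fit Y by X launches used in this step roughly correspond to JSL like the following; column names come from the talk, and the option names are my best reading of the menus shown, not a verified script.

dt = Current Data Table();

// One-way comparison of the output by X3, with the all-pairs Tukey means test
dt << Oneway(
	Y( :X21 ),
	X( :X3 ),
	All Pairs Tukey HSD( 1 )
);

// Grouped quadratic fit of the output against X19, one curve per X3 level
dt << Bivariate(
	Y( :X21 ),
	X( :X19 ),
	Group By( :X3 ),
	Fit Polynomial( 2 )
);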
There's  the  interaction, and  these  are  the  main  effects, so  it's  really  two  factors, but  it's  the  four  things  in  my  model. So  I'm  just  going  to  run  it   and  it's  going  to  try  to  fit  a  line. This  should  look   very  much  like  the  fit  Y  by  X. It's  only  really  explaining  52%. This  model  is  only  explaining  52% of  the  differences  I'm  seeing  in  x 21. Not  perfect,  but  it's  pretty  much, think  about  it, just  for  having  two  factors   and  their  interaction in  one  of  their  curve  terms,  that's  pretty  good. But  what  I  can  do  now  under  that  red  hotspot  is  turn  on  the  profiler. This  is  worth  the  price  of  admission. This  right  here  is  going  to  enable  Allison  and  her  team to  sit  there  and  talk  about   what  settings  we  should  have. S hould  I  be  at v3_1 ? Should  I  be  at  v3_0 for  this  x_3  input? Should  I  be  low  or  high? A gain,  it  shows  that  interaction  live. For  example, I'll  shift  this  color  here. Watch  what  happens  when  I'm  low. I'm  sorry,   not  low  but  high  on  X 19,  I'm  way  out  here  to  500. By  the  way,   you  can  type  in  what  you  care  about. Maybe  I  want  to  see  what  it's  at  480. Look  how  flat   that  line  is  between _0  and  _1. It doesn't  really  matter   which  one  I  select. I'm  going  to  get  the  same  kind  of  prediction. The  red  is  my  prediction, and  the  blue  around  it  here is  my  confidence  interval around  that  prediction. Of  course,  this  wouldn't  be  good because  I'm  right on  the  lower  spec  limits. Watch  what  happens  when  I  start  to  pull  it. Well,  I  might  be  happier  here   with  version  3-0  in  a  setting  around  350 because  that  gets  me  close  to  the  targets. But  if  I  keep  going  up  here, you  see  how  steep  this  line  is  begin, and  I  definitely   don't  want  to  be  on  version  3_1. Because  it  has  a  steeper  line and  it  has  made  this  slope  very  steep. It's  all  coming  out,   but  it's  interactive  in  this  profiler and  now  we  can  play  with  what  would  be  the  right  settings for if  I  had  to  stay  with  version  3-1. If  I  go  to  version  3-0, what  would  be  the  right  settings  here? They  might  be  different  settings. There's  always  multiple  optimal  settings  you  can  select. This  is  really  cool. We  now  have  the  ability  to  predict. All  right. Continuous  process  improvements. All  this  was  great. We  now  have  a  faster  way to  get  our  analysis  done. We've  gone  through  a  flow  that  enable s us  to  find  what's  important and  see  what's  important. But  what  if  we  want  to  use   that  information  to  monitor  over  time and  continually  improve  our  process? It  might  be  nice  to  have   for  different  levels  of  X-3  a  dashboard. Allison  and  I  worked  to  create   a  standard  type  of  dashboard that  her  team  is  used  to  seeing. They're  used  to  seeing control  charts  first, and  then  the  process  capability  around  their  specs. Then  next  they  would  want  to  see the  output  over  time. That's  the  top  chart  in  the  middle  there and  then  below  that  if  there's anything   else  they  should  worry  about. That  was  our  big  finding,  that, "Hey,  x 19  has  an  effect," so  they  would  want  to  see  that. 
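(Backing up to the model step for a moment: that specification maps to a JSL sketch like this, with the profiler turned on afterwards; column roles follow the talk, and the option details should be treated as assumptions.)

dt = Current Data Table();
fm = dt << Fit Model(
	Y( :X21 ),
	Effects( :X3, :X19, :X3 * :X19, :X19 * :X19 ),   // main effects, interaction, quadratic in X19
	Personality( "Standard Least Squares" ),
	Run
);
fm << Profiler( 1 );                                  // interactive prediction profiler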
Lastly, on the right-hand side, we put a table with the average means for the output of interest, plus some more outputs they like to take a look at. Of course, we want this to be interactive. So how can we build this dashboard for level zero and level one? We're going to bring up our data here; I think I already have it opened. I will now create everything in one swoop. This is why it's nice to be able to save your graphs and your analyses back to the data table. I'm just going to click and create a whole bunch of views here that replicate what the team wants to see. Here is that Control Chart Builder for the X-bar and R chart. Next, we have the process capability. Next, we have that output over time. Next, we have the output over X19 that we wanted to show. Now we have the table. I have all the elements, and if you have all the elements, you don't have to save them back and make someone run them one at a time. You can combine them into a dashboard template; it's under File, New Dashboard. It will allow you to pick some type of template to start off with; I'm just going to pick this blank template. Now it's got all my reports, all the graphs and tables and things I've opened, on the left. I can just bring into the body of the dashboard what I care about and orient things the way I would like to see them on my dashboard. When I'm done, it's easy to run that dashboard and then later save it when I'm ready. I've already got that run here, so I'm going to close down the dashboard builder and show you the dashboard we have already created to capture all this information. With one click of the button, here's my dashboard. And boy, it's a beautiful-looking dashboard, just the way I want to see it. Now, the thing that we loved about this was the ability to still use JMP's dynamic linkage. I can select a couple of high points and see where they flow in the other graphs. I can even see down here where they're highlighted in my table. So this is great, but what about that X3 variable? We knew we wanted to be able to create separate dashboards for each of its levels. So instead of using a local data filter, I'm going to use a global data filter. It's under your Rows menu, right at the bottom. This one affects all graphs and all analyses; it affects what's hidden and selected back in your data table. On this one, I'll just go ahead and put X3. Now when I click on Show and Include, I'll turn off Select so I can make my own selections. Now I can toggle between that _0 and that _1, and it works the same way. I can see things that were out of control or out of spec here for just version 3_0, and then I can do the same thing for 3_1. There we go.
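For reference, the global (table-level) data filter being described can also be scripted; a minimal sketch, with the Show/Include/Select modes set as in the demo:

dt = Current Data Table();
dt << Data Filter(
	Add Filter( Columns( :X3 ) ),
	Mode( Show( 1 ), Include( 1 ), Select( 0 ) )   // affects every open report, unlike a local filter
);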
We  have  a  nice  tool  that  can  be  really   used  to,  again,  not  get  data  just  quicker and  not  just  do  one  analysis, but  actually  make  this a  continuous  process  improvement  tool that  we  can  use  day  in  and  day  out to  quickly  get  the  view  we  want and  ask  the  questions we  need  to  drive  improvements. All  right. So  that  is  our  story  of  moving from  data  to  story,  I  should  say. We  wanted  to  leave  you   with  where  to  learn  more, where  to  get  more  information. Of  course, we're  going  to  give  you  the  presentation. We're  going  to  give  you  the  journal  we  use so  you  can  replicate these  views  we're  seeing. But  Allison  and  I  felt  that  if  you  were wanting  to  really  get  started  with  JMP, go  to  the  Getting  Started  with  JMP  webinars  that  we  have. So  it's  on  the  JMP  website, will  include  links  in  the  journal, and  it  covers  about  everything we  showed  you  today. We  had  a  few  more  tips  and  tricks, but  the  new  user  welcome  kit  is  another  really  good  thing  to  take. This  one   allows  you  to  work  with  a  data  set, it gives  you  a  data  set  that  you  can  follow  along, and  it's  really  nice  step- by- step  instructions. We're  both  big  fans  of  the  Statistical   Thinking  for  Industrial  Problem Solving. Free  online  learning, basically  E-learning  course, and  you  have  so  many  different  places  you  can  do. I've  used  this  to  do  just  in  time  learning, and  I've  had  a  lot  of  people t ake  all  the  sections  just  to  get  up  to  speed on  everything  JMP  can  do   to  help  you  compare  and  describe and  predict  all  those  fun  things  you  want  to  do. Don't  forget,  if  you  have   specific  things  you  want  to  do, we  do  have  Mastering  JMP  webinars  that  are  available  here. The  JMP community, communityjmp .com is a good place to look for  just  in  time  learning, and  as  well,  JMP  Education, if  you  want  to  get  more   of  the  underlying  theory on  how  a  lot  of  these  things  work. We  do  a  lot  of  public  training   or  can  customize  training  for  you  as  well Just  talk  to  JMP  Education. All  right. I  will  allow  Allison to  say  a  few  words  when  we  finish. But  thanks,  everybody,  for  joining  us, and  we  hope  you  picked  up  on  a  few  things you  would  like  to  try  within  JMP. Thanks,  Scott. Thanks,  everyone,  for  joining  us. It  was  really  a  pleasure  to  share this  case  study  from  Coherent  with  you and  to  share  all  the  new  cool  tricks that  Scott  has  taught  me and  that  we've  learned  through  our  journey   with  JMP  at  Coherent. So  thanks  again  and  take  care. Bye.
Managed, non-persistent desktop and application virtualization services are gaining popularity in organizations that wish to give employees more flexible hardware choices (so-called "BYOD" policies), while at the same time gaining the economies of scale for desktop software management, system upgrades, scale-up and scale-down and adjacency to cloud data sources. In this paper, we examine using JMP Pro in one such service: Amazon AppStream 2.0. We cover configuration, installation, user and session management and benchmark performance of JMP using various available instances.     This is JMP in the Cloud: configuring and running JMP for non-persistent application virtualization services. I'm Dan Valente, the Manager of Product Management at JMP, and I'm joined by Dieter Pisot, our Cloud and Deployment Engineer, also on the product management team. Today, we're going to talk about an organization called Red Triangle Industries, which has been a JMP user for the last several years and whose JMP use has been growing across its R&D, quality, and IT departments; about how they're considering adapting to new remote and flexible work environments; and about how they hope to solve some problems with new technology for virtualizing JMP in a non-persistent way. We're going to play the role of two members of this organization, and we're going to go through the configuration, the deployment, and ultimately the use of JMP in this way. This is how Red Triangle has been growing: it started in 2015 with a core set of users, and every year we've been growing the JMP footprint at Red Triangle to solve problems, visualize data, and communicate results up and down the organization. Most recently, we've added JMP Pro for our data science team to look at some of our larger problems; we've got an IoT initiative, and we're doing some data mining and machine learning on the bigger data sets from our sensors and the things going on in our manufacturing plant. In the last year, we also added JMP Live to our product portfolio; you can see a screenshot of JMP Live in the back. What we're trying to do is automate the presentation of our JMP discoveries, and also our regular reporting, in one central database, so that everyone in the organization has access to those data to make decisions on things like manufacturing quality, revenue, and other key business metrics that we're sharing, all in one single place with JMP Live. So how is JMP being used at Red Triangle? One thing we did in the past year, in our IT organization, to which Dieter and I belong, is survey all of our users and put together an interactive visualization of which parts of JMP they use by department. This is called the workflow assessment. It's something we can send out to get some information, and it gives us opportunities to look for growth and training opportunities. This is also how we found out that some of our users want to have JMP presented to them in different ways, which is why we're considering application virtualization.
We've adopted a bring-your-own-device policy, which lets our employees purchase their own laptops, and we want to be able to get JMP to them there. What follows is the situation, the pains, the implications, and the needs that have us considering using JMP from an application virtualization standpoint. The situation for us, after this workflow assessment: we're profitable, we're a growing business in the manufacturing space, and we're adding more JMP users every year in different departments. As I mentioned, our core JMP use is growing year on year, we've added JMP Pro for our data science team, and in the past year, JMP Live for enterprise reporting and sharing of JMP discoveries. I'm playing the role of the CTO, Rhys Jordan, and I'm joined by Dieter, who's playing the role of Tomas Tanner, our Director of IT. We've been charged with finding ways to get JMP more efficiently to remote employees. We want to be able to analyze bigger problems, and we also want to support employees who want to take advantage of our BYOD, or bring-your-own-device, policy in 2022 and beyond. Historically, our standard laptop deployments have used between 8 and 16 GB of RAM. In some cases, especially with our larger manufacturing problems and sensors being put on much of our manufacturing equipment, we've got data sets we want to analyze that are just bigger than that standard deployment can handle. We also want to support our employees and their flexible work environment, which means that if they purchase their own personal laptop, we want to be able to get JMP and other software installed on it without physically being on site with the employee; we want to look into delivering that software by alternative means. Also, when new versions of JMP and other desktop software come out, we want to seamlessly apply those updates to our entire workforce, in a way that minimizes the latency between the release and when our employees actually get the update. Finally, when and if an employee leaves Red Triangle, or moves to another part of the organization that doesn't require JMP or another piece of software, we want to retain those corporate assets with minimal operational burden. The implications for this: we've been given a mandate, like many other organizations, to reduce our corporate technology spend, and we feel the biggest potential for reducing that spend is through automation. Looking at these non-persistent application virtualization tools should speed up the entire workflow of getting software to our end users efficiently. We want to lower the total cost of resource and computer ownership; this is why we've adopted the BYOD policy. But we also need to right-size the assets, even virtual assets, to the needs of the users: our power users analyzing our biggest datasets will need more RAM and more speed available to them, while the casual users can have assets right-sized to their needs.
With  employees on  three  different  time  zones, doing  something  like  just  having a  fleet  of  virtual  machines  for  everybody at  the  same  time doesn't  make  a  whole  lot  of  sense. Because  we  want  to  work  the  global  clock, we  can  design  a  fleet  of  virtual  assets that's  going  to  look  at  just the  total  number  of  concurrent  users that  are  accessing  the  asset  at  once. That's  what  we'll  get  to in  the  demo  here. Finally,  that  better  rollout of  software  updates and  the  transparency  of  usage to  our  executive  team, who's  using  the  software, how  much  are  they  using  it,  et cetera, are our  implications  for  us investigating  this  technology. As  far  as  needs, we  want  to  go  with  a  cloud  provider. We're  not  going to  build  this  tool  inhouse. We  want  to  use  one  of  the  cloud  providers and  the  out- of- the- box  capabilities that  they  have for  application  virtualization. Since  we've  moved a  lot  of  our  data  sources  to  the  cloud, to  Amazon  Web S ervices,  for  example, we'd  like  to  be  able to  put  our  analysis  or  analytic  tools close  to  those  data  sources to  minimize  the  costs of  moving  data  around. Our IT  department  wants  to  centralize the  management  of  JMP  setup and  license i nformation, and  also  have that  seamless  version  control. As  soon  as  a  new  version  is  released, we  want  to  be  able  to  push  those  updates as  efficiently  as  possible, and  then  look  at  usage  tracking through  things  like  cloud  metrics and  access  controls. With  this,  I'm  going to  hand  it  over  to  Dieter to  give  a  demo  of  running  JMP in  an  application  virtualization, a  non- persistent application  virtualization  tool like  Amazon   AppStream. Dieter. Thanks  Dan. The  first thing  we  have  to  do is  we  have  to  go  to  the  image  builder and  launch  that. We  have  to  pick a  Windows  operating  system. Windows  Server  '19 is   what  we  want  to  pick  here. There's  several  available. We  just  pick  a  generic  basic  one like  this  one,  move  on, and  give  it  a  meaningful  name and  display  name. Because  we are Red  Triangle, we  use  Red  Triangle  for  this  one. We  have  to  pick  a  size for  the  image  that  we  want  to  configure. We  pick  a  standard  one,  medium. I'm  going  to  add  an I AM  role because  I  want  to  connect  to   S3, where  I  keep  all  my  installer. Just  to  make  sure  I  can  connect  there, I  add  a  role  with  access  to   S3. Then,  I  have  to  define in  which  private  network I  want  to  run  my  image  builder  here. Pick  a  public  subnet so  that  I  can  actually  connect  to  it from  my  local  desktop. Security  group, just  to  make  sure  only  I  can  do  that, and  not  everybody  else  can  connect. We're  not  worrying about  the  active  directory  set up, but  we  want  to  make  sure we  have  internet  access so  we  can  download  things from  the   internet, like for  example,  a  browser. We  check  all  the  details. They're fine, s o  we  launch  our  image builder. This  is  going  to  take  a  while. AWS   AppStream  basically set up a  virtual  machine  for  us that  we  can  connect  to and  set  up  our  application. After  that  has  started, we  connect  to  the  machine as  an  administrator. 
To  save  some  time, I  downloaded the  JMP  Pro  installer  already, and  did  install  JMP  Pro, just  like  you  would on  any  other  Windows  desktop  machine. We  have  the  application   icon  here. In  addition  to  that, I  have  created  a  JSL  script in  the  Program  Data- SAS- JMP  directory, jmpStartAdmin,  that  has  a  few  settings so  that  we  make  it  easier for  our  users  to  do  certain  things. What  they  contain is  a  connection  to  a  database and  the  JMP L ive  connection to  our  Red Triangle JMP Live  site, so  the  users  don't  have  to  remember and  type  that  in. So   that's  perfectly  fine  here. Then,  we  have  to  go  to  the  image  assistant and  configure  our  image. The  first thing  we  add, an  application  to  our  image. That's  going  to  be the   JMP Pro  that  we  just  installed. We  are  going  to  pick  the  executable. We  use  the  JMP  executable. We  give  it  just  some  properties, give  it  a  more  meaningful  display  name. Save  that. What  we  can  do  now… Here's  our  application  that  we  want to  make  available  to  the  user. The  next  thing  we  can  do  is  test and  set  it  up  as  the  user  would  see  it. We  are  going  to  switch  users  here. We  have  the  ability  to  switch to  a  template  or  test  user. The  template  user,  that's  defining how  the  user  would  actually run  the  applications. Whatever  we  do  here is  going  to  be  remembered, and  the  user  will  have the  same  experience  as  our  template  user. We  can  do  a  few  things  in  the  setup. We  can  here also  make  sure that  our  database  connection  is  working. We  could  do  this  as  the  test  user  as  well, but  I'll  just  do  it  here as  our  template  user. Here  we  are,  applications  perfectly connected  to  our  database. With  that,  we're  fine  with  our  setup. We  go  back  to  the  image  assistant. Not  going  to  use  the  test  user, switch  users  again. We're  not  going  to  show  the  test  user. I'm  going  to  go  back  to  the  administrator and  continue  with  the  setup  of  our  image. We  switch  here. Same,   not  going  to  the  test  user. Now,  what  we  have  to  do is  we  have  to  optimize. You  have  to  configure  the  application. We're  going  to  launch  JMP. Once  it's  running, and  we're  happy  with  all  of  this, we  continue  the  setup  of  the  image by  clicking  the  Continue  button. What   AWS AppStream  is  doing  now is  optimizing the  application  for  the  user. We  just  wait  for  that  to  finish, and  then  give  our  image  a  name, and  a  display  name  as  well. Again,  we're  using  Red Triangle  here. We  also  want  to  make  sure we  use  the  latest  agent so  that  we  always  have an  up-to-date  image. Next,  review. We  disconnect  and  create  the  image  now. With  that,  we  are  going to  get  disconnected from  the  image  builder. Lost  the  connection,  obviously, and  our  session  expired. We  return  to  the  AppStream  2.0  console and  we  see that  our  image  has  been  created. It's pending  right  now. This  also  takes  time  to  create it the  way  we  want  it  to  be. We  have  to  wait  for  that  to  finish. We're  done. It  has  finished. The  next  step  is  to  create the  fleet  the  images  are  going  to  run  on. We  create  the  fleet, we  pick  which  type  of  fleet. We're going  to  go  with   on-demand  fleet because  that's  much  cheaper  for us. 
The  images  only  run when  the  user  actually  requests  the  image, whereas  the  always- on will  be  running  constantly. Here,  we  give  it  a  name  and  a  description. We  then  pick  the  type  of  image we  want  to  give  to  our  users. A bunch  of  other  settings are  available  to  us, like  timeout  settings  and  capacity. For  now,   we're just  going to  go  with  the  default. We  can  adjust  to  the  needs of  our  users  if  necessary  at  any  time. We  click  Next. We  pick  the  image that  we  just  created   to run on  our  fleet. We  define   the  virtual  network, and  the  subnets that  our  fleet  should  run  in. We'll just  pick  the  same we  used  before. Also  a  security  group,  of  course, to  make  sure  that  only  the  users  and  host that  we  want  can  access  our  fleet. Again,  we  want  to  give the  fleet  internet  access. We're  going  to  check  that to  make  sure  users  can  publish to  our  JMP Live  site. We  could  integrate active  directory  authentication  here, but  we  don't  want  to  do  that. That would  take  us  some  time. We're  just  going to  go  with  some  local  users that  I  have  already  created. We  click  Next. We  are  presented  with  a  review of  what  we  did, and  it's all  fine. We  are  creating  the  fleet. There's some  pricing  information we  have  to  acknowledge. With  that,  the  fleet  is  starting  up. Once  that  has  happened, we  can  move  on  and  create  the  stack. The stack helps  us  run  that  fleet and  helps  us  define  persistent  storage for  our  fleet,  for  example. Here,  we  create  the  stack. As  well,  give  it  a  meaningful  name. Since  this  is  our  Red Triangle  site, we'll  go  with  a  very  similar naming  convention  here. We  pick  the  fleet that  we  want  to  run  in  our  stack. All  looks  good. We  move  on. Here,  we  define  the  storage. We're  going  to  go  with  the  default, which  is  an   S3  bucket that's  available  to  each  of  the  users. We  could  hook  up  others, but   S3  for  us  is  fine  at  the  moment. Just  a  couple  of  things on  how  we  want  to  run  our  stack. All  of  them  seem  fine. We  go  with  the  defaults  here. Quick  review. Everything's  fine. We  create  our  stack. That's  it. The  stack  is  there. Stack  has  been  created. What  we  now  need  to  do is  go  to  that  user  pool that  I'd  mentioned  earlier since we're  not  using   active directory. In  here,  I  have  defined  three  users that  can  access  our  stacks. But  what  we  need  to  do  is  we  need to  assign  that  stack  to  each  of  the  users. In  my  case,  I'm  going  to  pick  me and  assign  that  stack that  we  just  created. We  could  send  an  email  to  the  user to  make  sure  they  are  aware of  what  just  happened, and  the  stack  has  been  assigned  to  them. That's  all  we  have  to  do  to  set  this  up. If  I  now  go  to  the  link that  was  emailed  to  me, I  can  log  into  that AppS tream  session. I  use  my  credentials that  my  admin  has  defined  for  me. Here are  my  stacks. I  use  the  Red  Triangle. Here's  the  application that  that  stack  provides  for  me. This  is  going  to  take  a  while. As  I  said,  it's  on -demand. It's  like  booting a  PC and  running  JMP  on  that  machine. It's  going  to  take  a  few  minutes. 
The  always- running  would  be  much  faster, but  again,  they  would  cost  money because  they're  running  constantly, versus  the  on- demand  runs  only  on- demand. Here in  my  browser,  JMP  is  started, and  JMP  is  running just  perfectly  fine  in  my  browser. Let's  do  some  work. I'm  going  to  connect  to  a  database. Because  my  administrator  has set this  up  already  for  me, there's  nothing  much  for  me  to  do. The  connection  to  my  database is  already  there. My  table's  available  to  me. I'm  going  to  pick one  of  the  tables  in  my  database. This  is  a  Postgres  database. I'm  going  to  import  it  right  away. Here's  my  table. I've  written  a  nice  script to  build  a  wonderful  report. I'm  going  to  just  quickly create  a  new  script. I'm  going  to  cut  and  paste  that from  my  local  machine to  my  AppStream  image by  using  the  menu  that's  available  to  me. Now,  I  can  cut  and  paste  it into  my  scripting  editor. I  run  that. Here's  my  report. That  report  I'm  going  to  publish to  our  Red Triangle   JMP Live site  now. I'm  going  to do  File , Publish. Because  again, my  administrator  has  set  this  up  for  me, the  information  about  my  Red Triangle  site is  available  to  me. I'm  just  going  to  get  prompted  to  sign  in to  make  sure  it's  really  me. In  this  case, I'm  going  to  use  our  imaginary  identity. Use  the  username  and  password, sign  in,  go  through  the  publish  dialogue, and  don't  change  anything, just  hit  Publish. Report  has  been  published. Now,  what  I  can  do, I  can  go  to  another  tab  on  my  browser and  verify  that  the  report is  actually  published to  our  Red T riangle  JMP  Live  site. I  switch  over,  go  to  All P osts. Here's  the  report that  Tomas  just  posted  a  minute  ago . It  looks  exactly  as  it  did in my  virtual  machine. Thank  you  very  much.
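To recap the scripted pieces of this demo in one place: both the jmpStartAdmin.jsl on the image and the report script Tomas pasted in are ordinary JSL. Here is a minimal, hypothetical sketch of what they might contain; the DSN, table, and column names are invented for illustration, and the JMP Live connection details that the real startup script pre-configures are only indicated by a comment.

// jmpStartAdmin.jsl (sketch): pre-wire the shared database connection
// so users can pull tables without typing a connection string.
// (The real script also pre-configures the Red Triangle JMP Live
// connection; that part is omitted here.)
dt = Open Database(
	"DSN=RedTrianglePG;",          // hypothetical ODBC data source on the image
	"SELECT * FROM process_data",  // hypothetical table
	"Process Data"                 // name of the resulting JMP data table
);

// Report script (sketch): the kind of graph pasted into the script editor
// and then published through File > Publish.
dt << Graph Builder(
	Variables( X( :Run Date ), Y( :Yield ) ),   // hypothetical columns
	Elements( Points( X, Y ), Smoother( X, Y ) )
);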
Building an analytic workflow for any manufacturing process can be daunting. This presentation will demonstrate the ease of building an analytic workflow from preparing the data to analyzing the final product. The workflow demonstration will show steps for data visualization and multivariate analyses including clustering, predictive modeling and optimization of the process. Additionally, a chemometric modeling approach to quantify the active ingredient in a finished product will be included.     Hello, everyone. My name is Bill Worley, and I am a systems engineer for JMP on the US Chem team. Today I'm going to be talking to you about an analytic workflow for data and chemometric analysis with JMP. I've got a few things I want to highlight. We're going to follow the analytic workflow that we share: getting data in, cleaning and blending, visualization, exploratory data analysis, building models, and then ultimately, what are you going to do with that data and how are you going to share it? A couple of things that are important are the new JMP Workflow Builder, which I'm going to highlight; this is just a snapshot of what I'll be showing you in a little bit. The chemometric part of this is analyzing spectral data, using Functional Data Explorer for preprocessing, and these tools are now built into JMP. If you can see over here, we've got a tab in FDE now where you can choose from different types of preprocessing: standard normal variate, multiplicative scattering correction, Savitzky-Golay filtering, and baseline correction. And I believe there will be maybe one or two other things added to that. Just so you know, the data I'm using is pulled from this paper, right down here, from 2002; that's where the data is coming from. All right, I'm going to put that aside for now and get things going. I've got my home window, and I'm going to start with File, New, Project. The workflow I'm going to be working with is this one right here; I'm going to right-click on it and open it. What I've done is taken the data set I want to work with and built all these steps in the Workflow Builder, and I'm going to play that for you now, so it will populate our project here. I'm going to go ahead and hit play. As you can see, it's building the workflow and doing some analysis; we're doing some model screening right now, and then everything is complete. Now we have all these tabs across the top where we've completed that analysis with the workflow. I've actually included one other table in there; when we get there, I'll talk more about it. We pulled that table in just so you can see the source data. We've actually pulled this data in from an Excel file, and getting the data into JMP from Excel is fairly easy. And we built some exploratory data analysis: the first step is a distribution, and we can interact with it just like anything else.
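As a rough illustration of that first step, the Excel import and the distribution could be scripted along these lines; the file, sheet, and column names are placeholders rather than the ones used in the demo.

// Import one worksheet from an Excel workbook (hypothetical names)
dt = Open( "Tablet Process Data.xlsx", Worksheets( "Process" ) );

// First exploratory step: a distribution of the key response
Distribution( Continuous Distribution( Column( :Dissolution ) ) );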
Everything's interactive from the Workflow Builder. I did some graphing where I put the column switcher in and added the local data filter. Just so you know, as we build the workflow, all these things are built in, and the recording helps you keep track of what's going on. You can see that we've got full functionality going on there. I did some more exploratory data analysis with Fit Y by X, in this case fitting mill time versus dissolution and blend time versus dissolution. Just to get back to it, this is tablet data that's pretty popular within the SCE community within JMP, and I'm just building on how we would analyze it and build out this workflow. The next step is multivariate analysis to see, for dissolution, which is our key performance indicator, which factors might be highly correlated with dissolution. I'm not seeing anything that jumps out too much; we do have some partial correlation, but no one factor is jumping out as the answer for the data we're looking at. We can get a better understanding of which factors might be important if we look at a predictor screening. We can see here that screen size, mill time, and spray rate look to be important factors that we could use to build a better model. Next, we can set up a stepwise regression; I'm going to go here and actually run this model in a second, and then we've got that output, so the data is there and we can use it as needed. So we built that model, and we're out here looking at another type of analysis, a decision tree, or partition analysis. We can do a neural net, build that model, and we can do a partial least squares. So we've got all those things together. But once all is said and done, you can use Model Screening in JMP Pro to build these models out and find out which is the best overall model. Based on that, we can see that a neural boosted model is probably the best overall model for us to work with. We can then take all this information and share it with our colleagues, co-workers, and anybody who might be interested, and we can do this in several different ways. One of the best ways is to use JMP Live and put everything out there for folks to look at and share. That's the first part of the analytic workflow, and if we look back here, that's all set up in this portfolio. As I said before, I had opened another table, and this is for the chemometric part of the analysis. This is near-infrared data for finding the active ingredient in tablets. We built the tablets, we made the tablets; now we have to take the finished product and find out what it's all about. Do we have the right active ingredient, and can we tell, based on this technique called near-infrared analysis? We're going to step through a few different things, but I'm going to turn the Workflow Builder back on to record these steps.
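Before recording the next steps, here's roughly how the screening launches from this first part look in JSL. The column names are the factors mentioned above, Model Screening requires JMP Pro, and the arguments are an approximation rather than the saved script from the demo.

// Which factors matter most for dissolution?
Predictor Screening(
	Y( :Dissolution ),
	X( :Screen Size, :Mill Time, :Spray Rate, :Blend Time )
);

// Fit several model types at once and compare them (JMP Pro)
Model Screening(
	Y( :Dissolution ),
	X( :Screen Size, :Mill Time, :Spray Rate, :Blend Time )
);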
So let's turn that on, go back to our data set, and do some analysis now. I want to clear out these row states first. Now I want to go to Analyze, Clustering, Hierarchical Clustering. My data is grouped; there are 404 wavelengths in that group. I'm going to pull those in and say OK. Let's build this out a little bit: we're going to look at three clusters, and let's color those clusters. If I pull this down a little bit, you can see we've got three clusters, a fairly big green cluster and two smaller blue and red ones. So we've got that. Now let's go back to our data set and do a Graph Builder. In Graph Builder, let's pull our wavelengths in here to X, do a parallel plot, clean that up a little bit, and right-click to combine scales, parallel merged. I'm doing these steps pretty fast; this is something you'll want to go back and watch again if it's of interest. The thing I want to show you here is that the data is pretty scattered, and there's a lot of baseline separation, and maybe some additive and multiplicative scattering, that we need to clean up. So let's go back to our data table and go to another analysis step: a multivariate method, principal components. Again, we'll pull all our wavelengths in and say OK. The thing I want you to note here is that all 404 wavelengths are grouped right around this little area right here. That is highly correlated data. We could build a model off of that, but it may not be the best, because we'd be including wavelengths that are not important because of the high correlation. We'll clean that up in a little bit; as a matter of fact, let's go to that step right now. Let's go to Analyze, Specialized Modeling, Functional Data Explorer. Let's get this set up first, and I'll tell you more about it: Rows as Functions as the data format, our wavelengths, our active ingredient as a supplemental variable, and then our ID function. Say OK. This looks like what we saw before in Graph Builder, and we want to clean it up. We've got these new tabs in JMP Pro 17 for Functional Data Explorer; Spectral is one of the tabs, and, as we talked about before, under it we have standard normal variate, multiplicative scattering correction, Savitzky-Golay, and baseline correction. I'm going to select standard normal variate first to clean things up. Then you can see the baseline is a little wobbly here, so let's clean that up as the next step, and then go ahead and say OK. Now we've got that set up; it looks a lot better, a lot cleaner. The next step is to model this, and we're going to use another new function in Functional Data Explorer called wavelets. It's wavelet modeling here.
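For reference, the clustering and principal components launches on the raw spectra look roughly like this in JSL. Only three hypothetical wavelength columns are listed (using quoted-name syntax) to stand in for the full group of 404, so treat this as a sketch rather than the demo's saved script.

// Ward clustering of the spectra into three clusters.
// In practice, all 404 grouped wavelength columns go into the Y role.
Hierarchical Cluster(
	Y( :Name( "8400" ), :Name( "8402" ), :Name( "8404" ) ),
	Method( Ward ),
	Number of Clusters( 3 )
);

// Principal components on the same (highly correlated) wavelength columns
Principal Components(
	Y( :Name( "8400" ), :Name( "8402" ), :Name( "8404" ) )
);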
You can see down here that our model has been built, and we're explaining a lot of the variation with about five functional principal components. If you look at these, we're explaining the shape with our shape functions; that's where our eigenvalues come in. This is really just a nice way to look at the data and make sure our spectra are being modeled well. As I said, we've got five shape functions that are explaining things really well. Let's clean this up a little bit and pull this back to make our model a little simpler; we won't go all the way back to five, but we'll leave it at 10 for now. You can look at the score plots; there's still some scattering here in the data, but we'll clean that up in a second. Then one of the other steps you want to take is the wavelet model report. This wavelet analysis is new in JMP 17, and what it's really all about is finding the important wavelengths that are going to give us a telltale sign of what's going on with the data. What I'm looking for, especially with spectra, is somewhere I can see a shift in the baseline, and I can see that we've got a good shift in the baseline and a grouping of spectral wavelengths around 8820 to maybe 8850. That's the important part here: we get an idea of what the important wavelengths are. Now, the preprocessing that I had done up here before, I want to save that data out and do some analysis on it, so I'm going to go here in Functional Data Explorer and select Save Data. This gives a new data set, and now we've got to do some work with this data to clean it up and make sure we're ready to go. I want to do a transpose: transpose the Y columns, with X as our label. Let's see if we got this right and hit OK. Yes. So we've cleaned that table up: we've taken those 300 spectra and transposed them into another data table. This is all the preprocessed data, so we're going to do a few more things to it to show where that preprocessing has really cleaned up the data and where we can build some models with it. Let me get rid of this column, and we're ready to go. We're going to do the same thing we did before. Actually, let me take a step back real quick: I want to group these columns to make things a lot easier. So group those columns, and let's go back to where we were: Analyze, Clustering, Hierarchical Clustering, pull our columns in, and say OK. We'll do the same thing we did before, look at three clusters and color those clusters. This will be a quick comparison, but if you look at what we had before versus what we've got now, we've got much tighter clusters, and they're pretty well dispersed; those clusters are fairly even right now. Let's go back to our data table.
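A sketch of that transpose step in JSL is below. It follows the demo's description (transpose the Y values with X as the label), but the table name, column names, and output name are assumptions; Save Data in Functional Data Explorer was done from the red triangle menu, so only the transpose is shown.

// Transpose the table saved from FDE: the preprocessed Y values become
// columns of a new table, labeled by the X (wavelength) column.
Data Table( "FDE Saved Data" ) << Transpose(
	Columns( :Y ),                                   // preprocessed absorbance values
	Label( :X ),                                     // wavelength becomes the new column names
	Output Table( "NIR Preprocessed Transposed" )    // hypothetical output name
);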
Let's go back to our Graph Builder and pull our wavelengths in again, like we did before. We're going to make a parallel plot out of this again. It doesn't look great right now, but let's right-click here and go to combine scales, parallel merged. Now you see that the data is really cleaned up; the preprocessing we did has taken things to where they look a lot better. Let's compare what we had before with what we have now: the data is much cleaner, and any analysis we do from here should be much better. So let's go back to Analyze, Multivariate Methods, Principal Components, pull our wavelengths in, and say OK. Now we've taken that data and broken the correlation structure we had before; this is after preprocessing, and this is what we had before, just to show you the difference. We've really cleaned things up. Now we want to take maybe one more step in the analysis: Analyze, Quality and Process, Model-Driven Multivariate Control Chart. We're just looking for unusual behavior in here; in this case, it's based on the principal components. Say OK. This is looking at two principal components, and you can see there are some potential outliers here. But this is spectral data, so we're not going to get rid of anything; we just want to view it. One other thing we want to look at is, under Monitor the Process, the score plots. Now we can look at our subgrouping down here and actually compare these groups. I'm going to pull up a lasso tool and do my best to group a couple of these; that's going to be my group A. And I'm going to do another lasso here; we'll just leave that as is. We grabbed one of the wrong ones, but I think we'll be okay. Now we can compare where we're seeing differences in the spectra for these two subgroups, and as I was saying before, if we look right in here, those wavelengths are somewhere in that 8800 range, and we can see that there's a real difference there. One more thing we want to do, and this is the last step. As I showed before, I'd done model screening, and I want to do model screening again: Analyze, Predictive Modeling, Model Screening. We're going to set this up with our active ingredient as the response we're trying to model, and we're going to use our wavelengths to build this model out. I'm going to clean this up a little bit; we don't need all these different modeling types, so I'm going to pull some out. The nice thing about this is that I can build all these models at once and really find out what's the best modeling approach to take with this data. I don't need that, I don't need that; let's add those. One thing I'm not going to do, for time's sake, is add any cross-validation. If we take that into account, it will run a lot longer, but as you'll see, this is going to be fairly quick. I'm going to go ahead and say OK.
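For reference, the last two launches set up here could be scripted roughly as follows. The platform names match the menus, but the role names, the response column Active, and the three listed wavelength columns are assumptions standing in for the full 404-column group.

// Model-driven multivariate control chart on the preprocessed spectra
Model Driven Multivariate Control Chart(
	Process( :Name( "8400" ), :Name( "8402" ), :Name( "8404" ) )
);

// Model screening: predict the active ingredient from the wavelengths
// (no validation column here, so it runs quickly, as in the demo)
Model Screening(
	Y( :Active ),                                            // hypothetical response column name
	X( :Name( "8400" ), :Name( "8402" ), :Name( "8404" ) )
);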
As this is going, I'll talk a little bit more about what we're seeing. We're building out these models; you can see it's stepping through, and in about another few seconds it should be done. There we go; it took a little longer. Based on what I said, I didn't use any validation, but neural requires it, so that's the validation you see there. Overall, we get a really good idea that XGBoost is going to be the best model to fit this data. We could use any of the others, because they're all really good models as well, but you get to choose. Let's select partial least squares, because that's the go-to analysis method for spectral data anyway. We can say Run Selected, build out that model, and find out whether we can make it even better. Hopefully what I've shown you is that we can build these workflows. Let me pull this back to our beginning here, just a few steps. This is our workflow, and I've added those steps: the table we're working with, clearing the row states, transposing the data. Anything that we closed out is now part of that workflow, so we continue to build it. One thing I'll say is that I would typically not build a workflow inside a project, but I'm showing you that it can be done. Let me go back to my slide here and share; let's flip this. One more step here: I just want to say thank you to a few people. Jeremy Ash, who's no longer at JMP, was a great inspiration for this. Mark Bailey has been a great help. Ryan Parker and Clay Barker have done really fantastic things with genreg and Functional Data Explorer. Chris Gotwalt has been really helpful in getting things set up. And Mia Stephens has been really supportive in helping me build the spectral analysis out within JMP. I really appreciate everything, and I'll say thank you. That's it.
After JMP Live 16 was released, the JMP Live team and JMP product managers sorted through feedback from JMP Live customers and prospects. We then set out to address as many of these requests and concerns as humanly possible for JMP Live 17. The result is a virtually complete overhaul, designed to enhance collaboration and automate data updates. Whether your company has adopted JMP Live already, or you are still thinking about it, this talk is a must-see to understand what's coming.    JMP Live 17 adds the concept of spaces, which provide a much more flexible way to create separate areas where different groups can collaborate and define who can create content and who can only view it. Another exciting aspect is the ability to update data in JMP Live directly from a database, without the need to rely on external tools like Task Scheduler.     Come see what's new!     -Well, thanks everybody for coming. Today we're going to talk about JMP Live 17 and how it allows you to collaborate better with JMP. My name is Eric Hill; I am a developer on the JMP Live team, and my co-presenter, Chris Humphrey, is also a developer on the JMP Live team. I thought I'd start by reminding everyone what JMP Live is, and then we can talk a little more about what's new in JMP Live 17. We introduced JMP Live back in the JMP 15 cycle, about three years ago. JMP Live is a web application that you access through your browser. It is private to your company, so nobody can see the content of your JMP Live instance other than you and your company. It can be installed on premises at your company, or you have the option of having JMP host your JMP Live instance on AWS; either way, it's still private to your organization. The main purpose, at least when we started JMP Live, was to allow the experts who use JMP and create analyses to publish them to a place where people who don't have JMP could see them and interact with them. There's a lot of interest in sharing your JMP discoveries with people who don't have JMP, and you can already do that with screenshots and PowerPoints and various things, but those lack an important characteristic: the interactivity that JMP is known for. With JMP Live, you can share your discoveries in a way that allows people to interact with them in many of the same ways that you can when you're in JMP. That's the main thing it allows: people who don't have JMP can see and interact with your content. In 17, we've added some features to facilitate collaboration between people who do have JMP, so even if you and your colleague both have JMP, there's value in publishing both data and analyses to JMP Live so that you can collaborate and make each other's analyses better. Another thing we have added in 17 is the ability to publish analyses that will automatically update when new data becomes available. That was a big request from customers and prospective customers in the couple of years that JMP Live has been out, so in 17 we are delivering that feature. So, things that are new in JMP Live 17.
Well, we have this concept of spaces that you'll see a lot during our demo today. A space is an area of JMP Live that you can restrict to a certain group of people. That group may have the right to publish content there, view content, edit content, or manage scripted data updates, those kinds of things. So that's what a space is. We have greatly streamlined the publishing process, and you'll see that in a couple of places in our demo today. We now support unlimited file folder hierarchies. In previous versions of JMP Live, you had the root level and you could create one level of folders on top of that, but you couldn't continue to create folders at lower levels. Now, in 17, you have a complete hierarchy of folders, so you can organize your content however makes sense. There's also a new feature in JMP Live 17 called Open in JMP. You can be using JMP Live, looking at an analysis that you want to take further in JMP: you want to see if you can add something to it or improve it in some way. Now you can open it directly into JMP, if you have JMP on your machine of course, and start working on it, improving it, experimenting with it to see if you can add to it. We'll see some examples of that in our demo. And then the one I alluded to on the last slide: scheduled data refresh, the ability to create a script that knows how to refresh your data, and then to run that on a schedule, every 5 minutes, every day, every week, however you want to do it. Now, the premise of today's demo is that Chris and I work together at a manufacturing company that manufactures widgets. We are responsible for five of the widget products for that company, and each of those products has a yield that gets reported every day. We need to find ways to present that yield data that are helpful to both us and our colleagues at the organization. So we're going to collaborate to come up with some analyses that we think are beneficial in that regard. So Chris, have you gotten started at all with the analyses of the yields of our widgets? -I have, I will share my screen, sorry. Eric and I work on a few different parts, and we have a couple of data tables that we use to track the yields for those different parts. The first one is a simple data table with all five parts that we're responsible for, with their yield values over time. This is something that I need to share with the rest of the group, so I'm going to create a simple report that shows the yield over time. I'll use Graph Builder for this: I'll drag the date to X and one part to Y, and now I have a report that shows my yield values over time for this one part.
I could create four more, one for each of the remaining parts, but I think it will be a lot easier to use if I add a column switcher to this report, switching the part that I added earlier among the five parts that are in the data table. Now I can see every part in one report; I can switch between them and see the yield values for each of those parts, all in one report. So I think that's good. The second data table that we use is for a fit model of a photo process that we run. This model has four factors and three responses, and there's a script already in the data table to run a fit least squares that gives me a profiler. This lets me see the interactions between the different values; I can change the values to see how they impact the others, which is pretty useful to the other engineers. So this is a report that I'd also like to share with the group. In the past, I would either have to share my data table with a script that others could run, or do screenshots, which lose all the interactivity of these reports, or maybe a PDF, which is also not interactive. But with JMP Live, I can publish these reports to JMP Live directly, so others can use them just like I have here in JMP. So I go to File, Publish, Publish Reports to JMP Live, and JMP sets up a connection to my JMP Live server; I set that up previously in my managed connections. The first screen you see is a list of the reports that are open in JMP; these are the two reports that we just created. I'll select both of those, and down here I'll also make sure Publish New is selected, since this is a new publish to JMP Live. I'll select Next. Now, as Eric mentioned earlier, I need to pick the space that we use to collaborate as a group. Right now Eric and I are working on this before we share it with everyone, so I'm going to use the Eric and Chris space that he and I have access to. Within the space, I'll create a new folder called Yields to store our data and our reports, make sure it's selected, and click Next. The next screen shows me the reports that are going to be published: I see my yield report and my fit model, so I'll give them new titles, changing this one to something a little more appropriate. Now I'm ready to publish, so I'll hit the Publish button. At this point JMP is sending the reports and the data up to JMP Live to publish them on the web. The results screen appears, and I see three sections. First is the location; that's the Yields folder that I created in the space Eric and Chris. The second section is the new reports; those are the two reports we were working with in JMP. And the third section is the new data that was added to JMP Live; these are the data tables used to support the two reports that we sent. All of these values are hyperlinks, so if I click on Yields, I'm brought to a folder on the web that shows the two reports that I added to JMP Live.
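The yield report Chris built could be reproduced in JSL roughly as shown below; the date and part column names are placeholders for the five widget products in the data table.

// Yield over time for one part, then a column switcher to flip among parts
gb = Graph Builder(
	Variables( X( :Date ), Y( :Part 1 ) ),    // hypothetical column names
	Elements( Points( X, Y ), Line( X, Y ) )
);
gb << Column Switcher( :Part 1, {:Part 1, :Part 2, :Part 3, :Part 4, :Part 5} );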
The  one  on  the  left  is  the  yield  report, and  I  have  the  column  switcher  that  I added,  just  like  I  had   in JMP, I  can  switch  between  the  products and  see  the  yield  data  for  each  product. If  I  go  back,  I  see  the  fit  model. And  in  this  report, the  profiler  is  present,  and  it's as  interactive  as  it  was in  JMP. So  now  anyone  that  has  access  to   JMP Live can  use  the  same  data and  provide  the  same  analysis that  they  had  in  JMP. So  Eric, I  think  that's  a  good  first step. Can  you  take  a  look  and  see if  maybe  that's  enough? -I  can  do  that. All  right. Well,  that's  interesting. Let  me  go  see  if  I  can  only  go  to  my,  we go  to   JMP Live  in  my  web  browser  here. And  here's  the  instance  of   JMP Live that  we're  doing  this. I'll log  into  that. All  right.  Okay. I  can  see,  even  right  in  my  homepage, I  can  see  the  two  reports that  Chris  created. I'm  going  to  go  ahead  and  go  to  the  space where  we're  collaborating  here. So  there's  the  Eric  and  Chris  space. And there's  the  yields  folder. All  right.  Well,  let's  see  what  we've  got  here. So  here's  the  widget, is  that  profiler that  I  heard  Chris  talking  about. So  yeah,  this  looks  very  helpful. I  think  our  engineers  will appreciate  this  interactivity. One  thing  I  would  like  to  do,  though, is  I'd  like  to  look  at  the  data behind  this  report. This  is  something  else  that  we added  new  in   JMP Live  17. I  need  this. Let  me  grab  this. To  get  to  the  data, I  can  go  to  this  details here  and  scroll  down. And  there  is  photo  process  app. That's  the  data  table  that  Chris  published that's  behind  this  analysis  here. So  I  will  go  there,  and  then  I  can  just  go to  view  data  right  here,  and  that will  bring  that  data  up  in  a  browser so  that  I  can just  take  a  look  at  it. It  doesn't  have  the  full power  of  the  JMP  data  table. I  can't  edit  the  data, and  I  can't  do  a  number  of  things that  you  can  do  in  the   JMP data  table. But  I  can  do  a  number  of  things, and  I  can  look  through  it and  just  get  an  idea what  the  data  looks  like. So  kind of I  get  a  feel for  what's  going  on  here. Now,  one  thing  I  notice  as  I'm  scrolling through  this  data  is  that  there  are  two material  suppliers  for  this  photo  process that  Chris  has  analyzed  here,  advanced materials  like  it  is,  and  Cooper. I'm  curious  if  the  material  supplier has  any  effect  on  the  relationship between  the  factors  and  the  responses. I'm  going  to  go  back  to  that  report  here. Here it  is. So  what  I  want  to  do is  I  want  to  add  a  data  filter  to  this, to  filter  on  that  material  supplier. Now  I  can't  do  that  directly  in   JMP Live. For  that,  I  need  JMP. But  it's  really, JMP  is  only  one  click  away. When  you're  in   JMP Live  17,  there's a  button  up  here  called  open  in  JMP. So  if  I  click  that, it's  going  to  open  this  report  right  here and  the  data  behind  it  into  JMP. And  there  it  is. And  here  is  the  report. And  then  down  here  is  the  data. Now  we've  opened  it  into  a  JMP  project. 
You may or may not have used JMP projects in the past, but a JMP project is a convenient way to collect reports and data that go together into one object, so you don't lose track of what belongs together. In JMP Live, when you have a report, it has data that goes along with it; in order to keep those together when we open them in JMP, we put them into a project to hold everything together. Other than that, it works just like JMP. So I can go up here to the red triangle and go to Local Data Filter, and under the factors I have Material Supplier, so I will add a data filter for that. I will check right here in JMP to make sure it's even worthwhile to add this, and sure enough, there is a good bit of movement in the graphs as I click between the two material suppliers. So I think that's a worthwhile addition to what Chris did, and I would like to publish it back to JMP Live and replace the version that Chris published. We set up our space for collaboration, with the ability to edit each other's content. You don't have to set it up that way; you can set it up so that each person's content is private, or at least not editable, but we wanted to be able to collaborate, so we set it up the way we did. So let's do File, Publish, Publish Reports to JMP Live. We're getting things ready. OK, there's the least squares report that I just created. Now I'm not going to Publish New this time; I'm going to replace an existing report, so let's choose that. Here's the report I'm publishing; now I've got to tell JMP which report on JMP Live I want to replace. I click here, and I can see the fit least squares that Chris created moments ago, so I will select that as the report I will replace, and I will add just a little extra to the title so we can differentiate between the two. Then I click Next. Now I'm presented with an interesting decision I need to make. When Chris published this data to JMP Live, the photo process opt data, and I downloaded it to my machine, it just made a copy of that same data table on my machine. I don't really need to republish the data here, because Chris already published it; all I want to do is republish the report that has my data filter in it. So rather than doing anything to the data on JMP Live that Chris published, I'm just going to say: use the JMP Live data table, the table that's already on JMP Live and associated with the report I'm replacing. Click Replace, and off it goes. All right. You see there's the folder that I put it in, and here's the report I created, and you see it doesn't show any data tables being published; that's because of the choice I made to keep using the data that's already out there. All right.
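Scripted, the change Eric made amounts to one message sent to the open report. A minimal sketch, assuming fm holds a reference to the Fit Least Squares report opened from JMP Live (Material Supplier is the column named in the demo):

// Add a local data filter on the material supplier to the open report
fm << Local Data Filter( Add Filter( Columns( :Material Supplier ) ) );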
Well, if I go back to my JMP Live window here, you can see there's a little note that an updated version of this report is now available, because I published something new. It doesn't immediately update, because you might be in the middle of something with this report and don't want it jerked out from under you, so we let you be the one to decide when you're ready to reload it. When I reload it, there's the data filter that I added: I can select Advanced Materials and see the curves for Advanced Materials, then switch over to Cooper and get slightly different curves. All right, I'm happy with that part. Let's take a look at what else Chris did over here. Here's the yield report that Chris published. It looks fine; I can switch between the parts and see them. But one thing I might like to do here is create a control chart for this yield process. You can think of a yield process as being in control or out of control, and that might be a helpful way to display it. I'm not sure that's what we want to do, but I'm going to create it and then add it to the folder that Chris created, so we can look at the two and decide which one we like better, or maybe we'll like them both. To do that, I don't need to download this report, because I'm not going to do anything to his report; I'm going to make a new one. So I'm going to go over to the data for it and open that in JMP, give it permission, and there you go. This time we didn't need to create a project or anything in JMP; we just brought the data table down, and it looks like any other data table you would open in JMP. Now, in the interest of time, let's go to the home window. Here's my script right here: I made a script to create the control chart that I'm interested in, so I will just run that script. And there is the same information that Chris published, only this time in the form of a control chart. I've set some spec limits on here, so I can see that some of the parts look pretty good, while here's one where all the yields are below the lower spec limit. That looks like a process we might want to look at and try to improve the yield on. Okay, I like the way this looks; I think it's a good addition to our collaboration here, so I will publish this as well: Publish Reports to JMP Live. There's the yield control chart that I just created. I'm going to Publish New, and which space do I want to put it in? Here's the Eric and Chris space we've been collaborating in, and here's the Yields folder, so I want to put it right in there. Now let's think about the data for this one as well, because here again I've made a new analysis, but I haven't changed Chris's data, so I really don't want to republish the data with this report. I just want this report to use the data that Chris already published to JMP Live.
So to do that, I need to go to the Data Options tab. And for the Yield Five Star.jmp data table, instead of Publish New Data, I need to select Existing Data, and I will find a data source. And here it is, Yield Five Star. That's the one I want. I will save that, and now I can publish. There we are. Again, we have the folder and the report. No data, because we asked JMP not to publish the data. So we should be good there. I can go back to JMP Live. And I've got some new posts, and there's my control chart. So, let's see how that looks. Hopefully, there it is. All right. That looks pretty good. Now, Chris, I've made some updates: I've updated one of your reports and created a new report out there. But as I was creating them, I was wondering, we've got all the yields up to date to today, but tomorrow we're going to produce more widgets, and we're going to have a yield value in the day after that, and the day after that, on into the future. So I wonder if there's a way that we could allow all these reports that we've just created to automatically update periodically, maybe daily, when we have new yield data. Can we do anything like that? -Sure, I think we can. So in the past, even before JMP 17, we could update data from JMP. So in the past, if I'd had this request, I would have used a simple JSL script that would have opened the data table that Eric wants to update every day, would have updated the values from the database, then would have connected to JMP Live, and then updated the data on JMP Live with the new data table using the ID of the post on JMP Live. So that's good, except I still have to remember every morning when I come in, before I even get my coffee, that I need to push this up to JMP Live so Eric gets the new data. It's going to be hard to do if I'm on vacation, and I know I won't remember. So maybe in JMP Live there's a better way to do it. So here I am on the yields folder, and I can see Eric's new report that I like a lot, the control chart report, but I need to see if there's a way to update the data. So I'll go to the data table, and here I can see the two reports that use this data table. And Eric's asked if we can update this data every day. I see an Update Data button here on JMP Live. That might help. Let's try that. So I'll push the Update Data button, and now I can select the data table on my local machine and update JMP Live with that data table. Marginally better, I guess, because now at least I don't have to run the last line of the script, but I still have to update the data table with the data from the database, and then use this button to upload the new data to JMP Live. So I still have a lot of steps that I need to remember to do every day, and I probably won't. If we look under Settings, there are some new items here in JMP Live 17 to make this a lot easier. The first one we see is the refresh settings. That's exactly what I want to do. I want to refresh the data as often as every morning.
So Eric gets the new data, and the rest of the engineers as well. So I'll make this refreshable, and I'll go back to the reports. And now the Update Data button has changed to Refresh Data. And so I'll refresh the data. I get an error that says it can't be refreshed because the refresh script is empty. I'll go back to the settings, and sure enough, there's nothing in my refresh script. When we look at this screen, we see this source script. When we uploaded the JMP data table to JMP Live, we stored the source script off for that data table, and you see it here. In this case, this source script can be used to refresh the data directly. So all I really need to do is copy this source script out, add it to the refresh script, and then see if that will refresh. So I'll copy the source script, and then I'll paste it in the refresh script. So now we have the connection to the data source and the creation of the data table. I'll save that. And now I'll try to do another refresh. I'll refresh the data. This time it got queued for refresh. That's nice, but it failed to refresh. So if I look at the History tab, I can see the different things that have happened. The on-demand data refresh that I just ran looks like it failed. And if I look at the details, I can see that an unknown error occurred with the connection string. It may not seem that helpful, but fortunately, I know what's wrong. So I'll go back to the settings. We'll look at the refresh script for a second. If you look at the refresh script, you'll see there's a password and a user ID. Now, I could just paste my password in here and my user ID in here. I don't think that's overly secure. I don't think that's a good idea. So JMP Live provides this substitution parameter syntax for the user ID and password in a refresh script. So what I need to do is provide this user ID and password to the refresh script somehow, securely. So if I look at the Assigned Credentials tab over here, I can see there are no credentials assigned. I'll go in and assign a credential. To create a credential, it's pretty simple. You just store the credential name, and you provide the user ID and the password. That's all there is to it. You do it every day. But I've already set up a yield table credential here, with DBA Web JMP as the username and a secure password that's stored in a database. I'll assign that credential to the refresh script and then save that. So now what's going to happen is this refresh script is going to run. When it runs, it will request the assigned credentials, substitute in the user ID and password, and then run the refresh. So I'll go back to the reports, and I'll see if I can refresh the data now. I'll run Refresh Data. Now the refresh looks like it worked. I see my reports are automatically regenerating, and I can actually see that the thumbnail on the report was updated with the new yield data from the database. So now I'm a lot closer. I have a Refresh Data button here, and all I have to do is press it, and it'll refresh the data.
I don't have to update a data table. I don't have to upload anything to JMP Live. All I have to do is come in every morning and remember to press this button. It's at least more likely, but it's still not going to happen, I promise, especially when I'm at the beach. So there's one more pane that we haven't messed with yet, and that's the refresh schedules. The refresh schedules provide us a way to set up times to refresh this data automatically. Sounds like what I want. So I'm going to set up a refresh schedule that will run at seven o'clock every morning and update this data. I don't think I need to run it on Sundays or Saturdays, so I'll take those two out of the list, and I'll save that. So now this refresh schedule is in place, which means the refresh script will run five days a week at seven o'clock in the morning. So I'll go back to my reports, and you guys just sit and talk amongst yourselves while we wait for seven o'clock to show up. Probably not. So let's see if we can get a refresh to happen. We'll create another schedule here that runs every 5 minutes. And I'll set that to run. This is the most complicated part of the demo. I will set that to run at five... we'll run here in just a second. So let's see, I think it's going to be 27. Let's see if that's right. -And you can just put 32 or 33. There you go. -Yeah. So now I have a refresh schedule. It's going to run every 5 minutes, and I think I calculated it properly, so it should run in just a few seconds. JMP Live is going to tell me, yes, it's going to run in just a few seconds. We go back to the report tab, and we'll wait for about 15 seconds here for the refresh to run. Refresh schedules: you can have as many as you want for each data table. We won't run two at a time. So now the refresh is about to run. You see the automatic refresh has run. My reports are updated. Well, they're still updating. So both reports have updated now, just a few seconds ago, and if I actually go to the data, we also see it was updated a few seconds ago. If we look at the history, there was a scheduled data refresh that was run by the scheduler just a few seconds ago. So now we have a situation where the data automatically refreshes every morning, just like Eric wants, whether I'm on vacation, whether I remember or not. So I think this is pretty much what Eric wants. So, Eric, how does that look? -Chris, that really looks good. Let me click the new posts here, and I can see, I'm seeing the updated versions of the reports with the new data. So that sounds fantastic. -I think you may have shared the wrong screen, Eric. -Okay. Let's try that again. Let's stop sharing. Let's share. Here it is. Yep. Yeah, Chris, that looks really good. I'm looking at JMP Live here, and I see the two reports, and I can see that the data has been updated. Looks like our yield process is coming right along. The process that they put in place recently seems to be pushing yields up, at least for this particular part that we're seeing in the thumbnail. So that is really good.
Well,  now  that  we've  got  this  in  place and  it's  working,  there's  more  people than  just  the  two  of  us who  would  like  to  be  able  to  see  this. So   I  would  like  to  move  these  analyses  and  data  that  we've  created over  to  a  space  that  has  more people  allowed  to  see  it. Just  for  reference, if  I  go  to  the  permissions  of  this  space, the  Eric  and  Chris  space, the  only  two  people  who  can  see  content in  this  space  are  Chris  and  myself. So  we  want  to  rather  than  add  people to  this  space,  we  want  to  go  ahead  and  put this  in  a  space  that  other  people already  are  used  to  going  to. So  to  do  that,  I  can  switch  over  to  the files  view  of  this  particular  folder. And  I  can  see  it  looks  a  little  bit more  like  a  file  explorer  here. I've  got  the  three  reports  in  here, and  I've  got  the  two  data  tables. This  one  here,  it's  got  a  little  bit of  different  icon  to  indicate that  it  is  automatically  refreshing on  a  schedule. So  that's  nice  to  know. So  I'm  going  to  grab these  five  posts  here, and  then  I  will  go  to  move  up  here, and  we  will  move  those. I  need  to  pick  a  different  space. And  the  space  is  called discovery  America's  2022. And  there  is  a  folder  in  that space  called  five  star  line. That's  the  line  of  products we're  responsible  for. So  I'm  just  going  to  move those  over  there. All  right. They're  gone  from  here. That's  half  the  battle. Let's  go  back  to  my  space  directory  here and  flip  over  to  the  discovery  space, into  the  yields  folder and  the  five  star  line. And  there  are  our  reports,  right  there's the  report  that's  scheduled  update. So  in  this  space,  as  I  mentioned, this  space  has  all  the  people  that  are going  to  be  interested  in  this  yield  data, not  just  Chris  and  I,  but  the  other engineers  that  we  work  with. So  that's  one  approach to  putting  some  content   in JMP Live, massaging  it,  making  sure  you  like  it, and  then  share  it with  a  larger  group  of  people. So  that's  a  use  case  that  we  support  here. All  right. Well,  let  me  flip back  to  my  slides  here. If  you'd  like  to  view  the  content that  you  saw  us  create  here  today, there  it  is. The  place  we  published  it  to  on   JMP Live is  actually  viewable  by  anyone  who  can  log in  to  this   JMP Live  instance, dev live 17.  jmp.com.  There's  a  little shortened  link  to  it  down  here. And  hopefully  we'll  get  to  the  slide, so  you  can  just  click  on  it. But  as  long  as  you  have  a  SAS  Profile  ID, you  will  be  able  to  successfully log  into  Dev  Live  17. And  you  can  go  find  that  report, those  reports  we  just  published, and  you  can  watch  it   every  5  minutes, at  least  for  the  length  of  discovery, they'll  be  updating  so  you can  watch  that  yourself. All  right. W ith  that,  we  will  see  if  we  have any  questions  and  that  will  do  it. Thanks  for  joining  us  everybody.
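(The control chart script Eric runs earlier in this demo isn't shown on screen. As a rough sketch only, and assuming the yield table has a numeric column such as :Yield, a script of this general shape produces the same kind of chart with spec limits; the column name and limit values here are made up, not the demo's.)

    // Hypothetical sketch: put spec limits on the yield column, then build the chart.
    dt = Current Data Table();                         // the downloaded yield table
    Column( dt, "Yield" ) << Set Property(
        "Spec Limits",
        {LSL( 80 ), USL( 100 ), Show Limits( 1 )}      // illustrative limits
    );
    dt << Control Chart Builder( Variables( Y( :Yield ) ) );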
A picture is said to be worth a thousand words, and the visuals that can be created in JMP Graph Builder can be considered fine works of art in their ability to convey compelling information to the viewer. This journal presentation features how to build popular and captivating advanced graph views using JMP Graph Builder. Based on the popular Pictures from the Gallery journals, the Gallery 7 presentation highlights new views available in the latest version of JMP. It features several popular industry graph formats that you may not have known could be easily built within JMP. Views such as dumbbell charts, word clouds, cumulative sum charts, advanced box plots and more will be included to help breathe new life into your graphs and reports!     Welcome, everybody, to Pictures from the Gallery 7. My name is Scott Wise, I'm a Senior Systems Engineer on the US West Coast, and I'm joined today by my daughter Samantha. -Hey. -Hey. I wanted to ask you, as a brand-new incoming college student, what are you most concerned about for the future? -Well, to start, I'm pretty worried about negative effects on the environment like deforestation and soil depletion and climate change. Additionally, I'm worried about things like sexism in the workplace and the gender wage gap. -Wow, that's a lot to think about. It got me thinking as well about what we can do to make this world a better place. To start off our presentation, I've got three suggestions here. If we stay curious, part of what we can do with that curiosity is actually share good data with each other. You all like to analyze data, but sharing some good data would be a great idea, and JMP has a Data for Green initiative where, over on the JMP Community, you can actually share what you think are meaningful views and meaningful data collections, and we can analyze these things together. The second thing you can do is use your time. JMP and SAS have both partnered with the IIASA, which is trying to actively measure the amount of deforestation in the rainforest to help guide better policy. It's a cool application that lets you look at some of their satellite images and actually help identify where you see development and human growth in the rainforest, and enable them to make better measurements. Lastly, use your skills. We all have great JMP skills in practice, looking at analytics and building visualizations. Our friends at WildTrack, I think, are some of the best examples: they're using the footprints of many different species of endangered animals, and by doing a little bit of JMP and a little bit of visualization, they're able to help track these endangered animals in a non-invasive way to help us, again, create better policy. That's Sky and Zoe, and I'll definitely put a link in here where you can check out their work and get inspired. All right, so without further ado, here are the Pictures from the Gallery for this version. And in our version, I am going to dedicate every view to something around environmental green data, so we can start that conversation.
But as usual, I am showing you some things that are new in Graph Builder. The first five views I show you have been brand-new things in JMP 16, and I'm also going to show you just a couple of tips and tricks you probably have never seen before. All right, I've got the first chart here, and the first chart is going to address equality: it's on the gender wage gap. It's going to show you a new interval-type chart that's available in JMP 16 that I call the Dumbbell Chart. Now, I'm going to give you this journal, and why I'm pointing this out is that if you want to recreate this view, you not only have a picture of what it looks like and tips on how to set your data up to make this chart, but I give you the steps in order. Not only that, I give you the data, and within the data you can just click on the script to regenerate the view. It's all there for you. I'm going to build this one from scratch. What is this data? This comes from the International Labour Organization, and it is looking at the nominal mean earnings of males and females, normalized to US dollars. It would be nice to see if that gap is getting smaller, given Sammy's concern about the gender wage gap. I used to think you'd have to create a formula which actually took the delta, but to graph it, you do not. You just need to have both columns you're wanting to compare. I'm going to go to Graph Builder. Everything today is going to be in Graph Builder. And I'm going to take the female monthly and the male monthly. I'm going to put them both on the X-axis, but I'm also going to put them on the interval landing spots. And I'm going to take year and put year on the Y. Now it looks really busy, and that's because there are many countries represented here over this span. If I go under the red triangle (I like to call this a hot spot) and add a local data filter, we'll just look at it by one country. Let's pick out France. Now I get a pretty good view. It might be better to clean up the view a little bit. I can right-click right on the female monthly marker, and I can take this marker size up a little bit. I'm going to make it a 10. And now I can do the same thing with the male monthly. I'll make that a 10 as well. I can right-click right here in the graph and go to Customize. And I want to make that intersection bar a different color. It's the second error bar in that list, and I'm going to make it gray and maybe give it a bigger width of three. Now I've got the view I like. Now you can kind of tell why I call it a Dumbbell Chart, because if anybody likes to work out at the gym, you have weights, and the ends of the weights are where the heaviness is; in the middle, you have a bar to lift. That's why a lot of people call this a Dumbbell Chart. Now, a couple of cool things I can show you. Number one, we generally don't read from bottom up, we read from top down.
You  can  right- click  here in  your   Axis Settings, and  I  can  reverse  this  order  just  by clicking  on  this  little  box  right  here. Now  I'm  going  2010  to  2019. Also,  I  might  want  to  put  a  reference line  on  the  X-axis  to  help  my  eyes. So  I'm  going  to  right- click go  to   Axis Settings. And  I  think  about  3,500 would  make  a  good  little  reference  line. It's  going  to  put  one on  the  X- axis,  and  there  we  go. Now  I  can  kind  of  gauge, is  the  gap  closing,  is  the  wage  increasing for  both  sexes,  that  type  of  thing. Now  I'm  going  to  bring  a  lot of  pictures  in  as  examples  for  our  data, and  I'm  going  to  show  you  that  if  you just  take  a  picture  and  you  just  put  it into  your  graph,  it  will put  the  picture  as  a  background. Now,  it's  sized  horribly  here. It's  easy,  you  just  right- click   go  to  Image,  go  to  Size  and  Scale, and  save   Fill Graph. There  we  go. And  this  is  pretty  cool. Now,  I  know  the  female  symbol  here in  the  background  map  is  red, so  maybe  I'll  go  right  up  here to  my  legend  and  I  will  change the  colors  around  here. Pretty  easy  to  do. A lso  I'm  going  to right- click  back  into  the  graph and  in  my  image,  I'm  going  to  make that  background  a  little  more  transparent, maybe  like  a  0.3  here. Now  I've  got  a  really  cool  view. Now,  one  word  of  warning, it's  locked  into  this  scale , and  you're  going  to  get a  different  scale  for  each  country because  some  countries pay  more  than  others. I  know  Germany  pays  very  well. You  can  see  it  changed  my  picture, so  you'd  have  to  right- click and  go  to  Image  again  and  Fill the  Graph  to  pull  it  back  correctly. You  might  want to  move  your  reference  line. The  background  maps  are  not  great  if you're  going  to  change  your  scale  a  lot. I  do  have  a  version  here  that's a  multiple  view  version  without  a  picture and  if  you  right- click  on  this  one, you  can  see  here  I  was comparing  over  the  same  scale. I  fixed  the  scales but  made  them  a  little  bigger. What  was  the  difference in  France,  Germany  and  Sweden, and  you  can  see  that  in  France, the  wage  gap doesn't  look  so  bad  on  this  scale, but  Germany  has  a  bigger wage  gap  but  pays  higher. Maybe  you  want  to  be  in  Germany. And  I  noticed  in  Sweden that  the  females  make  more  than  the  males. There  you  go. There's  a  lot  of  differences  out  there. This  is  some  fun  data  to  play  with, so  definitely  see  what  views  you  like. All  right,  second  picture  we're going  to  look  at  is  a  Word  Cloud. This  was  the  second most  popular  thing  that  got  requested. And  you  might  have  seen  in  JMP, there  is  a  Text  Explorer  platform that  allows  you  to  look  at  unstructured text  data  that  you  might  have. And  Word  Cloud  was  one  of  the  views it gave  you  just  with  a  click  of  a  button. But  how  do  you  do  that  in   Graph Builder? Well,  let's  take  a  look. In   Graph Builder,  all  you  need is  the  unstructured  text. In  this  case,  I  have  a  column  of  words  and you  need  some  sort  of  counterweighting. Here  I  have  the  weight and  where  this  data  came  from. This  was  a  study  run  during  COVID of  what  are  the  top  five  things  teachers were  worried  about. Of  course  they  were  dealing  with  a  lo t. 
Remote  teaching, sick  students,  sick teachers, a  big  change  to  curriculums just  to  get  through  the  year. So  here,  the  highest  weight  was  anxious. Twenty  respondents  all mentioned  being  anxious. So what  I've  done  is  I  have  sorted the  words  by  weights and  then  I  just  put  a  order  here. So  the  highest  weight  got  an  order  of  one and  the  next  highest  got an  order  of  two  and  so  forth. That's  how  I  got  the  weight  column and  that's  how  I  got  the  order  column. And  I  also  created  some  random  data because  you  can  have  a  sorted, ordered  word  cloud,  but  you  can  also have  one  that  just  looks  like  a  cloud. To  generate  that  one... You  might  not  have  known  this, but  if  you  go  and  open  a  new  column  in  JMP with  this  initialized  data, you  can  put  in  random  data  and  you  can put  in  things  like  random  normal  data. Okay,  I  have  already  done  that. Let's  just  go  to   Graph Builder and  see  how  this  works. Well,  the  first  thing I'm  going  to  do,  I'm  going  to  put  weights. There  we  go  on  the  Y- axis, and  I  am  going  to  size  by  weight, but  I  don't  want  points. And  here's  a  little  trick  in  JMP  16, under  the  red  triangle, under  the  points  elements  panel  here that's on  your  bottom  left hand  side  of  the  Graph  Builder, I  could  set  a  shape  column. When  I  do  that,  I  can  substitute for  points  the  actual  word. And  now  you're  starting  to  see  the  words and  as  you  start  to move  around  your  graph, you  can  see  what's  going  on with  those  words  and  that  is  very  cool. This  right  here  is  your  first check  of  doing  a  word  cloud. Now  I  can  color,  by  the  way, to  give  this  thing  some  color. Now,  I'll  want  to  make it  as  cloud-like  as  possible. I  can  move  this  random  over  here. A gain,  playing around  with  the  data  set, I  can  get  the  view  that  I  like. Maybe  I'll  say  done  here, maybe  I'll  go  to  the  legend  position, maybe  I'll  put  it  on  the  inside  left. Maybe  I'll  go  under  the  legend  settings. Maybe  I'll  turn  off all  but  just  the  color  code . And  now  I  got a  pretty  nice- looking  word  cloud. Now  as  well, if  I  wanted  it  to  be  in  sorted  order, because  I  know  anxious is  the  most  important. That's  the  biggest- sized  word, I  love  that  to  kind  of  be  on  top and  then  the  next,  and  then  the  next. So  to  do  that  one,  I'll  open my  control  chart  panel  back  up. I  will  swap  out  the  order  for  the  random. Now  you  can  see all  the  big  words  are  on  the  bottom, so  I'm  going  to  right- click here  under  Axis  Settings, and  I'm  going  to  do that  reverse  order  again. N ow  all  the  big  stuff  is  up at  the  very  top  that  I  have. And  Jittering,  what  you  didn't  see  before was  you  were  getting a  centered  grid  jitter. And  that's  actually  what's automatically  in  your  points  jittering. If  I  do  a  positive  grid  now, I  get  things  in  order. Anxious,  constant  stress  and  tired, whatever  it  has  room  for  on  the  line. But  it  is  in  that  row  order, which  is  pretty  cool. But  it's  so  much  on  the left- hand  axis  that's  a  little  weird. So  I  can  right- click  here  on  the  X-axis. Even  though  there's  nothing  down  here, you  can  still  play  with  the  settings. And  I  can  go  and  maybe  make  this a  negative  0.5  for  the  minimum. 
It's  going  to  add  a  little bit  of  space  over  here. And  if  it  did  a  nice  job, did  a  really  nice  job. Now  you  can  get an  ordered  word  cloud. All  right. And  then,  of  course, the  ones  I  have  in  my  data  that  you  can play  with,  you  can  see  I  put  in  a  nice transparent  apple  background just  by  bringing  in  that  picture, which  is  really  nice  to  play  with the  colors  and  all  those  type  of  things. All  right,  that  was  a  nice popular  view  everybody  asked  for. The  third  most  popular view  was  Line  Charts. Everybody  likes  to  do  line  charts. In  JMP  16,  there's  many  new  features that  actually fit  lines  through  points in  a  lot  of  different  formats, as  well  as  label  your  line  interactively. And  we're  going to  look  at  tree  cover  loss. Remember  I  showed  you  that  link that  would  help  you  with  folks that  were  trying  to  save  the  rainforest. Well,  it's  important  to  know  how much  we're  losing  around  the  planet. We  have  some  of  that  data, the  under  three  here, moving  average  smoother  line  chart. I  bring  up  my  data  here. I've  got  tree  cover  loss  in  hectares. By  year,  this  should  be pretty  straightforward, so  I'll  just  go  to  my   Graph Builder. I  will  put  my  tree  cover  loss  in  hectares. I  will  put  my  year  down  here  on  the  X. And  you  can  see  I've  got points  and  smoother  lines. Not  so  exciting, maybe  take  drivers  and  overlay. Getting  more  interesting, but  I  don't  like  these  lines. What  other  options  do  I  have for  other  smoother  lines? Well,  in  JMP  16,  they  put things  like  moving  average. Maybe  a  moving  average  would  be  cool. You  can  control   the  spread  of that  mover  average  with  this  local  width. And  I'm  going  to  do  that  one. I'm  going  to  say  done. And  now  it's  looking  pretty  good. I'm  actually  going  to  open  it right  back  up, there's  one  thing  I  forgot  to  do. You  can  actually  put a  confidence  in  around  them. Now  I'll  say  done  just  fine,  but  I clicked  this  little  button  right  there. Very  cool. But  I  just  want  to  look at  kind  of  the  big  hitters. So  this  might  be  a  good  place to  go  under  the  red  triangle, go  to  the  local  data  filter. Go  ahead  under  drivers, and  just  take  the  top  three  drivers. There  we  go. This  is  a  good  chart,  I'm  very  close. But  one  thing  that this  legend' s  kind  of  hard  to  read. Wouldn't  it  be  nice  to  put  the  name  next to  the  line,  maybe  even  on  the  line? Oh,  that  would  be  awesome. Well,  you  can  do  it, you  might  not  know  this  is  the  place you  can  do  it,  but  if  you  just  right- click right  here  on  the  legend  where  it  says Agricultural  Shift,  you  can say  what  happens  to  the  label. You  can  add  minimum  values and  first  value,  last  value, but  just  go  click  Name  and  you  can  see... Oh,  look  at  that,  drew  it  right  in  there. I'm  going  to  do  the same  thing  with  Commodity, and  I'm  going  to  do the  same  thing  with  Forestry  Driven. Now  I  don't  need  that  legend. I  can  go  under  my  red triangle  versus   Graph Builder. I  can  turn  off  the  legend because  it's  not  adding  any  value. Now,  here's  what's  really  cool, you  don't  have  to  leave  it  out  here. 
You  can  move  it  anywhere  along  the  line if  you  get  close  to  the  line, it  will  try  to  take  the  slope of  the  line  as  its  orientation. I'll  do  this  for  Agricultural  Shift, I'll  put  that  one  there. Commodity, I'll put  one  there. And  now  I  can  move  out  the  axis. And  now  I've  got  a  really  cool  chart. By  the  way,  on  this  chart, Agricultural  Shift  was  the  big  haha. It  was  something  that  we  were definitely  having  a  huge  spike  in, but  I  think  those  efforts, of  our  friends  trying  to  save the  rainforest  have  managed to  pull  it  back  a  little  bit. A gain,  you  have that  version  scripted  in  your  data as  well  as  a  cool  little background  picture  in  the  background. What  else  do  we  have  here? Let's  go  to  the  bottom  of  our  chart. The  next  most  popular  chart  was  actually a  Point  Cumulative  Summary  Chart. Very  interesting,  it's  on  safety  data. It's  not  using  points, it's  using,  looks  like  a  value  of  years. That's  pretty  cool. This  data  came  from  the Bureau of Transportation Statistics, and  why  I  liked  this  data  was it  not  only  gave  us  a  index  of  crash  rate and  injury  rates,  and  this  is  all based  off  millions  of  miles  driven, but  for  each  year, like  in  1998  year, it  told  us  that  the  dual  front airbags  was  the  safety  innovation that  came  in  that  year. T his  would  be  cool  to  see  what's going  on  with  my  line  chart. I'm  going  to  go  graph,   Graph Builder. We'll  go  ahead  and  take both  crash  and  injury  rate, put  it  on  the  Y, put  year  on  the  X. I'm  going  to  turn  off  the  smoother line here  and  just  look  at  the  points. And  can  I  tell  any  difference between  crash  rate  and  injury  rate? I  really  can't. This  is  where  having  a  cumulative  summary would be  really  cool because  I  can  go under  the  summary  statistic, under  the  Points  element  and  just  change this  out  from  none  to  cumulative  summary. Now,  do  you  get  a  sense  of  the differences  in  the  slope  of  the  line? You  should,  because  the  summary  of  the crash  rate  definitely  has  a  steep  line, and  I  would  expect  this, there's  more  people on  the  Earth  now  driving  more  miles . But  you  can  see  that  it  looks  like the  injury  rate  has  less  steep  slope and  seems  to  be  flattening  out. And  maybe  that's because  of  these  innovations. Here  under  my  Axis  Settings for  the  X- axis,  I  can  put  in  like  in  1998, there  were  the  dual  airbags. And  see  if  that  might  be an  inflection  point, a  cause  of  cars that  are  now  protecting us  more  from  injury. That's  pretty  cool. The  other  cool  thing  to  do, you  could  as  well  under  your  points  red triangle,  set  the  shape  column  by  year. And  even  though  it's  continuous, it's  just  going  to  give  you  the  value in  this  case,  that  is  really  cool. So  now  I'm  seeing  it  by  year. Very  nice. Then  I  have  a  view  in  here where  I  have  gone  through and  added  a  whole  bunch of  the  safety  innovations  over  time and  put  a  nice  more  transparent airbag  background  because airbags  was  a  big  deal. But  you  can  see  when  things like  blind  spot  warnings  came  in, anti- lock  brake [inaudible 00:22:34]   technology. Really  cool  to  see  how  the  industry is  helping  to  save  us  from  injuries. All  right, [inaudible 00:22:45]   right  along. 
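(Going back to the tree cover loss line chart for a moment, here is a rough JSL sketch of its starting point, before the moving average smoother option, the confidence shading, and the on-line labels are set interactively. The column names are only guesses at how the demo table is laid out, so adjust them to your own data; the saved script from the journal is the authoritative version.)

    // Points plus a smoother, overlaid by driver; the moving average option is
    // then chosen in the Smoother element panel, as described above.
    Graph Builder(
        Variables(
            X( :Year ),
            Y( :Name( "Tree Cover Loss (ha)" ) ),      // assumed column name
            Overlay( :Driver )                         // assumed column name
        ),
        Elements( Points( X, Y, Legend( 1 ) ), Smoother( X, Y, Legend( 2 ) ) )
    );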
Our next-to-last chart, but still a very popular view, is Advanced Box Plots. There's a lot more you can do with box plots to integrate them with other elements like points and labels. And we're going to look at some climate city risk. This is some really fun data that I found looking out, projecting out to 2050, and it was coming up with this total climate change risk index on a 1-100 scale. It was looking at things like potential sea rise, shifts in temperature, shifts in climate, and something very important for a lot of us, especially us out on the West Coast, which is water stress or water scarcity. And that's how I came up with this total climate change risk score. If I want to see what that one looks like on a box plot, it would be pretty easy to just take my total climate change risk on the Y. Take my... Well, actually, I'm going to put it on the X. There we go. And I'm going to do it by region on the Y. Now that's going to allow me to then ask for some box plots, and here we go. It's not so interesting to me. I'm going to hold my Shift key down and add back in the points. Now, boring box plots don't have to be boring anymore, because now I have different box plot types and styles I can do. Under Style, I've got this Solid style. Now it colors it in, which is pretty cool. And as well, you can go and notch them, and you can go and add fences to them. By the way, it's got the Outlier option selected, and since on this data set (you can see I've turned the labels on) I've already sorted by total climate risk and just want to label the top 10 that are already on there, I can turn off this outlier option, and that takes any duplications out of there, and that's what we're looking at, which is pretty cool. Maybe change this color. By the way, if you cannot see your points in the box plot, sometimes if you put the box plot in last, it will have been moved forward and it's over top of your points. So then all you have to do is go into your points and move them forward, just by right-clicking into your graph, going to the right element, and bringing it forward. That is pretty cool. And you can see what's going on with where we have the cities running the highest climate change risk in 2050, and usually things coastal are at extreme risk of water shortage. Very cool. All right. And as well, I put a nice little background picture in the background on this one. It moved the legend in a little bit, so that's all in the instructions as well. All right, so we are at our last beautiful Pictures from the Gallery view. And that's a Wind Rose Chart. I was thinking that we're getting a lot of adverse weather given the changes in our climate, and so we're always trying to get better at predicting which way the winds are blowing, how strong, and where hurricanes are going: tornadoes, typhoons, all these types of things. There is a cool view, and this one is not limited to JMP 16. But it is a type of... the pie chart is actually a version of a Coxcomb chart that will make a compass rose.
If  I  can  get  it  labels  that  tell  it like  in  a compass,  what's  north  east? What's  north  west? Those  type  of  compass  directions. I  can  come  up  with  a  pretty  cool  pie  chart that  lets  me  segment that  chart  by,  in  this  case,  wind speed. So  let's  take  a  look  at  this  data. There  we  go. This  is  a day's  worth  of  data in  the  Great  Lakes  area. If  you  take  a  look  here  at  the  6th  row, you  can  see  that  I  not  only  have  latitude and  longitude  and  the  speed, where  it  starts,  how  strong  the  wind  was, what  direction  it  was  going, and  then  of  course,  I  can  get  that into  a  compass  direction like  west  southwest. With  that  I  should  be  able  to  go and  just  put  that  compass  direction down  on  the  X,  ask  for  a pie  chart, but not  only  any  kind  of  pie  chart, I  am  going  to  ask  for  the   Coxcomb chart and  I'm  going  to  take  the  wind  speed and  I'm  going  to  overlay it  by  the  wind  speed. You  can  play  with  these  colors, I  might  move  this  in  a  little  bit. I  might  make  the  really  fast  winds  red. And  now  I  can  see  that  they  were mainly  in  this  direction  on  the  compass. That  was  where  predominantly most  of  the  wind  was and  where  some of  the  darker  red  was  as  well. Very  cool. I  have  a  couple  of  versions  with  this, you  might  have  seen  that  I  brought  in kind  of  an  old  type  of  wind  direction  map where  they're  drawing  the  wind  vectors onto  a  map,  which  is  pretty  cool, and  if  you  want  to  see  how  to  do  that  one, I  put  this  in  the  instructions  as  well. And  if  I  open  the  control  panel  back  up, and  I  go  under  the  spread hot spot  where  points  are, and  I  go  to  Set  Shape E xpression. You  can  see  that there's  a  formula  behind  it. And  what  this  formula  is  doing is  it's  looking  at  each  point, which  is  plotted  by  the  latitude and  the  longitude  on  the  map. Then  it  is  taking  the  wind  speed, it  is  drawing  an  arrow, and  it's  drawing  a  bigger  arrow, of  course,  if  there  was  a  stronger  speed. That's  kind  of  cool, and  that's  what  draws  those  blue  lines that  you're  seeing  involved  right  there. All  right,  all  those are  in  your  instructions. I'm  just  about  out  of  time  here, I  will  show  you  that  I  have  put into  the  journal  that  I'm giving  you  where  to  learn  more. You  can  see  other  galleries, you  can  see  blogs  in  journals, other  presentations, and  even  great  tutorials. They're  all  from  the JMP C ommunity,  community.jmp.com. Those  are  there  for  you. Go  have  fun  with  your Pictures  from  the  Gallery  7. Go  try  to  recreate  these  views  on  your  own data  so  they  can  be  nice  and  compelling, and  do  use your  curiosity, time,  and  skills  to  help  save  the  planet.
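(Looking back at the dumbbell chart that opened this talk: the surest way to get its exact script is the red triangle's Save Script option after building the view, since the script JMP saves can differ in detail from any hand-written version. As a rough sketch only, using the demo's column names and leaving the interval bars and marker sizes to be set interactively, the launch might look like this.)

    // Two measures share one X axis (Position(1) puts the second variable on the
    // same axis), with Year on Y; the gray connecting bar is added via the
    // Interval role and the Customize dialog, as described in the talk.
    Graph Builder(
        Variables(
            X( :Female Monthly ),
            X( :Male Monthly, Position( 1 ) ),
            Y( :Year )
        ),
        Elements( Points( X( 1 ), X( 2 ), Y, Legend( 1 ) ) )
    );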
In past Discovery talks, we've shown how to acquire data, create a report and publish it to JMP Live using the desktop task scheduler. But what if you have JMP Live and your report is not changing from day to day? Only the data changes, and you want to share those updates with your colleagues. JMP Live now provides a capability to schedule the refresh of your data on JMP Live without having to publish your work again. This talk shows how to use this new capability and explains this new feature in the context of greater plans for the future of JMP and JMP Live.     Welcome  to  Discovery. My  name  is  Brian  Corcoran. I'm  a  JMP  development  manager. My  team  develop  JMP  Live, and  I'm  here  to  talk  about  automatic refresh  of  data  in  JMP  Live  17. If  you've  seen  any  of  my  previous Discovery  talks, you'll  know  that  I  frequently  talk  about data  acquisition  and  repeatability. In  fact,  in  a  previous  Discovery, I  demonstrated  how  to  take  a  variety of  scripts  for  data  acquisition, piece  them  together  to  produce  a dashboard,  and  published  it  to  JMP  Live. And  then  I  used  a  Windows  Task  Scheduler to  schedule  a  refresh of  that  report  every  day. It  was  useful,  but  a  lot  of  customers said, "Gee,  it's  overly  complicated, and  it  should  be  nice  if  this  just happened  on  the  JMP  Live  system without  my  intervention." So  that's  what  we're  going to  talk  about  today. Before  I  do  that,  though, I  think  it  would  be  worth  having an  overview  of  what  JMP  Live  is for  those  that  may  not  know  about  it. JMP  Live  is  a  separate  product  from  JMP. It  is  a  web- based  collaboration  site. JMP  desktop  users  publish  reports and  data  from  JMP  to  JMP  Live. JMP  Live  users  can  interact  with  their reports,  explore  the  data, and  do  some  further  filtering  of  data if  they  so  desire. Users  on  JMP  Live  can  interact with  that  site without  having  a  JMP license  or  JMP  on  their  desktop. If  they  want  to  and  they  have  JMP  desktop, they  can  download  those  reports  and  work on  them  in  the  JMP  desktop  session for  further  analysis  if  they  so  desire. In  JMP  17, we  worked  very  hard  to  make  replacement of  data  and  reports  easier. Because  in  JMP  Live  15  and  16, a  lot  of  this  had  to  be  done on  the  desktop. In  order  to  do  this, we  had  to  make  data  kind  of  a  first- class citizen  the  way  we  refer  to  it  here, or  an  equal  of  the  report. Before,  when  you  published a  report on  JMP  Live, the  data  went  along  for  the  ride   and  you  really  couldn't  see  it. Now,  if  you  want,  you  can  actually publish  the  data  separately. Any  reports  that  use  that  data will  be  refreshed  on  the  JMP  Live  server when  it  gets  new  data. We've  also  provided  a  JMP  Live  Data V iewer so  that  you  don't  have  to  ever  go back  to  the  desktop  if  you  don't  want  to to see  what's  in  the  data. All  of  this  provides   of  the  foundation for  our  refreshable  data. That  means  that  the  contents of  the  data  can  be  refreshed  on  the  server without  you  having  to  do  anything on  your  JMP  desktop  session. Any  reports  that  use that  data  will  be  refreshed. And  we  provided  a  repeatable  way with  a  task  scheduler  on  JMP  Live that's  hopefully  very  simple  to  use so  that  you  can  tell  JMP  Live  when to  refresh  the  data. 
Refreshable  data  typically  comes from   a  database that  you  access  through  ODBC or  a  website  with  a  rest  endpoint. And  the  examples  of  that  are the  World  Bank  or  the  Census  Bureau. JMP  Live  can  store  credentials  that  you need  to  access  that  data securely  within  the  JMP  Live  database. So  those  are  the  basics. Let's  go  ahead  and  actually  do  a  demo so  we  can  see  what  I'm  talking  about  here. Let's  go  ahead  and  bring  up  JMP. This  is  JMP  Pro,  but  all  of  this would  work  with  regular  JMP  as  well. I'm  going  to  open  a  data  set  that  ships with  the  product  called  Financial. All  this  is, is  Fortune  500  companies, names  have  been  withheld, but  with  sales  and  profitability and,  number  of  employees. Let's  go  ahead  and  make a  report  with  this. I  want  to  just  do  sales and  number  of  employees. And  then  I'm  going  to  apply  a  data  filter for  the  type  of  industry that  we're  talking  about  here. And  you'll  see  within  the  JMP  report that  I  can  kind  of  cycle  between the  industries  to  look  at  the  individuals and  see  what  their  profitability  is, number  of  employees,  things  like  that. Let's  go  ahead  and  just publish  this  to  JMP  Live. I'm  going  to  use  an  internal JMP  Live  server  we  have  here. I've  created  a  folder for  this  information  already. You  could  create  a  new folder  if  you  so  desired. I'm  going  to  go  ahead  and  publish  this. Let's  go  ahead  and  look  at  JMP  Live. And  I'm  just  going  to  use a  Chrome  session. And  we'll  refresh  this. And  I  should  have  a  new  report  here. There's  my  information. It's  going  to  load  that. There's a lot  going  on   in  this  particular  server. So  it  might  take  a  second, but  there  it  is. We'll  cycle  through the  various  industries, and  there  it  is. Let's  suppose  that  I've  decided... I  wish  I  had  put  a  fit  line and  mean  on  this. I  can  go  back  to  JMP and  go  ahead  and  do  that. What  I'm  going  to   demonstrate  here is  the  separation between  report  and  data. Let's  suppose  though,  that  I  also got  information,  in  the  meantime, that  company  number one  is  actually  a  $19  billion  company. But  I'm  waiting  on  verification  on  that, and  I  don't  want  that  outlier to  show  up  in  my  report. What  I  can  do  is  I  can  go  ahead and  publish  this  to  JMP  Live, and I'm  going  to  replace an  existing  report. And  I'm  going   to  select  my  financial  report. But  I'm  going  to  say   rather  than  updating  JMP  Live  data  table, I'm  going  to  use   the  existing  one  up  there. Let's  go  ahead  and  look  at  JMP  Live. It's  reloading  our  report. There's  our  fit  lines. But  we  haven't  gotten our  outlier  company  yet. Still  9.8  billion  in  sales. Let's  go  ahead and  close  this  window. Let's  shut  this  report  down. Let's  suppose  that  I  do  find  out that  my  data  is  accurate  for  drugs, drug  company  number  one, and  it's  a  $19  billion  company. I  can  just  select  to  publish just  the  data  now. And  I'm  going  to  update  the  data, and  I'm  going  to  go  ahead and  replace  my  financial  post. And  we'll  look  here and  it's  going  to  regenerate  with  my  new  data. And  sure  enough, I  have  my  outlier  company  here. 
If I so desired, I could go in and look at the data here and see that there's my $19 billion company as well. So there's our Data Viewer in JMP Live 17. Let's go ahead and shut this information down. That just shows you how you can selectively update reports and data. For my next part, I'm going to get into a refresh script, and I'm going to put in a shameless plug for a new facility called the OSI Pi Data Server Import. OSI Pi is something called a historian database. Historian databases collect information from, usually, a lot of industrial processes: machines, meters, things like that. And a lot of times it's real-time information collected in the database so you can analyze it later at your leisure. JMP 17 has a new facility to do that. So if I go into Database and Import from OSI Pi Server, I can connect to the Pi server within SAS. And we have some simulated data for a data center, like for cloud computing. And in there you're going to find a variety of servers and things like that. Here we're looking at server rack number one and how much power it is consuming. And we can look at that. I'm going to go back, and I'm going to collect three days of data at a certain point in time. And I'm going to say let's go up to 10,000 points. Let's go ahead and import that. So we got 6,300 points. And you'll notice something here called the source script. Let's take a look at that. The source script provides JMP Scripting Language to recreate the data fetch. And this is going to be important for us in our refresh script. It contains the information on how to connect to the server, what days we're sampling, how many points we want to get, things like that. Let's go ahead and create a simple control chart, a run chart, with this data. I'm going to go ahead and make this a label. Let's go ahead and put that information in here. And there is our chart. Let's go ahead and publish this. Publish it new. And you know what, I think I'm going to create a new folder for this, and I'm going to put it in there. We'll go ahead and publish that information. Now let's go up and take a look at that. There is our new report. I want to go ahead and look in my space where this report resides. There's our Pi data. So let's go ahead. Here's a Files tab, and you'll see that we have a report and a data table. So let's go ahead and look at that report. That's kind of what we expect. And if we hover over the points, there are our values and when they were collected. But now let's look at the table. We'll go into Settings here, and we'll see that that very source script we were just talking about got pushed up to the server along with our report. Now, there's something called a refresh script. The source script is kind of provided as a starting point for what you'll do with your refresh script. The refresh script is a script that's going to run to provide data for our update. So whatever you put in here is what's going to be updated for your report. Let's go ahead and make this refreshable.
And  let's  copy  our  source  script and  edit  our  refresh  script. I'm  going  to  paste  that  in  there. First,  we  got  to  make  sure this  has  an  ending  semicolon  in  it. But  this  has  the  information  to  connect  to the   Pi server  from  our  server  process. The  Pi  Server,  by  default  can  get  a  whole lot  of  data  tables  at  once because  you  might  have  like five  different  power  supplies. So  you  get  five  different  tables,  and  you can  return  that  as  a  data  table  list. That's  how  the   Pi server  works. So  we  have  to  know  that in  order  to  do  this  right, because  rule  number  one, and  the,  really,  only  rule  you  need to  remember  for  refresh  scripts  is it  needs  to  return  a  data  table  as its  last  item, last  thing  it  does. Let's  go  ahead  and  we're  going to  return  our  data  table  list  here from  our  Pi  call. And  we're  going  to  take  the  first  item  off of  that  list  because w e  really  only  have  one  data  table  for  our Atlanta  Data  Center  power  meter. So  we'll  take  the  first item  off  of  that  list. And  since  this  is  setting  up  a  data  table with  the  last  action  in  the  script, we  should  be  good  for  our  rule. The  last  thing  we're  going to  do  is  return  a  data  table. But  while  we're  at  it,  let's  go  ahead and  get  one  more  days  worth  of  data. We'll  get  four  days'  worth going  back ways. So  there  is  all  we  have  to  do, I  believe,  for  our  refresh  script. Let's  go  ahead  and  try  it  out manually  by  saying  refresh  data. It's  saying  refreshing  and says  updated  a  few  seconds  ago. Let's  go  ahead  and  go  back and  look  at  our  report. There  it  is. And  you'll  see  it's  a  bit  different  than what  we  had  on  the  desktop because  we  went  and  got  an  additional   day's  worth  of  data. So  you  can  see  it's  matching up  to  this  last  day. And  then  we  have  some  additional  data. So  there  is  a  refresh  script from  our   OSI Pi  source. So  that's  our  first  example. And  if we  wanted  to,  we  could  view  that. The  history  pane  contains information  on  how  well  things  went. You  can  see  I  asked  for an  on- demand  data  refresh  here. It  took  only  two  seconds. If  we  are going to  look  at  the  details, we'd  say  it  just  returned  a  data  table. This  isn't  going  to  return the  contents  of  the  JMP  log. JMP  is  running  in  the  background to  fetch  the  data  for  you  on  the  server. So  there's  your  first  example. Let's  try  another  one. I'm  going  to  clean  this  up. Another  common  way  that  we  get  data is  from  a  traditional  database like  Oracle  or   SQL Server. So  let  me  do  an  example in  which  we  use   SQL Server. I'm  going  to  go  ahead  and  I'm  going to  bring  up  Query  Builder and  open  a  new  connection  here. I  have  some  stock  data and  for  this  particular  case it's  for  Apple  Computer. But  let's  imagine  that  we  have  a  table full  of  stock  quotes  to  get  updated at  the  close  of  the  market,  like  at  04:15  PM  every  day. And  we  have  some  background  process updating  those. Let's  go  ahead,  and  we're  going  to  get our  quotes  that  we  have  right  now. I'm  going  to  build  a  very  simple  query, bring  in  all  our  data. This  is  going  to  provide  the  Apple  quotes from  the  beginning  of  the  year. I'm  going  to  select  Date  as  a  label. 
We'll  go  ahead  and  do   another  run  chart with quotes. We  can  go  and  hover  over  points, see  what  the  closing  price  was on  any  given  day. Stock  market  has  gone  down, but  Apple's  bouncing  back  up  recently. Let's  go  ahead and  change  the  name  of  this. Let's  go  ahead and  publish  that  to  JMP  Live  now. I'm  going  to  create a  folder  for  this  too. Maybe  we  have  a  whole  bunch of  stock  quotes  in  here. Let's  go  ahead  and  rename  this  too, and  we'll  go  ahead  and  publish  that. Now  let's  go  ahead and  we'll  just  shut  this  down. Now  let's  look  up  here, and  there  are  stock  quotes  as  we  expected, and  if  we  look  at  our  table, there  is  our  source script. Let's  go  ahead  and  take  a  look  at  that. Let's  make  our  data  refreshable  again  like we  did  with  our   Pi server. And  let's  copy  that  and  edit. You'll  see  here  that  our query  consists  of  a  connection  section. That's  how  we're  going  to  reach the  database, in  this  case  SQL Server, using  an  ODBC  driver. This  was  present  on  the  source  script on  the  desktop  as  well, but  the  password  has  been  replaced by  a  placeholder,  as  have  the  user  ID, both  for  security  reasons. And  because  the  credentials  on  the  server may  be  different  than  the  credentials you're  going  to  use  on  the  desktop. Let's  go  ahead  and  we're  going  to complete  our  refresh  script. We  need  an ending  semicolon  here. And  the  new  SQL query   just  returns  a  single  table and  we're  going  to  assign  that  to  a  data table  variable  in  our  script. And   as  an  insurance  mechanism, I  like  putting  an  empty  reference to  the  table at  the  very  end  of  our  refresh  script to  make  sure  the  last thing  we  do  is  return  a  data  table. So  what  about the  password  and  user  ID? Well,  we're  going  to  talk  about  that. Let's  save  our  script  out. We  can  assign  credentials and  we  go  down  here, and  I  have  some  I  created  before, but  let's  create  a  new  one just  to  show  you  how  you  would  do  this. It's  called  the   SQL Server  for  JMP T est . And  what  I'm  going  to  do  is I'm  just  going  to  put  the  credentials   for  the  server's  access  of  the  database. So  maybe  these  are administrative  credentials. Maybe  they're  test  credentials that  you  use  up  here. And  I'm  going  to  say, let's  go  ahead  and  assign  this  set of  credentials  for  our  refresh  script. What  will  happen  is  the  user  ID  will be  substituted  into  this  field within  our  refresh  script   where  you  see  UID and  the  password  will  be  replaced with  password. And  it  will  be  fetched  in  a  secure  manner from  an  encrypted  database only  when  that  fetch  needs  to  be  made. So  nobody  will  ever  see  your  credentials. Let's  go  ahead  and  try  to  refresh  this. We  had  a  failure. Let's  take  a  look  at  that and  see  what  happened. Here's  our  JMP  log. And  if  we  read  through  this, it's  a  real  long  explanation  to  say it  didn't  quite  understand the  connection  credentials. Let's  take  a  look  at  that  and  see what  might  be  going  on. So  this  is  really something  important   to  know  about when  you're  doing refresh  scripts  on  JMP Live. 
This is really something important to know about when you're doing refresh scripts on JMP Live. JMP Live runs a variety of JMP sessions to help you fetch data and recalculate analyses in the background. They're all hidden from you, but they run on Windows Server along with JMP Live. You must have the ODBC driver that Windows Server needs to access that database installed on the server. It turns out that the Windows ODBC driver has a different name than my Mac driver. Remember, I'm using a MacBook, and it was the Actual ODBC SQL Server driver. My SQL Server driver from Microsoft is just called SQL Server, so I need to modify that driver name here in my connection script. Let's go ahead and save that.

Now let's try refresh again. And it says we updated a few seconds ago. Okay, that's great, except it's the same data we already had, so it's not too exciting. So now let's simulate what would happen if the data got updated. We're going to create a refresh schedule. Normally we'd probably do this around this time of day, because the market will be closed and we can get new data. So I'm going to put in something like 4:18 PM, and I'm going to say we're not going to do this on Saturdays and Sundays, because the market is not open. Let's go ahead and save that. It'll start calculating when it's going to first run; it's going to run in a minute or two.

So let's go back to JMP, and I have a script that's going to update the stock market data behind the scenes, like what might normally happen from a data feed elsewhere. We're going to put a stock value of 200 in there for today's data. Let me make sure that ran in JMP. And now we're just going to wait on our JMP Live session; it's going to run in a few seconds. While we wait, I'll just mention that the scheduling allows you to set termination dates, if you want it to only run for, say, a month and then stop. You can also get finer granularity, so that it runs every five minutes, every hour, things like that. By default, it runs once a day, and like I showed you, you can select which days of the week it runs.

It says it's going to run again in a day. Let's see what our history says. Our scheduled data refresh succeeded. Let's go ahead and look at our report. Okay, and there is our value of 200. The stock really rocketed up; maybe it's time to sell. But the important thing to know is that our refresh schedule worked. From now on, now that we've set this up, it's going to run five days a week, and we'll automatically get our report regenerated, so that when we come in in the morning, we'll see our updated quotes and be able to make decisions from our reports and the latest information that we have.

So that's updatable, refreshable data in its simplest form in JMP Live 17. I hope you found this interesting. I really appreciate you attending this talk. Have a good day.
JMP is best known for allowing you to "touch" your data with interactive visualizations, dynamic linking, and graphical statistical outputs. However, many repeatable options are buried in the red triangle menu, within multiple layers of menu options, or are only available on the data table. There is hope, though! Using JSL, you can improve efficiency and create a personalized experience with custom toolbar items that let you stay in the analysis window and workflow. In this presentation, I review different scripts that add little wins to your analysis, such as selecting a column in the Graph Builder column list and having it be selected in the data table, removing outliers by selecting them and replacing them with missing values (versus having to hide and exclude), controlling the profiler desirability functions more efficiently, and more. Other examples include how to make your own tuning table for any analysis, how to quickly set spec limits based on fitted distributions and desired sigma levels for a large number of columns, how to automate running an MSA on hundreds of columns, and how to automatically identify columns that have subgroups and then run the appropriate control charts. My goal is that by the end of the presentation you will be more efficient, have a new way of thinking about how to modify JMP, and will dive into scripting.

Hello everyone. My name is Steve Hampton. I work at Precision Castparts. I'm the process control manager there, and I'm here today to talk about unleashing your productivity with JMP customization. I live with my family in Vancouver, Washington. I have been in castings my entire career, the last 15 years with PCC, which does investment castings, and I am a self-proclaimed stat nerd. I think this little post-it note pretty much describes a lot of the conversations I have with my wife, where she just gives me a very strange look as I try to explain why I am not watching TV and am instead playing around in JMP on a Saturday, because I have a tasty beer to go along with it.

When I'm not nerding out on stats, my other thoughts usually focus on work, if they're not on fun activities outside, and work is pretty cool. We make a lot of different products, but the one that I really like to show off is this six-foot-diameter, one-piece titanium casting. It's called an engine section stator. It goes on the Trent XWB engine, which goes on the A350 airplane, if you're keeping track. You can actually see it tucked right behind those main fan blades in this second picture, as the first thing the air sees before it enters the core of the engine. It's just a really cool industry to be in: aerospace and high-tech investment castings.

So why am I here? Well, I love JMP, I love talking to people who love JMP, and I love talking to people who love stats, so it's great to be around like-minded people. And I hate clicking buttons to get things accomplished. If you remember back in the day, there is a little-known Christmas movie called The Grinch, [inaudible 00:02:02], and he had a scene where he's just saying, "The noise, the noise, the noise," and I feel like that's how I am a lot of times when I'm in JMP.
It's the clicks, the clicks, the clicks; they just drive me insane. And I'm here because I like flexibility, and I think most people do, so I'd like to share some things that I've done to increase my flexibility with JMP.

An interesting note: when I put this presentation together, I realized this mindset actually started way back in the day. I remember loving my NES, but with the controller, I very quickly had thoughts like, "Well, why can't I have Jump be B and Run be A?" Because that worked better for me. And then it only got worse. The controllers got more buttons, which was great and added to my need for flexibility, but it took a long time before they started to let us customize things. Now they're pretty good. But before the consoles got good, I really found my benchmark for interfacing with a program in computer games, because not only did I have a keyboard with tons of buttons, so I could immediately cause an action to take place, but I could remap all of them. That's been my baseline when I compare everything electronic that I interact with.

There is hope, though, because we have the humble toolbar, or at least you think it's humble, and scripting, which everyone who's involved with it knows is incredibly powerful.

Real quick, our first efficiency power-up is the toolbar. It's like Mario getting a little mushroom: he goes from a small little guy to a bigger guy. The toolbar is pretty limited by default, but it's really easy to just turn things on, and you immediately have one-click access to a lot of the actions you'll probably do during data manipulation. I recommend you keep on Reports, Tables, Tools, Analyze, and Data Table, which I've marked up here in red boxes on the right. Another tip is that you can turn toolbars on for these windows independently, and you can move them around independently, which is really nice if you want a custom set for each window. But be careful: if you move things around too much, you'll lose some of your efficiency as your hunt for where an icon lives changes from one window to another.

Even better, you can make your own. If you go into Customize Toolbars and then New Toolbar, you can make your own. All those blue ones are ones that I've made; I think I've actually now made more toolbars for myself than come with JMP. So winner winner chicken dinner, I guess. And the black ones are ones that I've added some additional icons to as well. I think that combination gets you to raccoon Mario, which I thought was pretty neat stuff back in the day. I always wanted to be raccoon Mario.

Just a few other quick things before we get into JMP: you can link frequently used buttons to built-in commands. This works really well if you still want to be able to undo after you click something.
If you link to a script and run it, you can't undo it, which is a little bit annoying. But I usually put my script in a file and link it with Run JSL, rather than embedding it in the toolbar, because then I can change the file outside of the toolbar and update the functionality without having to dig back into the toolbar menu. You can use built-in icons. I use PowerPoint to make a shorthand image of what the icon does, save it as a bitmap, and upload it. It's not great; if anyone has a better way of doing it, I'd love to talk to you at some point. You can also assign shortcuts, hide built-in toolbar icons that you aren't really interested in, and add to your standard toolbars.

So scripting in JMP is super powerful, and when you combine it with the toolbar, you get a pretty legendary efficiency team; that's how it appears when I think about the combination.

So let's get into JMP. I have this data table here, and it's got a lot of columns. Normally, this would take a fair amount of cleaning up. The first thing I do is just check whether the column types and column counts are right. I can immediately see that these two highlighted columns are supposed to be continuous. You could normally right-click, go into column info, and change things up there. That's already bothering me, because that's too many clicks, and I can't change it to continuous here because it's character-based. So I made myself a little macro that I can just click, and it's done. No matter how many columns I select, it's done. It's basically a one-click Standardize Attributes, which is great.

You can also see over here that I have this column that's the date, and it's messed up. Once again, I don't want to right-click and dig into a submenu; I just want to click and go. So I've made myself a To Date function here, and now I have a date.

I also have this Batch column. It is technically continuous, but a lot of the ways I want to use it are more ordinal, and I want to have both. So I've made a script that just throws out another column called Batch Nom, and it's now a nominal column. The reason you might want this is that if I select these guys and go into a filter, maybe I want to be able to filter by dragging, but if I want to grab just a single batch or a couple of batches, it's a lot easier to do in the nominal state. Also, the way it shows up on some graphs can be better one way or the other.

The next thing we can see is that this date is an individual date for each day. A lot of times we roll things up by weeks. I could use the awesome built-in formula columns and get a year-week column, but that doesn't really mean a lot to most people, because it's not a date. What does "week five 2020" mean? So I have built a function where I take the date, and it returns the next Sunday after that date.
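A sketch of what such a week-ending formula column might look like in JSL, assuming the date column is named "Date" (the real macro shown in the talk may differ):

    // Sketch: bin daily dates by the Sunday that follows them.
    // Day Of Week() returns 1 for Sunday, so a Sunday rolls forward to the next Sunday.
    dt = Current Data Table();
    dt << New Column( "Week Ending",
        Numeric, Continuous,
        Format( "m/d/y", 12 ),
        Formula( :Date + In Days( 8 - Day Of Week( :Date ) ) )
    );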
And so now I have a week-ending column where I can bin things by weeks. It's really easy for people to understand, and it's continuous, whereas the other way of doing it makes it nominal. So there are a lot of advantages to me in that, and I have a lot of little things that help me clean up.

The last thing is that since this column came from categorical to numeric, it has some missing values in it. I know these missings are actually zeros, because if there isn't any data, it means there wasn't any defect. Since I do this a lot, I have a Recode Missing to Zeros and a Recode Zeros to Missing. So: recode missing to zeros, there we go. I haven't had to actually go in here, recode, and then do more; once again, that's already too much typing. For the data manipulation steps that you do a lot, adding in some scripting really can make you super effective at data cleanup. So don't think about scripting only for analyses that you run often; think about it in more micro steps to get some efficiency gains.

Next, I'm going to take us into Graph Builder, so let's bring this up. I spend a ton of time in Graph Builder because it's one of my favorite platforms. You really get a feel for your data, and it's easy to get people who maybe aren't as deep into stats to understand what's going on. So this is probably the main platform I live in. As I bring this up, you can immediately see that since Defect 1 is not in the right condition, the graph doesn't look great. But the nice thing is that I don't have to go to the data table. When I first started, I hated going back and forth between the analysis and the data table, or I'd put them side by side, but then everything gets crunched up.

The win here is that by learning about the report layer and being able to pull out the state of different reports, in this case the state of what is selected in this box, I can actually select it in the data table. So now that column is selected in the data table, I can use my Go To Continuous, and I'm back in business. I call this staying in the workflow. I learned that term from watching an on-demand webinar about formulas, where they talked about staying in the workflow in the sense of staying in JMP: don't go to Excel, do some formulas, and bring it back into JMP; learn to use formulas in JMP, because its formula editor is amazing, and you stay in the workflow. I'm saying stay in the workflow of your analysis window; that's where you want to live. I don't want to have to go back to the data table.

So I'm going to use a standard toolbar button to put a column switcher on, and we're going to get all of these, oh my goodness, all of these columns here. So we've got a column switcher, and I've also put in another script here where I can select the current column in the data table from my column switcher, which is great.
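Stepping back for a moment, the recode-missing-to-zeros macro mentioned above might look something like this minimal sketch, assuming it operates on whichever columns are selected in the data table (the companion macro would simply do the reverse):

    // Sketch: replace missing values with 0 in the selected columns.
    dt = Current Data Table();
    selCols = dt << Get Selected Columns;
    For( c = 1, c <= N Items( selCols ), c++,
        col = selCols[c];
        For( r = 1, r <= N Rows( dt ), r++,
            If( Is Missing( col[r] ), col[r] = 0 )
        );
    );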
That column switcher selection opened up another world of using a script that Jordan Hiller had helped me with when I was just starting down my scripting path, which we call "nuke 'em." It's a way of handling data like this, which is not good data; it's not from fully completed parts. I want to get rid of it, but I don't want to just hide and exclude. If I use my little remapped shortcut, Ctrl+Q, it's gone. That's what I wanted on this slide, but now I've lost all the information on that row. And I don't want to have to use the Row Editor, and I don't want to have to use subset with linking; that's all too many clicks. So with the right response showing in the column switcher, I can select these guys, run my nuke 'em script, and now those data points are removed. Very quickly, you can go through with the column switcher and nuke 'em and remove data that is either an outlier you know shouldn't be in the data, or that is causing problems, or that is actual bad data that should be out.

And I see a lot of bad data in the form of points that are out of place in the sequence of time. So this one's obvious, right? That's obviously bad data, at least obvious to me, so I'm just going to blow it out.

Here's an interesting one. You can see that product A has some really crazy points here, and these are all bad data. Another way you can look at that: I'm going to use this toolbar button, which is actually a standard function you can just enable in JMP, to redo the analysis. Now I have my new column; I can take this out, do a box plot, and say, okay, cool, here are the outliers. That's a way to blow things out. You can see I had a lot of them, but these points are not outliers. And really, I'm using "outliers" in place of "bad data," because bad data usually shows up as an outlier. But these ones did not [inaudible 00:14:40] show up as outliers in the box plot, and they are not bad data. So I'm going to nuke out all these other guys. You can see now I don't have anything on the low side flagged as an outlier, but I do know that I still have outliers, and they're outliers that I'm going to call out in time. These are so far away from the other data points that I know, from my experience and from looking at the [inaudible 00:15:10], that they are not real data points; they are data that we have jacked up. So I can go in here, select all these points that are bad data because of where they are in time, and get rid of them. And you would never see that from a standard outlier analysis. Now I have a very nice-looking curve, everything is cleaned up, and I was able to do that pretty darn fast. It's a really powerful tool.

If we go back along here, this is an interesting one. I can see that I have this outlier right here, and I'm going to nuke it. But you can also see that there is a shift, and unfortunately it isn't labeled as a trial in my data table.
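Here is a stripped-down sketch of the "nuke 'em" idea described above (not the actual script from the talk): set the chosen response column to missing for whatever rows are selected, rather than hiding and excluding whole rows. The response name "Defect 1" is an assumption; the real version would read the current response from the column switcher.

    // Sketch: blank out only the selected rows' values in one response column.
    dt = Current Data Table();
    respCol = Column( dt, "Defect 1" );        // assumed response name
    selRows = dt << Get Selected Rows;         // matrix of selected row numbers
    For( i = 1, i <= N Rows( selRows ), i++,
        respCol[selRows[i]] = .                // missing, so only this Y is dropped
    );
    dt << Clear Select;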
To mark that shift, I could use right-click row, Name Selection in Column, but there are still a lot of steps in there, so I'm just going to select, and I've made myself a binning column. When I click this, whatever was selected is now binned, so I can very easily see what's going on. I can even add in my text box and see the differences in the means. That's really useful. You can bin things as trial versus not trial; I use good versus bad a lot. So if my continuous data isn't great because of the measurement system, but it does an okay job of saying whether the part is good or bad, I can bin it with this and then do a pass/fail analysis, like a logistic analysis. So that's great.

I also really like dynamic selection. If I go back here, I'm going to take the binning off. Now I have this Selected column, which just changes to a one when I select the row. I can dynamically go through and select different things and see the mean, [inaudible 00:17:16] just real quickly. Okay, this grouping right here: its mean is 100, and above it, it's 288. It's really useful for poking at data. Say, right here, what's going on with this data? One, I can select it and see what the differences in means are. Two, I can see what the trend would have been like if this had not happened. So I can do a little bit of investigation.

I also use inverse selection a lot, which is buried in the Rows menu, so I just have a toolbar button here, and now I can invert it. Everything is basically the same except that now the bulk of the data is highlighted, which sometimes makes things easier. So that's great for analysis.

The other thing I have is that sometimes you might want to ask, based on what's selected here, what else is selected? I call this my selected-other-columns script. So for this little grouping that was different, what else shared the Equipment One level that this grouping used? When I click that, you can see that barely any of the rest of the B product used Equipment One, but a lot of item A did, and A is actually higher here. So if we wanted to avoid this higher level, maybe we need to look at using the same equipment that the rest of B is using. There are a lot of different ways to slice and dice and learn things.

The last thing is, I have two products here, but let's say I don't want to work with both, so I want to subset. I have these subsetting icons, because once again, I just want to do it in one click, and I do a lot of subsetting, so it makes sense. Now I have this new table. But what if I want the same graph built on it? I don't necessarily want to rebuild it from scratch, and there are other ways to copy and paste scripts over, but I do this often enough that I save the script to the clipboard, and then I can bring this new table up and run the script straight from my clipboard.
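Before moving on, here is a minimal sketch of the binning-by-selection idea from a moment ago, assuming a character "Bin" column is created on the fly and the labels are just placeholders:

    // Sketch: flag the currently selected rows in a "Bin" column (e.g., trial vs. baseline).
    dt = Current Data Table();
    binCol = dt << New Column( "Bin", Character, Nominal, Set Each Value( "baseline" ) );
    selRows = dt << Get Selected Rows;
    For( i = 1, i <= N Rows( selRows ), i++,
        binCol[selRows[i]] = "trial"
    );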
And hey, now I have a graph from that clipboard script, and it's all built up exactly the same way I had it before. So this is a really nice way to keep the efficiency you had with a previous table on a new table.

Now, you'll see here that when I close this, it pops up a window asking, "What do you want to do with your other open windows?" And then if I click through that, it [inaudible 00:20:24] says, "Hey, you didn't save this; what do you want to do with it?" A lot of times I have subset windows open just because I want to explore things, and all the clicking to close things drives me crazy. So I made myself a little "close everything around that table" script. If you're in a window, it will go close the base table, and it doesn't ask you anything. So I can do quick little explorations on little data sets, close them down, stay in the workflow, and go fast.

If I did want to save something, I made this little macro that saves it out with a generic name to a standard file location, so I don't have to think about where to save it or dive into a bunch of save menus. If I want to move it later, I can, but at least I know where all the main things I want to keep are. And if I do change something, say I change something from the graph, so I'm going to blow out all these points, and I want to save it, I can't just click Save, because that would try to save this window. So I found it really useful to have a Save Data Table button up here, so I can, once again, stay in the workflow of the analysis window and save my base data table. Once I'm done, I can close it and get out of there.

All right, that's everything I wanted to cover there. Let's move on to a real quick example for functional data. This will be super quick. For functional data, the one thing I use a bunch is this: if I have functional data with a timestamp, you can see that's not super useful if I'm trying to look at all my lots, because there's a big gap between the times. I could step through and see what each shape looks like, but that's not super fun. So I have this: I make a counter column, which just uses the cumulative sum function. I can say what I want to do it by, and I can add up to four items to subgroup the cumulative sum by. I'm just going to do pieces, because that's really the only thing that matters, and what I get out of that is a counter column, so that everything now shows up nicely on one graph. This is really good, but it only works well if the timestamps are pretty comparable. If the timestamps are all over the place, it breaks down, because it's assuming the timestamps are about the same, and then you have to get a little bit more creative.
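A sketch of that counter column, assuming the grouping column is named "Piece" (up to four by-variables could be added the same way):

    // Sketch: a running count that restarts within each piece, so traces with big
    // time gaps line up on one shared x-axis.
    dt = Current Data Table();
    dt << New Column( "Counter", Numeric, Continuous,
        Formula( Col Cumulative Sum( 1, :Piece ) )
    );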
Okay, so back to the presentation. We got through all these things, but what I really want to show as we head toward the end is that, for the ultimate in freedom and efficiency, you need to use scripts to expand JMP's functionality to fit your exact needs. There are a lot of wishes, and hopefully you're putting them into the wish list on the community, but there are a lot of simple ones you can actually take care of yourself. You can see a nuclear Godzilla up there, and we all know that a nuclear bomb plus Godzilla makes him king of the monsters. It's probably a little-known fact that JMP plus scripted functions makes you the king of data analysis. I've gotten a lot of value from the scripting index, the two JMP books listed here, and the user community, especially these two guys, to whom I owe massive amounts of beer as gratitude for the time they saved my bacon, and probably thousands of other people's as well.

So let's get into what we're going to do here. The first thing is, we'll go back to this table. If I'm doing more of an exploratory analysis, or trying to get an explanatory model rather than a predictive model, I'll use Partition without a validation column. This is nice because people who don't have JMP Pro can use it as well. And what I do is, well, we'll just put all this stuff in, that'll be fine, and click Okay. Now I like to split by LogWorth, so I can split by LogWorth, and it's showing the minimum LogWorth out of this tree. I'll just split until I get below two. Okay, there's two. Go back, and here's my model. R square is 44.9. Now, whenever counts get low, I do think I might be overfitting a little bit, which is why I like this minimum split size, so I can prune back. The default minimum split size is way too low, so I'm going to go with 15 and click Okay. So definitely fewer splits, R square is still not too bad, and we can see the main factors contributing to our defects, these top three. I really like using Assess Variable Importance, since it reorders what you're looking at so the main factors appear in the first boxes.

And I love optimization and desirability. Once again, you have to keep clicking into the red triangle menu to run it, so I came up with a little macro to control the profiler. I can come in here and say, all right, I want to first maximize, because it defaults to max, and I can remember the settings and call them "max." Then I can alter the desirability to minimize, maximize again, remember settings, and call that "min." I can copy and paste settings, set to a row, and link profilers, and it's modal. Or non-modal, I apologize. So it can just stay up and out of the way when I don't need it. It makes using the profiler, which is already super powerful, super efficient as well. That's one I really like, and I suggest you grab it from my page for this presentation when I post these.

Then the next thing is, I've got to go back here.
I'm going to do some neural net stuff, so I definitely want to make a validation column. I have these built-in splits that I like, so it automatically creates one for me. Now I have my validation column, and I also have a random uniform column in case I want to do any prediction screening; that helps with looking at cutoff points, but in this case we're just looking at neural nets.

Where I went from here is that I really like how Gen Reg has its model comparison, and I really like how Bootstrap Forest has a tuning table. When you're using a neural net, it can be very painful to feel like you're getting the right model, because at every step you have to change it, rerun it, and then look to see what's going on, and sometimes it just feels like you're spinning your wheels. Over time, I found some models that I really like, so I built this platform where I can hit Recall, here's everything, and I put the number of boosts and number of tours really low just so this runs faster. And I can go ahead and run it. What it's going to do now, for the models I've put into my tuning table (ideally, down the road, I'd like the tuning table to be a little more integrated into that first menu, but it's not there yet), is give me this nice Pareto showing my test, training, and validation results for the different models. So I can go through [inaudible 00:28:45] cool. Which one got me the closest, without having to run each of these individually? I can see that this TanH(10)Linear(10)Boosted(5) model, overall, has the highest average of all the R squares, and it looks like everything's pretty close. So let's start with that one.

The next thing I like to do is look at the min and max, to see whether it actually predicted in the range I was expecting. Let's see, what did we say? I said 10, 10 and 5 boosts. So 10, 10 and boost 5, there we go. I'll look at the min and max. It predicted 5 to 112. That's good, it didn't predict negative values. That's definitely something I look for in a model of defects or hours, because you're not supposed to go below zero on any of those. And the defects we had ranged from 1 to 51. So yeah, it did okay. It's predicting on the high side, so I might go in here and see whether anything else predicted on the lower side, or closer, and still had good test values. This is a really powerful tool, because then I can just go into my actual window here, go down here, and this is my model. I could save the model out; I could save this formula out, save just this one neural model to my data table, and use that from here on out. And it's already got my min and max built in here. Let's save it from there. I find this to be a very powerful improvement to the neural net platform, which I already think is pretty powerful.
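As a rough sketch of the validation-column macro idea from the start of this section (not the actual script), here is one way to build an approximately 60/20/20 split with JMP's usual 0/1/2 coding, plus a uniform column for prediction screening; the proportions and column names are assumptions:

    // Sketch: quick validation column (0 = training, 1 = validation, 2 = test).
    dt = Current Data Table();
    dt << New Column( "Validation", Numeric, Nominal,
        Set Each Value( Match( Random Integer( 1, 5 ), 1, 0, 2, 0, 3, 0, 4, 1, 5, 2 ) )
    );
    dt << New Column( "Random Uniform", Numeric, Continuous,
        Set Each Value( Random Uniform() )
    );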
And then, if you're just in standard JMP, the last thing I'll show is that I've started trying to give some additional functionality to standard JMP users. Here, you can set how many initial nodes you have, the number you want to step the nodes up by, how many loops you want to go through, your validation percent, and whether you want to assess importance, and then click Okay. What it does is run all your models and do the same thing, except that here I had a chance to work on improving the min and max reporting. So here I can see my min and max, which is what it was actually predicting, and then my training and validation results. Ideally you want them to be as close together and as high as possible, and then to predict well. So here I'm looking at TanH(8), which puts me here. That's pretty good, and that's probably the one I would go with. They're the closest, and it doesn't overpredict. Even though these other ones have higher training values, they're actually predicting negative values, and this one seems to be getting overly complex. So that's what I would go with. It's pretty useful for standard users to get more out of the neural net platform.

Finally, let's go quickly to some dimensional data examples. We have the dim data example for getting specs. The process at our plant is that we'll get a bunch of data and then calculate spec limits from it. Usually it's either a three or four sigma spec limit, so a Ppk of 1 or 1.33, and then we'll present that to the customer. That used to take a long time in the old days, when we would manually run an analysis, fit the best distribution, and write it down, or just use the normal distribution for everything and calculate it in Excel. You have an option in JMP to do process capability and change it to calculate limits off a multiplier, and that's great, because then you get your specs. The problem is that you have a lot of columns; even if you hit the broadcast button, you have to enter the multiplier for each one. So what I did, definitely with help from a bunch of other people, because this got above my pay grade in scripting very quickly, was make this macro where I can say what I want the sigma number to be, click Okay, and it goes through and spits this out for every distribution. Now I can right-click, go to Make Combined Data Table, and I have my data table. Then I can go here, select everything for the lower and upper spec limits, use my Subset button, and there: now I can submit that to the customer. Here are my upper and lower spec limits for all these dimensions. I did that in hopefully less than a minute, and it used to take someone half their day, if not more. So using scripting to improve what you want to do, and the functionality and flexibility it gives you, is great.
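A simplified sketch of that spec-limit idea: propose plus-or-minus k-sigma limits for every continuous column and collect them into one table to hand off. This version assumes normal-based limits and no missing values, whereas the macro in the talk works from the fitted distributions; the sigma multiplier and table names are illustrative.

    // Sketch: k-sigma spec-limit proposals for all continuous columns.
    dt = Current Data Table();
    k = 4;   // 4 sigma ~ Ppk 1.33; use 3 for Ppk 1.0
    specs = New Table( "Proposed Spec Limits",
        New Column( "Column", Character, Nominal ),
        New Column( "LSL", Numeric, Continuous ),
        New Column( "USL", Numeric, Continuous )
    );
    colNames = dt << Get Column Names( "String", Continuous );
    For( i = 1, i <= N Items( colNames ), i++,
        vals = Column( dt, colNames[i] ) << Get Values;
        m = Mean( vals );
        s = Std Dev( vals );
        specs << Add Rows( 1 );
        r = N Rows( specs );
        Column( specs, "Column" )[r] = colNames[i];
        Column( specs, "LSL" )[r] = m - k * s;
        Column( specs, "USL" )[r] = m + k * s;
    );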
Next is the dim data unstacked table, where is that? Dim data unstacked table. Coming into the home stretch, here we have a bunch of dimensional data organized by part. The thing is, some of it is subgrouped and some of it is [inaudible 00:34:57] data. By using my subgrouping macro, I can select all my Ys, say what I want to check, and it will then classify each as a subgroup or as an individual. That allows me to go in and use my Control Chart Builder macro, where I can say these are individuals, these are subgroups, and I'm going to subgroup by this. Click Okay, and it takes a little bit to run. I have one here, and it will actually put all the mixed control chart types in one window, which is really nice, because then I can make a combined table of all the control limits in one table, which you can't otherwise do; you'd have to go through a lot more steps of concatenating individual tables together. So that's great. You can do the same thing with Process Screening, where I can put the individual IMR columns here and the XBar columns here, and output one table that shows, for mixed subgrouping types, IMR and XBar, their Ppk, their out-of-spec rates, and their alarm rates all in one place. So it's nice to be able to keep everything together and still have separate windows open by subgrouping type.

And finally, the gauge R&R. A gauge R&R, especially on something like a CMM, where you can have a lot of codes to do [inaudible 00:36:44] on, can be a lot of work. So I made a macro. The first thing you've got to do to make this work really well is add in specs. I have this little script I made where I can select columns, and I can append columns if I forgot one. I load from a spec table, click Okay, and it saves the limits to the column properties. I can actually use this as non-modal, so I can keep it off to the side in case I want to change something, and then I can go in and run my selected-columns gauge R&R. We're not going to go too crazy; I'll just select these guys. It says, "Hey, you're going to run a gauge R&R on these. Are you okay with that?" Click Okay, we'll say part and operator, and go. It won't take too long. And why is this nice? Because you can see that if I go to connect the means, that connects really nicely, like you'd expect. If I were to pull up a traditional gauge R&R, there are gaps, because I don't have data for every hit number; the hit numbers for different codes are different, so I'm missing data. Those rows don't apply to this particular item, and it makes the charts get all messed up. But by using my macro, I can have a local data filter for each item, and when I select that local data filter, all the things I'm not using go away. Now the charts look great. That adds a lot to how those charts look, and all the data down below is the same.
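As a rough sketch of the idea behind the subgrouping macro (not the actual script), a script might check whether a candidate subgroup column has repeated levels and launch the matching chart type; ":Lot" and "Dim 1" are assumed names.

    // Sketch: pick XBar/R vs. IMR automatically based on repeated subgroup levels.
    dt = Current Data Table();
    Summarize( dt, levels = By( :Lot ) );          // unique subgroup levels
    If( N Items( levels ) < N Rows( dt ),
        Control Chart Builder( Variables( Subgroup( :Lot ), Y( :Name( "Dim 1" ) ) ) ),
        Control Chart Builder( Variables( Y( :Name( "Dim 1" ) ) ) )
    );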
Okay, that got us through everything, so let me move on to some final thoughts. I definitely encourage you to use the toolbar; consistent layout, icon use, and naming conventions are key to your effectiveness. And get into scripting. Here are some things I suggest you focus on, and definitely use the log, now that it records your steps for you; it saves you a lot of typing. And really think beyond what JMP currently does, and see whether you can add that functionality yourself. For the developers, I'd like to see commands kept as flat as possible, to get things out of submenus. As for me, I'm working on getting better at making icons, learning how to reference and pull data from the analysis window, which is called the report layer, and always including a recall button. So there are some statistical jokes for you, some of my favorites, and that's what I've got. Thank you very much for your time, and do we have any questions?