Control charts, like other statistical tools, can yield misleading results when they are improperly applied. We call attention to a troubling practice observed in the field: the inappropriate use of the Levey-Jennings chart (which uses the sample standard deviation statistic s to estimate the process sigma) beyond its intended purpose. This misuse guarantees that important process signals will be obscured. In this presentation, we review the history of the Levey-Jennings chart, describe how and why it is being misused, and provide suggestions for alternatives.

Hi, everyone, my name is Di Michelson. I'm an Instructor in the JMP Education group. My co-authors for this presentation are Jordan Hiller and Byron Wingerd, who are both JMP Systems Engineers. We're here today to warn you of the dangers in using JMP's Levey-Jennings control charts. We wanted to present at Discovery because we believe Levey-Jennings charts are being misused, resulting in missed signals. Today, I'll start with a quick control chart review, followed by a quick history, and that will lead into how we've seen Levey-Jennings charts being misused. We'll conclude with some recommendations for use of these charts and some final thoughts.

Now, every quality system has a bit of process control in it. These are the DMAIC steps from Six Sigma, but you might be using QbD or another quality system. Control charts are used at the end of an improvement process, when you've put the process on target and minimized the variability. The process is now stable, which means the probability distribution is not changing over time. Now it's time to monitor that process variability to verify that the distribution stays the same and is not changing in the future.

What is a control chart? A control chart is simply a run chart of the data with the addition of control limits. The center line is based on the historical mean of the process, and the width of the control limits is based on the historical variation of the process. If the process distribution does change, observations will eventually fall outside of the control limits, signaling to the process owner to take action to assign a cause to the out-of-control point and restore the process to a state of control.

Many types of control charts have been developed for different situations: XmR charts (individual and moving range charts) for data collected one point at a time; Xbar-R or Xbar-S charts for summarized data collected multiple points at a time; CUSUM or EWMA control charts to detect small shifts; and even model-driven multivariate control charts to control multiple correlated variables on one chart.

To find those all-important control limits, you have to answer this question: how do you estimate the historical process distribution? First, you have to determine if the process is stable enough for control charting, or does it need active process improvement? This is called Phase I. It's an active phase of data collection and sampling plan adjustment, limit calculation, maybe even some statistical modeling. This phase ends when you have collected enough data to determine that the process is stable. Then the SPC system shifts into Phase II.
The sampling plan and the control limits are fixed, and new data are judged by the fixed limits.

There are many considerations for Phase I data collection: the sampling plan, including rational sampling and rational subgrouping; the sample size, that is, how much data you need to collect to fix those limits; the type of data that you're collecting; and the type of control chart that you want to make. Rather than talking about all of these in detail here, I will refer you to our excellent, free, online, self-paced e-learning course called Statistical Process Control. It's available in the JMP Community.

You'll often hear that control chart limits are three-sigma limits. This means that we need to estimate the standard deviation of the process from our data, multiply by three, and then add and subtract from the estimate of the mean of the process. When Walter Shewhart developed the first control charts in the 1920s, he realized that, like other statistical procedures, he needed to test for signals using the noise. He needed to compare the between-subgroup variation against the within-subgroup variation. This led him directly to Xbar-R charts, where the average of the within-subgroup range is used as an estimate of sigma. Specifically, it led him away from the naive thinking that we should use Xbar plus or minus three times the sample standard deviation.

Don't estimate sigma using s for control charts. Let me say that again. Don't estimate sigma using the sample standard deviation. Oh, and here it is in red: don't use the sample standard deviation to estimate sigma. Why not? The purpose of control charts is to detect a signal from your process, to detect a change in the process distribution. Using the sample standard deviation aggregates all the data, including a potential signal, and this will inflate the estimate of sigma and you will miss signals. That takes us back to the beginning. If the purpose is to detect signals, why would you make a chart that makes you miss signals? Just don't do it. But the default method for estimating sigma in JMP's Levey-Jennings control chart is to use the sample standard deviation.
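To make the contrast concrete, here is the calculation in notation (our summary, not a slide from the talk; d2 and A2 are the standard control chart constants for subgroup size n). Shewhart's within-based limits for an Xbar chart are

$$
\hat{\sigma} = \frac{\bar{R}}{d_2}, \qquad
\bar{\bar{X}} \pm 3\,\frac{\hat{\sigma}}{\sqrt{n}} \;=\; \bar{\bar{X}} \pm A_2\,\bar{R},
\qquad A_2 = \frac{3}{d_2\sqrt{n}},
$$

whereas the naive limits warned against above are

$$
\bar{X} \pm 3s, \qquad s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{X}\right)^2},
$$

where s is computed from all N observations pooled together, potential signal and all.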
The Levey-Jennings chart was developed for a very specific situation in clinical chemistry, one where an analytical method is validated using a gage study with factors that are relevant to that process. The data are collected according to a specified experimental design. Here are some possible random and fixed factors in the experiment. The variance components for the random factors are calculated using a statistical model. Then the total variability is calculated from the variance components and used as the estimate of sigma for the Levey-Jennings control chart. This chart has a specific purpose in clinical chemistry. It can be used whenever you have an external estimate of sigma. How did we get here, where these charts are being used for situations in industries other than the one I just described?

The history of SPC started about 100 years ago, when Walter Shewhart developed the first control chart. He figured out how to use the within, or short-term, variation using subgroups with multiple items. But he couldn't figure out how to estimate short-term variation for subgroups of size 1. There is no within. In 1950, Levey and Jennings published their paper, and they did not use s to estimate sigma. Their paper introduced control charts to clinical chemistry and used Shewhart's Xbar-R chart with two replicates, so subgroup size 2. Now, just after that, in the early 1950s, Tippett discovered that you can use the average moving range of individual values as a short-term estimate of sigma. It's not perfectly a within estimator, but it's short-term enough that you're not aggregating over all the data. Very quickly, ASTM and Western Electric published books and standards using Tippett's XmR chart. But clinical chemistry didn't hear about these charts. In 1952, Henry and Segalove published a paper using s to estimate sigma, using subgroups of size 1.

Don Wheeler indicates that XmR charts were not used very often until about 1980. He introduced them to Deming, and Wheeler and Deming taught thousands of clients and popularized the XmR chart. It's really interesting to me that they weren't popular back then, because in my experience, which started in the early '90s, charts for subgroups of size 1 are much more popular than Xbar charts are. All right, just as the XmR charts were gaining popularity in manufacturing, James Westgard published his Westgard rules and wrote his popular book, Basic QC Practices, for clinical chemistry. He calls the control charts Levey-Jennings. They were added to JMP in version 5, in 2003, by customer request.

Now, ever since Jordan and Byron and I have been working at JMP, we've seen examples from all industries of the misuse of the default Levey-Jennings chart. But now you know better, and you will never use the sample standard deviation to estimate sigma for control charts ever again, right? All right, I'm going to turn it over to Jordan, and he'll show you in JMP the problems with misusing these charts.

Thanks, Di. We created a simulation in JMP to show how the Levey-Jennings chart, when misused, will definitely lead to missed signals. You can download this. It's a JMP add-in that's saved in the JMP Community with the other presentation materials. Here we show two control charts on the same data set: the individuals chart on top, and the Levey-Jennings chart on the bottom. Just a brief note, the individuals chart is usually displayed with the moving range chart underneath it. We're omitting that for visual simplicity here. I'll also mention that the Levey-Jennings chart is drawn in the way we're warning against. That is, these are not historical limits; these are limits calculated from the data in the chart itself. The 30 data points that we are graphing here are drawn randomly from a standard normal distribution. That is, the mean is zero, the standard deviation is one. Whenever I click this new data button over here, we'll get a new random sample. Because we know the true population parameters, we can tell how accurately these charts perform.
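The add-in itself is posted in the JMP Community; as a companion, here is a minimal Python sketch of the same comparison (our illustration, not the add-in's code, and the helper names are just for this example). It builds three-sigma limits two ways, from the average moving range divided by 1.128 for the individuals chart and from the sample standard deviation for the misused Levey-Jennings chart, and then estimates how often each chart alarms on stable data and on data with a shift after the 20th point.

```python
# Minimal sketch (not the JMP add-in) of the comparison described here:
# 30 points, sigma estimated two ways, optional shift after the 20th point.
import numpy as np

rng = np.random.default_rng(1)
D2 = 1.128  # bias-correction constant for moving ranges of size 2

def limits_individuals(x):
    """Individuals (XmR) limits: sigma-hat from the average moving range."""
    sigma_hat = np.mean(np.abs(np.diff(x))) / D2
    return x.mean() - 3 * sigma_hat, x.mean() + 3 * sigma_hat

def limits_levey_jennings(x):
    """Misused Levey-Jennings limits: sigma-hat is the sample standard deviation."""
    sigma_hat = x.std(ddof=1)
    return x.mean() - 3 * sigma_hat, x.mean() + 3 * sigma_hat

def any_alarm(x, limits_fn):
    lcl, ucl = limits_fn(x)
    return np.any((x < lcl) | (x > ucl))

def simulate(n_sims=5000, n=30, shift=0.0, shift_after=20):
    hits = {"XmR": 0, "LJ": 0}
    for _ in range(n_sims):
        x = rng.standard_normal(n)
        x[shift_after:] += shift          # abrupt shift in the mean
        hits["XmR"] += any_alarm(x, limits_individuals)
        hits["LJ"] += any_alarm(x, limits_levey_jennings)
    return {k: v / n_sims for k, v in hits.items()}

print("stable process  :", simulate(shift=0.0))   # false-alarm rates
print("3-sigma shift   :", simulate(shift=3.0))   # detection rates
print("large shift (10):", simulate(shift=10.0))  # LJ detection collapses
```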
The true population sigma is one, and we're going to hope that the sigma estimates from the two charts are close to that. The data is stable; it's from a normal distribution. Whenever we see a point outside the control limits, it's a false positive here. In a simulation of 5,000 data sets, the individuals chart has a false positive about 7% of the time, while the Levey-Jennings has one about 3.5% of the time. This small difference, well, it's double, but this difference in the false positive rates is, we think, one of the reasons that folks have tended to prefer the Levey-Jennings to the individuals chart sometimes.

But here's the problem. What happens when we have some signals in this data? We're going to talk about a couple of different kinds of signals. Let's start with a shift. A shift is an abrupt change in the mean of the process. Let's start by looking at a shift of three standard deviations, and I'm going to hit the new data button a couple of times. Remember, the purpose of a control chart is to distinguish the signal from the noise and to detect signals. As the signal gets larger, we should be more likely to detect it, and the sigma estimates should remain close to the true value of one. The individuals chart does exactly what we expect. It usually detects the shift, and the sigma estimate for the individuals chart stays close to one. The Levey-Jennings, not so much. It conflates the signal with the noise. The sigma estimate is inflated, and signals generally can't be detected.

How big does this shift have to be before the Levey-Jennings chart is going to detect it? Well, I'm just kidding. It's a trick question. The Levey-Jennings chart is never going to detect a shift in this scenario. I can jack it up to 50 standard deviations, and the larger the shift that we induce, the more that sigma estimate is inflated, and it will never detect the shift.

Let's see. We did some simulation work to show the differences between the performance of the individuals chart, that's in red, and the Levey-Jennings chart in blue. We're looking at a shift that occurs after the 20th data point in the series of 30 data points. That's what we're looking at in the demo I just showed. In that situation, a shift after the 20th data point, you can see the performance of both of these charts as the shift gets larger. The individuals chart does exactly what we hope it would do. As the shift becomes larger, it's easier to detect; the probability of having an alarm is greater. The Levey-Jennings performs terribly, as you saw, and paradoxically, as the shift gets larger, the probability of detecting it approaches zero.

Let's take a look at that here in the demo. We can change the shift location, right? The shift was after the 20th data point. But you'll see that as the shift approaches the end of the series, and we'll put it at 29, the last data point here, when the shift occurs at the last data point, both of the charts perform similarly. In other words, if you are running your Levey-Jennings chart after every additional data point, you'll probably detect most signals.
And this is to be expected. If the shift occurs later, there's less opportunity for that signal to contaminate the noise estimate, and the sigma estimate is going to be accurate. Here's what that looks like in the simulations. As that shift gets later and later in the series, the performance of the Levey-Jennings chart, in blue, approaches the performance of the individuals chart.

That's the story with shifts. It's even more alarming with drifts. I'm going to reduce the shift size to zero. As we introduce drift into the data: drift is a gradual linear trend, where the mean of the process moves up in a consistent way. That's drift. No matter how large the drift that I induce here, it will never be detected by the Levey-Jennings chart. It just can't be detected.

Okay. Next, let's consider a batched process. Here, we're going to simulate batches of size six, five batches here in the data. Some folks use the Levey-Jennings chart as a way to avoid having an alarm that's due to expected batch variation. Well, it's easy to see the problem with this approach. When we overlay a shift or a drift on top of that batch effect, the Levey-Jennings chart is still never going to detect it. I'll add a shift too. Yeah, the Levey-Jennings chart is insensitive, too insensitive, and the individuals chart is too sensitive. Neither of the charts is really great for this situation. Di is going to talk about this in a few moments.

Before I turn it back to Di, let me show you what a Levey-Jennings chart done right looks like. This is a demo that was generated by my colleague, Byron, using data that come from the Westgard website, and let's just talk about maybe this top chart here. We're showing a chart based on a data series. We have 28 data points. The mean is 198.75 in this sample. The standard deviation is 5.9 in this sample. The point is, we are in Phase II, and as such, we have set the control limits using a historical estimate of sigma. This protects us against all of the problems that we were just discussing. It's a stable estimate of sigma, and it will not be contaminated by any potential signals in the data set. Okay, that is all I have. Di, back to you.

All right. Thank you, Jordan, for that alarming demo. When Jordan and Byron and I thought that it was time that someone gave this talk, we asked our JMP friends if they were seeing what we were seeing from talking with customers, and that is misapplication of Levey-Jennings charts. What we found was that these charts were commonly used for specific situations. The first one is what Jordan was talking about, batch processing. It's called short-run SPC, where you might have batches that change frequently, and each batch has a new mean, maybe even a new standard deviation. Using the sample standard deviation to estimate sigma will include batch shifts as well as within-batch noise. Like you saw, you won't be able to detect shifts within a batch very easily. Instead, you can use a chart that plots the difference, or the standardized difference, from the batch target or the batch mean.
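For illustration, here is one common way to set up such a deviation-from-target chart, sketched in Python rather than JMP (the per-batch targets and sigmas below are made-up numbers, and the helper names are ours): standardize each observation against its own batch, then put ordinary individuals-chart limits on the standardized values.

```python
# Illustrative short-run "deviation from target" chart:
# standardize each observation against its batch target and sigma,
# then apply ordinary individuals (XmR) limits to the standardized values.
import numpy as np

def standardized_deviations(values, batch_ids, targets, sigmas):
    """values: observations; batch_ids: batch label per observation;
    targets/sigmas: per-batch target and historical sigma."""
    return np.array([(v - targets[b]) / sigmas[b]
                     for v, b in zip(values, batch_ids)])

def xmr_limits(z):
    sigma_hat = np.mean(np.abs(np.diff(z))) / 1.128  # average moving range
    return z.mean() - 3 * sigma_hat, z.mean() + 3 * sigma_hat

# Example: two batches with different targets; the chart is on z, not on values.
values    = np.array([10.2, 9.8, 10.1, 25.3, 24.9, 25.4])
batch_ids = ["A", "A", "A", "B", "B", "B"]
z = standardized_deviations(values, batch_ids,
                            targets={"A": 10.0, "B": 25.0},
                            sigmas={"A": 0.2, "B": 0.3})
print(z, xmr_limits(z))
```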
The second situation is autocorrelated processes, and this is where observations taken close together are more similar than observations taken further apart in time. This behavior is often seen when measurements are taken very frequently in a continuous process. Using the sample standard deviation, again, includes both noise and known process drift due to the autocorrelation in the estimate of sigma. There are a few ways to deal with autocorrelated data, including reducing the sampling frequency if you can, or using a different type of chart if you can't. Charts like CUSUM and EWMA can be adapted for autocorrelation, or you could try to model the autocorrelation and then use control charts on the residuals from the model. The residuals still contain information about process shifts, and they should be uncorrelated.
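As a rough illustration of that last option, here is a minimal Python sketch (our own, not from the talk): simulate an AR(1) series, estimate the lag-1 coefficient with a simple least-squares fit, and put individuals-chart limits on the residuals. A real application would use a proper time series platform, but the idea is the same.

```python
# Illustrative sketch: chart the residuals of a simple autocorrelation model.
import numpy as np

rng = np.random.default_rng(3)

# Simulate an autocorrelated AR(1) process: x_t = phi * x_{t-1} + noise
phi, n = 0.8, 200
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.standard_normal()

# Estimate phi by lag-1 least squares, then form residuals
phi_hat = np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)
residuals = x[1:] - phi_hat * x[:-1]   # should be roughly uncorrelated

# Ordinary XmR limits on the residuals
sigma_hat = np.mean(np.abs(np.diff(residuals))) / 1.128
lcl, ucl = residuals.mean() - 3 * sigma_hat, residuals.mean() + 3 * sigma_hat
print(round(phi_hat, 2), round(lcl, 2), round(ucl, 2))
```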
Finally, we've heard a lot of people say that using the sample standard deviation is useful because it gives wide limits that still catch huge shifts without alarming on small and moderate ones. In that case, we recommend that you just make up some limits and don't advertise that you're using control charts, because control charts don't ever use the sample standard deviation to estimate sigma. Remember that Westgard's wonderful book is called Basic QC Practices, not Basic SPC Practices. Feynman's wonderful quote is applicable here.

We've also heard some arguments about Levey-Jennings charts. "They are more forgiving than Shewhart's charts." Of course they are. I like this one: "The range charts were optimized for hand calculation, and we've got computers. Why aren't we calculating the standard deviation?" As we've seen, it's not range versus standard deviation; it's which standard deviation. You should always choose the within variability when you calculate three-sigma limits. Aggregate over noise, not signal or potential signal. "Why use an estimate of sigma when we can just calculate sigma?" Oh, this one hurts me. This comes from a terminology issue. We may say three-sigma limits, but we don't know sigma, and we have to estimate sigma from data. "This is the way. I inherited this system, and my boss says I have to do it this way." Well, can you find a new boss? Really, this is my Oprah moment. Maya Angelou said, "In the past, you did the best with what you had, and now that you know better, you will do better." I hope that you can educate others to do better as well.

Thank you for listening. I've got some references for you here at the end of the presentation. We'd like to leave you with two thoughts. First, don't use Levey-Jennings charts as they have been defined in the modern world. The purpose of a control chart is to detect process changes, and those changes are found by comparing signal to noise. Use of the sample standard deviation to estimate sigma inflates that noise, and it will obscure any signals. The second thought is that when you are in Phase I, use XmR charts, individual and moving range charts, instead of Levey-Jennings. When you move to Phase II for ongoing process control, fix those limits. A sign of a stable process is that the sigma estimate from the average moving range is similar to the sigma estimate from the sample standard deviation that the Levey-Jennings chart uses.
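A minimal sketch of that closing check, assuming the usual 1.128 constant for moving ranges of size 2: compute both sigma estimates and compare them. On a stable series the ratio is near one; a ratio well above one suggests that signals are inflating the sample standard deviation.

```python
# Quick check of the closing point: on a stable series the two sigma
# estimates roughly agree; a large ratio suggests signals inflating s.
import numpy as np

def sigma_ratio(x):
    sigma_mr = np.mean(np.abs(np.diff(x))) / 1.128   # short-term (moving range)
    sigma_s  = np.std(x, ddof=1)                     # aggregate (sample s)
    return sigma_s / sigma_mr

rng = np.random.default_rng(7)
stable  = rng.standard_normal(30)
shifted = np.concatenate([rng.standard_normal(20), rng.standard_normal(10) + 3])
print(round(sigma_ratio(stable), 2), round(sigma_ratio(shifted), 2))  # ~1 vs >1
```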
A leading semiconductor manufacturer has developed a novel multidisciplinary program for applied engineering statistics (AES) by incorporating applied problem-solving (6S DMAIC), JMP statistical thinking (STIPS), and core JMP curriculum course content, sponsored by JMP Education. The program culminates in the use of JMP 17 features in powerful, systematic, and practical analytical contexts. The 6S Black Belt curriculum has been redesigned end to end around both AES and JMP curriculum content across the DMAIC phases.

The first initiative involved transitioning from Minitab 19 to JMP 16, followed by mapping the JCSA, JGPH, JANR, JMSA, and JSPC modules to the DMAIC framework. These JMP core courses help facilitate DMAIC Black Belt project execution more effectively through AES thinking. Several JMP case studies are demonstrated, including item analysis (Attribute GRR), a text mining/data mining hybrid correlation, model-driven multivariate SPC, and group orthogonal supersaturated design.

To train internal instructors to teach Black Belt JMP modules, several advanced JMP curricula (JMP 17) have been created. In addition to internal Black Belt certification, the JMP STIPS Certification Exam was offered, as was the JMP DOE Certification Exam. JMP STIPS courses were made available to any internal employee. The program innovates on the traditional 8D project framework by adding JMP 17 tools to make 8D root cause analysis more objective and data-driven in resolving complex system-level quality challenges. The 2023 March Madness Forum Events, with 64 presenters, have drawn attention across the organization, recognizing the success of the JMP-oriented AES program.

All right, well, thank you everyone for joining us. I'm incredibly honored to be here for the JMP US Discovery Summit. I will be presenting the project titled Industry Innovation: Blending Modern JMP and Classical Six Sigma Applied Engineering Statistics. This is otherwise referred to as a Lean Six Sigma, JMP-based Black Belt program, which is a novel program really championed by Charles Chen, my co-author and co-presenter. Let's dive right into it.

First of all, what is the high-level roadmap for this program? It really starts with transforming a global quality culture and ends with connecting local JMP SMEs and Master Black Belts. The way that we've done it is through a segmentation of programs: A plus, A minus, and A. Really, we're focusing on the A plus program being the elite program, and we're going to get into that much more. But the outcome is really a focus on the entire global quality initiative. This program covers a 2-3 year span, and it's really based on a 2-3 year outcome. But the ultimate convergence is that, in this organization's use case, we've decided to use JMP for all the programs, including the Six Sigma program, and always with the intent to deliver it with the highest quality, in alignment with the global quality strategy. A key feature of this program is encouraging healthy competition through hosting JMP forum events, which we'll showcase a little bit.

What we've shown here is that we're using a Six Sigma tool called the SWOT assessment: strengths, weaknesses, opportunities, and threats, as you can see in this matrix.
The  things  I  want to  emphasize  here  are  that using  JMP  as  both  the  external  curriculum and  the  internal  curriculum is  fundamental  to  this  program . What  I  mean  by  that  is , which  I 'll  show  later , is  that  we 've  really  combined JMP  education 's  curriculum , which  they 've  provided to  the  customer  base , for  a  nondisclosure  agreement with  an  internal  Six  Sigma  curriculum . An  applied  statistics  lean  with  JMP , plus  a  rigorous  Internal  Six  Sigma applied  engineering  program . It 's  important  that  our  leaders focus  on  opportunities . This  methodology  really  focuses on  building  details , but  if  we  go  from  details up  to  the  nucleus , we 're  talking  about  leaders who  focus  on  opportunity . We  want  to  start  small , think  big ,  and  act  fast . This  is  one  of  our  main  tenets . On  the  opportunity  side , really  looking  at  how  we  can  synergize the  JMP  program ,  the  JMP  16 /17  curriculum with  the  BB  curriculum , and  create  an  optimum  recipe for  leadership  development ,  learning, and  deployment  across  the  organization . The  other  thing  I  want to  mention  here  is  that  this  slide highlights  a  migration  from  Minitab . While  it  might  indicate a  shorter -term  productivity  loss , it 's  a  huge  return  in  medium to  longer -term  productivity  gain . As  everybody  probably who  attends  Discovery  knows , JMP 's  interactive  graphing  capabilities and  multiplatform  interaction and  flexible  and  powerful statistical  modeling  capabilities with  a  scripting  engine and  JSL  really  make  JMP  far  more  powerful than  Minitab  in  today 's  analytics  era . This  is  a  key  milestone  for  us that  we  achieved  early  on . How  does  this  A ,  A  plus ,  A  minus  tiered  program  design  work ? Basically ,  we 're  using  the  three  levels to  effectively  segment  our  trainees , our  stakeholders ,  our  customers , if  you  will . This  segmentation  strategy was  developed  by  Dr .  Chen to  maximize  the  potential for  really  getting  the  knowledge  out  there through  the  organization in  a  practical  and  applied  way . What  you  can  see  is  most  of  the  users within  this  case  study  organization , over  5 ,000  are  tiered to  the  A  minus  program . We  wanted  to  highlight there 's  a  20 %  growth  rate  in  2023 . So  we  continue  to  expect  growth . We 're  seeing  a  lot  of  engagement from  the  trainee  stakeholders . The  A  program  really  focuses on  quality  engineers , quality  and  reliability . That 's  over  300  users . Then  really  the  cornerstone of  this  program , the   Master Black Belts and  the   Black Belts ,  highly  trained , highly  educated ,  many  PhDs , probably  more  than  50  people . These  people  are  really  being  funneled through  the  A  plus  program , which  is  really  a  mentor and  an  instructor  program . These  people  are  really  critical for  the  success  of  this  program  because they 're  the  local  site  champions and  they 're  not  just  given  knowledge , but  they 're  given  the  tools  to  be  able to  apply  knowledge  with  JMP  as  the  tool . This  is ,  in  my  mind , probably  the  most  important  slide of  this  presentation because  it  really  focuses  on  the  strategy and  vision  of  this  program , which  just  really  funnels through  all t he  details . From  the  top  level ,  this  goes through  all  the  details  of  the  program . 
I  want  to  emphasize  these  four  called  key  visions . Cross -functional  leadership and  vision  team  building through  a  process  that  actually was  codified  in  ASQ and  other  Six  Sigma  literature, forming,  storming , norming ,  and  performing, and  being  data -driven ,  truly  data -driven , not  just  data -driven  in  lip  service or  through  maybe  basic  tools  like  Excel . Then  having— really,  this  is  probably the  most  important  one— having  a  long- term  vision . These  are  the  visions . The  audience  can  look at  these  bullet  points . But  what  I  wanted  to  mention  is  that  many  people try  to  embrace  these  visions , but  they  really  lack  their  embodiment . If  this  will  resonate ,  I  think , with  many  viewers ,  is  that  oftentimes meetings  might  be  cross -functional , but  the  communication  style isn 't  necessarily  cross -functional . Many  teams , in  my  experience  in  industry  as  well, jumped  to  this  norming  phase of  this  team  building where  they  skip  the  forming and  the  storming  phase . Really,  this  tendency  to  jump  to  norming is  driven  by  the  fact  that everybody  assumes  that  they  already  know what  they  need  to  do, and  many  people  come from  a  highly  educated  PhD  background . Just  because  we  have these  highly  educated  people doesn 't  mean  we  have  a  strong  team . Bypassing  this  forming  and  storming  phase where  people  really  take  the  time to  work  through  what  their  roles and  responsibilities  are and  understand  the  expectations, this  shortcut  ends  up  actually giving  us  really  a  non -long- term  payoff . In  fact ,  it  really  hurts  teams . The  other  thing  is  I  mentioned this  briefly ,  but  I 'll  say  it  again, the  notion  of  being  data -driven is  often  very  misunderstood . Many  people  think  that  if  they  present data  in  Excel  that  that 's  good  enough . But  it 's  one  thing to  present  the  data  in  Excel , but  it 's  another  thing   to  actually  meaningfully  present  the  data in  Excel  or  any  other  tool . Oftentimes ,  when  people  assume they  already  know  what  they  don 't  know , then  this  reflects a  miss  of  important  details that  they  can  use to  solve  their  problem  better . The  final  thing  I  want  to  say , which  I  may  not  mention , but  it 's  peppered  through  the  rest of  the  presentation ,  is  that it 's  fundamental  that  we  recognize the  achievements  of  every  individual , especially  the  MBBDV  mentor  candidates , because  it 's  through  this  recognition  that we  cultivate  passion  in  the  candidate . The  program  no  longer  becomes an  obligation  or  an  assignment from  management ,  but  really  an  honor and  an  empowerment  mechanism for  these  individuals  to  become more  capable  themselves , investing  in  them,  and  also give  back  more  to  the  company . This  is  the  roadmap . There 's  a  lot  here ,  but  the  key elements  here  are , again ,  that  we 're  grouping  internal and  external  material . When  I  say  internal  material , I 'm  talking  about the  material  in  the  blue . This  is  well -vetted , well -thought -out  material that  comes  from  the  product of  years  of  engineering , applied  engineering  experience within  this  organization and  within  the  program  champion's knowledge  base from  previous  organizations , which  is  a  Six  Sigma   Black Belt  focus . 
Then  in  the  red ,  we  have  all the  JMP  education  training  materials which  have  been  acquired  through  NDA by  this  test  case  organization . You  can  see  how  strategically , through  these  A ,  B ,  C ,  D ,  E, and  scripting  language  main  modules , we 're  looking  at  disseminating and  developing  leaders  for  this  program . Obviously ,  for  any  questions about  this ,  please  reach  Dr .  Chen , my  co -author  presenter  here . I 'm  also  happy  to  take  questions . One  other  thing  is  that  it 's  about a  50 -50  split  between  the  internal and  external  curriculum , if  it 's  not  obvious  from  this  slide . I  just  want  to  make sure  that  that 's  clear . Then  what  I 'll  show  in  subsequent  slides is  you 'll  see  this  0 .5  nomenclature and  I 'll  explain  what  that  means  later . This  is  a  beautiful  slide . There 's  a  lovely  story  here , and  it 's  really  about  connecting  ideas . But  let  me  say  first  that we 're  going  to  basically  start  down  here at  the  foundation . The  A0 ,  A1 ,  A15 ,  come  up  to  the  A2 , which  is  the  nucleus , and  I 'm  going  to  mention  this  again, and  then  go  up  to  A3  and  A4 , and  then  go  to  B1  to  B4 , and  then  down  to  C1  to  C4 . It  starts  on  the  lower  left , and  then  it  goes  around ,  counterclockwise, down,  and  ends  back  at  the  lower  left . What  are  we  trying  to  communicate  here ? The  focus ,  really , as  the  title  of  this  talk  implies , is  on  applied  engineering  statistics . There 's  five  stages . There 's  foundation ,  A0 ,  A1 ,  and  A1 .5 . The  0 .5  was  identified  as  a  bridge to  help  bridge  the  knowledge  gap between  the  A1  and  the  A2  curriculum , which  we  identified . I 'll  just  say  that  here to  make  that  clear . The  A2 ,  which  really  focuses on  basic  statistics and  modeling  ANOVA  and  regression . That 's  up  here . Then  connecting  the  dots , we 're  going  to  the  DMAIC  curriculum , B1  to  B4 , which  really  focus  on  the  DOE  material , and  then  from  there, progressing  on  to  the  data  mining and  text  mining  material , which  is  really what  the  course  content  entails . The  progression  here , there 's  an  analogy  here  drawn between  this  training  curriculum and  kung  fu  or  Chinese  kung  fu. Let 's  talk  about  that . Kung  fu  really  embodies  this  idea  that excellence  and  mastery  in  any  endeavor require  persistent  effort , dedication ,  and  time . It 's  not  a  quick  fix . There's no you  learn  everything and  you 're  an  expert . That 's  the  idea  here . The  emphasis  here  is  that the  foundation  must  be  very  solid , so  skill  achieved  through  hard  work . We  go  through  the  A0 ,  A1 , the  foundational  study , we  go  through  the  1 .5 to  help  bridge  up  to  A2 . By  doing  the  A0 ,  A1  and  A 1 .5 , we 're  going  diverse . There  are  72  skills  in  Chinese  kung  fu that  have  to  be  learned . I  think  this  illustrates  that . We  go  diverse  in  order to  build  a  foundation , and  then  we  strengthen the  foundation  through  the  A2 , through  the  ANOVA  and  regression modeling  techniques . 
Then  from  here , we  get  to  the  A3  and  A4 , where  we 're  talking about  the   central limit theorem and  rational  subgrouping , which  are  really  fundamental , not  just  in  statistical  thinking , but  in  applied  engineering  thinking , because  data  is  part  of  engineering  now . What  we 're  showing  here  is  that there 's  basically  two  key  maybe  Chakras or  acupuncture  points ,  if  you  will , that  require  opening  up in this  martial  arts  tradition . Those  two  points  are  drawn  parallel to   central limit theorem and  rational  subgrouping . What  I  will  say  is  that  these  concepts , while  theoretically  many  people understand  practically ,  they  may  not . The   central limit theorem and  rational  subgrouping  are  actually closely  connected  to  a  process engineers  use  all  the  time , the  CPK  and  PPK  and  understanding  that is  a  great  launching  point . I  think  if  anyone  has  any  questions about  the  importance  of  those , we 'd  be  happy  to  discuss  that . Rational  subgrouping  is  so  important because  it  really  refers to  how  we 're  going  to  understand  within versus  between  variation  and  how  we 're  going  to  meaningfully sample  our  data  so  that we  can  extract  practical  insights  from  it . So  many  times  in  an  industry , I 've  seen  examples  where  sampling is  done  non -systematically and  in  a  way  that  produces meaningless  information . So  rational  sampling , rational  subgrouping  is  really  critical . All  this ,  up  to  this  point , [inaudible 00:17:21] the  product  of  developing a  very  strong  foundation , connecting  the  dots , and  only  then  do  we  really  move  over to  the  more  advanced DOE  and  regression  tools , which  is  the  B1  to  B4  program , and  not  just  learning  the  tools , but  becoming  effective  and  fast and  using  the  systematic  DMAIC  framework to  drive  them . After  the  stakeholder  works  through the  fruit  of  this  endeavor , through  project  work  and  coursework , they  can  become  a  master  and  they  can start  employing  more  advanced  tools . Really ,  the  analysis  effort becomes  more  of  a  creative  integration of  different  tools  and  techniques and  with  an  understanding of  their  limitations to  be  able  to  solve a  problem  holistically . There 's  a  lot  there ,  but  I  really  wanted  to  summarize  the  power  there . Now ,  here  this  slide , we 're  talking  really  about the  high -level  deployment of  the   Black Belt  training  curriculum . You  can  see  that  most  of  people ,  again , like  as  shown  in  the  previous  slide , are  trained  in  the  A0 ,  A1 ,  and  A2 for  the  foundation over  800 ,  over  600 ,  over  300 . Then  you 've  got  the  MBB, BB  mentors   who  are  required  to  do  the  project  work , starting  to  be  trained to  be  in  A3  and  A4 . These  programs  ultimately  need to  be  driven  by  local  leaders who  promote  and  empower  the  folks that  are  taking  these  courses  under  them . This  is  the  only  way  that the  program  becomes  truly  impactful in  the  organization , because  the  leaders  drive  the  change , and  through  their  knowledge and  experience , they  teach  people who  came  before  them . 
The  key  thing  here  is  the  participation and  application  in  projects  is  fundamental for  these  leaders because  only  through  applied  learning in  a  project  context can  one  truly  become  an  effective   applied  engineering  statistician . This  slide ,  in  essence , maps  the  JMP  tools  to  the  DMAIC  steps . For  each  DMAIC  step ,   we  pair  it  with  the  JMP  platform so  that  the  stakeholder  can  consider and  learn  to  solve  a  problem in  a  more  systematic  manner . That 's  really  the  power of  the  DMAIC  framework . Our  approach  is ,  I  would  say , more  difficult  in  the  beginning because  the  candidate   has  to  get  the  data  first . That 's  difficult  at  first , especially  for  somebody  who  isn 't  trained in  real  applied  engineering  data  analysis . But  it  ends  up  being  smoother  in  the  end because  once  they  have  the  DMAIC  tool , they  can  drive  the  project  to  completion . They 're  much  more  likely  to  drive  it to  completion  in  a  systematic  manner . One  thing  to  mention  here   is  that a  lot  of  people  in  today 's  engineering technology  environment want  to  get  certification  in   Black Belt . But  oftentimes ,   especially  in  high -tech  industries , the  engineering  function   isn 't  well  defined . With  a  paper  certification , it 's  probably  not  of  any  real  help to  the  trainee . They  won 't  really  learn  anything to  be  able  to  actually  make  an  impact   in  their  organization . The  goal  of  this  program  is  really to  build  impact ,  create  impact . In  this  slide ,  really  the  emphasis  here   is  that  the  A2  class , the  ANOVA  and  regression , as  I  talked  about  as  being   the  fundamental  and  the  bridge , it 's  important  for  the  global  vision , and  it  really ,  truly  is  a  global  vision . There 's  deployment  across  the  world . So  far  in  this  case  study , we 've  deployed  over  30  A2  instructors ranging  from  geographies   like  Germany ,  Israel , even  China ,  bridging  over  to  Japan . We 've  offered  20  of  the  A2 in  2023,  20  of  the  classes . The  first  120  trainees have  to  do  projects . The  remaining  trainees ,  over  180 , have  an  optional  project  component , but  they  can 't  continue   beyond  the  A2  program  as  a  result  of  that . I  talked  a  little  bit  about the  A  minus  or  the  0 .5 . and  so  this  slide  is  really   to  speak  to  that . What  are  the  objectives in  creating  this  A  minus  program ? It 's  really  to  bridge  the  gap between  the  original  A0  and  A4  modules   that  were  developed . The  project  component is  they  don 't  need  to  do . Of  course ,  as  I  mentioned ,  not  having   the  data  initially  makes  it  difficult for  the  training  to  get  started . This  covers  that  limitation ,  if  you  will ,   for  certain  people and  addresses  that  knowledge  gap   that  we  identified . Once  they  finish  this  training , they  can  go  more  deeper   into  advanced  subjects . Many  people  come  to  learn  JMP and  they  like  DMAIC   as  a  problem -solving  methodology . This  is  a  good  mechanism  to  catch  them   and  find  them  where  they  are . Actually ,  this  program   is  extremely  popular  right  now . It 's  well  overbooked . The  plan  was  to  book  only  12  people in  three  of  the  1 .5  trainings ,  for  example and  there 's  already over  23  or  over  25  people . 
Again ,  the  local  leader , the  local  MBB  leader  is  facilitating and  maintaining  and  following  up and  continuing  to  generate  interest . That 's  actually  really  fundamental to  this  program and  the  success  of  this  program . Also ,  there 's  some  feedback  from  people who  already  took the  more  advanced   program , who  are  now  taking  the  1 .5 , and  they  like  it  very  much . I  think  many  of  them  find  that   it  helps  them  connect  the  dots  more with  what  they 've  already   journeyed  through . This  slide  just  gives  you  a  feel   for  the  contents  of  this  program . You  can  see  that  obviously   there 's  a  Getting  Started  component . Graphical  analysis   and  outliers, box  plot  analysis is  really  fundamental  if  done  properly . Then  the  A2 ,  A3 ,  the  ANOVA,  regression , the  MSA  measurement  systems  analysis,   and  Gauge  R&R . Then ,  in  the  A4 , we  introduce  SPC  and  Multivariate  SPC . Then  in  the  C  program , we  get  into  the  advanced  data  mining , text  mining ,  PCA ,  multivariate  methods , all  through  the  JMP 's  platforms . Actually ,  this  type  of  a  program , the  goal  would  ultimately  be aspirationally  to  develop  it   at  the  supplier  level so  we  can  create  a  synergy  with  suppliers and  improve  quality  using  this  true   data -driven  approach  with  JMP . How  would  we  deliver   A  minus  training  classes ? Well ,  this  slide  gives  you   a  high -level  overview  of  how  this  happens . This  reflects  the  training  style . We  deliver  the  subject  in  five  steps . The  example  isn 't  critical  in  this  context but  we  use  Choice  Design . In  this  case ,  we  use  Choice  Design   to  conduct  an  Attribute  GRR which  was  also  featured  later . We  go  through  the  launch  window and  demonstrate  how  to  populate  it in  Choice  Design  in  the  DOE . We  conduct  the  analysis and  look  at  the  top -level  statistics  here , the  marginal  probability  and  utility . We  interpret  the  analysis by  looking  at  these  statistics and  the  probability  profiler . Then  based  on  that ,  we  take   appropriate  improvement  actions . This  is  also  more  about  interpretation then  we  take  appropriate improvement  actions . When  the  trainees  go  through  this  process , it  actually  really  helps  them  identify   gaps  in  their  learning  and  fill  them  in . This  is  where  the  applied  style   is  really  important , even  in  the  non -project   required  curriculum . This  is  an  expansion   of  this  A -minus  program and  emphasizes  a  really  thoughtful and  strategic  vision where the yellow… These  are  actually  modules   as  summarized  in  previous  slides in  a  similar  manner . The  yellow  are  the  modules  that  really we 're  in  the  process  of  developing . The  green  ones  have  already  been  developed by  the  program  champion , mostly ,  I  think ,  or  exclusively   by  Dr .  Chen . If  you  jump  to  the  E ,  it 's  really   about  reliability  and  marketing . I  mentioned  B  is  the  DOE ,   and  the  custom  DOE  really  is  the  emphasis . C  is  the  abbreviated  data  mining ,   and  the  A1 ,  A2  are  really  the  foundations . The  emphasis  here  is  that   after  each  MBB  gets  certified , it 's  really  not  the  end , it 's  really  the  beginning  for  them . There 's  an  emphasis  and  careful  thought  in  customizing  each  of  their  functions , developing  them . 
Some ,  for  example ,  may  go  into  DOE . Some  might  take  data  mining . Some  will  become  a  leader in  the  A1 .5  curriculum  and  so  on . The  goal  is  for  us  to  make  sure   that  all  these  leaders ultimately  become  not  only  certified ,   but  capable  of  training and  developing  other  local  leaders . In  the  paper  certification  anyway   by  the  Global  VP  of  Quality  and  the  CEO is  a  testament  that   we  really  follow  these  people in  their  development  path   after  certification . If  it 's  not  obvious  already , their  strength  is  JMP  here and using  JMP  to  solve  their  problems . This  is  a  fun  slide . This  is  me  and  my  daughter , just  over  a  year  old . But  what  we 're  showing  here  is  that probably  a  record  number   of  internal  people  in  this  organization took  the  STIPS  exam  at  the  same  time . There  was  100 %  passing  rate . These  are  a  median  score  of  915 , an  average  of  899 . Top  two  scores  for  these  two  individuals , very  close  to  100 %. There 's ,  of  course ,  me  and  my  daughter . Dr .  Chen  is  here  and  here 's  his  son . We  have  this  little  JMP  girl   and  JMP  boy  thing  going . It 's  fun . I 'll  credit  and  thank  Sarah  Springer for  her  really  wonderful  collaboration at  the  end  of  the  presentation . But  here  she  is  here . A  key  component  of  this  program ,   just  like  today , is  participating  in  JMP  Discovery  Summit . This  highlights   our  Discovery  Summit  achievements . The  fruits  of  real  work that  are  being  recognized by  Discovery  Committee  members . I  think  this  is  from  a  June  12th   Gloucester  Forum  event . The  two  key  event  ingredients  I  mentioned are  the  forum  events   and  the  Discovery  Summit  participation . This  is  the  June  12th Gloucester  JMP  event  here . We  have  asked  JMP  to  support JMP  17  new  features . We 'll  continue  to  seek  their  support during  an  upcoming  September  11   Hillsborough  event . We 're  engaging  with  basically all of  the  JMP  Discovery  Summit  events . US ,  Japan ,  China ,  Europe . Let  me  see . I  forgot  to  mention, this  is  Agatha  Debris .  This  is  Sarah . This  is  one  of  our  candidates , or  actually  leader  at  this  point . This  is  Don  McCormick ,  of  course ,  of  JMP . We  anticipate  quite  a  bit  of  engagement in  the  2024  Europe . Actually ,  there  are  many  reports   that  are  available . Half  of  the  80  that  have  been  developed   are  confidential , so  those  will  probably  be  off  the  table . But  we 're  definitely  expecting   some  good  likely  acceptance based  on  the  work  that 's  been  done . Here ,  we  see  the  Singapore  event , the  2022  September  9  Singapore Elite Eight  Tournament  event . This  just  highlights  our  competition . It  was  a  very  competitive and  successful  event . Talks  were  very  diverse . I  actually  presented  a  talk  here . You 'll  see  me  in  here   on  box  plot  statistics , which  was  high -quality  enough to  be  accepted  at   US  Discovery  Summit  last  year . I 'll  probably  link  that on  the  Discovery  page  for  this  project . But  a  key  aspect  of  having   these  forum  events  is  soliciting  feedback on  the  presentation  quality . That 's  where  JMP 's  participation comes  in  the  organization . 
We  seek  engagement  from  JMP  stakeholders who  can  review  the  material because  it  garners  enthusiasm and  engagement  for  those  presentations . It  makes  them  more  concise ,   more  effectively  delivered , and  it  moves  to  a  feedback -forward  model where  continuous  improvement is  obtained  through  continuous  feedback . There 's  this  mindset  of  technology  facing , moving  to  service  facing . This  collaboration  with  JMP and  this  partnership  with  JMP helping  us  to  review  these  presentations   as  part  of  this  effort , not  only  improves  the  presentation  quality for  the  purpose  of  submitting  at  Discovery but  it  improves  the  quality  of  the  work that 's  being  done   at  the  organizational  level . I 'm  going  to  briefly  cover   some  case  studies just  to  give  you  a  flavor   of  what  we 're  doing  on  the  ground . This  case  study  was  really  focused on  comparing  Excel  versus  Minitab versus  JMP  Gauge  R&R  analysis . It 's  about  how  do  we  manage   a  destructive  Gauge  R&R ? First  of  all ,  how  do  you  know   a  test  is  destructive ? On  the  left  here ,  we  basically  did ... In  the  measure  phase , we  did  a  rigorous  comparison and  showed  that  JMP  is  more  reliable for  the  decision -making  process . The  key  is  because  it  considers  the  ANOVA ,  the  analysis  of  variance with  an  interaction , and  that 's  really  the  most  comprehensive and  best  tool  out  there . On  the  right -hand  side ,  we 're  talking   about  that  destructive  test  methodology and  how  do  we  determine  that . There 's  really  three  approaches . We  can  assume  that  the  study   is  fully  crossed  with  no  degradation . We  can  assume ... When  I  say  degradation , I  mean  the  sample  doesn 't  change as  it 's  being  tested  repeatedly . We  can  assume  the  test  is  nested , so  there 's  some  degradation  behavior . We  have  a  third  choice . We  can  use  the  crossed  methodology . But  if  we  can  systematically  go  through a  decision -making  process  with  a  flowchart to  show  that  the  destructive  quality on  the  sample  is  minimal within  some  prescribed  limits , then  we  can  use  this  third  choice , and  this  flowchart  helps  manage   that  decision -making  process to  decide  how  we  want  to  approach the  destructive  method . I  forgot  to  highlight . This  presenter  became  BB -certified and  scored  925  in  the  STIPS  exam . This  project  owner . The  second  case  study  achieved  third  place in  the  annual  rankings   within  the  organization . The  emphasis  here  is  on  conducting , which  I  alluded  to  before  conducting an  Attribute  Gauge  R&R using  a  Choice  Design , which  is  really ,  in  my  mind ,  a  very   novel  application  for  Choice  Design . It 's  very  exciting . The  objective  overall  on  the  response was  to  reduce  Failure  Analysis cycle  time  reduction . This  presenter  presented in  2020  at  the  US  Discovery  Summit and  scored  an  impressive   925  on  their  STIPS  exam . Just  to  give  you  a  sense  here , the  question  we 're  trying  to  answer  is how  do  we  know  that  our  team can  make  a  consensus  decision   in  our  meetings  in  general ? Considering  this  Attribute  GRR   in  our  model  for  that is  what 's  been  done  here . 
Briefly ,  if  all  the  members  achieve a  response  probability  of  100 %, then  that  would  imply  that  the  team can  make  a  consensus  decision , as  you  can  see  these  numbers  here , these  response  probability  numbers . But  what  this  is  showing  is  that   a  few  of  the  respondents that  scored  higher  in  the  green and   at  the  end ,  actually ,  too , scored  lower . These  respondents  who  scored  higher really  dominated  the  meeting . They  were  the  most  talkative , the  most  experienced . These  lower -scoring  respondents   are  the  people  that either  they  weren 't  paying  attention or  they  weren 't  talking  intentionally , they  weren 't  engaging  in  the  meeting . This  is  a  data -driven  way  to  demonstrate this  team  is  not  ready  to  make a  consensus  decision  right  now . This  is  another  case  study . A  lovely  case  study  where  we  use text  mining  to  search  keywords . Actually ,  you 'll  see , part  number  is  part  of  the  word  cloud that  we  identified  in  Text  Explorer . The  novel  thing  here  is  we  saved   the  indicators  for  these  words . By  doing  this  indicator  saving , we  put  part  number  into  the  model . When  we  put  part  number  into  the  model , that  model  went  from  a  poor  model   to  actually  a  very  good  model . Part  number  became  the  strongest , the  most  effective  factor   in  driving  the  response  here , failure  analysis  cycle  time  response . It 's  a  very  powerful   and  elegant  application of  using  Text  Explorer  bread -and -butter to  go  into  modeling without  JMP  pro  actually . What 's  great  about  it   is  it 's  very  easy  to  understand how  to  work  through  that  workflow in  JMP . One  thing  that 's  super  powerful  about  this is  when  we  have  a  model   and  it  predicts  well , we  don 't  have  to  do  a  bunch of  other  root  cause  analysis that  we  did  up  to  this  point for  other ,  say ,  similar  part  numbers in  the  product  family where  the  part  number  performance   would  vary even  within  that  part  number   product  family . This  is  another  quite exhaustive  case  study . The  key  here  is  this  presenter  did  scored  very ,  very  well among  the  top  in  the  STIPS . They  passed  the  DOE  certification  exam and  they  also  presented   at  Discovery  Summit  in  2022 and  were  BB  certified . But  they  were  basically  measuring a  very  difficult  shape . They  had  to  modify  their  recipe in  order  to  perform  that  measurement . This  analysis ,  it 's  quite  small , but  in  effect ,  what  it  ended  up  showing is  that  they  were  able  to  achieve a  significant  performance in  their  precision- to- tolerance  ratio , so  there 's  a  P- to- TV  ratio ,   and  P- to- T  ratio . This  shape  is  very  complex . This  is  a  hallmark  in  high  tech . Another  example  here  shows that  there 's  non -uniformity in  dimension  on  the  basis  of  a  location . We  have  to  perhaps  look at  the  distribution of  the  average  difference from  location  to  location , from  center  to  edge , rather  than  just  trying  to  take an  overall  aggregate  measurement to  capture  the  entire  surface . That 's  what  some   of  this  analysis  highlights . Here 's  another  really  lovely  case  study . This  one  used  group  orthogonal super-saturated  design to  block  the  first  failure  mode   from  the  second  failure  mode in  this  problem . 
It  was  a  two -step  process  optimization using  this  GO -SSD . The  project  owner  had  to  figure  out how  to  manage   seven  variables  in  this  context . It  took  her  over  two  weeks  to  figure  that  out . Basically  what  we 're  showing is  the  design  process  through  different [inaudible 00:42:20] she might have considered. Then  on  the  right,  we 're  showing some  Monte  Carlo  simulation  in  JMP , which  JMP  is  very  user- friendly  at  doing . Then  essentially  validating the  optimal  process  settings where  if  we  looked  at  the  simulation were  the  means   within  the  confidence  interval and  the  prediction   interval  range  of  the  model , and  that  was  the  tool  for  validation , graphical  visual  interactive  tool   for  validation . Then  the  presenter  also  went  into  some  SVC to  identify  any  process  issues . The  key  message  here  is  that  if  the  design  is  good , doesn 't  mean  the  process  is  stable . You  can  see  this  process  is  drifting . That 's  why  the  SVC  component is  so  critical   in  the  trainees'  learning  path . Here 's  another  really  lovely case  study  here . The  emphasis  is  using  SIPOC with  process  improvement  flowcharts   to  define  the  scope  of  the  project . The  interesting  thing  about  this  project is  the  project  scope  really  pertained   to  the  process  itself , so  we 're  in  effect  modeling  a  process . The  approach  was  creating  a  model and  then  refining  it and  working  on  cycle  time  reduction and  improving  the  model 's   predictive  capabilities  for  cycle  time . This  emphasizes  the  importance of  developing  a  robust  model , so  that  process  automation  predictions can  be  robust . On  the  right  here , we 're  looking  at  leverage  plots . This  is  quite  interesting because  we  see  data  points  that  are  off the  residual  by  predicted  plot  here . I  drew  some  lines  in  here . They  actually  reveal  a  collinearity  issue . This  pattern  reveals  a  collinearity  issue which  is  consistent  with  the  high  BIS that  we  see   in  the  parameter  estimates  table . This  reveals  a  combination    of  potential  things : hardware  constraint ,  phase  constraint , a  data  acquisition  problem through  the  absence  of  a  cyclic  pattern, and  that  we  can  see  in  some  time  series augmented  on  a  studentized  residual  plot . There 's  the  opportunity  here  to  go into  a  more  detailed  time  series  analysis to  understand  why  is  there this  time  shift  problem ? Why  is  there  this   collinearity  problem  in  time or  in  space ,  for  example , for  angular  depth  or  wafer  location ? Autocorrelation  is  fundamental and  this  project  highlights  that , whether  it  be  multicollinearity in  space  or  autocorrelation  in  time . This  slide  just  highlights  yet  another  wonderful  forum  event , 2023  March  Madness . It  was  extremely  close and  very  competitive and  the  top  eight  people   were  invited  to  the  Singapore  forum  event . Here  is  one  more  case  study . In  this  case  study, we 're  using  item  analysis to  focus  on  difficulty  and  discrimination , to  basically   strategically  assign  exam  questions , JMP  STIPS  questions  to  either a  Green  Belt  or  a   Black Belt  exam . By  carefully  looking  at  discrimination and  difficulty  in  the  context of  this  item  analysis  framework , we  can  effectively  categorize  the  questions . 
You can see on the right how we did that. We selected four questions that showcase our rigorous selection criteria; I'll show that in the next slide. What you can see here is that discrimination shows up as the steepness of the curve: if you see a steep curve like this, you can say this is a very discriminating question. Difficulty refers to the translation of the curve, and these colors indicate different categories. This is very important: we used this flowchart. We can dichotomize difficulty as easy or hard, and discrimination as yes or no, in terms of the curve's location and steepness. The colors show the different categories, and this goes into specific examples of which questions fall into each category.

We can go a step further by looking at the patterns within each assigned category. Think of this as group one: the group one questions were easy with no discrimination, where basically everybody got them right. The group two questions were easy with discrimination; most people got them right. Group three were hard with no discrimination; basically everybody got them wrong. Group four were hard with discrimination; most people got them wrong. You can really see the power of this methodology for segmenting and classifying questions, and for giving students a meaningful, challenging exam that develops them in the right way. These cell plots add another layer of analytic immediacy in JMP, where we can see the proportion of respondents answering each question correctly versus incorrectly. It's a very nice companion to the item analysis curves. For reference, blue is incorrect and red is correct, which is a little flipped from what you might expect.

Just a couple more quick case studies and I'll wrap up. In this case study, we used ANOVA and regression to improve supplier quality control. This is one of our most prolific student mentors: he got a good DOE certification exam score, was Black Belt certified, and scored over 900 on the STIPS exam. I think the novelty in this project is really using a regression algorithm (we'll jump over to the right) to handle outlier problems, both in the sense of the regression itself and outlier problems in general. There's a lot of thinking here about different types of regression and how those different regression methodologies help us capture outliers on the low end and the high end. One thing that's really important to think about as a practitioner is whether the low end or the high end matters more, and why. From an engineering sense, in this case the low end was most important, so the fitting approach had to be tailored to the low end. We're talking about understanding types of outliers, and not only the types but where they come from and why they originate.

Just a couple more here. This one is quite nice because it's a VSM-focused project.
It's very unique in its implementation of the VSM approach, because for most people who do VSM it's a documentation exercise; it's quite qualitative. But in this case the candidate, who is also going through the Black Belt certification, went into all sorts of detailed calculations and mapping. As a result, they determined they didn't even need to do a Gage R&R, because the VSM was done; the entire quality metric in this case was VSM-based. This is really showcased here because it was the most successful Lean Six Sigma project to date. This project is, I think, still being extended to use SPC to validate process stability after building the process up like this.

One final case study, more in the flavor of SPC, uses Control Chart Builder, JMP's super powerful and flexible control charting platform, to verify both process stability and capability. You can see there's a certain risk of the process being either not capable or not stable enough; you can see this process floating on the upper spec limit. The idea, I believe, is to collect more data to characterize the process better and to define the spec ranges to better match the process behavior. We're practically looking at that on this chart, in addition to looking at the control limits driven by the SPC methodology.

Here, this showcases the stakeholder, the trainee, and their slide. These folks are really promoting JMP at the local level, and they're thinking creatively about how to promote JMP more themselves, which is fantastic. Here is an upcoming event that Dr. Chen is calling a mid-autumn festival. You can see there's a lot of great thought going into what's coming and who's involved, and it's really a group effort, with these top-level MBBs working with the program champion to drive the initiative.

This slide emphasizes that becoming a top five performer in this healthy competition framework is really not easy. All these people are really close right now, but the requirements are very multidisciplinary: participation in the forum events, participation in instruction and mentorship, attending Discovery Summit conferences, preparing curriculum, and getting the certification. It even includes promoting the internal initiatives by writing articles and distributing them through the organization, assisting in forum events, and finding additional trainees, in the vein of the marketing methodology and training material that will be coming as part of the program itself. This healthy competition really demonstrates the program's reach and impact, because these people are working not for themselves but for the entire organization. I realize I've gone over a little, but I'll wrap up very soon. I just want to emphasize that this initiative is getting internal recognition company-wide.
The  takeaway  here  is  that  the  internal education  system  is  now  going  to  host a  lot  of  the  material that 's  been  developed  here . So  the  material  won 't  be  free  anymore as  it  currently  is  technically   in  the  cost  structure  of  the  organization . The  money  that 's  charged will  support  instructors   for  their  trips  and  their  training . The  money  collected  by  this  internal education  system  framework will  be  controlled  by  the  program  champion , and  that  program  champion will  assign  it  to  instructors based  on  their  participation and  involvement . So  there  can  be  a  transfer  of  money through  corporate  cost  center  entities . I  think  many  people  know  that when  there 's  money  lined  up  formally within  the  organizational  structure , people  really  take  it  seriously . I 'm  very  excited  about  this . I  know  Dr .  Chen is  super  excited  about  this because  there 's  a  potential  for  a  huge  amount  of  money inside  the  organization  behind  this , and  that  means  an  increasing amount  of  influence . Also,  I  wanted  to  say  that  JMP  accounts , Sarah  Springer  has  been  just  instrumental in  working  with  Charles  on  this and  in  making  sure  that  this  whole  process is  working  effectively when  considering  the  JMP  component  of  the  curriculum . I 'm  just  about  done   and  I  just  want  to  present  this  last  slide and  do  a  little  bit   of  a  quick  acknowledgment . The  key  here  is  moving  from  this  antiquated   Black Belt  program that  other  organizations  have  used to  this  really  JMP -centered , multidisciplinary  applied  engineering statistics  program that  really  emphasizes gradually  empowering  leaders   purposefully  and  sincerely , generating  core  values , proliferating  those  core  values and  using  them  to  really  drive the  program  and  grow  it . The  most  exciting  thing  for  me  as  a  JMP  employee and  a  former  industry  practitioner   for  15  years  is  synergizing with  industry  and  synergizing  with  JMP to  maximize  the  impact of  both  the  JMP  internal  materials  and  the  company  internal  materials . With  that ,  I  think  I 've  mentioned  Sarah many  times . We  really  appreciate  what  she 's  done . Peter  Hersch  and  Don  McCormack  have  been  great in  terms  of  deploying  trainings for  the  new  features  on  JMP  17 . Then  there 's  a  number  of  people from  the  different  sites  that  we 've  just called  out  by  name  here ,  JMP  Europe , Agatha  has  been  very  instrumental . Of  course  the  case  study  presenters who  we 've  referred  to  anonymously  here , but  hopefully,  you  can  see that  they 've  done  some  amazing  work . Here 's  the  JMP  girl  and  JMP  boy  again . My  lovely  daughter   and  Charles 's  son  Mason  here several  months  ago  now . With  that ,  thank  you  so  much  for  your  time . It 's  been  a  pleasure . Any  questions ,  please  reach  out  to  us . Thank  you .
Autonomous vehicles, or self-driving cars, no longer live only in science fiction. Engineers and scientists are making them a reality. However, their reliability concerns, or more importantly, safety concerns, have been crucial to their commercial success. Can we trust autonomous vehicles? Do we have the information to make this decision?

In this talk, we investigate the reliability of autonomous vehicles (AVs) produced by four leading manufacturers by analyzing the publicly available data that have been submitted to the California DMV AV testing program. We assess the quality of the data, evaluate the amount of information contained in the data, analyze the data in various ways, and eventually attempt to draw some conclusions from what we have learned in the process. We show how we utilized various tools in JMP in this study, including processing the raw data, establishing assumptions and limitations of the data, fitting different reliability models, and finally, selecting appropriate models to draw conclusions. The limitations of the data include both quality and quantity. As such, our results might be far from conclusive, but we can still gain important insights with proper statistical methodologies.

Hello, my name is Caleb King. I'm a senior research statistician developer in the Design of Experiments and Reliability group at JMP Statistical Discovery. It's quite a mouthful. As you probably guessed from the title of my talk, I'm going to be talking about autonomous vehicles, and specifically how we can properly assess the reliability of autonomous vehicles using some of the reliability platforms in JMP, so that maybe one day we'll be like this fine gentleman here, relaxing in our autonomous vehicles as they take us wherever we want to go, without being scared that we won't end up there. Let's get into that.

Of course, autonomous vehicles are fast becoming a reality. We have a lot of companies now involved in extensive testing of these vehicles. Some of that testing early on is basic testing of the software and testing in simulated environments, but ultimately, in order to have a very reliable vehicle, you need to test it out on the roads. To help with that, a couple of years back the California Department of Motor Vehicles put together a testing program that allows these companies to opt in and test their vehicles on the state's roads. As part of that arrangement, every year these companies have to provide a detailed report on any disengagements, and heaven forbid, crashes, involving their autonomous vehicles to the DMV. Because the DMV is a government agency, these reports are publicly available upon request. In fact, I'll pull up a link here real quick to the California DMV website. In this case, you can access reports from the last two years; if you want more, you just send an email to this link, and they're pretty quick about getting back to you. Already a lot of people are using this data in academic research.
As a quick example, I'll click on this link to a paper on a project I was personally involved in with my former advisor and one of his new PhD students, in this case analyzing the data as what's called recurrent events data. I'll talk a little more about that later. But again, there's already research looking at how we can use this data to assess the reliability of autonomous vehicles, and maybe use it to introduce new methods and so forth. It's already starting to be widely used.

For this particular talk, I'll be looking at disengagement reports and using those as a way to measure reliability, because of course we can't access the actual software; that's all proprietary to the company. But through these disengagement events, we can infer the reliability of the vehicles for a particular company. And since this is a JMP Statistical Discovery conference, I'll be showing off some tools in JMP that let us get that information.

With that, let's quickly talk about what data we're looking at. What's in these reports? First, these are annual reports, and I'm going to focus on a single company, Waymo, formerly the Google self-driving car project. These are reports submitted by Waymo to the California DMV. Now, I've been talking about a disengagement event; what do I mean by that? There are two types of testing. The first involves a driver in the seat: the vehicle can't operate without a driver in the seat, but at some point they can turn on autonomous mode, and if anything happens where the driver has to take over that was not planned as part of the testing, that's a disengagement event. The other type of testing is a completely driverless vehicle. We're not going to focus on those; we'll focus on the driver-necessary systems.

These annual reports go all the way back to about 2015, even containing information back to 2014, which is about when the California DMV program started. Waymo was actually one of the first companies to get involved. The reports typically contain data from December of the previous year to November of the current report year, the exception being the 2015 report, which goes back a bit further into 2014. For example, the 2016 report has data from December of 2015 all the way up to November of 2016. There are two general types of data present. The first tells you all about the disengagement event: what was the cause, where did it happen, and sometimes the date and the VIN of the vehicle involved in the incident. The other type is how many miles were driven in autonomous mode by a vehicle in a particular month. Now, you may have noticed that I said the disengagement report only sometimes gives the date and the VIN. Yes, that's right: it's not consistent across the entire time span. In fact, that information wasn't available until 2018.
Prior to that, we only know what the event was and how many happened within a month. Starting in 2018 they began providing that information, most likely because the DMV encouraged them to do so. What that means is that we have two levels of resolution in our data. The first is at the month level, and for that we can go all the way back to when testing began; I'm going to look at that type of data using the Reliability Growth platform. The other type of data is daily and at the VIN level, so individual vehicles; I'll use the Recurrence Analysis platform to analyze that.

Before I continue, let me show you where those are. You'll find them under the Analyze menu, under Reliability and Survival. You'll notice a ton of platforms there. The ones I'll be highlighting are Recurrence Analysis, for when we have individual units, in this case vehicles, that can experience a recurring event over time, here a disengagement event (we'll use that for the VIN-level data), and Reliability Growth, which I'll use to analyze the monthly aggregate data. It's the same idea, although Reliability Growth is more for a big system that can encounter multiple events over time, sometimes needing repair as part of the event, and that might go through different phases, so the underlying process or software might be evolving over time. That's something the Reliability Growth platform is intended to handle. It could also handle the VIN-level information; I'm just picking these two because I want to show off the breadth of what we have in JMP. While I'm doing that, I also want to show you a bit of how we used JMP tools to compile the data. Obviously there are a lot of reports, so how do we compile them all into a single table? I'll touch briefly on some of the JMP tools that helped with that.

Let's dive right in. First I'll talk about reading in some of the earlier reports. You've already noticed there's a bit of inconsistency in what's reported; there's also inconsistency in the format of the reporting. Some of the very early reports were PDFs. Let me show you an example using the 2017 report. We have a nice report here from Waymo; it's very well formatted. In fact, this is probably one of the nicer reports we've seen among the companies participating at that time. There are a bunch of tables in here, and the ones we really want are at the end, in the appendices. This is the events report. Notice "days" is a bit of a misnomer; it's actually the month when the event happened, along with some information about the operation and what caused it. We definitely want this table. We also want a table at the end which shows the autonomous miles for each vehicle for each month; in this case we only see the last four digits, we don't even get the complete VIN.
Now, of course, it's in a PDF, and we really don't want to manually type these in. It would be nice if JMP had a way to read in tables from a PDF. Well, you're in luck, because I can go here and do Open, point to my PDF (making sure to show all files, not just JMP files), select the report, click Open, and we have our fancy PDF import wizard. What it has done is try to identify everything in that report that looks like a table, and it's done a pretty good job. Scrolling through, you can see it captured most of those tables; in fact, probably too many, since we don't need all of these. You can scroll through them individually using this dropdown menu, and each is labeled by page and by table number on that page. I don't need all of these tables; as I mentioned, I just want the ones in the appendices. That's easy: I'll go to the red triangle menu that appears on the page and say to ignore all the tables on this page, then scroll down and do the same for the next page.

Now I've got the tables I'm interested in. We still need a bit of tweaking, because as you'll see, each table on each page is its own entity, while these are actually all the same table. How do I tell JMP that? I go to the red triangle menu on the table, go to the number of rows to use as the header, and tell it to use none. What this means is that if there's a table prior to it, JMP treats this as part of the previous table; that's why there's no header on it. I click zero, and you'll notice there's one table fewer. You'll also notice it has added two columns: the page the table occurs on and the table number on that page. If you want that information, JMP provides it automatically; in this case I don't really need it, but I can delete it later after I'm done reading these in.

We also missed a bit of that table, probably because of a formatting change. Notice it was able to find the rows pretty easily for this table, but for this one, not so much, mainly because of this guy: it just wasn't laid out quite right, so it didn't fit the pattern. That's okay. I'll click and drag around it, which tells JMP, "Here's a new table," and right now it thinks it's its own table, so I just tell it, no, it's part of the previous one. There we are. Scrolling through, yes, it missed some of this information. Again, it's trying to maintain the formatting, but because of the way the report was laid out, it missed that last row. That's okay; I can enter that information later. It's not a big deal. If we go to the next page, in this case JMP has done the opposite: it has concatenated these two together, which makes sense, since they're pretty close together and might be the same table. Unfortunately, they are not. As you can see, they're two distinct tables here, whereas here they're concatenated together.
I need to tell JMP, no, that's actually a different table. That's pretty easy: I say there is a header row, and that tells JMP it's a different table, and now it has created a different table. I would have to repeat that for a lot of these, and unfortunately, the way they've laid this out, I'm going to have to do a bit of work later on to get them into the right format. That's okay; there are other tools in JMP for that. For the sake of time I won't go into it here; I just wanted to show you how we could read the data in from a PDF. Thankfully, starting in about 2018, they started using Excel for their reports, so it was a lot easier to just copy and paste directly into JMP.

Now that we've got the raw data tables in, I'm not done yet, because I'd like to reformat some of these tables into a common format that I can then use to concatenate them all into a single table. Let me show you how I did that; there's actually quite a lot I want to do. Here's an example of the Waymo 2022 report. We have the date of the incident and the VIN of the vehicle involved. This next part is extra information I don't want to keep; it just says whether the vehicle is capable of driving by itself (completely driverless: no; driver present: yes, of course) and who initiated the disengagement. I'm not going to use that. I'm going to keep the location and the description, which is essentially the cause, and I'll probably rename some of those columns. I also need to summarize over this table, because from it I'd like to create aggregate event counts to use with the monthly aggregate data, which means summarizing over the VINs and over the dates by month. To do all that, bear with me; I've allotted about five minutes or so. I know it's a bit dangerous to do this live, but just bear with me. First I need to open this guy, and then I need to click this button. Okay, let me take a little break here. That was a lot. Thank you for bearing with me.

As you can see, thanks to something new called the JMP Workflow Builder, I was able to accomplish all of that in a fraction of a second; well, maybe a little over a second, but still a lot faster. I've got a data table here aggregated by month, with the location, the cause, and how many events happened. I can relabel some of that, and I've got my individual data table with the date, the VIN, and the actual individual events. Now I can save those to different locations to be concatenated later.

What is this new workflow? Let me open a new one so you can see it. There we go. This is what you start off with. You click this red record button, and on an individual data table it will then record the actions you take on that data table.
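To give a flavor of the kind of steps such a recording captures, here is a minimal JSL sketch of a comparable cleanup; the file path and the column names are placeholders, not necessarily the ones used in the actual reports:

// Hypothetical cleanup of one annual disengagement report (names are assumptions)
dt = Open( "$DESKTOP/waymo_2022_disengagements.jmp" );

// Standardize a column name and derive a month column from the incident date
Column( dt, "DESCRIPTION OF FACTS CAUSING DISENGAGEMENT" ) << Set Name( "Cause" );
dt << New Column( "Month", Numeric, Continuous,
	Formula( Date MDY( Month( :Date ), 1, Year( :Date ) ) )
);

// One row per Month/Location/Cause, with an N Rows column counting events
agg = dt << Summary( Group( :Month, :Location, :Cause ) );

Recording steps like these once and replaying them through the Workflow Builder is what makes it practical to process many report files the same way.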
This is very useful in a situation like the one I have here, with a lot of reports where I'm going to be doing the same thing for each one. Instead of repeating all those actions for each table, I record them once for one table and save it as a workflow. Then when I run it, it opens a window saying, "Hey, I couldn't find that original data table. Can you show me the new one?" You select it and it runs. In this case it didn't ask, because I already had it primed for this data table, but typically, if the data table name has changed, as long as it keeps the same formatting, that's the only thing you have to worry about. If a column changed its name, that's okay too: JMP knows to ask, "I couldn't find that column. Can you give me a replacement?" This is very helpful. Say you have a daily report you collect data from, and there are a couple of cleaning steps you need to do on that table: save a workflow, open the data table, click Run, select the right table, and you're done. It's a great time saver, and it really saved us a lot of time when we were compiling these data. That was something just recently introduced in JMP 17; I'm sure you're going to love it.

Let me exit out of all of these. I've already got those tables done, and I want to make sure we have time for the analysis. Let's start with the monthly aggregate level. Here I've compiled it across all the time periods, up to the most recent, 2022. Let's quickly walk through what I've got: the date (in this case, the month), how many vehicles were actively being tested in that period, how many total autonomous miles were driven that month, how many different event types happened, and the total number of disengagement events that occurred.

Let's take a quick visual look. JMP is all about graphs and visual exploration. Here I've plotted the cumulative number of events over time along with the individual monthly counts, and we notice an interesting pattern: there are clear spikes and then drops, and that seems to happen four or five times. You can also see it in the cumulative data. Ideally, with this type of data, you'd like a bit of an increase at the beginning and then a flattening out. If it becomes completely flat, you have achieved the perfect autonomous vehicle, with no more disengagement incidents to worry about. Of course, that's never going to happen; you just want it to be as flat as possible. What we're seeing is a bit of a rise, a burn-in period if you will, and then it starts to flatten out, but that repeats a couple of times, most significantly here. That's a pretty big jump there, and of course you can see it correlating here.
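As an aside, a rough JSL sketch of a plot along these lines might look like the following, assuming the monthly table has a Date column and a Total Events column; the cumulative column is derived here, and none of these names are guaranteed to match the actual tables:

// Running total of monthly disengagement events (hypothetical column names)
dt = Current Data Table();
dt << New Column( "Cumulative Events", Numeric, Continuous,
	Formula( Summation( i = 1, Row(), :Total Events[i] ) )
);

// Cumulative count over time
Graph Builder(
	Variables( X( :Date ), Y( :Cumulative Events ) ),
	Elements( Line( X, Y, Legend( 1 ) ) )
);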
We've  got  our  burn  in  period. Then  as  we  drop in  the  number  of  incidents, our  curve  up top  is  going  to  flatten  out. This  is  going  to  be  the  key  piece  of information  we are going  to  be  looking  at in  our  reliability  platforms. We've   got that  information  there. Now  I'll  go  ahead  and  quickly address  one  question  that  might  come  up. Maybe  you  might  think,  well, maybe  the  more  you  drive  it, the  more  incidents  you  might  have. Maybe  we  should  try  and  account for  that  and  I  have. Here's  another  graph  where  I've  looked at  the  correlation  between,  in  this  case, the  log  of  the  mileage  and  the  log of  the  total  disengagement  events. There  is  a  bit  of  a  trend. It's  a  very  small  one. In  fact,  the  R²  is  not  good  at  all. There's  a  lot  of  spread  in  the  data. Here's  actually  the  prediction  equation with  its  individual  prediction  boundaries, there's  a  lot  going  on. There's  plenty  of  room for  a  flat  line  there. This  is  just   saying  that  there might  be  something  like  that  happening, but  it's  clearly  not  the  driving  force. Yes,  if  you  drive  it  a  little  bit  more, you  might  experience  a  few  more  incidents that   make  sense. But  it's  clearly  not  a  big  driving  force. Something  else  is  clearly  going  on. Hopefully  that  helps  address a  side  question  that  might  come  up. Well,  let's  get  into  the  actual reliability  analysis. I'm  going  to  run  this  script  here, and  this  is  going  to  run that  reliability  growth  platform. Don't  worry. After  I've  run  it,  I'm  going  to  click  here under  the  Red  Triangle  redo  relaunch. This  brings  up  the  initial  launch. This  is  what  the  initial launch  window  looks  like. You  have   four  different ways  of  entering  the  data. The  first  is  a  time  to  event. If  you  had  recorded  here's  how  many days  or  months  up  until  an  event  happened, you  would  use  this  one. Then  you  could  of  course, maybe  multiple  happened  at  that  time. You  could  put  that  there. We  have  the  dates,  which  is  what  I'll  be using  because  we  instead  of  an  actual timed  event  we  just  had  saying  that  in this  month  this  many  events  happen. We  have  a  timestamp and  an  event  count  for  that. You  also  have  it if  you  have  individual  information, maybe  we're  doing  concurrent testing  or  testing  in  parallel. You  can  do  that  information  here. Notice  there's  a  spot  for  the  system I D. If  I  were  looking  at  the   individual  event  information, this  is  where  I  would  go. But  for  right  now,  we're  going to  stick  with  the  dates. I'll  revisit  the  phase  stuff  in  a  moment. You'll  notice  it's  plotted  that  cumulative curve,  which  we  saw  earlier. There  it  is  right  there. Now  the  metric  of  interest  here  is what's  called  mean  time  between  failures. It's  clearly  oriented towards  reliability  data where  we're  looking  at  failures. Think  of  this  as  what's  the  average  month between  incidents, disengagement  incidents. There's  a  bit  of  a  non-parametric measure  here  that's  just  sort  of  a  moving window,  looking  at  it  over  time. But  you  can  get  a  sense  of  how  good we  get,  maybe  how  poor  we  get. I'm  actually  going  to  fit  a  model. The  model  I'll  be  fitting is  here  under  fit  model. This  was  a  bit  of  a  mouthful. 
It's a piecewise Weibull NHPP change point detection. Let me break that down for you. The typical model for this kind of count data is what's called a Poisson process. That essentially says that the rate at which incidents occur, in this case the rate per month, is constant over time; that would be the equivalent of a straight line here. From a mathematical perspective, that's a nice model, very easy to analyze. From a practical perspective, it's a horrible model, because we never get to improve: our incident rate is always the same. We don't want a constant rate; we want the rate to go to zero, preferably. That's what I mean by the flattening of the curve. What we want is a non-homogeneous Poisson process, a fancy term meaning the rate changes over time. The Weibull part says how exactly it changes. It consists of two parts: an initial rate lambda, think of that as a constant out front, multiplied by time raised to some power beta. That value beta tells you how the rate changes. If beta is greater than one, the rate is increasing over time; that's bad, we don't want that. If beta were equal to one, we'd be back at a plain Poisson process with a constant rate, and we don't want that either. What we're looking for is a value of beta between 0 and 1, saying the count will increase a little, but then the rate dies off and slows down, giving us the leveling off that we want. When we estimate these, we're looking for a value between 0 and 1, and visually we're looking for that flattening off. For the mean time between failures, we want that to get very large; ideally it would be infinite, meaning no more incidents, but of course that can't happen. We just want very large mean times between failures.

I'm going to fit that model here, and I'm going to use a change point, because that allows the value of beta in the model to change significantly at some point, mainly because of this guy here, which looks so different. I'll run that. Yes, according to this, there is a change point right there, a pretty significant one. If you look graphically, it's not a great fit, but it's not a really bad fit either, and it is detecting that something significant happened in 2021. If we go to this plot, it shows how the mean time between failures changed over time, and it does so for each phase, the empirical phases defined by this change point. In the first phase, it says we got to about as good as 0.15 months; that's roughly five days between incidents, not too bad, roughly a week. Again, this is across a fleet of vehicles; for any individual vehicle it could be longer or shorter, but for the fleet, essentially the software itself, about a week between disengagement incidents. After 2021, it drops significantly, to about 0.05 months; that's almost a day and a half. That's quite a drop. We can look at the parameter estimates; there's my lambda, that's just the initial rate.
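For readers who want the algebra behind that verbal description, the Weibull (power-law) NHPP is usually written as follows; this is the standard textbook form, not something shown on the slides:

\[ E[N(t)] = \lambda t^{\beta}, \qquad \rho(t) = \frac{d}{dt}E[N(t)] = \lambda \beta t^{\beta - 1}, \qquad \mathrm{MTBF}(t) = \frac{1}{\rho(t)} \]

So beta < 1 gives a decreasing intensity (reliability growth), beta = 1 reduces to a homogeneous Poisson process with a constant rate, and beta > 1 means the incident rate is increasing over time.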
Notice that for beta in the first phase it's less than one; we're doing great. For the second phase, not so much. In fact, we can see that maybe we could do a little better. There were about four of those spikes, so maybe I can impose some empirical grouping, which is what I've done in the other script. I'll run that and do the redo, relaunch. All I did here was add a column called Empirical Phase, just a set of letters that distinguish the phases from each other; it's a categorical factor, and that's what these vertical lines represent. This is a combination of me repeatedly using the change point detection: it only detects one change point, but you can do a trick where you select some of the data, hide and exclude it, and run it again, which is a way to detect additional change points. Some of this is also just me eyeballing it, full confession.

For this one, the model I fit is called a piecewise Weibull. It's the same idea as before, but instead of the platform detecting the change point, I'm telling it I've already got those: each time you see one of these phase changes, that value of beta can change. It connects the pieces together, but the beta can change, indicating a change in the overall process across these phases. This is a much better fit; maybe a bit too much overfitting, I'll acknowledge that. But what it's essentially saying is that for that first initial period of changes to the software, we got almost as good as 0.4 months; that's almost 12 days between incidents. It's still saying we got really good up until 2021, and then we dropped. Clearly something was going on around 2021: something significant changed in the underlying software across the fleet of vehicles, and that increased its incident rate. There are the parameter estimates. Again, I'm going to ignore that first one; that's a burn-in period. A lot of these betas are less than one, until we get to the C and D phases, which come afterwards; clearly something is happening there, because beta is larger than one. Already we've got a bit of a story: something clearly happened in 2021 to change the incident rate. Maybe when we look at the individual data, we'll figure out what's going on.

Let me clear all of this out and move over to the individual vehicles. A quick summary: we have the date, the actual date of the incident; the vehicle identification number, the VIN; and a VIN group, which is nothing fancy, just the first four characters of the VIN (you'll see why I created it in a moment). I've got the location and the cause; I've reduced the text of the cause, which was very wordy, to a basic description, and we'll cover that in a moment. Then the starting and ending month, which basically tells me how long each vehicle was under test.
This particular vehicle was being tested in autonomous mode, on and off, from December of 2017 to about November, maybe October, of 2019. We're going to need that information. First I'll look at some basic graphical summaries. The first thing I'll do is run this script, which summarizes over the causes, so I can look at the total number of disengagement events for each vehicle each month. Then I'll run this plot, and it looks interesting, like coral under the sea waving about, very pretty. There's a lot of information going on, but the basic way to interpret it is like the cumulative events graph: that's what these are. You want to see a bit of a burn-in, but ultimately a flattening off. Right now it's colored by individual VIN, which is why you don't see the labels; there would be way too many. I'm going to color by a different variable this time, the VIN group. Again, that's just the first four characters of the VIN, which is what clued me in. Let me throw in the legend so you can actually see it: show legend, and there we go. Notice there was a significant change in the VIN, indicating maybe a significant change in the vehicle, and notice when the change occurred: right around 2021. Very interesting.

You can also see a general trend in the behavior of these early vehicles. Some of these probably started earlier than this, so there's a bit of censoring involved, but already they're pretty flat: a bit of burn-in and then some flattening off. That's great; that's what we want to see. These guys, though, are almost a straight line up: a lot of incidents happening in a very short period of time. Later on, in 2022, they are starting to level off a little bit, and we've got some that started late and already have a pretty shallow curve. That's good. But clearly something has happened here. This seems to indicate that a different vehicle, and perhaps different software, was introduced around 2021, and it's having some issues; it's struggling a bit, or at least it did initially. By about here it's performing about as well as what we saw early on, so they've done better in fixing it. They introduced something new and had a bunch of bugs; that's what happens when you introduce new things.

Let me exit out of this and look at the causes. Remember, in the previous view we didn't really look at cause, because the data were aggregated across all the vehicles, but let's look into that quickly before we get into the recurrence analysis; there might be something going on there. Here I've done a basic bar chart of the causes, and a lot of them are things you might expect. An unwanted maneuver means the vehicle did something you didn't want it to do; a perception discrepancy means the vehicle didn't see something it should have, or thought it saw something that wasn't there.
Then there's, of course, software: a general catch-all software discrepancy, meaning something happened with the software. There are some interesting causes too, for example other people being jerks on the road, a reckless road user where the car didn't know what to do, adverse weather, incorrect behavior prediction, and hardware discrepancy. Those are the basic causes.

Now I'm going to use a feature called a hover label graphlet to investigate further. I'll right-click inside this bar, go to Hover Label, and click Bar. What that does is add a little bar chart graphlet: when I hover over a bar, there's my little graph. By default it uses the first categorical factor it sees, which was the VIN. I don't want that; I actually want the date. So I'll open the control panel, put the date in to replace it, and then right-click on the date. Because it's a date (this was recently introduced, I think in JMP 16 or 17), I can bin by a particular unit of time, and in this case I want to bin by month. Perfect. I'd also like to color by the VIN group, so we get a clear comparison of the different groups. Notice there's a bit of purple here; that means there's some overlap, which makes sense, since the groups overlapped a bit from the end of 2020 into 2021. Now we can visually compare the groups across the types of incidents. I'll go to Save Script to Clipboard and exit out of this; right now, if I hover, the graphlet is no longer there, but I can right-click, go to Hover Label, and Paste Graphlet. Now it shows me that graph for each bar.

In fact, I won't even click and zoom in, because we can see what's happening. For the unwanted maneuvers, there might have been a few more, but that's more of a general spike; in overall terms, not much difference, I would say. The perception discrepancies are pretty much on par; there's a bit more in general here, but I'd still say comparable. Software discrepancy: whoa, let's look into that. This is very interesting. Apparently with the old group there were hardly any software discrepancies, maybe one, whereas with the new group there are a lot. Interesting. There could be some explanations. It could be that the way the category was defined changed over time; maybe incidents were categorized as other things for the old group before they changed the definition for the new one. That's a very valid explanation. It could also be taken at face value. Maybe there were a lot of other different causes with the old group, and maybe, if we were able to look back at the individual vehicle information for the earlier data, we would have seen a spike in software discrepancies early on. Because what we have is later in that series' testing, maybe they had already worked those out.
We're just seeing these early bugs with the new vehicles. Let's now go into the recurrence analysis. Recurrent events just means we have an individual unit, in this case a vehicle, that can experience an event, here a disengagement, repeatedly over time. I'm going to run the script and then do the redo, relaunch so you can see what I did; I'm doing relaunch by group. Here's what you'd see if you opened it up. You have an age or event timestamp: when did it happen? I have a timestamp, the date. What's the system? That's the VIN. What's the cause? Conveniently, there's a column called Cause, and the platform will break things down by the different causes so you can look at each one. Then the start timestamp, when testing on this unit began, and the end timestamp, when I stopped observing it. I put the VIN group in a By role. I could have put it in Grouping, but there's a bit of an issue: if there isn't much data for a cause in both groups, it can't compare them. So for now I'm just using a By group on VIN group, and we can still do a visual comparison. For the scaling, you can select Day, which I've done because that's the level of resolution we have; it reports in days just for ease of interpretation.

Okay, we're looking at the first group, the early VIN group. Notice it's broken up by all the different causes, and they're pretty similar; the hardware discrepancy seems like the biggest one, if I'm reading that correctly. Yes, that's hardware. Notice this is what's called the mean cumulative function, the MCF. It's similar in interpretation to that cumulative events curve: you're looking for the same behavior, a bit of a rise, but ultimately you want it to flatten out. It's saying, what is the average cumulative number of incidents for a single vehicle by this age? It's a little harder to interpret than a simple mean count, because it accumulates over time, but at a high level you want a rise that flattens out at some point, so that you're getting no more incidents for that vehicle. That's what we're seeing with a lot of these: a bit of a rise, and it's starting to shallow out. If you want detailed information, you can look under each one. We could fit models here too; I'm not going to at this point, because we're nearing the end, but they would be similar types of models.

Let's look at the "SAD" group, if you will. There's clearly a difference here. Notice this big spike, and again, that is the software discrepancy. What we're seeing is that we've essentially got a new vehicle; you can clearly see that it starts out later. We had a couple early on, while the other vehicles were being tested, which makes sense: you've got a sort of pilot group being tested. Then, when they did full-scale testing, the software started having some issues. Now it's starting to get better.
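For reference, the nonparametric MCF estimate behind these plots is conventionally written as follows; this is the textbook form, not a JMP-specific formula:

\[ \widehat{\mathrm{MCF}}(t) = \sum_{j:\, t_j \le t} \frac{d_j}{n_j} \]

where the t_j are the distinct event ages, d_j is the number of events observed at age t_j, and n_j is the number of vehicles still under observation at that age.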
We're seeing that level out, as we saw in the previous data. There's clearly a difference, but they are doing better. The other two, the unwanted maneuver and the perception discrepancy, are visually on par with what I saw before. Clearly the software is the big issue here.

So what's the big takeaway? It seems we've got two different series of vehicles: one they had been working on for a while, and then in 2021 they introduced a new series. Especially in 2021, there were some issues with the software, with a lot of software discrepancy incidents causing disengagements, but they seem to be improving that now. That makes sense: you introduce something new, there are going to be bugs; just give it some time. All the other causes seem comparable with what came before. Overall, I've got a pretty good feeling about this new series; give it some time to burn in.

Well, I hope I've given you a flavor of all the things JMP can do, especially in the reliability platforms, as well as in preparing your data. If you have any questions about this, this journal will be available where you can find this talk, and some of the data tables will be available there as well, so you can play with all of this. There are also some other graphs and analyses I didn't really go into, so feel free to explore; that's what we do here at JMP Statistical Discovery. If you have any questions, I'll be in some of the Meet the Expert sessions, so you can find me there. I'd love to chat with you about this, or anything else you might have about designing experiments or reliability. Thank you for your time.
Monday, October 16, 2023
JMP Pro’s Neural Network platform provides a powerful and flexible tool for generating predictive models for many types of data. Users optimize the models by adjusting various parameters and then compare individual models within the platform. This process is carried out manually, however, which takes a significant amount of user interaction and may not necessarily uncover an optimal model.   This presentation demonstrates the use of a JMP Pro add-in designed to screen neural network tuning parameters. The add-in creates a fast, flexible space-filling design based on user input, then runs a neural network model for each set of parameters. A graphical report helps users identify optimal models, so they can then run the selected model(s) or continue tuning the parameters to build their understanding of the system. Finally, the add-in allows users to save the results from the tuning optimization and return to it later, which is helpful when dealing with complex neural network models that might take considerable computing time.   Various JMP customers and Community members have inquired about a neural network tuning capability in JMP; this feedback was used to guide the development of this add-in.   The NN Tuning add-in is available for download in the add-in section of the JMP Community. Any updates will be posted there. If you find the add-in helpful, please let me know how you are using it and if you have any suggestions for future versions!     All  right. Hello  everyone. My  name  is  Scott  Allen. I'm  a  Systems  Engineer  at  JMP. I'm  going  to  be  presenting a  neural  network  tuning  add- in that  I've  developed  really to  help develop  neural  network  models in  an  automated  way  to  build  many  models, find  out  which  parameters  are  most effective  in  building  the  best  model. What  I'd  like  to  do  is go through just a little bit of background on  motivation  for  building  this  add- in. I'm  going  to  do  maybe  a  walkthrough, the  description, what's  going  on  in  the  background when  you  run  the  add- in and  then  show  a  use  case  for  it. A  sample  workflow. Then just really wrap  up   with  some  lessons  learned. This  was  my  first  big  scripting  project, so  certainly  learned  a  lot  along  the  way. I'd like  to  share  that   and  then  as  well  as  what's  next. First,  just  I  want  to  take 3, 4 minutes here just  on  the  background  and  motivation. A lot  of  other  machine  learning platforms in JMP  have  some  sort of parameter  tuning  function, whether  it's  a  tuning  table  that  you create  outside  the  platform, like  in  bootstrap  forest  or  boosted  trees, or  an  integrated  tuning  design like in  support  vector  machines and  the  new  XGBoost  platform. But  there's  not  one  for  neural  networks. That  led  to  some  customers   that I work with asking about it. "Is  this  something  that's  available? Are  there  scripts  out  there   that  we  can  use? Is  there  an  add- in?" There  are  a  few  things  out  there, but  maybe  not  everything  in  one  place. Then  on  a  personal  level,  I've  never undertaken  a  large  scripting  project. I've  done  some  small  JSL  scripts to  automate  workflows, or  to  clean  up  data, but  I  had  never  built  an  application. This was, I  really  wanted  a  project   to  help me learn. I  used  this  as  an  excuse  to  learn  JSL. The  neural  network  platform  in JMP is really  powerful, and  so  I'll  just  launch  it  really  quick. 
I'm  sure  if  you've  seen  it... I  will  mention  I  am  using JMP PRO version  17.1 . That's  where  all  the  testing  has  happened for  this  add- in. I  have  not  gone  backwards  many  versions. I  would  suggest if  you  are  going  to  use this you  need  to  have JMP  PRO  and  at  least  version 17.1. The  JMP  neural  network platform is  really  powerful and  I  just  want  to  take  a  few  minutes to  go  through  what  this  looks  like. I  think  everybody  has... If  you've  used  neural  networks, you've  gone  through  this workflow and  it's  a  really  nice  flexible platform  for  developing  these  models. You  can  start  off  with  a baseline model and  maybe  you're  not  satisfied with  the  performance of  your  model  in  this  case. You  can  just  go  back  to  the  model launch and  you  can  adjust  the  parameters  here. You  can  go  and  generate  another  model. It's  easy to  start changing  these  parameters and  finding  models  that  might  be the  most  appropriate  for  your  system. But  there  are  a  lot  of  options  in  here. You've  got  different  number  of  nodes, you've  got  one  or  two  layers, there's  boosting  options  as  well  as a  lot  of  different  fitting  options. You  don't  always  know... It's  not  always  clear  which  of  these parameters you need to change in  order  to  get  the  best  model for  the  data  table  that  you're  working  in and  the  predictors  that  you're  using. It's  not  always  clear if  adjusting  a  parameter is  going  to  lead  to  a  better  model. You  just   adjust  up a nd  down, maybe  take  big  steps to  figure  out  what's  going  on. It does  require  a  lot  of   manual  adjustments  by  the  user in  order to  generate  these  models. They  are  pretty  fast, so  it's  not  time- consuming  to  create  these but  it d oes  require  a  lot  of  clicks. My  goal  here  and  some  of  my  inspiration was  really  to  provide  a  single  platform for  tuning  and  evaluating those  neural  network  models. I  was  also  inspired there  were  some  tuning  scripts  out  there. Nick  Shelton  had  a  script that  would  help  you  build that table and  run  it  and  evaluate  those  models. Mark  Bailey  had  developed an  add- in  for  this  as  well, and  there  were s ome  community  posts  that were  trying  to f igure  out  how  to  do this. Those  were  all  some  starting  points for  this  add- in. Then  I  really  like the  XGBoost  add- in  interface. It's  really  clean, it's  got  some  graphical  outputs and  it's  really  easy  to  navigate and  lets  you  auto tune  those  networks. The  add- in  that  I'm  developing   isn't  auto tuning, it's  really  more  of  a  brute  force  method. Just  build  lots  of  models and  then  have  a  graphical  interface  there to  help y ou  find  the  best  one for  your  system. Let's  go  on  and  move  on t o  just   what  the  add- in  does  and  how  to  use  it. To  launch  the  add- in,  just  like  most  add- ins, you  go  to  the  add- in  menu, and  it's  called  Neural  Network  Tuning. That  takes  you  to  the  launch  dialog. This  gives  you  a  column role  interface that's very  similar to  many  other  platforms. We  can  just  select  all o f  our  factors  here and  our  response. Currently,  it  allows  a  single  Y  response, and  it  can  be  either  continuous or  categorical. 
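As an aside for scripters: everything the manual Model Launch workflow described above does can also be driven from JSL. The sketch below is a minimal, hedged example — the column names and parameter values are placeholders rather than values from this demo, and the script JMP saves from the platform's red-triangle menu is the authoritative reference for the exact arguments.

// Hedged sketch of one scripted neural network fit in JMP Pro.
// Column names and settings are placeholders, not values from the demo.
nn = Neural(
	Y( :Response ),                 // single continuous or categorical response
	X( :X1, :X2, :X3 ),             // predictors
	Fit(
		NTanH( 3 ),                 // first-layer TanH nodes
		NLinear( 0 ),               // first-layer linear nodes
		NGauss( 0 ),                // first-layer Gaussian nodes
		Transform Covariates( 1 ),
		Penalty Method( "Squared" ),
		Number of Tours( 1 )
	)
);

Changing one of these arguments and re-running is exactly the manual loop the add-in automates with a designed set of parameter combinations.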
The  add- in  currently  only  has the  two  validation  methods, so  you  can  specify  a  validation  column with  both  validation  and testing. Settings  if  you  have  that,  if  you  don't, it's  just  going  to  do  a  random  holdback, the  default  for t he  neural  network. Then  you  can  specify  and  toggle or  toggle  the  informative  missing  as  well. But  we'll  just  go  with  this  for  now. We  click  OK,  and  that  takes  you   to  the  tuning  launch. This  tuning  dialog,  first,   gives  you  a  little  bit  of  information about  what's  in  the  data  table and  what  you  specified  as  column  roles. Right  now  we've  got  the  data  table here, the  model  validation  method, as  well  as  whether  or  not   we  have  informative  missing  on  or  off. Then  we've  got  the  DOE  option. What  this  add- in  will  do,  is  look  at  all the  parameter  ranges  that  you've  set and  create  a  fast,  flexible,   space- filling  design. Really  the  only  design  specification   you  need is  to  set  the  number  of  trials   in  this  design. In  this  case  we've  got  by default it's  going  to  be  set  to  20, and  there are  some  rules  of  thumb. Maybe  you  want  to  specify at  least  10  treatments  in  this  design for  every  factor  that  you're  looking  at, but  it's  completely  up  to  the  user, and  so  we'll  come  back  and  adjust that  if  we  need  to. You  can  also  replicate  this  design with  a  different  random  seed. In  some  cases  if  you  want  to  see how  robust  you  might  be to  certain  parameters,  you  can  run the  same  neural  network  model  20  times, but  all  with  different  random  seeds to  see  how  robust  the  model  is  to  that. Then  you  can  toggle  whether  or  not  you want  to  see  the  DOE  dialog  that  shows  up. Once  you've  gone  through the DOE  options  here, you  can  come  down to  the  neural  network  tuning. I  tried  to  stay  pretty  true to  the  original  platform in  the  neural  network  by  using similar  language  and  options  here. This  is  where  you  go, a nd  instead  of  specifying  a  single  value for  one  of  these  parameters, we're  going  to  specify  a  range. Also,  I'm  limiting  a  little  bit  the  available  options for  the  type  of  neural  network  that you're  going  to  create. In  this  case  you  can  generate   a  single  hidden  layer, you  can  generate  two  hidden  layers, or  one  hidden  layer  with  boosting. Currently,  the  neural  network  platform  doesn't  allow a  second  layer  with  boosting. If  you  try  to  do  that in  the  neural  network  platform, it  will  give  you  a  warning  saying  that   it's  going  to  ignore  that  second  layer. In  this  add- in,  I  just  make  it  so  that  you  zero  out  all  those  second  layers when  you  activate  boosting. In  this  case  we're  going  to e xplore this  data  table  a  bit and maybe  we're  going  to  set  some wide  range  of  tuning  parameters. I'm  going  to  go  from  0-9  on  each  of  those activation function  number  of  nodes. Over  on  the  fitting  options,  you  don't  have  to  necessarily know what  might  be  best  for  your  system, you  can  just  select  them to  be  part  f  the  DOE,   or  in  the  case of these  transform  covariates  and  robust  fit  options. You  can  toggle  them  to  always  be  off, always  be  on,  or  include in the DOE. We'll  include  both  of  those  in  the  DOE for  this  example. Penalty  methods. 
You  can  test  any or  all  the  different  methods. If  you  don't  specify  any  penalty  method, it  will  run the  default  penalty  method  squared. But  we're  going  to  check  all  of  those. You  can  also  specify the  number  of  tours  per  model. In  retrospect,  I  probably  didn't  need to  add  this but because  you  could  do  something  similar by  just  replicating  all  of  the  designs and  seeing  the  results from  each  of  those  models. But  it's  here  in  case  you  want to  adjust  the  number  of  tours. You  can  also  specify  a  random  seed. In  this  case,  I've  got, 3,4,5,6. Six different  factors  in  this  design, and so  I'm  just  going  to  maybe   turn t his  up  to  60. I'm  going  to  create  60  models. Before  I  click  Run,   I'm g oing  to  talk  a  little  bit  about what's  going  on  when  you  click  Run. The  first  thing  that  happens when y ou  click  Run, is  it's  going  to  create  a  factor  table with  only  the  selected  parameters. We select it. In  this  case  it'll  create  a  factor  table   with  single- layer  nodes, all  of  the  different  fitting  options. It's  not  going  to  put  the  second l ayer   or  any  of  the  boosting  parameters into  the  factor  table. Then I'm  going  to  create  a  response table where  the  training,  validation, and  testing  R- squared  values are specified a nd  then  it's  going to  combine  those  into  a  single  table. This  is  what  the  sample DOE  table   will  look  like. It's  just  going  to  have  the  responses as  the  R- squared  values and  then  all  the  different  parameters. Then  these  just  get  passed  into  the neural  network  platforms  row- by- row, and  the  results  show  up  in  the  data  table. Just  like  we're  saying,  each  of  those  runs in  the  design  is  going  to  go  into  the neural  network  model  platform  sequentially  and  then  we  have  a  little  dialog  box that's  going to  indicate  the  overall  process. In  the  case  of  really  long- running   neural  networks, it's  good  to  just  see  how  quickly   the  whole  design  is  going  to  go. I  would  recommend  before  specifying  a  large  number  of  trials  in  this  design that  you  run  one  or  two  ahead  of  time  just  to  see how  long  they're  going  to  take. If  you  have  fairly  small  data  set,  then   they  go  pretty  quickly,  but  if  you  have a  complex  data  set,  they  can  take   sometimes  quite  a  while  to  run. In  this  case  I'm  just  using  a  sample data  set  from  the  sample  data  set  library. This  tablet  production. It's  not  the  best  data  set  to  run for  neural  networks  just  because there  aren't  a  whole  lot  of  observations, but  I'm  using  it  just b ecause the  neural  networks  run  really  quick. In  this  case,  if  we're  satisfied   with  all  of  the  specifications that  we  have  in  our  model,   we  can  just  go  ahead  and  click  Run. What  this  has  done  is  it's  created that  design  table and  now  it's  passing  through   each  of  those  trials into t he  neural  network  platform and  giving  an  output. If  we  look  at  the  home  window, all  this  has  done  is  it's  taken that  tuning  result  table  and  I've  hidden  it and  I  just  bring i t  out   into  the  same  window  here. Let's  just  go  on  to  the  next  one. We  can  work  through  that  data  table  here,  so  it's  interactive. What  you  first  see  up  here  is  graph  builder  report. 
T his  shows  you   all  of  your  different  parameters, so  the  different  activation  functions as  well  as  the  number  of  nodes is  the  scale, and  how  the  validation  R- squared  changes with  respect  to  each  of  those individual  parameters. In  the  case  of  categorical  parameters, you g et  a  box  plot  showing  the  difference in  the  values  this  way. We  can  see  in  this  case maybe  for  robust  fit, it's  got  a  wider  range,  but  on  average  it's g oing  to  be  better. The  median  value  is  better. Is  higher  R- squared. We  can  also  look  down  here at  the  data table, it's  going  to  sort  the  data  table by  the  validation  R- squared and  it  just  shades  those  a  deeper  purple depending  on  the  value. Let's  see. We can use... One  reason  I  just  kept   this  data  table  as  it  is, is  we  can  explore  it  just  like  we  explore any  data  table  in  JMP, so  you  can  look  at  column  headers or  you  can  launch  graph  builder or  do  various  analyses  on  this  data  table as  it  is  right  here. There  are a  few  additional  data  columns or  columns  that  are  added to  this  data  table. One,  it  records  the  random  seed. If  you  want  to  reproduce  this, you  know  what  random  seed  was used in  generating  the  n eural  network  models,  shows  you  the  informative  missing, shows  you  the  elapsed  time   for  each  of  the  models. If  you  are  building  large  models or  you  might  want  to  minimize  the r untime, then  you  can  get  an  estimate o f  how  long it  might  take  for  you  to  run additional  models. nDOE  just  records  what  DOE  was  run. This  was  the  first,  so  it  gets  a  1. As  you'll  see  we  can  run  a  second  and  third,  and  you  can  then  use  those as  some  graphing tools. Also  tells  you  the  type  of  neural  network that  was  run. Whether  it  was  single- layered,   double-  layer, or  a  single- layer  with  boosting. There's  a  couple  of  other columns  that  are  added. This  factor  column  just  lets  you   keep track  of  the  factors that  were  used  in  the  design in  case  you  want  to  compare  models that  have  different  factors  that  were  used  instead  of  just  parameters, and then  also  a  row  column. The  next  steps  so  we can  continue  building  out. Maybe  in  this  case   we're  exploring  the  parameters. Now  we've  got  a  baseline of  a  single  neural  network and  maybe  we  want  to  see  how  this compares  to  a  two- layer  neural  network. We  can  go  back  to  the  tuning  launch which  was  just  minimized. You  can  see  that  all  of  this  report, the  graph  builder  report as  well  as  the  table  are  just a ppended to  the  tuning  launch. We  can  go  back  up  to t his  tuning  launch   and  maybe  I  want  to  look  at two  hidden  layers. We'll  go  to  two, and  now  I'll  put  in  a  range  of those  results  or  those  parameters and  keep  everything  else  the  same. You  can  see  it's  going  to  preload  the  random  seed  that  I  generated in  the  first  DOE  just  to  make  sure everything's  using  the  same  random  seed. We'll  keep  it  at  60,  even  though  we  might  want  to  increase  it by  a  little  bit  since  we're a dding   some  more  factors, but  keep  it  simple,  we'll  just go  with  60. 
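While that second design runs, it is worth sketching what each click of Run does behind the scenes. The following is a simplified illustration of the row-by-row loop described above; the table and column names are hypothetical, and the real add-in also passes along the fitting and boosting options and writes the training, validation, and test R-square values back into each row.

// Simplified sketch of the tuning loop: one Neural fit per row of the design table.
// doeTbl and its column names are hypothetical placeholders.
doeTbl = Current Data Table();
For( i = 1, i <= N Rows( doeTbl ), i++,
	nn = Neural(
		Y( :Response ),
		X( :X1, :X2, :X3 ),
		Fit(
			NTanH( doeTbl:TanH Nodes[i] ),
			NLinear( doeTbl:Linear Nodes[i] ),
			NGauss( doeTbl:Gaussian Nodes[i] )
		)
	);
	// The add-in then reads the R-square values from the platform report,
	// stores them in row i, and closes the report before the next iteration.
);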
One other  thing  just  to  mention  is, in  the  case  it's  probably  pretty low  likelihood t hat  all  of  these   would  be  set  to  zero  in  a  design, but  if  they  are   it'll  just  go  to  the  default  with  JMP, which  is  a  single  10H  layer. You  don't  have  to  worry  about  having  a  neural  network  with  zero  layers or  zero  nodes. If  we're  satisfied  with  this,  we're  just  going  to  click  Run. While  this  is  running,  I'm  just  going   to  bring  up  the  home  window. You  can  see  what  it's  doing. We've  got   our  original  tuning  result  table  here, and  it's  built   a  second t uning  result  table, and when  it's  done, i t  just  appends  that to  our  original  table. Now  I  can  see  in  orange, is  all  of  those  second- layer  models. We  can  see  in  this  case, maybe  having  two  layers was  not  very  beneficial  in  this  case. Looks  like  in  pretty  much  all  cases, those  neural  networks  are  not  as  good. Then  maybe  just  to  round  this out,  we're  going  to  go and buld some  neural  networks  with  boosting. Boosting  is  a  really  nice  way  to  increase the  prediction  power  of  our  model. We're  going  to  specify  some boosting  levels  from  1-10,  as  well  as the  learning  rate  from  0.1  to  1. Once  again  our  random  seed  is  the same, and  these  are  going  to  take a  little  bit  longer,  so  I'm  just  going to  decrease the  number  of  runs just  in  the  interest  of  time. We  can  click  Run  here,  we  get  our  dialog  box  that  tells  us our  overall  progress,   and  we  can  see  these  in  the  cases where  we've  probably  got  lots  of  boosting, they  might  be  taking  a  little  bit  of  time. In  this  case  boosting  neural  networks, or  neural  networks  that  had  boosting are  now  this  purple,  and  we  can  see   in  pretty  much  all  the  cases purple  is  higher,  especially  when  it  looks  like we  can  see  by  these  box  plots  as  well. Then  we  can  also  see  in t his  case   over  here  our  boosting. Maybe  we  haven't  found   the  maximum  number  of  boosts, it  looks  like  it  continues  going  up, so  we  might  want  to  add  some  additional boosting  layers  or  maybe our  learning  rate  is  best at  this  middle  to  high  value instead  of  at  the  lowest learning  rate  values. The  next  thing  we  might w ant  to  do   once  we  have  this  initial  view of  our  system,  is  to  run  a  few  of  these neural  networks  and  compare  them. We  can  do  that  a  few  ways. We  can  look  in  our  table  here a nd  run or  and  select  some  neural  networks based  on  the  graph,  and  you  can  see they're  highlighting  down  below in  our  data  table  as  well. Maybe  what  we  want  to  do  is  run  the  top  four  or  five. I  want  to  run  the  top  five. I don't  have  to  go  back  into   the  neural  network  platform  to  do  this, I  can  just  run  it  by  clicking this  button,  run  selected  models. Now  it's  going  to  run  those   back  into  the  neural  network  platform and  give  us  a  little  output  that  shows  us the  row  in  our  data  table  that  was  run. This  is  just  the  standard neural  network  output  now. We  pass  through  the  same  random  seed, so  if  we  take  a  look  at R-squared values, we're  going  to  get  the  same  neural  network  that  was  developed because  we're  using  that, passing  through  that  same  seed. 
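In plain JSL, that reproducibility comes down to fixing the random seed before the fit, so the random holdback and starting values repeat. A minimal sketch, with placeholder column names and an arbitrary seed value:

Set Random Seed( 1234 );          // the seed recorded in the tuning table row
Neural( Y( :Response ), X( :X1, :X2 ), Fit( NTanH( 5 ) ) );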
Then  we  can  come  through and  we  can  take  a  look  at more... Either  the  more  summary  statistic  values or  the  actual  versus  predicted, or  get  the  profilers. Whatever  we  might  want  to  do to  compare  these  neural  networks. One  other  thing  we  might  want to  do  at  this  stage is  create  some  additional  visualizations. We  can  go  through and  we  can  show  this  tuning  table. If  we  don't  want  to  work  on i t  here,   we  can  just  unhide  it, and  now  we've  got  a  data  table  that  we  can  use, and  we  can  go  into  graph  builder and  maybe  we  want  to take  a  look  at  validation  versus  training  R- squared just  to  see  how  that's  looking,  or  we  can  go  through. Because  we've  got  the  number  of  DOE, we can  track  how  our  neural  networks have  improved  with  our  tuning. I  can  go  into  graph  builder   and  I  can  use  that  nDOE, and  maybe  we  want  to  look  at  training  and validation R-squared, we  can  turn  on  box  plots  and  we  can  see how  our  tuning  has  progressed  over  time with  each  subsequent  design  that  we  ran. Other  option  is  if  you  want  to  do   some  more  analytical  type  analysis  here, we  can  go  into  screening and  we  can  run  predictor  screening. We  can  take  all  of  those  parameters,  load  them  in  as  predictors, and  look at  our  training   and  validation  R- squared. Click  OK,  and  now  we  can  see  which  factors  might  be  most  influential in the  training  or  validation. Not  surprisingly,  those  learning  rates   and  boosting a re  good  at  this  validation. There's  no  end  to  what  analysis you  might  want  to  do, on  here  it's  just   what  your  goal  of  the  neural  network  was and  where  you  want  to  go  with  it. But  you  can  do  that  all  in t his  data  table  which  now  has  160  neural  network  models. Maybe  these  neural  network models  take  a  little  bit  of  time, so another  thing  you  can  do is  we  can  just  save  this. I  want  to  go  file,  and I  can  save  this. I'm  just g oing  to  save  it  to  my  desktop as T uning  Results. I  can  come  back  to  this  later. Now  that  I've  got  it  saved, I'm  going  to  close  this. One  thing  that  you'll  get  a  warning, when  you  close  it. You  should  just  hide  this  data  table. If  you  close  it,  it's  going  to  take  it out  of  the  neural  network  platform and  you  won't  be  able   to  continue  your  work, but  you  can  hide  it. When  you  get  back  to  the  add- in, when  you  close  this  window, it's  just  going  to  remind  you . You can  close  it  without  saving   or  you  can  cancel  to  go  back  and  save  it. We're  just  going  to  click  OK. When  I  save  this  tuning  table  for  later, so  what  I  can  do  is, is  now  I  can  relaunch  that  add- in and  I  can  go  into  the  tuning  add- in, click  OK. But  instead  of  resetting  any  parameters and  clicking  Run,  I  can  go  to  Load. What  I  can  do  is  I  can  load a  previously  saved  tuning  table, and  what  this  does  is  let  me  in  the  case  of  neural  networks that  take  a  long  time, you  don't  have  to  start  over  from  scratch. 
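Those follow-up analyses are short scripts once the tuning table is unhidden. A hedged sketch, with hypothetical column names:

tuneTbl = Current Data Table();

// Track tuning progress: validation R-square by design number
Graph Builder(
	Variables( X( :nDOE ), Y( :Validation RSquare ) ),
	Elements( Points( X, Y, Legend( 1 ) ), Box Plot( X, Y, Legend( 2 ) ) )
);

// Which tuning parameters matter most for validation performance?
Predictor Screening(
	Y( :Validation RSquare ),
	X( :TanH Nodes, :Linear Nodes, :Gaussian Nodes, :Number of Models, :Learning Rate )
);

// Save the tuning table so a long-running study can be resumed later
tuneTbl << Save( "$DESKTOP/Tuning Results.jmp" );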
You  can  just  keep  building  on  the  table that  you  wanted,  or  maybe  you  do some  additional  analyses  offline   and  want  to  come  back  in and  build  some  more  models, so  you  can  go  through and  when  you  specify all  the  different  parameters  and  number of  tuning  runs  you  want  to  have, it's  just  going  to  continue  appending  it   to  this  table. With  that  let's  minimize  this. That's  quick  run  through   of a ll  the  different  options in  this  tuning  table. It's  really  a  brute  force  way. You  can  see  there's  no  auto tune. Right  now  it's  just  running all  those  different  parameter  combinations  giving  you  an  idea of where  your  optimal  model  might  be . How  you  might  work  this  into  your  workflow so  you  can  explore  those  parameters, this  is  essentially  what  we  did  today. We  looked  at  single- layer  models, we  looked  at  models  with  two  layers. Single- layer  with  boosting. Another  thing  you  might  do   is  go  back  and  say, "W ell,  where  are  my  optimal  models?" They're  here. I  can  go  back  to  the  tuning  launch and  maybe  I  want  to  recreate  those  models and  go  more  towards  this  range. It  looks  like I'm in  the  middle  range  here,  is  optimal, so  instead  of  going  from  0-9, maybe  I  go  from  5-7 . Linear  models  also  might  be  best in  the  middle, and  maybe  the  gaussian,   there are  some  good  ones  on  the  left, so  we'll  go  from  1-8  or something  like  that,  seven  or  eight. Then  we  can  look  at  which... In  this  case,  there  wasn't  an  advantage to  having  it  on  or  off,  so  we  could  just  have  those  off, or  off  or  on. But  it  looks  like  maybe  in  this  case the  weight  decay  was  the  best  method, so  we  could  just  include  that  one and  then  go  through and  build  some  more  models with  a  narrow  range  of  those  parameters. We  can  also  check  parameter  sensitivity to  a  random  seed. In  this  case  we  can  pass  through  either constant  values o r  a  narrow  range. You  can  pass  through   a  constant  set  of  parameters just  by  setting  each parameter  equal  to  each  other, and  it'll  pass  that  through   the  neural  network  as  five  every  time, or  you  can  just  specify  the  min  value,  and  that  will  pass  it  through as  a  constant  value. If  we  wanted  to  we  could  look  at maybe this  was  a  neural  network  that  was  one  that  we  identified  as  optimal, but  I  want  to  see  how  robust it  is  to  the  random  seed. What  we  can  do  is   we  can  just  say  just  run  it  once, but  let's  replicate  it with  20  random  seeds. That  might  actually  take a  little  bit  of  time, so  let's  go  down  to  10. Then  we  can... Actually,  I  don't  want  to  do that  on  this  one, because  I've  got  all  those  others,  but  we'll  just  run  this. You  might  want to  do  this  on  a  clean  data  table where  you're  not  adding  it  to  the  design, but  we  can  use b ecause  we  have the  number  of  DOEs  we  could  find where  that  random  seed  was. We  can  see  here,   all  the  different  random  seeds and  I  can  go  into  graph  builder   and  I  could  just  put  in the  training  and  validation  R- squareds, and  then  I  can  use  local  data  filter  now on  my  random  seed  to  show  everything   except  the  one  where  we  had  all  those. Now  I  can  see  how  robust my  model  is  to  that  random  seed. 
That's  the  same  design now  run  10  times, and  so  on  average  we  can  see what  the  median  value  is or  the  range  of  expected  results. Then  finally,  w e  can  load  those  saved  data  tables. Those  are  really  beneficial   for  neural  networks that might have really  long  computing  times,   so  you  can  save  it  off and  come  back  to  it  later. Those  are  some  workflows   that  might b e  helpful  to  you  as  you  build your  neural  network  models. I  just  want  to  wrap  up with  a  few  lessons  learned. This  was  the  first  scripting  project   that  I  went  through that  developed an  application, and  so  I  think  this  is  pretty  obvious to  anybody  that's  developed  any JSL or  built  any  applications,  the  JMP  community  is  really an  invaluable  resource. I  would  say  95%  of  all  my  questions, I  would  just  go  to  the  community and  someone else  had  a  similar  issue. Really  recommend  if  you  are  interested in  building  add- ins, building  applications,  the  community is  really  the  first  place  to  go. Really,  your  imagination  and  your  targets are  the  limiting  factor in  developing  these  applications. A ligned  with  that is because  there are  so  many  possibilities, there  was  a  need  to  keep  this  focused. In  this  case,  I  just  have... I  think  in  one  iteration I  had  many  different  outputs,   different  graphs, but  they  weren't  really the  heart  of  the  application. Because  there's  so  many  possibilities, I really  tried  to  keep  this  focused on  just  giving  some  graphical  output   to  navigate  the  different  models and  run  them  and  identify   which  might  be  most  appropriate. What's  next? I  really  want  feedback. It's  selfish,  but  it's  what  I  really  want. I  would  love  to  hear  how  people   can  use  this,  how  it's  benefiting  them. You  can  find  this  in  the  community, in  the  add- in  section  of  the  community. Please  feel  free  to  make  a  post  here, tell  me  how  you're  using  it. I  do  have  a  few  known  issues, and  as  I  gather  more, I'm  sure  it's  not  100%  bug- proof,   but  this  will  be  the  place  to  go to  get  the  current  version  of  the  add-in  as  well  as  understand  any  known  issues. But  I'd  love  to  hear  how  you're  using  it, and  the  community   is  a  great  place  to  do  that. I  would  also  like  to  hear ideas  on  additional  reports. I  kept  this  pretty  simple,  but  if  you  find  that  there's  a  report that  you're  generating every  time  you  run  this, I'd  like  to  know  what  that  is a nd  maybe we  can  build  that  into  the  add- in. I'm  also  going  to  continue  debugging. There  are  a  few  known  issues  that  I'm working  through  right  now, and  then  maybe  a  larger  goal  is  to  try to  do  that  auto tune. Instead  of  doing  this  brute  force  method, maybe  start  out  with a small  number o f  parameters, figure  out  which  ones are a ffecting  the  response  the  most, and  then  having  a  directed  tuning. With  that,  I'd  like  to  thank you for  listening  to  this  presentation. This  is  a  really  fun  project  and  I hope  you  find  it  useful, and  would  love  to  hear  about  it. Thanks.
Traditionally, control charts for attribute data (p-charts and u-charts) assume the data is either binomial or Poisson, and that the mean is constant over time. However, this assumption is rarely true in practice. David Laney developed a technique that solves the problem so that control charts work well, whether the mean parameter is stable or not. This talk explains the evolution of the Laney P' and U' charts and gives examples of how best to apply them.     Hi ,  I 'm  Annie  Dudley , and  I  am a  Control  Chart  Developer  for  JMP . I  am  here  today  to  talk  with  you   about  the  new  in  JMP  version  17 Laney  P '  and  U '  control  charts . Let 's  review. The  control  charts   are  intended to  show  the  stability  of  your  process . If  your  process  is  not  stable , then  you  can 't  reliably  make  the  same  size or  the  same  parts ,   and  customers  will  get  upset and  not  want  to  purchase  from  you  again . In  this  case ,  we 're  talking  specifically about  attribute  charts , which  are  based  either  on  the  Binomial or  the  Poisson  distribution , and  they  assume  a  constant  variance, a  constant  variance  because   both  the  Binomial  and  the  Poisson are  a  one  parameter  distribution . You  got  one  parameter  for  the  mean and  then  manipulations  on  that for  the  variance . But  what  happens  if  your  variance   is  not  constant  over  time ? When  you  have  a  non -constant  variance , in  other  words , if  you  have  more  variance  in  your  model or  in  some  cases ,  less , more  variance  in  your  model   than  you 're  currently  describing , that 's  referred  to  as  overdispersion . One  parameter  model   cannot  model  overdispersion . David  Laney  proposed   that  we  normalize  the  data and  we  account  for  varying  subgroup  sizes , we  compute  a  moving  range , and  average  that ,   and  then  we  insert  that  into  our  model , into  our  limits  for  the  control  charts . Let 's  take  a  look  at  this  in  JMP . Let 's  keep  in  mind  this  assumption  is  that  for  the  regular  P  and  U  charts , we  have  this  assumption of  the  probability  of  nonconformity is  the  same  for  each  sample . In  other  words ,  we  have  assumption that  our  variance  is  constant . Let 's  look  briefly  at  our  limits  formulas . For  the  P  chart ,   it 's  the  average  plus  or  minus  3 times  our  standard  error and  the  same  with  the  U  chart . Again ,  remember , we  have  this  one  parameter . There 's  not  much  that  we  can  do if  our  limits  are  not  constant  here or  if  our  variance  is  not  constant . What  do  we  do  if  we  have  overdispersion ? This  is  the  big  question . What  Laney  proposed was  basically  we  standardize  our  data , we  take  the  moving  range , we  compute  an  average  moving  range , and  then  we  adjust  that and  form  a  sigma  sub  Z . These  limits  that  Laney  is  proposing look  very  similar  to  the  other  limits . We  were  just  inserting  the  sigma  sub  Z into  the  formula right  before  the  standard  error for  both  the  P  charts  and  the  U  charts . Let 's  take  a  look  at  an  example . I 'm  going  to  start  with  a  data  set from  the  JMP  sample  data  folder . We  have  lot  sizes that  are  not  terribly  large . F or  our  first  example , let 's  look  at  this  one . I 'm  going  to  go  through the  interface  for  this  first  one . We 'll  use  dialogs  for  future  ones . 
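Before the examples, it helps to see those limits written out in the usual notation. For subgroup proportions p_i based on sample sizes n_i, with overall mean \bar{p}, the standard P chart limits are

\bar{p} \pm 3\sqrt{\bar{p}(1-\bar{p})/n_i}.

Laney's adjustment first standardizes each point, z_i = (p_i - \bar{p}) / \sqrt{\bar{p}(1-\bar{p})/n_i}, computes the average moving range of those z values, and forms \sigma_z = \overline{MR}/1.128. The P' limits are then

\bar{p} \pm 3\,\sigma_z\,\sqrt{\bar{p}(1-\bar{p})/n_i},

and the U' limits are the analogous \bar{u} \pm 3\,\sigma_z\,\sqrt{\bar{u}/n_i}. When the data really are binomial or Poisson with a stable mean, \sigma_z is close to 1 and the P' and U' charts reduce to the ordinary P and U charts; overdispersion shows up as \sigma_z greater than 1 and correspondingly wider limits.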
We  have  the  number  of  defective washers  that  we 're  going  to  model . I 'm  going  to  change the  chart  type  to  attribute . Before  we  go  to   Poisson ,  I  have  my  lot as  my i dentifier  for  our  X -axis , and  we 're  just  going  to  use the  constant  lot  size  of  400 . Drop  that  in  the   n Trials  drop  zone . Now ,  Laney 's  charts  are  only  available when  our  statistic  is  proportioned , so  I  have  to  change  that   back  to  proportion . When  the  statistic  is  proportion ,   then  we  have  four  choices for  our  sigma  value . I 'm  going  to  choose  Laney  P '. But  first ,  let 's  take  a  look  here . We  see  we  have  two  points  out  of  control on  this  P  chart, which  is  indicating  to  us   that  this  is  not  a  stable  process . We  can  certainly  turn  on  the  limits , and  that  just  affirms  what  we  spotted . If  we  change  the  Laney  P '  chart , that  sigma  sub  Z was  clearly  greater  than  1 ,   because  now  our  limits have  jumped  up  considerably . While  the  points  being  plotted   are  the  same  as  they  were  on  the  P  chart , the  upper  limit  in  particular has  gotten  larger . There 's  a  fairly  simple  example . Let 's  move  on  to  another  example . This  is  again ,  another  data  set   out  of  this  JMP  sample  data . In  this  case ,  we 're  looking at the  number  of  defects  out  of ... We  have  several  units   that  are  being  tested , several  braces that  are  being  tested  in  each  unit . Each  unit  isn 't  the  same  size . This  is  another  scenario  where this  problem with  the  non -constant  variance  pops  up. We  can  count  the  number  of  defects . We 're  not  counting the  number  of  defectives. We 're  counting the  number  of  defects  per  unit . In  this  case , our  unit  size  is  varying . Let 's  choose  a  U  chart under  the  Control  Chart  menu. We  have  our  number  of  defects , and  we  have  our  date as  our  subgroup  identifier ,   and  we  have  our  unit  size as  our   n Trials . We  talked  about  having that  non -constant  unit  size , so  we  have  varying  limits. We  do  see  some  of  the  points are  out  of  control  here . Again ,  this  appears  to  be   a  non -stable  process, which  is  cause  for  panic or  having  to  readjust  everything . Now ,  if  we  show  the  control  panel , since  we  have  proportion  as  a  statistic , we  can  change  this from  a  Poisson  or  CNU  to  a  Laney . Again ,  the  limits  have  now  increased . We  now  realize  we  have  a  stable  process . We  don 't  have  any  points  out  of  control . This  is  better  characterizing our  scenario ,  our  data  here . Now ,  let  me  show  you  a  third  example , which  is  a  little  bit  more  complex . The  picture  in  my  background  here is  actually  a  picture  I  took from  Acadia  National  Park . A  couple  of  summers  ago , I  was  working  at  the  Schoodic  Point and  volunteering  to  study microplastics  in  the  water  up  there . We  were  taking  samples  of  one  liter of  water  per  week  at  different  locations . We  would  take  that  water , and  we  would  pour  it  through  a  filter , and  then  we  take  the  filter and  look  under  a  microscope and  physically  count   the  number  of  micro  beads  and  microfibers that  were  appearing  on  the  filter that  we  poured  the  water  through . Let 's  analyze  this  as  a  control  chart . 
The  principal  investigator   for  this  particular  study is  trying  to  form  a  nice , complex  model  over  time to  find  out  whether  or  not the  water  plastics  were  increasing . But  in  order  to  do  that , you  really  want  to  know that  you 're  modeling  your  data  correctly and  whether  or  not we  have  actual  stable  data . This  is  an  example where we  could  run  a  P  chart , so  let 's  try  that . Being  the  control  chart  developer , I 'm  always  looking  for  more  opportunities to  model  control  charts . We  want  to  look  at  the  total  plastics. For  our  subgroup ,  we  have  week , and  for  our   n Trials ,   we  have  total  volume , and  then  we  have  different  site  IDs as  a  phase . Here  we  see  we  have  three  different  points that  we  were  taking  the  measurements  from . Some  were  better  than  others , but  they  all  seemed to  be  really  out  of  control . He 's  going  back  to  the  drawing  board   and  like ,  "What  should  I  do ? Should  I  take  more  data ? How  do  I ..." I  thought ,  "Hmm ,  let 's  see  if  this   really is  characterizing  the  situation , the  process  here  very  well. " Let 's  run  this  again  through  a  dialog , and  you  see  there  are  two new  entries  on  the  menu , one  for  Laney  P ' and  one  for  Laney  U  control  chart . Again ,  let 's  look  at  our  total  plastic as  the  Y , and  the  total  volume  is  our   n Trials . Our  week  is  our  subgroup , and  the  site  ID  is  the  phase . We 've  got  the  same  points  again , but  suddenly,  our  limits  are  much  wider . We  can  conclude  from  this  that ,  well , we  actually  have  a  pretty  stable  process . This  is  good  data . He  can  go  ahead  and  start  to  form his  larger  model  on  this  data . While  it 's  a  calming  thing , also,  it  makes  a  lot  more  sense . In  closing ,  I  would  like  to  recommend that  everyone  use  the  Laney  P '  and  U ' control  charts when  monitoring  your  proportion of  non -conforming  or  defects , especially  when  you  see  a  difference between  the  P  and  the  P '  chart . Thank  you  very  much .
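A scripting footnote on these examples: the same P versus Laney P' comparison can be produced without clicking through the control panel. The sketch below follows the washers example, but the column names and especially the Sigma() text are assumptions to verify against a script saved from a finished chart's red-triangle menu, since the Laney options are new in JMP 17.

// Hedged sketch: proportion chart of defective washers with Laney's sigma.
// Column names and the Sigma() string are assumptions -- confirm them by
// saving a script from an interactively built chart.
Control Chart Builder(
	Variables(
		Subgroup( :Lot ),
		Y( :Number Defective ),
		N Trials( :Lot Size )
	),
	Chart(
		Points( Statistic( "Proportion" ) ),
		Limits( Sigma( "Laney P'" ) )
	)
);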
This research examines impacts to United States fertility rates as a function of state legislative restrictions on the use of public resources (e.g., Medicaid funds) for abortions. Data from LawAtlas and the Centers for Disease Control's WONDER databases were used to model 2021 fertility rates based on maternal age group and abortion legislation. A two-way ANOVA of rank transformed fertility rates was used to identify impacts of legislative restriction across six maternal age groups. Latent class analysis was used to identify patterns in state abortion restrictions composition and their relationship with fertility rates based on age. It was revealed that the impact of abortion restrictions targeting public resources on fertility rates varied based on maternal age; for example, women ages 15-29 had lower fertility rates when there were no restrictions. Additionally, legislative restrictions on multiple categories of public resources were associated with higher state fertility rates. The poster includes visualization of summary statistics and findings with maps and charts. The poster demonstrates a method for addressing unbalanced data using transformation, as well as the use of latent class analysis with binary categorical variables. Welcome to this poster session. My name is Renita Washburn, and my colleague is Dr. Mary Jean Amon. For this project, we examined the relationship between legislative restrictions on the use of public resources for abortions and their impact on fertility rates based on maternal age in the United States. We believe that by examining these regulations, we can offer insight into potential impacts of future legislative changes, thus aiding in understanding the dynamics and potential consequences of such policy shifts on fertility rates. I'll start by quickly discussing our data sources, and then I'll demonstrate how we used JMP to perform a two-way ANOVA and a latent class analysis to investigate these relationships. There were two data sources used for the study. The first was legislation, which was obtained from the LawAtlas Policy Surveillance Program dataset. Second, we got the 2021 fertility rates from the CDC's WONDER database. The data sets were combined based on the mother's state of residence and the year, with a one-year lag on the births so that the legislation was in effect when the pregnancy began. Our first objective was to identify impacts of restrictions on public resources on fertility rates. We broke the legislation down into three buckets: whether the state had no restrictions, whether it had restrictions that excluded Medicaid, and whether it had restrictions that included Medicaid. A two-way analysis of variance was performed to determine whether there was a statistically significant difference between the mean fertility rates based on these three buckets and the maternal age. We first started by visualizing these three categories with a map that we made with JMP's Graph Builder. From there, we observed that there was an uneven number of states in each of the restriction categories. We used the common practice of running the analysis on rank-transformed data to avoid any challenges from the imbalanced data.
I'll demonstrate first how we add this ranking to the data set, and then we'll go through the two-way ANOVA. First, you start with Analyze, Distribution. We're going to put in the variable that we want to rank transform, which is fertility rate, add it to Y, and hit OK. We go down to our lovely red triangle, then to Save, then Ranks. That's going to save the ranking of the fertility rates, from lowest to highest, onto our main data set. We're just checking that it's there. Next, we're going to use that ranked fertility rate to do our two-way ANOVA. Back to Analyze, Fit Model. We're going to add in not the original fertility rates, but the ranked fertility rate that we just created, as our Y. We're going to add in our two independent variables as a full factorial, so that we get the two-way model for maternal age and restriction category, and then we'll hit Run. It gives us the same outputs that we would expect, but using those rank transformations. We can go down and look at the effect summary to see that there are statistically significant interactions going on there. The two-way ANOVA identified that there were statistically significant differences in those fertility rates based on maternal age and the presence of restrictions. Specifically, we observed that women ages 15 to 29 had lower fertility rates when there were no restrictions present in their state. The next objective was to identify patterns in the state abortion restriction composition. We used a latent class analysis because we had binary, yes-or-no indicators for six different categories of interest. They were related to government funds, government facilities, and other various programs like state insurance programs for state employees. We'll demo how we use JMP's clustering to group those states together based on these six different categories, and then how we use Graph Builder to help us display the results and interpret them a little more easily. To run the latent class analysis, we're going to go to Analyze, Clustering, Latent Class Analysis. We're going to put our six binary indicators into Y. We could adjust the number of clusters and run more than just three, but we're going to use three for our purposes to keep it simple. After running it, we have a high-level characterization of each one of the clusters. But let's create an additional visualization that will start helping us interpret how the states fell out in each one of the three clusters. To do that, we're going to go to Graph Builder. We're going to take our state of residence and put it down in Map Shape, which lets JMP know that we want to make a map. Then we're going to take our most likely cluster, which is added to the data set after we run the LCA, and use that to color it. This gives us a really quick look at which of the three clusters each of the states that actually had restrictions fell into.
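For readers who would rather reproduce those steps in JSL than through the menus, here is a hedged sketch. The column names are placeholders for the combined data set, Col Rank() stands in for Distribution's Save Ranks option, and the Latent Class Analysis and Graph Builder arguments should be checked against scripts saved from those platforms.

dt = Current Data Table();

// Rank-transform the response (roughly what Distribution's Save Ranks produces)
dt << New Column( "Ranked Fertility Rate", Numeric, Continuous,
	Formula( Col Rank( :Fertility Rate ) ) );

// Two-way ANOVA on the ranked rates with the age-by-restriction interaction
Fit Model(
	Y( :Ranked Fertility Rate ),
	Effects( :Maternal Age Group, :Restriction Category,
		:Maternal Age Group * :Restriction Category ),
	Personality( "Standard Least Squares" ),
	Run
);

// Latent class analysis on the six binary restriction indicators (placeholder names)
Latent Class Analysis(
	Y( :Restriction 1, :Restriction 2, :Restriction 3,
	   :Restriction 4, :Restriction 5, :Restriction 6 ),
	Number of Clusters( 3 )
);

// U.S. map colored by each state's most likely cluster
Graph Builder(
	Variables( Map Shape( :State of Residence ), Color( :Most Likely Cluster ) ),
	Elements( Map Shapes( Legend( 1 ) ) )
);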
Then you're able to use Graph Builder again to create other visualizations, like we did, that allow you to compare each state's composition of restrictions and its fertility rates using the three clusters that we created with the LCA. When we did that, what we observed is that restrictions on multiple types of public resources were often associated with higher fertility rates for those states. I just want to thank you for viewing the poster session. The goal was really to demonstrate how we used JMP to examine the relationships between abortion restrictions that were targeting public resources and fertility rates across multiple maternal age groups. By performing the two-way ANOVA on that rank-transformed data, we observed that women ages 15 to 29 had lower fertility rates when there were no restrictions. Through the use of latent class analysis and visually analyzing the results, we observed that restrictions on multiple categories of public resources were associated with higher fertility rates. Thank you again.
App Builder was introduced in JMP 10. In the beginning, it had its bumps and warts and was, arguably, a bit of a challenge to use. As a long-time JSL scripter, I, too, was reticent to use App Builder and preferred to code everything from scratch, even dialog boxes. But I got tired of the work needed to make small visual tweaks to my dialogs and custom report windows, so I gave App Builder another look -- and have not looked back. If it's worth an interface, I'm using App Builder.   In this session I talk about some of the basics of creating applications with one or more interactive elements. I discuss key elements that make App Builder great, along with some lesser-known features. Topics include building multiple module applications, scoping, managing windows, working with the App Builder interface, how JSL is different with App Builder, and working with other JMP elements such as Data Tables, Add-In Builder and Workflows.     In  this  session,  we're  going to  talk  about  Application  Builder.   If  you've  ever  had to  build  an  application  in  JMP that  required  visual  elements, say,  a   Report window  or  a dialog  box, Application  Builder is  the  tool  that  you  want  to  use. It  will  allow  you to  create  that  application more  quickly  and  more  easily. Additionally,  it  will  let  you  create an  application  that's  more  compact, that's  easier  to  maintain, and  more  robust. I'm  going  to  assume that  you're  watching  this because  you're  interested in  Application  Builder, but  you  also  have a  little  bit  of  JSL  background. You  may or  may  not  have  used  Application  Builder, but  you  know  a  little  bit about  scripting   in JMP. Let's  get  started. I've  got  two  examples that  I'd  like  to  use  to  illustrate some  of  the  visual and  programming  characteristics that  might  not  be  obvious or  that  might  not  be  documented for  the  Application  Builder. I  am  going  to  start with  a  very  simple  example that  I've  actually  attached to  a  menu  here. It's  just  a  dialog  box. It's  going  to  allow  me to  navigate  my  directory  structure either  from  a  tree  that  I  already  have  set or  I  can  pick  a  different  directory. Once  I'm  there, I  can  look  to  see  whether  or  not files  of  specific  types  are  available, select  that  file,  and  then  open  it  up. Now,  one  of  the  important  aspects of  this  particular  example is  that  it  doesn't  involve  a  data  table. We're  going  to  see  in  the  second  example how  to  deal  with  data  tables, when  I  want  to  incorporate data  tables  into  my  application. It's  a  little  bit  trickier, so I'm  saving  that  example   for second. Let  me  put  this  off  to  the  side so  we  have  something  to  reference. Okay,  so  let's  go  ahead and  get  started  with  Application  Builder. Under  Files,  New,  Application. I  will  almost  always  start with  this  blank  application. Reason  I  do  that  is  I  can  always  get to  any  of  those  templates. I  could  build  any  of  those  samples starting  with  this  blank  application. Blank  application  is  just  going  to  give  me more  flexibility in  terms  of  where  I  want  to  go. Now,  if  you're  unfamiliar with  Application  Builder, if  you've  never  used  it  before, the  layout  is  relatively  straightforward. 
On  the  far  left, we've  got  our  source  panel, and  in  that  source  panel are  all  of  the  items  that  I'm  going  to  use to  display  information, to  organize  elements,  and  to  do  things, things  like  buttons  and  checkboxes. In  the  middle,  I've  got  my  palette where  I  actually  build  the  visuals of  my  application. On  the  right, I  have  an  Objects  window  that  shows  me the  tree  structure of  the  item  that  I  built. I've  got  my  Properties  window where  I  can  select  an  item and  change  its  properties. That's  one  of  the  things  that  makes Application  Builder  faster  to  use, is  that  I  can  go  in  there, I  could  select  an  item,  select  an  object, and  change  the  properties  interactively, not  having  to  worry about  what  the  name  of  the  message  is or  writing  up  the  code to  make  those  changes. Now,  what  I  find is  that  I  often  use  the  same  properties when  it  comes  to  specific  elements. Things  that   typically  have some sort of  text  element, I  like  text  of  a  certain  size and  a  certain  style. Certain  container  boxes, I  like  to  have  borders  around  them. Rather  than  drag  and  drop  items into  this  palette  and  make  changes, what  I  often  do is  I  will  start  with  a  template. Here  I've  created  a  template that  will  allow  me  just  to  copy  and  paste from  the  template  into  my  palette. For  example,  for  my  list  box, I've  changed  the  font  on  that  list  box, changed  the  font  size. I'm  just  going  to  Control  C  to  copy  that. I'm  going  to  paste  it into  my  new  application. The  great  thing about  working  from  a  template is  that  it  works for  composite  items  as  well. In  this  case,  I've  got  multiple  items. I  have  two  button  boxes that  are  stuck  next  to  one  another using   a horizontal  list  box, and  then  I've  got  a  horizontal  center  box. I  can  actually  copy  those, Control  C,  copy  those  en masse, and  paste  those  into  my  application. Now, a  couple  of  things  about  item  selection. If  you  work  quite  a  bit  with  PowerPoint, one  of  the  things that  you're  going  to  find  out is  that  to  select an  item  in   Application  Builder is  a  little  bit  different. Whereas  in  PowerPoint, you  need  to  select  the  entire  item, with  Application  Builder, all  you  need  to  do  is  select  part  of  that. You'll  notice  that  by  selecting part  of  that  horizontal  center  box, the  entire  box  is  selected. That  makes  items  much  easier  to  select in  the  sense  that  all  I  need to  do  is  grab  part  of  the  item to  select  the  entire  item  in  entirety. Now,  I  have  a  second  option that's  available  to  me  as  well. That  is,  if  I  find  an  item  hard  to  select, I  might  have,  let's  say, a  hierarchy  of  container  boxes and  it's  hard  to  get  to  the  right  box, I  might  have  tiny  objects that  are  behind  other  objects and  it's  just  hard  to  grab  onto. I  can  always  make  my  selection from  my  Objects  panel  as  well. Here,  I'll  select  my  Objects  panel and  you'll  notice that  it  selects  the  item  in  my  palette. The  only  drawback  to  this is  that  I  can  only  select  one  item from  the  Objects  panel. But  again,  it  makes  it  very  handy if  that  item  is  very  hard  to  grab  onto. Let  me  move  on to  my  next  couple  of  items I  want  to  talk  about. 
To  do  that,  what  I'd  like  to  do is  move  on  to  a  partially  built… Here,  I've  partially  built  my  application. Here,  I've  got  all  the  components. I  just  need  to  group  them. That  brings  me to  another  piece  of  functionality with  an  application  builder that  you  might  not  be  aware  of. Let's  take  these  three  items  here. What  I  want  to  do  is  I  want to  group  them  together  horizontally. You  might  think,  "Well,  to  do  that, I'll  take  a  horizontal  list  box, I  will  drop  it  into  my  pallette, and  then  drop  those  items and  position  them  in  the  palette." When  I  am  putting multiple  items  into  a  container, I  find  it  much  easier to  select  those  items, right-click,  and  say  Add  Container. By  doing  so, I  can  apply  the  container  to  the  items rather  than  the  other  way  around. One  of  the  other  advantages of  having  this  functionality is  that  if  I  were  to,  let's  say… While  the container is  still  selected,  right-click, I  can  actually  change  the  container. Let's  say,  I  don't  want  an  H list  box. I  want  an  outline  box. Maybe  that  H list  box  was  better. I  can  also  change  that  container. That  works  from  the  workspace, and  in  addition, it  works  from  the  Objects  panel. You'll  notice  that  if  I  right-click over  the  item  in  the  Objects  panel, I  can  also  change  the  container, I  can  add  a  container,  and  so  on. Now,  in  certain  circumstances, there  is  an  additional piece  of  functionality, and  that  is  the  ability to  remove  a  container. I  can  remove  a  container  anytime it  is  not  the  lowest- level  container. For  example, if  I  were  to  put  this  in  another… Let's  add  an  outline  box  to  that. Now,  what  I  can  do… Let  me  just  move  this so  this  is  out  of  the  way of  the  other  items. Now  that  I've  got  that  outline  box with  a  horizontal  list  box and  then  all  my  items, I  can't  remove  that  horizontal  list  box because  it's  the  lowest  level, but  if  I  were  to  right-click on  the  Outline  box,  I  can  remove  that. Again,  sometimes  it's  easier  to  just  grab the  items  and  apply  the  container  box to  those  items  rather  than  dropping  them into  the  container  box. Let  me  recap some  of  these  tips  that  we  talked  about. Again,  I  can  organize that  source  panel  items by  either  grouping  them  or  alphabetically. I  don't  think  I  showed  that, but  just  let  me  point  out that  if  you  go  onto  the  hotspot of  the  Application B uilder, go  to  Source  Panel. If  I  were  to  change  that  group  by  column, they're  in  groups. If  I   would  prefer  them  alphabetically, I  have  that  option  available  to  me. However  you  find  it  easier to  recognize,  to  find  those  objects, I  can  reorganize  that  source  panel. Work  from  a  template to  make  things  easier  and  faster, not  having  to  change  properties for  items  that  you  always  change. Applying  the  container  to  the  objects rather  than  dropping  the  objects into  the  container is  often  much  quicker. The  Object  panel  is  there  to  do  selection, to  add  containers, remove  containers,  and  so  on. Copy  and  paste  works  for  a  single  item or  for  multiple  items  as  well. Okay. Let  me  move into  some  of  the  scripting  concepts. To  do  that, I  have  my  pre-built  dialog  box  here. 
I've  got  everything  organized the  way  I  want  it  to  be  organized. I  don't  have any  scripts  associated  with  it. Now  there's  a  couple  of  things that  I  need  to  point  out  visually that  really  are  implemented via  the  scripting, but  they're  important  to  know  about. Not  every  property  from  every  item is  available  to  change  interactively. For  example, you'll  notice  that  this  list  box, I  still  have  an  item  in  it. Well,  what  if  I  wanted  to  start that  list  box  with  absolutely  no  items? As  it  turns  out,  interactively, there's  no  way to  get  rid  of  that  last  item. I've  got  to  do  that  using  a  message. Same  thing  goes  for  setting. If  I  want  to  set  this  check box, if  I  want  to  set the  first  selected  item  to  be  JMP  files, that  has  to  be  done  through  messaging. That's  not  available  interactively. The  place  where  I  do  that, I'm  going  to  go  to  the  scripting  tab. I've  got  namespaces. I've  got  different  options for  application,  my  module. We'll  talk  a  little  bit about  the  namespacing in  the  next  example. But  anytime  I  generate  a  module, I'm  going  to  have a  different  module  container for  my  script, and  then  I'm  going  to  have  one for  the  overall  application. I'm  going  to  go  to  the  module, and  what  I  want  to  do is  I  want  to  set  that  list  box  to  be  empty when  it  starts. Let  me  go  back  and  select  that  list  box. I  see  that  the  name that  I'd  given  to  that  list  box is   lbfiles. I'm  going  to  have  to  message  that. Now  it's  important  to  know that  when  I  message an  object  that  is  visible, I  can  only  do  so after  those  objects  are  created. That  message  has to  appear  after  that  line. That  is  a  line  in  which the  visual  items  are  instantiated. After  they're  instantiated, I'm  going  to  say   lbfiles,  Set  Items, and  we're  just  going  to  leave  it  blank. That's  just  basic  JSL  scripting that  you  should  know how  to  do  or  you  should  be  familiar  with. Now,  another  important  point is  that,  let's  say  at  this  point, I  want  to  test  it  to  make  sure that  that  is  working  properly. When  I  run,  when  I  debug  a  script, I  don't  do  it  from  my  Edit  menu, nor  do  I  do  it  from  my  options if  I  have  a  run or  debug  script  available  in  my  icons. All  the  running  and  debugging has  to  happen  from  the  hotspot. I  have  two  options  here. I've  got  the  Run  and  Debug  Application. This  is  how applications  are  run  and  are  debugged. Great,  I  ran  the  application. It  looks  like it's  removed  that  initial  item, so  things  are  good  to  go. Now,  let's  talk  about  getting  scripts into  things  like button  boxes  or   checkboxes. What  if  I  wanted  to  associate  a  script? Let's  start  with  this  Cancel  button. One  of  the  things that  the  Cancel  button  does in  the  application is  just  dismisses  the  dialog  box. I've  got  two  places where  I  can  put  that  script. I  can  put  that  script in  the  Scripts  window where  we  saw  before  where  I  had that  initial  setting  of  my  list  box. Or,  again,  I'll  select  my  Cancel  button. If  I  scroll   into  the  properties down  in  the  Properties  panel, I  see  this  item. I'm  going  to  hover  and  you  should  see where  it  says  Edit  Script. I've  got  this  area where  I  can  write  my  own  script. 
Back in the Properties panel, if I want a little bit larger area to work with, like the Formula Editor, I can click on this and write my script here. I'm going to go ahead and do that. We'll talk a little bit about the namespaces later, but what I need to know is that this is called thisModuleInstance, and I have to use the message Get Box. That will return a reference to the window created by the module, and then I send it Close Window. That should work. Whenever I use scripting in that fashion, whenever I don't have a function name associated with a script, the documentation refers to it as an anonymous script. Again, I might want to test that, so let's go ahead and run the application. I'll hit the Cancel button, and that works. That's one place I can put my script. I had mentioned that I can also put the script in the Script tab. Now, whether you put it in the Properties window or the Script tab is really a personal preference. I tend to avoid the Properties window simply because I forget that I put scripts in there and then can't remember where a particular script is. What I'm going to do now is press Control-C to copy this item, and let me just clear this box out for now. Any time I want to create a script for any of the objects, the quickest way to get that script into the Script tab is to hover over the object, right-click, and you'll notice at the very bottom there will be one or more selections. Most of the objects within Application Builder only do one thing; the Text Edit and Number Edit Boxes do a couple of different things, and Mouse Box does a whole bunch of different things. Here, I'll select the script I want: this is what happens when I press the button. You'll notice that as soon as I do that, it generates my function. It gives me a little stub of a function, gives it a name based on the variable name I gave the button, and gives me a little fill-in for some code. Here is where I would put the script that I showed you earlier, so let me just paste that in there. Okay, a couple of things to point out. First, a default argument, a pointer to the object, is supplied when you generate a script this way. You don't necessarily need it, but it's given to you by default. Also, Default Local is used, which means that anything defined within the function is local to that function. If you want to change that, you need to remove the Default Local. Again, let's go ahead and test this out. That works as well. So again, two options in terms of where you can put that script: either in the Script window or in the Properties panel. One final note about scripts in the Script tab is that they don't have to appear after the objects are instantiated. You can actually put them anywhere in the script. You don't need to have the objects available before that function is created; you can put it anywhere.
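To make that concrete, here is roughly what the Cancel button script described above looks like in both forms. This is a sketch: the function name is hypothetical (the generated stub is named after whatever variable name you gave the button), and Get Box and Close Window are the messages named in the demo.

// Anonymous script form (Properties panel): close the window this module created.
( thisModuleInstance << Get Box() ) << Close Window();

// Generated function form (Script tab). "this" is the default pointer argument,
// and Default Local keeps anything defined inside local to the function.
btnCancelPress = Function( {this},
	{Default Local},
	( thisModuleInstance << Get Box() ) << Close Window();
);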
As a matter of fact, I often separate out all my functions. I'll put those at the top of the script, and then I'll put all my messages to objects below the line where the objects are instantiated. That's about all I wanted to cover with this particular example, so I want to move on to the second example. In this example, I want to cover a couple of visual elements. As I mentioned, I want to talk about how you deal with data tables when you have a table that you want to use, and I want to talk a little bit about tab boxes, for when you have a tab display; some folks find them a bit confusing. Finally, I want to touch on two important scripting notes. One is: how do I pass information from one module to another? Often I will have one module, like a dialog box, that passes information to another module, such as a Report window, so I need to know how to do that. I also need to know a little bit about the namespaces that Application Builder generates. Let me start by demonstrating what this example looks like. In this example, you get a dialog box. You'll notice that my Run button is grayed out until I drop things into the list box on the right. I'm going to go ahead and click Run. Two JMP platforms are used: the Distribution platform is in the first tab, and the scatterplot matrix is in the second tab. Now, the beauty of the way this application was set up is that it doesn't matter which data table I use, how many columns there are, or which columns I pick. If I were to rerun this with a different set of columns, different column names… Let me just minimize that. My computer is having a little bit of a problem redrawing the screen. There we go. It works, regardless of the data table. Let me go ahead and close that. I am going to start with a fresh application to illustrate some of the visual aspects. I'm going to say New, Application, and again, I'm going to start with a blank application. What you'll see is that some of the items in that Source panel are not available. There are actually seven items that are not available and only become available when you open up a data table. If I have an application that I'm generating, and that application is going to be using a data table, what I like to do is start with something from the sample data directory. This example is not from the sample data directory, but if I were creating an application for general use, I would open something up from the sample data directory, build my report, and then use that as the basis for my application. The reason I do that is that when I start dropping in some of these items that require a data table, in this case this "column list box all" item, you'll notice that it's going to add that data table to my application. Now, with these types of applications, I need something there, at least as a stand-in.
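In script terms, that stand-in is nothing more than a reference to a table that is guaranteed to exist. A sketch, using an arbitrary table from the sample data folder (Big Class is my choice here, not the demo's):

// A stable stand-in reference; $SAMPLE_DATA resolves to JMP's sample data folder.
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );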
The advantage of having something from the sample data directory is that it's unlikely to change names, unlikely to change locations, and unlikely to be deleted. It's a safe bet in terms of what data table to use. You'll also notice that I can specify the path here and give it a different label. I don't have to start with the current data table, but that is the default, and that's what I use most often: I'm usually building an application for general use that's going to run on the current data table. I'm going to go ahead and change the name; I have a stock variable name that I use for that, so we'll change it to that. That's the way I will typically start an application that requires a data table. Before I move into the scripting, let me talk briefly about the tab boxes. If we scroll up to the containers, we see we've got two different tab boxes: a Tab Box and a Tab Page Box. As it turns out, the only one I really need is the Tab Box. If I drop that Tab Box in there (let me just call out the Tab Box in my Objects panel) and then drop any item into that Tab Box, let's just drop a Border Box in there, you'll notice that it automatically generates a Tab Page Box. I do not need to drop something into a Tab Page Box and then drop that Tab Page Box into my Tab Box. All I have to do is work with the Tab Box. If I want to add tabs, all I need to do is select that Tab Box, make sure I've got it selected here, right-click, and I have the option to insert tabs either before or after the currently selected tab, or, if I want to, I can also delete tabs. Now, one question I often hear is: I've got my tabs set up a certain way; how do I move a tab from one position to another? Currently, there's no interactive way to do it, unfortunately. What you would have to do is manually add a tab page and move the contents of the tab pages around. You might be tempted to save the script to a script window and change things that way, but I discourage folks from making alterations to the saved script, just because there are a lot of elements within that saved script that are not documented and that are particular to the Application Builder. I would stay away from making changes to that saved script. Let's do this. To talk about the scripting elements, I have pre-built an application. Actually, before I move on, let me point out a couple of things that occasionally happen to folks who wonder, "Well, why won't the display box work?" If you've had some experience with Application Builder, one of the things you might have heard of is a parameterized application. Let me go to pass number one. I've got my dialog box built. What I've done is open up a data table, generate a couple of reports, and just drop those reports into the blank Report window area. For example, what I did in this case was open up my Cars data table.
I had the Distribution platform generated, and all I did was drop that in and work with the report. Now, if I were to stop there and use that for my report, what would happen… Let me close out my items. I've actually done that and saved it here. What would happen is that unless I use the data table I started with, I will not get any results. Here, I've got my Air Traffic data, and I'm going to use that saved script. Everything looks good so far… and I get nothing. If I switch back to my Cars data and try the same thing, grabbing different columns and running it, I get the same columns I started with. The reason is that any time I use a Report window, if I want to change the columns that get used, or change the data table that gets used, I have to use a parameterized version. Let me go to the example of the parameterized version, and I will point out the differences. It's not that one; my parameterized version is here. Here's my parameterized version. The dialog box looks the same. My report looks slightly different in the sense that, if I select one of the reports, you'll notice at the very bottom I've got these roles, and they are filled out. If I do the same thing with the scatterplot and select that report, my roles are filled out there too. Now, when I run the application, I can change the data table. But in most circumstances, I am fixed to the number of columns that I used when I built the application. Let me show you an example of that. Let me close this out and go with the Air Traffic data table. Here's my second example, my parameterized example. You'll notice that when I call up the dialog box, I've got a space for the three columns that I initially used for Distribution, and I've got a space for the columns I used for my scatterplot matrix, but that's what I'm limited to. As it turns out, if I am dealing with platforms that come from the Multivariate menu, then I can use a variable number of columns. For anything else, I am fixed to the specific number of columns that I used when I built the application. Here's the workaround, here's the solution. Let me do this. I'm going to go back to my Cars data table and open up my final example. Again, the dialog box looks the same. The report does not have anything in it; I just have placeholders for my built platforms. Here's where the ideas of namespaces and passing variables become very important. When an application is created, there are two different namespaces that you have to be aware of. One is the thisApplication namespace, which holds variables that can be shared by any of the modules. If we look in the Objects panel, I've got my application here, and these three variable names are in the thisApplication namespace. Let me actually type that out: the name is thisApplication, and those variables live in that namespace. The other namespace I have to be aware of occurs in each one of my modules. Each module gets its own thisModuleInstance. You've already seen this when we instantiated objects in the visual parts of the application.
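Those two namespaces are what make the module-to-module hand-off described next work. Here is a rough sketch of the pattern, with hypothetical names (lbChosen for the dialog's list box, ReportModule for the second module); it shows the shape of the hand-off, not the exact script from the demo.

// Dialog-module side, e.g. in the Run button's script: collect the chosen
// columns and create the report module, passing them along.
chosenCols = lbChosen << Get Items();
thisApplication:ReportModule << Create Instance( chosenCols );

// Report-module side: the list arrives as an OnModuleLoad argument and is
// copied into a module-level variable before the platforms are built from it.
// OnModuleLoad( {chosenCols}, ... );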
So I have thisApplication and thisModuleInstance. How do I pass data from one module to the other to make sure things get done correctly? Well, where we do it in this application is in this spot right here. Here I'm referencing that second module, the report module. Let me also point out that I've got some options to change whether or not it gets launched at start; obviously, I don't want it launched until the user picks columns. I've got my report module referenced, I'm going to use Create Instance to create that Report window, and I'm going to pass the items that were put in the list box this way. Now, on the report module side, the business gets done in OnModuleLoad. Inside OnModuleLoad, everything defined within it is local to the function, so what I need to do is create a variable specific to the module that will store the data that gets passed in. I've done that right here. After that, I'm just using fairly standard JSL to generate my platforms using the platform calls. Everything you're seeing here is standard JSL, if a bit on the complex side. One more thing I want to briefly explain, which you might have noticed and which folks always ask about: if I go to one of my modules, there's a whole bunch of different module types I can pick from. A dialog box is exactly what you'd think, with no menu. A dialog box with a menu is a window with a menu. A modal dialog box is exactly that, a modal dialog box. Then there's a Report window; the difference between a Report window and, say, a dialog with a menu is that you can save a report to a journal, a JRN file. Of the other two items in there, there is a Launcher item, which is internally different from a dialog but externally more or less the same. Then a display box module is used as a template to be embedded in other modules; it does not generate its own window, but is used to embed in other windows. This is probably a good place for me to stop. There's lots more to talk about. As I mentioned, there will be a PDF that you can download, with all of the examples that we saw, some additional examples, and much more detail than I can explain here. Hopefully you found this all helpful, and hopefully it will motivate you to use Application Builder more often. Thank you.
A picture is said to be worth a thousand words, and the visuals that can be created in JMP Graph Builder can be considered fine works of art in their ability to convey compelling information to the viewer. This journal presentation features how to build popular and captivating graph views using JMP Graph Builder. Based on the popular Pictures from the Gallery journals, the Gallery 8 presentation highlights new views and tricks available in the latest versions of JMP. We feature several popular industry graph formats that you may not have known could be easily built within JMP. Views such as integrated tabular graphs, satellite mapping, formula-based graphs and more are also included, helping you breathe new life into your graphs and reports!     All  right . Welcome ,  everybody . My  name  is  Scott  Wise , and  we  are  going  to  talk about  pictures  from  the  Gallery  8 . Every  year  we  come  up  with  some fantastic  views  that  you  can  do  in  JMP that  you  might  not  have  known because  there  are  some  tips and  tricks  involved . We  definitely  are  excited to  show  you  our  next  release . Before  we  begin , I  always  like  to  start  off with  something  interactive  here . I  got  inspired  by  a  recent  trip I  took  with  my  daughter . We  went  to  the  National  Video  Game  Museum . It 's  up  in  Dallas ,  Texas , and  it  walked  through the  development  of  video  games . They  had  a  game that  I  used  to  love  to  play . It  was  called  Battle  Zone . You  kind  of  felt  like you 're  on  this  strange  planet , everything  was  in  3D and  you  felt  like  you 're  inside  a  tank . It  was  really  cool . To  make  this  game ,  they  had  to  overcome a  big  problem  with  graphics , which  is ,  if  you  have  3D  graphics , how  do  you  know  where  you  are in  relation  to  an  object ? If  there 's  a  wall  in  front  of  you or  are  you  in  the  wall   or  are  you  in  the  back  of  the  wall ? Are  you  in  front  of  the  wall ? It 's  obviously  something  they 've  overcome because  a  lot  of  the  games now  are  3D  and  first -person and  so  give  you  that  perspective . Well ,  actually ,  I  can  represent this  problem  in  Graph  Builder and  I 'm  going  to  challenge  you and  maybe  if  you  learn  this  trick on  how  they  solve  this  problem , it  can  help  you  maybe  win  a  bet sometime  down  the  road  here . Or  you  can  challenge  people  for  fun . What  I 'm  going  to  do is  I 'm  going  to  show  you  a  basic  shape and  there 's  going  to  be  two  points , a  point  A ,  a  point  B , I  want  to  know  if  point  A  is  inside or  outside  that  shape . If  point  B  is  inside or  outside  that  shape , and  you  are  going to  have  just  three  seconds . So  maybe  grab a  little  scrap  piece  of  paper and  a  pencil  so  you  can  write  this  down . I 'm  going  to  only  leave  it  up for  three  seconds . The  first  shape  is  going  to  be  a  polygon . All  right . Again ,  I 'm  going  to  show  this  to  you . I 'm  going  to  count  to  three , and  you  tell  me  if  point  A is  inside  or  outside  the  shape or  if  point  B  is  inside or  outside  the  shape . All  right . Are  you  ready ? All  right ,  here  we  go . All  right . Well ,  what  do  you  think ? Was  A  inside  the  shape  or  outside ? What  about  B ? Let 's  pull  t`his  back  up . Now  I  guarantee  probably   everybody  got  this  in  three  seconds . 
A  really  looks  like  it 's  inside this  little  U -shape  to  me . We  can  click  into  shapes  in  JMP we  can  color  the  background  of  shapes , and  that  makes  it  just  really  easy . So  A  is  inside  and  B  is  outside . Okay . I  think  you  got  this  down . Well ,  what  if  we  make this  a  little  more  challenging ? I  have  another  shape . In  this  shape , it  seems  it 's  going  to  be  a  polygon . I  think  it  looks  like  a  spiral . I 'm  going  to  do  the  same  thing . We  got  point  A ,  point  B . Tell  me  if  it 's  inside  the  shape of  the  spiral  or  outside . Same  thing  with  point  B . All  right . Ready ?  I 'm  going  to  going  to  launch  it and  I 'm  going  to  give  you  three  seconds . All  right . What  do  you  think ? Is  point  A  inside  the  shape  or  outside ? What  about  point  B ? All  right . Probably  didn 't  have  enough  time . Maybe  some  of  you were  trying  to  use  your  fingers and  maybe  trace  the  shape . Three  seconds is  not  enough  time  to  do  that . Now ,  again ,  within  JMP because  we  can  click  on  points , I  can  click  on  this  point and  I  can  see  that  point  A  is  inside and  point  B  is  outside  of  the  shape . But  that 's  not  the  easiest  way  to  do  this . From  a  computer  programing  standpoint , they  needed  a  better  way . There  is  a  method  actually  out  there that 's  actually  going  to  help  us  do  this , and  it  is  called  Ray  Casting . It 's  pretty  simple . It  just  involves … From  whatever  point  you  care  about , you  just  draw  a  line  moving  away in  any  direction  from  the  point and  you  count  the  number  of  lines in  the  shape  it  intersects , walls  of  the  shape  so  to  say . If  it  crosses  an  odd  count  of  lines  as  it 's  moving  outside  the  shape , it 's  in . If  it  crosses  an  even  count  of  lines , it 's  outside . All  right .  Well ,  let 's  see how  that  works  in  practice . Here 's  the  U -shape . All  we  got  to  do  is  just  draw  a  line and  see  how  many  times  it  intersects . I 've  done  this  in  JMP . I 'm  going  to  show  you  a  little  later  how you  can  draw  these  confidence  intervals , raise  these  lines out  of  points  within  JMP . But  I 'll  go  ahead  and  open  one  up  here . Let 's  take  a  look  at  point  A . There 's  point  A . Let 's  go  ahead and  see  how  many  points  it  has before  it  exits  the  total  shape  here . There 's  a  one , there 's  a  two ,  there 's  a  three . There  were  three  walls it  crossed  before  it  went  out . That 's  odd  so  it 's  in . This  is  one  of  the  few  places , if  you 're  odd ,  you 're  in . You 're  in  the  shape  if  you 're  odd . But  what  about  B ? Well ,  it  doesn 't  matter  which  way  you  go  with  B , let 's  go  in  this  direction, it  hits  that  wall, it hits that  wall ,  it 's  even . If  it 's  even ,  it 's  out . It  works  whether  you  go  left  or  right . Very  cool . Well ,  what  about … That 's  okay . But  you  could  have  eyeballed  that  one . What  about  that  nasty  spiral  shape ? Well ,  let 's  take  a  look  at  it . Let 's  go  ahead  and  take  B . I 'll  come  down  here  to  B . Let 's  just  go  this  direction . One  wall ,  two  wall ,  that 's  even . Let 's  take  A . Way  down  here  at  the  bottom . All  right . Let 's  see . One  wall ,  two  wall ,  three  wall ,  four  wall . 1 ,  2 ,  3 . I  missed  one ,  five  walls . 
Five  walls  it  goes  by . So  it 's  odd   and  therefore ,  it 's  in  the  shape . That 's  how  it  works . Basically ,  this  algorithm  drives all  those  3D  video  games and  all  those  3D  images  that  you  see and  just  a  really  cool  thing  you  can  do . I 'm  going  to  talk  a  little  bit about  drawing  these  lines  in  a  graph in  our  pictures  from  the  gallery . But  I  thought  that was  a  fun  interactive  example to  get  us  started  with  our  talk  today and  maybe  give  you  something you  can  amaze  your  friends  with . All  right .  Let 's  talk  about the  pictures  from  the  gallery . We  show  six  advanced  views that  either  we  challenged  ourselves to  come  up  with , our  customers  using  JMP  challenged   just  us  to  come  up  with , or  what  we  just  saw look  so  cool  and  we 're  like , how  do  we  do  this  in  JMP   and  Graph  Builder  can  do  about  anything . In  this  case ,  we 're  going to  look  at  formula -based  graphs . I  have  an  actual  formula  in  a  column . Can  I  have  that  work  within  Graph  Builder ? What  about  tabular  data ? Can  actually  have  information lined  up  like  report ,  tab  data , lined  up  underneath  my  graphic  shapes ? That  would  be  really  cool . What  about  an  input -output  parallel  plot or  we 'll  call  that  a  flow  parallel  plot . That  might  be  really  cool . Forest  plots . Forest  plots  help  you  look  at  means and  confidence  intervals and  you  can  eyeball  control  them . This  is  really  popular in  health  and  life  science . We 're  going  to  look  at  that  one. Percent of Factor . Everything  scales  to  100 %. Nice  way  to  compare  things using  bars  that  go  from  0 -100 %. You  can  see  what  segments here  account  for  what . Last  but  not  least , mapping  and  this  is  doing satellite  drill  down  something  you  can  do . The  things  I 'm  going  to  show  you, this  Mapbox  mapping  in  the  tabular  data , these  are  actually  new  features  in  JMP  17 . The  others  could  be  done with  older  versions  of  JMP , but  we  definitely  want to  feature  a  couple  new  things that  have  come  out in  the  latest  release  of  JMP . All  right .  I 'm  going  to  give  you this  journal ,  that 's  the  reward for  attending  this  talk . When  you  get  this  journal ,  it 's  going to  have  all  the  information  you  need . It 's  going  to  have  a  picture of  what  we 're  trying  to  replicate . Why  it 's  good ,  tips . It 's  going  to  give  you  the  raw  steps on  how  to  create  these . You  will  also  include  the  data with  the  scripts  to  recreate  it  here . All  right .  This  first  one  is  actually bringing  in  a  formula  into  JMP , which  is  really  cool . There 's  tips  to  do  this . The  tip  is  you  must  have  a  formula  column  within  your  data  table  and  JMP . That  makes  sense . But  you  need  to  find  a  way  to  include all  the  elements  of  the  formula . If  you  have  an  X  and  a  Y  in  your  formula , the  X  and  Y  need  to  be  somewhere in  one  of  the  landing  zones within  the  Graph  Builder . A  better  way  to  put  it . All  right . You  have  those  steps  if  you  need  them . I 'm  going  to  do  this  one  just  from  the  data  in  the  journal . Let 's  see  what  we  have  here . This  was  real  data  that  my  father asked  me  to  help  him  with . He  was  actually  trying  to  decide on  buying  a  garden  hose . 
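Before going on with the garden-hose example, the ray-casting rule walked through above is simple enough to script directly. A small stand-alone JSL sketch (illustrative only, not from the talk's journal): pt is an {x, y} point, poly is a list of {x, y} vertices, and the function returns 1 when the crossing count is odd, meaning the point is inside.

// Count how many polygon edges a rightward horizontal ray from pt crosses.
pointInPolygon = Function( {pt, poly},
	{Default Local},
	n = N Items( poly );
	crossings = 0;
	For( i = 1, i <= n, i++,
		j = If( i == n, 1, i + 1 );         // wrap around to close the shape
		x1 = poly[i][1]; y1 = poly[i][2];
		x2 = poly[j][1]; y2 = poly[j][2];
		// edge straddles the ray's height, and the crossing lies to the right of pt
		If( (y1 > pt[2]) != (y2 > pt[2]),
			xCross = x1 + (pt[2] - y1) / (y2 - y1) * (x2 - x1);
			If( xCross > pt[1], crossings++ );
		);
	);
	Mod( crossings, 2 ) == 1;               // odd = inside, even = outside
);

// Quick check on a unit square: {0.5, 0.5} is inside, {2, 0.5} is outside.
square = {{0, 0}, {1, 0}, {1, 1}, {0, 1}};
Show( pointInPolygon( {0.5, 0.5}, square ), pointInPolygon( {2, 0.5}, square ) );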
He  was  doing  a  lot  of  washing  of  his  patio and  his  siding  at  his  house , and  he  wanted  to  make  sure that  he  had  the  best  water  flow . Well ,  there  is  a  formula  for  water  flow , and  here  is  that  formula . It  matters ,  the  diameter  of  your  hose . You  have  a  three -fourths -inch  diameter , a  half -inch  diameter . It  matters  how  long  the  hose  is . I  guess  the  distance  between  the  spigot  and  the  spray  attachment or  the  end  of  the  hose . That  worries  what  kind of  water  pressure  you  have coming  out  of  your  initial  spigot . You  have  40  pounds ,  60  pounds ,  so  on . There 's  a  formula  in  here and  easy  to  create  formulas using  JMP 's  formula  editor I  have  other  information . I 'll  turn  on  these  little  header  graphs . It  looks  like  I  have  from  0 .75 ,  the  0 .5 even  have  a  0 .625  hose  diameter . Looks  like  I  got  40 ,  50 ,  and  60  water  pressures running  through  that  formula and  it  looks  like  we  collected  data  for , looks  like  from  25 -50 to  75 -100  hose  lengths . So  the  100  feet . All  right . I  have  all  this  information . Let 's  just  go  to  Graph  Builder and  let 's  start  to  fill  it  out . Here  are  all  my  landing  zones . I 'm  going  to  take  the  one  that has  the  formula  in  it  and  put  it  in  first . That 's  the  water  flow . I  put  it  in  the  Y  here . I  think  I 'll  put  the  length down  here  on  the  X . Looking  a  little  more  interesting . Maybe  diameter  would  be a  good  thing  to  overlay  by . I  overlay  and  so  you  can  see I  get  three  different  lines  there for  the  hose  diameter . I  knew  the  water  pressure and  he  was  probably  going  to  go to  the  water  pressure  he  had . I 'm  going  to  put  that  on  the  Group  X instead  of  having  three  panels  here , I 'm  going  to  right -click and  I 'm  going  to  go  Level  in  View and  I 'm  going  to  go , let 's  just  do  one  at  a  time . Now  we   can  flip through  these  and  see  them . Okay ,  now .  How  to  represent  this . It 's  okay  to  have  points , but  this  smoother  line  is  not  the  formula . It 's  just  doing  some  sort of  spline  smoother  through  here . That 's  not  really  helping  me . What  about  if  I  select , I  don 't  know ,  a  straight  line ? Well ,  that 's  not  really  reflecting . That 's  just  connecting  the  points . That 's  not  reflecting  the  formula . To  do  the  formula ,  I  can  select this  little  formula  icon  here , or  you  can  even  right -click in  here  and  just  go , hey ,  line ,  change  that  out  to  formula . Now ,  it  is  reading  the  formula . It  will  not  work  unless  all the  elements  of  the  formula  are  there . See  if  I  take  this  overlay  diameter  out , you  see  the  line  disappears . It 's  got  to  have  all  the  elements of  the  formula  somewhere accounted  for  in  a  graph  element . But  now  it 's  pretty  cool . Now  I  can  sit  there  and  see  that , oh ,  looks  like  high  diameter ,  0 .75 . The  shorter  the  hose  length , the  higher  the  water  flow is  going  to  be  at  40 . It  looks  like  it 's  holding  for  50  or  60 . Something  you 're  going to  see  me  do  a  lot  as  well  is I 'm  going  to  show  you how  you  get  that  little  picture  in  here . It 's  easy  if  you  have  a  picture just  like  a  jpeg . It 's  easy  just  to  drag  it right  into  your  graph . Now  I  have  it  dragged . 
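For the scripting-minded, the formula-based view assembled above corresponds roughly to a Graph Builder script along these lines. Treat it as a sketch: the column names are assumptions based on the demo, and the Formula element keyword is my reading of what the formula icon produces, so check it against the script saved with the journal's data table.

// Rough sketch of the layout: formula column on Y, hose length on X,
// diameter as Overlay, water pressure as Group X, points plus the formula curve.
Graph Builder(
	Variables(
		X( :Length ),
		Y( :Water Flow ),       // the formula column
		Overlay( :Diameter ),
		Group X( :Pressure )
	),
	Elements(
		Points( X, Y, Legend( 1 ) ),
		Formula( X, Y, Legend( 2 ) )
	)
);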
If  you  right -click  into  the  graph , there 's  a  section  for  images and  you  can  size  it . I  usually  use  this  fill  graph . Then  you  can  right -click  again and  you  can  even  make  it  transparent . I  like  to  do  that  so  I  can  see  the  points , maybe  make  it  only  a  40 %  clear , so  I  can  see  the  lines popping  through  there . Now  that 's  a  cool  graph and  that 's  how  you  do  this  view . Now  we  can  go  and  pick the  right  size  hose  that  we  want  to  use no  matter  the  water  pressure  we  have . All  right . Again ,  you  have  that available  to  you  at  any  time . Remember ,  I  have  the  scripts saved  to  the  data . You  can  click  on  it and  you  can  recreate  it  at  any  given  time . All  right . That 's  the  first  one . We 'll  go  through as  many  as  we  have  time  for . I  think  we 'll  get  through  all  six  today . I  have  them  ordered  in  terms of  when  I 've  shown  this  before and  what 's  the  most  popular. The  next  most  popular  one is  this  tabular  data . This  is  something  that became  available  in  JMP  17 . Why  this  is  nice  is … It  used  to  be  when   I  made  a  nice  graph  here , I 've  got  box  plots  up  in  this  area . It  used  to  be  I  had  to  go and  create  in  a  separate  window , maybe  something from  Tabulate  to  create  a  table and  just  had  to  line  those  two different  graph  windows  up . But  what  if  I  wanted  it  right  underneath ? Well ,  we  now  have  that  capability and  it 's  going  to  be  actually using  new  features  and  caption  boxes that  are  in  JMP  17 and  it 's  going  to  help  us  with  tables . It 's  even  going  to  help  us with  recalculating  reference  lines . Well ,  that 's  cool . What  does  that  look  like ? All  right .  I  have  this  data  set  here . It 's  cool  when  it 's  got all  this  chemical  production . I  got  this  rate  of  reaction and  I  have  these  different  vendors . Say  I 'm  just  really  interested in  graphically  seeing a  difference  in  the  vendors by  the  rate  of  reaction  here . Put  that  on  the  Y , vendor  on  the  X . Maybe  points . It 's  not  as  interesting as  maybe  a  box  plot . Maybe  I 'll  color  by  rate  of  reaction . Maybe  I  will  come  up in  this  little  bottom  left -hand  side , we  call  this  the  panel  boxes  here and  I  can  go  under  this  box  pop -panel and  I  say ,  give  me  a  confidence  diamond . That 's  pretty  cool . I  know  the  middle . That  diamond  is  where  my  mean  is . What  if  I  make  this  even a  little  more  interesting ? You  can  hold  your  control  key  down , shift  key  down if  you 're  using  a  Mac  like  I  am , I 'm  going  to  put  this, I  click . I  got  points  on  top  of  the  box  plots . I 'm  going  to  come  down  to  where this  point  is  and  say  summary  statistic , I  don 't  want  to  see  them  all . Just  show  me  the  mean . Yes ,  the  point  is ,  in  the  middle of  the  diamond ,  that  makes  sense . Oh ,  I  can  even  help  it  a  little  bit . I 'm  going  to  do this  air  interval  selection . I 'm  going  to  do a  confidence  interval . Now  I  can  see  the  ends  of  the  diamond . Oh ,  that 's  really  cool . Maybe  I  want  to  shade  it  all  in and  there 's  an  interval -style  here called  Hash  Band  I  like . 
Now  it 's  all  instead  of  those  lines , just  those  little  whisker  lines , now  I  have  this  little  shaded -in  square  and  that 's  pretty  cool . That 's  telling  me  maybe Acme  has  a  lower  rate  of  reaction than  somebody  like  Green . Acme  is  over  here , bluish ,  green  is  higher  and  reddish , although ,  the  box  plot 's  showing  me , there 's  a  lot  of  data  in  between , a  lot  of  spread  of  the  data . There 's  a  lot  of  variation  here . But  what  if  I  want  to  now bring  in  what  is  the  mean ? Not  only  what  is  the  mean for  the  rate  of  reaction  overall , but  what  is  the  mean for  Acme ,  Bloom ,  Green ,  and  this  Rizen ? How  can  I  do  that ? What  I 'm  going  to  do is  I 'm  just  going  to  right  click  in  here and  I 'm  going  to  add  a  caption  box and  you 're  like ,  Scott ,  that 's  boring . I  knew  how  to  do  that  in  JMP  16 . It 's  just  sitting  right  up  top  of  here . Yes ,  but  there 's  a  new  thing you  can  do  with  caption  boxes . There 's  a  location  area  here . For  the  mean ,  I  can  actually  say , you  know  what ,   make  it  an  axis  reference  line . Now ,  it  is  right  here  in  my  data . That 's  really  cool . What 's  really  cool  about  this, and I 'm  going  to  right -click  over  here , I 'm  going  to  turn  on ... I 'm  going  to  go  under  redo . You  might  have  seen column  switchers  before and  say ,  I  want  to  switch  out  the  rate of  reaction  with  some of  the  other  type of  continuous  factors  here and  now  get  my  little selection  box  over  here . I 'm  clicking  on  agitation . Do  you  see  it  recalculates the  mean  for  agitation and  here 's  the  mean  for  inlet ? This  is  much  better  than  right -clicking and  go  in  under  axis  settings and  setting  a  static  reference  line because  that  won 't  change . But  this  will  change  if  the  axis  change if  what  you 're  calculated  from  changes . That 's  pretty  cool , let 's  leave  it  at  rate  of  reaction and  let 's  go  ahead and  let 's  do  one  more  thing . I 'm  going  to  add a  second  caption  box . I  right -click  in  here , I 'm  going  to  go  add and  you  can  add  two  or  more , one  or  more  elements . Now  I  add  a  second  caption  box . This  first  one  is  doing an  axis  reference  line . It  doesn 't  know  what  to  do with  the  second  one . It  has  it  overlaying  on  top of  the  other  caption  box . Then  I  can  just  say ,  you  know  what ? This  one  make  it  into  an  axis  table . You  see  now ,  oh ,  it 's  lined  up right  underneath  all  the  labels and  underneath  all  the  columns ,  so  to  say , for  all  my  categorical  levels  here . I  can  even  add  another  summary  statistic and  I  can  go  and  do  like  standard  error and  I  can  just  keep  adding  more and  now  I  can  build  out  a  nice  table , I  can  say  done  here . All  I  really  would  have  to  do  now is  maybe  just go  under  the  Graph  Builder  red  triangle and  clean  up  the  legend  here . I  can  go  to  the  settings  here and  I  don 't  need  all  these  little  legends . I  can  just  keep  the  one for  the  color  gradient and  down  here  as  well . If  you  want  to  get back  to  that  control  panel . You  say  so ,  control  panel . There  was  one  more  thing I  was  going  to  show  you  here . You  see ,  I 'm  carrying four  decimal  points  down  here . 
The  caption  box will  let  you  change  the  format and  I  can  do  like  fixed  decimal  two . Now  that  looks  really  nice . Even  to  make  it  even  nicer , I  found  out  this  is  a  nice  little  trick . I  found  out  you  can  change the  legend  position . I  can  put  it  at  the  bottom . If  I  right -click  in  here and  go  to  the  gradient , you  can  even  make  it  horizontal . I  love  this  kind  of  horizontal  views and  now  it 's  a  much  more  compact  view and  it 's  going  to  look  a  lot  better when  I  start  to  change  things  around . All  right . A  very  cool  graph . I  would  like  to  give  some  thanks  as  well to  Joseph  Reese ,  one  of  my  peers who  helped  me  create  this  chart and  figure  it  out . Thank  you ,  Joseph . All  right . What  are  we  going  to  look  at  next ? The  in  and  out  parallel  plots . This  one  here . It 's   cool  because  I  often  did  work where  I  had  like  a  project  budget and  you  had  so  much  money that  would  go  into  the  total  budget and  then  you 're  pulling  out to  make  expenditures or  you  have  inputs and  outputs  of  a  process . I  used  to  do  a  lot of  input -output  boxes . Well ,  this  is  a  parallel  plot which  is  showing  me with  the  size and  the  width  of  these  bands . How  much ,  in  this  case  money is  coming  from  jobs  here . But  it 's  all  going into  one  big  bucket  here and  then  out  of  that  bucket , I  have  outflows . That 's  really  cool . How  do  we  set  this  up ? We 're  going  to  do  something that  enables  us  to  actually  look at  combine  data  in  a  parallel  plot . It 'll  be  a  little  easier to  show  you  by  hand . Here  is  my  data . Now  setup  is  everything  on  this and  every  row  here  is  an  expenditure . You  see , I  have  a  separate  amount  for  that . But  sometimes  those  expenditures get  rolled  up  into  groupings . Like  here ,  I 've  got  a  lot of  these  are  going  into  job , so  I  have  a  column  for  inflow and  I 'm  putting  the  category  for  inflow and  I  have  a  lot  that ... All  these  line  items of  inputs  go  in  to  job . That 's   cool . I  got  tax  refund . I 've  got  side  hustle  here . There 's  the  total  bucket and  you  can  see  I  can  start  with  outflows . Here ,  I  can  look  at  all  the  savings  here and  I  have  20K  versus  savings  here and  you  can  see  I  can  even  have  a  second outflow ,  which  breaks  that  savings  down to  where  what  type  of  savings  it  went  to . Some went  to  401K, some  went to  investment . If  you  have  things  set  up  like  this now ,  I  have  everything I  need  to  make  this  chart . I 'm  going  to  go  to  Graph  Builder . Border . I 'm  going  to  just  take all  the  categorical  factors and  I 'm  just  going to  dump  them  on  the  x -axis . I  might  color  by  the  outflow  one and  I 'm  going  to  size  by  the  amount . Now  it's  stuck  on  point so  I will  change  to  parallel  plot . I 'll  make  this  a  little  bigger . Now ,  if  I  put  the  control  chart  down , it 's  looking  okay . But  what  it 's  doing  now , it 's  taking  my  inflow  boxes and  then  it 's  slowly  breaking  them  out . Why  only  want  the  breakout  to  happen  here ? I  want  to  see  what  comes  in from  the  outflow and  what  goes  out  from  that  section . 
To  do  that , if  it 's  in  the  second  section , if  I  click  on  this  combined  data  sets , it  restarts  on  the  second  bar , so  to  say ,  of  the  parallel  plots . If  I  say  done , you  can  play  with ,  which  is , whether  things  are  ascending or  descending with  clicking  on  these  arrows  up  here , I 'm  going  to  click  on  a  few  of  these . Now  it 's  very  easy  for  me  to  see how  jobs,  side  hustle and  tax  refund  make  up  my  total of  101K and  now  I  can  see  something  like  auto and  that 's  very  cool . You  can  see  here  my  auto  was  11K and my  total ,  101K and  I  can  see  if  I  make this  even  a  little  bigger , I  can  see  that  it  gets  broken  out  among my  car  payment ,  my  gas  and  my  upkeep . This  is  just  a  really  cool  chart  to  use and  there 's  other  things  you  can  do to  make  it ,  enhance  it  a  little  bit  more . You  can  play  with  the  colors ,  but  a  really cool  inflow -outflow  parallel  plot . All  right . See  how  we 're  doing  on  time ? We 're  doing  pretty  good . Let 's  move  on  to the  next  most  popular  views  here in  our  pictures  from  the  gallery . This  one 's  going  to  be  a  forest  plot . Forest  plots  are  going  to  enable  you to  like  plot  means and  put  a  confidence  interval  around the  mean  that  you  can  compare  to  other means  with  confidence  intervals on  the  same  chart . It 's  very  popular , especially  in  health  and  life  science place  where  you 're  doing  a  lot of  summarization . But  we  have  a  cool  little  example here  on  how  to  do  forest  plots . Here  it  is . We  are  going  to  go  out and  we  are  going  to  buy  a  diamond . Maybe  you 're  getting  engaged . Maybe  you 're  getting  married  here . Now ,  people  always  talk about  that  cut  color  and  clarity and  there 's  all  these other  little  different  levels  within  them . Do  they  really  matter ? What  really  drives  what  you  care  about , which  is  what 's  the  average price  for  this ? How  much  am  I  going  to  have to  pay  for  a  diamond ? I  want  to  impress whoever  I 'm  getting  engaged  or  marry . But  want  to  do  it as  efficiently  as  possible . Let 's  take  care  of  this . Let 's  go  ahead  to  our  diamonds '  data . Let 's  take  a  look . I 'm  going  to  open  up the  column  headers  here . It  all  comes  from  one  table that 's  not  that  interesting . But  you  see ,  I  have  summarized . Each  row  is  summarizing  the  mean and  the  standard , the  mean  and  the  lower  and  upper confidence  interval ,  the  standard  error . Just  some  summarized  metrics here  for  it  looks  like  a  combination of  color ,  a  color  in  level  here . I 've  got  color ,  levels ,  I 've  got  clarity , I 've  got  cut and  just  all  kinds of  levels  within  there  that  I  want . To  look  at  this  data , I 'll  just  go  ahead and  put  the  Graph  Builder . I 'll  put  my  mean  price  down  at  the  bottom . Don 't  have  to  worry  about  the ... I 'm  not  going  to  worry  yet  about  the confidence  interval  around  the  mean . We 'll  do  that  last . But  let 's  go  ahead  and  put  the  X  here . There 's  my  color  clarity . I 'm  even  going  to  color  by  the  x . I 've  got  points automatically  being  driven ,  that 's  fine . Now  level ,  I 've  got  the  different  levels of  cut ,  clarity  and  color . Yeah ,  I 'm  going  to  move  it  here . 
But  you  see ,  you  can  embed  in  here . I  have  the  level  nested  within  my  x , which  was  the  cut  color  of  clarity . That 's  pretty  cool . Maybe  to  make  it  easier  to  segregate those  three  different  aspects , I 'm  going  to  right -click  here , I 'm  going  to  go  to  the  axis  settings and  I 'm  going  to  reverse  the  order . I 've  got  clarity  first and  under  the  X  tab  up  here , I 'm  going  to  show  a  grid that 's  going  to  draw  lines  there . Now  I 'm  looking  over  three  sections . This  is  okay , but  I  hate  the  eyeball  these  points . What  can  I  do ? Let 's  take  this  lower  95 and  upper  side  percent confidence  interval and  let 's  bring  it to  that  interval  landing  zone . You  see  what  it  has  done is  it  is  created  error  bars constructed  around  that  lower  95 and  upper  side 95 %  confidence  interval  around  the  mean . If  you 've  got  a  column  form , you  can  bring  them  into  that  landing  zone . You  can  even  bring  in  a  one -sided  one if  you  only  have  an  upper  or  lower . But  I  like  this  kind  of  shows me  where  those  points  are . You  can  right -click  in  here and  you  can  mess  with  the  marker  sizes and  you  can  make the  marker  sizes  all  big or  I 'm  going  to  make  them a  marker  size  a  five . Now ,  if  I  look  at  it now ,  I  can  answer  some  good  questions about  what  I  should  look  for  in  a  diamond . What 's  really  driving  the  price ? Remember  the  further  it 's  going  on  x -axis the  more  expensive  the  diamond  is . If  I  look  at  clarity , this  does  not  make  sense because  some  of  my  clearest these  things  this  IF it 's  almost  like  flawless  clarity or  very, very, very  subtle differences  in  it . This  is  actually  costing  less  than the  stuff  that 's  supposed  to  be  a  better . That  one  makes  no  sense . What  about  color ? D  was  supposed  to  be  the  best . K  was  supposed  to  be  the  worst. But  I 'm  seeing there 's  two  groups  over  here . What 's  really  driving  it ? It  looks  like  it 's  cut  and  the  ideal cuts  are  the  most  expensive  ones . Then  excellent and  very  good ,  then  good . You 're  going  to  go  buy  the  diamond , forget  about  the  color  rating , forget  about  the  clarity  rating , really  focus  on  the  cuts and  that 's  what 's  going  to  drive  price . All  right . Very  cool  to  do  that  type  of  graph . That 's  a  forest  plots . Very  easy  to  do  in  JMP . All  right . What  is  our  next  view ? Our  next  view  is  actually  something  cool . It 's  good  to  do  with  ranked  or  scale  data . It 's  percent  of  factor . This  is  something  most  people didn 't  know  we  had  the  capability to  do  in  Graph  Builder . We 're  going  to  look at  some  coffee  shops . I  have  all  this  data  from  my  hometown  here where  I  live  is  in  Austin ,  Texas , and  I  got  all  these  coffee  shops and  it  looks  like  there  were  reviews and  they  gave  them  ratings . Whether  the  ratings  low  or  high , 4  or  5  stars ,  the  best . I  even  have  things  like  sentiment , like  how ,  what 's  the  vibe  of  the  place ? That 's  something my  daughter  likes  to  say . This  place  has  good  vibes  or  bad  vibes . One  of  our  favorite  things is  checking  out  coffee  shops . Where  should  we  go  this  afternoon ? We  want  to  get  some  coffee  around  Austin . 
What  we 're  going  to  do is  we 're  going  to  set  this  up . I 'm  going  to  go  over  here to  my  Graph  Builder . I 'm  just  going  to  put my  coffee  shop  name  in  here . Okay ,  that 's  pretty  cool . Now ,  what  I  can  do , I  can  put  some  type  of  scale  here . I  have  the  rating . I  can  put  the  rating  down  here at  the  bottom , instead  of  points , I  can  ask  for  bars  to  be  done . That 's  not  too  interesting this  side  by  side ,  but  I  can  do a  percent  of  total . I  think  I  have  this ... Think  I  don 't  have  this set  up  correctly  here . This  is  a  good  thing  you  do when  you  got  everything  saved  for ,  yeah . I  can  come  right  back  in  here so  have  coffee  shop  name and  I  have  counts so  I  had  the  wrong  thing  on  there . You  want  something  continuous  on  there . We  had  the  actual  count  of  the  data instead ,  I  think  I 'm  going to  use  the  raw  numeric  rating instead  of  the  one that 's  categorical  here . Let 's  take  a  look  at  that  again . Let 's  go  ahead and  put  my  coffee  shop  name  out  here . Let 's  go  ahead  and  put  that  either that  rating  or  the  counts . I  guess  I 'll  just  put  the  count up  here ,  down  at  the  bottom . I 'll  go  to  bars  and  that 's more  what  I  was  looking  for . Now  I  have  the  count -down  here . It 's  going  to  give  me  a  raw  count . Doesn 't  look  that  interesting . But  now  I  can  take that  categorical  rating , and  we  can  do  something  like overlay  by  it  and  take  a  look  at  it . I 'm  going  to  go  back  and  double  check and  see  what  I  had  overlaid  by  here . It  looks  like  overlaid  by  the  rating . I  will  do  the  same  on  our  graph . Now  take  that  rating , I  will  overlay  by  it . It  looks  like  a  real  mess  right  now , but  now  I  can  go  in . You  can  choose  different  types of  bars  here , but  the  one  I  am  going to  look  for  is  going  to  be  one that  utilizes  a  new  summary statistic  called  percent  of  total . I do  that  one,  percent of  total . I  can  take  a  look  at  this in  different  types  of  bar  configurations . Again ,  I  will  take  a  look  to  see  what  kind of  bar  configuration  I  had  used  here in  my  finished  graph  and  I 'm  going  to  look at  stat  percent  of  factor is  the  one  I 'm  going  to  do . Instead  of  percent  of  total , let 's  do  percent  of  factor and  now  let 's  do  the  stacked  one . There  we  go . That 's  the  view  I  wanted . Stacked  percent  of  factor . It 's  going  to  change that  kind  of  count  to  100 %. It  is  going  to  break  out  what  portion of  it  went  to  what  rating , which  is  really  cool . Other  cool  things you  might  not  have  known  you  can  do . I  can  right -click  in  here and  you  can  order  by  something . I  can  even  order by  something  that 's  other . Like  already  have  a  high  rating . Yes /no . I  knew  this  was  like  fours and  fives  versus  one ,  twos  and  threes . I 'm  going  to  select  that  one . Now  you  can  see  it 's  kind  of  did a  nice  job  putting  the  ones that  have  the  more  four  and  fives on  the  top  of  the  graph . Now  I  can  say , "Hey ,  we  might  want  to  hit  the  Saa -Ten " if  I 'm  saying  that  correctly . This  is  my  favorite  coffee  house is  flight  path , so  maybe  we  weren 't  going  to  hit  that  one . But  my  daughter  might  say , "What  about  the  vibe ?" 
Then  you  can  say ,  "Well ,  that 's  good . Let 's  just  do  a  little  local  data filter  and  let 's  bring  in  the  vibe ." Maybe  make  this  a  block  style  view . Now  let 's  go  and  select just  the  ones  that  were  two ,  threes and  four  is  on  vibes  now , or  maybe  threes  and  fours  on  vibe and  we 'll  do  two ,  threes . I 'll  do  one ,  two ,  threes  and  fours . Now  we  can  see , "Oh  the  Hideout  has  really  good  vibes ." Flight  path  still  down  here , but  maybe  we  want  to  go  and  check the  hideout  out  if  can  find  it . That 's  the  easy  thing  you  can  do . Why  I  really  like  using this  percent  of  factor . If  you  can  have  a  continuous  x and  you  can  even  overlay by  something  which  can  go  break that  count  up  of  by  some  section , this  is  a  great  chart  to  use . All  right . That 's  just  going  to  leave us  with  our  last  chart . Our  last  chart  is  probably the  most  photogenic  of  it . It 's  going  to  be  Map box  Mapping . This  is  something  new  in  17 that 's  really  powerful . We  had  the  ability  to  look  up to  use  building  maps . You  had  the  ability to  use  mapping  services  in  the  past . But  in  17  we  came  up  with  a  much better  type  of  mapping  service , same  one  you  might  see  like with  Google  Maps  and  it 's  Mapbox  Mapping . I 'm  going  to  look I  have  these  some  select  hotels I  used  to  stay  at  when  I  used to  run  around  the  country  for  JMP . I 've  I 've  got  all  kinds of  information ,  but  the  most  important , I 've  got  latitude  and  longitude in  the  hotel  name  and  the  counts . This  will  be  a  good  thing  to  map . I  know  I  can  go  in  the  graph and  Graph  Builder  and  bring  up  a  map . I  know  I  can  put  my  latitude  down and  my  longitude  down and  I  know  you 're  already  starting  to  see , the  shape  of  the  country , East  Coast ,  West  Coast  going  on  here . I  know  I  can  right -click now  under  graph and  go  to  that  background  map . Now  instead  of  a  street  map  service or  just  doing  any  of  these  other  things that  used  to  be  in  JMP , I  can  do  a  web  map  service and  it 's  going  to  allow  me to  pick  from  all  these  Mapbox  options and  like  Mapbox  dark  one is  pretty  cool . That  we  can  select . There  we  go . You  do  it ,  and  I 'll  go  ahead and  I 'll  show  that  again . It 's  under  background  map and  you 'll  select  a  street  map  service . You  won 't  select  this  service unless  you  want  to  interact  with  it by  specifying  a  layer , but  I don 't  worry  about  that . Street  map  service  and  then you  pick  from  the  Mapbox  selections and  this  is  what  a  dark  one  looks  like . It 's  the  nighttime  view  of  the  country . That 's  cool . I  saved  to  my  script  a  couple  of  others . This  one  is  looking  at  the  Map box , outdoor  one . So  if  you  want  to  see if  you 're  in  the  water , you  want  to  see  if  you 're  in  the  bay , you  want  to  see things  a  little  differently . I  can  even  look  at  a  street  view . This  is  the  Map box  Street  View . Now ,  to  do  this  one ,  I 'm  going  to … It 's  hard  to  see ,  but  you  do  have a  little  plus  or  minus  into  your  map that  you  can  select  a  drill  down . I  like  doing  it  through the  magnifying  element  up  here . 
So  I  switch my  pointer  out  for  the  magnifier , and  here 's  a  hotel I  stayed  at  in  Sacramento . I 'm  going  to  click  on  it . I 'm  going  to  click  on  it . It  was  called  the  Delta  King . What 's  up  with  this  one ? It  looks  like  it 's  a  boat . Well ,  I 'm  going  to  right -click  right  here . I 'm  going  to  go  into  that  background  map and  instead  of  Streets , I 'm  going  to  select  a  satellite . How  cool . Now  I  can  go  see  it  is  a  boat . The  Delta  King  is  a  very ,  very  cool , ferry  boat , a  historic  one  that  used  to  run  between , I  think ,  Sacramento  and  San  Francisco , and  they  made  a  hotel  out  of  it . Now  you  can  stay  in  the  old  town in  Sacramento  and  check  it  out , thanks  to  Bonnie  Rigo . When  I  work  with  my  team  right  now who  introduced  me  to  this hotel  really  cool  one  to  use . All  right . That 's  how  we  do  these , and  I 've  got  other  cool  ones  in  here . You  want  to  see … I 've  got  some  of  the  other  cool  places I 'd  stayed  here . Like  looking  at  a  cool  one in  Las  Vegas  here . Maybe  you  recognize  this  one ? This  one 's  that  old  Luxor  pyramid . That 's  why  silicon  in  Las  Vegas … Satellite  views  are  really  cool . We  even  looking  at  maps  of  Miami  Beach . Cool  things  where  there 's  a  lot  of  water and  that 's  the  Fontainebleau in  Miami  Beach . Man ,  I  wish  it  was  one of  my  boats  as  well ,  but  it 's  not . Very  cool . Very  cool  things  you  can  do  now  in  JMP . Graph  Builder  with  that  Mapbox . All  right . Let 's  go  and  take a  look  at  the  other  ones . Again ,  thank  you . Thank  you  to  Joseph  Reese for  helping  with  tabular  data . I  don 't  think  I  mentioned Jason  Wiggins  helped  me with  the  flow  parallel  plot . So  thank  you ,  Jason ,  for  that  as  well , and  I  always  include in  my  journal  a  bonus  one . I 'm  not  going  to  show you  how  to  make  this  one . I 'm  going  to  give you  a  little  incentive  to  go  out , and  try  out  the  instructions . They 're  there  for  you . This  is  a  painter  chart . This  is  something  that  is  new in  the  Pareto  platform  in  JMP  17 , but  found  out  in  Graph  Builder , I  can  make  this  all  along . It 's  a  combination  of  a  bar  chart , a  run  chart ,  and  Pareto  chart . It  was  very  popular  at  places  like  Ford , when  they  were  looking  at  defects , and  I 've  seen  it  used  a  lot in  semiconductor  and  high -tech , for  example . This  is  just  an  example of  how  to  create  this  type  of  combo  chart within  a  Graph  Builder . All  right ,  so  that 's  your  bonus . I 'm  going  to  leave  you  behind with  where  to  learn  more . I 'm  going  to  give  you  the  link  to  the other  seven  pictures  from  the  gallery . Time  six , there 's  another  42  really  cool  views . You  can  go  and  look at  an  additional  the  one  I  just  gave  you , and  that 's  on  our  JMP  community , our  source  for  everything  you  want , for  learning  JMP , for  past  discovery  talks , for  Q&A ,  for  whatever  you  need . Blogs  and  journals , we 've  made  some  cool  blogs out  of  a  lot  of  these  graphs , so  please  visit those  as  well  on  the  community . There  was  a  link  in  the  community  again to  a  lot  of  the  good  training so  you  can  learn  from  our  the  godfather of  the  of  the  Graph  Builder ,  Xan  Gregg . 
Thanks to him for creating the Graph Builder and making it so powerful. There are other tutorials and training available to you, as well as the presentations. If you have new views that you want to try within JMP, you can email me and challenge me to recreate them. Maybe you get a chart from some other tool besides JMP, or you've done it in a spreadsheet and it takes a long time to make, and you're wondering, "Can I just do this in Graph Builder?" Challenge us. If it's something that we're not capable of making but that would be good to consider for future releases of JMP (we have JMP 18 coming out pretty soon as well), we would love to hear that. All you have to do is go to the community, go to the JMP wish list, and put in what that is, and it will go under consideration for possibly adding into the next version of JMP. All right. That is my talk. Thank you so much. I hope you enjoy all the talks here at Discovery. Please let us know if you have any questions, and have fun exploring with Graph Builder.
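For readers who prefer to script the map demo above, here is a minimal JSL sketch of the latitude/longitude setup. The table name "Hotels.jmp" and the column names are assumptions standing in for the presenter's hotel table, and the background map choice is left as the interactive right-click step described in the demo.

```jsl
// Minimal sketch, assuming a table "Hotels.jmp" with :Latitude and :Longitude
// columns (hypothetical names standing in for the hotel table in the demo).
dt = Open( "Hotels.jmp" );
gb = dt << Graph Builder(
	Variables( X( :Longitude ), Y( :Latitude ) ),
	Elements( Points( X, Y, Legend( 1 ) ) )
);
// From here, right-click the graph > Background Map > Street Map Service and
// pick one of the Mapbox styles (dark, outdoors, streets, satellite), exactly
// as in the demo above. The service names offered depend on the JMP version.
```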
Data is everything. Every organization, big or small, collects data and knows the more insight they can gain from the data, the more competitive they will be. But what tools does the organization need? What skill sets are necessary for their most valuable asset, their employees? How do they quickly ascertain the level of competency their people have in order to achieve this end? Enter the JMP Analytical Workflow Survey (JAWS). JAWS is an expeditious tool for organizations at all scales to rapidly and succinctly identify the competency of their people – their strengths and weaknesses and an estimate of the time they spend to perform certain analytical tasks – to create a roadmap to achieve organizational goals. By deploying JAWS in your organization, you will gain insight into the current state of your analytical fitness, identify strengths within your organization, and develop targeted action items to address your weaknesses. Perhaps most importantly, it will identify areas in your analytical process that are overly time-consuming or ripe for automation, freeing up valuable brainpower to address more pressing issues. In this presentation, we walk you through the steps of deploying the JAWS and highlight the incredibly valuable insights one can gain, which will allow your organization to make data-driven decisions efficiently to achieve the analytical ends you desire.

Hello. My name is Peter Polito. I'm a Senior Systems Engineer at JMP. Today I'm going to be talking to you about quantifying your organization's analytical maturity in order to enable data-driven decision making, so you can do things better. I've been using JMP for some time. I actually learned JMP on a bootleg copy of version six when I was in graduate school. It's been a part of my life for a long time, and being able to use it to help others find success is one of my great joys. Today, I have a lot of people helping support this, primarily Brady Brady. He is a Principal Systems Engineer, also at JMP. He helped craft the background tool that performs all the analytics that I'll be presenting and that you may be able to take advantage of. Then, of course, I want to thank my team. We work with high-tech companies here in the United States: Ben Ross, who is a Strategic Account Manager, and Kyle Bickford, a Senior Account Executive. The goal for today is to demonstrate how you can collaborate with your JMP support team to quantify your organization's analytical maturity. By quantifying that, you'll be able to understand where people are spending their time and how competent they are, and then use that as a benchmark to track progress. By working with your JMP support team in this effort, they can help craft the support necessary to bring your team from where they are to where you would like them to be. How do we do this? We use the JMP Analytical Workflow Survey. I'll refer to it as JAWS throughout this talk; I feel that's a little catchier. It's a quantitative tool to measure the analytical maturity of your entire organization. Despite JMP being in the name, it is not just for JMP users, and so the idea is that you assess an entire organization.
You can break it down by department, by job title, etc., and then, by rerunning the survey annually, understand whether you're moving in the right direction and what areas you need help in. It also helps you identify white space where maybe analytics isn't heavily used, so that you can bring that up to your management or whoever it might be, in order to help get your entire organization moving en masse in the direction that's going to drive your company towards discovery, efficiency, and growth. How does it work? It's simple. It's a five-minute anonymous survey. It measures the amount of time a person spends performing a particular analytical task, and not just one, but all of their analytical tasks. It captures their self-professed competency. By doing this anonymously, we find that people tend to be more honest. Not only do they get to say, I spend two hours a week doing data visualization, but they can also say, I don't really know what I'm doing, or, I've got this, I'm definitely an advanced user. So we can understand how much time they're spending and what level of competency they have. Maybe most importantly, it gives the person the opportunity to say, I need to know how to do this and I need help because I don't know what I'm doing, or, I need to do this to complete my job and I'm totally competent in this. It's time spent, how well they think they're able to do it, and where they need support to do their job better. If you think about it, for management to know those three things is incredibly valuable. This is a very easy way to identify this information, and we present it in a way that has great visuals, easy to comprehend and digest, easy to share with upper management to help build that roadmap for getting your organization from where they are to where you want them to be. What exactly are we looking at? Well, if you're unfamiliar, this is called the JMP analytical workflow. It works left to right. On the left you have where the data is coming from. For example, in the survey, it's going to ask how much of your week is spent interacting with files or documents or databases or web APIs. Your data comes into some analytical tool, preferably JMP, but it might be something else. Then people spend time doing tasks, from accessing data to performing basic data analysis and modeling. Maybe they're doing reliability or consumer research. Maybe their job is more focused on building automations for the organization. They're doing something, or a series of things, that takes some time and requires competency. Then, of course, they need to share that. It'd be a shame if all of our hard work just lived on our hard drive, we presented it in a PowerPoint in a meeting, and then it just went away. We want data to come in, we want something to be done to that data, and then we want the results to be shared with the entire organization so that people can learn. The longer I am in this position, the more I recognize there's no such thing as a one-off problem.
Everything  comes  back  in  some  shade of  gray  relative  to  where  it  started. If  we  have  an  area  or  a  way  to  query all  of  the  problems   an  organization  has  solved, it's  going  to  save  a  lot of  time  in  the  future because  people  aren't  going  to  be starting  at  ground  zero. W e  want  to  measure  all  of  these  things. Where  are  they  spending  their  time? How  capable  do  they  feel they're  at  doing  that? And  where  do  they  need support  to  do  that  better? The  first  step  is  collecting  the  data. This  is,  as  I  mentioned, about  a  five  minute  survey. The  first  three  questions are c ompletely  customizable to  fit  your  organization. This  is  anonymous,  but  we  want  to  know  some  things. We  want  to  know   where  are  these  people  located? For  some  of  you,  it  might  be  we're  all  in  one  place. Maybe  they  work  in  the  office, they  work  from  home. Who  knows? Maybe  they  work  in  the  United  States. Maybe  they  work  in  Europe. Maybe  they  work  in  Asia. Maybe  they're  just  spread  out in  different  sites  across  the  US. But  we  can  customize  that  to  fit your  o rganizational  design. Then  we  want  to  know  what  department  they're  in and  what  are  their  job  roles. This  allows  us  to  slice  and  dice  that  data once  that  survey  data  is  collected to  better  understand  where  are  things working  well  and  where  do  we  need  support? You  may  have  an  R&D  department  that's in  one  location  that's  just  crushing  it. They  are  very  competent. They're  very  capable. They're  not  spending  too  much  time because  they  built  automation. Then  their  peers  at  maybe a  newer  location  are  way  behind. They're  the  ones  that  need  support   by  designing  it  in  such  a  way that  we  can  slice  and  dice  it   by  department,  by  job  title,  by  region. We  can  really  get to  the  heart  of  where  support  is  needed or  really  pat  ourselves  in  the  back because  we're  doing  things  well. We  are  where  we  thought  we  would  be. But  I've  administered   the  survey  many  times, and  every  single  time  I  hear, I  thought  we  were  better, so this  is  a  great  way  to  measure  that. From  there,  we're  going  to  look  at  how  much  time do  people  spend  doing  particular  things. You'll  see,  none  of  this  is  JMP  specific. It's  really  designed  for  anyone. Anyone  working  with  data needs  to  get  that  data. Anyone  working  with  data  needs  to  clean  that  data, put  it  into  a  position   where  they  can  analyze  it, look  for  outliers, visualize  it,  whatever  the  case  may  be. Additionally,  we'll  have  something  very  similar  to  this to  collect  data  on  what  their  competency  is. Then,  of  course,  as  I  mentioned, we  have  the  opportunity  for  them  to  say, I  need  to  access  data. It  is  critical  to  my  task, and  I  am  not  good  at  it. I'm  inefficient, I  don't  know  how  to  query  our  database, I  don't  know  how  to  bring  in  55  CSV  files  efficiently, I  need  advanced  training. Or  I'm  really  good  at  this,   or  I  don't  even  need  this, someone  just  emails  me a  file  and  I  do  my  work. It  allows  them  to  tell  you  what  do  I  need  to  be  better  at, so  that  I  can  be  better  at  my  job. At  the  end  of  the  day, most  of  our  people   want  to  do  their  job  well. They  want  to  be  successful. 
They want to advance in the company. They want to show that they have value and worth. This is an opportunity for them to tell you where they think they need help. There have been a few instances where I've talked to a management team, and they're like, this department doesn't know how to do that. Then the survey results come back, and the manager is like, my goodness, they all feel like they need to know how to do this better; I had no idea. This is really an eye-opening opportunity for a lot of people when they see these results to fully understand exactly where their people are versus where they think they should be. We collect this data, your JMP support team will analyze this data, and then they'll be able to present it to you. Now we're going to walk through what this data looks like, so you can get a sense of what you will learn. We'll go through a couple of different views. At the 10,000-foot view, we get these heat maps that show where people are spending their time. I've broken this up by organization-wide on the upper left, by job title on the upper right, and by department at the bottom. If we just look at the upper left, we can see that people are predominantly interacting with files, probably Excel files, or databases, and they're doing a lot of data exploration, a lot of basic data analysis and modeling, very little reliability analysis, a little bit of quality and process engineering, and they're primarily sharing images. If we look at job title, it's a similar story, but this particular job title is maybe spending a little bit more time running designed experiments. Then by department, maybe this is an analytics department or a chemistry department, but they're doing a lot of DOE, a lot of basic data analysis, a lot of database access, and then sharing images. As a leader in your organization, you might ask yourself, are images the best way to share this data? If not, this shines a light on the fact that your company is spending a lot of time sharing images. Maybe, A, this could be automated, or, B, maybe we want to push people in a different direction, sharing some other data format, writing particular reports, etc. But it just shines a light on the things that are going on. Then you and the support team will work together to better understand; they'll probably have lots of questions: is this what you want? Is this what you expected? Should you have people doing more quality work if you're, say, a manufacturing firm? Do you want people to be quality minded, or do you have a quality department? Those sorts of questions will start to flesh themselves out, and they'll help you craft how they might be able to provide that support to you. Or you might just take these results and say, thank you so much, now we're going to go do what we think we need to do. Going a little bit deeper, we'll call this maybe the 8,000-foot view. This example is broken up by department and location, and it's showing how much time people are spending doing data exploration and visualization.
We can see in the manufacturing department and in the fermentation PD department, a few people are spending over eight hours a week, whereas over at R&D, they're spending considerably less time. Maybe that's fine, maybe that isn't fine. But again, it just helps you see exactly how your people are spending their time. If you have someone located in the East in the fermentation department and they're spending zero time doing data exploration and visualization, that might be a problem. I would think they would need to share their data and look at that data. So you might have some questions, and you go ask that team to better understand exactly how they're doing things. Then we get to go a little bit deeper, and these are my favorite images. What we're looking at is proficiency on the left and usage on the bottom, Y versus X. Then these cells are colored by how many people are spending that amount of time, graphed against their competency. If we look on the left, we see quality and process engineering. For this particular organization, by and large, people are not doing much quality work. This is where I always ask, do you want your company to be quality minded, or is quality focused in a single department? If you want your people to be quality minded, this might be a red flag. No one is an advanced user, people are spending predominantly less than an hour a week, and the majority of the people aren't doing it at all. This would be a situation where I might come in and say, can I teach your people about the quality tools in JMP so they can better understand how to build and interpret a control chart, and how they might look at metrics like Cpk and Ppk? Are we hitting spec limits? Are we not hitting spec limits? How do we understand that more deeply so we can make more intelligent decisions to solve problems and understand things? Contrasting that on the right with this basic data analysis and modeling image, we see what I would call more maturity. There are very few people that aren't performing this task at all. The majority of the people are intermediate, with some advanced, and there are also a lot of beginners. But what I see here is critical mass. I see that this organization has enough competence and enough people that understand it well enough that they can help bring those novice users up to the intermediate level, and we have some advanced users that can bring the intermediates up to their level. We also have people spending predominantly 1-4 hours. Not that many people are spending 20% of their week performing this task. When I see that, I ask, could this be automated? We'll get to more of that in just a moment. This really helps people understand where they are. Do they have maturity in this analytical capability, or do they need support? Are they where they thought they would be, or do they need to move their people, through support, to a more advanced understanding and competency? This is probably one of my favorite images.
What  we're  looking  at  here  is on  the  left  on  the  Y  axis, is  the  amount  of  time  people  are  using a  particular  capability  per  week. On  the  X  are  those  different  capabilities that  we  saw in  the  JMP  analytical  workflow. For  now,  you  can  just  ignore  the  color. Those  are  color  coded  by  the  amount of  time  they've  been  using  JMP. That's  one  of  the  only   JMP  specific  questions in  the  entire  survey. But  what  we  see  here, and  I  want  to  draw  your  attention  right  here, is  there  are  six  people  spending  eight  plus  hours  a  week performing  data  access. If  you  have  1  person  spending 8   hours  a  week, that's  20 %  of  the  week. That  means  five  dots equate  to  one  annual  salary. Is  one  annual  salary   how  you  want  to  be  spending… Do  you  want  to  be  spending that  amount  of  time  on  data  access? Probably  not. I  mean,  it's  not  cheap  to  hire  someone. It's  not  cheap  to  support  them, provide  benefits  and  training, and  keep  them  motivated  and  keep  them growing  within  the  organization. This  is  a  very  impactful  image  because it  shows  us  where  can  we  automate. Clearly,  you  can  see   up  in  the  very  upper  left, that  green  dot,  someone  has already  automated  data  access. Whereas  we're  spending 1.2  annual  salaries  on  data  access, and  many  of  them  are  new  users. Half  of  them  are  only  1-3  years. So  could  we  come  in   and  teach  people  about  automation? Could  the  person   that  has  already  automated  this sit  down  with  these  other  six  people and  teach  them  how  they have  automated  their  process? Because  if  you  can  free  up   an  entire  annual  salary, think  of  what  you  can  do. I've  worked  with  people that  are  in  hiring. I've  worked  with  people  that  manage  teams. The  common  thread  I  hear is  we  need  more  people. Either  A,  we  don't  have  the  budget, or  B,  and  right  now  in  this  environment, it's  just  sometimes  hard  to  hire  people. If  you  can  liberate  an  entire  person from  a  particular  task, just  think  of  what  more  you  could  do. They  could  solve  more  problems, they  could  help  bring automation  elsewhere. They  can  automate  data  access for  everybody  potentially. This  is  a  really  useful  thing. On  the  flip  side, if  we  look  at,  say, predictive  modeling  and  machine  learning, it's  right  here  in  the  center, we  can  see  there  are  only  two  people spending  any  time  at  all: one,  one to  four  hours  a  week   and  one  less  than  an  hour  a  week. We're  spending  a  fraction  of  the  time, particularly  compared  to  data  access, on  predictive  modeling   and  machine  learning. Perhaps  this  is  not  necessary in  your  organization. Perhaps  it's  very  necessary if  you  are  trying  to  understand why  aren't  we  hitting  our  manufacturing  KPIs? Why  are  we  having  these issues  in  our  process? We're  not  able  to  understand  exactly  why, despite  having  everything  set  up the  way  we  think  should  work, we're  not  hitting  our  metrics. Well,  again,  this  is   a  ripe  opportunity  for  support. Yet  on  the  other  flip  side, so  I  think  we're  looking at  a  triangle  here,  a  prism  here, basic  data  analysis  and  modeling, I  see  they're  doing  fantastic. They  have  a  lot  of  people that  are  performing   basic  data  analysis  and  modeling. 
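To make the "five dots equal one annual salary" arithmetic concrete, here is a small JSL sketch. The 40-hour work week is an assumption of mine; the survey itself only records hours per week per capability.

```jsl
// Back-of-the-envelope arithmetic behind "five dots equal one salary",
// assuming a 40-hour work week (not something the survey itself states).
hoursPerWeek = 8;   // one respondent reporting 8+ hours on data access
nRespondents = 6;   // six such respondents in the example graph
fte = (hoursPerWeek * nRespondents) / 40;
Show( fte );        // 1.2, i.e., roughly 1.2 annual salaries spent on data access
```

Back to the basic data analysis and modeling column of the graph.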
As we saw earlier, we have some good competency there, and we're not spending a lot of time. This is one of the tasks where automation may not be possible. It might be, but it might also be that people are dealing with problems that are unique every single time. Again, this is an opportunity where I, as someone trying to support a customer, might start to ask some questions and understand whether automation is even possible. But by and large, this is a good vertical in this particular graph, as is the one over on the far right, sharing and communicating results. That's another one that is very easy to automate, but the majority of people are sharing results. I might have some questions about why there are maybe about 40% that aren't, and whether that is important to you. But this just puts the entire story in one image that really helps you understand where the opportunity is to, A, automate, B, train, and, C, say, great, we're doing well, we don't need to spend time on that. Then finally, where do your people think they need support? On the left, we have those capabilities. On the bottom, we have four responses: this is critical to my task and advanced training is needed; this is critical to my task and basic training is needed; I don't need this; or maybe I'm just interested. I've organized these based on the number of people that feel that advanced training is needed. This is from a different organization. But I imagine if you are a tech-driven company, an analytically driven company, you would probably think that basic data analysis and modeling is not something that you need to worry about. But here, the majority of the people are saying this is critical to my task and I need advanced training. Again, it's just shining a spotlight on the areas where you might need to support your organization, where you might need to support your people, because they're saying very clearly, I need help. Whereas mass customization, automation and scripting, reliability analysis, these aren't things that people need as much support on. You can know that you don't need to spend time in this area or that area; you need to focus up here. It turns out a lot of these are fairly basic tasks that I think a lot of people assume everyone is fully capable and competent at, but they're clearly saying, no, I need some help. The benefit of JAWS is that it identifies strengths and areas for improvement within your organization. You're able to work with your JMP support team and provide support in those areas. Your support team has the training, has the tools, has the backing of a large organization that is focused solely on expanding the use of JMP, to come in and guide your people in whatever support you need. We are here to help you out. Then the beauty of this is, if you administer this survey annually, you're going to be able to track progress. You'll be able to see that we needed a lot of help in design of experiments; a year later, we see improvement. The support has worked, and we just need to go a little bit farther.
We  can  say  DOE  is  now  doing  great. Let's  focus  our  attention  elsewhere. Some  best  practices. This  is  very  practical,   but  what  we  have  learned  is, don't  allow  a  long  time for  the  survey  to  be  filled  out. We  say  send  it  out  on  Monday, send  a  reminder  email  on  Wednesday, and  close  the  survey  out  on  Friday. How  this  would  work  is  you  would  work   with  your  JMP  support  team to  craft  those  questions  about  region, about  department,  about  job  title, and  then  they'll  just   provide  you  the  survey., You  send  that  survey  out and  people  fill  it  out, we  collect  the  data,  it's  all  anonymous, and  then  we  analyze  those  results and  then  come  back  and  share  those  results  with  you. But  keep  it  short. Don't  allow  people  a  long  time  because people  get  busy  and  they  just  forget. It's  really  important,  I  think, to  get  those  three  questions  right. We  don't  want  to  be  too  much  of  a  grouper because  then  you  don't  have the  level  of  understanding that  you  might  want. You  want  to  be  a  splitter. Really  dig  down,   get  those  departments  right, get  those  job  titles  right, get  those  regions  right, because  you  can  always   group  things  together  later to  understand  the  survey  results, but  you  can't  split  them once  you've  collected  that  data. This  one  is  probably  the  most  important, is  it's  incredibly  valuable   to  get  management  buy-in and  then  develop  a  team   to  help  administer  that  survey. You  need  people  that  people are  going  to  listen  to. If  you  have  someone  in  your  company who  is,  for  lack  of  a  better  term, not  well  liked  and  they  send  out  a  survey, people  probably  aren't  going  to  be  as  likely  to  participate as  if  you  have  a  team  of   3-5  people that  are  leaders  within  the  departments or  leaders  within  their  organizations, and  people  are  going   to  hear  that  and  listen  to  that. It  helps  even  more  when  you  have management  saying,  you  need  to  do  this. The  flip  side  of  this  is  when  you  get  the  data  back, you  want  management  to  be  involved. You  want  someone  that  has   some  decision- making  capabilities to  see  the  results and  understand  what's  going  on so  that  they  can  help   craft  the  big  picture. It's  great  when  JMP  usage  expands from  the  bottom  up, but  when  you're  trying  to  drive  something at  an  organizational  level, you  really  need  people  that  are  higher  up to  help  drive  the  usage  from  the  top  down. We  strongly  encourage  that  you administer  this  organization  wide. Ignore  JMP  usage. A  lot  of  our  companies  have email  list  group   of   strictly  their  JMP  users. But  really,  that's  often  just a  snippet  of  the  company. We  have  found  people  and  human  resources that  when  they  see  what  JMP  can  do, like  I've  got  to  have  that. We  don't  always  think  of  an  analytical software  tool  as  being  something that  maybe  HR  would  want   or  would  gain  benefit  from,  but  they  do. By  understanding   where  your  entire  organization  is, you're  going  to  be  able  to  make  better  decisions, you're  going  to  be  able  to  make better  support  calls, and  you're  going  to  be  able  to  move  your  entire  organization versus  just  moving a  single  department  or  a  single  job  title. 
Lastly, I'm sorry, second to last: lean on your JMP support team. They administer these surveys frequently. They know how to interpret the results. They know how to help you. Even if you don't want to use them to provide that support, if you have internal education, maybe you want to build internal education up as a result of the survey, lean on them to help guide you, because this is what we do. We support companies like yours to help them build analytical excellence. Lastly, and I've said this a few times, administer the survey annually so that you can actually track your progress. It's one thing to have an analytical snapshot; it's a lot better to have an analytical time series. Collecting that data annually is really going to help you gauge: is this successful? Do we need to change things? Is what we're doing working? Maybe you go to your support team and they lead the training and you don't see growth, so you turn to your internal education team, or maybe the flip side is true. We just want you to be better. We want to be collaborators with you. We want to support you in whatever you think is the best way to execute that plan. The survey can really help an organization; I've seen it help. I've been able to administer close to a dozen of these. I'm going to close with a call to action: connect with your JMP support team and complete the JMP Analytical Workflow Survey. Being able to understand where you are as an organization versus where you want to be is incredibly valuable. At the end of the day, our goal is to help you democratize analytics, help you have one version of the truth, help you make analytically driven decisions, and from that, gain efficiency, quicker discovery, and save money and time. This survey is an incredibly powerful tool to help you achieve those ends. We have the expertise to help you not only administer the survey, but interpret the survey and create a plan to then make decisions for training support, and help drive you from where you are to where you want to be. Thank you very much.
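As a rough illustration of how a "where is the time going" view like the heat maps shown earlier could be scripted, here is a JSL sketch. The stacked survey table and the column names :Department, :Capability, and :Hours are hypothetical, and the element and role names may differ slightly from whatever tooling actually produces the JAWS reports.

```jsl
// Rough sketch of a time-by-capability heat map, assuming a stacked survey
// table with hypothetical columns :Department, :Capability, and :Hours.
dt = Current Data Table();
gb = dt << Graph Builder(
	Variables(
		X( :Capability ),
		Y( :Department ),
		Color( :Hours )     // darker cells = more reported hours
	),
	Elements( Heatmap( X, Y, Legend( 1 ) ) )
);
```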
Flexible visualizations allow for easy exploration of clinical trial data. JMP Clinical uses many different options in Graph Builder and Tabulate to allow for dynamic views and publication-ready output. From stacked bar charts to display adverse events by severity, to line graphs to show change from baseline over time, to swimmer plots graphing disease response in oncology studies, JMP Clinical uses many different Graph Builder elements. Paired with most graphs is Tabulate, which displays the statistics reflected in the graph. New Tabulate features in JMP 17, such as stack, pack, and unique ID, help make these tables ready for publication. This pairing of Graph Builder and Tabulate gives users a quick way to visualize the data through a graph and then dig further into the numbers.

Hi, I'm Sam Gardner, and I'm presenting today with Rebecca Lyzinski. We're going to be talking about how we use Graph Builder and Tabulate in applications for visualizing data from clinical trials. We have a product called JMP Clinical, and it's a focused and specialized product for clinical trial data review. We give users straight-out-of-the-box functionality to do a thorough review of clinical trials at the study, site, and subject level. It is a product that's used across the pharmaceutical industry and at several regulatory agencies across the world. It's built on top of JMP Pro, utilizing the JMP scripting language, and it provides a user interface for data management, configuration, and standardized reports. We utilize data standards as part of JMP Clinical. We utilize the CDISC study data tabulation model and analysis data model as required input formats for the data that you're going to visualize. This allows us then to develop standardized tools. In general, if you want to develop standardized tools, having a data standard to follow for the input data really enables you to do that. We use Graph Builder and Tabulate across almost all of the reports that we have in JMP Clinical. We want to show you how we create the graphs that we use in JMP Clinical, so you can learn more about these important platforms, Tabulate and Graph Builder, and how you could use them for your analysis work and if you wanted to make standardized reports yourself. We're going to show you several of the analysis reports that we have in JMP Clinical: our adverse events distribution report, which utilizes Graph Builder and Tabulate; our adverse events risk report, which uses a multipanel Graph Builder that also utilizes virtual joins between data tables and applies some data filters; and our findings time trends report, which also uses a complicated Graph Builder, column switching, and virtual joins. Then we'll finish up by showing you an advanced-level collection of Graph Builders that we use for our patient profiles. One of our most popular reports is the Adverse Event Distribution report. This report shows a bar chart at the top and a tabulate underneath it. The bar chart has dictionary-derived term on the X-axis, which is a way of grouping adverse events, and planned treatment group on the Y-axis.
Each bar represents a count of the adverse events for a given dictionary-derived term. Underneath we have a tabulate that is also showing dictionary-derived term, but it also groups the adverse events by body system or organ class, so we have two different grouping variables. The columns represent one column for each treatment group, one for Nicardipine in the study and another for placebo, as well as a total column. The Ns represent counts, and the percents are the percents of subjects with an adverse event. Down at the bottom, there's an All row, which represents any subject that had an adverse event in the given column. For example, we have 882 subjects in this study that had at least one adverse event. One of the options that JMP Clinical includes is a way to stack the bar chart and the table. For example, you can stack by severity or intensity. This creates a stacked bar chart with green bars representing mild events, yellow for moderate, and red for severe. It also splits out the table by mild, moderate, and severe events. In order to recreate these, we can pop out the data table. For the graph, we go to Graph and Graph Builder. We select our planned treatment group and put that on the Group Y. Make this a little bigger. There are a lot of columns in this data table, so we're going to search for dictionary-derived term, and we'll grab that and place it on the X-axis. This is pretty close to what's on the report, except for the ordering. If you right-click on the X-axis and go to Order By, we can select Count Descending, and now it's ordered the same way that it was in the report. If we want to also add that stacking, we can search for severity and drag that variable over to the Overlay. That gets us close to what we saw, but the bars are side by side. In order to change that, we can go to the control panel and, instead of Side by Side, select Stacked for the bar style. Now we're back to our original graph. Next, if we want to recreate the tabulate, we go to Analyze and Tabulate. Again, select planned treatment group for our columns. This time we're going to select dictionary-derived term and put that in the first grouping column, but we also want to select body system or organ class and put that as a grouping column as well. By default, they show up as two separate columns. We're going to select both columns, right-click, and go to Stack Grouping Columns. That allows them both to be concatenated into one column. To add to our table, we are going to drag the little N over underneath our treatment group so that we get our counts, and the percent next to the N. By default, these are two separate columns. In order to create one column, we're going to first drag the Sum up above so that Tabulate knows both the N and the percent are supposed to be sums. We again select both columns and right-click. Under Pack Columns, go to Pack. Now we see the count and the percent in one column. The percent has no formatting on the number of decimal places.
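As an aside, the stacked bar chart built so far could be scripted roughly like this in JSL. The CDISC-style column names :TRTP, :AEDECOD, and :AESEV are my assumptions standing in for the planned treatment group, dictionary-derived term, and severity columns, and the count-descending axis ordering and the Tabulate steps are easiest to finish interactively as described.

```jsl
// Rough sketch of the stacked adverse-event bar chart, with assumed
// CDISC-style column names (:TRTP, :AEDECOD, :AESEV).
dt = Current Data Table();
gb = dt << Graph Builder(
	Variables(
		X( :AEDECOD ),       // dictionary-derived term
		Group Y( :TRTP ),    // planned treatment group
		Overlay( :AESEV )    // severity drives the stacked colors
	),
	Elements( Bar( X, Legend( 1 ), Bar Style( "Stacked" ) ) )
);
```

With that aside, back to formatting the percent column.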
If we go to Change Format and go to the percent, we can change from Best to one decimal place. Now we have a better-formatted table. One other thing that's missing is a group N, which we're going to stick underneath our treatment groups. That way we can see how many subjects were in each treatment group. You'll also notice that, by default, these rows for the body system or organ class have missing values. In order to fill those in, we're going to select Add Aggregate Statistics. Now all our values have been filled in, and we just need to do a little cleanup. Add Aggregate Statistics adds all the statistics for each of our different columns. The first step is to delete some of these just to clean it up, and we'll get rid of this All at the end. In order to get the columns back together, we again have to drag the Sum up above, and the N and percent are now back in one column. We can also right-click on the group N and remove the column label, and right-click on the Sum and change the item label to be missing so that it's a little cleaner. One final thing: you'll notice that these Ns are a little bit bigger than what was on the report. The reason for that is that, automatically, the top grouping variable is a sum of all of the rows underneath it. This is not what we want for this report, because these categories are not mutually exclusive. A subject could have both a vasoconstriction event and a hypertension event, but we don't want to count them twice under vascular disorders; we only want to count them once. In order to do that, we're going to grab the unique subject identifier and drag it into the ID. Now you'll see those numbers reduced quite a bit. Now they represent a unique subject count for that type of event. One other thing to point out on this report is that on this data table we have a reference, by unique subject identifier, to the ADSL table. What that means is that the table is being virtually linked to the ADSL table, which allows us to filter on demographic variables that are found in ADSL. Now I'm going to hand it over to Sam so he can talk about the risk reports and how we use virtual joins to filter those tables. Thanks, Rebecca. That was a really nice description of that report and how it was put together. I'm going to show you another report that Rebecca is responsible for developing. This one is our adverse event risk report. What this report does is go through all of the reported adverse events, count the number of events by treatment group, and calculate a percent or a rate. It displays two different graphs. On the left-hand side, it shows the rates for each treatment group. You can see that the color key there shows which treatment group is being displayed. For pruritus, we've got a rate in the placebo group of 9.3%, a rate in the low-dose group of 27.381%, and a rate in the high-dose group of 30.92%.
Usually,   what  we're  interested  in  in  this  case is  the  difference  in  the  rates   compared  to  the  placebo  group. What's  shown   on  the  right-hand  side  of  the  graph is  the  calculation  of  the  differences for  each  treatment  group and  the  placebo  group. There  are  three  treatment  groups, so  there  are  two  computed  differences: the  high  dose  and  the  low  dose   compared  to  placebo. We  also  display   a  confidence  interval  around  those  points. We  make  this  report  filterable. One  thing  we  can  do  is, let's  say  we  only  want  to  look  at adverse  events   that  occur  at  a  certain  rate, maybe  higher  than  5%  of  the  time, and  we  can  narrow  that  down quite  a  bit  and  reduce  it. We  also  have  a  corresponding   tabulate  table that's  very  similar  to  the  tabulate that  Rebecca  showed  earlier. What  I  want  to  do  is  show  you  how to  recreate  this  Graph  Builder  graph because  there's  lots  of  interesting  things you  can  do  in  Graph  Builder that  you  might  not  be  aware  of. The  first  thing  I'm  going  to  do  is I'm  going  to  open  up   the  data  tables  that  we  use, and  we  actually  use three  different  data  tables. Let  me  clear  this report  filter  selection  first. There  we  go. Now  I'm  going  to  open  up the  data  tables  that  we  use. The  first  one  is  the  data   that's u sed  to  actually  make  the  plot. We  call  that  the  risk  plot  table. The  second  table  is  the  term  ID  table, and  it's  just  the  list  of  unique  terms that  we  want  to  display  in  the  graph on  the  Y-axis. Then  the  last  table  is  a  filter  table. It's  what  we  use   to  actually  filter  the  report. I've  got  all  those  three  open. Let's  start  out  by  making  the  graph of  the  risks  and  the  risk  differences. I've  got  a  Graph  Builder. We're  going  to  take   the  dictionary-derived  term, drag  that  onto  the  Y-axis. Make  this  bigger. Then  we're  going  to  take  the  calculation, in  this  case, the  variable  we  just  labeled  percent, and  we're  going  to  put  that onto  the  X-axis. Then  what  we  want  to  do  is  we  want to  overlay  by  the  active  treatment. Just  want  to  put  that  on  there. The  next  thing  we  want  to  do is  we  want  to  introduce a  little  bit  of  customization of  the  way  the  points  are  presented. You  can  see  the  points, there's  a  bar  here   where  you  can  control  the  points and  the  way  they're  drawn. One  thing  is  we  want  to  make  sure that  they  have  some  jitter  in  them. In  this  case,  the  jitter  is  set  to  auto, but  we  can  set  that   type  of  jitter  that  we  want. I'm  going  to  change  it  to  center  grid. Another  thing  that  we  want  to  do is  we  want  to  change  the  way the  points  are  drawn. Under  the  Graph  menu that  you  can  access through  right-clicking  in  the  graph, I'm  going  to  right-click,  Select  Graph, and  go  to  Marker  Drawing  mode, and  I'm  going  to  choose  outlined. That  changes  those  points   to  have  an  outline,  a  more  sharp  outline, and  it  really  makes  them  stand  out   much  more  significantly. That's  this  plot  of  the  percent   or  the  rates of  each  of  those  adverse  events. 
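For reference, the rates panel built so far could look roughly like this in JSL, with assumed column names (:Percent, :AEDECOD for the dictionary-derived term, and :Treatment for the active treatment); the jitter and marker settings are the interactive steps just described.

```jsl
// Sketch of the rates panel, with assumed column names.
dt = Current Data Table();
gb = dt << Graph Builder(
	Variables(
		X( :Percent ),       // adverse event rate
		Y( :AEDECOD ),       // dictionary-derived term
		Overlay( :Treatment )
	),
	Elements( Points( X, Y, Legend( 1 ) ) )
);
// The center-grid jitter and the outlined marker drawing mode are the
// right-click settings described above; once set interactively, they are
// preserved if you save the script back to the data table.
```

The grid line, the risk-difference panel, and the intervals come next.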
One thing I'm also going to do is add a grid line on the Y-axis, which you can then look across so you can see where the individual points line up and match them to the grouping level for the dictionary-derived term. That's it for the rates. The next thing I want to do is look at the differences. We have a risk difference variable, and I'm going to drag that down onto the X-axis, but to the right. What that actually does is create another graph, so now I have a graph of the risk differences. I also want to put an interval around those points. We have two variables that hold the upper and lower limits for the intervals, so I'm going to select those and drag them to the interval zone. Notice what happens when I do that: it draws intervals around all the points on all the graphs that I've added here. I don't want that on the points for the percent variable, so for the interval style, or the error interval, I'm going to set that to None. That turns those off. Essentially, we've recreated the graph. This is a little busy because there are lots of categories here. We could just filter this table. We could go here and select the local data filter. Let's say we wanted to filter on risk difference. We could say, let's only look at cases where the risk difference is greater than 3% in absolute value. What I'm going to do is set a limit that it's between minus three and three and then invert that, and that reduces the number. Maybe I want that to be a little bit more restrictive, so make that minus five and five. There we go. That reduces the number that are selected. But the thing is, it's kind of weird. It only filtered the values for the risk difference; it didn't filter the values for the percent. It didn't filter everything the way we wanted it to, so we needed to do something a little bit different. I'm going to clear that. What I'm going to do is link this table, the left-hand table, which is used to make the graph, to the table of all of the unique levels that are displayed on the Y-axis, all of the dictionary-derived terms. The way we do that is, first, for the dictionary-derived term, we have to make sure that the Link ID property is set. What that tells JMP is that if I link to this table, this is the variable that I'm going to use for matching to the table that I'm linking it to. I set that to be the Link ID. Then over here, for the dictionary-derived term, I need to choose the link reference, and I'm going to choose this actual table as the link reference. Now what's happened is that these two tables are linked. The last thing I need to do to specify this virtual linking properly is to right-click, select Column Info, and under the link reference property, choose the row states that I want to broadcast from the middle table to this linked table. I want to accept the row states from the reference table.
I'm going to choose the selected, excluded, and hidden row states and accept those from this center table. Then, when I select rows in this data table, I end up selecting rows in the corresponding linked table. If I hide and exclude those rows, they become hidden and excluded in this plot, because I've hidden and excluded several of the rows in this linked table. That's one level of linking. But what we really want to do is filter this graph based on corresponding measures or metrics that are calculated in this table. These are based on the differences in the risks and other difference measures between the placebo group and the treatment group. What I need to do to make that happen is link this right-hand table to the center table, so I'm going to do that as well. I'm going to go here to the dictionary-derived term for the right-hand table, choose the link reference, and pick the table that I want to link to, which is the one displayed here. Open up the column properties, and go to the link reference property. Now what I want to do is dispatch row states, the selected, excluded, and hidden row states, from this table to the linked table. What I've done now is link two tables together. The table on the left is linked to the center table, and there's a many-to-one relationship; it's actually a three-to-one relationship. For every row in the center table, there are three rows in the left-hand table. I've also linked the table on the right, which has a two-to-one relationship: for every row in the center table, there are two rows in the right-hand table. What I want to do is push row states from the right-hand table to the far left-hand table, and having this linking between the two, pushing row states from the right-hand table to the center table, and then having those row states accepted by the left-hand table, allows me to do that. Now what I can do is filter on this table. I'll just open up the data filter and choose the absolute risk difference. I'm going to say that if the absolute risk difference is large, those are the points that I want to display. Now that I have everything linked up, I can use this filter table to filter the graph, which is built from the data in the plotting table. If I open up a data filter, I'm going to turn on Show and Include as the options, and choose absolute risk difference, because that's a good filtering criterion just to find things that have a large absolute risk difference. Now notice it's filtering: the row states from this table are pushed through to the middle table, which pushes them through to the plotting table. Some clever virtual joins. One thing we also do in JMP Clinical is use a callback function with our filters.
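As an aside on the virtual joins just described, here is a minimal JSL sketch. The table names "Term IDs.jmp" and "Risk Plot.jmp" and the column name are hypothetical stand-ins for the center table of unique terms and the plotting table.

```jsl
// Minimal sketch of the Link ID / Link Reference setup, with hypothetical
// table and column names.
terms  = Open( "Term IDs.jmp" );   // center table of unique terms (assumed name)
plotDt = Open( "Risk Plot.jmp" );  // table used to draw the graph (assumed name)

// The key column in the referenced (center) table gets the Link ID property.
Column( terms, "Dictionary-Derived Term" ) << Set Property( "Link ID", 1 );

// The matching column in the plotting table points back at the center table.
Column( plotDt, "Dictionary-Derived Term" ) << Set Property(
	"Link Reference", Reference Table( "Term IDs.jmp" )
);
// Which row states are dispatched or accepted (selected, excluded, hidden) is
// then set in Column Info > Link Reference, as in the demo above.
```

Now, about that callback.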
Actually, there's a way with JMP scripting to write a little function that will run every time you change a data filter, and it can do some additional things to customize the graph just a little bit more. In our product it's done a little bit differently, but it looks pretty much the same as what I'm doing here. Now, Rebecca can show you something. One of our other most commonly used reports is our findings time trends. When you launch findings time trends, you'll get two graphs, but I'm just going to focus on the first graph. The first graph shows the mean change from baseline for a given lab test over the course of the study, shown by visit. The bottom graph shows the number of records at each visit for that lab test. If we want to recreate this, we can pop out our data table. Again, we'll go to Graph and Graph Builder. We're going to use bilirubin as an example. We'll grab the bilirubin record and put it on our Y-axis. We want our graph plotted by visit, so we'll grab visit and put it on the X-axis. By default, we end up with points, but we want to switch over to a line graph. Once we have our line graph, we also want to split it out by treatment group. Treatment group is not actually a variable in our base data set, but it is a variable in the linked ADSL data set. We can go into our reference data set, grab that planned treatment value from the virtually joined table, and drag it over to our Overlay. That splits our lines out into two different treatment groups. We also want to add intervals. Under error interval, instead of using Auto, we'll change that to Confidence Interval. Now we have a confidence interval for each of our visits for each of our treatment groups. We also want to add the bar graph. We're going to grab bilirubin again and this time drag it over into the bottom corner of the Y-axis so that we now have two separate panels. By default, we end up with just a duplicate of the panel we already created. We're going to change this by right-clicking, going to the line, and changing it to a bar. Now we have a bar graph. We can go ahead and remove our confidence intervals from this one by selecting None under error interval. We get pretty close to our final result with just these two panels. But we also want to add points onto our top panel. By default, we get a point for every single row in the data table, so we're just going to go over here and change our summary statistic to Mean. Now we're just going to clean up a little bit. We can change the labels by clicking on a label and typing in a new one, so Number of Records. On the top, we'll do the same thing and change it to Mean Change from Baseline. We can also clean up our legend a little bit. You'll see that, because we have two panels, it actually shows the treatment group twice, and we don't really want to see that. We can go to the red triangle, go to Legend Settings, and unclick a couple of these so that we only see the treatment groups once.
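Returning briefly to the data filter callback Sam mentioned at the start of this section: a bare-bones JSL sketch of the mechanism might look like the following. The table and the :Treatment column are assumptions, and JMP Clinical's own callbacks do considerably more than logging, so treat this only as an illustration of the idea.

```jsl
// Bare-bones sketch of a data filter callback, with an assumed :Treatment column.
dt = Current Data Table();
df = dt << Data Filter( Add Filter( Columns( :Treatment ) ) );

// This function is invoked each time the filter selection changes; the
// argument describes the current filter state. Here it just writes to the log.
onChange = Function( {filterState},
	Write( "Filter changed: ", Char( filterState ), "\!N" )
);
handler = df << Make Filter Change Handler( onChange );
```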
We're back to our panel that we created in the report, and we can add our column switcher. If you go to the red triangle, go to Redo, and select Column Switcher, we can change out the bilirubin record. We want to change the lab test that's being shown on the Y-axis. This dataset has a bunch of different lab tests, from alkaline phosphatase all the way down to partial pressure of oxygen. We're just going to select all those lab tests and click OK. Now we have a column switcher, and you can select each lab test and see the graph change to show the lab test that's selected. Now I'm going to hand it back to Sam so that he can show some more complicated graphing using patient profiles. Thanks, Rebecca. Just to wrap up here, I want to show you a more advanced collection of graphs that we use, called our patient profile. This is a way to visualize everything that happens to a subject in a clinical trial that's been recorded in the data. We collect things like when they did their study visits, on what day and what study day that happened, the disposition events they had, the exposure they had, whether they had adverse events and what the duration of those adverse events was, and when they took medications during the trial. Vital signs or test results that were recorded can also be displayed. We have a lot of ability to customize and configure this. This is all scripted, so I can't really show you how I would build all of this just with the graphical user interface in JMP; you need to know a little bit of JMP scripting. But in reality, each one of these graphs is just a Graph Builder. They have been put together on top of each other, with a little bit of editing of the report layer to remove some of the controls. That's really what's happened. We've also added some custom elements; the legend here, for example, is a custom graphic, just an image that displays in the report window. But really, all of this is just Graph Builder for the most part. We can switch between patients here and look at the profile of each different patient in the trial, as we would want to. I hope you've been able to see during this presentation some of the ways you can use Graph Builder and Tabulate to summarize your data, more than just the standard "I did a few clicks and this is the result I get." With a little bit of extra knowledge of how to customize the output for the graphs and the tables you have displayed, you can get some very nice-looking output that you can use for presentation within JMP and even for reporting. Some of what we've used has been driven by regulatory guidelines that say they would like to see tables and graphs look a certain way. We've been able to achieve that using some of these features, like stacking and packing columns in Tabulate, having multipanel displays, doing the filtering in a unique way, and using column switching, to make it not just a reporting tool but an interactive tool as well.
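Before wrapping up, here is a rough JSL sketch of the findings time-trend line panel and the column switcher Rebecca demonstrated. The column names (:Visit, :TRTP, :BILI, :ALT, :AST) are assumptions, and the confidence intervals, the records panel, and the label and legend cleanup are the interactive steps shown above.

```jsl
// Rough sketch of the mean-change-from-baseline line panel plus column switcher,
// with assumed column names.
dt = Current Data Table();
gb = dt << Graph Builder(
	Variables(
		X( :Visit ),
		Y( :BILI ),        // change-from-baseline values for one lab test
		Overlay( :TRTP )   // planned treatment group from the virtually joined ADSL
	),
	Elements( Line( X, Y, Legend( 1 ), Summary Statistic( "Mean" ) ) )
);
// Swap the lab test shown on the Y axis without rebuilding the graph.
gb << Column Switcher( :BILI, {:BILI, :ALT, :AST} );
```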
Thanks for your attention, and we are happy to hear any feedback you've got, or questions about the product or about what we've shown you in this presentation.
Advances in digital image analysis have created opportunities for quantitative histopathology assessments in rodent toxicology studies. Microscopic evaluation of the rodent spleen is performed to assess for test article-induced immunotoxic effects, but it can be subject to inter- and intra-pathologist variability in characterizing differences between treatment groups and across studies. To address this problem, an image detection algorithm was trained to quantify tissue compartments in histologic sections of rodent spleen. Our aim was to design a study to evaluate how the image detection algorithm compared with digital annotations performed by human raters for specific features of the spleen, while keeping within operational constraints (e.g., rater time and effort). In this talk, we show how we used JMP Custom Designer, with data generated by the image algorithm as inputs, to select and allocate a test set across human raters. We used a response surface model, which is designed to select samples that fall on the boundaries and center of the input space. The resulting study design allowed us to strategically select a test set and create a balanced sampling plan for use across several pathologists from different institutions.

Hello, everyone. I'm Caroll Co, a statistician at DLH. Today I will talk about a project I worked on: creating a time-efficient strategy for selecting a test set in the validation of an image detection algorithm. This work was done in collaboration with scientists from the National Institute of Environmental Health Sciences, pathologists from Experimental Pathology Laboratories, and my fellow coworkers at DLH. In rodent toxicology, advances in digital image analysis offer opportunities for quantitative histopathology assessments. One example of where digital image analysis could be useful is the evaluation of test article-induced immunotoxic effects in rodent spleens. Typically, pathologists use a microscope to evaluate whether there is immunotoxicity in the spleen and judge whether spleens from treated animals differ from the control group. This workflow is prone to inter- and intra-rater variability in characterizing differences within a study and across studies. Here's an example of a zoomed-in cross-section view of a rodent spleen; I'm just pointing out the specific features of interest that our collaborators want to capture. Our problem: when we got involved in this project, our pathologist collaborators had already trained an algorithm that measures these features of interest, and the question posed to us was how to validate it. Validation is a very broad term, so to narrow our focus we thought about the main questions we wanted to address. First, the algorithm was trained by a select few people, so we wanted to see whether different pathologists from different laboratories would agree with the algorithm's output. Second, the algorithm was trained to measure multiple features, so we wanted to see whether there are specific features where performance is better or worse. We also wanted to see whether the algorithm can hold up against a wide range of cases.
If there are any blind spots, can we find them? I'll also mention that from here onward, I will refer to the image algorithm as the AI. One solution we thought of was to have both humans and the AI annotate the same images and compare the output. This is how it would work: a tissue sample gets scanned so that it becomes a digital image, or whole slide image (WSI), and that image gets fed into the AI for processing. The humans view the image in annotation software, where they can manually annotate the features of interest. One of the questions we were asked was how many images we need to validate. As statisticians, our answer is always as many as you can — do you have hundreds? do you have thousands? — but after talking to the team, we realized there are a number of operational constraints on implementing this. The first constraint was that each image needed to be evaluated by three different people; having three people was useful so that we could also estimate the variability between raters. It also turns out that this annotation process is very time-consuming: after talking to the pathologists, the maximum number of images they were willing to annotate was about 24 per person. Lastly, one of the goals of this project was to get buy-in, or support, from pathologists at other labs, which meant we needed participation from multiple labs and multiple people from each lab. In this study, we got participation from three centers, with three pathologists representing each center, so in total we had nine pathologists recruited into the study. Based on all of these constraints, we determined that we could validate only 72 images. Now for the sampling plan: we know our sample size, so how do we select the 72? Random selection is okay, but can we do better? What we came up with was this: since the cost per image for the AI is relatively low, we had the AI process a larger set of images and then used information from the AI output to better select our 72 samples. We used a response surface model (RSM) to select our points; an RSM selects points toward the boundaries and center of the input space. It's a model containing the main effects, two-way interaction effects, and quadratic effects. This model was particularly useful for our validation problem because we wanted to look for areas where agreement between humans and the AI fails, and such problems generally tend to occur on the boundaries and edges of the space, so this type of model fits the problem we had. Now we have a plan to select the images; the question is how to allocate these 72 images across nine raters. We expected the samples to have a wide range of complexity, and we wanted to make sure everyone got a balanced mix of slides. The complexity, or case mix, is determined by the output we got from the AI. We also needed to satisfy the constraint that each image is seen by three different people. Here is my workflow: I will show you how I created a sampling plan that satisfied all of our operational constraints in JMP. There are three steps in this workflow, and a quick arithmetic check of the run counts we just described follows below.
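To make the run counting explicit, here is a small JSL check of where the 72 images and 216 annotation runs come from. The numbers are taken from the talk; the variable names are only illustrative.

```jsl
// Quick check of the sample-size arithmetic described above.
n_centers         = 3;                               // participating laboratories
raters_per_center = 3;                               // pathologists per center
n_raters          = n_centers * raters_per_center;   // 9 raters in total
images_per_rater  = 24;                              // annotation budget per rater
raters_per_image  = 3;                               // each image read by 3 people

total_runs = n_raters * images_per_rater;            // 216 annotation runs
n_images   = total_runs / raters_per_image;          // 72 distinct images

Show( n_raters, total_runs, n_images );
```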
First, I'm going to show you how we selected the 72 images from a larger set. Second, how we replicated the 72 images three times, giving us 216 runs; we needed to do this because we wanted each image to be seen by three raters. In the last step, I'll show you how we allocated the 216 runs across nine raters so that each person gets exactly 24 images. In each of these steps, I will be using the DOE platform. Now I'm going to move over to JMP, where I have my JMP journal. I'm going to open a sample data set — I say sample because our data has not been published yet, so for this demonstration I will be using a simulated data set with the same structure as the original data we collected. This is what the data look like. I have my slide ID, which is just a numeric variable going from 1 to 100. I have four variables that were collected by the AI: features 1, 2, 3, and 4. Three of them are continuous and one is a count variable. If you go to Analyze, Multivariate Methods, Multivariate and look at the scatterplot matrix of all of the variables, you'll see that they are all uncorrelated, and this shows the spread and range of the variables. Step one: under DOE, there's Custom Design, which is what I'm clicking. We can leave the response Y here alone. For the factors, there are actually two ways to do this. One way is to click the Add Factor button; about a fifth of the way down there's a Covariate selection. If you click that, it asks which columns of covariates you want to include in the design. In my case, it's these four features, so I select them and click OK. JMP automatically populates the min and max for each of these variables, and you'll see that they are all treated as covariates. That looks good. I'll quickly close this and show you another way of doing the same thing. Again, in DOE, Custom Design: what I showed you was adding the factors with the Add Factor button; another way is to use the Select Covariate Factors button, which gives you the exact same thing. Again, I pick the factors I'm interested in, and it automatically populates the Factors window. I scroll down to the bottom and click Continue. This is where we specify the model we want. We wanted an RSM, or response surface model, and there is a shortcut button for that, so I click it. We already have our main effects in there, and this adds the quadratic and two-way interaction effects. Lastly, at the bottom where it says number of runs, we don't want 100, because we want to select 72 out of the 100, so I change this to 72. This all looks good to me, so I click Make Design and give JMP a few seconds to create the design. Here we go; this is the design it created for us.
As you can see, if you scroll down, it only gave us 72 rows, which is what we asked for. I turn this into a data table by hitting Make Table at the bottom left, and then close this for now. Before I show you what the design looks like, I want to go back to the original table so we can compare the observations that were picked with the ones that were not. Back in the original data table, the 72 rows that were chosen are highlighted. What I'd like to do at this point is create a new column that identifies which rows were selected and which weren't. To do that, go under Rows, then Row Selection; the last option there is Name Selection in Column. It labels the currently selected rows and saves whatever values you assign to that column. You get to name the column; I give the highlighted rows a value of 1 and the unselected rows a value of 0. I press OK, and that creates this column. Counting my 1s, there are 72 rows for the ones that were selected and 28 for the ones that were not. I want to go back to my scatterplot matrix to see which observations were picked and which weren't, but first I want to color-code them so that I can immediately spot the difference in the graph. A quick way to do that is Rows, Color or Mark by Column, using the Selected column, so that all the zeros are identified by an orange circle and all the ones by a blue marker; using a marker as well makes the selected observations easier to pick out. I click OK, and now all of my rows are marked by a plus or a circle. I go back to my scatterplot matrix — again, it's under Multivariate Methods, Multivariate, and I just hit Recall to repeat what I did initially — click OK, and make the window a little bigger. You'll now see that the blue pluses are the observations that were selected and the orange circles are the ones that weren't. The selected points tend to fall more on the boundaries and edges of our space, which is exactly what we wanted, so this looks great. Just to convince yourself that the model is doing what it's supposed to do, I also ran the same setup but picked only 24 instead of 72, to make it a little more extreme. Now only 24 rows are selected, and doing the same thing with the scatterplot matrix, you see fewer blue pluses — there should only be 24 — but you can still see how the selected observations tend to fall toward the outer boundaries, with some in the center, compared with the ones that were not picked.
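The same bookkeeping can be scripted. Below is a hedged JSL sketch of the selection-labeling, coloring, and scatterplot-matrix steps; the column names (Feature 1 through Feature 4, Selected) follow the demo table, and message options may vary slightly by JMP version.

```jsl
// Hedged sketch of the interactive checks described above.
dt = Current Data Table();

// Label the currently selected rows (the 72 chosen by the design) in a column.
dt << Name Selection in Column(
	Column Name( "Selected" ),
	Selected( 1 ),
	Unselected( 0 )
);

// Color and mark rows by that column so the picked rows stand out in plots.
dt << Color by Column( :Selected );
dt << Marker by Column( :Selected );

// Scatterplot matrix of the covariates to see where the picked rows fall.
dt << Multivariate(
	Y( :Feature 1, :Feature 2, :Feature 3, :Feature 4 ),
	Scatterplot Matrix( 1 )
);
```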
Let's go back to our original problem and close this. This was our original data table, and the table created by the design is this one. What do we have here? We're still keeping the same variables we asked JMP to include — features 1, 2, 3, and 4 — and Y, our response, is still missing. We now have a new column called Covariate Row Index, which basically links you back to the original table: if it says 88, then row 88 of the original table is the one that got captured, so this row here should be the same as that row there. Before we move on, I'm going to rename this to my slide ID; in my case the slide ID, which is just a number from 1 to 100, is the same as the covariate row index. That was all for step one. Step two is replicating the 72 images. How do we do that? A quick way is to go to DOE; the second selection is Augment Design. A window pops up asking for your responses and factors. I know we don't really have a response, but I think you still need to put something in there, so I add it even though it's all missing values. For the factors, we select features 1 to 4, and this time we also need to select the slide ID, then click OK. A new window pops up with all of our factors, and again JMP auto-populates the ranges of the variables. At the bottom, under Augmentation Choices, there's a Replicate option, which is exactly what we need. Click it, and it asks how many times you want to perform each run. The default is two, but we want three, because we need each image to show up three times in our design. Then we click OK. Now, instead of just 72, the design has 72 times three; scrolling all the way to the bottom, we have 216 rows. I click Make Table to turn that into a data table and close the design window. A couple more things to check before we move on to step three: every time you do one of these steps, you want to make sure it's actually doing what you think it should be doing. In this case, I want to check that each slide ID occurs exactly three times. We use Tabulate for that: Tabulate, slide ID, and I should see a count of three for each ID — and that's what we have, so that looks great. That's it for step two. Let's move on to the last step, which is the most exciting one. We now have 216 rows and the slide IDs, and we want to distribute them to nine different people in a way that gives each person a balanced mix. A nice way to do that is to use DOE again: under DOE, click Custom Design.
As before, we're still going to use our covariates as factors: select features 1 to 4 and our slide ID, and JMP auto-populates the mins and maxes. All of the rows here are listed as covariates. At this point I want to add two more factors. One is a categorical factor with three levels: the center, or laboratory, because we have nine pathologists but they come from three different centers; I'll name the levels A, B, and C. Then, within each center we also have three different people participating, so I add another categorical factor with three levels: the rater — the people — named 1, 2, and 3. Basically, Center A has raters 1, 2, and 3, Center B has raters 1, 2, and 3, and so on, so in total we have nine different people, nine combinations of center and rater. I'll minimize this window, which just shows the data table we pulled our covariates from, and hit Continue. Now we tell JMP what kind of model we want, and there are a couple of things here. JMP automatically puts in the main effects for all of the factors in my model. I want to add an interaction term between center and rater, because I want to make sure that all combinations of center and rater appear in the design; the interaction term guarantees that. The other thing to note is that we don't actually have enough runs to estimate slide ID — there are 72 distinct slide IDs in here — but we don't want a slide ID effect anyway; we just want JMP to take it into account when constructing the design. So under Estimability, you can change slide ID from Necessary to If Possible. Lastly, the number of runs: I think the number it calculated for me was 18, but we want to use up all of our runs, because we have 216 and we're asking JMP to tell us how to allocate them. Click Make Design; it might take a little longer this time, so while it runs I'll talk about how we check the design and a little about the run order. This is what the design looks like: we have our original features 1 to 4, the slide ID, and now a center and a rater assigned to each slide ID. What this says is that rater B1 has to annotate slide 27, A1 annotates slide 96, and so on, and there should be 216 runs in here. That looks okay. The last part, under Data Table Options, is a check box for Include Run Order Column. I'm going to check it, because for annotation you may expect some type of time effect in whatever process you're doing.
In our case, we were worried about whether there would be a learning curve or a fatigue effect. We want to make sure that not everyone starts with the lower-numbered slide IDs and ends with the higher-numbered ones. I click Make Table and close this window for now. This is what our design now looks like. Before we do our checks, I'm going to create a new column that concatenates center and rater: highlight the two columns, right-click, and under New Formula Column, Character, concatenate them with a comma. This is going to be our Center,Rater variable. There are three things to check here. First, let's use Tabulate to make sure each person has exactly 24 images: that's my Center,Rater column, and there are 24 for each, which is great. The second thing to check is that there are no repeats; for example, we don't want slide ID 1 assigned to the same person twice, because that would not be fun for that person. If I do a crosstab of slide ID by Center,Rater, I should see just a column of 1s, meaning each slide ID was assigned to three different people. Scrolling through the table, there are no 2s or 3s, so that looks great. Lastly, I want to check what the case mix looks like across the nine raters. One way I thought of to check that, at least visually, is a parallel plot in Graph Builder. I highlight features 1, 2, 3, and 4, because those are my original variables, drag center and rater onto the X-axis, and hit the parallel plot option at the top right. Actually, I don't want Center,Rater inside the plot itself, so I turn that off; what I want is Center,Rater in its own panel. What this shows is the assignment for each of the nine people, A1 all the way to C3, and the case mix of the images each was assigned. Again, we're just checking this visually: what I'm looking for is that, taken as a whole, the panels all look about the same — the cases are somewhat blended, with no clumping — and they look to be okay. Another way is to overlay them; it gets a little hard to read because I have nine different colors, but again you're just looking for the absence of patterns — you don't want clumps of green up here or down here or anywhere in this space. You could also check it by center; I admit my colors are not the best, with yellows, blues, and greens, but they all look well mixed.
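These allocation checks are also easy to script once the design table exists. The JSL below is a hedged sketch: the column names (Center, Rater, Slide ID) follow the demo, and the concatenated Center,Rater identifier is built with a formula column.

```jsl
// Hedged sketch of the allocation checks described above.
dt = Current Data Table();

// Concatenate center and rater into one identifier, e.g. "A,1".
dt << New Column( "Center Rater", Character,
	Formula( Char( :Center ) || "," || Char( :Rater ) )
);

// Check 1: each rater should have exactly 24 images (N Rows = 24 per group).
dt << Summary( Group( :Center Rater ) );

// Check 2: no slide assigned to the same rater twice --
// every populated cell of this crosstab should be 1.
dt << Tabulate(
	Add Table(
		Row Table( Grouping Columns( :Slide ID ) ),
		Column Table( Grouping Columns( :Center Rater ) )
	)
);
```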
Then there's the run order I mentioned: when you tell people how to do their annotation, you want that order randomized too, so that any time effect is already taken into account. What I'm plotting here is the slide ID — again, this is sequential, going from 1 to 100 — against the run order, the sequence in which the pathologists would rate the images. This looks random. We can also plot it by center and rater, that is, for each individual person, and those look good too. I'll close that, go back to my PowerPoint slides, and end with some conclusions. We can use DOE to select samples or test cases when you have prior information; if you have data or covariates you can use to inform the selection, why not use them? A response surface model is advantageous if you're interested in the boundary or edge cases. Second, you can use Augment Design's replication when you need multiple raters per sample. The design also gives you an opportunity to factor run order into your plan, which can be really useful if you expect a time effect such as a learning curve or fatigue. I have a couple of links here to blog posts that discuss in more detail what a covariate is in design of experiments; please check them out if you want to learn more about this technique. Lastly, I just want to say thank you to all of my collaborators who helped make this project possible. Thanks.
Title: Automating Weibull analysis of field vehicle claims under various no-failure (suspension) assumptions using JMP

Abstract: Weibull analysis is used as a method for analyzing vehicle reliability. Weibull analysis requires time data, and the time information contained in claim data is basically the sale date, the repair date, and the mileage at the time of repair, so the analysis is usually performed on date data, which has clear start and end points. However, when a mileage-based Weibull analysis is needed — because of a vehicle model's driving characteristics, warranty terms, or checking whether development targets were met — the mileage of vehicles with no failures must be estimated under various assumptions. To understand how the Weibull results change with the no-failure calculation method, we used an automation technique based on JMP's Life Distribution platform, the ability to save analysis results as a script, and JSL, so that Weibull results could be compared quickly across the various no-failure mileage estimation methods we defined. Through this, we confirmed that Weibull results based on dates and those based on mileage differ substantially, and we became able to use engineering knowledge to choose the most plausible Weibull shape parameter under a variety of conditions. Beyond Weibull analysis itself, we found that JMP's flexible features can be used to perform the various preprocessing steps needed to improve analysis accuracy.

Presenter profile: Works in the Commercial Vehicle Electrification PT (Powertrain) Function Test Team at Hyundai Motor Company.
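As a rough illustration of the kind of scripted comparison this abstract describes, here is a hedged JSL sketch of launching Life Distribution on mileage data in which no-failure vehicles enter as right-censored records. The table and column names (Claims, :Mileage, :Censor) are hypothetical, and the messages for fitting a specific distribution vary by JMP version, so treat this as a starting point to compare against your own saved scripts.

```jsl
// Hedged sketch: life analysis of claim mileage where no-failure vehicles
// are right-censored observations. All table/column names are placeholders.
Names Default To Here( 1 );
dt = Data Table( "Claims" );

// :Mileage holds the failure mileage, or the estimated mileage of a
// no-failure vehicle under one of the suspension assumptions.
// :Censor is 1 for no-failure (censored) records, 0 for failures.
ld = dt << Life Distribution(
	Y( :Mileage ),
	Censor( :Censor ),
	Censor Code( 1 )
);
// The Weibull fit (and its shape parameter) can then be selected in the
// Compare Distributions report; saving the report script captures those
// choices so they can be re-run for each no-failure assumption and compared.
```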
Negative space, in the discipline of art, is the space around and between the subject of an image. The use of negative space is an element of artistic composition, and it is occasionally used to artistic effect as the "real" subject of an image. In painting, it is a technique of working the background around the object to be expressed: by touching only the unnecessary areas and leaving the necessary parts untouched, it conveys a distinctive texture and silhouette. As in art, negative space in design can be useful for identifying infeasible design ranges at a glance. This similarity between the two disciplines led to the introduction of the negative space concept for design space exploration. A rough design space exploration using statistics and visual analytics can support more efficient decision-making and provide meaningful insights into the direction of early-phase system design, because visualized information that summarizes complex, large-scale data supports dynamic interaction with the human cognitive system. Visual analytics is useful for identifying feasible design spaces, as well as for avoiding infeasible or highly risky ones. This presentation introduces a possible use of the negative space concept through an application example in JMP.
Cases of data analysis with JMP have mostly come from R&D departments developing new products in manufacturing, or from quality and engineering departments on the shop floor. However, JMP's diverse and convenient features are also widely used in marketing, HR, finance, sales, service support, and other functions, not just manufacturing and R&D. In this presentation, I would like to share decision-making cases from the sales side that use JMP, taken a step further than the material presented in 2022. Compared with researchers and engineers in manufacturing and R&D, salespeople often have less experience and knowledge of statistics and data analysis. Nevertheless, these cases show that by using JMP's easy-to-use features, they can take decision-making and problem-solving up a level. The main sales cases are: 1) building a market map from internal and external data, 2) setting an appropriate price band through multiple regression analysis, 3) analyzing customer purchasing patterns with Graph Builder, and 4) selecting KSFs (Key Success Factors) with machine learning. The significance of these sales cases lies not in using difficult analysis tools, but in applying simple, accessible techniques that the sales field actually needs. I hope that decision-making cases using JMP data analysis will continue to spread in sales as well.
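As a hedged illustration of the second case (an appropriate price band via multiple regression), the JSL below sketches a standard least squares fit with a couple of illustrative predictors. The table and column names (:Price, :Volume, :Region, :Customer Grade) are hypothetical and not taken from the presentation.

```jsl
// Hedged sketch: multiple regression of price on illustrative predictors,
// as a basis for judging a reasonable price band. Names are placeholders.
Names Default To Here( 1 );
dt = Data Table( "Sales History" );

fm = dt << Fit Model(
	Y( :Price ),
	Effects( :Volume, :Region, :Customer Grade ),
	Personality( "Standard Least Squares" ),
	Emphasis( "Effect Leverage" ),
	Run
);
// Prediction intervals saved from the fit (for example, via the red triangle's
// save-columns options) give a data-based price range to compare against quotes.
```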
The JMP/Python ecosystem basically means an ecosystem in which JMP's capabilities and Python's are used to complement each other. In particular, from a machine learning perspective, we first built a JMP/Python machine learning ecosystem that constructs and optimizes machine learning prediction models by using JMP Pro's own machine learning methods together with Python libraries such as scikit-learn, XGBoost, LightGBM, CatBoost, PyTorch, and TensorFlow. The roadmap for the JMP/Python ecosystem is defined as IGEO. The IGEO roadmap divides into an IG (Identify/Generate) area and an EO (Explore/Optimize) area: the IG area covers problem definition and the generation of experimental and simulation data, while the EO area, the core of the JMP/Python ecosystem, carries out DEbO (Design Exploration based Optimization). In the EO area, design work is performed by integrating MariaDB, JMP, and Python, and where needed, GUIs are built with Python/JMP GUI tools to automate the workflow. Design space exploration iterates between two steps — generating designs of experiments or computer experiments, and building machine learning models — combining JMP and Python modules as the situation requires. As success stories of the JMP/Python machine learning ecosystem, we summarize an Explore-area case, a ball-bearing failure detection and monitoring system case, and a CAE machine learning optimization case, and we suggest a direction in which nearly all engineering analyses can be performed by combining JMP and Python.
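As a hedged sketch of what "using JMP and Python complementarily" can look like in practice, the JSL below sends a JMP data table to Python, fits a scikit-learn model there, and pulls the predictions back into JMP. The table and column names are placeholders, and the integration functions shown (Python Init, Python Send, Python Submit, Python Get) assume a JMP version with the built-in Python interface configured; this is not the presenters' actual workflow.

```jsl
// Hedged sketch: round trip between a JMP table and scikit-learn.
// Table/column names are placeholders; requires JMP's Python integration.
Names Default To Here( 1 );
dt = Data Table( "Training Data" );

Python Init();
Python Send( dt );                      // table becomes a pandas DataFrame named dt
Python Submit( "\[
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

X = dt[["x1", "x2", "x3"]]
y = dt["y"]
model = GradientBoostingRegressor().fit(X, y)
pred = pd.DataFrame({"Prediction": model.predict(X)})
]\" );
pred_dt = Python Get( pred );           // predictions come back as a JMP table
Python Term();

// pred_dt can then be joined back onto dt for profiling, plotting, etc.
```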
Financial toxicity (FT) is a notable problem for breast cancer patients worldwide, but the state of FT in Japan has not been sufficiently investigated. This study examined FT in Japanese breast cancer patients and presents an overview of the findings of the overall group study. [Methods] The survey used the Questant application and mainly targeted breast cancer patients attending the participating research institutions and physicians who are members of the Japanese Breast Cancer Society. To quantify patients' FT, the Japanese version of the COmprehensive Score for financial Toxicity (COST) was used. Questionnaire items were organized using principal component analysis, and multiple regression analysis was used to identify factors associated with FT in Japanese breast cancer patients and to evaluate the adequacy of the information support level (ISL) regarding medical costs. All analyses were performed using JMP® 17.0.0 (SAS Institute Inc., Cary, NC, USA). [Results] We received 1,558 responses from patients and 825 from physicians. Among the factors affecting FT, recent payments had the largest effect, followed by disease stage and the relevant clinical department, all with positive effects on FT; conversely, factors such as income, age, and family support had negative effects on FT. Significant discrepancies in the perception of information support were identified between patients and physicians: patients often felt unsupported, while physicians believed they had provided sufficient support. Furthermore, the frequency of explanations about medical costs and the opportunities to ask questions differed by FT grade. Physicians who better understood the need for information support and who were more knowledgeable about medical costs also tended to provide more comprehensive support. [Conclusion] This study emphasizes the importance of addressing FT in Japanese breast cancer patients and clarifies the need for enhanced information support, physician understanding, and interprofessional collaboration to reduce the financial burden and provide individualized support tailored to each patient's needs. [Keywords] financial burden, breast neoplasms, health services accessibility, multivariate analysis, Japanese Breast Cancer Society group study
At the Kagoshima Prefecture tea market, with the aim of improving crude tea (aracha) quality, the appearance and liquor color of the crude tea put up for auction are photographed with a digital camera, and the values obtained from texture and chromaticity analysis of those images are fed back to each grower via smartphone, together with the unit price and the images. In this work, we used various JMP capabilities to examine methods for explaining and predicting crude tea component values, which strongly affect taste and aroma, from cultivation information and image analysis data. We used a data set of 1,292 first-flush (ichibancha) samples that arrived at the tea market from production areas across the prefecture, underwent image analysis, and, after being sold at auction, were analyzed for components by near-infrared spectroscopy. Principal component analysis of the first-flush crude tea component values (total nitrogen, free amino acids, theanine, fiber, tannin, caffeine, and vitamin C) showed that 74.4% of the variation among the auctioned teas could be explained by principal component 1, an index related to total nitrogen and fiber, and principal component 2, an index related to tannin and caffeine content; unit price correlated positively with total nitrogen and negatively with fiber. Control charts showed that total nitrogen and fiber frequently fell outside the control limits in the latter half of the season (mid- to late-maturing cultivars); with a lower specification limit of 5% for total nitrogen content and an upper specification limit of 22% for fiber content, the nonconformance rates were 4.9% and 6.4%, respectively. To predict total nitrogen and fiber from the crude tea image analysis data, we fit PLS regression and response surface models using cultivation information and 10 image analysis items as parameters; both component values could be explained by models incorporating the auction date, the cultivar, and the image analysis item "white stem" (an index of the maturity of the plucked shoots). Furthermore, using the design space feature of the profiler, it was predicted that the in-specification rate for total nitrogen and fiber could be raised to 98.9% if the mid- to late-maturing major cultivars are plucked and processed by May 7 so that white stem remains at level 3 or below. In summary, crude tea component values can be predicted from cultivation information and image analysis data, and cultivation indices for keeping them within specification were obtained. We introduce examples of how this information is being used to guide plucking and processing in the field.