
Pythonless Python Integration for JMP® (2023-EU-30MP-1265)

Jarmo Hirvonen, Data Integration and Data Science Specialist, Murata Electronics Oy
Philip O'Leary, Data Integration Manager, Murata Finland

 

Challenges with a JMP® and Python integration resulted in a search for an alternative solution that would allow for the evaluation and testing of various Python libraries and powerful algorithms with JMP. This would enable JMP users to work with Python from a familiar JMP environment. After a few different iterations, a REST API service was developed; when JMP calls this service, it dynamically creates a user interface based on the options the service currently provides. The JMP user can then utilize this user interface to employ different algorithms such as HDBSCAN, OPTICS, and UMAP by sending data directly from JMP in one click. After the algorithm has finished its operations on the server side, it returns data to JMP for further analysis and visualization.

 

 

Welcome to the Pythonless Python Integration for JMP presented by Murata Finland. My name is Philip O'Leary. Shortly about Murata: we are a global leader in the design, manufacture, and supply of advanced electronic materials, leading-edge electronic components, and multifunctional high-density modules. Murata innovations can be found in a wide range of applications, from mobile phones to home appliances, as well as from automotive applications to energy management systems and healthcare devices.

We are a global company, and as of March 2022, there were approximately 77,500 employees worldwide, just under 1,000 of them in Finland, where we are located. Our product lineup here in Finland includes accelerometers, inclinometers, gyroscopes, and acceleration and pressure sensors. Our main markets are automotive, industrial, healthcare, and medical.

Today, we have two presenters: myself, Philip O'Leary, and my colleague, Jarmo Hirvonen. I've been working in the ASIC and MEMS industry for over 40 years, 32 of which have been here at Murata. I've had several roles here and have come to appreciate the importance of data within manufacturing. Most recent years have been devoted to helping the organization benefit from the vast amount of data found within manufacturing. I currently lead Murata's data integration team.

Jarmo, perhaps you'd like to give a few words on your background.

Yes, sure.

Hi, I'm Jarmo Hirvonen and I work in Philip's team as a data integration and data science specialist. I have been using JMP for four and a half years, approximately the same time that I have been working at Murata. I'm a self-taught programmer; I haven't really studied programming besides a couple of basic courses at university.

In my position, I do a lot of JSL scripting. I write add-in scripts, reports, automation, basically almost everything you can script with JSL, as long as it stays mostly inside JMP. I'm an active JMP Community member, and I'm also a Super User there. Because of my background with JSL scripting, I'm also a steering committee member in the Community's Scripters Club. I have also written, I think, nine add-ins at the moment that have been published to the JMP Community. Feel free to try them out if you are interested. Thank you.

Thank you, Jarmo. This is the outline for the presentation that we have for you today. As this session has been recorded, I will not read through the outline, as you can do so yourselves afterwards. Why do we have the need for a JMP Python integration? Well, basically, we are very happy with the performance and the usage we have of JMP. It doesn't require any programming for basic usage, and we see this as a big advantage. JMP's visualization and interactive capabilities are excellent. The majority of people performing analysis at Murata in Finland are already using JMP. We have a large group of people throughout the organization using JMP, and we want to maintain that.

However, on the Python side, we see that Python has powerful algorithms that are not yet available in JMP. We already have people working with Python in various different applications, and we have models within Python. We want to support these people and also help others understand and take advantage of the Python world. Basically, we want to take advantage of the wide use of JMP here at MFI and offer JMP users access to some common Python capabilities without the need to program themselves.

I'll continue here. Share. JMP already has Python integration, but why are we not using that? Basically, there are two groups of reasons: JMP, and us, or our team. My experiences regarding JMP are from JMP 15 in this case. A JMP update at least once broke this integration, and it caused quite a few issues for us because we couldn't use the Python JMP scripts anymore unless we modified them quite heavily. Getting JMP to recognize different Python installations and libraries has been quite difficult, especially if you are trying to work on multiple different installations or computers.

Also, JMP didn't, at least back then, support virtual environments, which are basically necessary for us. Then on our team's side, we don't have full control of the Python versions that JMP users are running, or the libraries and packages they are using, because not everyone is using JMP as their main tool. They might be using Python with versions that don't work with JMP, and we don't want to mess with those installations. Also, in some cases, we might be running Python or library versions that JMP doesn't support yet, or maybe doesn't support anymore because they are old.

What is our current solution for this Python JMP, or JMP Python, integration? We are basically hosting a Python server using a web framework. We can create endpoints on that server, behind which there are different algorithms. We communicate with a REST API between JMP and the server. This is the biggest benefit: this way we can use JMP with the server, but we also have a couple of additional benefits. We can have centralized computing power for intensive models; for example, we don't have to rely on a laptop to perform heavy model calculations. The server is not just limited to JMP; we can also call the endpoints from Python or, for example, R. We are not dependent on the JMP-supported Python and library versions anymore. We can basically use whatever we want to.
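To make the idea concrete, here is a minimal sketch of such a service using only Python's standard library. The real service uses a full web framework and real algorithms (HDBSCAN, t-SNE, and so on); the endpoint paths, payload keys, and the toy "analysis" below are hypothetical stand-ins, not the actual API.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class ApiHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Health-check endpoint: the client pings this before sending data.
        if self.path == "/ping":
            self._reply(200, {"status": "ok"})
        else:
            self._reply(404, {"error": "unknown endpoint"})

    def do_POST(self):
        # Analysis endpoint: receives column-oriented JSON, returns results.
        if self.path == "/api/demo/threshold":
            size = int(self.headers["Content-Length"])
            payload = json.loads(self.rfile.read(size))
            values = payload["columns"]["y"]
            cutoff = payload["parameters"]["cutoff"]
            # Toy "algorithm": label each value by whether it exceeds cutoff.
            labels = [1 if v > cutoff else 0 for v in values]
            self._reply(200, {"columns": {"label": labels}})
        else:
            self._reply(404, {"error": "unknown endpoint"})

    def _reply(self, code, body):
        data = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # keep the demo quiet

def start_server():
    # Port 0 lets the OS pick a free port; a background thread serves requests.
    server = ThreadingHTTPServer(("127.0.0.1", 0), ApiHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Any HTTP client, including JMP's JSL `HTTP Request`, Python, or R, can then call such endpoints, which is the language independence described above.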

Next, I will go a little bit away from the PowerPoint to JMP and show a little bit of the user interface. First, I will explain some terminology which might appear here and there in this presentation. We have endpoints; basically, this path here is an endpoint. These come directly from the server. Then we have methods; a method is the last part of the endpoint, t-SNE and XGBoost in these two here.

Then we have parameters, this column, and these are basically the inputs that we will send to the server. Then we have what I call the stack, or we call the stack. It's a collection of stack items; one row is a stack item that we can send one after another to the server. Quickly jumping here: what features do we have? It's easy to add new endpoints. Basically, we write the endpoint on the Python server, we rebuild the server, we run the JMP add-in, and this list gets updated. The add-in supports a dynamic data table list; if I change the table here, it will update here. Also, if a new table is opened (it's on the other screen, but it doesn't really matter), you can see it here: Untitled 3 was opened.

Then we can send data directly from here to the server, and there are multiple different options for sending. I can send the selections that I had here basically immediately. I will show the results here. After getting the data back, we join it; these columns are from the server. We join the data to the original data table we had, and then we have some metadata we can get from the server from the communication: notes and column properties telling what method and parameters were used to get these two columns. Then we group them; if I have run multiple models or methods, it's easier to see which columns are from which runs.

Then we have table scripts, which are also grouped. This is a different screen; let's move them around. We have the stack: what was sent, and the HTTP response that comes back from the server. In this case, we also receive an image from the endpoint; here it's a scatter plot of the t-SNE components. As I said earlier, we can send multiple items from the stack one after another. You can build, let's say, HDBSCAN with different input parameters, say 20 here and then 20 to 40, add them to the stack, send them there, come back when they're done, and start comparing whether there are differences between those.

Then endpoints have instructions on how to use them: a documentation link if we have one, a short description (in this case, a very short description) of the endpoint, and then what each of the parameters does, with minimum values, maximum values, default values, and descriptions of those.

Then we also have user management. In this case, I'm logged in as a super user, so I can see these two experimental endpoints here that a basic user would not even be able to see. Then back to PowerPoint. This is maybe a partial explanation of how the add-in works. When the user runs the add-in, JMP will ping the server, and if the server is up and running, JMP will send a new request for the JSON that we will use to build the interface. The JSON is parsed, and then the interface is built using JMP type classes that I will show a bit later. A custom class is created in JMP.

At this point, users can start using the user interface. The user fills in the selections, parameters, data tables, and such, and then sends an item from the stack. We get the columns based on the inputs, get the data that we need, and convert that data to JSON. In this case, I call it column JSON; as this demonstration shows, normal JSON would always have the column names duplicated, with each row carrying all the column names. In column JSON, we have each column name only once, followed by a list of values. This makes the object we send much smaller.
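The row-to-column conversion described above can be sketched in a few lines. The key names ("columns") and the sample column names are illustrative, not the add-in's actual wire format.

```python
import json

def rows_to_column_json(rows):
    """Convert row-oriented records to a column-oriented payload:
    each column name appears once, mapped to its list of values."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return {"columns": columns}

# Row-oriented JSON repeats every column name on every row...
rows = [
    {"WAFER_ID": "W1", "ORBOT1": 1.2},
    {"WAFER_ID": "W2", "ORBOT1": 0.9},
    {"WAFER_ID": "W3", "ORBOT1": 1.5},
]
row_json = json.dumps(rows)

# ...while the column form names each column once, so it serializes smaller.
col_json = json.dumps(rows_to_column_json(rows))
```

Even for this three-row example the column form is shorter, and the savings grow with the row count since column names are fixed overhead per row in the normal form.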

Before we send the data, we ping the server again. This is done because we have different timeouts for the ping and the request; otherwise, JMP would lock up for a long time if the server is not running and we are using a two-minute timeout, for example. Then, when the server gets the data, it runs the analysis and returns the analysis results, and we join them back to the table, add the metadata, table scripts, and so on. At this point, the user can continue using JMP, send more items from the stack, or maybe even jump to Graph Builder and start analyzing the data that he or she got back from the server.

These are the JMP type classes. We have different classes for the different types of data we get from the server. We have booleans, which in JMP become check boxes; columns; enumerators, which would be a combo box; type number; type string; and not implemented, which is basically used to check that the server is correctly configured. This is a quick demonstration of one of those, Type Column.

On the server side, it has been configured like this. When we request the JSON, it looks more like this. Then this Type Column class converts it into an object that looks like this in the user interface. From here you can see that, for example, minimum items is one; it's the same as the minimum here. Max items, same thing. Then modelling types have also been defined here. We can limit minimum and maximum values and so on based on the schema we receive from the server. All of these are made by the custom JMP classes. This is an enumerator with some options, then number boxes, and here is the boolean. Now Phil will continue with a couple of demonstrations of the PyAPI interface.
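The mapping from a server-side schema entry to a UI element can be sketched like this. The schema field names (`type`, `enum`) are assumptions modelled on common JSON Schema conventions, not the actual format the add-in consumes; the widget names echo the JMP display elements mentioned above.

```python
def widget_for(schema_entry):
    """Pick the UI element to build for one parameter's schema entry."""
    if "enum" in schema_entry:
        return "Combo Box"        # enumerators become combo boxes
    kind = schema_entry.get("type")
    if kind == "boolean":
        return "Check Box"        # booleans become check boxes
    if kind in ("number", "integer"):
        return "Number Edit Box"  # numeric parameters, with min/max limits
    if kind == "string":
        return "Text Edit Box"    # free-text parameters
    # Anything else means the server schema is misconfigured for this client,
    # mirroring the "not implemented" class described in the talk.
    raise NotImplementedError("Unsupported parameter type: %r" % kind)
```

Raising on unknown types gives the same safety net as the "not implemented" class: a misconfigured endpoint fails loudly when the interface is built rather than silently rendering a broken control.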

Thanks, Jarmo. All demonstrations done today will be performed using standard JMP 16. There are three demonstrations I'd like to go through, each having a different task in mind. For the first one, I'll just open the data set. This is a data set which contains probe, or test, data from five different products. It's a rather small data table, just to ensure that we don't get caught for time.

We have 29 probe parameters for five products within the same product family. The task at hand is to try to determine quickly whether we have anomalies or opportunities for improvement, looking simultaneously at these five different products and 29 different parameters, such that we could identify something that could help reduce risk, or something that perhaps could reduce cost and improve yield.

One possible way to do this, of course, would be the one-factor-at-a-time approach, whereby we would just manually march through all the different data, all the different parameters, and look for patterns. Very inefficient: for 29 parameters it's okay, but some of our products have thousands of parameters, so it's not the best way to approach the task at hand.

Another possibility would be to take all of these parameters and put them through some clustering algorithm to see whether we could find groups naturally in the data that we have. I want to use the JMP-PyAPI interface that we have here. Jarmo already explained briefly how these work, but I will demonstrate it.

The intention that I have now is to run an HDBSCAN. I'm going to run the scan on all the probe parameters. I'm going to use the default settings; the default settings are typically already quite good. And I'm going to send this... I'm not going to make a big stack. I'm going to send this setting straight for analysis. We can see rather quickly that the algorithm came back and suggested clusters. There are actually three clusters and one grouping of wafers which do not, in fact, belong to any of the clusters. Knowing that I have five products, I'm going to go with this for the sake of demonstration. I can see from here a histogram of the number of wafers in each cluster, but it doesn't really give me a good visualization of what's going on.

I'm also going to do a dimension-reduction procedure. I go back into the same interface, and now I'm going to do a t-SNE dimension reduction on the same parameters and send it immediately. We wait for the dimension-reduction algorithm to do its job, and it returns two components, t-SNE 1 and t-SNE 2, against which I can then actually visualize the clusters that the HDBSCAN gave me, such that I now plot t-SNE 1 against t-SNE 2 and colour-code them in accordance with the clusters that have already been identified.

As I said, we have three clusters and one grouping of wafers which don't necessarily belong to a cluster. Maybe somewhat disappointing, knowing that I have five different products. Thankfully, I have an indicator of the product; it's here. At first I said this is actually frustrating, because now I have two different products being clustered as the same. In actual fact, this is the medical application of the same automotive part. The parts are identical, so them being in the same cluster is not a problem.

This part is rather unique. It's different to the other products in the same family, such that it got its own cluster with a few exceptions, so that's quite good. Then the B2 and the B4 versions basically have the same design. What I'm concerned about is that the B4 has been allocated cluster 1 and also a lot of minus ones for wafers in the same product type. I'd like to further investigate what this might be due to, so I have scripted to the table: I want to make a subset of this SENSORTYPE NR SA AB4, and then I'm going to plot the differences for every parameter by cluster minus one and cluster one.

Here we see the parameters in question, and the biggest differences are observed for Orbot 1 and Orbot 2. I'm not going to get into the parameters themselves; suffice to say that some parameter differences are bigger than others. Now that I know that these exist, I'd like to check across all the wafers in this subset how Orbot 1 and Orbot 2 actually look. Here we see, in fact, that the ones which have been allocated minus 1, the ones not belonging to the cluster itself, have a much higher value of Orbot 1. In fact, this anomaly is a positive thing, because the higher the Orbot value, the better. We see that there's quite a large group of wafers having exceedingly larger values of Orbot than what we would typically see.

The next step, of course, would then be to do a commonality study to figure out how this has happened, where the wafers have been, what the process has been like, and look for an explanation. We can see that a multi-product, multi-parameter evaluation of outliers or anomalies can be performed very quickly using this method. I will now move on to the second demonstration.

I just need to open up another file. This application is very different. It is a collection of functional data; in fact, these are bond curves, curves which occur in our anodic bonding process when we apply temperature, pressure, and voltage across a wafer stack to have the wafer, the glass and the silicon, bond together. If we look at individual wafer curves, we can see that each wafer has a similar but still unique curve associated with it. We can see the bonding process time and the associated current.

The goal I would have, if I just remove the filter, is that I would like to know, without having to look through, in this case, 352 curves (we would have thousands of these every week), how many different types of curves I actually have in my process. Then, tying that in with the final test data: can this curve be used to indicate a quality level at the end of the line?

In order to do this, I'm going to split the data set. Now I put the time axis across the top and the current through each column. The first thing that I do after this splitting is to again go back to our PyAPI interface, and I'm going to look at the split data. What I want to do is a dimension reduction, because you can see that I have many, many columns, and it would be much better if I could reduce the dimensionality here.

Again, I'm going to do a t-SNE analysis. I'm going to send it straight to the server, and we can see that the algorithm has come back with two components. I can show very quickly what they look like. The 352 wafers which were represented by functional, curve-type data a few minutes ago are now represented using a single point for each wafer.

Now, having reduced the dimension of the data, I'd like to perform a cluster analysis next. Again, I'll go back to my PyAPI. I'm now going to do an HDBSCAN on the t-SNE components. I just need to check on this analysis what would be a suitable level. If I send it immediately and colour-code by the cluster, you can see that.

Now clusters have been allocated to the t-SNE components. This is the first-level analysis using the HDBSCAN defaults; I could, of course, try another setting. I could perhaps run, maybe, if we think out loud, 25 wafers; a batch of wafers and half-wafer batches are things that would be of interest to me. Then we look to see what this clustering would now look like. Now, all of a sudden, I have many more clusters. Of course, it does take some subject matter expertise.

You need to know what clusters you would expect. In this case, I said, okay, a natural rational group for us within manufacturing would be a lot of wafers; wafers are processed in 25-wafer batches. Sometimes we have half-wafer batches, which we do experimental runs on, and so on and so forth. Now we can see that we have clusters associated with the different types of curves. I'm going to shorten this demonstration rather than have you watch me do joins and so on. What I'm going to do is take this cluster and put it into the original data. It's, of course, opening on another screen.

If I do cluster overlays, we can see... This is the original data where at first I showed you each individual wafer bond curve. Now we can see that we were able to identify the distinct differences between seven clusters and one group of wafers which don't belong to any particular cluster. We can see that very quickly, we've been able to go through large numbers of wafers, determine similarities between them, and come up with clusters.

To bring this even one step further, we can take a look at the actual t-SNE components, the coloured clusters, and have a quick look at the actual contents. We can see this is cluster minus one; they seemingly have something with a very high bond current at the very beginning. Cluster zero: very high bond current at the end. You can see that if we were to spend enough time on this, you would see lots of similarity between bond curves within each cluster. This was a short demonstration of how to take functional data from hundreds of wafers, cluster them, and, with the various visualization techniques within JMP, clearly identify and present the different groupings that exist within the data sets so that people understand them.

This concludes my demonstration number two. I have one more demonstration. This is maybe, in some respects, for some, a fun demonstration. Again, it's not a real wafer, but I'm playing with the idea that I have a silicon wafer and there is some noise. This is a defect layout from an automated inspection tool; this data has been simulated.

The purpose of having this simulation is to look for scratches or patterns found in the defect data layout. This is rather easy and straightforward if I don't have noise, but I can see that there's noise associated with this data set. What I want to determine is whether I can find a way to identify these three spirals, assuming that they simulate some scratch. In fact, they're not very similar to a scratch, except that they are patterns having high-density defects in a small area. That's the main purpose of using it, rather than showing you actual wafer automated visual inspection data.

The task at hand is to try to identify the spirals from this data set. I'm going to use, again, a clustering method. The table I will use is the spiral data with noise. As Jarmo pointed out, we can run... Obviously, putting the number of wafers in here, 25 and 12, won't help me, because I'm looking at a single wafer. The numbers I put in should be somehow representative of how many defects are typically seen within a scratch, and what the smaller sample sizes associated with clusters are, and so on: minimum samples.

Being a complete novice, I don't know, so I'm going to put in some numbers to play with. Twenty-five would be the minimum cluster size, with a minimum sample size of zero. Add to stack, and then I say, okay, well, this is rather inexpensive to do, so I'm going to add...

You're  missing  the  columns.

Oh, sorry. Thank you. This will help. Let me clear the stack; in my enthusiasm to move forward, I did not include what I should have. Let me start again. Thank you.

Twenty-five minimum cluster size, minimum sample size, add to stack. Fifty minimum cluster, add to stack. Seventy-five. I'm allowing the scratches to be bigger and bigger. Add to stack, 100. They're not necessarily bigger and bigger, but they would have more and more defects associated with them. Add to stack. And then I'm going to add another combination of 75 and two, add to stack. I could just take one of these and run it; I could select one and run, but I'm not. I'm going to be greedy. I'm going to run the whole stack at the same time.

I'm going to run one, two, three, four, five cluster analyses against the data that I've taken from this wafer. I send the whole stack, and... something has gone wrong. All my clusters are showing minus ones. Let me try this again. To make a long story short, and also given that this is being recorded and we don't want to start again from the beginning...

I know how this ends, so if I take this... I'm not sure why this has disappeared, but let me try it one more time. The table I need is the noise table. I'm taking HDBSCAN, X, Y features, X, Y, 20. I'm going to make a shortcut: 75 and two, send immediately. Now, thankfully... I don't know whether I had selected the table or something else incorrectly last time. Now that we're here, send it immediately, and so on.

As I said, we could have run quite many. The idea then is to look at the layout and try to determine: is this particular setup finding good clusters? The minus one says, no, you're not finding anything there. Then, if I colour-code by the other clusters, it has in fact found, quite well, lots of points that don't belong to any cluster, and then three individual spirals which are very well identified. You might think, what's the benefit of this? Well, now that I know what typical scratch content looks like, I could in fact open up another wafer.

If I open up data from another wafer and make the plot of the layout, we can see that there are no scratches on this wafer; it's only noise. What would happen then if I run the same setup? My wafer is another wafer. I'm doing it on X, Y. I'm looking to determine, based on my best settings for finding scratches, 75 and two, send immediately, and plot with clusters. We only have minus ones, so nothing has been detected as a scratch.

Having the possibility to run this algorithm against wafers in the database, I could then make a collection of wafers that have scratches, or don't have scratches, or spirals in this case, and then use that data as an input to a commonality study to try to determine which machines in the production line are resulting in the scratches on the wafers. This concludes the third demonstration. Now I'll hand it back to Jarmo.

I'll take that. We have a couple more slides left. Here are a couple of ideas we have for possible future development: using a DoE approach for the stack building, basically what Philip did by hand, but using DoE, so we'd set min and max values and so on, and then send that whole stack. Then a metadata viewer, so you can compare the results; trying JMP 17's new multiple HTTP requests; a local server, so we don't rely on the central server being up; and trying the new, hopefully updated, native JMP Python integration. This would allow us to have faster data transfer, possibly more. We could start testing with this application, then try, for example, running from Graph Builder, where we could trigger the functions, and combining different endpoints.
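The DoE-style stack building mentioned above, generating all parameter combinations instead of adding items by hand, can be sketched as a full-factorial cross. The stack-item shape and the parameter names (modelled on the HDBSCAN options used in the demos) are illustrative assumptions.

```python
import itertools

def build_stack(method, grid):
    """Cross all parameter levels into a list of stack items,
    one complete parameter set per item, ready to send in sequence."""
    names = list(grid)
    stack = []
    for levels in itertools.product(*(grid[n] for n in names)):
        stack.append({"method": method, "parameters": dict(zip(names, levels))})
    return stack

# What Philip built by hand (25/50/75/100 crossed with min samples 0/2)
# becomes a one-call full-factorial grid of 4 x 2 = 8 stack items.
stack = build_stack(
    "hdbscan",
    {"min_cluster_size": [25, 50, 75, 100], "min_samples": [0, 2]},
)
```

Sending the generated stack one item after another, then comparing the runs via the stored metadata, is exactly the workflow the add-in already supports; the DoE step only automates filling the stack.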

For example, we could first run t-SNE on the input data and then automatically cluster the t-SNE results. Then, of course, we're always adding new endpoints if we find out what we want to have. The last slide is that we will be sharing a small sample of the code. There will be a JMP file with the JMP script, a Python script, and installation instructions. You can try a quite simple user interface, which will send data to a local server, and you will get the data back. It also has some ideas in the instructions sheet that you can try to implement if you're interested in trying this approach for the JMP Python integration. That's it from us. Thank you.

Thank you also from me. If you need to contact us, you can do so via the community.
