
Using JMP 16 Data Mining and Text Mining Platform To Analyze Display Production Cycle Time (2022-US-30MP-1104)

JMP platforms have significantly helped us find the right parameters to determine the optimal process. This presentation demonstrates production cycle time analysis with JMP 16 data mining and text mining: using the Distribution platform to set up histogram conditions for systematic root cause analysis, and building three Partition platform models to improve the R-square. Then we optimize the partition model for both success and failure analysis, create a neural model, and use the Text Explorer platform to search for keywords that trigger the modeling parameters used in the predictive model.

 

 

Hello, good morning. Good evening, everyone.

My name is Raisa.

I'm a manufacturing quality engineer at Applied Materials, Taiwan.

I started to learn JMP at the beginning of this year.

Recently, I passed the certification exam

with a score of 925 this July.

Today, I'd like to give a short presentation

about QN Immediate Fix Time Analysis with JMP.

As we know, once a quality notification (QN) is created,

it always takes some additional time, more or less, to fix the issue,

which may impact production planning and scheduling.

Therefore, we'd like to find the worst case through analysis.

Okay.

For this analysis, there are five subtopics on the agenda.

First, the Root Cause Analysis of QN Fix Cycle Time,

the Graphical Root Cause Analysis Summary,

Comparing Fit Model, Partition and Neural Model,

and then Hybrid Text Mining

and Data Mining Analysis.

Finally, Takeaway Learnings.

Okay, let's get started.

The Histogram.

The first layer of Root Cause Analysis of QN Fix Cycle Time.

Before the investigation, think about this:

what scenarios impact QN fix cycle time,

and how long is endurable?

First,

define five days as the criterion

and also as a key condition, following the company-wide spec wording.

Within five days,

we are in spec, which meets the success criterion.

On the other hand, over five days is out of spec and goes to failure analysis.

Then I break the data down directly into SA, success analysis, and FA, failure analysis.

Notice the shape of the distributions for SA and FA.

Meanwhile, look at the Mosaic Plot

for the proportion of each category

to infer a potential root cause

for the success analysis.
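The five-day rule and the mosaic-plot proportions can be sketched in a few lines of Python. This is a minimal sketch, assuming each QN is reduced to a hypothetical `(category, fix_days)` pair and that exactly five days still counts as in spec; the sample rows are made up and are not the presenter's data.

```python
from collections import Counter, defaultdict

# Threshold from the talk: fixed within five days is in spec (SA),
# anything longer is out of spec (FA).
SPEC_LIMIT_DAYS = 5

def classify(fix_days):
    """Label one QN as SA (success) or FA (failure) by its fix cycle time."""
    return "SA" if fix_days <= SPEC_LIMIT_DAYS else "FA"

def sa_share_by_category(records):
    """Fraction of SA per category, i.e. the proportions a mosaic plot draws.

    records: iterable of (category, fix_days) pairs (hypothetical layout).
    """
    counts = defaultdict(Counter)
    for category, days in records:
        counts[category][classify(days)] += 1
    return {cat: c["SA"] / sum(c.values()) for cat, c in counts.items()}

# Made-up example rows.
rows = [("Workmanship", 1), ("Workmanship", 3), ("Dimension", 12),
        ("Dimension", 4), ("MFG rework", 2)]
print(sa_share_by_category(rows))
```

A category whose share is close to 1 behaves like the quick-response categories in the talk; a low share points at a candidate FA root cause.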

We can see that Workmanship and MFG rework

seem to have quick response and better fix cycle time.

For FA, dimension issues take more fix cycle time.

There is obvious variation in FA time.

From the distributions of SA and FA,

we suppose that defect type is one of the key factors, X₁,

impacting the fix cycle time.

The Box Plot.

A box plot is a graph of the distribution of a continuous variable.

Therefore, plot the continuous fix cycle time against a nested structure,

categorical country under containment,

to search for other factors that may impact fix cycle time.

It displays the five-number summary of a set of data.

It is a non-parametric tool that uses the median as the measure of central tendency.

Besides, there are some observations on the box plot graph.

First, at least seven points are needed to detect the first outlier.

Otherwise, it becomes a whisker (skew) problem

for groups where the sample size is less than seven.

Second, observe a skewed distribution

through box width or whisker length.
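The five-number summary and the usual whisker limits (Tukey's 1.5 × IQR fences) can be computed as below. This is a minimal sketch with made-up cycle times; quartile definitions differ between tools, so the `inclusive` method used here is an assumption and may not reproduce JMP's box plot exactly.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) of the data."""
    data = sorted(values)
    # Quartile conventions differ between tools; "inclusive" is one common
    # choice, so results may not match JMP's box plot exactly.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

def tukey_outliers(values, k=1.5):
    """Points outside Q1 - k*IQR .. Q3 + k*IQR, the usual whisker limits."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Made-up fix cycle times (days) for one containment group.
days = [1, 2, 2, 3, 3, 4, 5, 6, 34]
print(five_number_summary(days))  # (1, 2.0, 3.0, 5.0, 34)
print(tukey_outliers(days))       # [34]
```

With very small groups the quartiles rest on only a few points, which is why sparse groups give unstable whiskers.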

Then, how do we handle marginal outliers,

which we treat as two-sigma GRR noise from the whisker?

Back to the Root Cause Analysis,

it is not difficult to find that the fix cycle time

of the containment Replacement

is much longer than that of other containments.

And with that,

we have X₂, containment, and X₃, country, here.

Heatmap.

A heatmap is another graphical tool

to depict data values by color.

Again, until now we have gathered three input factors:

defect type from the histogram, and containment and country from the box plot.

Next, we study

how these input factors together impact fix cycle time.

Here, put categorical defect type on the Y axis,

color by cycle time,

and keep the nested structure:

categorical country under containment in the X and Group X zones.

Then use an 8 by 9 layout so it looks balanced,

to quickly catch the maximum

and the minimum cycle time scenarios.

For FA, it is easy to find a little red area, right?

The red highlights the longest fix cycle time.

With that, Replacement, Taiwan, Damage parts

is the worst case for cycle time.

Replacement, United States,

and Dimension issue parts is the second-worst scenario.

For SA, except for dimension and damage defects,

the others are easy to fix quickly.
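What the heatmap colors encode can be sketched as a table of mean cycle time per (defect type, containment, country) cell, sorted to surface the "red" cells. This is a minimal sketch; the record keys `defect`, `containment`, `country` and `days` are hypothetical, and the rows are made up.

```python
from collections import defaultdict
from statistics import mean

def heatmap_cells(records):
    """Mean fix cycle time for each (defect type, containment, country) cell."""
    cells = defaultdict(list)
    for r in records:
        cells[(r["defect"], r["containment"], r["country"])].append(r["days"])
    return {cell: mean(days) for cell, days in cells.items()}

def worst_cells(cells, top=2):
    """The cells with the longest mean cycle time, i.e. the 'red' heatmap areas."""
    return sorted(cells.items(), key=lambda kv: kv[1], reverse=True)[:top]

# Made-up rows with hypothetical keys.
rows = [
    {"defect": "Damage", "containment": "Replacement", "country": "Taiwan", "days": 30},
    {"defect": "Damage", "containment": "Replacement", "country": "Taiwan", "days": 36},
    {"defect": "Dimension", "containment": "Replacement", "country": "United States", "days": 20},
    {"defect": "Labeling", "containment": "MFG rework", "country": "Taiwan", "days": 1},
]
for cell, avg in worst_cells(heatmap_cells(rows)):
    print(cell, avg)
```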

Next, the Pareto Chart.

To further analyze FA and SA from the heatmap,

here we use a two-dimensional Pareto chart by two variables,

defect type and country, under a specific containment.

Here are X₁, defect type,

X₂, containment, and X₃, country, which we mentioned before.

Then we add one more factor, workstation, X₄, here

in the Pareto chart to visualize event frequency.

Now, for FA, failure analysis,

we get replacement for Taiwan supplier damage issues,

which frequently happen on the CVD service floor,

or replacement for United States supplier dimension issues,

which often happen at the CVD workstation.

In the same way, for SA, success analysis,

except for dimension issues or damage defects,

United States supplier, functional and workmanship issues

can be fixed quickly in CVD module test.
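A Pareto chart is frequencies sorted in descending order with a running cumulative percentage. A minimal sketch with made-up workstation events; passing tuples such as `("Damage", "Taiwan")` as labels gives the two-variable version.

```python
from collections import Counter

def pareto_table(labels):
    """Categories sorted by frequency with cumulative percent, as on a Pareto chart."""
    counts = Counter(labels).most_common()
    total = sum(n for _, n in counts)
    table, running = [], 0
    for label, n in counts:
        running += n
        table.append((label, n, round(100 * running / total, 1)))
    return table

# Made-up FA events by workstation (X4).
events = ["CVD service"] * 5 + ["CVD module test"] * 3 + ["PVD mechanical"] * 2
for row in pareto_table(events):
    print(row)
```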

Currently, we have four input factors and the SA and FA frequencies.

For inference, which one are we more interested in:

pass or fail frequency, or fix cycle time?

Then Tabulate.

Here, put the previously mentioned factors X₁ to X₄ into Tabulate.

Meanwhile, Tabulate takes fix cycle time and frequency into account

for further comparison.

For FA, CVD service floor, damage issue,

Taiwan supplier, requiring replacement,

it did take a longer cycle time.

Although the frequency is not the highest,

only seven times here,

the mean cycle time, 34 days, is much longer than the others.

For SA, in CVD module test,

we can see a measurement issue from a United States supplier

that was fixed by MFG rework.

Even though the table shows only one day,

the frequency is far too low for that to be reliable.
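The Tabulate view, with frequency (N) and mean fix cycle time side by side, can be sketched as a group-by. This is a minimal sketch: the record keys are hypothetical, the rows are made up, and the `min_n` threshold for flagging sparse groups (like the one-day SA cell above) is an arbitrary assumption.

```python
from collections import defaultdict
from statistics import mean

def tabulate(records, factors, min_n=3):
    """N and mean fix cycle time for every combination of the given factors.

    Groups with fewer than min_n rows are flagged as too sparse to trust;
    the default of 3 is an arbitrary assumption.
    """
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[f] for f in factors)].append(r["days"])
    return {
        key: {"n": len(days), "mean_days": mean(days), "enough_data": len(days) >= min_n}
        for key, days in groups.items()
    }

# Made-up rows with hypothetical keys.
rows = [
    {"workstation": "CVD service", "defect": "Damage", "days": 28},
    {"workstation": "CVD service", "defect": "Damage", "days": 32},
    {"workstation": "CVD service", "defect": "Damage", "days": 36},
    {"workstation": "CVD module test", "defect": "Measurement", "days": 1},
]
for key, stats in tabulate(rows, ["workstation", "defect"]).items():
    print(key, stats)
```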

Here I summarize the main points.

First, follow the Root Cause Analysis:

use different graphical JMP platforms

in an engineering and logical

sequence to conduct a prior Root Cause Analysis.

In the previous slides, I showed Histogram, Box Plot,

Heatmap, Pareto Chart and Tabulate.

Second, identify potential input Xs

to predict the QN fix cycle time.

According to Tabulate, the FA results come from damage issues,

replacement, Taiwan suppliers and the CVD service workstation.

Next, build a model to predict QN fix cycle time

and validate the root cause.

Before going into each model's details,

I'd like to introduce model selection and comparison first.

The Fit Model considers data structure and distribution.

Here are some challenges with the Fit Model.

For the skewed distribution, I used a log transformation, but it didn't help.

All input variables, X₁ to X₄, are categorical.

After I removed 60% of the workstation categories,

R-square increased by only 6%.

Check dependency among the categorical variables

with the correspondence analysis plot.

It is low risk, because the closer things are to the origin,

the less distinct they probably are.

In other words, the farther away, the more distinct.

Second, proximity between labels probably indicates similarity.
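JMP's correspondence analysis plot is the tool used here. As a much smaller stand-in, the sketch below computes only the total inertia of a two-way table, which in correspondence analysis equals the Pearson chi-square statistic divided by n: a value near 0 means the points sit close to the origin and the two categorical variables are nearly independent. It does not compute the plot coordinates, and the (containment, country) pairs are made up.

```python
from collections import Counter

def total_inertia(pairs):
    """Total inertia of the two-way table built from (row, column) pairs.

    Equals the Pearson chi-square statistic divided by n. In a correspondence
    analysis plot it measures how far the points spread from the origin:
    0 means the two variables are independent.
    """
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)
    rows = Counter(r for r, _ in pairs)
    cols = Counter(c for _, c in pairs)
    chi2 = 0.0
    for r, n_r in rows.items():
        for c, n_c in cols.items():
            expected = n_r * n_c / n
            chi2 += (joint[(r, c)] - expected) ** 2 / expected
    return chi2 / n

# Made-up (containment, country) pairs.
independent = [("Replacement", "Taiwan"), ("Replacement", "United States"),
               ("MFG rework", "Taiwan"), ("MFG rework", "United States")]
dependent = [("Replacement", "Taiwan"), ("Replacement", "Taiwan"),
             ("MFG rework", "United States"), ("MFG rework", "United States")]
print(total_inertia(independent), total_inertia(dependent))
```

For a 2 × 2 table the maximum possible inertia is 1, reached here by the fully dependent example.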

For the Partition tree model,

the plus points are a distribution-free model,

splits based on the data available,

and little overfit concern,

but the minus point is the recursive split.

Therefore, use JMP Predictor Screening, which uses a random forest,

to average over many recursive trees

and find five input factors with their ranking.

It is a convenient and quick way to find important inputs

to optimize or improve the model.
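A recursive partition chooses each split by how much it reduces the squared error of the response. The sketch below shows only the first split of each categorical factor and ranks factors by the R-square of that split; it is a crude illustration, not JMP's Partition algorithm and not the random forest behind Predictor Screening. It relies on the known result that, for squared error, the best two-group split of a categorical factor lies among cut points of its levels sorted by mean response. The factor names and data are made up.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(levels, y):
    """Best two-group split of one categorical factor for a continuous response.

    Returns (R-square of the split, set of levels sent to the low-mean side).
    """
    by_level = {}
    for level, value in zip(levels, y):
        by_level.setdefault(level, []).append(value)
    # For squared error, only cut points along levels sorted by mean need checking.
    ordered = sorted(by_level, key=lambda lv: mean(by_level[lv]))
    total = sse(y)
    best = (0.0, set())
    if total == 0:
        return best
    for cut in range(1, len(ordered)):
        left = [v for lv in ordered[:cut] for v in by_level[lv]]
        right = [v for lv in ordered[cut:] for v in by_level[lv]]
        r2 = 1 - (sse(left) + sse(right)) / total
        if r2 > best[0]:
            best = (r2, set(ordered[:cut]))
    return best

def rank_factors(factors, y):
    """Rank factors by the R-square of their best first split (a crude screen)."""
    scores = {name: best_split(col, y)[0] for name, col in factors.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Made-up example: four QNs with two hypothetical factors.
y = [1, 2, 10, 11]
factors = {
    "defect type": ["Labeling", "Labeling", "Damage", "Damage"],
    "country": ["Taiwan", "United States", "Taiwan", "United States"],
}
print(rank_factors(factors, y))
```

A real tree keeps splitting inside each branch, which is where the sequential dependency discussed later comes from.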

Regarding the neural network,

the plus points are a strong transformation model

and two-step training and validation.

However, the minus is a significant overfit concern.

Which model is more proper to believe?

Let's go through each model's results.

Back to the Fit Model, main effects only.

Our R-square isn't high,

only 30%,

because the data is severely right-skewed.

We observed significant lack of fit,

so with a Max RSq of just around 47%, it is not worth it.

We used a log transformation of the cycle time variable

to avoid negative numbers in the 95% confidence interval,

but it didn't help much, so the log option is out.
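The effect of a log transformation on right skewness can be checked numerically. This is a minimal sketch with made-up cycle times, using the moment-based (population) skewness; JMP reports a bias-corrected version, so values differ slightly. Reduced skewness does not by itself guarantee a better R-square, which matches the outcome described above.

```python
import math
from statistics import mean, pstdev

def skewness(values):
    """Moment-based (population) skewness: the mean of cubed z-scores."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def log_transform(days):
    """Natural log of cycle times; every value must be strictly positive."""
    return [math.log(d) for d in days]

# Made-up, right-skewed fix cycle times in days.
days = [1, 1, 2, 2, 3, 4, 6, 10, 34]
print(round(skewness(days), 2), round(skewness(log_transform(days)), 2))
```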

Next is the Partition Tree Model.

Here are three partition models:

the baseline model,

model augmentation

and model simplification.

Through a series of improvements based on engineering and logical thinking,

R-square improved to 62% from 38% in the baseline model.

I'll show you all the details step by step in the following slides.

Model augmentation.

During this step, we improved the model by 20%.

Where does that come from?

First, a 10% improvement.

Here, I changed the response from QN age to immediate fix cycle time

as a more proper response.

Second, 6% from adding one X factor, workstation.

Remember, it is the X we discovered from the Pareto chart.

UD code becomes less critical, from 26% to only 8%.

Third, another 4% from changing to containment from UD code.

Now, check the contribution ranking here.

Number two becomes workstation instead of country.

In model simplification,

we improve R-square by an additional 6%.

Before simplification,

the plus is that all scenarios are under consideration,

but the minus is that too many categories might dilute predictive power.

In the simplified model,

we filter out minor categories with fewer counts,

like removing 60% of the workstation categories;

the total count decreased to 270 from 426.
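Filtering out minor categories can be sketched as dropping rows whose category count falls below a threshold. This is a minimal sketch; the `min_count` value is hypothetical since the talk does not state the cutoff, and the rows are made up.

```python
from collections import Counter

def drop_minor_categories(records, key, min_count):
    """Keep rows whose category under `key` appears at least `min_count` times."""
    counts = Counter(r[key] for r in records)
    return [r for r in records if counts[r[key]] >= min_count]

# Made-up rows: workstation B occurs once and is filtered out.
rows = [{"workstation": w} for w in ["A", "A", "A", "B", "C", "C"]]
kept = drop_minor_categories(rows, "workstation", min_count=2)
print(len(rows), "->", len(kept))
```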

Check the contribution ranking again:

defect type and workstation are still number one and number two.

Now, we have more confidence to use the model

to predict FA and SA.

For Partition Tree Model Optimization,

as I mentioned before,

the major contributors are Defect Type & Workstation,

around 80%, following the Pareto concept.

Compare defect type in SA and FA prediction.

For SA here, it's a labeling issue.

Makes sense; we don't spend much time fixing this kind of issue.

For FA, it's damage.

Yes, it would take much more cycle time.

About the workstation comparison between SA and FA,

PVD mechanical and CVD module tester:

currently, it still needs further analysis and understanding.

About FA country here, in the Prediction Profiler the profile is flat.

Doesn't country impact QN fix cycle time?

Is that right?

To answer the question, I introduce the model limitation

of recursive partitioning.

Recursive partitioning carries a sequential dependency risk.

Factor country is split six times,

and only once in the higher cycle time cluster.

Such a recursive dependency limitation may impact the predictive model.

The third model,

Neural Network (Artificial Intelligence).

Here we observe a severe overfit concern

between the training and validation R-square.

If the R-square difference between the Training Set and the Validation Set

is over 20%, there is an overfit concern.
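The overfit rule of thumb can be written as a comparison of training and validation R-square. This is a minimal sketch, assuming the 20% threshold means an absolute gap of 0.20 in R-square; the R-square values are made up.

```python
def r_square(actual, predicted):
    """Coefficient of determination: 1 - SSE / SST."""
    m = sum(actual) / len(actual)
    sst = sum((a - m) ** 2 for a in actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - sse / sst

def overfit_flag(train_r2, valid_r2, max_gap=0.20):
    """True when training R-square exceeds validation R-square by more than max_gap."""
    return (train_r2 - valid_r2) > max_gap

# Made-up R-square values for a training set and a validation set.
print(overfit_flag(0.85, 0.55))  # True: a 30-point gap
print(overfit_flag(0.62, 0.55))  # False: a 7-point gap
```

In practice `r_square` would be computed separately on the training rows and the held-out validation rows before comparing them.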

Besides, we find that in the neural model,

the number one ranking is workstation,

and number two is fault by,

which is different from the previous partition model.

For SA, the workstation is at staging,

CCT staging, where materials are brought together

before entering the MFG floor,

and it doesn't have a complex operation process.

Once an issue happens, it can be fixed quickly.

Makes sense.

For FA at the CVD service floor,

there is a complex operation process.

Yes, it does take a longer cycle time to treat difficult issues.

Until now, we already have three models,

Fit Model, Partition Tree Model and Neural Model,

so which model is more proper and closer to reality?

Therefore,

model comparison and selection.

From the graphical Root Cause Analysis,

damage issue, replacement,

Taiwan, CVD service floor is the worst scenario with the longest fix cycle time.

Currently, the neural model has the identical scenario

as the graphical Root Cause Analysis;

the only concern is overfit risk.

Besides, the three models have very close predictions

of the worst cycle time, within 1.2 days.

Finally, I will introduce the Text Mining and Data Mining Hybrid.

Currently, the QN database also has free-text messages,

which we use to search for more information about long cycle time in the QN text variables.

Use JMP Text Explorer to discover some frequent keywords,

such as the ones I circle here: replace, rework, dimension and F10246,

a project, for further analysis,

then convert them to binary indicators,

and conduct further Data Mining and Root Cause Analysis

on the F10246 case via a heatmap graph.
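Turning keywords into binary indicator columns can be sketched with regular expressions. This is a minimal sketch and not Text Explorer's tokenizer: matching is case-insensitive and anchored only at the start of a word, a crude stand-in for stemming, so "replace" also flags "replaced" and "replacement". The QN text is made up.

```python
import re

KEYWORDS = ["replace", "rework", "dimension", "F10246"]

def keyword_indicators(text, keywords=KEYWORDS):
    """Return a 1/0 flag per keyword for one QN text field.

    Matching is case-insensitive and anchored only at the start of a word,
    so "replace" also flags "replaced" and "replacement".
    """
    return {
        kw: int(re.search(r"\b" + re.escape(kw), text, re.IGNORECASE) is not None)
        for kw in keywords
    }

print(keyword_indicators("Part replaced for F10246: dimension out of spec"))
```

Each flag becomes a 0/1 column that can go straight into a heatmap or a model alongside the categorical factors.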

Here, put the dimension indicator under F10246

and the containments replace and rework in Y.

According to the heatmap results, F10246

did suffer much longer fix cycle time

than other projects, judging by color,

and checking the dimension indicator, we observe that

not only dimension issues but also various other defects cause

long cycle time, even if they are just fixed by rework.

In the end, here are my takeaway learnings.

JMP Graphical Platforms are very powerful for conducting

deeper root cause analysis through an engineering

and logical, data-driven process,

and for comparing and selecting a more appropriate JMP model

from Classic Fit Model, Partition and Neural Network

by knowing the model limitations

and risks, connecting back to the previous Graphical Root Cause Analysis.

Conduct a Hybrid Text Mining and Data Mining Root Cause Analysis

on the complicated QN database.

Finally, I'd like to thank GCI MBB Charles Chen, my project mentor,

and that's all for my presentation.

Thank you for your time and attention.

Thank you.