
Modeling of Product Quality Based on Certain Product Measurements (2022-US-EPO-1085)

This presentation highlights modeling approaches used to understand and predict product quality based on product measurements. The model development process involves four steps: data collection, model development, testing, and implementation. During data collection, parts are collected from the production line and product measurements are completed. These parts continue through the process and are subjected to a quality test. The product measurements and quality test results are combined into a dataset and used for modeling, with the quality test result as the response and the product measurements as the predictors. In model development, the data is examined thoroughly to ensure it is as clean as possible. Variable clustering and stepwise regression are applied to remove highly correlated input variables and select the important variables. The final step is to apply generalized regression using the log-normal distribution and the adaptive lasso estimation method. The model's accuracy must exceed an acceptable threshold; if the model meets this criterion, it moves into the testing phase. This phase involves using the model under engineering control to determine how well it predicts product quality. Once the team is satisfied, the model is implemented into production. The operations team receives instant feedback on how each part will perform and can adjust and tune the process in real time. With this information, we can also deem the product acceptable or not. If rejected, the product is disposed of and doesn't continue through the process. These predictive models identify unacceptable parts and process upsets in the upstream processes.


Welcome, and thank you for joining my poster presentation at this year's JMP Conference. My name is Kaitlin Shorkey, and I'm a senior statistical engineer at Corning Incorporated.

How do you get a glimpse of product quality before a part completes the production process? We chose to build a model that predicts the product quality outcome before the part has completed the entire process.

There are two major benefits of this approach. First, the operations team receives instant feedback on how the parts will perform and can adjust the process in real time. Second, we can deem the product acceptable or not.

As I just mentioned, the main objective of this work is to build a predictive model, using a few modeling approaches, to understand and predict product quality based on certain product measurements. Our major steps in building this model are data collection, model development, testing, and implementation.

First, in the data collection phase, parts are collected at the end of the production line and the appropriate product measurements are completed. The parts are then subjected to the quality test. The product measurements and quality measurement results are combined into a dataset and used to build the model.

In this case, the quality measurement is the response and all the product measurements are the predictors. The dataset consists of 767 predictors and 990 observations, or parts. This step can take a long time to execute.

Since we're building a model, it's important to capture as wide a range of product measurements and quality measurement results as we can. When we do this, the accuracy and model predictions are more consistent across the range. Essentially, this allows the model to predict accurately at all levels of the product quality results.

Once the dataset is compiled, it is thoroughly examined to ensure it is as clean as possible.

After data collection and cleaning, the second phase, model development, begins. For this, we start with variable clustering and stepwise regression to remove highly correlated variables and select the most important ones.

With so many predictors, we first apply variable clustering, which reduces the number of variables. Variable clustering groups the predictors into clusters whose members share common characteristics, and each cluster can be represented by a single component or variable. A snippet of the cluster summary from JMP is shown on the poster; it indicates that 85% of the variation is explained by the clustering. Cluster 12, for example, has 49 members, and V232 is the most representative variable of that cluster.
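The grouping-and-representative idea can be sketched in a few lines of Python. This is a simplified stand-in, not JMP's Cluster Variables algorithm, which works from principal components. Here each variable greedily joins the first cluster whose seed variable it correlates with at |r| ≥ 0.8, and the representative is the member with the highest mean squared correlation with the other members. The threshold, the scoring rule, and the helper names (`pearson`, `cluster_variables`, `most_representative`) are assumptions for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def cluster_variables(columns, threshold=0.8):
    """Simplified greedy grouping (an assumption, not JMP's method): each variable joins
    the first cluster whose seed variable it correlates with at |r| >= threshold."""
    clusters = []
    for name, values in columns.items():
        for members in clusters:
            if abs(pearson(values, columns[members[0]])) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])  # no match: start a new cluster seeded by this variable
    return clusters

def most_representative(members, columns):
    """Member with the highest mean squared correlation with the other members."""
    if len(members) == 1:
        return members[0]
    def score(name):
        others = [m for m in members if m != name]
        return sum(pearson(columns[name], columns[m]) ** 2 for m in others) / len(others)
    return max(members, key=score)

# Illustrative data only: "b" is nearly 2 * "a"; "c" is only loosely related
columns = {"a": [1, 2, 3, 4, 5, 6],
           "b": [2.1, 3.9, 6.2, 7.8, 10.1, 12.0],
           "c": [3, 1, 4, 1, 5, 9]}
clusters = cluster_variables(columns)
reps = [most_representative(m, columns) for m in clusters]
```

The greedy result depends on column order, which is one reason a principal-component-based method is a better fit when there are hundreds of predictors.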

The variables identified as the most representative of their clusters are then passed to the next method, stepwise regression. Stepwise regression is applied to these cluster representatives to select the most important ones for the model, which further reduces the number of variables.

For this, the forward direction and the minimum corrected AIC (AICc) stopping rule are used. The direction controls how variables enter and leave the model; the forward direction means that, at each step, the term with the smallest p-value is entered into the model. The stopping rule determines which model is selected. The corrected AIC is based on negative two times the log-likelihood, and the model with the smallest corrected AIC is the preferred model.
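For reference, the corrected AIC is commonly written as follows, where L is the maximized likelihood, k is the number of estimated parameters, and n is the number of observations. The final fraction penalizes extra parameters more heavily when n is small relative to k.

```latex
\mathrm{AICc} = -2\log L + 2k + \frac{2k(k+1)}{n-k-1}
```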

From this, 51 of the 99 variables available from the variable clustering step are entered into the model. At this point, we have reduced the number of variables from 767 to 51 using variable clustering and stepwise regression.
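Below is a minimal sketch of forward selection with a minimum-AICc stopping rule. It assumes ordinary least squares on a continuous response and a normal likelihood. Every candidate adds one degree of freedom, so the term that most reduces the residual sum of squares also has the largest F-to-enter, and therefore the smallest p-value. For that reason the sketch enters terms by RSS. Three choices here are assumptions: counting the error variance as a parameter in k, fitting the whole forward path before picking the minimum, and the hypothetical helper names. This is not JMP's Stepwise platform code.

```python
import math

def ols_rss(cols, y):
    """Least-squares fit of y on an intercept plus the given columns; returns the RSS."""
    n = len(y)
    rows = [[1.0] + [c[i] for c in cols] for i in range(n)]
    p = len(rows[0])
    # Normal equations (X'X) b = X'y, solved by Gaussian elimination with partial pivoting
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    for k in range(p):
        piv = max(range(k, p), key=lambda i: abs(A[i][k]))
        A[k], A[piv] = A[piv], A[k]
        rhs[k], rhs[piv] = rhs[piv], rhs[k]
        for i in range(k + 1, p):
            f = A[i][k] / A[k][k]
            for j in range(k, p):
                A[i][j] -= f * A[k][j]
            rhs[i] -= f * rhs[k]
    b = [0.0] * p
    for k in reversed(range(p)):
        b[k] = (rhs[k] - sum(A[k][j] * b[j] for j in range(k + 1, p))) / A[k][k]
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2 for r, yi in zip(rows, y))

def aicc_gaussian(rss, n, n_coef):
    """AICc for a normal linear model; k = coefficients (incl. intercept) + error variance."""
    k = n_coef + 1
    neg2loglik = n * math.log(2 * math.pi * rss / n) + n
    return neg2loglik + 2 * k + 2 * k * (k + 1) / (n - k - 1)

def forward_stepwise_min_aicc(columns, y):
    """columns: dict name -> values. Returns the term names of the minimum-AICc model."""
    n = len(y)
    selected, remaining = [], list(columns)
    path = [([], aicc_gaussian(ols_rss([], y), n, 1))]
    max_terms = min(len(columns), n - 4)  # keeps n - k - 1 > 0 in the AICc correction
    while remaining and len(selected) < max_terms:
        # Smallest RSS after entry = largest F-to-enter = smallest p-value
        best = min(remaining,
                   key=lambda name: ols_rss([columns[s] for s in selected + [name]], y))
        selected.append(best)
        remaining.remove(best)
        rss = ols_rss([columns[s] for s in selected], y)
        path.append((list(selected), aicc_gaussian(rss, n, len(selected) + 1)))
    return min(path, key=lambda step: step[1])[0]

# Illustrative data only: y depends on x1 and x2; x3 is unrelated
x1 = [float(i) for i in range(20)]
x2 = [float((i * 7) % 11) for i in range(20)]
x3 = [float((i * 5) % 13) for i in range(20)]
noise = [((i * 37) % 17 - 8) / 10 for i in range(20)]
y = [1 + 2 * a - 3 * b + e for a, b, e in zip(x1, x2, noise)]
chosen = forward_stepwise_min_aicc({"x1": x1, "x2": x2, "x3": x3}, y)
```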

The final method is to fit a generalized regression model. For this, the lognormal distribution is used with the adaptive lasso estimation method.

The lognormal distribution is the best fit for the response, so it is chosen for the regression model.

The adaptive lasso estimation method is a penalized regression technique that shrinks the magnitude of the regression coefficients and reduces the variance of the estimates. This helps improve the predictive ability of the model.
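Here is a small pure-Python sketch of what adaptive lasso does. It relies on the fact that a lognormal response is normal on the log scale, so it fits log(y) by least squares with an adaptive lasso penalty. An unpenalized fit supplies initial estimates b̂_j, and each coefficient is then penalized by λ/|b̂_j|^γ. Weak predictors are therefore shrunk harder, and a predictor with a zero initial estimate is never entered. The following are all assumptions: standardizing the predictors, solving by coordinate descent, the values of λ and γ, and the helper names. JMP's Generalized Regression platform maximizes the penalized lognormal likelihood directly and selects the tuning parameter as part of the fit.

```python
import math

def soft_threshold(z, t):
    """Lasso shrinkage operator: sign(z) * max(|z| - t, 0)."""
    return math.copysign(max(abs(z) - t, 0.0), z)

def weighted_lasso_cd(Z, y, penalties, max_iter=10000, tol=1e-12):
    """Coordinate descent for (1/2n)||y - Z b||^2 + sum_j penalties[j] * |b_j|.
    Assumes y and the columns of Z are centered, so no intercept is fitted here."""
    n, p = len(Z), len(Z[0])
    col_ms = [sum(row[j] ** 2 for row in Z) / n for j in range(p)]  # mean square per column
    beta = [0.0] * p
    resid = list(y)
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # Correlation of column j with the partial residual (coefficient j added back)
            rho = sum(row[j] * (r + row[j] * old) for row, r in zip(Z, resid)) / n
            new = soft_threshold(rho, penalties[j]) / col_ms[j]
            if new != old:
                for i in range(n):
                    resid[i] -= Z[i][j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta

def lognormal_adaptive_lasso(X, y, lam, gamma=1.0):
    """Adaptive lasso on log(y); returns (intercept, coefficients) in the original X units."""
    n, p = len(X), len(X[0])
    logy = [math.log(v) for v in y]  # lognormal y -> normal on the log scale
    y_mean = sum(logy) / n
    yc = [v - y_mean for v in logy]
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) for j in range(p)]
    Z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]
    # Step 1: unpenalized initial estimates (zero penalty reduces to least squares)
    init = weighted_lasso_cd(Z, yc, [0.0] * p)
    # Step 2: adaptive penalties lam / |b_init|^gamma; a zero initial estimate is never entered
    penalties = [lam / abs(b) ** gamma if b != 0 else math.inf for b in init]
    beta_std = weighted_lasso_cd(Z, yc, penalties)
    coefs = [b / s for b, s in zip(beta_std, sds)]  # undo the standardization
    intercept = y_mean - sum(c * m for c, m in zip(coefs, means))
    return intercept, coefs

# Illustrative data only: the third predictor has no effect on y
X = [[1, 3, 2], [2, 1, 7], [3, 4, 1], [4, 1, 8], [5, 5, 2], [6, 9, 8], [7, 2, 1], [8, 6, 8]]
y = [math.exp(0.5 + 0.3 * r[0] - 0.2 * r[1]) for r in X]
intercept, coefs = lognormal_adaptive_lasso(X, y, lam=0.01)
```

On this toy data the irrelevant third coefficient is set exactly to zero, while the two real effects are only slightly shrunk toward zero. That combination of selection and mild shrinkage is what the adaptive weights are meant to achieve.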

The dataset was also split into a training set and a validation set: the training set has 744 observations and the validation set has 246. The resulting model produces a generalized R-square of 0.81 for the training set and 0.80 for the validation set.
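The presentation does not say how rows were assigned to the two sets. Assuming a simple random split, a reproducible version might look like this; the seed and the helper name `train_validation_split` are hypothetical.

```python
import random

def train_validation_split(n_obs, n_train, seed=2022):
    """Randomly assign row indices to training and validation sets (hypothetical helper)."""
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)  # seeded so the same split can be reproduced
    return sorted(indices[:n_train]), sorted(indices[n_train:])

# 990 parts split into 744 training and 246 validation rows
train_idx, valid_idx = train_validation_split(990, 744)
```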

These R-squares are acceptable for our process, so we now evaluate the accuracy and predictability of the resulting model.

Now that we have a model, we need to review its accuracy and predictability to see whether it is suitable for use in production. To do this, we produce a graph that compares the predicted quality measurement for each part to the actual quality measurement, with the predicted value on the x-axis and the actual value on the y-axis. The quality measurement is also bucketed into three groups based on its value, shown by the three colors on the graph.

In general, the model predicts the quality measurement well. It does appear to fit better in the lower product quality range than in the upper range, which may be because there are more observations in the lower range.

As mentioned, the quality measurement was bucketed into three categories based on its value, and the same was done for the predicted quality measurement. For each observation, if the actual quality measurement category matches the predicted category, the observation is assigned a one; if not, it is assigned a zero. For both the training and validation sets, the average of these ones and zeros is calculated and used as the accuracy measure. The training set has an accuracy of 87.5%, and the validation set has an accuracy of 84%.
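The bucket-and-compare accuracy measure described above can be sketched as follows. The cut points that define the three categories are not given in the presentation, so the values below are hypothetical. The convention that a value equal to a cut point falls into the upper category is also an assumption.

```python
from bisect import bisect_right

def bucket(value, cut_points):
    """Map a value to category 1, 2, ..., len(cut_points) + 1.
    cut_points must be ascending; a value equal to a cut point goes to the upper category."""
    return bisect_right(cut_points, value) + 1

def category_accuracy(actual, predicted, cut_points):
    """Average of 1 (categories match) and 0 (they differ) over all observations."""
    hits = [1 if bucket(a, cut_points) == bucket(p, cut_points) else 0
            for a, p in zip(actual, predicted)]
    return sum(hits) / len(hits)

# Hypothetical cut points and values, for illustration only
CUTS = [10.0, 20.0]
acc = category_accuracy([5, 15, 25, 12], [6, 21, 24, 9], CUTS)
```

Here the actual categories are 1, 2, 3, 2 and the predicted ones are 1, 3, 3, 1, so two of the four observations match.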

For the model to move to the testing phase, its accuracy must be above a certain limit, and both of these accuracy values are. This allows us to move on to the testing phase of the project.

In addition, we look at the confusion matrix to visualize the accuracy of the model by comparing the actual categories to the predicted categories. Ideally, the off-diagonal cells of each matrix should all be zero, with the diagonal from top left to bottom right containing all the counts. The matrices shown on the poster have the higher counts along that diagonal and lower counts off the diagonal, but discrepancies still exist among the three categories. For example, the training set has 29 instances where an actual quality measurement category of three was predicted as a two; the validation set has 12 such instances. The confusion matrix helps us understand where these discrepancies are, so further investigation can be done and improvements made.
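A confusion matrix over the three categories can be tallied as shown below. This is a minimal sketch that assumes categories are numbered 1 to 3, with rows for the actual category and columns for the predicted category. The helper names are hypothetical.

```python
def confusion_matrix(actual_cats, predicted_cats, n_categories=3):
    """Counts with rows = actual category and columns = predicted category (categories 1..n)."""
    matrix = [[0] * n_categories for _ in range(n_categories)]
    for a, p in zip(actual_cats, predicted_cats):
        matrix[a - 1][p - 1] += 1
    return matrix

def accuracy_from_matrix(matrix):
    """Diagonal (correctly categorized) counts divided by the total count."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total

# Illustrative categories only
m = confusion_matrix([1, 2, 3, 3, 2], [1, 2, 2, 3, 1])
```

With this layout, the discrepancy discussed above (actual category three predicted as two) is the count in `matrix[2][1]`. The trace divided by the total reproduces the ones-and-zeros accuracy measure.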

Overall, though, the model has an accuracy above the required limit, so our next steps are the testing and implementation phases.

Now that our model is through the development phase, it's time to test it in live situations. For this, the model is used under engineering control to determine how well it predicts the quality measurement in a small, controlled experiment. This is done by the engineering team, with support from the project team when necessary. Once the engineering team is satisfied with this testing, the model is fully implemented into production and monitored over time.

In conclusion, this model development process has allowed us to build predictive models for the production process. Variable clustering, stepwise variable selection, and generalized regression were the most appropriate and best-suited methods for this application.

With further research and investigation, other methods could potentially be applied to improve model performance even more.

From a production standpoint, the benefit of this model is that the operations team receives instant feedback on how a part or group of parts will perform and can adjust and tune the process in real time. We can also deem the product acceptable or not. If it is rejected, the product is disposed of and does not continue through the process, which reduces production costs over time.

Lastly, I'd like to give a huge special thank you to Zvouno and Chova and the entire project team at Corning Incorporated.

Thank you for joining and listening to my poster presentation.