
Heroes or Zeros: A Product Distribution Analysis Using JMP - (2023-US-PO-1478)

Quality is a top concern for most manufacturers. Within an established sampling mechanism, it is vital to be able to tell how likely it is that a set of good samples (heroes) actually indicates that the entire batch or crate is good. In this presentation, we provide a distribution analysis strategy to help answer this question through methods such as modeling, simulation, probability analysis, and data visualization. We also demonstrate how to perform this analysis and develop an end-to-end application using JMP scripting and the user interface. The strategy is evaluated on a real-world induced data set of product samples. It provides a valuable strategy and tool for evaluating the current quality of products and for decision making, so that the process can be improved.

Hello everyone. Today my topic is Heroes or Zeros: A Product Distribution Analysis Using JMP.

First, a little bit of background. An organization with an established process may decide to implement process control and process discipline.

For example, when a product moves from the development stage to the mass production stage, one of the problems that can happen at this juncture is a process variation issue: the variation can be too large, or it may not meet expectations.

The variation here can be variation of the mean, variation of the standard deviation, and so on. We want to find out the root cause of such a variation problem and try to fix it.

But before that, we need to figure out what type of variation we are facing, because the type of variation will dictate what kind of action and investigation strategy we should take.

The demonstration today investigates this through an exploratory analysis. We use the standard deviation as the statistic of interest.

The issue is that we have a process with a high overall standard deviation, but we can also observe some batches that have a lower standard deviation. We call these batches hero batches.

We want to find out what caused such a high overall standard deviation, but before that, we need to figure out what kind of process variation we are facing, that is, what kind of process variation could produce what we observed.

In general, there are two types of situations here. One is that we have a completely random process and the variation is systemic, as we can see here. Although the process is random, depending on how we batch it and how we sample it, some batches may have a lower sample standard deviation than others.

The other situation is that our process is not random. As we can see here, this process goes up and down; it has some mean shifts. It is not a random process, and depending on how we batch it, batches that reside in a stable period will have a relatively lower sample standard deviation compared to batches that reside in an unstable period, which may have a larger standard deviation.

We can also define a threshold on the standard deviation, such as point A here. Comparing each batch standard deviation to this threshold tells us how many of the batches satisfy the criterion.
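
To make the two scenarios concrete, here is a minimal Python sketch (my own illustration, not part of the JMP application; every parameter is made up). It generates a purely random process and a mean-shifted one, batches each, and counts the batches whose sample standard deviation falls below the threshold:

```python
import numpy as np

rng = np.random.default_rng(0)
n_batches, batch_size = 40, 25       # hypothetical batching scheme
threshold = 0.9                      # hypothetical "point A" on the std dev scale
n_samples = n_batches * batch_size

# Scenario 1: a completely random process (systemic variation only)
random_proc = rng.normal(loc=10.0, scale=1.0, size=n_samples)

# Scenario 2: mean shifts that are not aligned with batch boundaries,
# so some batches straddle a shift (unstable period) and others do not
step_len = 37                        # hypothetical length of each stable period
levels = np.repeat(rng.normal(10.0, 1.5, size=n_samples // step_len + 1),
                   step_len)[:n_samples]
shifted_proc = rng.normal(0.0, 0.5, size=n_samples) + levels

for name, proc in [("random", random_proc), ("mean-shifted", shifted_proc)]:
    batch_sd = proc.reshape(n_batches, batch_size).std(axis=1, ddof=1)
    print(f"{name}: {(batch_sd < threshold).mean():.0%} of batches pass")
```

In the mean-shifted case, only the batches that sit entirely inside one stable period come out as heroes, which is exactly the pattern described above.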

With these two scenarios in mind, we can formulate a statistical hypothesis test to determine which type of process variation we are dealing with. We can assume our process is random; then, how likely is it that we would observe what we observed?

A more detailed statement is this: assuming that the batches with low standard deviation are just due to sampling luck and that the historical data is representative of the population, then simulated batches generated from the same distribution should have a passing rate that is statistically indistinguishable from the actual passing rate of the historical data.
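
In my own notation (not shown on the slides), with p0 the historical passing rate and p the passing rate of the simulated batches, the test can be written as:

```latex
H_0: p = p_0 \qquad \text{vs.} \qquad H_1: p \neq p_0,
\qquad
z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / N}}
```

Here p-hat is the observed passing rate of the N simulated batches, and z is the usual normal-approximation statistic for a one-sample proportion test; whether JMP applies this approximation or an exact binomial variant depends on its settings.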

On the right-hand side, you can see this wheel. This is the procedure we went through to make this test happen. First, we need to define a threshold. Using this threshold, we can calculate the passing rate.

We compare the batches in the historical data against this threshold to get the percentage of historical batches that are good batches.

Because we also assume that our process is random, we can fit the historical data to several distributions and then pick the best-fitted one.

Using this fitted distribution, we can generate a set of K samples, where K is the number of samples in each batch of the historical data. We repeat this procedure N times, where N is the number of batches in the historical data.

For each simulated batch, we can then calculate its sample standard deviation. Comparing these sample standard deviations to the threshold we defined before gives us a set of binomial data. With this binomial data and the passing rate we already have, we can perform a one-sample proportion test to test our hypothesis.
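
For readers outside JMP, the entire loop can be sketched in Python (a minimal sketch, assuming a normal fit, equal-size batches, and the normal-approximation proportion test; the function name and all inputs are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def hero_or_zero(values, batch_index, threshold, alpha=0.05):
    """One-sample proportion test of the simulated vs. historical
    passing rate, assuming a normal fit to the historical data."""
    values = np.asarray(values, dtype=float)
    batch_index = np.asarray(batch_index)
    batches = [values[batch_index == b] for b in np.unique(batch_index)]
    n_batches = len(batches)       # N in the talk
    batch_size = len(batches[0])   # K in the talk (assumes equal-size batches)

    # Passing rate of the historical batches (share below the threshold);
    # the test below assumes 0 < p0 < 1
    hist_sd = np.array([b.std(ddof=1) for b in batches])
    p0 = (hist_sd < threshold).mean()

    # Fit a distribution to all historical values, then simulate N batches of K
    mu, sigma = stats.norm.fit(values)
    sim = rng.normal(mu, sigma, size=(n_batches, batch_size))
    sim_pass = sim.std(axis=1, ddof=1) < threshold   # the binomial data

    # Normal-approximation one-sample proportion test of H0: p = p0
    p_hat = sim_pass.mean()
    z = (p_hat - p0) / np.sqrt(p0 * (1 - p0) / n_batches)
    p_value = 2 * stats.norm.sf(abs(z))
    return p0, p_hat, p_value, p_value < alpha   # last item: reject H0?
```

A small p-value says that a purely random process drawn from the fitted distribution would not reproduce the hero rate seen in the historical data.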

Using JMP, we were able to integrate this entire procedure into an application. Here, I will do a quick demonstration to show you how this application works.

This application can import any data file with a value column and an index column that indicates the batch index.

With the click of a button, it automatically fits our data to several distributions and picks the best one. Right now, the best-fitted one is a normal distribution.

We can then set the number of simulated data sets we want, the size of each set, and the threshold. When we click, it performs the hypothesis test I mentioned before.

It also shows the percentage of historical batches that are good and the percentage of simulated batches that are good. Lastly, it shows a histogram visualizing the proportion of simulated batches that are good.
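
Outside of JMP, a comparable view (a hypothetical matplotlib sketch, not the application's actual plot) is a histogram of the simulated batch standard deviations with the threshold marked; the share of bars left of the line is the proportion of good simulated batches:

```python
import matplotlib.pyplot as plt

def plot_simulated_sd(sim_sd, threshold):
    """Histogram of simulated batch standard deviations with the threshold."""
    plt.hist(sim_sd, bins=30, edgecolor="black")
    plt.axvline(threshold, color="red", linestyle="--", label="threshold")
    plt.xlabel("simulated batch standard deviation")
    plt.ylabel("number of batches")
    plt.legend()
    plt.show()
```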

Now, we go back to the hypothesis test. The data we have here shows that we reject the null hypothesis. Checking the p-value, we reject the null hypothesis with 95% confidence; 95% confidence is the default setting here.

This conclusion suggests that the process is not random and that the good batches do exist in the stable periods of the process.

This conclusion can lead to several action items. For example, we can investigate the process variables and process parameters between the stable period and the unstable period and see what changed.

Of course, we can also get a different test result, where we cannot reject the null hypothesis. This suggests our process might be random and we might have systemic variation, which would lead to a completely different investigation and action plan.

For example, in the worst-case scenario, in order to reduce the systemic variation, we might need to completely change the manufacturing environment.

With this, I conclude today's presentation. I want to thank John Daffin, a colleague of mine, who brought this interesting question to my attention during a project meeting. I also want to thank you for listening to my presentation today. I'm very appreciative of it.