Quality is a top concern for most manufacturers. Within an established sampling mechanism, it is vital to be able to tell how likely it is that a set of good samples (heroes) actually indicates that the entire batch or crate is good. In this presentation, we provide a distribution analysis strategy to help answer this question through methods such as modeling, simulation, probability analysis, and data visualization. We also demonstrate how to accomplish this analysis and develop an end-to-end application using JMP scripting and the user interface. The strategy is evaluated on a real-world-derived data set of product samples. It provides a valuable strategy and tool for evaluating current product quality and supporting decision making so that the process can be improved.

Hello everyone.

Today my topic is Heros or Zeros: A Product Distribution Analysis Using JMP.

First, a little bit of background.

An organization with an established process

may decide to implement process control

and process discipline.

For example, if you have a product

moving from the development stage to the mass production stage,

at this juncture, one of the problems that can happen

is that we may have a process variation issue.

The variation can be too large, or the variation may not meet expectations.

The variation here can be variation of the mean,

variation of the standard deviation, and so on.

We will want to find out what the root cause is

for such a variation problem and try to fix it.

But before that, we will need to figure out

what type of variation we are facing

because the type of variation will dictate what kind of action

and investigation strategy we should take.

The demonstration here today

will investigate this through an exploratory analysis.

We use the standard deviation as the statistic of interest here.

The issue is that we have a process

that has a high overall standard deviation,

but we can also observe some batches that have a lower standard deviation.

We call these batches hero batches.

We will want to find out what caused such a high overall standard deviation,

but before that, we need to figure out what process variation we are facing,

that is, what kind of process variation could give us what we observed.

In general, there are two types of situations here.

One is that we have a completely random process

and the variation is systemic, as we can see here.

Although the process is random,

depending on how we batch it and how we sample it,

some of the batches may have a lower sample standard deviation than others.

Another situation is that our process is not random.

As we can see here, this process goes up and down.

It has some mean shift.

It is not a random process, so depending on how we batch it,

some of the batches that reside in a stable period

will have a relatively lower sample standard deviation

compared to some of the batches that reside in an unstable period,

which may have a larger standard deviation.

We can also define a threshold, such as the standard deviation at point A here.

We compare each batch standard deviation to this threshold.

This tells us how many of the batches satisfy the criterion.
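
As a minimal JSL sketch of this comparison, assuming a data table with hypothetical Value and Batch columns and a placeholder threshold, the historical passing rate might be computed like this:

```jsl
// Sketch: historical passing rate from per-batch standard deviations.
dt = Current Data Table();            // table with hypothetical :Value and :Batch columns
threshold = 1.5;                      // placeholder threshold; use your own spec

// Group by batch and compute each batch's sample standard deviation
Summarize( dt, batchIDs = By( :Batch ), batchSDs = StdDev( :Value ) );

// Count how many batches pass (SD at or below the threshold)
nPass = 0;
For( i = 1, i <= N Items( batchSDs ), i++,
	If( batchSDs[i] <= threshold, nPass++ )
);
passRate = nPass / N Items( batchSDs );
Show( nPass, passRate );              // the historical passing rate
```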

With these two scenarios in mind,

we can formulate a statistical hypothesis test

to test what process variation we are dealing with.

We can assume our process is random,

and then ask: how likely is it that we would observe what we observed?

A more detailed statement is like this:

assuming batches with low standard deviation

are just due to sampling luck

and the historical data is representative of the population,

then the simulated batches generated from the same distribution

should have a passing rate that is statistically indistinguishable

from the actual passing rate of the historical data.
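
Written formally, a sketch of the test implied above, where $p_0$ is the historical passing rate and $p$ is the passing rate of the simulated batches, is:

$$H_0 : p = p_0 \quad \text{(the process is random; hero batches are sampling luck)} \qquad \text{vs.} \qquad H_1 : p \neq p_0$$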

On the right-hand side, you can see this wheel.

This is the procedure we went through to make this testing happen.

First, we will need to define a threshold.

Using this threshold, we can calculate the passing rate.

We compare the batches

in the historical data against this threshold

to get the percentage of historical batches that are good batches.

Because we also assume that our process is random,

we can fit the historical data to several distributions

and then pick the best-fitting one.
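
For the normal case that wins in the demo below, here is a minimal sketch of reading off the fitted parameters (the column name Value is hypothetical, and the actual application compares several candidate distributions rather than fitting only one):

```jsl
// Sketch: for a normal distribution, the fitted parameters are essentially
// the sample mean and standard deviation of the historical values.
dt = Current Data Table();                     // historical data (column name hypothetical)
vals = Column( dt, "Value" ) << Get Values;    // pull the values into a vector
mu = Mean( vals );
sigma = Std Dev( vals );
Show( mu, sigma );                             // parameters of the fitted normal
```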

Using this fitted distribution, we can generate a set of K samples.

K here is the same as the number of samples

in each batch in the historical data.

We repeat this procedure N times.

N here is the same as the number of batches in the historical data.

For each simulated batch,

we can then calculate its sample standard deviation.

Comparing these sample standard deviations

to the threshold we defined before gives us a set of binomial data.

With these binomial data and the passing rate we already have,

we can perform a one-sample proportion test of our hypothesis.
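
Putting these steps together, here is a minimal JSL sketch of the whole loop. It assumes the fitted distribution is normal (as in the demo below), uses the normal-approximation form of the one-sample proportion test (the talk does not specify the exact test variant), and every numeric value is a placeholder:

```jsl
// Sketch of the simulation loop and one-sample proportion test.
// Assumes the best fit was Normal( mu, sigma ); every number below is a placeholder.
mu = 10;  sigma = 2;     // parameters of the fitted distribution
K = 25;                  // samples per batch (match the historical batch size)
N = 40;                  // simulated batches (match the historical batch count)
threshold = 1.5;         // batch-SD threshold defined earlier
p0 = 0.30;               // historical passing rate computed earlier

// Simulate N batches of K samples each and record a binomial pass/fail per batch
nPass = 0;
For( i = 1, i <= N, i++,
	batch = J( K, 1, Random Normal( mu, sigma ) );  // one simulated batch
	If( Std Dev( batch ) <= threshold, nPass++ )    // pass if the batch SD clears the threshold
);
pHat = nPass / N;

// One-sample proportion test, normal approximation, two-sided
z = (pHat - p0) / Sqrt( p0 * (1 - p0) / N );
pValue = 2 * (1 - Normal Distribution( Abs( z ) ));
Show( pHat, z, pValue );
If( pValue < 0.05,
	Print( "Reject H0: the process does not look random" ),
	Print( "Cannot reject H0: the process may be random" )
);
```

In the real application, K, N, and the threshold are read from the historical data and the user interface rather than hard-coded.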

Using JMP, we are able to integrate this entire procedure into an application.

Here, I will do a quick demonstration to show you how this application works.

This application can import any data file

with a value column and an index column that indicates the batch index.

With the click of a button, it will automatically fit our data

to several distributions and pick the best one.

Right now, the best-fitting one is a normal distribution.

We can then set up the number of simulated data sets we want,

the size of each set, and the threshold.

When we click,

it will perform the hypothesis testing I mentioned before.

It also shows the percentage of historical batches that are good

as well as the percentage of the simulated batches that are good.

Lastly, it will show you a histogram

which visualizes the proportion of the simulated batches that are good.
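
One hedged way to build such a histogram in JSL, assuming the simulated batch standard deviations have been collected into a vector (all names and values here are placeholders):

```jsl
// Sketch: histogram of simulated batch standard deviations via the Distribution platform.
sdVector = J( 40, 1, Abs( Random Normal( 1.5, 0.3 ) ) );   // stand-in for the simulated SDs
dtSim = New Table( "Simulated Batches",
	New Column( "Batch SD", Numeric, Continuous, Set Values( sdVector ) )
);
dtSim << Distribution( Continuous Distribution( Column( :Batch SD ) ) );
```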

Now, we go back to the hypothesis testing.

The data we have here show that we reject the null hypothesis.

We check the p-value,

and we reject the null hypothesis at the 95% confidence level.

The 95% confidence level is the default setting here.

This conclusion suggests the process is not random

and that the good batches do exist in the stable periods of the process.

This conclusion can lead to several action items.

For example, we can investigate

the process variables and process parameters

between the stable period and the unstable period

and see what changed.

Of course, we can also get a different testing result

where we cannot reject the null hypothesis.

This suggests our process might be random.

We might have systemic variation,

and this will lead to a completely different investigation and action plan.

For example, in the worst-case scenario, in order to reduce the systemic variation,

we might need to completely change the manufacturing environment.

With this, I conclude today's presentation.

I also want to thank John Daffin, a colleague of mine,

who brought this interesting question to my attention during a project meeting.

I also want to thank you for listening to my presentation today.

I very much appreciate it.
