Today my topic is Heroes or Zeros: A Product Distribution Analysis Using JMP.
First, a little background.
An organization with an established process
may decide to implement process control
and process discipline in their organization.
For example, when a product moves
from the development stage to the mass-production stage,
one of the problems that can arise at this juncture
is a process variation issue.
The variation may be too large, or it may not meet expectations.
The variation here can be variation of the mean,
variation of the standard deviation, and so on.
We will want to find the root cause
of such a variation problem and try to fix it.
But before that, we will need to figure out
what type of variation we are facing
because the type of variation will dictate what kind of action
and investigation strategy we should take.
Today's demonstration
will investigate this through an exploratory analysis.
We use the standard deviation as the statistic of interest here.
The issue is we have a process
that has a high overall standard deviation,
but we can also observe some batches that have a lower standard deviation.
We call these the hero batches.
We will want to find out what caused such a high overall standard deviation,
but before that, we need to figure out what process variation we are facing:
what kind of process variation could produce what we observed.
In general, there are two types of situations here.
One is that we have a completely random process
and the variation is systemic, as we can see here.
Although the process is random,
depending on how we batch it and how we sample it,
some batches may have a lower sample standard deviation than others.
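To make this scenario concrete, here is a short simulation sketch, in Python rather than JMP, showing that even a purely random process yields batches whose sample standard deviations differ just by chance (the mean, scale, and batch sizes are made-up values for illustration):

```python
import numpy as np

# A purely random (i.i.d.) process: 200 measurements, no shifts at all.
rng = np.random.default_rng(0)
process = rng.normal(loc=10.0, scale=2.0, size=200)

# Split into 20 batches of 10 samples each.
batches = process.reshape(20, 10)

# Sample standard deviation of each batch (ddof=1 for the sample estimate).
batch_sd = batches.std(axis=1, ddof=1)

# Even with no real process change, the per-batch SDs spread out,
# so some batches look "better" by luck alone.
print(batch_sd.min(), batch_sd.max())
```

The spread between the smallest and largest per-batch standard deviation is exactly why a few "hero" batches are not, by themselves, evidence of a non-random process.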
The other situation is that our process is not random.
As we can see here, this process goes up and down;
it has some mean shifts.
It is not a random process, and depending on how we batch it,
the batches that fall in a stable period
will have a relatively lower sample standard deviation
than the batches that fall in an unstable period,
which may have a larger standard deviation.
We can also define a threshold on the standard deviation, such as point A here.
We compare each batch's standard deviation to this threshold.
That tells us how many batches satisfy the criterion.
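This threshold comparison can be sketched as follows (a hypothetical Python illustration; the batch standard deviations and the threshold value are made-up stand-ins):

```python
import numpy as np

# Stand-in per-batch sample standard deviations for 30 batches.
rng = np.random.default_rng(1)
batch_sd = rng.normal(2.0, 0.4, size=30).clip(min=0.1)

# "Point A": the standard-deviation criterion (assumed value).
threshold = 2.0

# A batch passes if its sample SD is at or below the threshold.
passing = batch_sd <= threshold
passing_rate = passing.mean()
print(f"{passing.sum()} of {len(batch_sd)} batches pass ({passing_rate:.0%})")
```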
With these two scenarios in mind,
we can formulate a statistical hypothesis test
to test what process variation we are dealing with.
We can assume our process is random,
and then ask: how likely is it that we would observe what we observed?
A more detailed statement goes like this:
assuming batches with a low standard deviation
are just due to sampling luck,
and the historical data is representative of the population,
then simulated batches generated from the same distribution
should have a passing rate that is statistically indistinguishable
from the actual passing rate of the historical data.
On the right-hand side, you can see this wheel.
This is the procedure we went through to make this testing happen.
First, we need to define a threshold.
With this threshold, we can calculate the passing rate:
we compare the batches in the historical data against the threshold
to get the percentage of historical batches that are good batches.
Because we also assume that our process is random,
we can fit the historical data to several distributions
and then pick the best-fitting one.
Using this fitted distribution, we can generate a set of K samples,
where K is the number of samples
in each batch of the historical data.
We repeat this procedure N times,
where N is the number of batches in the historical data.
For each simulated batch,
we then calculate its sample standard deviation.
Comparing these sample standard deviations
to the threshold we defined before gives us a set of binomial data.
With this binomial data and the passing rate we already have,
we can perform a one-sample proportion test to test our hypothesis.
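The whole procedure can be sketched in a few lines of Python (a minimal illustration, not the JMP application: it assumes a normal distribution happens to be the best fit, uses an exact binomial test as the one-sample proportion test, and the historical data, threshold, and batch sizes are all made-up stand-ins):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Stand-in historical data: N batches of K samples each.
N, K = 50, 12
historical = rng.normal(loc=100.0, scale=5.0, size=(N, K))

# The standard-deviation criterion ("point A"), an assumed value.
threshold = 4.5

# Step 1: historical passing rate from the per-batch sample SDs.
hist_sd = historical.std(axis=1, ddof=1)
p_hist = np.mean(hist_sd <= threshold)

# Step 2: fit a distribution to the pooled historical data
# (JMP tries several candidates; a normal fit stands in here).
mu, sigma = stats.norm.fit(historical.ravel())

# Step 3: simulate N batches of K samples from the fitted distribution
# and turn their sample SDs into binomial pass/fail data.
simulated = rng.normal(mu, sigma, size=(N, K))
sim_sd = simulated.std(axis=1, ddof=1)
passes = int(np.sum(sim_sd <= threshold))

# Step 4: one-sample proportion test (exact binomial form):
# H0: the simulated passing rate equals the historical passing rate.
result = stats.binomtest(passes, n=N, p=p_hist)
print(f"historical rate {p_hist:.2f}, simulated {passes}/{N}, "
      f"p-value {result.pvalue:.3f}")
```

A small p-value here would mean the random-process assumption cannot reproduce the observed passing rate, matching the logic of the test described above.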
Using JMP, we are able to integrate this entire procedure into an application.
Here, I will do a quick demonstration to show you how this application works.
This application can import any data file
with a value column and an index column that indicates the batch index.
With the click of a button, it automatically fits our data
to several distributions and picks the best one.
Right now, the best-fitting one is a normal distribution.
We can then set the number of simulated data sets we want,
the size of each set, and the threshold.
When we click,
it performs the hypothesis test I mentioned before.
It also shows the percentage of historical batches that are good
and the percentage of simulated batches that are good.
Lastly, it shows a histogram
visualizing the proportion of simulated batches that are good.
Now, let's go back to the hypothesis test.
The data we have here show that we reject the null hypothesis.
Checking the p-value,
we reject the null hypothesis at the 95% confidence level,
which is the default setting here.
This conclusion suggests the process is not random
and the good batches do exist in the stable period of the process.
This conclusion can lead to several action items.
For example, we can investigate
the process variables and process parameters
between the stable and unstable periods
and see what changed.
Of course, we could also get a different test result,
where we cannot reject the null hypothesis.
That would suggest our process might be random
and we might have systemic variation,
which would lead to a completely different investigation and set of actions.
In the worst-case scenario, for example, reducing the systemic variation
might require completely changing the manufacturing environment.
With this, I conclude today's presentation.
I also want to thank my colleague John Daffin,
who brought this interesting question to my attention during a project meeting.
And thank you all for listening to my presentation today.
I really appreciate it.