I am trying to calculate CpK for two different parameters. Data for both does not have a normal distribution. I followed the tips provided in the following post:
Based on this analysis, the mixture of 3 normals looks the best (AICc). Then I plotted the individual probability plots. Shown below are the top 3 probability plots, including "Normal".
Based on this analysis, the calculated CpK (mixture of 3 normals) was 0.65, as opposed to 0.9 when calculated with "Normal". Is this approach correct?
The second parameter is even more complicated, as the data is skewed, mostly due to values below the LOQ of the assay, which are reported as 109.
Any help is appreciated!
Well, I certainly wouldn't trust the normal estimate, but I'm not sure I'd trust any of those. Stability is a fundamental assumption for the capability indices. The data don't have to be normal, but they need to maintain a constant mean and variance. In the other thread, Mark and Mike talked about understanding why you have multiple modes. Multiple modes don't necessarily mean your process is unstable, because sometimes there are perfectly reasonable causes of multi-modal data that can't be "fixed." However, you really need to do your due diligence. If the assumption of stability is not met, then your capability estimates are probably not indicative of the capability you will have in the future.
That being said, if you feel confident your process is stable, then by all means base the capability calculations on the most appropriate fitted distribution. An alternative is non-parametric capability, which doesn't assume any distribution.
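To make the non-parametric idea concrete, here is a minimal Python sketch of the percentile method, which replaces the usual mean and 3-sigma spread with the median and the 0.135th and 99.865th percentiles. This is not JMP's algorithm (JMP's non-parametric option may be computed differently), and the function names are made up for illustration.

```python
def percentile(sorted_xs, p):
    """Percentile by linear interpolation between order statistics; p in [0, 100]."""
    k = (len(sorted_xs) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (sorted_xs[hi] - sorted_xs[lo]) * (k - lo)


def percentile_capability(data, lsl=None, usl=None):
    """Distribution-free capability index (hypothetical helper).

    Upper side: (USL - median) / (P99.865 - median)
    Lower side: (median - LSL) / (median - P0.135)
    Returns the smaller of the sides for which a spec limit was given.
    """
    if lsl is None and usl is None:
        raise ValueError("provide at least one spec limit")
    xs = sorted(data)
    if len(xs) < 2:
        raise ValueError("need at least two observations")
    median = percentile(xs, 50.0)
    lower_spread = median - percentile(xs, 0.135)
    upper_spread = percentile(xs, 99.865) - median
    sides = []
    if usl is not None:
        if upper_spread <= 0:
            raise ValueError("no spread above the median; upper index undefined")
        sides.append((usl - median) / upper_spread)
    if lsl is not None:
        if lower_spread <= 0:
            raise ValueError("no spread below the median; lower index undefined")
        sides.append((median - lsl) / lower_spread)
    return min(sides)
```

Two cautions: the extreme percentiles are only trustworthy with a lot of data (with a small sample they collapse toward the minimum and maximum), and when many values are tied at one number, such as a reporting limit, the spread on that side can be zero and the index is undefined.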
Thanks Cameron for the quick response! Is there a way to identify these multiple modes in JMP 14 and analyze them?
Also, I forgot to mention that some of the data points highlighted below were determined to be due to sample prep error. Can these be eliminated from the process capability analysis, or are sample prep and mixing errors considered part of the process?
Also, the goal is not only to calculate CpK but also to change the specs (tighten or broaden) depending on the results from the capability analysis.
I think you can eliminate those samples that you know were prepped incorrectly, since you have an assignable cause. Your objective is to generalize to unsampled product, and unsampled product is not prepped at all for testing, so I would say that sample prep is not part of the process.
JMP can't really figure out which samples belong to which mode; there's no way to determine that precisely. The best you could probably do is assign each individual to the closest mode, but you are likely to get a lot of them wrong. If you can, look at samples that come from different parts of the distribution and see if you can figure out how they are different. We sometimes see multiple modes in our strength testing, and the modes align with different failure modes.
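For what it's worth, here is a rough Python sketch of "assign each individual to the closest mode" for a normal mixture. It assumes you already have the fitted weights, means, and standard deviations of the components; the numbers in the usage note are invented placeholders, not values from this thread, and the function names are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def component_posteriors(x, components):
    """Probability that x came from each component.

    components is a list of (weight, mean, sd) tuples for the fitted mixture.
    """
    dens = [w * normal_pdf(x, mu, sd) for w, mu, sd in components]
    total = sum(dens)
    if total == 0.0:
        raise ValueError("x is too far from every component to classify")
    return [d / total for d in dens]


def assign_mode(x, components):
    """Return (index of the most likely component, its posterior probability)."""
    post = component_posteriors(x, components)
    best = max(range(len(post)), key=post.__getitem__)
    return best, post[best]
```

With placeholder components `[(0.5, 100.0, 2.0), (0.3, 110.0, 2.0), (0.2, 120.0, 2.0)]`, a value of 100 is assigned to the first mode almost with certainty, but a value of 105 is assigned to it with a probability of only about 0.63. Points that fall between modes are exactly the ones you are likely to get wrong.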
It might also help to look at a control chart. If the different modes of your distributions correspond to periods of time along the X-axis, you may be able to find a root cause.
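If you want to see what such a chart is checking, here is a minimal Python sketch of an individuals (I) chart, assuming the values are listed in time order. It uses the standard constant 2.66 (that is, 3/1.128) times the average moving range for the limits; the function name is just for illustration, and JMP's Control Chart Builder gives you the same kind of chart without any code.

```python
def individuals_chart(values):
    """Center line, control limits, and out-of-limit points for an I chart.

    values must be in time (production) order.
    Limits are center +/- 2.66 * average moving range of successive points.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    center = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    lcl = center - 2.66 * mr_bar
    ucl = center + 2.66 * mr_bar
    # Indices of points that fall outside the control limits
    flagged = [i for i, v in enumerate(values) if v < lcl or v > ucl]
    return center, lcl, ucl, flagged
```

Out-of-limit points are only one signal. A run of consecutive points on one side of the center line is the pattern to look for if the modes correspond to different periods of time.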
What could be the justification for shrinking the spec limit in the following case, where the data is heavily skewed?
Basically most of the data is at 109 which is the LOQ of the assay.
Again the Norm 3 mix looks the best and the new CpK calc with Norm 3 is lower at 3.1.
Is there a way to deal with the near LOQ data? Is the CpK calc this way accurate?
Thanks for all the previous replies!
I don't know what LOQ means. I'm not sure I could be of much help with these questions.
LOQ is the limit of quantitation. Basically, that's the lowest value that an assay can reliably report.
You have chosen Normal 3 mixtures for your distribution, but SHASH and Johnson Su are better choices based on their lower AICc scores. The lower the AICc the better.
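As a reminder of what AICc rewards and penalizes, here is a small Python sketch of the usual formula, AICc = -2 log L + 2k + 2k(k+1)/(n - k - 1), where k is the number of fitted parameters and n is the number of observations. The function name is hypothetical and the inputs in the usage note are placeholders, not numbers from your report.

```python
def aicc(neg2_loglik, k, n):
    """Corrected Akaike information criterion.

    neg2_loglik: -2 times the maximized log-likelihood of the fitted distribution
    k: number of estimated parameters
    n: number of observations
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    return neg2_loglik + 2 * k + (2 * k * (k + 1)) / (n - k - 1)
```

A mixture of 3 normals has 8 free parameters (3 means, 3 standard deviations, and 2 free mixing proportions), while SHASH and Johnson Su have 4 each. With the same -2 log-likelihood of 100 and n = 50, for example, `aicc(100.0, 4, 50)` is about 108.9 and `aicc(100.0, 8, 50)` is about 119.5, so the mixture has to fit noticeably better to come out ahead.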
The SHASH distribution is also known as the sinh-arcsinh distribution. This distribution is similar to the Johnson distributions in that it is a transformation to normality, but the SHASH distribution includes the normal distribution as a special case. This distribution can be symmetric or asymmetric.
Also, you have two outliers: one is way below your LOQ and the other is well beyond your USL. You may want to hide and exclude those points and refit your distributions to see how much leverage those points have on the overall fit and Cpk.
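The same before-and-after comparison can be sketched outside JMP. The Python snippet below is only an illustration of the idea: it uses the plain normal-theory index with the overall sample standard deviation (what many references call Ppk) rather than a refit of SHASH or Johnson Su, and the function name and the `exclude` argument are made up.

```python
import statistics


def normal_capability(data, lsl, usl, exclude=()):
    """Normal-theory capability index using the overall sample standard deviation.

    exclude: indices of rows to leave out, mimicking hide-and-exclude.
    Returns min((USL - mean), (mean - LSL)) / (3 * sd).
    """
    skip = set(exclude)
    kept = [x for i, x in enumerate(data) if i not in skip]
    if len(kept) < 2:
        raise ValueError("need at least two observations after exclusion")
    mean = statistics.mean(kept)
    sd = statistics.stdev(kept)
    if sd == 0:
        raise ValueError("zero standard deviation; index undefined")
    return min(usl - mean, mean - lsl) / (3.0 * sd)
```

Calling it once with `exclude=()` and once with the indices of the suspect rows shows how much those rows move the index. A large change means the conclusion rests on a handful of points.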