If I know my parts have a 30% failure rate, how many parts do I need to inspect to be 95% confident I will find at least one failure? Assuming I made a process improvement and want to see if my failure rate decreased from 30% to 10% w/ 95% confidence, then how many parts do I need to inspect? I think JMP can give me these answers through the DOE>Sample Size Explorer>Power wizard, but I'm not exactly sure how to use this script to get the answers to these pretty basic sample size questions.
Understood. What about being confident at the 90, 95, or 99% level? Can the Power Explorer be used to estimate the sample size required to detect a failure at these confidence levels? We can estimate the mean and standard deviation from our internal measurements, and simulation shows minimal overlap between the distributions.
Hi @chris_dennis : Hmm, I don't think we understand each other.
The following examples are based on the hypergeometric distribution. https://en.wikipedia.org/wiki/Hypergeometric_distribution
Here are some examples, based on a batch size of 250 and 95% confidence.
1. If you take a sample size of 50, and none are defective, then there is 95% certainty that no more than 13 (of 250) are defective.
2. If you take a sample size of 100, and none are defective, then there is 95% certainty that no more than 5 (of 250) are defective.
3. If you take a sample size of 150, and none are defective, then there is 95% certainty that no more than 3 (of 250) are defective.
4. If you take a sample size of 200, and none are defective, then there is 95% certainty that no more than 1 (of 250) is defective.
Here are some examples, based on a batch size of 250 and 99% confidence.
5. If you take a sample size of 50, and none are defective, then there is 99% certainty that no more than 19 (of 250) are defective.
6. If you take a sample size of 100, and none are defective, then there is 99% certainty that no more than 8 (of 250) are defective.
7. If you take a sample size of 150, and none are defective, then there is 99% certainty that no more than 4 (of 250) are defective.
8. If you take a sample size of 200, and none are defective, then there is 99% certainty that no more than 2 (of 250) are defective.
Here are some examples, based on a batch size of 250 and 90% confidence.
9. If you take a sample size of 50, and none are defective, then there is 90% certainty that no more than 10 (of 250) are defective.
10. If you take a sample size of 100, and none are defective, then there is 90% certainty that no more than 4 (of 250) are defective.
11. If you take a sample size of 150, and none are defective, then there is 90% certainty that no more than 2 (of 250) are defective.
12. If you take a sample size of 200, and none are defective, then there is 90% certainty that no more than 1 (of 250) is defective.
i.e., you can't prove, with any level of confidence, that 0 are defective by sampling fewer than all 250.
Edit: I wrote my own code to make these calculations. However, you can make these kinds of calculations using the tool here.
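The code behind the numbers above isn't shown in the thread. As a rough sketch of this kind of calculation in Python (the function names `prob_zero_defects` and `max_defectives` are made up for illustration, and the rule "largest defect count whose chance of producing a clean sample still exceeds 1 - confidence" is my assumption about how the bounds were derived), something like this could work:

```python
from math import comb


def prob_zero_defects(batch: int, sample: int, defectives: int) -> float:
    """Hypergeometric P(no defectives in the sample | `defectives` in the batch)."""
    # comb() returns 0 when the sample is larger than the number of good units,
    # so the probability is correctly 0 in that case.
    return comb(batch - defectives, sample) / comb(batch, sample)


def max_defectives(batch: int, sample: int, confidence: float) -> int:
    """Largest defect count that a clean sample cannot rule out at this confidence.

    Assumed rule: a defect count d is ruled out once the chance of seeing
    zero defectives in the sample drops to 1 - confidence or below.
    """
    alpha = 1.0 - confidence
    d = 0
    while d < batch and prob_zero_defects(batch, sample, d + 1) > alpha:
        d += 1
    return d


if __name__ == "__main__":
    for conf in (0.95, 0.99, 0.90):
        for n in (50, 100, 150, 200):
            bound = max_defectives(250, n, conf)
            print(f"{conf:.0%} confidence, sample of {n}: no more than {bound} of 250 defective")
```

Under that assumed rule the bound only reaches 0 when the sample is the whole batch, e.g. `max_defectives(250, 250, 0.95)`.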
Thanks, I think I get it. Sampling <100% of the population won't tell me if there is or is not defective material in the population with any confidence.
100% sampling is not possible because the measurement is destructive; the cost would be effectively infinite, since no usable units would remain after measurement.
I will need to look for a different detection method.
Hi @chris_dennis . Unfortunately, statistical (less than 100%) sampling schemes can offer no guarantee (at any confidence level) that there are 0 defectives in the batch, because there is no way to know whether some of the units you didn’t test are defective. It’s sorta like how I can’t prove there is no red squirrel living in my local forest: no, I haven’t seen one yet…but one might be just over that next hill!
Sometimes I think it can be more illuminating to simulate the process. If there is a constant 30% chance of a failure, the geometric distribution will give the time to the first failure. You can use a random geometric formula in a column, add a lot of rows, and look at the distribution of times to first failures. If you also create a column using a different failure probability, comparing these will tell you a lot. Not as elegant as some other solutions, but simulation is often more intuitive for me.
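For anyone who wants to try the same idea outside a JMP data table, here is a minimal sketch in plain Python. It is not the JMP column-formula setup described above; the helper names (`trials_to_first_failure`, `simulate`, `percentile`), the run count, and the seed are all arbitrary choices for illustration.

```python
import math
import random


def trials_to_first_failure(p: float, rng: random.Random) -> int:
    """Inspect parts one at a time until one fails; return how many were inspected."""
    n = 1
    while rng.random() >= p:  # this part passed, so inspect another one
        n += 1
    return n


def simulate(p: float, runs: int = 100_000, seed: int = 1) -> list[int]:
    """Repeat the experiment `runs` times and return the sorted counts."""
    rng = random.Random(seed)  # fixed seed so reruns give the same draws
    return sorted(trials_to_first_failure(p, rng) for _ in range(runs))


def percentile(sorted_values: list[int], q: float) -> int:
    """Smallest value with at least a fraction q of the runs at or below it."""
    index = max(0, math.ceil(q * len(sorted_values)) - 1)
    return sorted_values[index]


results = {p: simulate(p) for p in (0.3, 0.1)}

if __name__ == "__main__":
    for p, counts in results.items():
        mean = sum(counts) / len(counts)
        print(f"p = {p}: mean parts to first failure = {mean:.2f}, "
              f"95% of runs found one within {percentile(counts, 0.95)} parts")
```

Comparing the two rows of output side by side shows how much longer the wait for a first failure becomes when the failure probability drops from 30% to 10%.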
Hi dlehman1, thanks for your feedback and I like your approach also. Do you have a couple of screenshots of how to set this up as far as getting the rows with the ones and zeros in it based off a 30% failure rate? Thanks!
Interestingly, I thought my question above was pretty basic and that I was missing something simple in JMP, since I tend to get lost in all the statistics terms (e.g., alpha, null hypothesis testing, proportions, beta, power, etc.), but it seems it's not as straightforward as I thought? There are many different ways to couch this question, but I think the way I stated it gives the general idea.
Attached is an example comparing the first failure at 30% and 10% probabilities, with a few graphs of the results stored as scripts.
Thanks for the sample file, dlehman1. I do understand how you got the charts and the data, but I am still a little confused about how I should use these charts to answer these two questions:
Say it takes 4hrs to inspect a part to see if it will fail. I need to find a failure so that I can do some failure analysis work, but management wants me to give them an estimate of time for how long it will take to inspect parts before I actually find a failure. Let's assume a historical failure rate of 30% and I want to be 95% confident I am giving management a good estimate of time. How many parts should I anticipate inspecting before I find a failure? Then, the same question, but with a historical 10% failure rate.
Hi @chrsmth . You could think about it this way. Prob(finding at least 1) = Prob(x>0) = 1 - Prob(x=0), where x is the number of failures in a sample of size n. Each part passes independently with probability 1-p, so Prob(x=0) = (1-p)^n.
Setting 1 - Prob(x=0) = 1-(1-p)^n = 0.95.
This implies (1-p)^n = 0.05.
This implies n = ln(0.05)/ln(1-p) where p = 0.3 or 0.1. Then round n up to the nearest integer.
n is then the smallest sample size for which there is at least a 95% chance that at least one part is a failure.
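To put numbers on the formula above, here is a small Python sketch. The function name `parts_needed` is just for illustration, and the 4 hours per part is taken from the question; the hours figure assumes parts are inspected one after another.

```python
import math


def parts_needed(p: float, confidence: float = 0.95) -> int:
    """Smallest n with 1 - (1 - p)**n >= confidence, i.e. ln(1 - confidence) / ln(1 - p) rounded up."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


HOURS_PER_PART = 4  # inspection time quoted in the question

if __name__ == "__main__":
    for p in (0.3, 0.1):
        n = parts_needed(p)
        print(f"p = {p}: plan for up to {n} parts, about {n * HOURS_PER_PART} hours")
```

For p = 0.3 this gives n = 9 (about 36 hours), and for p = 0.1 it gives n = 29 (about 116 hours). Passing `confidence=0.90` or `confidence=0.99` gives the corresponding sample sizes for those levels.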
The thing is, this is a question of probability, not statistics. Problems of statistics involve testing/estimating parameters based on observed data. In your case, you say you know the parameter p (0.3). So, once p is assumed to be known, the sample size question, as you state it, is all about probability. I.e., you know the distribution is binomial(n, p), where p = 0.3. So, once the distribution is known there are no statistical hypotheses to be tested. It is a matter of probability.
Now…if you want to “prove” (I’m using “prove” very loosely here) that p = 0.3, then that is a problem of statistics.
Simulation provides a link between probability and statistics. Almost any probability problem can be simulated, and statistical analysis of the simulation can provide results that match the theoretical probability results (closely, if enough simulations are used). Since I have always been bad at probability theory, simulation works better for me. It is also easier to explain to people.