cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMPĀ® Marketplace
Choose Language Hide Translation Bar
RaviK
Level II

Statistical Significance

Hi, 

I have the following information only: 

I have Failure Rates (FR) from two sets of data. The experiment set has 4 fails out of 2500 (FR:0.16%) samples and the control set has 34 fails out of 35000 samples (FR:0.097%). Due to such a big difference in the sample size, is there a way JMP can help to determine if these two FRs are statistocally similar or different? 

Thanks.

Ravi

6 REPLIES 6
julian
Community Manager Community Manager

Re: Statistical Significance

Hi @RaviK,

There are a few ways you could approach this, but perhaps the most straightforward is a Chi-Square Test of Independence, a type of contingency analysis. This test is appropriate because what you have are two categorical variables, a grouping variable, and a categorical outcome (success or fail), and you are interested in whether the observed proportions of the categorical outcome for your two groups provide evidence that the process generating the outcomes differs between the groups. You can obtain this test using Analyze > Fit Y by X. But first, you'll first need your data entered in a particular way (also attached): Screen Shot 2019-07-09 at 8.45.34 AM.png

What I've done is taken the numbers you provided me and made columns for Group, Outcome, and N, the number of observations in each. For the number of successes, I simply took the total you gave minus the failures. Next, we run Analyze > Fit Y by X:Screen Shot 2019-07-09 at 8.45.44 AM.png

Here I've cast Outcome in the Y role, Group to the X, and N as the Freq, or frequency of occurrence. When we hit OK, we get the output below (I've hidden the mosaic plot since your observed frequencies of fail are so low that the plot is not helpful). Screen Shot 2019-07-09 at 8.48.55 AM.png

Our p-value of interest is the Likelihood ratio, or Pearson (classical Chi-Square test of independence), both of which are around p = ~0.30, indicating that if there isn't any true difference in the experimental and control group processes, a difference in the proportion of failures you observed in these sample data (or a difference more extreme) would occur about 30% of the time when taking samples of the sizes you had. In other words, not very convincing evidence that there is a true difference in these sets.  Given what appears to be a large difference in the proportion of failures this may be surprising; but, given the low failure count overall, it's relatively easy to observe differences in the proportions of this magnitude or greater simply by chance (which is what this statistical significance test is telling us directly).

 

I hope this helps!

 

@julian 

ian_jmp
Level X

Re: Statistical Significance

Don't know if it helps understanding, but here's some JSL that gets the pValue directly through simulation:

NamesDefaultToHere(1);

n1 = 2500;		// Size of sample one
nf1 = 4;		// Number of failures in sample one
n2 = 35000;	// Size of sample two
nf2 = 34;		// Number of failures in sample two

n = n1 + n2;
nf = nf1 + nf2;

nSim = 10000;					// Number of simulations
np1 = J(nSim, 1, .);			// Vector to hold the number of passes counted in sample one

// Randomly allocate nf failures to n units
for(s=1, s<=nSim, s++,
	result$ = J(n, 1, 1);					// All n units pass initially
	result$[randomIndex(n, nf)] = 0;		// Simulate nf failures at random
	np1$ = VSum(result$[1::n1]);			// Count the nummber of passes in sample one
	np1[s] = np1$;							// Store this result
);

// Now evaluate how extreme the observed number of passes, (n1 - nf1), is in relation to the
// reference distribution constructed under the null hypothesis of random allocation of
// failures to samples
np1Observed = n1 - nf1;
pValue = NRow(Loc(np1 <= np1Observed)) / nSim;
Print("Estimated pValue is "||Char(Round(pValue, 3)));
txnelson
Super User

Re: Statistical Significance

Nicely done
Jim
RaviK
Level II

Re: Statistical Significance

Thanks Ian. 

Ravi

RaviK
Level II

Re: Statistical Significance

Thanks Julian. It was useful indeed. 

Could I also use "Hypothesis Test for two Proportions"?

Ravi

julian
Community Manager Community Manager

Re: Statistical Significance

Absolutelyā€” those calculators are useful when working from summarized statistics. That said, we intended them mostly for classroom use, and youā€™ll usually get much more value from using the native JMP platforms.