Twoolman
Level II

Does k means sample size estimate require a normal distribution?

Hi everyone.

I'm using the k means sample size estimator under DOE | Sample Size and Power.

Do the means that I submit to this function need to come from a normally distributed dataset in order to be used effectively? I don't think that is the case, especially since k means in JMP only lets me submit a maximum of 10 groups, and some of my datasets have more than that (in this situation I randomly submit groups from that dataset to the k means function after calculating that set's standard deviation).

Thanks in advance.
10 REPLIES

Re: Does k means sample size estimate require a normal distribution?

Hi, all. Sorry to jump into the conversation.

This is what I am hearing...

Twoolman has data from insurance companies that represent a number of surgeons, but the data may not represent all surgeries a surgeon may conduct. An insurance company provides only the subset of surgeries that it covers.

So it sounds as if the number of observations is fixed, and Twoolman is asking whether the k-means power calculation can be used as a way to exclude certain physicians based on having too few observations.

In general, I probably would not do this. The power calculations are for a planned experiment with a particular delta to detect. As this delta gets smaller (or the variability increases), we would need a larger sample per group; in the reverse scenario, a smaller one. Prospectively, we would try to get as many surgeries as possible to meet the criteria of the calculation. However, given that the "experiment" is already completed, we are taking the observations as they are. I wouldn't exclude surgeons with a small number of surgeries just because they may be quite different from other surgeons.
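
To make the delta/variability trade-off concrete, here is a minimal sketch in Python. It is not what JMP's k means calculator does internally. It assumes the simplest case of two equal-sized groups with a common standard deviation and uses the usual normal approximation, n per group ≈ 2(z_{1-α/2} + z_{power})²σ²/δ². The function name `n_per_group` and the example values are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-group
    comparison of means (normal approximation, equal n, common sigma).
    This is an illustrative simplification, not JMP's k-means method."""
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = z.inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


# Hypothetical inputs: shrinking delta (or growing sigma) raises n.
for delta in (2.0, 1.0, 0.5):
    print(f"delta={delta}, sigma=1.0 -> n per group = {n_per_group(delta, 1.0)}")
print(f"delta=1.0, sigma=2.0 -> n per group = {n_per_group(1.0, 2.0)}")
```

Halving delta roughly quadruples the required n, as does doubling sigma. That is why this kind of calculation belongs at the planning stage; it says nothing about which already-collected groups to drop.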

 

However, one has to be careful here. Comparisons (assuming patients somehow arrive randomly to the set of surgeons analyzed) may identify statistical differences between the surgeons, but the question then becomes... are these differences clinically meaningful? This is often a challenge for many endpoints: we may identify statistical differences between groups but have no idea whether the differences we have found are of any practical importance.

Mark does point out that these data are observational, so there are likely a number of additional factors that a simple comparison between the surgeons would not account for and that would need to be controlled for. This could involve covariates in a model, or the use of propensity scores. I am not sure if JMP has any features for propensity scores, but SAS has two new procedures described here.