
How do I create a subset that is not an SRS but follows a given distribution (with a specified mean and SD)?

squesen407

Community Trekker

Joined:

Feb 5, 2016

I have a large dataset and I need to subsample it to form two groups. One is a simple random sample (SRS), which is easy. The other is a purposeful sample that should match a given distribution. I have the data for the distribution I am trying to match, and I know its mean and SD. How can I do this?

4 REPLIES
Jeff_Perkinson

Community Manager

Joined:

Jun 23, 2011

I can't think of a way to do this without scripting it. Even then, I don't know what strategy or algorithm you might use. Do you have any references or examples of how this might be accomplished? With that, the Community might be able to provide some direction on the JSL.

-Jeff
squesen407

Community Trekker

Joined:

Feb 5, 2016

I think I figured it out.

Thank you!

Jeff_Perkinson

Community Manager

Joined:

Jun 23, 2011

I'm glad to hear it, Sarah.

Can you share what method you ended up with?

-Jeff
squesen407

Community Trekker

Joined:

Feb 5, 2016

A is approximately bimodal; this is the group I want to match. B is a much larger (10x larger) population with a strong left skew. The mean of A << the mean of B. I was trying to get a subset of B that matches A.

Get a frequency table for group A.

Get a histogram of group B, using the same bin widths as group A.

For the first bin, take an SRS from the rows of B that fall in that bin, with sample size n equal to the number of A observations in the same bin, so the sample of B has the same n as A.

Repeat for each bin across the histogram in order to build the complete subset of B.

Used a two-sample t-test to make sure the means weren't too different between groups.

This took a while to build and I had to be careful not to accidentally select the wrong rows. But it worked.
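
For anyone who would rather script this, the steps above can be sketched roughly as follows. This is a minimal Python illustration, not the JMP/JSL procedure actually used in this thread. The function names (`bin_index`, `matched_subset`, `welch_t`), the bin edges, and the toy data are all assumptions made up for the example. It uses the Welch form of the two-sample t statistic and stops at the statistic itself; no p-value is computed.

```python
import bisect
import random
import statistics
from collections import defaultdict


def bin_index(value, edges):
    """Return the index of the bin containing value, or None if out of range.

    Bins are [edges[i], edges[i+1]); the last bin also includes its top edge.
    """
    if value < edges[0] or value > edges[-1]:
        return None
    return min(bisect.bisect_right(edges, value) - 1, len(edges) - 2)


def matched_subset(a, b, edges, rng):
    """Sample from b, bin by bin, so the bin counts equal those of a."""
    # Frequency table for group A.
    a_counts = defaultdict(int)
    for x in a:
        i = bin_index(x, edges)
        if i is None:
            raise ValueError(f"value {x} in A falls outside the bin edges")
        a_counts[i] += 1

    # Group B values by the same bins.
    b_bins = defaultdict(list)
    for x in b:
        i = bin_index(x, edges)
        if i is not None:
            b_bins[i].append(x)

    # SRS (without replacement) within each bin, sized to match A.
    subset = []
    for i, n in sorted(a_counts.items()):
        pool = b_bins[i]
        if len(pool) < n:
            raise ValueError(f"bin {i}: need {n} values from B, found {len(pool)}")
        subset.extend(rng.sample(pool, n))
    return subset


def welch_t(x, y):
    """Welch two-sample t statistic (difference in means over its std. error)."""
    se = (statistics.variance(x) / len(x) + statistics.variance(y) / len(y)) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / se


# Toy data, invented for illustration only.
a = [1, 2, 2, 3, 8, 8, 9, 9]          # small, roughly two-humped group
b = [i / 10 for i in range(100)]      # larger population, 0.0 to 9.9
edges = [0, 2, 4, 6, 8, 10]           # shared bin edges for A and B

rng = random.Random(0)                # fixed seed so the draw is repeatable
subset = matched_subset(a, b, edges, rng)
print(len(subset), welch_t(a, subset))
```

Working from values (or row indices) grouped by bin, rather than selecting rows by hand, removes most of the risk of picking the wrong rows. Note that a t statistic only compares the means; checking that the subset's histogram lines up with A's is a separate check.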