Community Trekker

## how do I create a subset that is not a SRS but follows a given distn (specify the mean and SD)

I have a large dataset that I need to subsample to form two groups. One is an SRS (that is easy). The other is a purposeful sample that should match a given distribution. I have the data for the distribution I am trying to match, and I know its mean and SD. How can I do this?

4 REPLIES
Community Manager

## Re: how do I create a subset that is not a SRS but follows a given distn (specify the mean and SD)

I can't think of a way to do this without scripting it. Even then, I don't know what strategy or algorithm you might use. Do you have any references or examples of how this might be accomplished? With that the Community might be able to provide some direction on some JSL.

-Jeff
Community Trekker

## Re: how do I create a subset that is not a SRS but follows a given distn (specify the mean and SD)

I think I figured it out.

Thank you!

Community Manager

## Re: how do I create a subset that is not a SRS but follows a given distn (specify the mean and SD)

I'm glad to hear it, Sarah.

Can you share what method you ended up with?

-Jeff
Community Trekker

## Re: how do I create a subset that is not a SRS but follows a given distn (specify the mean and SD)

A is approximately bimodal; this is the group I want to match. B is a much larger (10x larger) population with a strong left skew. The mean of A << the mean of B. I was trying to get a subset of B to match A.

Get a freq table for group A.

Get a histogram of group B, with the bin size matched to group A.

Take an SRS from the first bin of group B, with size n equal to the count in the same bin of group A, so that the sample of B has the same n in that bin as A.

Repeat for each bin across the histogram in order to build the complete subset of B.
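For anyone who would rather script the steps above than select rows by hand, here is a rough sketch in Python (not JSL, and not what I actually ran). The function names `bin_index` and `match_by_bins`, the equal-width bins spanning the range of A, and the default of 10 bins are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def bin_index(value, lo, width, n_bins):
    """Map a value to an equal-width bin; the top edge falls in the last bin."""
    return min(int((value - lo) // width), n_bins - 1)


def match_by_bins(a_values, b_values, n_bins=10, seed=0):
    """Return row indices of B holding, per bin, as many rows as A has there."""
    rng = random.Random(seed)
    lo, hi = min(a_values), max(a_values)
    width = (hi - lo) / n_bins or 1.0  # guard against a zero-width range

    # Frequency table for group A.
    a_counts = Counter(bin_index(v, lo, width, n_bins) for v in a_values)

    # Row indices of group B, grouped by the same bins.
    # Rows of B outside the range of A can never match, so they are skipped.
    b_rows = defaultdict(list)
    for i, v in enumerate(b_values):
        if lo <= v <= hi:
            b_rows[bin_index(v, lo, width, n_bins)].append(i)

    # SRS without replacement inside each bin, sized to A's count in that bin.
    chosen = []
    for b in sorted(a_counts):
        need, pool = a_counts[b], b_rows.get(b, [])
        if len(pool) < need:
            raise ValueError(f"bin {b}: need {need} rows, B has {len(pool)}")
        chosen.extend(rng.sample(pool, need))
    return chosen
```

Calling `match_by_bins(a, b)` returns row positions in `b`, so the subset is `[b[i] for i in idx]`. Working with row indices instead of values is what guards against picking the wrong rows, and the error on a short bin flags places where B cannot supply enough rows to match A.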

I used a two-sample t-test to make sure the means weren't too different between groups.
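As a rough illustration of that check, the sketch below computes a two-sample t statistic using only the Python standard library. It assumes the Welch (unequal-variance) form, which is a choice made here for illustration; the name `welch_t` is hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's two-sample t statistic and its approximate degrees of freedom."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x) / nx, variance(y) / ny  # squared standard errors
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df
```

The standard library has no t distribution, so this stops at the statistic and degrees of freedom; a p-value would need a t table or a statistics package.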

This took a while to build and I had to be careful not to accidentally select the wrong rows. But it worked.