Samira
Level II

How to select a quota sample from a data set

Hello everyone

I am working on a research project that requires a representative sample. I have already collected the data, but I need to select a subset that meets four representation criteria, i.e. the sample should meet the following quotas:
- gender: 50% male and 50% female;
- age: 1/3 aged 18 to 30, 1/3 aged 31 to 50 and 1/3 aged 51 and over;
- household income level: 1/3 low, 1/3 medium and 1/3 high income;
- region: 1/5 of the study population from each of the five regions of the city (North, South, Centre, East and West).

How can I create this sample?

I am using JMP Pro 17.

hogi
Level XII


Sounds like a textbook exercise. Is there a chapter with the solution?

Samira
Level II


No.
I was actually discussing this issue with a colleague, as we want to start the data analysis of a project using this quota sampling technique.
txnelson
Super User


This is a complex problem. How large is the data table you are pulling data from? What sample size do you need? You have 90 combinations of your 4 columns. Did you and your colleague come up with an idea on how to approach the problem?

Jim
hogi
Level XII


So you don't want to pick the sample according to the actual shares in your data, but according to fixed target fractions?

What about the cell female, over 51, high income, North?
Should its share be 1/2 * 1/3 * 1/3 * 1/5? (*)
This is very easy to compute, but maybe too strict, and perhaps not what you intend?
Just think of the case where the original data contain nobody who is female, over 51, high income and from the North.

On the other hand: if only the marginal fractions 1/2, 1/3, 1/3 and 1/5 have to be met, one could construct extreme samples with 0 rows for female, over 51, high income, North.
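Under the first reading (each cell gets the product of the four fractions), the arithmetic is short. A worked sketch for a total sample size n; n = 900 is a purely illustrative value, not from the thread:

```latex
\begin{aligned}
\tfrac{1}{2}\cdot\tfrac{1}{3}\cdot\tfrac{1}{3}\cdot\tfrac{1}{5} &= \tfrac{1}{90},
  & n_{\text{cell}} &= \tfrac{n}{90} \quad (n = 900 \Rightarrow n_{\text{cell}} = 10)\\
n_{\text{female}} &= (3\cdot 3\cdot 5)\,\tfrac{n}{90} = \tfrac{n}{2},
  & n_{\text{North}} &= (2\cdot 3\cdot 3)\,\tfrac{n}{90} = \tfrac{n}{5}
\end{aligned}
```

So every one of the 90 cells, including female, over 51, high income, North, needs at least n/90 rows in the source data for this design to be feasible.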

Samira
Level II


Thanks for your reply.

The idea is to conduct the research on a sample that is representative of the population of the whole city.

dlehman1
Level V


This is not elegant and I'm not even sure it will work, but it might.  You can create 0,1 columns for each of your criteria and then combine them into a single 0,1 column where 1 means that all of the individual criterion columns were =1.  That will give the desired subset.  And, if you want a random selection from such rows, just use a validation column stratified by that 0,1 column.  As I said, not elegant but might work.
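Criterion columns like these need categorical values. If the raw data store age in years rather than as a band, a formula column can map it onto the three quota bands first. A minimal sketch, assuming a numeric column named age_years (hypothetical; the demo table and its ages are made up for illustration), and treating 51 as part of the top band so that no age falls between the bands:

```jsl
// Hypothetical demo table: numeric column "age_years" with made-up ages
dt = New Table( "age demo",
	New Column( "age_years", Numeric, "Continuous",
		Values( [17, 18, 30, 31, 50, 51, 75] )
	)
);

// Map whole years onto the three quota bands; missing or under-18 ages get no band
dt << New Column( "age_band", Character, "Nominal",
	Formula(
		If(
			Is Missing( :age_years ), "",
			:age_years < 18, "",
			:age_years <= 30, "18-30",
			:age_years <= 50, "31-50",
			"51+"
		)
	)
);
dt << Run Formulas;
```

The same pattern (an If() chain in a formula column) works for mapping household income onto low/medium/high, once the income thresholds are defined.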

hogi
Level XII


Data table for the ideal case:

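// 100,000 simulated rows; every attribute is drawn independently and uniformly,
// so all 2 x 3 x 3 x 5 = 90 combinations are equally likely
New Table( "quota_samples",
	Add Rows( 100000 ),
	Compress File When Saved( 1 ),
	New Column( "gender",
		Character,
		Formula( Match( Floor( Random Uniform() * 2 ), 0, "M", "F" ) ),
		Compact()
	),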
	New Column( "income",
		Character,
		Formula(
			Match( Floor( Random Uniform() * 3 ), 0, "low", 1, "medium", "high" )
		),
		Compact()
	),
	New Column( "region",
		Character,
		Formula(
			Match( Floor( Random Uniform() * 5 ),
				0, "N",
				1, "S",
				2, "W",
				3, "E",
				"C"
			)
		)
	),
	New Column( "age",
		Character,
		Formula(
			Match( Floor( Random Uniform() * 3 ), 0, "young", 1, "medium", "old" )
		)
	)
)
hogi
Level XII


For such an ideal table (i.e. if the distribution of your data already matches the fractions you want), you can pick random samples from the full data set, random samples per variant, or a fixed number of samples per variant, or combine all three.
You will get subgroups with the requested fractions (1/2, 1/3, 1/3 and 1/5): approximately with the two random options, exactly with a fixed number per variant.


// random sampling from the full data set:
// random rank / number of rows; filter cum_prob <= p to keep a fraction p of all rows
If( Not( Current Data Table() << Has Column( "cum_prob" ) ),
	New Column( "cum_prob",
		Formula( Col Rank( Random Uniform() ) / Col Number( 1 ) )
	)
);

// random sampling per variant:
// same idea, but ranked and counted within each gender x income x region x age combination
If( Not( Current Data Table() << Has Column( "cum_prob_indiv" ) ),
	New Column( "cum_prob_indiv",
		Formula(
			tmp = Random Uniform(); // tmp = 1; // **)
			Col Rank( tmp, :gender, :income, :region, :age )
			/ Col Number( tmp, :gender, :income, :region, :age )
		)
	)
);

// fixed number of samples per variant: filter rank_indiv <= k to keep k rows of every
// combination; this forces the ratios 1/2, 1/3, 1/3, 1/5 exactly
// (as long as every combination has at least k rows)
If( Not( Current Data Table() << Has Column( "rank_indiv" ) ),
	New Column( "rank_indiv",
		Formula( Col Rank( Random Uniform(), :gender, :income, :region, :age ) )
	)
);

// bar charts of the four marginal distributions;
// use the local data filter to choose how many samples to keep
Graph Builder(
	Size( 518, 448 ),
	Show Control Panel( 0 ),
	Graph Spacing( 4 ),
	Variables( X( :gender ), X( :income ), X( :region ), X( :age ) ),
	Elements( Position( 1, 1 ), Bar( X, Summary Statistic( "N" ) ) ),
	Elements( Position( 2, 1 ), Bar( X, Summary Statistic( "N" ) ) ),
	Elements( Position( 3, 1 ), Bar( X, Summary Statistic( "N" ) ) ),
	Elements( Position( 4, 1 ), Bar( X, Summary Statistic( "N" ) ) ),
	Local Data Filter(
		Title( "How many samples do you want?" ),
		Add Filter( Columns( :cum_prob, :cum_prob_indiv, :rank_indiv ) )
	)
);

**) Instead of computing the CDFs from Random Uniform() values, one can randomize the row order and compute the CDFs of the constant "1" (tmp = 1).

hogi
Level XII


The last option (a fixed number of samples per variant) also works for less balanced tables like the one below.
The only limitation: if one of the variants has only a few rows, that sets a hard upper limit on the number of samples that can be selected per variant.

It follows a simple rule:
If the smallest variant (A) has just N rows, take all of them and pick the same number of random rows from each of the other variants. Strictly speaking, for variant A this is NOT "sampling".
So it may be better to pick just M << N random rows from each of the 90 variants.
You can use the same JSL logic; just set the rank threshold to M instead of N.

// all 2 x 3 x 3 x 5 = 90 combinations as strings such as "F young low N"
variants = {};
For Each( {gender}, {"F", "M"},
	For Each( {age}, {"young", "medium", "old"},
		For Each( {income}, {"low", "medium", "high"},
			For Each( {region}, {"N", "S", "E", "W", "C"},
				Insert Into( variants, Concat Items( {gender, age, income, region} ) )
			)
		)
	)
);

// "unfair" table: each row picks a combination via a normally distributed index,
// so the 90 variants get very different row counts;
// if the index is not valid, Try() falls back to "F young medium C"
Eval(
	Eval Expr(
		New Table( "unfair",
			Add Rows( 100000 ),
			New Column( "variant", Character,
				Formula(
					variants = As Constant( Expr( variants ) );
					Try( variants[Floor( Random Normal( 45, 30 ) )], "F young medium C" );
				)
			),
			New Column( "gender", Character, Formula( Word( 1, :variant ) ) ),
			New Column( "age", Character, Formula( Word( 2, :variant ) ) ),
			New Column( "income", Character, Formula( Word( 3, :variant ) ) ),
			New Column( "region", Character, Formula( Word( 4, :variant ) ) )
		)
	)
);
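The rule above (find the size N of the smallest variant, then take the same number M of random rows from every variant) can also be scripted instead of using the data filter. A minimal sketch, assuming the current data table has the columns gender, age, income and region, as in both simulated tables; the column names rand_u, cell_rank and cell_n, the target of 10 rows per variant and the output table name are hypothetical choices. Empty variants are not detected: a combination with no rows at all simply contributes nothing, so check first that all 90 combinations are present (e.g. with Tables > Summary).

```jsl
// Sketch (hypothetical names): draw the same number m of random rows from every
// gender x age x income x region variant of the current data table.
dt = Current Data Table();

// one frozen random number per row
dt << New Column( "rand_u", Numeric, "Continuous", Formula( Random Uniform() ) );
dt << Run Formulas;
Column( dt, "rand_u" ) << Delete Formula;

// random rank 1..n within each variant, and the size n of each variant
dt << New Column( "cell_rank", Numeric, "Continuous",
	Formula( Col Rank( :rand_u, :gender, :age, :income, :region ) )
);
dt << New Column( "cell_n", Numeric, "Continuous",
	Formula( Col Max( :cell_rank, :gender, :age, :income, :region ) )
);
dt << Run Formulas;

// the smallest variant (N rows) caps how many rows each variant can contribute
n_min = Min( Column( dt, "cell_n" ) << Get Values );
m = Min( 10, n_min ); // hypothetical target: M = 10 rows per variant, never more than N

// keep the m lowest-ranked (i.e. randomly chosen) rows of every variant
picked = dt << Get Rows Where( :cell_rank <= m );
quota_dt = dt << Subset( Rows( picked ), Output Table( "quota sample" ) );
```

Because every variant contributes exactly m rows, each variant makes up 1/90 of the subset, so the marginal fractions 1/2, 1/3, 1/3 and 1/5 hold exactly.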