Discussions

Samira · Dec 5, 2024 12:28 AM

Hello everyone

I am working on a research project that requires a representative sample. I have already collected the data, but I need to select a subset that meets four quota criteria:
- gender: 50% males and 50% females.
- age: 1/3 from 18 to 30 years old, 1/3 from 31 to 50 years old and 1/3 over 51 years old.
- household income level: 1/3 low, 1/3 medium and 1/3 high.
- region: 1/5 of the study population from each of the five regions of the city (North, South, Centre, East and West).

How can I create this sample?

I am using JMP Pro 17.

hogi · Dec 5, 2024 08:59 AM

sounds like a textbook exercise - is there a chapter with the solution?

Samira · Dec 6, 2024 12:22 AM

No.
I was actually discussing this issue with a colleague because we want to start the data analysis of a project that uses this quota sampling technique.

txnelson · Dec 6, 2024 01:34 AM

This is a complex problem. How large is the data table you are pulling data from? What size of sample are you pulling? You have 90 combinations of your 4 columns. Did you and your colleague come up with an idea of how to approach the problem?

Jim

hogi · Dec 6, 2024 03:25 AM

And you don't want to pick the sample data by the actual share of the distribution, but by the artificial fraction?

How about female, > 51yrs, high income, north.
Should it be 1/2 * 1/3 * 1/3 * 1/5? (*)
This is very easy to compute - but maybe too strict - and not intended?
Just think of the case where there is no female, > 51yrs, high income, north in the original distribution.

On the other hand: If just the 1/2, 1/3, 1/3 and 1/5 have to be fulfilled, one could make up extreme cases with 0 sample data for female, > 51yrs, high income, north.
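
To make the two readings concrete, here is the arithmetic written out as a rough sketch. The total sample size n is a symbol introduced here for illustration only; it does not come from the thread.

```latex
\begin{align*}
\text{number of variants} &= 2 \times 3 \times 3 \times 5 = 90 \\
\text{share of (female, } > 51 \text{ yrs, high income, north)} &= \tfrac{1}{2} \cdot \tfrac{1}{3} \cdot \tfrac{1}{3} \cdot \tfrac{1}{5} = \tfrac{1}{90} \\
\text{rows per variant under the strict reading} &= \frac{n}{90}
\end{align*}
```

Under the strict reading every one of the 90 variants gets n/90 rows, which automatically gives the 1/2, 1/3, 1/3 and 1/5 margins. Under the loose reading only the margins are fixed (for example n/2 females in total), so an individual variant can end up anywhere from 0 rows upward.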

Samira · Dec 7, 2024 12:41 AM

Thanks for your reply

The idea is to conduct the research on a sample that is representative of the population living in the whole city.

dlehman1 · Dec 6, 2024 08:07 AM

This is not elegant and I'm not even sure it will work, but it might. You can create 0,1 columns for each of your criteria and then combine them into a single 0,1 column where 1 means that all of the individual criterion columns were =1. That will give the desired subset. And, if you want a random selection from such rows, just use a validation column stratified by that 0,1 column. As I said, not elegant but might work.

hogi · Dec 6, 2024 09:38 AM

data table - ideal case:

New Table( "quota_samples",
	Add Rows( 100000 ),
	Compress File When Saved( 1 ),
	New Column( "gender",
		Character,
		Formula( Match( Floor( Random Uniform() * 2 ), 0, "M", "F" ) ),
		Compact(),
		Set Selected
	),
	New Column( "income",
		Character,
		Formula(
			Match( Floor( Random Uniform() * 3 ), 0, "low", 1, "medium", "high" )
		),
		Compact()
	),
	New Column( "region",
		Character,
		Formula(
			Match( Floor( Random Uniform() * 5 ),
				0, "N",
				1, "S",
				2, "W",
				3, "E",
				"C"
			)
		)
	),
	New Column( "age",
		Character,
		Formula(
			Match( Floor( Random Uniform() * 3 ), 0, "young", 1, "medium", "old" )
		)
	)
)

hogi · Dec 8, 2024 4:11 AM

For such an ideal table (that is, if the probabilities in your data set match the fractions you want), you can pick random samples from the full data set, random samples per variant, or a specific number of samples per variant, or combine all three.
You will always get subgroups with the requested fractions (1/2, 1/3, 1/3 and 1/5).

// random sampling : full data set
if(not(current data table() << has column ("cum_prob")),New Column( "cum_prob",
	Formula(
		Col Rank( random uniform()) / (
		Col Number( 1 ))
	)
));

// random sampling : per variant
if(not(current data table() << has column ("cum_prob_indiv")),New Column( "cum_prob_indiv",
	Formula(
	tmp = random uniform(); // tmp =1; // **)
		Col Rank( tmp, :gender, :income, :region, :age ) / (
		Col Number( tmp, :gender, :income, :region, :age ))
	)
));

// force ratios: 1/2, 1/3, 1/3, 1/5
if(not(current data table() << has column ("rank_indiv")),
New Column( "rank_indiv",
	Formula( Col Rank( random uniform(), :gender, :income, :region, :age ) )
));

Graph Builder(
	Size( 518, 448 ),
	Show Control Panel( 0 ),
	Graph Spacing( 4 ),
	Variables( X( :gender ), X( :income ), X( :region ), X( :age ) ),
	Elements( Position( 1, 1 ), Bar( X,  Summary Statistic( "N" ) ) ),
	Elements( Position( 2, 1 ), Bar( X,  Summary Statistic( "N" ) ) ),
	Elements( Position( 3, 1 ), Bar( X,  Summary Statistic( "N" ) ) ),
	Elements( Position( 4, 1 ), Bar( X,  Summary Statistic( "N" ) ) ),
	Local Data Filter(
        Title( "how many samples do you want ? " ),
		Add Filter(
			columns( :cum_prob, :cum_prob_indiv, :rank_indiv )
		)
	)
);

**) Instead of using CDFs with random uniform(), one can randomize the row order and use CDFs of "1".

hogi · Dec 6, 2024 1:12 PM

The last option also works for less systematic tables like the one below.
The only limitation: if one of the variants has only a few rows, that sets a clear limit on the number of samples that can be selected per variant.

It follows the simple rule:
If for one of the variants (A), there are just N samples, take those and pick the same number of random samples from the other variants. Actually, for variant A, this is NOT "sampling".
So, maybe pick just M << N random samples from each of the 90 variants.
You can take the same (JSL) logic - just adjust N to a lower value.
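
A minimal JSL sketch of that rule, under some assumptions: the four columns are named as in the example tables, and the column name `rank_pick`, the value 50 for M and the output table name `quota_sample` are placeholders made up for illustration. It is not the script shown in the video, just one way the rule could be written down.

```jsl
dt = Current Data Table();

// Count the rows in each variant that is present in the table.
// Variants with zero rows do not show up in the result at all.
Summarize( dt, groups = By( :gender, :age, :income, :region ), counts = Count );
nVariants = N Rows( counts ); // 90 if every combination occurs
nMin = Min( counts );         // size of the smallest variant (the "N" above)
Show( nVariants, nMin );

// Placeholder: rows to draw per variant; it cannot exceed nMin.
M = Min( 50, nMin );

// Random rank within each variant, then freeze the values so they
// are not re-drawn when the formula is evaluated again.
col = dt << New Column( "rank_pick", Numeric,
	Formula( Col Rank( Random Uniform(), :gender, :income, :region, :age ) )
);
dt << Run Formulas;
col << Delete Formula;

// Keep the M rows with the lowest random rank in every variant.
dt << Select Where( :rank_pick <= M );
dtQuota = dt << Subset(
	Selected Rows( 1 ),
	Selected columns only( 0 ),
	Output Table( "quota_sample" )
);
```

If `nVariants` comes out below 90, at least one combination is missing from the data, and equal counts per variant can no longer reproduce the quotas exactly.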

variants = {};
For Each( {gender}, {"F", "M"},
	For Each( {age}, {"young", "medium", "old"},
		For Each( {income}, {"low", "medium", "high"},
			For Each( {region}, {"N", "S", "E", "W", "C"},
				Insert Into( variants, Concat Items( {gender, age, income, region} ) )
			)
		)
	)
);

Eval(
	Eval Expr(
		New Table( "unfair",
			Add Rows( 100000 ),
			New Column( "variant", Character,
				Formula(
					variants = As Constant( Expr( variants ) );
					Try(
						variants[Floor( Random Normal( 45, 30 ) )],
						"F young medium C"
					);
				)
			),
			New Column( "gender", Character, Formula( Word( 1, :variant ) ) ),
			New Column( "age", Character, Formula( Word( 2, :variant ) ) ),
			New Column( "income", Character, Formula( Word( 3, :variant ) ) ),
			New Column( "region", Character, Formula( Word( 4, :variant ) ) )
		)
	)
);

How to Select a quota sample from a data set