How to Select a quota sample from a data set
Hello everyone,
I am working on a research project that requires a representative sample. I have already collected the data, but I need to select a subset that meets the following four quotas:
- gender: 50% males and 50% females.
- age: 1/3 from 18 to 30 years old, 1/3 from 31 to 50 years old and 1/3 over 51 years old.
- household income level: 1/3 low, 1/3 medium and 1/3 high.
- region: 1/5 of the study population from each of the five regions of the city (North, South, Centre, East and West).
How can I create this sample?
I am using JMP Pro 17.
Re: How to Select a quota sample from a data set
Sounds like a textbook exercise. Is there a chapter with the solution?
Re: How to Select a quota sample from a data set
Actually, I was discussing this issue with a colleague. We want to start the data analysis of a project that uses this quota sampling technique.
Re: How to Select a quota sample from a data set
This is a complex problem. How large is the data table you are pulling data from? What sample size are you pulling? You have 90 combinations of your 4 columns (2 x 3 x 3 x 5). Did you and your colleague come up with an idea on how to approach the problem?
Re: How to Select a quota sample from a data set
So you don't want to pick the sample according to the actual shares in the data, but according to the prescribed fractions?
What about a specific combination, e.g. female, > 51 yrs, high income, north?
Should its share be 1/2 * 1/3 * 1/3 * 1/5? (*)
This is very easy to compute, but it may be too strict, and perhaps not what is intended.
Just think of the case where the original data contains no row for female, > 51 yrs, high income, north.
On the other hand, if only the marginal fractions 1/2, 1/3, 1/3 and 1/5 have to be met, one could construct extreme cases with 0 sampled rows for female, > 51 yrs, high income, north.
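To make the strict reading marked (*) concrete, here is the arithmetic as a small worked example. The total sample size n = 900 is a made-up figure used only for illustration.

```latex
% Share of one combination (e.g. female, > 51 yrs, high income, north)
\frac{1}{2}\cdot\frac{1}{3}\cdot\frac{1}{3}\cdot\frac{1}{5} = \frac{1}{90}

% Rows per combination for a hypothetical total sample size n = 900
n_{\text{cell}} = \frac{n}{90} = \frac{900}{90} = 10

% Equal cells reproduce the marginal quotas, e.g. for gender and region:
% 3 * 3 * 5 = 45 combinations are female, 2 * 3 * 3 = 18 are north
\frac{45}{90} = \frac{1}{2}, \qquad \frac{18}{90} = \frac{1}{5}
```

So equal shares per combination always satisfy the four quotas, while the reverse does not hold: the four quotas alone do not fix the share of any single combination.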
Re: How to Select a quota sample from a data set
Thanks for your reply.
The idea is to conduct the research on a sample that is representative of the population living in the whole city.
Re: How to Select a quota sample from a data set
This is not elegant and I'm not even sure it will work, but it might. You can create 0,1 columns for each of your criteria and then combine them into a single 0,1 column where 1 means that all of the individual criterion columns were =1. That will give the desired subset. And, if you want a random selection from such rows, just use a validation column stratified by that 0,1 column. As I said, not elegant but might work.
Re: How to Select a quota sample from a data set
data table - ideal case:
New Table( "quota_samples",
    Add Rows( 100000 ),
    Compress File When Saved( 1 ),
    New Column( "gender",
        Character,
        Formula( Match( Floor( Random Uniform() * 2 ), 0, "M", "F" ) ),
        Compact(),
        Set Selected
    ),
    New Column( "income",
        Character,
        Formula(
            Match( Floor( Random Uniform() * 3 ), 0, "low", 1, "medium", "high" )
        ),
        Compact()
    ),
    New Column( "region",
        Character,
        Formula(
            Match( Floor( Random Uniform() * 5 ),
                0, "N",
                1, "S",
                2, "W",
                3, "E",
                "C"
            )
        )
    ),
    New Column( "age",
        Character,
        Formula(
            Match( Floor( Random Uniform() * 3 ), 0, "young", 1, "medium", "old" )
        )
    )
)
Re: How to Select a quota sample from a data set
For such an ideal table (i.e. if the proportions in your data set match the fractions you want), you can pick random samples from the full data set, random samples per variant, a specific number of samples per variant, or a combination of all three.
You will always get subgroups with the requested fractions (1/2, 1/3, 1/3 and 1/5).
// random sampling: full data set
If( Not( Current Data Table() << Has Column( "cum_prob" ) ),
    New Column( "cum_prob",
        Formula( Col Rank( Random Uniform() ) / Col Number( 1 ) )
    )
);
// random sampling: per variant
If( Not( Current Data Table() << Has Column( "cum_prob_indiv" ) ),
    New Column( "cum_prob_indiv",
        Formula(
            tmp = Random Uniform(); // tmp = 1; // **)
            Col Rank( tmp, :gender, :income, :region, :age ) /
            Col Number( tmp, :gender, :income, :region, :age )
        )
    )
);
// force ratios: 1/2, 1/3, 1/3, 1/5
If( Not( Current Data Table() << Has Column( "rank_indiv" ) ),
    New Column( "rank_indiv",
        Formula( Col Rank( Random Uniform(), :gender, :income, :region, :age ) )
    )
);
Graph Builder(
    Size( 518, 448 ),
    Show Control Panel( 0 ),
    Graph Spacing( 4 ),
    Variables( X( :gender ), X( :income ), X( :region ), X( :age ) ),
    Elements( Position( 1, 1 ), Bar( X, Summary Statistic( "N" ) ) ),
    Elements( Position( 2, 1 ), Bar( X, Summary Statistic( "N" ) ) ),
    Elements( Position( 3, 1 ), Bar( X, Summary Statistic( "N" ) ) ),
    Elements( Position( 4, 1 ), Bar( X, Summary Statistic( "N" ) ) ),
    Local Data Filter(
        Title( "how many samples do you want ? " ),
        Add Filter( columns( :cum_prob, :cum_prob_indiv, :rank_indiv ) )
    )
);
**) Instead of using CDFs with random uniform(), one can randomize the row order and use CDFs of "1".
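A minimal sketch of how the rank_indiv column could be turned into an actual subset table, instead of filtering interactively. It assumes the script above has already been run on the current data table. The per-variant count m = 50 and the output table name "quota_sample" are made-up values for illustration.

```jsl
dt = Current Data Table();
m = 50; // hypothetical number of rows to keep per variant

// rank_indiv numbers the rows 1, 2, 3, ... in random order within each
// gender x income x region x age combination, so this keeps m rows per variant
dt << Select Where( :rank_indiv <= m );

// copy the selected rows (all columns) into a new table
quotaSample = dt << Subset(
    Selected Rows( 1 ),
    Selected columns only( 0 ),
    Output Table( "quota_sample" )
);
```

If every variant has at least m rows, the result has m rows for each of the 90 variants, and therefore the marginal fractions 1/2, 1/3, 1/3 and 1/5.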
Re: How to Select a quota sample from a data set
The last option also works for less systematic tables like the one below.
The only limitation: if one of the variants has only a few rows, this puts a clear limit on the number of samples that can be selected.
It follows a simple rule:
If one of the variants (A) has just N rows, take all of them and pick the same number of random rows from each of the other variants. For variant A, this is actually NOT "sampling".
So it may be better to pick just M << N random rows from each of the 90 variants.
You can use the same (JSL) logic; just lower the threshold from N to M.
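To find the limit N described above, one could count the rows per variant and look at the smallest count. The following is a sketch under the assumption that the current data table has the columns :gender, :age, :income and :region; the variable names are made up for illustration.

```jsl
dt = Current Data Table();

// one row per combination that actually occurs, with its count in "N Rows"
summ = dt << Summary(
    Group( :gender, :age, :income, :region ),
    Freq( "None" ),
    Weight( "None" )
);

nVariants = N Rows( summ ); // fewer than 90 means at least one variant is empty
counts = Column( summ, "N Rows" ) << Get Values;
minN = Min( counts ); // size of the rarest variant that is present

Show( nVariants, minN );
```

If nVariants is 90, any M up to minN can be drawn from every variant. If it is below 90, at least one combination does not occur in the data, and equal shares per variant cannot be met at all.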
// build the list of all 90 variants as space-separated strings
variants = {};
For Each( {gender}, {"F", "M"},
    For Each( {age}, {"young", "medium", "old"},
        For Each( {income}, {"low", "medium", "high"},
            For Each( {region}, {"N", "S", "E", "W", "C"},
                Insert Into( variants, Concat Items( {gender, age, income, region} ) )
            )
        )
    )
);
Eval(
    Eval Expr(
        New Table( "unfair",
            Add Rows( 100000 ),
            New Column( "variant", Character,
                Formula(
                    variants = As Constant( Expr( variants ) );
                    // pick a variant with unequal probability;
                    // an index outside the list falls back to a fixed variant
                    Try(
                        variants[Floor( Random Normal( 45, 30 ) )],
                        "F young medium C"
                    );
                )
            ),
            New Column( "gender", Character, Formula( Word( 1, :variant ) ) ),
            New Column( "age", Character, Formula( Word( 2, :variant ) ) ),
            New Column( "income", Character, Formula( Word( 3, :variant ) ) ),
            New Column( "region", Character, Formula( Word( 4, :variant ) ) ),
        )
    )
);