Samira
Level II

How to Select a quota sample from a data set

Hello everyone

 

I am working on a research project that requires a representative sample. I have already collected the data, but I need to select a subset that meets the following four quotas:

- Gender: 50% males and 50% females.
- Age: 1/3 from 18 to 30 years old, 1/3 from 31 to 50 years old, and 1/3 aged 51 and over.
- Household income level: 1/3 low, 1/3 medium, and 1/3 high.
- Region: 1/5 of the study population from each of the five regions of the city (North, South, Centre, East and West).

 

How can I create this sample?

I am using JMP Pro 17.

20 REPLIES
Victor_G
Super User

Re: How to Select a quota sample from a data set

Hi @hogi and @Samira,

 

I just read through the post and the answers quickly, and I'm still not 100% clear about your objective, as a "representative" dataset may have different implications:

Do you want to better understand and model a response/phenomenon based on demographic data (age, gender, location, etc.), or do you want a "representative" sample in order to make inferences and generalize to the population?

  • In the first case, because you want to link the response/phenomenon to specific attributes, it's best to have a balanced dataset, with balanced levels for age, gender, location, etc., which makes modeling and interpreting the results easier.
    A DoE approach would help you investigate and analyze the different demographic factors as independently and individually as possible: either build a Custom design "from scratch" and find the right person for each combination of attributes, or build a D-Optimal design using a Candidate Set approach on the collected demographic data.
  • In the second case, because you want to generalize the results from the sample to the population, you need a sample that is representative of your population (as "biased" as the population is). So you may not have 50/50 male/female: if the population is split 40/60, for example, the sample needs to respect that demographic specificity.
    A stratified approach on the demographic data you have collected for your 5 regions might be helpful. You may have to consider two aspects: building the regional samples based on region population (make sure the ratio of sample size to region population is the same for every region, so a highly populated region contributes more people to the sample dataset), and based on the demographic data (gender, age, etc.).
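
To make the second case a bit more concrete, here is a minimal Python sketch of proportional stratified sampling by region. It is only an illustration of the idea, not a JMP/JSL recipe: the function names (`proportional_allocation`, `stratified_sample`), the region population figures and the synthetic respondent list are all assumptions made up for the example.

```python
import random
from collections import Counter, defaultdict


def proportional_allocation(population_sizes, n_sample):
    """Split n_sample across strata in proportion to their population size."""
    total = sum(population_sizes.values())
    exact = {s: n_sample * size / total for s, size in population_sizes.items()}
    alloc = {s: int(q) for s, q in exact.items()}  # round down first
    leftover = n_sample - sum(alloc.values())
    # Largest-remainder rounding: give leftover units to the biggest fractions.
    by_fraction = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for s in by_fraction[:leftover]:
        alloc[s] += 1
    return alloc


def stratified_sample(respondents, stratum_of, population_sizes, n_sample, seed=0):
    """Randomly draw the allocated number of respondents from each stratum."""
    rng = random.Random(seed)
    pools = defaultdict(list)
    for r in respondents:
        pools[stratum_of(r)].append(r)
    sample = []
    for stratum, k in proportional_allocation(population_sizes, n_sample).items():
        if len(pools[stratum]) < k:
            raise ValueError(f"not enough respondents in stratum {stratum!r}")
        sample.extend(rng.sample(pools[stratum], k))
    return sample


# Hypothetical region populations (made-up numbers, not from this thread).
region_population = {"North": 40000, "South": 25000, "Centre": 15000,
                     "East": 12000, "West": 8000}

# Synthetic stand-in for the collected data: 60 respondents per region.
respondents = [{"id": f"{region}-{i}", "region": region}
               for region in region_population for i in range(60)]

alloc = proportional_allocation(region_population, 100)
sample = stratified_sample(respondents, lambda r: r["region"], region_population, 100)
counts = Counter(r["region"] for r in sample)
print(alloc)
print(dict(counts))
```

The same functions could be reused with finer strata, for example tuple keys such as (region, gender), provided population counts are available for each of those cells and enough respondents were collected in each one.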

In both cases, I would recommend gathering as much data as possible on the populations of your 5 regions, to better understand and inform your sample creation method.
On a side note, you can also find literature on the segmentation of demographic data. I'm not sure the age segmentation proposed here is really helpful, because the age ranges have different widths: 18-30 (12 years), 31-50 (19 years), 51+ (20+ years?). Also, how do you define low, medium and high household income levels? Based on analysis, or on predefined thresholds/criteria?
Some examples of demographic segmentation: https://xperiencify.com/what-is-demographic-segmentation/

 

I hope this answer provides some ideas,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)