Discussions

Samira · Dec 5, 2024 12:28 AM

Hello everyone

I am working on a research project that requires a representative sample. I have already collected the data, but I need to select a subset that meets the following four quotas:
- Gender: 50% male and 50% female.
- Age: 1/3 from 18 to 30 years old, 1/3 from 31 to 50 years old, and 1/3 aged 51 and over.
- Household income level: 1/3 low, 1/3 medium, and 1/3 high.
- Region: 1/5 of the study population from each of the five regions of the city (North, South, Centre, East and West).

How can I create this sample?

I am using JMP Pro 17.

Victor_G · Dec 9, 2024 1:05 AM

Hi @hogi and @Samira,

I read through the post and the answers quickly, and I'm still not 100% clear about your objective, since a "representative" dataset can mean different things:

Do you want to better understand and model a response/phenomenon based on demographic data (age, gender, location, etc.), or do you want a "representative" sample in order to make inferences and generalize to the population?

In the first case, because you want to link the response/phenomenon to specific attributes, it's best to have a balanced dataset, with balanced levels for age, gender, location, etc., which makes modeling and interpreting the results easier.
A DoE approach would help you investigate and analyze the different demographic factors as independently and individually as possible: either build a Custom Design "from scratch" and find the people who match the required attribute combinations, or build a D-optimal design from your collected demographic data using a Candidate Set approach.
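To make the "balanced dataset" idea concrete, here is a minimal Python sketch (not JSL, and not a D-optimal design). It simply draws the same number of respondents from every combination of gender, age group, income and region, which automatically gives the 50/50, 1/3 and 1/5 margins from the original question. The column names, level labels and the `balanced_sample` helper are my own assumptions for illustration; they would have to be adapted to the real data table.

```python
import random
from collections import defaultdict
from itertools import product

# Hypothetical column names and level labels; adapt them to the real data.
FACTORS = {
    "gender": ["male", "female"],
    "age_group": ["18-30", "31-50", "51+"],
    "income": ["low", "medium", "high"],
    "region": ["North", "South", "Centre", "East", "West"],
}


def balanced_sample(rows, per_cell, seed=0):
    """Draw `per_cell` rows at random from each combination of factor levels."""
    rng = random.Random(seed)

    # Group the collected respondents by their combination of levels.
    cells = defaultdict(list)
    for row in rows:
        cells[tuple(row[f] for f in FACTORS)].append(row)

    sample = []
    for combo in product(*FACTORS.values()):
        pool = cells.get(combo, [])
        if len(pool) < per_cell:
            # The collected data cannot fill this cell; a smaller per_cell
            # or a different selection method would be needed.
            raise ValueError(f"cell {combo} has only {len(pool)} respondents")
        sample.extend(rng.sample(pool, per_cell))
    return sample
```

With 2 x 3 x 3 x 5 = 90 cells, the sample size is 90 times `per_cell`. If some cells are too sparse in the collected data, this simple scheme fails, and that is where a candidate-set design becomes more attractive.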
In the second case, because you want to generalize the results from the sample to the population, you need a sample that is representative of your population (that is, as "biased" as the population). So you may not end up with 50/50 male/female: if the population is split 40/60, for example, the sample needs to respect that demographic specificity.
A stratified approach based on the demographic data you have collected for your 5 regions might be helpful. You may have to consider two aspects: sizing each region's sample in proportion to its population (the sampling fraction should be the same in every region, so a densely populated region contributes more people to the sample), and stratifying on the demographic data (gender, age, etc.).
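As a rough illustration of the first aspect (proportional allocation across regions), here is a small Python sketch. The `proportional_allocation` helper is hypothetical, and the population figures in the usage example are made up purely to show the arithmetic; real census figures for the five regions would be needed.

```python
def proportional_allocation(population, n):
    """Split a total sample size n across strata in proportion to population.

    Uses largest-remainder rounding so the allocations always sum to n.
    """
    total = sum(population.values())
    # Integer part of each stratum's exact share n * N_h / N.
    alloc = {k: (n * v) // total for k, v in population.items()}
    # What is lost by rounding down, used to hand out the leftover units.
    remainder = {k: (n * v) % total for k, v in population.items()}
    leftover = n - sum(alloc.values())
    for k in sorted(remainder, key=remainder.get, reverse=True)[:leftover]:
        alloc[k] += 1
    return alloc


# Made-up region populations, for illustration only.
regions = {"North": 120_000, "South": 80_000, "Centre": 200_000,
           "East": 60_000, "West": 40_000}
print(proportional_allocation(regions, 300))
# {'North': 72, 'South': 48, 'Centre': 120, 'East': 36, 'West': 24}
```

The same function could be applied again within each region to split its allocation by gender, age group or income level, provided the corresponding population counts are available.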

In both cases, I would recommend gathering as much data as possible on the populations of your 5 regions, to better understand and inform your sample creation method.
On a side note, you can also find literature on the segmentation of demographic data. I'm not sure the age segmentation proposed here is really helpful, because the age ranges have different widths: 18-30 (12 years), 31-50 (19 years), 51+ (20+ years?). Also, how do you define low, medium and high household income levels? Based on an analysis, or on predefined thresholds/criteria?
Some examples of demographic segmentation: https://xperiencify.com/what-is-demographic-segmentation/

I hope this answer provides some ideas,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

How to Select a quota sample from a data set
