
Samira · Dec 5, 2024 12:28 AM

Hello everyone

I am working on a research project that requires a representative sample. I have already collected the data, but I need to select a subset that meets four quota criteria:
- gender: 50% males and 50% females.
- age: 1/3 from 18 to 30 years old, 1/3 from 31 to 50 years old, and 1/3 aged 51 and over.
- household income level: 1/3 low, 1/3 medium, and 1/3 high.
- region: 1/5 of the study population from each of the five regions of the city (North, South, Centre, East, and West).

How can I create this sample?

I am using JMP Pro 17.

Samira · Dec 7, 2024 01:18 AM

Many thanks for your efforts

Your suggestion gave me a good idea: to explore my dataset first using the Local Data Filter in Graph Builder. For my data, it seems that for some combinations (such as middle-aged males living in the Centre region with IM income) the sample size is 0.
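
The same check can also be scripted. The following is a minimal sketch, assuming the four quota variables are stored in columns named gender, age, income, and region (the names used in the simulated table in this thread; real column names may differ). It tabulates how many rows fall into each observed combination:

```jsl
Names Default To Here( 1 );
dt = Current Data Table();

// One row per combination that occurs in the data;
// the N Rows column of the summary table holds the count.
dtSum = dt << Summary(
	Group( :gender, :age, :income, :region ),
	Freq( "None" ),
	Weight( "None" )
);

// 2 x 3 x 3 x 5 = 90 combinations are possible.
// Fewer rows in the summary table means some combinations are empty.
Show( N Rows( dtSum ) );
```

Sorting the summary table by N Rows shows which combinations are thin as well as which are missing.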

The whole idea of creating a representative sample this way now seems impractical. The sampling technique that should have been used from the beginning is one of the probability sampling techniques, maybe stratified random sampling using only the 5 regions as strata. Then a random sample is collected from each stratum.
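
A minimal sketch of stratified random sampling by region, assuming a column named region and a hypothetical sample size of 20 rows per stratum. The helper columns rnd and rankInRegion are illustrative names, not part of any existing table:

```jsl
Names Default To Here( 1 );
dt = Current Data Table();
nPerRegion = 20; // hypothetical number of rows to draw per region

// Random key per row, stored as static values
dt << New Column( "rnd", Numeric, Continuous,
	Set Each Value( Random Uniform() )
);

// Rank of the random key within each region (1 = smallest key)
dt << New Column( "rankInRegion", Numeric, Continuous,
	Formula( Col Rank( :rnd, :region ) )
);
dt << Run Formulas;

// Keep the nPerRegion rows with the smallest keys in each region
dt << Select Where( :rankInRegion <= nPerRegion );
dtSample = dt << Subset(
	Selected Rows( 1 ),
	Selected Columns Only( 0 ),
	Output Table( "Stratified sample" )
);
```

If a region has fewer than nPerRegion rows, all of its rows end up in the sample, so the per-region counts are worth checking afterwards.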

txnelson · Dec 6, 2024 10:33 AM

I spent a good deal of time pondering this question and did not come up with a good answer. I did generate a sample data table similar to @hogi's. Below is my modification of @hogi's JSL. I think it is easier to read and, in the case of age, it better represents the distribution as stated by @Samira.

New Table( "Samp",
	Add Rows( 100000 ),
	Compress File When Saved( 1 ),
	New Column( "gender",
		Character,
		Set Each Value( Match( Random Integer( 1, 2 ), 1, "Male", "Female" ) ),
		Compact()
	),
	New Column( "income",
		Character,
		Set Each Value(
			Match( Random Integer( 1, 3 ), 1, "Low", 2, "Medium", "High" )
		),
		Compact()
	),
	New Column( "region",
		Character,
		Set Each Value(
			Match( Random Integer( 1, 5 ),
				1, "North",
				2, "South",
				3, "West",
				4, "East",
				"Centre"
			)
		)
	),
	New Column( "age",
		Character,
		Set Each Value(
			temp = Random Integer( 18, 80 );
			If(
				temp <= 30, "Young",
				temp <= 50, "Middle Age",
				"Old"
			);
		)
	)
)

Jim

Samira · Dec 7, 2024 01:20 AM

Many thanks for your time and efforts

The whole idea of creating a representative sample this way now seems impractical. The sampling technique that should have been used from the beginning is one of the probability sampling techniques, maybe stratified random sampling using only the 5 regions as strata. Then a random sample is collected from each stratum.

Samira · Dec 7, 2024 01:23 AM

Thanks for your reply

I think you are right,

The whole idea of creating a representative sample this way now seems impractical. The sampling technique that should have been used from the beginning is one of the probability sampling techniques, maybe stratified random sampling using only the 5 regions as strata. Then a random sample is collected from each stratum.

txnelson · Dec 7, 2024 01:28 AM

That sounds like a reasonable approach

Jim

hogi · Dec 7, 2024 7:55 AM

So, what you asked for in the original post is disproportionate stratified random sampling, right?

And the strata are the 90 subgroups of gender x age x income x region.
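
To spell out the count: the strata come from 2 genders, 3 age groups, 3 income levels, and 5 regions. One allocation that meets all four quotas at once (an illustrative assumption, not the only possible one) draws the same number of rows, n/90, from every stratum, where n is a hypothetical total sample size:

```latex
\[
2 \times 3 \times 3 \times 5 = 90 \text{ strata}
\]
\[
\underbrace{45 \cdot \tfrac{n}{90}}_{\text{each gender}} = \tfrac{n}{2},
\qquad
\underbrace{30 \cdot \tfrac{n}{90}}_{\text{each age or income group}} = \tfrac{n}{3},
\qquad
\underbrace{18 \cdot \tfrac{n}{90}}_{\text{each region}} = \tfrac{n}{5}
\]
```

This allocation only works if every one of the 90 strata contains at least n/90 rows.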

Samira · Dec 8, 2024 04:24 AM

After carefully reviewing different sampling techniques in textbooks, I found that my original idea did not fit any of them. To get a representative sample, you should choose one of the probability sampling techniques (which give each member of the study population an equal chance of being included in the sample). These techniques include four types:

–Simple random sampling
–Systematic sampling
–Cluster sampling
–Stratified sampling

That's why I think stratified sampling could have been used to get the study sample from the beginning (using only the 5 regions as strata, then collecting a random sample from each stratum). The stratified, cluster, and quota sampling techniques require classifying the population into mutually exclusive groups, which is not the situation in my question. That's why I think my original question was not correct in the first place.


hogi · Dec 8, 2024 1:50 AM

Hm, when we compare this case with DOE...

In a nicely designed DOE one doesn't even pick a (single) sample from each of the 90 subsets.

[1 sample from each of the 90 subsets: "full-factorial" -> way too "expensive"]

So, maybe together with some simplification of the model, here it's also OK to use a data set with missing entries in some intersection points? And a step further, maybe by intentionally removing (some data points from) some intersection points, one can get something useful between stratified input data and a balanced DOE?

@Victor_G might have a suggestion?

hogi · Dec 8, 2024 4:06 AM

hogi · Dec 8, 2024 09:07 AM

For the approach "random sampling - per variant" from
https://community.jmp.com/t5/Discussions/How-to-Select-a-quota-sample-from-a-data-set/m-p/821078/hig...

I created a JSL snippet to stratify arbitrary data - just select the columns and click OK.

librecall can be downloaded here: Recall Function Library

Names Default To Here( 1 );
verbose=0;
//Include( ".\libRecall_v2.jsl" );

objects = {"si^myCols", "s^Ncolumns", "s^Nfolds"};
values = {{}, {1}, {5}};
Try( // issue with projects
	librecall:genArrays( objects, values, "Stratify", verbose )
);

dt = Current Data Table();

nw = New Window( "K Fold Creator",
	<<Type( "Modal Dialog" ),
	<<Return Result,
	<<On Validate(
		If( N Items( myCols << get items() ),
			1,
			Caption( "please select a column" );
			0;
		)
	),
	V List Box(
		Lineup Box( N Col( 2 ),
			Panel Box( "", fcs = Filter Col Selector() ),
			Panel Box( "",
				Lineup Box( N Col( 2 ),spacing( 3 ),
					Button Box( "stratify by", myCols << append( fcs << get selected ) ),
					myCols = Col List Box( width( 200 ), min items( 1 ), nlines( 11 ) )
				), 
				
				Lineup Box( N Col( 2 ), spacing( 3 ),
					Text Box( "create more than 1 column?" ),
					Ncolumns = Number Edit Box(
						1,
						4,
						<<setintegeronly( 1 ),
						<<setminimum( 1 ), 

					),
					Text Box( "Number of Folds (K)" ),
					Nfolds = Number Edit Box( 5, 4, <<setintegeronly( 1 ), <<setminimum( 2 ) )
				), 

			)
		), 
		
		H List Box(
			Button Box( "OK",
				librecall:storeRoles( "Stratify", verbose );
				// the modal dialog stores the selected columns.
				For Each( {item}, 1 :: N Items( myCols << Get Items() ), myCols << Set Selected( item, 1 ) );
			),
			Button Box( "recall", librecall:recallRoles( "Stratify", verbose ) ),
			Button Box( "clear", librecall:resetRoles( "Stratify", verbose ) ),
			Button Box( "cancel" )
		)
	)
);

If( Not( nw["button"] == 1 ),
	Stop()	
);


myCols = Transform Each( {col}, nw["myCols"], Name Expr( As Column( col ) ) );
// remove continuous and exotic values.
myCols = Filter each({col}, myCols, col << Get Modeling Type == "Nominal" | col << Get Modeling Type == "Ordinal" );

For( i = 1, i <= nw["Ncolumns"], i++, 

	rankExpr = Expr(
		Col Rank( tmp, Excluded() )
	);
	For Each( {col}, myCols, Insert Into( rankExpr, Name Expr( col ) ) );
	numberExpr = Substitute( Name Expr( rankExpr ), Expr( Col Rank() ), Expr( Col Number() ) );

	Eval(
		Substitute(
				Expr(
					New Column( "Fold",
						Formula(
							If( Excluded(),
								.,
								tmp = Random Uniform();
								Floor( (_rank_ - 1) / (_number_) * _folds_ ) + 1;
							)
						)
					)
				),
			Expr( _rank_ ), Name Expr( rankExpr ),
			Expr( _number_ ), Name Expr( numberExpr ),
			Expr( _folds_ ), nw["Nfolds"]
		)
	);
);
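
To see what the generated Fold formula does within a single stratum, the following sketch evaluates the same expression for a hypothetical group of 10 rows and 5 folds. In the script above, the rank comes from Col Rank applied to a random number and the group size comes from Col Number; here both are plain variables chosen for illustration:

```jsl
Names Default To Here( 1 );
number = 10; // hypothetical group size (Col Number in the script above)
folds = 5;   // hypothetical K

// Same arithmetic as the Fold formula, applied to ranks 1..number
For( rank = 1, rank <= number, rank++,
	fold = Floor( (rank - 1) / number * folds ) + 1;
	Show( rank, fold );
);
```

Ranks 1 to 10 map to folds 1, 1, 2, 2, 3, 3, 4, 4, 5, 5. Because the ranks are a random permutation within each group, every group is split into folds of nearly equal size, so keeping only the rows of one fold would give roughly 1/K of each stratum.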

How to Select a quota sample from a data set
