- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Sub-Sample randomly from a column
Hello all, - I have a population of 200 stores across the country. I'm interested to know if I can sub-sample a fraction of these stores and approximate the distribution of events. Might there be some Monte Carlo method for filling e.g., 30 rows in a new column with a random sub-sample from the original 200.
Then I could compare the distribution of the 30 counts in the sub-sample to the distribution in the population of 200.
Repeat this say 100 times to determine the risk of sampling 30 stores to represent the population of 200.
Thoughts? thanks, Paul
First 10 rows of the 200
Store# - column 1
Even count - column 2
19 20 21 22 23 25 26 27 29 34 | 1 4 7 3 4 3 12 5 8 8 |
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Sub-Sample randomly from a column
I quick way to do a subsample using a column formula is the use the Col Shuffle() function. The function returns a random reordering of row numbers. If you choose rows where the values of Col Shuffle() are less than than or equal to the subsample size (n_sub), this equivalent to doing a random ordering of the column and selecting the first n_sub rows.
For example, adding new column with this formula will do this for a subsample size of 30.
example table attached.
If you want to do this repeatedly, then learning some JMP Scripting would make this easier. A basic script to add columns like this repeatedly is
Names Default to Here( 1 );
Number_of_Cols_to_Add = 100;
For( i = 1, i <= Number_of_Cols_to_Add, i++,
Current Data Table() <<
New Column( "Y Subsample (Size 30)",
Numeric,
"Continuous",
Format( "Best", 12 ),
Formula( If( Col Shuffle() <= 30, :Y, . ) )
)
);
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Sub-Sample randomly from a column
Take a look at the Resample Freq() function.
Resample Freq
Generates a random selection with replacement frequency counts, suitable for use in bootstrapping. For example, it supports a second Freq Column argument, enabling it to do bootstrap samples relating to a pre-existing frequency column specified in the second argument. Resample Freq() generates a 100% resample. ResampleFreq(rate) generates a rate frequency sample. Resample(rate, column) generates a sample that is calculated by the rate multiplied by the sum of the specified column. For more information about the syntax for the Resample Freq function, see Discrete Probability Functions in the JSL Syntax Reference.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Sub-Sample randomly from a column
Hello Jeff, thank you for the reply. I am not understanding how to use the function for my purpose, unfortunately.
- Resample Freq creates counts of re-samples. This seems it would be an intermediate step to then creating a new table with the sub-sampled data. I am hoping to directly create a column of sub-sampled data from the original population of 200 rows.
- I would prefer to sample without replacement, not bootstrap.
If I only needed the original data sub-sampled once, the easiest path would be Table > Subset > Random Sample Size = 30 (e.g.)
I don't script, otherwise I would create a loop to produce 100 tables of 30 random samples, then concatenate the 100 tables into a master table.
Is there a way to directly grab 30 random samples from a column, and put those 30 values into a new column?
Ultimately, I hope to try the Reliability > Life distribution > Compare Groups to assess the distribution of the subsamples vs. original population distribution.
- Then repeat for Sample Size = 20, then 40, etc. so I can assess the minimum sample size that gives a similar distribution to the population of 200 stores.
Thanks, Paul
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Sub-Sample randomly from a column
Here is an example script that creates subsets of a given size without replacement
Names Default To Here( 1 );
dt =
// Open Data Table: semiconductor capability.jmp
// → Data Table( "semiconductor capability" )
Open( "$SAMPLE_DATA/semiconductor capability.jmp" );
// New formula column: Row
Data Table( "semiconductor capability" ) << New Column( "Row",
Numeric,
"Continuous",
Format( "Best", 12 ),
Formula( Row() )
) << Move Selected Columns( {:Row}, after( :SITE ) );
// Loop through data creating sample data tables
sampleSize = 20;
For( i = 1, i <= 20, i++,
dtSample = Data Table( "semiconductor capability" ) << Subset(
Sample Size( sampleSize ),
Selected columns only( 0 ),
output table( "Sample " || Char( i ) )
);
matSelected = dtSample:Row << get values;
// Delete row column from sample data
dtSample << delete columns( Row );
// Delete selected rows from main table
dt << select rows( matSelected );
dt << delete rows;
);
// Delete row column from original data table
dt << delete columns( Row );
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Sub-Sample randomly from a column
I quick way to do a subsample using a column formula is the use the Col Shuffle() function. The function returns a random reordering of row numbers. If you choose rows where the values of Col Shuffle() are less than than or equal to the subsample size (n_sub), this equivalent to doing a random ordering of the column and selecting the first n_sub rows.
For example, adding new column with this formula will do this for a subsample size of 30.
example table attached.
If you want to do this repeatedly, then learning some JMP Scripting would make this easier. A basic script to add columns like this repeatedly is
Names Default to Here( 1 );
Number_of_Cols_to_Add = 100;
For( i = 1, i <= Number_of_Cols_to_Add, i++,
Current Data Table() <<
New Column( "Y Subsample (Size 30)",
Numeric,
"Continuous",
Format( "Best", 12 ),
Formula( If( Col Shuffle() <= 30, :Y, . ) )
)
);
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Sub-Sample randomly from a column
Thank you, Sam, and to everyone who posted solutions.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Sub-Sample randomly from a column
I understand that you are not scripting your solution. You can also interactively use Tables > Subset to create a new smaller table with randomly selected rows from the original data table. Would that way work for you?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Sub-Sample randomly from a column
Hi @paulp ,
the other non-scripting solution that can be useful in such applications is using JMP query builder under the Tables menu.
there you can find a random sampling tab as in the picture
As usual one you run it interactively you can make use of the script created for additional usage.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Sub-Sample randomly from a column
thanks, Ron