cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Choose Language Hide Translation Bar
paulp
Level III

Sub-Sample randomly from a column

Hello all, - I have a population of 200 stores across the country.  I'm interested to know if I can sub-sample a fraction of these stores and approximate the distribution of events. Might there be some Monte Carlo method for filling e.g., 30 rows in a new column with a random sub-sample from the original 200.

 

Then I could compare the distribution of the 30 counts in the sub-sample to the distribution in the population of 200.

 

Repeat this say 100 times to determine the risk of sampling 30 stores to represent the population of 200.

 

Thoughts?  thanks, Paul

 

First 10 rows of the 200

Store# - column 1

Even count - column 2 

19
20
21
22
23
25
26
27
29
34
1
4
7
3
4
3
12
5
8
8



1 ACCEPTED SOLUTION

Accepted Solutions

Re: Sub-Sample randomly from a column

I quick way to do a subsample using a column formula is the use the Col Shuffle() function.  The function returns a random reordering of row numbers.  If you choose rows where the values of Col Shuffle() are less than than or equal to the subsample size (n_sub), this equivalent to doing a random ordering of the column and selecting the first n_sub rows.  

 

For example, adding new column with this formula will do this for a subsample size of 30.

SamGardner_0-1666984984603.png

example table attached.

If you want to do this repeatedly, then learning some JMP Scripting would make this easier.  A basic script to add columns like this repeatedly is

Names Default to Here( 1 );

Number_of_Cols_to_Add = 100;

For( i = 1, i <= Number_of_Cols_to_Add, i++, 
	
	Current Data Table() << 
	New Column( "Y Subsample (Size 30)",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Formula( If( Col Shuffle() <= 30, :Y, . ) )
	)
);

 

View solution in original post

8 REPLIES 8
Jeff_Perkinson
Community Manager Community Manager

Re: Sub-Sample randomly from a column

Take a look at the Resample Freq() function. 

 

Resample Freq

Generates a random selection with replacement frequency counts, suitable for use in bootstrapping. For example, it supports a second Freq Column argument, enabling it to do bootstrap samples relating to a pre-existing frequency column specified in the second argument. Resample Freq() generates a 100% resample. ResampleFreq(rate) generates a rate frequency sample. Resample(rate, column) generates a sample that is calculated by the rate multiplied by the sum of the specified column. For more information about the syntax for the Resample Freq function, see Discrete Probability Functions in the JSL Syntax Reference.

-Jeff
paulp
Level III

Re: Sub-Sample randomly from a column

Hello Jeff, thank you for the reply.  I am not understanding how to use the function for my purpose, unfortunately.

  • Resample Freq creates counts of re-samples.  This seems it would be an intermediate step to then creating a new table with the sub-sampled data. I am hoping to directly create a column of sub-sampled data from the original population of 200 rows.
  • I would prefer to sample without replacement, not bootstrap.

If I only needed the original data sub-sampled once, the easiest path would be Table > Subset > Random Sample Size = 30 (e.g.)

I don't script, otherwise I would create a loop to produce 100 tables of 30 random samples, then concatenate the 100 tables into a master table.

 

Is there a way to directly grab 30 random samples from a column, and put those 30 values into a new column?

 

Ultimately, I hope to try the Reliability > Life distribution > Compare Groups to assess the distribution of the subsamples vs. original population distribution. 

  • Then repeat for Sample Size = 20, then 40, etc. so I can assess the minimum sample size that gives a similar distribution to the population of 200 stores.

Thanks, Paul

txnelson
Super User

Re: Sub-Sample randomly from a column

Here is an example script that creates subsets of a given size without replacement

Names Default To Here( 1 );
dt = 
// Open Data Table: semiconductor capability.jmp
// → Data Table( "semiconductor capability" )
Open( "$SAMPLE_DATA/semiconductor capability.jmp" );

// New formula column: Row
Data Table( "semiconductor capability" ) << New Column( "Row",
	Numeric,
	"Continuous",
	Format( "Best", 12 ),
	Formula( Row() )
) << Move Selected Columns( {:Row}, after( :SITE ) );

// Loop through data creating sample data tables
sampleSize = 20;

For( i = 1, i <= 20, i++,
	dtSample = Data Table( "semiconductor capability" ) << Subset(
		Sample Size( sampleSize ),
		Selected columns only( 0 ),
		output table( "Sample " || Char( i ) )
	);
	
	matSelected = dtSample:Row << get values;
	
	// Delete row column from sample data
	dtSample << delete columns( Row );

// Delete selected rows from main table
	dt << select rows( matSelected );
	dt << delete rows;
);

// Delete row column from original data table
dt << delete columns( Row );
Jim

Re: Sub-Sample randomly from a column

I quick way to do a subsample using a column formula is the use the Col Shuffle() function.  The function returns a random reordering of row numbers.  If you choose rows where the values of Col Shuffle() are less than than or equal to the subsample size (n_sub), this equivalent to doing a random ordering of the column and selecting the first n_sub rows.  

 

For example, adding new column with this formula will do this for a subsample size of 30.

SamGardner_0-1666984984603.png

example table attached.

If you want to do this repeatedly, then learning some JMP Scripting would make this easier.  A basic script to add columns like this repeatedly is

Names Default to Here( 1 );

Number_of_Cols_to_Add = 100;

For( i = 1, i <= Number_of_Cols_to_Add, i++, 
	
	Current Data Table() << 
	New Column( "Y Subsample (Size 30)",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Formula( If( Col Shuffle() <= 30, :Y, . ) )
	)
);

 

paulp
Level III

Re: Sub-Sample randomly from a column

  Thank you, Sam, and to everyone who posted solutions.

Re: Sub-Sample randomly from a column

I understand that you are not scripting your solution. You can also interactively use Tables > Subset to create a new smaller table with randomly selected rows from the original data table. Would that way work for you?

ron_horne
Super User (Alumni)

Re: Sub-Sample randomly from a column

Hi @paulp ,
the other non-scripting solution that can be useful in such applications is using JMP query builder under the Tables menu.

ron_horne_1-1667380578796.png

there you can find a random sampling tab as in the picture

 

ron_horne_0-1667380494559.png

 

As usual one you run it interactively you can make use of the script created for additional usage.

 

 

 

paulp
Level III

Re: Sub-Sample randomly from a column

 

thanks, Ron