Hi @bbenny7,
I found this topic interesting because I also wanted to subset a data table correctly, in my case stratified on a column. In the past I've done it the way @dlehman1 suggested, using a validation column stratified on the column(s) of interest, but I was curious about a different approach in case multiple data tables are needed. I tried the approach @Mark_Bailey suggested, but couldn't get it to work the way his original code was laid out. I had to split the New Table() call into two statements: one that defines the new table, and a second that assigns the values from the column of interest in the original data table. Here's the modified code that worked for me:
Names Default To Here( 1 );
dt = Data Table( "originaldatatable" ); //assigns the original data table to the variable dt
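samplesize = 300; //number of rows to sample; 300 is the value I used in the tests below
subscript = J( samplesize, 1, Random Integer( 1, N Rows( dt ) ) ); //creates a samplesize x 1 vector of random row numbers (drawn with replacement, so a row can appear more than once)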
dt2 = New Table( "Sample", New Column( "Data" ) ); //creates new data table with column Data
dt2:Data << Set Values( dt:originalColumn[subscript] ); //fills Data with the originalColumn values from the sampled rows
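If repeated rows are not wanted, one possible variation is to draw the row numbers without replacement. This is only a sketch: it assumes the JSL function Random Index( n, k ), which should return a k x 1 vector of distinct integers between 1 and n. The table and column names are the same placeholders as above, and the sample size of 300 and the table name "Sample No Repeats" are assumed values for illustration.

```jsl
Names Default To Here( 1 );
dt = Data Table( "originaldatatable" ); // placeholder table name, as above
samplesize = 300; // assumed sample size; must not exceed N Rows( dt )
// draw samplesize distinct row numbers between 1 and N Rows( dt )
subscript = Random Index( N Rows( dt ), samplesize );
dt2 = New Table( "Sample No Repeats", New Column( "Data" ) ); // hypothetical table name
dt2:Data << Set Values( dt:originalColumn[subscript] ); // copy the sampled rows
```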
As a fun little test, I generated 4 subsets by putting the subscript line inside a For() loop (so that each pass draws a new set of row numbers) and compared the distributions of the 4 sets. Their summary statistics are all very similar.
I ran a similar test with 4 stratified validation columns and looked at their statistics. The N is different because the Make Validation Column platform wouldn't generate the same ratio as my samples above, where I picked 300 arbitrarily. Anyway, the results are all very similar.
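For anyone who wants to repeat that kind of test, here is a rough sketch of how the loop could look. It is not the exact script I ran: the sample size of 300, the table name "Samples", and the column names "Data 1" to "Data 4" are assumptions for illustration, and it keeps the two-step pattern (create the column, then set its values) that worked above.

```jsl
Names Default To Here( 1 );
dt = Data Table( "originaldatatable" ); // placeholder table name, as above
samplesize = 300; // assumed sample size
nsets = 4; // number of random subsets to draw

dt2 = New Table( "Samples" ); // hypothetical name for the table that holds all subsets
For( i = 1, i <= nsets, i++,
	// draw a fresh vector of random row numbers on every pass (with replacement)
	subscript = J( samplesize, 1, Random Integer( 1, N Rows( dt ) ) );
	// create the column first, then assign the sampled values to it
	col = dt2 << New Column( "Data " || Char( i ), Numeric, Continuous );
	col << Set Values( dt:originalColumn[subscript] );
);
```

The resulting columns can then be compared side by side in the Distribution platform.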
Either way should work and get you where you want to go.
Good luck!
DS