Partition Data (Validation, Train, Test) for Logistic Regression

Kathleen · May 30, 2018 11:14 AM

How do you partition data and then model on a portion of it (Validation for example) for nominal logistic regression?

Peter_Bartell · May 30, 2018 11:24 PM

The path of least resistance is to use JMP Pro and the Make Validation Column utility. Consider using the stratified random option, stratifying on the response variable, when you create the partition. Then, when you launch the Fit Model platform, make sure to place the new column in the Validation role in the launch window.
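For anyone who prefers scripting the launch, here is a minimal sketch of what that could look like in JSL. It assumes JMP Pro, and the column names (:Response, :X1, :X2, :Validation) are hypothetical placeholders for your own response, predictors, and the column produced by Make Validation Column.

```jsl
// Sketch only: column names below are hypothetical placeholders.
// The Validation role in Fit Model requires JMP Pro.
dt = Current Data Table();

fit = dt << Fit Model(
	Y( :Response ),                      // nominal response column
	Effects( :X1, :X2 ),                 // model effects
	Validation( :Validation ),           // column from Make Validation Column
	Personality( "Nominal Logistic" ),
	Run
);
```

With the Validation role filled in, the platform fits on the training rows and reports the other sets separately, so no rows need to be hidden or excluded by hand.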

cwillden · May 31, 2018 12:07 AM

JMP Pro definitely makes this effortless, but here's a way to mimic how JMP Pro creates the partition. Using the partitions will still be pretty manual. You could hide and exclude the validation and test sets, fit the model to the training data, and then predict on the validation and test sets using a prediction formula saved from whichever platform you are using.

dt = Current Data Table();
n_rows = N Row(dt);

//determine proportions for training and validation sets (test is whatever is left over)
train_prop = 0.7;
val_prop = 0.15;

test_prop = 1-train_prop-val_prop;

//randomly shuffle row numbers
rows_shuffled = random shuffle(1::n_rows);

//determine precise number of data points for each partition
n_train = floor(train_prop*n_rows);
n_val = ceiling(val_prop*n_rows);
n_test = n_rows - n_train - n_val;

//obtain the desired data points for the partitions
train_rows = rows_shuffled[1::n_train];
val_rows = rows_shuffled[(n_train+1)::(n_train+n_val)]; //exactly n_val rows, no overlap with test
test_rows = rows_shuffled[(n_rows-n_test+1)::n_rows]; //the remaining n_test rows

//Create the partition column
partition_col = dt << New Column("Partition", numeric, nominal); //use partition_col just in case a Partition column already exists and the new column gets a name like "Partition 2"
//mimic JMP Pro's underlying numeric values
partition_col[train_rows] = 1;
partition_col[val_rows] = 2;
partition_col[test_rows] = 3;

//Add value labels
partition_col << Add Column Properties({Value Labels( {"." = "Other", 1 = "Training", 2 = "Validation", 3 = "Test"} ),
Use Value Labels( 1 )});

I just whipped this together, but you could easily make this into an app with a dialog asking for the training and validation proportions. You could make it really handy by adding the script for that app to a menu or toolbar via View > Customize > Menus and Toolbars.
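As a rough sketch of the manual workflow described above (exclude everything but training, fit, then save a formula that scores all rows), something like the following might work once the Partition column exists. The column names :Response, :X1, and :X2 are hypothetical, and the script assumes no rows are hidden or excluded beforehand.

```jsl
// Sketch only: :Response, :X1, :X2 are hypothetical column names.
// Assumes the Partition column from the script above (1 = Training)
// and that no rows are already hidden or excluded.
dt = Current Data Table();

// Hide and exclude the validation and test rows
dt << Select Where( :Partition != 1 );
dt << Hide and Exclude( 1 );
dt << Clear Select;

// Fit nominal logistic regression; excluded rows are left out of the fit
fit = dt << Fit Model(
	Y( :Response ),
	Effects( :X1, :X2 ),
	Personality( "Nominal Logistic" ),
	Run
);

// Save the probability formula columns to the data table
fit << Save Probability Formula;
```

Saved formula columns evaluate on every row, including the excluded ones, so the validation and test rows get predictions from the training-only model. You can then compare predicted and actual values by Partition level.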

-- Cameron Willden
