Kathleen
Level I

Partition Data (Validation, Train, Test) for Logistic Regression

How do you partition data and then model on a portion of it (Validation for example) for nominal logistic regression?

Peter_Bartell
Level VIII

Re: Partition Data (Validation, Train, Test) for Logistic Regression

The path of least resistance is to use JMP Pro and the Make Validation Column utility. Consider the stratified random option, stratifying on the response column, so that each partition keeps roughly the same mix of response levels. Then, when you launch the platform, assign the new column to the Validation role in the Fit Model launch window.
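
As a rough JSL sketch of that last step: this assumes JMP Pro, a nominal response column named `Y`, predictors `X1` and `X2`, and a validation column named `Validation`. All of those column names are placeholders, so substitute your own.

```jsl
// Sketch only: requires JMP Pro; the column names are hypothetical
dt = Current Data Table();
fit = dt << Fit Model(
	Y( :Y ),                          // nominal response
	Effects( :X1, :X2 ),              // model effects
	Validation( :Validation ),        // column from Make Validation Column
	Personality( "Nominal Logistic" ),
	Run
);
```

Scripting the launch is optional; dragging the column into the Validation role by hand does the same thing.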

cwillden
Super User (Alumni)

Re: Partition Data (Validation, Train, Test) for Logistic Regression

JMP Pro definitely makes this effortless, but here's a way to mimic JMP Pro's partition creation in regular JMP. Using the partitions will still be pretty manual. You could hide and exclude the validation and test rows so the model is fit to the training data only, then save the prediction formula from whichever platform you are using. The saved formula column is evaluated on every row, so it gives predictions for the validation and test sets as well.

dt = Current Data Table();
n_rows = N Row(dt);

//determine proportions for training and validation sets (test is whatever is left over)
train_prop = 0.7;
val_prop = 0.15;

test_prop = 1-train_prop-val_prop;

//randomly shuffle row numbers
rows_shuffled = random shuffle(1::n_rows);

//determine precise number of data points for each partition
n_train = floor(train_prop*n_rows);
n_val = ceiling(val_prop*n_rows);
n_test = n_rows - n_train - n_val;

//obtain the desired data points for the partitions
train_rows = rows_shuffled[1::n_train];
//index ranges are inclusive, so each range must start right after the previous one ends
val_rows = rows_shuffled[(n_train+1)::(n_train+n_val)];
test_rows = rows_shuffled[(n_train+n_val+1)::n_rows];

//Create the partition column
partition_col = dt << New Column("Partition", numeric, nominal); //use partition_col just in case a Partition column already exists and the new column gets a name like "Partition 2"
//mimic JMP Pro's underlying numeric values
partition_col[train_rows] = 1;
partition_col[val_rows] = 2;
partition_col[test_rows] = 3;

//Add value labels
partition_col << Add Column Properties({Value Labels( {"." = "Other", 1 = "Training", 2 = "Validation", 3 = "Test"} ),
Use Value Labels( 1 )});
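
Once the Partition column exists, the hide/exclude step might look something like the sketch below. It assumes the column is named "Partition" with 1 = Training, as in the script above, and that you don't mind losing existing row states (colors, markers, selections), because Clear Row States wipes them.

```jsl
// Sketch only: assumes a Partition column where 1 = Training
dt = Current Data Table();
dt << Clear Row States;                 // start with nothing excluded or hidden
dt << Select Where( :Partition != 1 );  // validation and test rows
dt << Exclude;                          // toggle the excluded state on for the selected rows
dt << Hide;                             // also keep them out of plots
dt << Clear Select;

// Fit the model here and save its prediction formula from the platform, then:
dt << Clear Row States;                 // bring the validation and test rows back
```

Excluded rows are left out of the fit, but the saved prediction formula still evaluates on them, which is what lets you compare predicted and actual values by partition afterward.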

I just whipped this partitioning script together, but you could easily turn it into an app with a dialog that asks for the training and validation proportions. To make it really handy, add the app's script to a menu or toolbar via View > Customize > Menus and Toolbars.
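
A dialog along those lines might start from a modal window like the one below. This is only a sketch: the window title and the names `train_box` and `val_box` are made up for illustration, and the range check is just one guess at what you would want to validate.

```jsl
// Sketch only: ask for the training and validation proportions
win = New Window( "Partition Proportions", <<Modal, <<Return Result,
	V List Box(
		Text Box( "Training proportion" ),
		train_box = Number Edit Box( 0.7 ),
		Text Box( "Validation proportion" ),
		val_box = Number Edit Box( 0.15 ),
		H List Box( Button Box( "OK" ), Button Box( "Cancel" ) )
	)
);

// Button is 1 when OK was clicked and -1 when the window was canceled
If( win["Button"] == 1,
	train_prop = win["train_box"];
	val_prop = win["val_box"];
	If( train_prop < 0 | val_prop < 0 | train_prop + val_prop > 1,
		Throw( "Proportions must be non-negative and sum to at most 1" )
	);
);
```

The rest of the script would then use these values of train_prop and val_prop in place of the hard-coded 0.7 and 0.15.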

-- Cameron Willden