tajrida
Level III

Split data into training and validation without JMP Pro

What is the easiest and most convenient way to split data into training, test, and validation sets without using JMP Pro?

Mohammed Ahmed
Accepted Solution
KarenC
Super User (Alumni)

Re: Split data into training and validation without JMP Pro

You may find the "Initialize Column" feature helpful for this task. To create a "data usage" column, add a new column to your table, right-click it, and select "Column Info". There will be an "Initialize Data" option toward the bottom of the dialog box (once the column has been initialized, this option no longer appears in the Column Info dialog box). From the drop-down menu, select "Random", and then from the radio options select "Random Indicator". You now have three lines with the default values 0, 1, and 2. You can rename them Train, Test, and Validate, or whatever you like. Finally, set the proportion of the data that you would like in each group. The result is an indicator column that you can use to filter your data for fitting, testing, and validating.
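
If you would rather script this than use the dialog, something along these lines should produce a similar indicator column. This is only a rough sketch and not what the dialog does internally: the column name "Data Usage" and the 60/20/20 proportions are assumptions, so change them to suit your data. Because each row is assigned independently, the group sizes will only be approximately in those proportions.

```jsl
Names Default To Here( 1 );
dt = Current Data Table();

// Hypothetical column name; holds the group label for each row
dt << New Column( "Data Usage", Character, Nominal );

// Assumed proportions: about 60% Train, 20% Validate, 20% Test
For Each Row(
	u = Random Uniform();          // one uniform draw per row
	:Data Usage = If(
		u < 0.6, "Train",
		u < 0.8, "Validate",
		"Test"
	);
);
```

You can then use the Data Filter, or Rows==>Row Selection==>Select Where, on this column to pick out the rows for each step.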

txnelson
Super User

Re: Split data into training and validation without JMP Pro

The easiest way to do this interactively is to open the data table in question, go to the pull-down menu, and select:

     Rows==>Row Selection==>Select Randomly

Then go to

     Tables==>Subset

and create the new data table.

In the original data table, right-click one of the selected rows and select

     Invert Selection

Then go back to

     Tables==>Subset

and create your second data table.

This can also be scripted quite simply:

 

Names Default To Here( 1 );
dt = Current Data Table();

// Create a uniformly random column
dt << New Column( "my random sample", formula( Random Uniform() ) );

// Select roughly 20% of the data
dt << select where( :my random sample <= .2 );

// Delete the random number column since it is no longer needed
dt << delete columns( "my random sample" );

// Put the selected rows into a Validate data table
dt << subset( selected rows( 1 ), selected columns( 0 ), output table name( "Validate" ) );

// Invert the row selection
dt << invert row selection;

// Place all of the remaining rows into the Training data table
dt << subset( selected rows( 1 ), selected columns( 0 ), output table name( "Training" ) );
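
Since the original question asked for training, test, and validation sets, the same idea can be stretched to three tables. The following is only a sketch under some assumptions: the 0.2 and 0.4 cutoffs (roughly 20% Validate, 20% Test, 60% Training) and the table name "Test" are illustrative choices. The random values are stored with Set Each Value rather than a formula so that they stay fixed across the three selections, and each selection is assumed to match at least one row.

```jsl
Names Default To Here( 1 );
dt = Current Data Table();

// Store one fixed random value per row (not a formula, so it never re-evaluates)
dt << New Column( "my random sample", Numeric, Continuous,
	Set Each Value( Random Uniform() )
);

// Roughly 20% of the rows go to the Validate table
dt << Select Where( :my random sample <= 0.2 );
dt << Subset( Selected Rows( 1 ), Selected Columns( 0 ), Output Table Name( "Validate" ) );

// Roughly the next 20% go to the Test table
dt << Select Where( :my random sample > 0.2 & :my random sample <= 0.4 );
dt << Subset( Selected Rows( 1 ), Selected Columns( 0 ), Output Table Name( "Test" ) );

// The remaining rows (about 60%) go to the Training table
dt << Select Where( :my random sample > 0.4 );
dt << Subset( Selected Rows( 1 ), Selected Columns( 0 ), Output Table Name( "Training" ) );

// Clean up the original table
dt << Clear Select;
dt << Delete Columns( "my random sample" );
```

Note that the helper column is copied into each subset table because it is deleted only at the end; delete it from the subsets too if you do not want it there.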

 

Jim
Jeff_Perkinson
Community Manager

Re: Split data into training and validation without JMP Pro

In JMP Pro 12, you can use the Cols->Modeling Utilities->Make Validation Column utility.

-Jeff
