Split data into training and validation without JMP Pro

tajrida

Community Trekker

Joined:

Apr 8, 2015

What is the easiest, most convenient way to split data into training, test, and validation sets without using JMP Pro?

3 REPLIES
txnelson

Super User

Joined:

Jun 22, 2012

The easiest way to do this interactively is to open the data table in question and select from the menu:

     Rows==>Row Selection==>Select Randomly

Enter the sampling rate or the number of rows you want in the first set, then go to

     Tables==>Subset

and create the new data table from the selected rows.

In the original data table, right-click one of the selected rows and select

     Invert Selection

Then go back to

     Tables==>Subset

and create your second data table.

This can also be scripted easily:

Names Default To Here( 1 );
dt = Current Data Table();

// Create a column of uniform random numbers between 0 and 1
dt << New Column( "my random sample", formula( Random Uniform() ) );

// Select the rows whose random value is <= 0.2 (approximately 20% of the data)
dt << select where( :my random sample <= .2 );

// Delete the random number column since it is no longer needed
dt << delete columns( "my random sample" );

// Put the selected rows into a Validate data table
dt << subset( selected rows( 1 ), selected columns( 0 ), output table name( "Validate" ) );

// Invert the row selection
dt << invert row selection;

// Put the remaining rows into the Training data table
dt << subset( selected rows( 1 ), selected columns( 0 ), output table name( "Training" ) );
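
Since the question asks for three sets, the same idea can be extended to a three-way split. The following is only a sketch under assumptions that are not part of the script above: the 60/20/20 proportions, the helper column name "Split Random", the seed, and the output table names are all illustrative. As with the two-way version, the set sizes are approximate because each row is assigned independently.

```jsl
Names Default To Here( 1 );
dt = Current Data Table();

// Optional: fix the seed so the split can be reproduced (seed value is arbitrary)
Random Reset( 12345 );

// Helper column (hypothetical name) holding one fixed random draw per row.
// Set Each Value stores plain values, so they do not change between selections.
dt << New Column( "Split Random", Numeric, Continuous, Set Each Value( Random Uniform() ) );

// Roughly 60% of the rows go to Training
dt << Select Where( :Split Random <= 0.6 );
dt << Subset( Selected Rows( 1 ), Selected Columns( 0 ), Output Table Name( "Training" ) );

// Roughly 20% go to Validation
dt << Select Where( :Split Random > 0.6 & :Split Random <= 0.8 );
dt << Subset( Selected Rows( 1 ), Selected Columns( 0 ), Output Table Name( "Validation" ) );

// The remaining rows (roughly 20%) go to Test
dt << Select Where( :Split Random > 0.8 );
dt << Subset( Selected Rows( 1 ), Selected Columns( 0 ), Output Table Name( "Test" ) );

// Clean up the original table; the helper column is still present in the three subsets
dt << Clear Select;
dt << Delete Columns( "Split Random" );
```

Because the three conditions do not overlap and together cover the whole 0 to 1 range, every row lands in exactly one of the three tables.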

 

Jim
KarenC

Super User

Joined:

Feb 10, 2013

You may find the "Initialize Column" feature helpful for this task. To create a "data usage" column, add a new column to your table, right-click it, and select "Column Info". There is an "Initial Data" option towards the bottom of the dialog box (once the column has been initialized, this option no longer appears in the Column Info dialog box). From the drop-down menu select "Random", and then from the radio options select "Random Indicator". You will see three lines with the default values 0, 1, and 2. You can rename them Train, Test, and Validate, or whatever you like. Finally, set the proportion of the data that you would like in each group. The result is an indicator column that you can use to filter your data for fitting, testing, and validating.
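
For anyone who prefers a script, a similar indicator column can be sketched in JSL. This does not call the dialog feature described above; it simply assigns a label to each row from a uniform random draw. The column name "Data Usage", the labels, and the 60/20/20 proportions are assumptions chosen for illustration.

```jsl
Names Default To Here( 1 );
dt = Current Data Table();

// Hypothetical indicator column; the name and labels are illustrative
dt << New Column( "Data Usage", Character, Nominal );

// Assign each row to a group with probability 0.6 / 0.2 / 0.2
For Each Row(
	r = Random Uniform();
	:Data Usage = If(
		r <= 0.6, "Train",
		r <= 0.8, "Validate",
		"Test"
	);
);

// Count the rows in each group to check the realized proportions
dt << Summary( Group( :Data Usage ) );
```

Keeping the assignment in a column, rather than splitting into separate tables, leaves all rows in one place so the same table can be filtered for fitting, testing, and validating.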

Jeff_Perkinson

Community Manager

Joined:

Jun 23, 2011

In JMP Pro 12 you can use the Cols->Modeling Utilities->Make Validation Column utility.

-Jeff