gail_massari
Community Manager

How do you easily create a validation column for Neural Nets and other models using JMP Pro?

To create a validation column for Neural Nets and other models in JMP Pro, open the data table and select Analyze > Predictive Modeling > Make Validation Column.

There are five methods available:

Formula Random

Partitions the data into sets based on the allocations you enter. For example, with the default values, each row has a probability of 0.75 of being included in the training set and a probability of 0.25 of being included in the validation set. The formula is saved to the column. To view it, click the plus icon to the right of the column name in the Columns panel.
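
To make the per-row probabilities concrete, here is a minimal Python sketch of the idea. It is not JMP's actual column formula, whose contents are not reproduced here, and the names `validation_formula`, `evaluate_column`, and `DEFAULT_ALLOCATIONS` are hypothetical. Each row takes a uniform draw in [0, 1) and lands in the first set whose cumulative allocation exceeds the draw, so with the default 0.75/0.25 allocation roughly three quarters of the rows go to training. In this sketch, calling `evaluate_column` again redraws every row, which is how a stored random formula behaves when it is re-evaluated.

```python
import random

# Hypothetical allocations matching the default 0.75 / 0.25 split described above.
DEFAULT_ALLOCATIONS = [("Training", 0.75), ("Validation", 0.25)]

def validation_formula(rng, allocations=DEFAULT_ALLOCATIONS):
    """Return the holdback set for one row; every row is drawn independently."""
    u = rng.random()                   # uniform draw in [0, 1)
    cumulative = 0.0
    for label, p in allocations:
        cumulative += p
        if u < cumulative:
            return label
    return allocations[-1][0]          # guard against floating-point round-off

def evaluate_column(n_rows, rng=None):
    """Evaluate the formula for every row; calling this again redraws all rows."""
    rng = rng or random.Random()
    return [validation_formula(rng) for _ in range(n_rows)]
```

Because each row is drawn independently, the realized split only approximates 75/25; it gets closer as the number of rows grows.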

Fixed Random

Partitions the data into sets based on the allocations you enter. For example, with the default values, each row has a probability of 0.75 of being included in the training set and a probability of 0.25 of being included in the validation set. You can specify a random seed so that you can reproduce the same allocations later. No formula is saved to the column; the assignments are stored as fixed values.
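
A minimal Python sketch of the seeded, fixed variant follows. It assumes Python's `random.Random` as a stand-in for JMP's random number generator, so the same seed will not produce the same assignments as JMP; the helper `fixed_random` is hypothetical. The draw happens once and the result is a plain list of labels, so nothing is recomputed later, and reusing the same seed reproduces the same allocation.

```python
import random

def fixed_random(n_rows, allocations, seed):
    """Hypothetical helper: draw every row's set once and return stored values."""
    rng = random.Random(seed)          # same seed -> same sequence of draws
    labels = [label for label, _ in allocations]
    weights = [p for _, p in allocations]
    # Each row is drawn with probability equal to its set's allocation.
    return rng.choices(labels, weights=weights, k=n_rows)
```

For example, `fixed_random(8, [("Training", 0.75), ("Validation", 0.25)], seed=1234)` returns the same eight labels every time it is called.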

Stratified Random

Partitions the data into balanced sets based on the levels of columns that you specify. Use this option when you want each of the training, validation, and test sets to contain a balanced representation of a column’s levels. When you click Stratified Random, a window appears that enables you to select one or more columns by which to stratify the data. As with Fixed Random, rows are randomly assigned to the holdback sets according to the specified allocations, but the assignment is done separately within each level, or combination of levels, of the stratifying columns. When you click OK, the validation column is added to the data table with a Notes property that lists the stratifying variables.
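
The per-stratum assignment can be sketched in Python as follows. This is an illustration under assumptions, not JMP's algorithm: the hypothetical `stratified_random` helper takes one stratum key per row (a tuple of values when stratifying by several columns), shuffles the rows within each stratum, and then splits each stratum by the requested proportions. Splitting by exact counts after shuffling is a swapped-in technique chosen for clarity; JMP may allocate rows differently.

```python
import random
from collections import defaultdict

def stratified_random(strata, allocations, seed=None):
    """Hypothetical helper: strata holds one key per row, e.g. a tuple of the
    stratifying columns' values for that row."""
    rng = random.Random(seed)
    rows_by_stratum = defaultdict(list)
    for row, key in enumerate(strata):
        rows_by_stratum[key].append(row)

    assignment = [None] * len(strata)
    for rows in rows_by_stratum.values():
        rng.shuffle(rows)              # random order within this stratum only
        start, cumulative = 0, 0.0
        for label, p in allocations:
            cumulative += p
            end = round(cumulative * len(rows))
            for row in rows[start:end]:
                assignment[row] = label
            start = end
        for row in rows[start:]:       # round-off leftovers go to the last set
            assignment[row] = allocations[-1][0]
    return assignment
```

Because every stratum is split on its own, a rare level (here, one with only 20 rows) still contributes to each set in roughly the requested ratio instead of possibly landing entirely in one set.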

Grouped Random

Partitions the data into sets so that all rows sharing a level of a specified column, or a combination of levels of two or more columns, are placed in the same holdback set. Use this option when you do not want a level split across holdback sets. When you click Grouped Random, a window appears that enables you to select one or more grouping columns. When you click OK, the levels are randomly assigned to holdback sets. If a level contains more rows than the proportion or number of rows you specified, it still stays in its assigned holdback set, but fewer rows are allocated to the training set. Because of this, the sizes of the resulting sets can vary slightly from the sizes that you specified.
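
Here is a Python sketch of grouped assignment. It uses a simple greedy rule that is an assumption, not JMP's documented procedure: the hypothetical `grouped_random` helper visits the levels in random order and gives each whole level to the set that is furthest below its target row count. Because levels are never split, the final set sizes only approximate the requested allocations, which mirrors the size variation described above.

```python
import random
from collections import Counter

def grouped_random(groups, allocations, seed=None):
    """Hypothetical helper: groups holds one grouping key per row."""
    rng = random.Random(seed)
    sizes = Counter(groups)            # number of rows in each level
    levels = list(sizes)
    rng.shuffle(levels)                # random order in which levels are placed
    n = len(groups)
    targets = {label: p * n for label, p in allocations}
    filled = {label: 0 for label, _ in allocations}
    level_to_set = {}
    for level in levels:
        # The whole level goes to the set with the largest remaining shortfall;
        # ties go to the set listed first in allocations.
        label = max(targets, key=lambda lbl: targets[lbl] - filled[lbl])
        level_to_set[level] = label
        filled[label] += sizes[level]
    return [level_to_set[g] for g in groups]
```

With levels of unequal size, a large level can overshoot its set's target, which is exactly why the realized sizes drift from the requested ones.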

Cutpoint

Partitions the data into sets based on time series cutpoints. Use this option when you want to assign your data to holdback sets based on time periods. When you click Cutpoint, a window appears that enables you to select one or more columns that define time periods. When you click OK, a JMP Alert appears that shows the assigned cutpoints, and a column that reflects this assignment is added to the data table. The training set consists of rows between the first and second cutpoints, the validation set consists of rows between the second and third cutpoints, and the test set consists of the remaining rows. The cutpoints are chosen so that the sets reflect the proportions or numbers of rows that you specified.
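
The following Python sketch illustrates time-based cutpoints. It assumes a single sortable time column (for several columns, a tuple per row would sort the same way) and, to match the description above, assumes the first cutpoint is the earliest time value. The helper `cutpoint_split` is hypothetical and is not how JMP computes its cutpoints. Later cutpoints are placed at the times where the cumulative requested proportion of rows is reached, and each row goes to the set whose cutpoint range contains its time.

```python
import bisect

def cutpoint_split(times, allocations):
    """Hypothetical helper: times holds one sortable value (date, period, int) per row."""
    ordered = sorted(times)
    n = len(ordered)
    # First cutpoint: the earliest time. Later cutpoints: the time at which the
    # cumulative requested proportion of rows is reached.
    cutpoints = [ordered[0]]
    cumulative = 0.0
    for _, p in allocations[:-1]:
        cumulative += p
        cutpoints.append(ordered[min(round(cumulative * n), n - 1)])
    labels = [label for label, _ in allocations]
    assignment = []
    for t in times:
        # The last cutpoint that is not after t determines the row's set.
        idx = bisect.bisect_right(cutpoints, t) - 1
        assignment.append(labels[idx])
    return cutpoints, assignment
```

With `times = list(range(1, 21))` and a 0.5/0.25/0.25 allocation, the cutpoints are 1, 11, and 16, giving 10 training, 5 validation, and 5 test rows. Unlike the random methods, every training row precedes every validation row in time, which avoids evaluating a model on periods earlier than the data it was trained on.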