shasheminassab
Level IV

How to make a validation column in regular JMP using JSL?

Hi

 

I think there is a "Make Validation Column" function in JMP Pro, but I am wondering how to make a validation column (with 25% training and 75% validation) using JSL in regular JMP. Any help is appreciated.

2 REPLIES

Re: How to make a validation column in regular JMP using JSL?

It is pretty simple to do interactively. The JSL steps should be straightforward using this approach, too.

Create a column called Validation.

Fill the entire column with 0s. A zero will indicate a training set observation.

Go to Tables > Subset.

Enter a Random-Sampling Rate of 0.75 (for your 75% validation set).

Check the box for Link to Original Data Table.

Click OK. 

In the subset table, change the Validation value in the first row from 0 to 1.

Right-click the 1 and Fill to end of Table.

Close the subset table.

 

I was pretty loose with my JSL (didn't bother with proper scoping to avoid potential issues), but it should give you a pretty good start.

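// Keep an explicit reference to the original table.
dt = Current Data Table();
dt << New Column( "Validation",
	Numeric,
	"Nominal",
	Format( "Best", 12 )
);
// 0 marks a training row.
For Each Row( dt, :Validation = 0 );

// Randomly sample 75% of the rows into a subset that is linked to dt.
dtVal = dt << Subset(
	Output Table( "ValData" ),
	Linked,
	Suppress formula evaluation( 0 ),
	Sampling Rate( 0.75 ),
	Selected columns only( 0 )
);
// 1 marks a validation row. Because the subset is linked,
// the values are changed in the original table as well.
For Each Row( dtVal, :Validation = 1 );

// Close the subset without a save prompt.
Close( dtVal, NoSave );

dt << Clear Select;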
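
A quick way to check the result of the script above is to compute the fraction of rows flagged as validation. This is only a sketch, and it assumes the Validation column already exists in the current data table:

```jsl
// Assumes the Validation column created above is in the current data table.
dt = Current Data Table();
vals = dt:Validation << Get Values;   // column vector of 0s and 1s
// Number of rows, and the fraction of rows flagged 1 (validation).
Show( N Rows( vals ), Sum( vals ) / N Rows( vals ) );
```

With a sampling rate of 0.75, the reported fraction should be close to 0.75.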

Note that this is going strictly with a random assignment. Many times you really should stratify the validation by the target variable.

 

My approach was something I had kept hidden away for several years. Brady has two better approaches down below.

Dan Obermiller

Re: How to make a validation column in regular JMP using JSL?

Hi,

 

Subject to Dan's caveats regarding random assignment, this will do it (in this case, for 75% training data; change 0.75 to 0.25 to get the 25% training / 75% validation split asked about).

 

dt << new column ("Validation", nominal, <<set values(randomshuffle( (1::nrow(dt))` > 0.75*nrow(dt))));

Why this works:

1) The (1::nrow(dt))` piece creates a row vector [1 2 3 ... nrow(dt)], and transposes it (using the ` operator) into a column vector [1, 2, 3, ... nrow(dt)].

2) Then, this column vector is compared to 0.75*nrow(dt). If greater, assign 1; if not, assign 0. So, suppose we have nrow(dt) = 100. Then the original vector is:

[1, 2, 3, ... , 74, 75, 76, 77, ... 100]. After the comparison with 75, the result vector is:

[0, 0, 0, ... , 0, 0, 1, 1, ... 1]. That is, 75 0s followed by 25 1s.

3) Random Shuffle() puts the contents of a vector into random order, so the 75 0s and 25 1s (still using a 100-row table as an example) will appear in random order.

4) Finally, the << set values message fills the column with this random assortment of 75 0s and 25 1s.
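
The steps above can be traced on a small case. The sketch below uses a hypothetical row count of 8 (chosen only for illustration, not taken from the thread) and shows each intermediate vector in the log:

```jsl
// Worked example on a hypothetical 8-row case (n is illustrative only).
n = 8;
idx = (1 :: n)`;                      // column vector [1, 2, ..., 8]
flags = idx > 0.75 * n;               // 1 where the index exceeds 6, otherwise 0
shuffled = Random Shuffle( flags );   // same six 0s and two 1s, in random order
Show( idx, flags, shuffled, Sum( shuffled ) );
```

Here Sum( shuffled ) is 2, that is, two of the eight rows are flagged 1, while the positions of the 1s change from run to run.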

 

 

FWIW, another way to do this interactively is to select Cols > New Columns... from the main menu, then fill out the dialog as below:

[Screenshot: New Columns dialog]

Cheers,

Brady