learning_JSL
Level IV

trying to create and save a random subset of rows and save this to a script for later use

I am doing a very large number of validation tests on a multiple regression prediction model that I continue to tweak and would like to automate part of the validation test process using a script.  So far, I have not been able to "save to script" the part of the process where I generate and save a random subset of rows from my training dataset to use for validation testing. 

 

When I do this manually I: 1) generate a random subset of rows (20 percent) from my main table, 2) save the subset, and 3) delete those subset rows from the main table and save the "new" main table (note that I made sure the subset table is linked to the main table to allow this step).  I then go on to do the model analysis on the new main table and test how well the rows in the subset table fit (i.e. validation testing.) 

 

How do I get JMP to "record" those moves in a script?   So far, I haven't gotten it to recognize those moves.   Thanks in advance! 

10 REPLIES
txnelson
Super User

Re: trying to create and save a random subset of rows and save this to a script for later use

Look at the Source script in the subset data table; it contains the JSL that created the subset.
Jim
learning_JSL
Level IV

Re: trying to create and save a random subset of rows and save this to a script for later use

txnelson - Thank you for the tip!

SDF1
Super User

Re: trying to create and save a random subset of rows and save this to a script for later use

Hi @learning_JSL ,

 

  The following script should help you get started. I would not recommend linking the subset to the main table, because if you do, deleting those rows from the main table in the script will also delete them from the subset table.

Names Default To Here( 1 );

// Open the main table (a sample table stands in for your data)
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );

// Randomly select 20% of the rows
dt << Select Randomly( 0.2 );

// Copy the selected rows, all columns, into an unlinked subset table
Random_dt = dt << Subset( Selected Rows( 1 ), Selected Columns Only( 0 ), Link to original data table( 0 ) );

// Delete the selected rows from the main table
dt << Delete Rows;

Hope this helps!,

DS

learning_JSL
Level IV

Re: trying to create and save a random subset of rows and save this to a script for later use

Perfect!  And is there a command to name the new files (i.e. the new subset file and the new main table file) within the script?  Ideally it would name the two new files with a sequentially increasing number each time the script is run (e.g.  subset 1, main 1.....subset 2, main 2.....subset 3, main 3, .....etc) but I don't mind re-setting it manually myself within the script if needed as this is already a huge time savings.   

SDF1
Super User

Re: trying to create and save a random subset of rows and save this to a script for later use

Hi @learning_JSL ,

 

  Yes, when you call Save or Save As in JSL, you can set the file name. If you want to repeat the split, you can wrap the whole thing in a For loop and build the names from the loop counter, so it all runs automatically. Now that I think about what you're doing with the data, it looks similar to K-fold cross validation. If so, many of the fit platforms in JMP handle that very easily. Depending on the modeling platform, you might select it as a cross validation option, or you might have to make several k-fold validation columns, as is needed in XGBoost. Either way, you can create a validation column with training, validation, and test sets, and you can stratify it on the response column or on factor columns, whichever works best for your situation.
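
  As a rough sketch of the For loop idea, something like the following could work. The output folder ($TEMP), the number of runs (nRuns), and the file-name pattern are placeholders I made up for illustration, so adjust them to your setup. Note that rerunning it overwrites files that have the same names.

```jsl
Names Default To Here( 1 );

// Assumptions for illustration: output folder, run count, and file names
outDir = "$TEMP/";
nRuns = 3;

For( i = 1, i <= nRuns, i++,
	// Start each run from a fresh copy of the main table
	dt = Open( "$SAMPLE_DATA/Big Class.jmp" );

	// Randomly select 20% of the rows and copy them to an unlinked subset
	dt << Select Randomly( 0.2 );
	subDt = dt << Subset(
		Selected Rows( 1 ),
		Selected Columns Only( 0 ),
		Link to original data table( 0 )
	);

	// Remove the selected rows from the main table
	dt << Delete Rows;

	// Save both tables with the run number in the file name
	subDt << Save( outDir || "subset " || Char( i ) || ".jmp" );
	dt << Save( outDir || "main " || Char( i ) || ".jmp" );

	// Close both tables so they do not pile up between runs
	Close( subDt, No Save );
	Close( dt, No Save );
);
```

  Opening the source table inside the loop means every run draws its 20% from the full data set, which matches the manual workflow you described.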

 

  On a side note, if you are going to start scripting things to save time, I highly recommend using the Scripting Index (Help > Scripting Index) to help you. You can search commands, and there are examples of how it's used to perform those actions/commands.

 

Hope this helps!,

DS

learning_JSL
Level IV

Re: trying to create and save a random subset of rows and save this to a script for later use

Thank you Diedrich!  And thank you for suggesting the JSL Script Index for help.  It is working perfectly.

Re: trying to create and save a random subset of rows and save this to a script for later use

Before you go further, I wonder if there is an advantage to another approach. You might use the Excluded row state instead of a subset. Fit Least Squares will not include excluded rows in the training of the model. You can save a prediction formula in a new column and the predicted response of the excluded rows will be there.

 

Also, JMP Pro offers a Validation analysis role to make such evaluations very easy and convenient.
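
The excluded-row approach could be sketched roughly as follows. This is only an illustration on a sample table: the response (:weight) and the single effect (:height) are arbitrary choices, not your model, and the 20% fraction is taken from the original question.

```jsl
Names Default To Here( 1 );

dt = Open( "$SAMPLE_DATA/Big Class.jmp" );

// Randomly select 20% of the rows and mark them excluded (the holdout set)
dt << Select Randomly( 0.2 );
dt << Exclude( 1 );
dt << Clear Select;

// Fit on the remaining rows; response and effect are illustrative only
fit = dt << Fit Model(
	Y( :weight ),
	Effects( :height ),
	Personality( "Standard Least Squares" ),
	Run
);

// Save the prediction formula as a new column; it evaluates for every row,
// including the excluded ones
fit << Prediction Formula;
```

Because nothing is deleted, the main table stays intact, and the holdout rows can be compared against their predictions in the same table.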

learning_JSL
Level IV

Re: trying to create and save a random subset of rows and save this to a script for later use

Thanks Mark - Does the excluded row state allow for a random selection of rows (say 20%) to be excluded from the original dataset?  It was not obvious in the JSL script index.

Re: trying to create and save a random subset of rows and save this to a script for later use

[Attached image: random.PNG]