Adele
Level III

K-fold cross-validation with stepwise regression: R squares for training and validation

Hi, I have a problem with k-fold cross-validation in stepwise regression. I want to run k-fold cross-validation with stepwise regression 100 times on the same data set and save the R squares for both the training and the validation sets. The JMP results table includes two columns, “RSquare” and “RSquare K-Fold”. Although “RSquare K-Fold” changes from run to run, “RSquare” is the same every time. That makes me think “RSquare” is not the R square for the training set and “RSquare K-Fold” is not the R square for the validation set.

So, how can I get these two values (that is, the R squares for the training and validation sets)? How should I change my script to get them?

I found that the "Plot R square history" option gives a graph of the R squares for the training and validation sets. How can I get the values themselves? The option seems to give only a graph (see the figure).

[Figure: values of validation and training.png]

Here is my script:

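Names Default To Here( 1 );
dt = Current Data Table();

// Empty table that collects the History results from every iteration
dtb = New Table( "K-Fold Results",
    Add Rows( 0 ),
    New Column( "StringColBox", Character, "Nominal" )
);

For( i = 1, i <= 100, i++,
    // Stepwise fit of Std FP on the candidate effects, with the report window hidden
    obj = dt << Fit Model(
        Y( :Std FP ),
        Effects( :Std wl, :Std on, :Std hfn, :Std odc, :Std cvq, :Std son, :lNo, :wNo ),
        Personality( Stepwise ),
        Run,
        invisible
    );
    // Turn on 10-fold cross-validation, then run the stepping to completion
    obj << Name( "K-Fold Crossvalidation" )( 10 );
    obj << Finish;

    // Table Box( 3 ) is assumed to be the History table (RSquare, RSquare K-Fold, ...)
    kfold = obj << Report;
    tablebox = kfold[Table Box( 3 )];
    dt1 = tablebox << Make Into Data Table();
    dt1 << Set Name( "K-Fold Results " || Char( i ) );
    // Store i as constant values; a Formula( i ) column would not keep this iteration's i
    dt1 << New Column( "Iteration", Numeric, Values( J( N Rows( dt1 ), 1, i ) ) );
    dtb << Concatenate( dt1, "Append to first table" );
    Close( dt1, NoSave );
);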

Accepted Solutions

Re: K-fold cross-validation with stepwise regression: R squares for training and validation

The column "RSquare K-Fold" was added to the History report to give additional perspective when you compare the candidate models. The other columns are the same as they would be without this stopping rule; that is, they are the result of fitting all the rows. You can compare how the overall R square and the hold-out R square change from step to step to look for signs of over-fitting.
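The pattern in the question (RSquare identical on every run, RSquare K-Fold different each time) follows from this: a fit to all rows involves no randomness, while the hold-out value depends on how the rows are randomly split into folds. The sketch below illustrates that in Python rather than JSL. It is a hypothetical example, not JMP's implementation: the data are made up, a one-predictor least-squares line stands in for the stepwise model, and the hold-out value is assumed to be a pooled out-of-fold R square (JMP's exact RSquare K-Fold formula is not given in this thread).

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def r_squared(ys, preds):
    """R^2 = 1 - SSE/SST, with SST taken about the mean of ys."""
    my = sum(ys) / len(ys)
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))
    sst = sum((y - my) ** 2 for y in ys)
    return 1 - sse / sst


def all_rows_r2(xs, ys):
    """Fit and score on every row: no randomness, so the same value every run."""
    b0, b1 = fit_line(xs, ys)
    return r_squared(ys, [b0 + b1 * x for x in xs])


def kfold_r2(xs, ys, k, seed):
    """Pooled hold-out R^2: each row is predicted by a fit that excluded its fold."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    preds = [0.0] * len(xs)
    for f in range(k):
        held = set(order[f::k])  # every k-th shuffled row forms fold f
        train = [i for i in range(len(xs)) if i not in held]
        b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held:
            preds[i] = b0 + b1 * xs[i]
    return r_squared(ys, preds)


# Hypothetical data, not from the thread: y = 2 + 0.5*x + noise.
noise = random.Random(0)
xs = [float(i) for i in range(40)]
ys = [2 + 0.5 * x + noise.gauss(0, 3) for x in xs]

runs = [(all_rows_r2(xs, ys), kfold_r2(xs, ys, k=10, seed=s)) for s in range(1, 6)]
for s, (full, held) in enumerate(runs, start=1):
    print(f"run {s}: RSquare (all rows) = {full:.4f}, hold-out R^2 = {held:.4f}")
```

Only the hold-out column changes between runs, because only it depends on the random fold assignment.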

 

I do not know of a way to obtain the R square for the model fit with only the training sets in the Stepwise platform.
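Outside the Stepwise platform, the training and validation R squares the question asks for can be computed fold by fold. The Python sketch below is a hypothetical illustration, not a JMP feature: `per_fold_r2` and the data are made up, and it refits a fixed one-predictor model in each fold instead of repeating stepwise selection. Training R square is computed on the rows each fold's model was fit to; validation R square is computed on that fold's held-out rows only.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def r_squared(ys, preds):
    """R^2 = 1 - SSE/SST, with SST taken about the mean of ys."""
    my = sum(ys) / len(ys)
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))
    sst = sum((y - my) ** 2 for y in ys)
    return 1 - sse / sst


def per_fold_r2(xs, ys, k, seed):
    """Return (fold, training R^2, validation R^2) for each of the k folds."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    results = []
    for f in range(k):
        held = sorted(order[f::k])  # rows held out as fold f + 1
        held_set = set(held)
        train = [i for i in range(len(xs)) if i not in held_set]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        vx, vy = [xs[i] for i in held], [ys[i] for i in held]
        b0, b1 = fit_line(tx, ty)  # the model sees the training rows only
        train_r2 = r_squared(ty, [b0 + b1 * x for x in tx])
        # Scored on rows the fit never saw; this value can be negative.
        valid_r2 = r_squared(vy, [b0 + b1 * x for x in vx])
        results.append((f + 1, train_r2, valid_r2))
    return results


# Hypothetical data, not from the thread: y = 2 + 0.5*x + noise.
noise = random.Random(0)
xs = [float(i) for i in range(60)]
ys = [2 + 0.5 * x + noise.gauss(0, 3) for x in xs]

folds = per_fold_r2(xs, ys, k=5, seed=1)
for fold, tr, va in folds:
    print(f"fold {fold}: training R^2 = {tr:.4f}, validation R^2 = {va:.4f}")
```

Calling `per_fold_r2` with a different `seed` on each of 100 iterations would mirror the repeated runs in the question's script.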

 

The plot that you refer to shows the values from the RSquare column. You can confirm this by adding a reference line to the vertical axis at the maximum RSquare value in the History report.

 

[Screenshot: Screen Shot 2020-01-28 at 6.03.22 PM.png]

 

They are the same.


AM007
Level I

Instructions on manually performing Cross-Validation

Where can I find instructions for performing cross-validation without JMP Pro?

I am stumbling through what seems like an arduous process, and this community seems to have many more clever and experienced folks.
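One common way to start a manual k-fold cross-validation is to give every row a random fold label so that the folds are nearly equal in size; each fold is then held out in turn while the model is fit to the remaining rows. The Python sketch below covers only that labeling step, and `assign_folds` is a hypothetical helper, not a JMP function.

```python
import random
from collections import Counter


def assign_folds(n_rows, k, seed=None):
    """Return a fold label 1..k for each row; fold sizes differ by at most 1."""
    labels = [(i % k) + 1 for i in range(n_rows)]  # 1, 2, ..., k, 1, 2, ...
    random.Random(seed).shuffle(labels)  # randomize which row gets which label
    return labels


folds = assign_folds(23, 5, seed=42)
print(Counter(folds))
```

In JMP, labels like these could be placed in a new column and used to select the rows to exclude before each fit, since excluded rows are left out of the model fit.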