frankderuyck
Level VI

Validation column

I am working on a logistic regression in JMP Pro. Using the validation column option, the data set is split into fixed training and validation parts (I used no test set); I understand that holdback validation is used? Is there also a possibility to choose cross-validation?


Re: Validation column

Yes, the validation column is a way to define hold out sets for training, validation, and testing.

 

Using hold out sets is a form of cross-validation. If you mean another way of defining hold out sets, such as K-fold cross-validation in the case of small data sets, that is not available in the Nominal Logistic or Ordinal Logistic platforms. It is available in the Model Launch outline once you launch Generalized Regression.
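JMP builds the K-fold assignments internally, but the idea behind a K-fold validation column is easy to see in a language-neutral sketch. Here is a minimal pure-Python illustration (function names and the seed are mine, not anything from JMP): each row gets a fold label, and each fold takes one turn as the validation set while the rest train.

```python
# Sketch of 5-fold cross-validation assignments, analogous in spirit to a
# K-fold validation column in JMP (pure Python, illustrative only).
import random

def kfold_assignments(n_rows, k=5, seed=42):
    """Assign each row a fold label 0..k-1, then shuffle the labels."""
    folds = [i % k for i in range(n_rows)]
    random.Random(seed).shuffle(folds)
    return folds

def kfold_splits(folds, k=5):
    """Yield (train_indices, validation_indices), one pair per fold."""
    for fold in range(k):
        train = [i for i, f in enumerate(folds) if f != fold]
        valid = [i for i, f in enumerate(folds) if f == fold]
        yield train, valid

folds = kfold_assignments(100, k=5)
for train, valid in kfold_splits(folds):
    # every row is validated exactly once across the 5 folds
    assert len(train) == 80 and len(valid) == 20
```

Because every row serves in validation exactly once, K-fold uses all the data for both fitting and assessment, which is why it suits small data sets.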

frankderuyck
Level VI

Re: Validation column

Hi Marc, I think there is some confusion; let me check:

I understood that cross-validation is like K-means & jackknife, where all data are used in the validation process and contribute to building the model through internal cyclic validation. In this case a test set is necessary to check model performance on new data.

Holdback validation sets aside a fraction of the data set to validate model performance. Since the held-back data don't contribute to building the model, a test set is not required here, right?
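The holdback idea can be sketched the same way: a validation column simply labels each row, and the modeling platform fits only on the "Training" rows. This is a minimal pure-Python illustration (the function name and fraction are mine, chosen for the example):

```python
# Sketch of a fixed holdback validation column: a per-row label, with the
# validation fraction held out of model fitting (pure Python, illustrative only).
import random

def holdback_column(n_rows, validation_fraction=0.25, seed=1):
    """Return a shuffled list of 'Training'/'Validation' labels, one per row."""
    n_valid = round(n_rows * validation_fraction)
    labels = ["Validation"] * n_valid + ["Training"] * (n_rows - n_valid)
    random.Random(seed).shuffle(labels)
    return labels

col = holdback_column(200, validation_fraction=0.25)
```

The "Validation" rows never touch the fitting step, which is what makes them an honest check on the model.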

Re: Validation column

I know that there is confusion. K-Means is an unsupervised learning method to fit clusters. The jackknife is a technique to estimate a standard error independent of the model. Honest assessment is an approach to select and evaluate among candidate models in lieu of future observations. Cross-validation is generally used for honest assessment. It is generally accomplished either by holding out subsets of data (the large data set case) or by K-fold cross-validation (the small data set case).

 

See Hastie, Trevor, Robert Tibshirani, and Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition, Springer, Section 7.10: Cross-Validation.
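To make the jackknife distinction concrete: it recomputes a statistic leaving out one observation at a time and uses the spread of those leave-one-out values to estimate a standard error. A minimal pure-Python sketch (function names are mine; for the sample mean, this reproduces the classic s/√n):

```python
# Sketch of the jackknife standard-error estimate: recompute the statistic
# with each observation left out in turn (pure Python, illustrative only).
import math

def jackknife_se(data, statistic):
    """Jackknife standard error of `statistic` over `data`."""
    n = len(data)
    loo = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    mean_loo = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((v - mean_loo) ** 2 for v in loo))

mean = lambda xs: sum(xs) / len(xs)
se = jackknife_se([2.0, 4.0, 6.0, 8.0], mean)  # matches s/sqrt(n) for the mean
```

Note the contrast with cross-validation: the jackknife estimates uncertainty of a statistic, while cross-validation estimates predictive performance of a model.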

frankderuyck
Level VI

Re: Validation column

In my reply above I meant K-fold, not K-means.

So I went to Generalized Regression, made a validation column, and selected K-fold with 5 folds. Which estimation method should I use for my nominal logistic fit (three continuous factors): Lasso, Elastic Net..? Is there a rule of thumb for selecting the estimation method?

Re: Validation column

The options are largely a matter of personal preference. Lasso is for variable selection. Elastic Net combines the Lasso and Ridge penalties, so it is also used for variable selection. Ridge is for shrinking estimates to avoid over-fitting.

 

The Penalized Estimation Methods are documented in JMP Help.
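For readers who want the distinction between the three penalties at a glance, here is a small sketch of the penalty terms themselves, using the usual elastic-net parameterization (lambda for overall penalty strength, alpha for the Lasso/Ridge mix; the function names and the example coefficients are mine, not JMP's):

```python
# Sketch of the three penalty terms behind the estimation methods discussed
# above (pure Python, illustrative only; beta is a coefficient vector).
def ridge_penalty(beta, lam):
    # L2 penalty: shrinks all estimates toward zero, none exactly zero
    return lam * sum(b * b for b in beta)

def lasso_penalty(beta, lam):
    # L1 penalty: can shrink some coefficients exactly to zero,
    # which is why Lasso performs variable selection
    return lam * sum(abs(b) for b in beta)

def elastic_net_penalty(beta, lam, alpha=0.5):
    # Mixes Lasso (alpha = 1) and Ridge (alpha = 0) penalties
    return lam * (alpha * sum(abs(b) for b in beta)
                  + (1 - alpha) * sum(b * b for b in beta) / 2)

beta = [1.0, -2.0, 0.5]  # hypothetical fitted coefficients
```

The fitting algorithm minimizes the model's negative log-likelihood plus one of these terms; the L1 part is what drives coefficients exactly to zero, which is the "variable selection" behavior mentioned above.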