frankderuyck
Level IV

Validation column

I am working on a logistic regression in JMP Pro. Using the validation column option, the data set is split into a fixed training part and a fixed validation part (I used no test set). Am I right that this is holdback validation? Is there also a possibility to choose cross-validation?


5 REPLIES

Re: Validation column

Yes, the validation column is a way to define hold out sets for training, validation, and testing.

 

Using hold out sets is cross-validation. If you mean another way of defining hold out sets, such as K-fold cross-validation in the case of small data sets, that is not available in the Nominal Logistic or Ordinal Logistic platforms. It is available in the Model Launch outline once you launch Generalized Regression.
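For intuition outside of JMP, here is a minimal Python sketch with scikit-learn of what a fixed validation column amounts to: the rows assigned to the validation role are excluded from fitting and used only to score the model. The data, the 75/25 split, and the coefficients are assumptions made up for the example, not anything from your data table.

```python
# Sketch of a fixed training/validation split ("validation column" idea),
# illustrated with scikit-learn; all data below are synthetic assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))                                  # three continuous factors
y = (X @ np.array([1.0, -0.5, 0.8]) + rng.normal(size=200) > 0).astype(int)

# The fixed assignment of rows to Training and Validation.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)

model = LogisticRegression().fit(X_train, y_train)             # fit on Training rows only
print("validation log-loss:", log_loss(y_valid, model.predict_proba(X_valid)))
```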

Learn it once, use it forever!
frankderuyck
Level IV

Re: Validation column

Hi Marc, I think there is some confusion; let me check:

I understood that cross-validation is like K-means & jackknife, where all data are used in the validation process and also to build the model, based on internal cyclic validation. In this case a test set is necessary to check model performance on new data.

Holdback validation holds apart a fraction of the data set that is used to validate the model's performance; since the held-back data don't contribute to building the model, a test set is not required here, right?


Re: Validation column

I know that there is confusion. K-Means is an unsupervised learning method for fitting clusters. The jackknife is a technique for estimating a standard error independent of the model. Honest assessment is an approach to selecting and evaluating among candidate models in lieu of future observations. Cross-validation is generally used for honest assessment. Cross-validation is generally accomplished either by holding out subsets of data (large data set case) or by K-fold cross-validation (small data set case).

 

See Hastie, Trevor, Robert Tibshirani, and Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition, Springer. See Section 7.10, Cross-Validation.
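As a rough illustration of the K-fold idea outside of JMP, here is a Python sketch with scikit-learn: every row is used for fitting in some folds and held out for validation in exactly one fold, which is why it suits small data sets. The synthetic data and the choice of 5 folds are assumptions for the example.

```python
# K-fold cross-validation sketch with scikit-learn (synthetic data, 5 folds chosen
# for illustration). Each row is held out exactly once across the folds.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = (X[:, 0] - X[:, 1] + rng.normal(size=100) > 0).astype(int)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)
scores = cross_val_score(LogisticRegression(), X, y, cv=cv, scoring="accuracy")
print("per-fold accuracy:", np.round(scores, 3))
print("mean accuracy:", round(scores.mean(), 3))
```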

Learn it once, use it forever!
frankderuyck
Level IV

Re: Validation column

In my reply above I meant K-fold, not K-means.

So I went to Generalized Regression, made a validation column, and selected K-fold with 5 folds. Which estimation method should I use for my nominal logistic fit (three continuous factors): Lasso, Elastic Net...? Is there a rule of thumb for selecting the estimation method?


Re: Validation column

The options are a matter of personal preference. Lasso is for variable selection. Elastic Net combines the Lasso and Ridge penalties, so it is also used for variable selection. Ridge is for shrinking estimates to avoid over-fitting.
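If it helps to see the difference in behavior, here is a small Python sketch using scikit-learn's penalized logistic regression as a stand-in for JMP's Generalized Regression; the data and the penalty strength C are assumptions for illustration. Lasso and Elastic Net can drive unhelpful coefficients exactly to zero (variable selection), while Ridge only shrinks them.

```python
# Sketch contrasting the three penalties on a logistic fit (scikit-learn used as a
# stand-in for Generalized Regression; data and C=0.5 are assumptions).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 3))                                  # three continuous factors
y = (1.5 * X[:, 0] + rng.normal(size=150) > 0).astype(int)     # only factor 1 matters

fits = {
    "lasso (L1)":  LogisticRegression(penalty="l1", solver="saga", C=0.5, max_iter=5000),
    "ridge (L2)":  LogisticRegression(penalty="l2", solver="saga", C=0.5, max_iter=5000),
    "elastic net": LogisticRegression(penalty="elasticnet", solver="saga",
                                      l1_ratio=0.5, C=0.5, max_iter=5000),
}
for name, m in fits.items():
    m.fit(X, y)
    # L1-type penalties can zero out unhelpful coefficients; Ridge only shrinks them.
    print(f"{name:12s} coefficients:", np.round(m.coef_[0], 3))
```

A smaller C here means a stronger penalty; in Generalized Regression, as I understand the platform, the corresponding tuning parameter is chosen for you using the validation scheme you set up (holdback or K-fold).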

 

The Penalized Estimation Methods are documented in JMP Help.

Learn it once, use it forever!

