
Validation column



May 25, 2020 12:21 AM
(978 views)

I am working on a logistic regression in JMP Pro. Using the validation column option, the data set is split into fixed training and validation parts (I used no test set); I understand that holdback validation is used? Is there also a possibility to choose cross-validation?

1 ACCEPTED SOLUTION


The options are a matter of personal preference. Lasso is for variable selection. Elastic Net blends the Lasso and Ridge penalties, so it can also be used for variable selection. Ridge is for shrinking estimates to avoid over-fitting.

The Penalized Estimation Methods are documented in JMP Help.
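JMP exposes these penalized estimation methods through its GUI, so no JSL is shown here; purely as an illustration of the three penalties being discussed, a scikit-learn sketch (all function and parameter names below belong to scikit-learn, not JMP):

```python
# Illustration only: scikit-learn analogues of Lasso, Ridge, and Elastic Net
# penalized logistic regression, on a synthetic three-factor data set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=3, n_informative=2,
                           n_redundant=1, random_state=0)

# Lasso (L1): can drive some coefficients exactly to zero -> variable selection.
lasso = LogisticRegression(penalty="l1", solver="saga", C=0.1,
                           max_iter=5000).fit(X, y)

# Ridge (L2): shrinks all coefficients toward zero without eliminating them.
ridge = LogisticRegression(penalty="l2", solver="saga", C=0.1,
                           max_iter=5000).fit(X, y)

# Elastic Net: a mix of L1 and L2, so it can also select variables.
enet = LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=0.5,
                          C=0.1, max_iter=5000).fit(X, y)

print(np.round(lasso.coef_, 3))
print(np.round(ridge.coef_, 3))
print(np.round(enet.coef_, 3))
```

Comparing the three coefficient vectors side by side is one way to see the qualitative difference the accepted solution describes: the L1-penalized fits tend toward sparser coefficients than the Ridge fit.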

Learn it once, use it forever!

5 REPLIES


Re: Validation column

Yes, the validation column is a way to define hold out sets for training, validation, and testing.

Using hold out sets is cross-validation. If you mean another way of defining hold out sets, such as K-fold cross-validation in the case of small data sets, it is not available in Nominal Logistic or Ordinal Logistic platforms. It is available in the Model Launch outline once you launch Generalized Regression.
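A validation column in JMP is defined through the GUI; as a language-neutral illustration of what such a column encodes (a fixed hold-out assignment of each row), here is a scikit-learn sketch. The function names are scikit-learn's, not JMP's:

```python
# Illustration only: a fixed training/validation hold-out split, which is
# what a two-level JMP validation column represents for each row.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=3, n_informative=3,
                           n_redundant=0, random_state=1)

# Holdback validation: 25% of rows are held out; the model is fit on the
# remaining 75% and scored on the held-out part.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25,
                                            random_state=1)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"validation accuracy: {model.score(X_val, y_val):.3f}")
```

Adding a third "Test" level to the split would correspond to a three-level validation column.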

Learn it once, use it forever!


Re: Validation column

Hi Marc, I think there is some confusion; let me check:

I understood that cross-validation is like K-means & jackknife, where all data are used in the validation process and are used to build the model based on internal cyclic validation. In this case a test set is necessary to check model performance on new data.

Holdback validation is holding apart a fraction of the data set that is used to validate the model performance, so, since the data in the holdback don't contribute to the model building, here a test set is not required, right?


Re: Validation column

I know that there is confusion. K-Means is an unsupervised learning method to fit clusters. The jackknife is a technique to estimate the standard error independent of the model. Honest assessment is an approach to select and evaluate among candidate models in lieu of future observations. Cross-validation is generally used for honest assessment. Cross-validation is generally accomplished either by holding out subsets of data (large data set case) or by K-fold cross-validation (small data set case).
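The K-fold case described above can be sketched outside of JMP as well; as an illustration only, using scikit-learn names rather than JMP's Generalized Regression dialog:

```python
# Illustration only: 5-fold cross-validation, the small-data-set alternative
# to a single hold-out split. Every row serves in validation exactly once.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=100, n_features=3, n_informative=3,
                           n_redundant=0, random_state=2)

scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=KFold(n_splits=5, shuffle=True, random_state=2))
print("per-fold accuracy:", scores)
print("mean accuracy:", scores.mean())
```

The mean of the five fold scores is the honest-assessment estimate of out-of-sample performance.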

See Hastie, Trevor, Robert Tibshirani, and Jerome Friedman, *The Elements of Statistical Learning: Data Mining, Inference, and Prediction*, 2nd ed., Springer, Section 7.10: Cross-Validation.

Learn it once, use it forever!


Re: Validation column

In my reply above I meant K-fold, not K-means.

So I went to Generalized Regression, made a validation column, and selected K-fold with 5 folds. Which estimation method should I use for my nominal logistic fit (three continuous factors): Lasso, Elastic Net, ...? Is there a rule of thumb for selecting the estimation method?

