Re: K-fold Cross-validation for Neural Networks

dcfroehlich · Jun 20, 2020 02:59 AM

My question concerns the use of K-fold cross-validation for artificial neural networks (NN). Specifically, where do the final NN model parameters come from? Were they obtained by fitting the entire data set? Or were they from a fit to one of the K training sets, each consisting of K-1 folds? If so, how was the best training set chosen? The SAS documentation is not clear on this issue. Does anyone have an answer?

Phil_Kay · Jul 2, 2020 09:03 AM

My understanding is that the parameters from the best of the K models are used.

This is from the help documentation on the Generalized Regression platform but I expect the same to apply to the Neural platform: https://www.jmp.com/support/help/en/15.1/#page/jmp/validation-method-options.shtml

"For each value of the tuning parameter, the following steps are conducted:

– The observations are partitioned into k subsets, or folds.

– In turn, each fold is used as a validation set. A model is fit to the observations not in the fold. The log-likelihood based on that model is calculated for the observations in the fold, providing a validation log-likelihood.

– The mean of the validation log-likelihoods for the k folds is calculated. This value serves as a validation log-likelihood for the value of the tuning parameter.

The value of the tuning parameter that has the maximum validation log-likelihood is used to construct the final solution. To obtain the final model, all k models derived for the optimal value of the tuning parameter are fit to the entire data set. Of these, the model that has the highest validation log-likelihood is selected as the final model. The training set used for that final model is designated as the Training set and the holdout fold for that model is the Validation set. These are the Training and Validation sets used in plots and in the reported results for the final solution."
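To make the quoted procedure concrete, here is a minimal Python sketch of the fold bookkeeping. It is not JMP's implementation: ridge regression stands in for the model, the ridge penalty `lam` plays the role of the tuning parameter, and a Gaussian log-likelihood is the validation criterion. The names `ridge_fit`, `gaussian_loglik`, and `kfold_select` are hypothetical. The sketch follows the reading that the final model keeps the parameters fit to its K-1 training folds, with its holdout fold reported as the Validation set; it does not refit on all rows, because the quoted text leaves that step ambiguous.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression (hypothetical stand-in for any tunable model)."""
    p = X.shape[1]
    beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    resid = y - X @ beta
    sigma2 = max(float(np.mean(resid ** 2)), 1e-12)  # error variance from training rows
    return beta, sigma2

def gaussian_loglik(X, y, beta, sigma2):
    """Log-likelihood of (X, y) under a normal model with mean X @ beta, variance sigma2."""
    resid = y - X @ beta
    n = len(y)
    return float(-0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * np.sum(resid ** 2) / sigma2)

def kfold_select(X, y, lambdas, k=5, seed=0):
    """Pick the tuning value by mean validation log-likelihood, then the best fold model."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)  # partition rows into k folds
    results = {}
    for lam in lambdas:
        fold_models = []
        for i, val_idx in enumerate(folds):
            # Fit to the observations not in fold i, score on fold i.
            train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
            beta, sigma2 = ridge_fit(X[train_idx], y[train_idx], lam)
            ll = gaussian_loglik(X[val_idx], y[val_idx], beta, sigma2)
            fold_models.append((ll, beta, train_idx, val_idx))
        # Mean of the k validation log-likelihoods for this tuning value.
        results[lam] = (np.mean([m[0] for m in fold_models]), fold_models)
    best_lam = max(results, key=lambda t: results[t][0])
    # Among the k models for best_lam, keep the one with the highest validation log-likelihood.
    _, beta, train_idx, val_idx = max(results[best_lam][1], key=lambda m: m[0])
    return best_lam, beta, train_idx, val_idx  # train_idx = "Training", val_idx = "Validation"

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=100)
    best_lam, beta, train_idx, val_idx = kfold_select(X, y, [0.01, 0.1, 1.0, 10.0], k=5)
    print(best_lam, beta.round(2), len(train_idx), len(val_idx))  # 80 training rows, 20 validation rows
```

Because the folds here are equal in size (100 rows, 5 folds), comparing summed validation log-likelihoods across folds is fair; with unequal folds, a per-row average would be a safer comparison.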

dcfroehlich · Jul 2, 2020 09:40 AM

So it seems that the final model parameters are those fit to the "final model Training set," that is, the training set whose holdout fold (the "Validation set") produces the best log-likelihood. That makes sense, but the documentation's description confuses me. Thanks for your response.