CROSS VALIDATION - VALIDATION COLUMN METHOD

sreekumarp
Level I

When using the validation column method for cross validation, we split the data set into training, validation, and test sets. The split ratio is specified by the user. Is there any guideline or reference for deciding on the split ratio (such as 60:20:20, 70:15:15, 50:25:25, or 80:10:10)? Should it also be chosen based on the total number of observations, N?
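To make the question concrete, here is a minimal sketch (plain Python, not JMP or JSL, and not JMP's internal method) of what a validation column amounts to: each row is randomly labeled Training, Validation, or Test according to the chosen ratio. The function name and the 60:20:20 default are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def make_validation_column(n_rows, ratios=(0.6, 0.2, 0.2), seed=1234):
    """Randomly assign each row to Training/Validation/Test with the given proportions."""
    rng = np.random.default_rng(seed)
    labels = np.array(["Training", "Validation", "Test"])
    # Draw one label per row with the requested proportions
    return pd.Series(rng.choice(labels, size=n_rows, p=ratios), name="Validation")

# Example: a 60:20:20 split for 1,000 observations
col = make_validation_column(1000)
print(col.value_counts(normalize=True))  # proportions close to 0.6 / 0.2 / 0.2
```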

10 REPLIES


Re: CROSS VALIDATION - VALIDATION COLUMN METHOD

To your original question, no, there are not specific rules about how much data to leave out. In the JMP Education analytics courses, we advise you to hold out as much data as you are comfortable with, with at least 20% held out. If you feel the training set is too small to hold back that many rows, consider k-fold cross validation. How many rows are you willing to sacrifice to validation? Use k = n / that many rows. If k < 5 using that formula, consider leave-one-out cross validation.