Hi, I have a relatively small data set with 95 observations (records) and about 70 variables per record. The goal is to identify which variables influence the response and to find the optimal settings of those variables. Because of the small number of observations, I decided to divide the data into only two sets, a training set and a test set, and to omit the validation set. My strategy is to fit a decision tree with the Partition platform and then fit a least squares regression using the variables that appear most important. My question is whether there is any general advice on the relative sizes of the training and test sets. I have tried a split with 75% of the observations in the training set and 25% in the test set, but I am not sure whether this is the best proportion.
I am also wondering whether some kind of bootstrap procedure could be applied to the division into training and test sets. The idea is to perform a large number of random divisions and somehow average the results. I am not sure whether this makes sense. I am running JMP 15.
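
To make the second idea concrete, here is a minimal sketch of what I mean, written in Python rather than JSL purely for illustration. It is not my actual analysis: the data are synthetic, there is a single predictor instead of about 70, a plain least squares line stands in for the tree-plus-regression strategy, and the names `fit_line` and `repeated_holdout` as well as the choice of 200 splits are my own assumptions. Strictly speaking, this is repeated random splitting without replacement (sometimes called Monte Carlo cross-validation) and not a bootstrap, which would resample with replacement.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def repeated_holdout(xs, ys, train_frac=0.75, n_splits=200, seed=1):
    """Average the test RMSE over many random training/test splits."""
    rng = random.Random(seed)
    n = len(xs)
    n_train = round(train_frac * n)
    rmses = []
    for _ in range(n_splits):
        # Shuffle the row indices and cut them into training and test parts.
        idx = list(range(n))
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        # Fit on the training rows only, then score on the held-out rows.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_err = [(ys[i] - (a + b * xs[i])) ** 2 for i in test]
        rmses.append(statistics.fmean(sq_err) ** 0.5)
    return statistics.fmean(rmses), statistics.stdev(rmses), n_train, n - n_train


# Synthetic stand-in data (95 rows, one predictor); NOT my real data.
data_rng = random.Random(0)
xs = [data_rng.uniform(0, 10) for _ in range(95)]
ys = [2.0 + 0.5 * x + data_rng.gauss(0, 1) for x in xs]

mean_rmse, sd_rmse, n_train, n_test = repeated_holdout(xs, ys)
print(f"train={n_train}, test={n_test}, mean RMSE={mean_rmse:.3f}, SD={sd_rmse:.3f}")
```

With 95 rows, a 75/25 split gives 71 training and 24 test observations. The spread of the test RMSE across splits is what makes me doubt a single split, and the averaged value is the kind of summary I had in mind.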