Hi @AnnaPaula,
I'm not sure I quite follow how you used the calculated seeds to generate the stratification of the validation column. But if you use the validation column platform in JMP Pro (which you have if you can access GenReg), then you can just stratify on the response (the Y variable) you're fitting. If all your outliers landed in the validation set, you'll want to re-think how the two sets were generated.
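If you ever want to reproduce that kind of stratified split outside JMP, here's a minimal sketch in Python/scikit-learn. The file name, the table `dt`, and the response column `Y` are placeholders; since a continuous response can't be stratified directly, the trick is to stratify on quantile bins of Y so both sets cover its whole range (tails and outliers included):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

dt = pd.read_csv("mydata.csv")  # placeholder table

# Bin the continuous response into quintiles and stratify on the bins,
# so train and validation both sample from the full range of Y.
y_bins = pd.qcut(dt["Y"], q=5, labels=False)

train, valid = train_test_split(dt, test_size=0.25,
                                stratify=y_bins, random_state=1)
```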
Another point you might consider is splitting off a test set entirely, into another data table, that is never used to fit the models. If you run multiple models on the training/validation set, you can then use the test set to see which model is really the best at prediction; the Model Comparison platform is made for this. Presumably your goal is the best predictive capability, and you need to test that somehow, and it looks like your data set is large enough to do it. For example, make a validation column with 60% training, 20% validation, and 20% test, subset the test rows into an entirely separate table, and then use the remaining train/validate rows to build your models.
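As a rough sketch of that 60/20/20 workflow (Python again, same placeholder names; two chained splits give the three sets):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

dt = pd.read_csv("mydata.csv")  # placeholder table

# First peel off 20% of the rows as the true holdout test set...
train_valid, test = train_test_split(dt, test_size=0.20, random_state=1)

# ...then split the remaining 80% as 75/25, i.e. 60/20 of the original.
train, valid = train_test_split(train_valid, test_size=0.25, random_state=1)

# Park the test set in its own table so no model ever sees it while fitting.
test.to_csv("test_holdout.csv", index=False)
```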
For your other questions about R^2 being negative or not quite matching up, I'd defer to what Mark said.
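For what it's worth, the mechanics behind the negative case: on holdout rows, R^2 = 1 - SSE/SST, and SSE can exceed SST whenever the model predicts those rows worse than simply using their mean, which pushes R^2 below zero.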
If your data is log-normally distributed, you'll want to select the lognormal distribution in the model specification, since that changes the likelihood the platform maximizes when fitting.
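To see why the distribution choice matters, here's roughly the same idea outside JMP: for uncensored data, fitting a lognormal response is equivalent to fitting an ordinary linear model to log(Y). A sketch with placeholder names:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

dt = pd.read_csv("mydata.csv")   # placeholder table
dt["logY"] = np.log(dt["Y"])     # requires Y > 0

# Ordinary least squares on the log scale; X1 and X2 are placeholder terms.
fit = smf.ols("logY ~ X1 + X2", data=dt).fit()

# Predictions come back on the log scale; exponentiating returns the
# median of the lognormal response, not its mean.
pred_median = np.exp(fit.predict(dt))
```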
When you have created a model in Standard Least Squares, GenReg, or PLS, you can bootstrap the estimates by right-clicking the Estimate column and selecting Bootstrap. You'll want to run several thousand samples and then look at the distributions to see whether the original estimate for each term's coefficient sits near the center of the bootstrap distribution. JMP refits the model many times on resampled versions of your data (rows drawn with replacement) and collects the resulting estimates; from that distribution you can judge 1) whether the estimate from the first fit was stable, and 2) whether the coefficient for the effect is really contributing much. For example, in the Standard Least Squares platform the effects are given an FDR LogWorth value to account for the false discovery rate. If the value is above 2 (the blue line, i.e., an FDR-adjusted p-value below 0.01), there's significant evidence that the effect really does contribute to the model. On the other hand, it could be smaller, or sit right around 2. If you bootstrap those FDR LogWorth values, you can look at their mean and range to decide whether the effect is meaningful. I've had times where an effect looked borderline, and after bootstrapping the FDR LogWorth, most of the time it was not crossing above 2 and therefore was not really an important factor in the model.
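If it helps to see what that bootstrap is doing mechanically, here's a bare-bones version in Python. Treat it as a sketch of the idea, not JMP's exact algorithm, and the column names are placeholders:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

dt = pd.read_csv("mydata.csv")      # placeholder table
rng = np.random.default_rng(1)

boot_coefs = []
for _ in range(3000):               # "several thousand" resamples
    # Draw rows with replacement and refit the same model.
    idx = rng.integers(0, len(dt), size=len(dt))
    fit = smf.ols("Y ~ X1 + X2", data=dt.iloc[idx]).fit()
    boot_coefs.append(fit.params["X1"])

# Compare the original estimate to the center and spread of the
# bootstrap distribution.
print(np.mean(boot_coefs), np.percentile(boot_coefs, [2.5, 97.5]))
```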
As a last note, I highly recommend trying several different modeling platforms (Boosted Tree, Bootstrap Forest, Neural, etc.) to see if another one might work best for your data/situation.
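Outside JMP, the analogous comparison is just scoring each fitted model family on the untouched test set; a sketch using the `train`/`test` tables from the split above (column names are placeholders):

```python
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

X_cols = ["X1", "X2"]  # placeholder predictors
models = {
    "boosted tree":     GradientBoostingRegressor(random_state=1),
    "bootstrap forest": RandomForestRegressor(random_state=1),
    "neural net":       MLPRegressor(max_iter=2000, random_state=1),
}

# Fit on the training rows, then score each model on the holdout test rows.
for name, model in models.items():
    model.fit(train[X_cols], train["Y"])
    print(name, r2_score(test["Y"], model.predict(test[X_cols])))
```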
Hope this helps!
DS