Discussions

Solve problems and share tips and tricks with other JMP users.
dlehman1
Level V

Conceptual question about validation

I'm hoping that people with a better statistical background than I have can shed some light on this question. JMP has robust validation capabilities for all predictive modeling, including cross-validation (though not in the Fit Model platform, except for the ability to use a validation set). Data scientists routinely partition their data into training and validation (and sometimes test) sets. However, most published work using traditional statistical methods (multiple regression, logistic regression) does not use any validation. I've been wondering why that is the case. Among the possibilities I can think of: (1) statistical significance is viewed as making validation unnecessary; (2) historical practice that is slow to change; (3) the emphasis is on inference and not prediction (along the lines of Breiman's two cultures). In any case, I don't see any justification for not using validation, so I'd like to hear what people think.
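
For concreteness, the partitioning described above looks roughly like the following minimal sketch outside of JMP, in Python with scikit-learn; the synthetic data and the 60/20/20 split ratios are illustrative assumptions, not a prescription:

```python
# A minimal holdout-validation sketch (illustrative data and split sizes).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, n_features=10, random_state=0)

# 60/20/20 train / validation / test, mirroring the common practice above.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Fit on the training partition only; judge on data the model never saw.
model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
print("train accuracy:     ", accuracy_score(y_train, model.predict(X_train)))
print("validation accuracy:", accuracy_score(y_val, model.predict(X_val)))
# The test set stays untouched until a final, one-time assessment.
print("test accuracy:      ", accuracy_score(y_test, model.predict(X_test)))
```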

11 REPLIES

Re: Conceptual question about validation

I suspect much of the lack of validation comes from your second option, especially in conjunction with other practices/traditions that are slow to change. If we accept the notion that academia tends to publish only significant findings, and if we accept the adage of "publish or perish," then we are naturally disincentivizing, at least to some small extent, extremely rigorous understanding before publication. This might explain some part of the recent discovery that many scientific studies can't be replicated. In my opinion, if the scientific community adopted validation more readily and were more willing to publish "mundane" findings, I would expect to see results that are more easily replicated, but perhaps less exciting.

I could easily be wrong or guilty of over-generalization, though. Like you, I'm interested in hearing others' thoughts.

statman
Super User

Re: Conceptual question about validation

First, please define what you mean by validation. One take might be to collect a data set from some inference space and partition it randomly(?) so that one part is used to develop a model and the rest is used to see how well that model predicts the held-out data. While this may be accepted practice, it is not my idea of validation (I may be alone in this thought).

One belief I have is that to truly “validate” a model, I must make physical “samples” and evaluate how well the model predicted those actual samples. To me, validation is how well the model predicts the future (changing conditions). If you have designed your study well enough, over a large enough inference space (which can be very challenging and resource intensive), then your model should perform better in future prediction.
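
One way to see the distinction being drawn here: a random holdout scores the model on data from the same mix of conditions it was trained on, while scoring on later samples probes changing conditions. A minimal sketch, assuming a hypothetical process whose coefficient drifts over time (Python; everything here is illustrative, not a procedure from this thread):

```python
# Random holdout vs. "future" holdout under a drifting process (illustrative).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
t = np.arange(2_000)
x = rng.normal(size=(2_000, 1))
# Hypothetical drifting process: the true coefficient changes over time.
y = (1.0 + 0.001 * t) * x[:, 0] + rng.normal(scale=0.1, size=2_000)

# Random partition: train and holdout come from the same mix of conditions.
Xa, Xb, ya, yb = train_test_split(x, y, test_size=0.25, random_state=0)
r2_random = r2_score(yb, LinearRegression().fit(Xa, ya).predict(Xb))

# Temporal partition: train on the first 75% of runs, predict the last 25%.
cut = 1_500
r2_future = r2_score(y[cut:], LinearRegression().fit(x[:cut], y[:cut]).predict(x[cut:]))

print(f"random holdout R^2: {r2_random:.3f}")  # flattering
print(f"future holdout R^2: {r2_future:.3f}")  # closer to deployed reality
```

When the process is stable, the two estimates agree; the gap between them is itself evidence about how far the model's inference space extends.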

"All models are wrong, some are useful" G.E.P. Box
dlehman1
Level V

Re: Conceptual question about validation

I mean the first definition (the one you don't like), and I think that is what the literature refers to as validation. Of course, almost all models are eventually used for out-of-sample prediction, and the partitioning will not address future changing conditions. Nothing will (at least not perfectly). Ultimately, your second idea is necessary: even if you have "validated" a model in the first sense, you should monitor its performance to see if it works well enough to continue using. But most published papers (outside of data science journals) don't even do the first kind of validation, and I am asking why.

statman
Super User

Re: Conceptual question about validation

I have this to offer:

“The literature as it has grown up seems to be unbalanced in its comparative neglect of the Scientific aspects of the problem, and of its Logical aspects. This perhaps might have been expected, since many of the authors, albeit talented mathematicians, have evidently never submitted their minds to the specifically educational discipline of any one of the Natural Sciences, have never taken personal responsibility for experimentation at ground level, and have no direct experience of the kind of material involved…”

Sir Ronald Fisher, Colloq. Int. Cent. Nat. Recherche Scientifique, Paris, No. 110: 13-19 (1962)

"All models are wrong, some are useful" G.E.P. Box
dlehman1
Level V

Re: Conceptual question about validation

This is a common viewpoint, and I certainly agree with it in spirit. But I think it is inadequate when dealing with the social sciences. Many of the issues (the effects of minimum wages, competition policy, the psychological effects of social media use, etc.) are not amenable to scientific experimentation in the same way as in the physical sciences. That doesn't excuse ignoring good scientific practice, but it does leave a gap (a large one, in my opinion) regarding how to proceed in such areas of inquiry. And I think this does relate to the issue of validation: in my mind it makes the first sort of validation (random partitioning) more essential in these areas (though not sufficient, as you have indicated).

SDF1
Super User

Re: Conceptual question about validation

Hi @dlehman1,

Good point about the social sciences. It's much harder to replicate a DOE or to generate a "new sample" without having the previous runs influence a participant's responses at a later time. One person's life experience is never the same as another's.

Perhaps turn this on its head and pose it as a questionnaire to the social sciences. Maybe ask the social science community questions like: when researching your topic, do you use the concept of validation during analysis and model building? Why do you use or not use validation? Are you aware of the practice of using validation? Open-ended prompts give respondents an opportunity to reveal additional information. I suspect you would get a multitude of different responses, but a few might stand out as more common, at least at the conceptual/general level rather than in the specifics.

DS

statman
Super User

Re: Conceptual question about validation

Hmmm, is it inadequate to involve subject-matter experts who work in the social sciences in the development of the survey, sampling plan, or DOE, or in the interpretation of data derived from such tools?

Why are they not "amenable to scientific experimentation in the same way as the physical sciences"? Agreed, the measurement systems are quite challenging, but how does validation (holding a subset of the data out) improve this?

"All models are wrong, some are useful" G.E.P. Box
dlehman1
Level V

Re: Conceptual question about validation

I find that a very strange comment. Suppose we want to estimate the effect of tariffs on GDP. Experiments are difficult and expensive to run, and they take time, during which many other things are changing (of course, we are running such an experiment now!). This is quite different from experiments in the physical sciences. Yet there are experiments in the social sciences, and sometimes there are natural experiments that can be used (such as existing tariff regimes that exhibit much variation). So I don't rule out experiments in the social sciences, but I have a hard time believing you think they are just as feasible as in the physical sciences. And I'd view medicine as somewhere in between: there are many RCTs, but due to expense and ethical concerns, they usually have far smaller sample sizes than we would like (thereby precluding many of the subgroup analyses we would want).

You ask how validation can improve things. I don't think it is sufficient to replace the ideal experiments you would want, but I think it is necessary in their absence. If the model does not work for the data we have, why should we believe it will work in the future?
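
For what that first kind of validation can look like in its stronger, k-fold form, here is a minimal sketch (Python with scikit-learn; the synthetic data are an assumption for illustration). Every observation is predicted by a model that never saw it, which at least tests whether the model works for the data we have:

```python
# Five-fold cross-validation as a check on the data we already hold (illustrative).
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)

# Each fold is held out in turn; the model is refit on the remaining folds.
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print("per-fold R^2:", scores.round(3))
print("mean R^2:    ", scores.mean().round(3))
```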

statman
Super User

Re: Conceptual question about validation

No doubt I am strange and likely think differently than you. If you understand the differences between enumerative and analytical situations, you should know that I am completely biased toward the analytical approach (I'm a devout determinist). I use both directed sampling (based on hypotheses) and experimentation (with emphasis on how to increase the inference space while simultaneously increasing the design precision). I am less "interested" in explanatory studies and more interested in predictive modeling (though both may be useful).

No experiments are "ideal" (we wouldn't know if they were, anyway). This is why I always propose (and recommend) that the investigator develop multiple different experiment/sampling plans. Each plan should be evaluated for the potential knowledge gained (e.g., what can be assigned (model), what is confounded, what is restricted (inference)), and that potential knowledge should be compared to the resources required.
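
To make "what is confounded" concrete for two-level factorial plans: the alias structure can be read directly off the design matrix. A hedged sketch, assuming a toy comparison of a full 2^3 factorial with a 2^(3-1) half fraction generated by C = A*B (Python/NumPy; the designs and helper names are illustrative, not a recipe from this thread):

```python
# Illustrative only: compare two candidate plans by what they confound.
import itertools
import numpy as np

def full_factorial(k):
    """All 2^k runs in coded -1/+1 units."""
    return np.array(list(itertools.product([-1, 1], repeat=k)))

def alias_pairs(design, labels):
    """Pairs of effect columns (mains + two-factor interactions) that are
    perfectly correlated, i.e., inseparable under this plan."""
    k = design.shape[1]
    cols = [design[:, i] for i in range(k)]
    names = list(labels)
    for i, j in itertools.combinations(range(k), 2):
        cols.append(design[:, i] * design[:, j])
        names.append(labels[i] + labels[j])
    X = np.column_stack(cols)
    return [
        (names[a], names[b])
        for a, b in itertools.combinations(range(X.shape[1]), 2)
        if abs(np.corrcoef(X[:, a], X[:, b])[0, 1]) > 0.999
    ]

full = full_factorial(3)                            # 8 runs
half = full[full[:, 0] * full[:, 1] == full[:, 2]]  # 4 runs, generator C = AB

print("full 2^3 aliases:    ", alias_pairs(full, "ABC"))  # [] -- nothing confounded
print("half 2^(3-1) aliases:", alias_pairs(half, "ABC"))  # A=BC, B=AC, C=AB
```

The half fraction buys fewer runs at the cost of confounding each main effect with a two-factor interaction, which is exactly the kind of trade-off to weigh against the resources required.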

I don't want to discuss politics. I believe there is a cause (or causes) for every effect, and I don't care what the discipline is. Every discipline has its challenges with using the data acquisition tools; that is no excuse for not using them.

"All models are wrong, some are useful" G.E.P. Box
