Statistical modeling is a powerful tool for developing and testing theories by way of causal explanation, prediction, and description. In many disciplines, statistical modeling is used almost exclusively for causal explanation, on the assumption that models with high explanatory power inherently have high predictive power. Conflation of explanation and prediction is common, yet the distinction must be understood both for advancing scientific knowledge and for proper use in practice.
Understanding the differences between explanatory and predictive modeling and assessment is crucial for assessing a data set’s information quality – its potential to achieve a scientific or practical goal using data analysis. While the explain-predict distinction has been recognized in the philosophy of science, the statistical and data mining literatures lack a thorough discussion of the many differences that arise in the process of modeling for an explanatory versus a predictive goal. In this talk, I will clarify the distinction between explanatory and predictive modeling and show its practical implications for data analysis.