Smart Subsampling vs. Brute Force: A Strategic Approach to Predictive Modelling
Handling large data sets still presents distinct challenges, even now that advanced machine learning algorithms can process vast amounts of information. Relying on brute-force techniques to analyze massive data sets can lead to inefficiencies, model overfitting, noise accumulation, and diminishing returns from adding more data.
Intelligent subsampling, which involves selecting a representative fraction of the data, often provides a more targeted and insightful approach. Subsampling can also encourage more interpretable models, since a smaller, well-chosen data set makes the relationships between variables easier to examine. For these reasons, smart subsampling should be a preferred approach for a wide range of applications, including materials science, biomedical research, environmental modelling, marketing analysis, and the social sciences.
But why go brute force when you can go smart? Through an interactive demonstration in the field of complex material formulation, using the latest capabilities of JMP Pro, this presentation shows that a well-designed subsampling approach, combined with both classical and advanced modelling techniques (multilinear regression, neural networks, support vector machines, generalized regression), can lead to robust predictions.
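As a rough illustration of the idea outside JMP Pro, the sketch below draws a stratified subsample from a synthetic data set and fits a straight line to both the full data and the subsample. Everything in it is an assumption made for illustration: the data are simulated, the function names (`make_data`, `stratified_subsample`, `fit_line`) are hypothetical, and equal-width binning on a single factor is only one simple way to pick a "representative" fraction. It is not the method used in the presentation.

```python
import random


def make_data(n, seed=0):
    """Simulate (x, y) rows with y = 2*x + 1 + noise (assumed, synthetic).

    Most rows are clustered in a narrow region of x, mimicking a large
    data set that is dominated by redundant observations.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # About 95% of rows fall in [0, 1]; the rest cover [0, 10].
        if rng.random() < 0.95:
            x = rng.uniform(0.0, 1.0)
        else:
            x = rng.uniform(0.0, 10.0)
        y = 2.0 * x + 1.0 + rng.gauss(0.0, 0.5)
        rows.append((x, y))
    return rows


def stratified_subsample(rows, n_bins, per_bin, seed=0):
    """Split the x range into equal-width bins and draw up to per_bin rows from each."""
    rng = random.Random(seed)
    xs = [x for x, _ in rows]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for row in rows:
        # Clamp so the maximum x lands in the last bin.
        idx = min(int((row[0] - lo) / width), n_bins - 1)
        bins[idx].append(row)
    sample = []
    for b in bins:
        sample.extend(rng.sample(b, min(per_bin, len(b))))
    return sample


def fit_line(rows):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


full = make_data(20000)
sub = stratified_subsample(full, n_bins=10, per_bin=30)

print("full data:", len(full), "rows, (slope, intercept) =", fit_line(full))
print("subsample:", len(sub), "rows, (slope, intercept) =", fit_line(sub))
```

In this toy setting the subsample uses a small fraction of the rows while still covering the whole range of the factor, which is the property a smart subsampling scheme aims for. A real formulation problem would involve several factors and a more careful selection scheme.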