Hello @maryam_nourmand,
Your question touches on a broad topic, and there are quite a few questions to address and answer before tackling it:
- How is the data collected? Observational study, experimental design, ...? How representative and complete is the dataset? An Exploratory Data Analysis may help detect patterns and possible pitfalls regarding the assumptions of regression models, like multicollinearity, which may require an adapted model such as PLS or pre-processing steps like PCA.
- What is the objective of the model(s)? Causal explanation, prediction & optimization, or both (also linked to the available dataset and collection method)?
- What is the validation strategy/feature selection? How do you ensure the model(s) created have the right level of complexity while still having good predictive performance, for example? Do you assess model performance and robustness through a "standard" Machine Learning validation strategy (with cross-validation or train/validation/test splits), or through a "statistically-oriented" approach based on likelihood, information criteria (AICc, BIC), p-values ...? Note that model complexity is also directly limited by the data collection: if you have factors with 3 levels, for example, you won't be able to fit terms higher than 2nd order.
- What are the evaluation/selection metrics and thresholds? How do you evaluate the models? What is the selection process/criterion: do you select the one with the best predictive results on the chosen metric, or all models that outperform a benchmark or naive model, ...? How do you finally test the model?
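As a quick illustration of the multicollinearity check mentioned above, here is a minimal sketch (in Python with NumPy, as a stand-in for the equivalent JMP diagnostics) that computes variance inflation factors from the inverse of the predictor correlation matrix. The data here are synthetic and purely illustrative:

```python
import numpy as np

def vif(X):
    """Variance inflation factors: the diagonal of the inverse
    of the predictor correlation matrix. VIF_j = 1 / (1 - R²_j),
    where R²_j is from regressing predictor j on the others."""
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)  # nearly collinear with x1
x3 = rng.normal(size=200)                    # independent predictor
X = np.column_stack([x1, x2, x3])
print(vif(X))  # VIFs for x1 and x2 will be large (well above 10)
```

A common rule of thumb is to investigate predictors with VIF above 5 or 10; that is where a PCA pre-processing step or a PLS model starts to pay off.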
Some of these questions and answers were already described in previous posts:
https://community.jmp.com/t5/Discussions/Statistical-Significance/m-p/765928/highlight/true#M94573
https://community.jmp.com/t5/Discussions/Analysis-of-split-plot-design-with-full-factorial-vs-RSM/m-...
Creating, comparing and selecting model(s) requires evaluation metrics linked to your objective, and thresholds/criteria to select one or several models. If you simply want the best predictive model, you could:
- Create a model with a standard ML validation strategy (cross-validation, for example) or a strategy able to control the model's complexity,
- Use one or several metrics linked to predictive accuracy, like RMSE, MAPE, ...,
- Compare models based on the metric(s) and domain expertise: which one(s) is/are the most appropriate/relevant for your topic, and which ones have the best performance,
- Estimate individual predictions with the selected model(s) to see how/where they differ, and/or use a combined model to average out the prediction errors,
- Test the model in a "real" situation/production environment.
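The steps above can be sketched in a few lines. This is a Python/scikit-learn example (JMP Pro's Model Screening platform does the same job interactively) comparing a naive benchmark, a linear model and a tree ensemble on cross-validated RMSE; the dataset is synthetic and only stands in for your real data:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data standing in for the (unknown) real dataset
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=1)

models = {
    "naive mean": DummyRegressor(strategy="mean"),   # benchmark model
    "linear": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=1),
}

for name, model in models.items():
    # 5-fold cross-validated RMSE (scikit-learn negates error scores)
    rmse = -cross_val_score(model, X, y, cv=5,
                            scoring="neg_root_mean_squared_error")
    print(f"{name}: RMSE = {rmse.mean():.1f} +/- {rmse.std():.1f}")
```

Only models that clearly beat the naive benchmark are worth keeping; among those, domain expertise should decide the final choice before any test on held-out or production data.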
You might be interested in these resources as well:
Which Model When?
Specifying and Fitting Models
Building and Understanding Predictive Models
Building Predictive Models for Correlated and High Dimensional Data
Building Better Predictive Models Using JMP Pro - Model Screening (it might help screening parametric and Machine Learning models options and compare them simultaneously)
Predictive Modeling
STIPS Module 7: Predictive Modeling and Text Mining
What Model When? (and Which Modeling Type?) - (2023-US-PO-1509)
Choosing Models in JMP with Model Selection Criteria - (2023-US-30MP-1456)
Data Mining and Predictive Modeling
I hope this discussion starter will help you,
Victor GUILLER
Scientific Expertise Engineer
L'Oréal - Data & Analytics