maryam_nourmand
Level III

best model

Hello,
My question is: how can I find the best parametric model for my dataset?
I want to find a parametric model that predicts my response well.

4 REPLIES 4
Victor_G
Super User

Re: best model

Hello @maryam_nourmand,

 

Your question concerns a broad topic, and there may be (a lot of) questions to address before answering it:

  • How was the data collected? Observational study, experimental design, ...? Is the dataset representative and complete? An exploratory data analysis may help detect patterns and possible pitfalls regarding the assumptions of regression models, such as multicollinearity, which may require an adapted model like PLS or a pre-processing step like PCA.
  • What is the objective of the model(s)? Causal explanation, prediction and optimization, or both (this is also linked to the available dataset and the collection method)?
  • What is the validation strategy/feature selection? How do you ensure, for example, that the model(s) created have the right level of complexity and still predict well? Do you assess model performance and robustness through a "standard" Machine Learning validation strategy (cross-validation or train/validation/test splits), or through a "statistically oriented" approach based on likelihood, information criteria (AICc, BIC), p-values, ...? Note that model complexity is also directly limited by the data collection: if you have factors with 3 different levels, for example, you won't be able to fit terms of higher order than 2nd order in those factors.
  • What are the evaluation/selection metrics and thresholds? How do you evaluate the models? What is the selection process/criterion: do you select the models with the best predictive results on the chosen metric, or all models that perform better than a benchmark or naive model, ...? How do you finally test the model?
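
To make the "statistically oriented" route a bit more concrete, here is a minimal Python sketch (not JMP code) that compares two least-squares models with AICc and BIC. The function names and the toy data are made up for illustration, and the formulas assume Gaussian errors with constants common to all models dropped; JMP reports these criteria for you, so this only shows what is being compared.

```python
import math

def gaussian_ic(rss, n, n_coef):
    """Return (AICc, BIC) for a least-squares fit, assuming Gaussian errors.

    k counts the regression coefficients plus the error variance.
    Additive constants shared by all models on the same data are dropped.
    """
    k = n_coef + 1
    neg2loglik = n * math.log(rss / n)
    aic = neg2loglik + 2 * k
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    bic = neg2loglik + k * math.log(n)
    return aicc, bic

def rss_mean_only(y):
    """Residual sum of squares of the intercept-only (naive) model."""
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y)

def rss_line(x, y):
    """Residual sum of squares of a straight-line fit (closed form)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

# Toy data (made up): a linear trend plus small fixed perturbations.
x = list(range(10))
noise = [0.3, -0.1, 0.2, -0.4, 0.1, 0.0, -0.2, 0.4, -0.3, 0.1]
y = [2.0 + 0.5 * a + e for a, e in zip(x, noise)]

ic_mean = gaussian_ic(rss_mean_only(y), len(y), n_coef=1)
ic_line = gaussian_ic(rss_line(x, y), len(y), n_coef=2)
print("mean-only (AICc, BIC):", ic_mean)
print("line      (AICc, BIC):", ic_line)
```

Lower values are better for both criteria, so on this toy data the straight line should be preferred over the naive mean-only model.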

 

Some of these questions and answers were already discussed in previous posts:

https://community.jmp.com/t5/Discussions/Statistical-Significance/m-p/765928/highlight/true#M94573

https://community.jmp.com/t5/Discussions/Analysis-of-split-plot-design-with-full-factorial-vs-RSM/m-... 

 

Creating, comparing and selecting model(s) requires evaluation metrics linked to your objective, and thresholds/criteria to select one or several models. If you simply want the best predictive model, you could:

  1. Create a model with a standard ML validation strategy (cross-validation for example) or a strategy able to control the model's complexity,
  2. Use one or several metrics linked to predictive accuracy, like RMSE, MAPE, ...
  3. Compare models based on the metric(s) and domain expertise: which one(s) are the most appropriate/relevant for your topic, and which ones perform best,
  4. Choose to estimate individual predictions with the selected model(s) to see how/where they differ, and/or to use a combined model to average out the prediction errors,
  5. Test the model in a "real" situation/production environment.
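
Steps 1 to 3 can be sketched outside JMP as follows. This is a minimal Python illustration under my own assumptions, not what JMP does internally: the data is synthetic, the two candidate models (a naive mean and a straight line) and all function names are hypothetical, and 5-fold cross-validation is just one possible validation strategy.

```python
import math
import random

def fit_mean(xs, ys):
    """Naive benchmark: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Simple linear regression fitted in closed form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error; assumes no actual value is zero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def cross_validate(fit, xs, ys, k=5, seed=0):
    """Return (RMSE, MAPE) computed on the held-out predictions of k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    actual, predicted = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            actual.append(ys[i])
            predicted.append(model(xs[i]))
    return rmse(actual, predicted), mape(actual, predicted)

# Synthetic data (made up): linear signal plus Gaussian noise.
rng = random.Random(1)
xs = [i / 2 for i in range(40)]
ys = [10 + 2 * x + rng.gauss(0, 1) for x in xs]

results = {name: cross_validate(fit, xs, ys)
           for name, fit in [("mean", fit_mean), ("line", fit_line)]}
for name, (r, m) in results.items():
    print(f"{name}: RMSE={r:.2f}  MAPE={m:.1f}%")
```

Including the naive mean as a benchmark gives a reference point: a candidate model is only worth keeping if its held-out RMSE/MAPE clearly beats it.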

 

You might be interested in these resources as well:

Which Model When? 

Specifying and Fitting Models 

Building and Understanding Predictive Models 

Building Predictive Models for Correlated and High Dimensional Data 

Building Better Predictive Models Using JMP Pro - Model Screening (it can help you screen parametric and Machine Learning model options and compare them simultaneously)

Predictive Modeling 

STIPS Module 7: Predictive Modeling and Text Mining 

What Model When? (and Which Modeling Type?) - (2023-US-PO-1509) 

Choosing Models in JMP with Model Selection Criteria - (2023-US-30MP-1456) 

Data Mining and Predictive Modeling 

 

I hope this first discussion starter will help you,

Victor GUILLER
Scientific Expertise Engineer
L'Oréal - Data & Analytics
maryam_nourmand
Level III

Re: best model

 

If I want to explain my goal more precisely:

I have an initial dataset related to cancer patient data. I want to find a suitable parametric statistical model that best fits my dataset. Using this model, I aim to simulate data in order to apply a shift in the model's intercept. Ultimately, I want to see how quickly my pre-existing machine learning model can detect this shift in a control chart (i.e., obtaining the ARL). For this purpose, I need the best-fitting model on my data so that I can simulate data from it.

Victor_G
Super User

Re: best model

Hi @maryam_nourmand,

 

OK, then you are more likely in a Data Mining setting, with a fixed dataset of observational data to which you try to fit an acceptable predictive model. Using a validation column (with stratification on your features) will help you validate the model (to avoid overfitting) and test it on held-out data. In terms of modeling strategy, you will very likely use stepwise approaches to select features, and Generalized Regression with the validation column method.

 

As an example, I used the Cancer_Data dataset from Kaggle to predict whether a tumor is benign or malignant based on individual characteristics: https://www.kaggle.com/datasets/erdemtaha/cancer-data?resource=download

After a first exploratory data analysis focused mainly on distributions and correlations between features, I created a validation column formula (stratified on the features) with 70/20/10 ratios for the training/validation/test sets. I then fitted a Generalized Regression model using the validation column method, with all main effects and two-factor interaction terms entered as candidate terms. I chose an adaptive Elastic Net because the features are strongly correlated, but there are other options as well, such as PLS or PCA pre-processing.
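
For readers who want to see the idea of a stratified 70/20/10 validation column outside JMP, here is a minimal Python sketch. It is a simplification and an assumption on my part: it stratifies on a single categorical column (a made-up benign/malignant label), whereas the validation column above was stratified on the features, and the function name is hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, ratios=(0.7, 0.2, 0.1), seed=0):
    """Assign each row to Training/Validation/Test, stratified on `labels`.

    Rows are shuffled within each label group, so every set keeps
    roughly the same label proportions as the full dataset.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, lab in enumerate(labels):
        by_label[lab].append(i)

    assignment = [None] * len(labels)
    for rows in by_label.values():
        rng.shuffle(rows)
        n_train = round(ratios[0] * len(rows))
        n_valid = round(ratios[1] * len(rows))
        for pos, i in enumerate(rows):
            if pos < n_train:
                assignment[i] = "Training"
            elif pos < n_train + n_valid:
                assignment[i] = "Validation"
            else:
                assignment[i] = "Test"
    return assignment

# Made-up labels: 60 benign ("B") and 40 malignant ("M") rows.
labels = ["B"] * 60 + ["M"] * 40
assignment = stratified_split(labels)

print(Counter(assignment))
print(Counter(zip(labels, assignment)))
```

The second printout shows the point of stratifying: each of the three sets keeps the 60/40 benign/malignant mix of the full data.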

You can then save the prediction formula of this model and use the Prediction Profiler to run simulations, to assess the effect of the features on the response, and possibly to estimate the effect of increasing noise on the predicted response.

 

I hope these few options help you with your topic,

Victor GUILLER
Scientific Expertise Engineer
L'Oréal - Data & Analytics
dlehman1
Level IV

Re: best model

Your response prompts me to ask: what do you mean by a "parametric statistical model"? I usually think of machine learning models as non-parametric, so are you excluding such models here? It is unclear, since you say you have a pre-existing machine learning model. Do you want to compare a parametric and a non-parametric model? If you use the Model Screening platform, you can build a number of predictive models of both types. Using whatever you find to be the "best fitting" model, you can then save the prediction formula in order to run simulations. I'm not entirely sure what you mean by a "shift in the model's intercept", but I think you could just put an additive disturbance into the formula to generate the simulated data.
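
As a rough illustration of the additive-disturbance idea, here is a minimal Python sketch that simulates data from a saved formula with and without an intercept shift and estimates the ARL by Monte Carlo. Everything in it is an assumption for illustration: the logistic formula and its coefficients stand in for whatever prediction formula you save, and a one-sided CUSUM on the residuals stands in for your own machine learning model and control chart, with made-up chart constants `k` and `h`.

```python
import math
import random

# Hypothetical coefficients standing in for a saved prediction formula.
B0, B1 = 0.0, 1.0

def predicted_probability(x, intercept_shift=0.0):
    """Logistic formula; the shift is an additive disturbance on the intercept."""
    return 1.0 / (1.0 + math.exp(-(B0 + intercept_shift + B1 * x)))

def run_length(rng, intercept_shift, k=0.05, h=3.0, max_len=2000):
    """Number of observations until a one-sided CUSUM on residuals signals.

    Data is simulated from the shifted formula, while residuals are taken
    against the unshifted formula. Runs are truncated at max_len.
    """
    cusum = 0.0
    for t in range(1, max_len + 1):
        x = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < predicted_probability(x, intercept_shift) else 0
        residual = y - predicted_probability(x)
        cusum = max(0.0, cusum + residual - k)
        if cusum > h:
            return t
    return max_len

def average_run_length(intercept_shift, n_runs=200, seed=0):
    """Monte Carlo estimate of the ARL for a given intercept shift."""
    rng = random.Random(seed)
    return sum(run_length(rng, intercept_shift) for _ in range(n_runs)) / n_runs

arl_in_control = average_run_length(0.0)
arl_shifted = average_run_length(1.0)
print("ARL without shift:", arl_in_control)
print("ARL with intercept shift of 1.0:", arl_shifted)
```

A shifted intercept should give a noticeably shorter ARL than the in-control case. To study your own setup, you would replace the formula with your saved one and the CUSUM with the statistic your chart actually monitors.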