PhamBao

Hi team,

I am quite new to regression modeling, and I am trying to use Fit Model to build a regression model. The reason is that I would like to screen out bad data points based on historical data. For example, I have a historical data set and I build the model from it. After getting the model, I predict the output and compare the predicted values with the actual values. The larger the residual, the higher the chance that the data point is abnormal.
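To make the workflow above concrete, here is a minimal sketch in plain Python (not JMP), assuming a single predictor and ordinary least squares. The data values, the helper names `fit_line` and `residuals`, and the cutoff of 3 times the root mean square error are all made up for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def residuals(xs, ys, intercept, slope):
    """Actual minus predicted value for each point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Hypothetical historical data used to build the model.
history_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
history_y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
intercept, slope = fit_line(history_x, history_y)

# Root mean square error of the historical fit (n - 2 degrees of freedom).
hist_res = residuals(history_x, history_y, intercept, slope)
rmse = (sum(e * e for e in hist_res) / (len(hist_res) - 2)) ** 0.5

# Hypothetical new points to screen against the historical model.
new_x = [2.5, 4.5]
new_y = [5.1, 14.0]
new_res = residuals(new_x, new_y, intercept, slope)

# Assumed rule: a point is suspect if its residual exceeds 3 * RMSE.
suspect = [abs(e) > 3 * rmse for e in new_res]
print(suspect)
```

The cutoff is a judgment call; a stricter or looser multiple changes how many points get flagged.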

While running Fit Model, I found that picking different parameters generates different outcomes. Once I have a model, I look at the data points with a high Studentized Residual and check whether they are valid or not. I have to repeat the task of picking parameters many times to get a proper model. Let's take a look at the example below.
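For reference, a studentized residual can be sketched as below. This assumes the internally studentized form (each residual divided by its estimated standard error, which accounts for leverage) and a single predictor; the function name, the data, and the cutoff of 2 are illustrative assumptions, not output from JMP.

```python
from statistics import mean

def studentized_residuals(xs, ys):
    """Internally studentized residuals for a one-predictor least squares fit."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Residual standard deviation with n - 2 degrees of freedom.
    s = (sum(e * e for e in resid) / (n - 2)) ** 0.5
    result = []
    for x, e in zip(xs, resid):
        # Leverage of this point in simple linear regression.
        leverage = 1.0 / n + (x - x_bar) ** 2 / sxx
        result.append(e / (s * (1.0 - leverage) ** 0.5))
    return result

# Hypothetical data with one suspicious value at index 4.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.1, 5.9, 8.0, 15.0, 12.1, 13.9, 16.0]

r = studentized_residuals(xs, ys)
# Assumed cutoff: flag points whose studentized residual exceeds 2 in magnitude.
flagged = [i for i, value in enumerate(r) if abs(value) > 2.0]
print(flagged)
```

Because the flagged point also inflates the residual standard deviation, a single large outlier can mask smaller ones, which is one reason different models flag different points.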

Model 1:

Model 2:

As you can see, Model 2 has many more data points with a high Studentized Residual than Model 1. When I checked these points, they turned out to be the ones categorized as Bad. It seems that Model 2 is more robust.
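One way to put a number on this comparison, once points are labeled Good/Bad, is to count how many Bad points each model flags. The sketch below is plain Python with invented residual values and labels; `screening_summary` and the cutoff of 2 are hypothetical names and settings used only for illustration.

```python
def screening_summary(student_resid, labels, cutoff=2.0):
    """Count Bad points caught, Good points flagged, and Bad points missed."""
    flagged = [abs(r) > cutoff for r in student_resid]
    bad = [label == "Bad" for label in labels]
    return {
        "caught": sum(f and b for f, b in zip(flagged, bad)),
        "false_alarms": sum(f and not b for f, b in zip(flagged, bad)),
        "missed": sum(b and not f for f, b in zip(flagged, bad)),
    }

# Hypothetical Good/Bad labels and studentized residuals from two models.
labels = ["Good", "Good", "Bad", "Good", "Bad", "Good"]
model_1_resid = [0.3, -0.8, 1.1, 0.5, 2.4, -0.2]
model_2_resid = [0.1, -0.4, 2.9, 0.6, 3.1, -0.3]

summary_1 = screening_summary(model_1_resid, labels)
summary_2 = screening_summary(model_2_resid, labels)
print(summary_1)
print(summary_2)
```

Comparing the counts for each candidate set of parameters gives a repeatable score, instead of judging each model by eye.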

My question is: although both models have high RSquare values, why does Model 2 screen out more bad data points? It could be because of the parameters that I picked when creating the model.
Questions:
- If I classify data points as Good/Bad in a new column of the data set, is there any method in JMP that can suggest which parameters I should pick so that the model is more robust at screening out Bad data points?

- If I do not classify the data points, is there any method in JMP that can suggest which parameters I should pick so that the model is more robust at screening out Bad data points?

Hopefully I can get some advice from the community.

Thanks in advance.

How to pick independent parameters to optimize a regression model