PhamBao
Level II

How I do predict outlier by fit model

Hi all,

I am currently trying to use Fit Model to build a model so that I can predict the output from the input variables and also decide whether a prediction is an outlier. However, after fitting the model, I am struggling because it does not clearly separate the outliers, which are highlighted in red:

PhamBao_1-1751822005917.png

 



Even when I pre-processed the data, for example by excluding the outliers from the data source before training the model, the outliers were not really screened out:

PhamBao_3-1751822216351.png

 

PhamBao_2-1751822160892.png

Is there any way to train the model so that the outlier points can be distinguished?

Thanks

 

 

6 REPLIES

Re: How I do predict outlier by fit model

Hi @PhamBao ,

 

Outlier detection and classification can be a deep, deep pool to dip your toes into, with a number of different approaches. One thing you should consider is how you are defining an outlier (that isn't too clear in your post): are you looking for outliers that don't fit your model, or are the points in red ones that you have already defined as outliers?

 

As a quick thought, here are some areas to try out:

- Cook's D outliers: this measures how much your model coefficients would change if you removed a point. Generally, a threshold of any value over 4/n (n = number of rows) is a good place to start. This link gives a lot of guidance, including on Cook's D.

- Studentised residual plots: these give a good visualisation of the residuals for your rows and help to flag points that fall in a certain region as probable outliers.

 

Check both of these links here and here for more info.
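
As a rough illustration outside JMP, the 4/n rule of thumb for Cook's D and the studentised residuals mentioned above can be sketched in plain Python for a regression with a single predictor. The data and the function name `regression_diagnostics` are made up for this example, and the formulas are the textbook ones for simple linear regression; JMP computes the same kind of diagnostics for you.

```python
import math

def regression_diagnostics(x, y):
    """Fit y = b0 + b1*x by least squares and return per-row diagnostics."""
    n = len(x)
    p = 2  # number of fitted coefficients (intercept + slope)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    b0 = y_mean - b1 * x_mean
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    rows = []
    for xi, e in zip(x, residuals):
        h = 1 / n + (xi - x_mean) ** 2 / sxx   # leverage of the row
        r = e / math.sqrt(mse * (1 - h))       # internally studentised residual
        d = r ** 2 * h / (p * (1 - h))         # Cook's distance
        rows.append({"leverage": h, "studentized": r, "cooks_d": d})
    return rows

# Hypothetical data: the last point is deliberately off the trend.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.1, 14.0, 16.2, 17.9, 30.0]

diagnostics = regression_diagnostics(x, y)
threshold = 4 / len(x)  # the 4/n starting point
flagged = [i for i, row in enumerate(diagnostics) if row["cooks_d"] > threshold]
print("Rows with Cook's D above 4/n:", flagged)
```

Only the row that was pushed off the trend exceeds the threshold here. Note that a point has to both sit away from the fit and carry enough leverage to get a large Cook's D, so a "bad" point close to the line will not be flagged this way.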

 

Thanks,

Ben

“All models are wrong, but some are useful”
Victor_G
Super User

Re: How I do predict outlier by fit model

Hi @PhamBao,

 

I agree with @Ben_BarrIngh, there are a lot of possibilities when dealing with the topic of outliers.

The first thing to consider is the nature of the outlier: model-based (linked to a specific model fit) or model-agnostic?

The methods to identify outliers depend on their nature:

  • Model-agnostic outlier detection methods, like Mahalanobis distances, don't rely on a specified model and just compare distances between points based on the variables/factors/features. An outlier identified by this type of method is a point that looks "strange" and doesn't seem to belong to the distribution of the factors for the other points. See the options in Outlier Analysis from the Multivariate platform.
  • Model-based outlier detection methods rely on residuals (Studentized residuals, or other metrics like PRESS RMSE/R2...) to identify points that are not well fitted/predicted by the model. Such outliers may indicate that the model is missing some important terms (like interaction or non-linear effects) or is not appropriate for the data. You can use the other diagnostic panels/tools to see whether the model seems to fit your data well, based on the statistical significance of the model and on metrics chosen according to your goal, like information criteria/RMSE/R2... You can also check whether a detected model-based outlier has a strong impact on your model by calculating Cook's distances: https://www.jmp.com/en_us/statistics-knowledge-portal/what-is-multiple-regression/mlr-residual-analy... If the values for these points are high and/or unusual, it is an indication that these points could be influential and may bias your model. You should then investigate what these points are and whether the measurements are valid, before deciding to diminish their influence on the model or to remove them. See the Row Diagnostics options in the Fit Model platform, or Model Fit Options once the model is fitted.
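
To make the model-agnostic idea concrete, here is a minimal sketch of Mahalanobis distances for two input variables, written in plain Python rather than JMP. The data points and the function name `mahalanobis_distances` are invented for illustration. The last point has ordinary values on each variable taken separately, but it breaks the correlation between them.

```python
import math

def mahalanobis_distances(points):
    """Return the Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mean_a = sum(a for a, _ in points) / n
    mean_b = sum(b for _, b in points) / n
    # Entries of the sample covariance matrix [[s_aa, s_ab], [s_ab, s_bb]]
    s_aa = sum((a - mean_a) ** 2 for a, _ in points) / (n - 1)
    s_bb = sum((b - mean_b) ** 2 for _, b in points) / (n - 1)
    s_ab = sum((a - mean_a) * (b - mean_b) for a, b in points) / (n - 1)
    det = s_aa * s_bb - s_ab ** 2
    distances = []
    for a, b in points:
        da, db = a - mean_a, b - mean_b
        # d^2 = [da db] * inverse(covariance) * [da db]^T, written out for 2x2
        d2 = (s_bb * da * da - 2 * s_ab * da * db + s_aa * db * db) / det
        distances.append(math.sqrt(d2))
    return distances

# Hypothetical inputs: eight points follow b ~ a, the ninth does not.
points = [(1, 1.1), (2, 2.0), (3, 2.9), (4, 4.2), (5, 5.1),
          (6, 5.9), (7, 7.0), (8, 8.1), (4, 7.5)]

distances = mahalanobis_distances(points)
for point, d in zip(points, distances):
    print(point, round(d, 2))
```

The ninth point gets the largest distance even though neither of its coordinates is extreme on its own, which is the kind of "strange" point a univariate check would miss.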

Please take extra care with the cause/source of the outliers and their proper handling. Removing the outliers may make the analysis easier, but it won't solve your problem and won't prevent them from reappearing.

Hope this answer may help you,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
PhamBao
Level II

Re: How I do predict outlier by fit model

Hi @Victor_G  and @Ben_BarrIngh ,

I really appreciate your input. My knowledge of Fit Model and statistics is limited, so I would like to ask for some more advice. Let me clarify what I want. In the image below, the red data points are bad data points according to our manufacturing criteria. What I want is for the bad data points to be separated far from the fit line. First I built Model 1, but the bad data points are not really far from the fit line. Then I used the same inputs as Model 1 but added a second-degree polynomial term for one parameter (Model 2), and the bad data points are farther from the fit line than in Model 1.

 

1. The Summary of Fit tables for Model 1 and Model 2 are comparable, but Model 2 seems to be what I am looking for. Question: is there a method that would place the data points that are bad in reality far from the fit line?
2. For Model 1, I do not really understand why the bad data points are quite near the fit line. Is there any method to separate them out?
I am looking forward to your answers.

 

PhamBao_0-1751974993803.png

 

Victor_G
Super User

Re: How I do predict outlier by fit model

Hi @PhamBao,

 

There are several topics in your question: one related to model fitting (and model comparison + validation) and one related to outlier detection.

  • Concerning model fitting, I would recommend comparing the models' metrics and analyzing the residuals. In your last screenshot, you can see that the models are very similar, except for the addition of a quadratic effect in Model 2. This quadratic effect brings only a slight benefit when looking at R²/adjusted R² and RMSE. It also seems to introduce a non-random curvature pattern in the residuals, so I would be really cautious about evaluating and comparing the models. Using validation points, or splitting your dataset into training and validation sets, could help identify the most appropriate model for your data.
  • Concerning outlier detection, as already written in my first response, there are a lot of options available. If you already know these points are strange or abnormal, why not label them as such and try to understand which factors (and ranges of values) may cause them? You could create a classification model (like logistic regression) where the response has two classes ("normal" and "outlier", for example) and the factors are your input variables (as in the models shown).
    Depending on the performance of this classification model (confusion matrix, misclassification rate, ...), you can then investigate which factors cause the situation, and use the predicted probability formula to predict the probability that a new point is strange/an outlier based on the measured values.
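
For the model-fitting point in the first bullet, the idea of comparing a linear and a quadratic model on held-out rows can be sketched as follows. This is plain Python on synthetic data, not your dataset: the true curve, the noise values, the every-other-row split and the helper names (`fit_polynomial`, `predict`, `rmse`) are all assumptions made for the illustration.

```python
import math

def fit_polynomial(x, y, degree):
    """Least-squares polynomial coefficients (lowest power first)."""
    size = degree + 1
    # Augmented normal equations [X'X | X'y]
    a = [[sum(xi ** (r + c) for xi in x) for c in range(size)]
         + [sum(yi * xi ** r for xi, yi in zip(x, y))]
         for r in range(size)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(size):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][size] / a[i][i] for i in range(size)]

def predict(coeffs, xi):
    return sum(c * xi ** k for k, c in enumerate(coeffs))

def rmse(coeffs, x, y):
    squared = [(yi - predict(coeffs, xi)) ** 2 for xi, yi in zip(x, y)]
    return math.sqrt(sum(squared) / len(x))

# Synthetic response with real curvature plus small fixed "noise" values.
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.2, -0.15, 0.1, -0.05, -0.1, 0.15, 0.05]
x_all = [float(i) for i in range(12)]
y_all = [1 + 2 * xi + 0.5 * xi ** 2 + e for xi, e in zip(x_all, noise)]

# Every other row is held out for validation.
x_train, y_train = x_all[0::2], y_all[0::2]
x_valid, y_valid = x_all[1::2], y_all[1::2]

results = {}
for name, degree in [("linear", 1), ("quadratic", 2)]:
    coeffs = fit_polynomial(x_train, y_train, degree)
    results[name] = {
        "train_rmse": rmse(coeffs, x_train, y_train),
        "valid_rmse": rmse(coeffs, x_valid, y_valid),
    }
    print(name, results[name])
```

A more complex model always fits the training rows at least as well, so the validation RMSE is the number to compare. In this synthetic case the quadratic term is genuinely needed and validation confirms it; with only a slight gain in R² and RMSE, the validation rows may tell a different story.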

Hope these few suggestions may help you, 

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
dlehman1
Level V

Re: How I do predict outlier by fit model

I believe Victor_G's 2nd point is exactly correct.  The points you have identified as "bad" are not really outliers in the traditional use of the term - they are not that far from the regression line.  The real problem is that you know (from subject matter knowledge) that there is something bad about those data points.  So, what you want is a model to predict those, and the regression model you provided does not do that.  I wouldn't search for different regression models - I think you really want a classification model to predict the points that you know to be problematic.  That is a different sort of model and most of the same methods can be used in JMP, but with a nominal response variable rather than the continuous response variable you have used.
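
A minimal sketch of that classification idea, in plain Python rather than JMP: a logistic regression with one input and a 0/1 response for "good"/"bad", fitted by simple gradient descent, followed by a confusion matrix and misclassification rate. The measurements, the labels, the learning settings and the names `fit_logistic` and `probability_bad` are all made up for illustration; in JMP you would fit the nominal response directly.

```python
import math

def fit_logistic(z, labels, learning_rate=0.1, iterations=5000):
    """Fit P(bad) = sigmoid(b0 + b1*z) by gradient descent on the log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(z)
    for _ in range(iterations):
        probs = [1 / (1 + math.exp(-(b0 + b1 * zi))) for zi in z]
        grad0 = sum(p - y for p, y in zip(probs, labels)) / n
        grad1 = sum((p - y) * zi for p, y, zi in zip(probs, labels, z)) / n
        b0 -= learning_rate * grad0
        b1 -= learning_rate * grad1
    return b0, b1

# Hypothetical input measurement and known labels (1 = bad, 0 = good).
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
is_bad = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

x_mean = sum(x) / len(x)
z = [xi - x_mean for xi in x]  # centring helps gradient descent converge
b0, b1 = fit_logistic(z, is_bad)

def probability_bad(value):
    """Predicted probability that a new measurement belongs to the bad class."""
    return 1 / (1 + math.exp(-(b0 + b1 * (value - x_mean))))

predicted = [1 if probability_bad(v) >= 0.5 else 0 for v in x]
confusion = {
    "tp": sum(1 for p, y in zip(predicted, is_bad) if p == 1 and y == 1),
    "tn": sum(1 for p, y in zip(predicted, is_bad) if p == 0 and y == 0),
    "fp": sum(1 for p, y in zip(predicted, is_bad) if p == 1 and y == 0),
    "fn": sum(1 for p, y in zip(predicted, is_bad) if p == 0 and y == 1),
}
misclassification_rate = (confusion["fp"] + confusion["fn"]) / len(x)
print(confusion, misclassification_rate)
print("P(bad) for a new value of 4.2:", round(probability_bad(4.2), 3))
```

The response here is the good/bad label itself, so the model is judged by how well it separates the two classes, not by how far the bad points sit from a regression line.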

Re: How I do predict outlier by fit model

In JMP, you can predict outliers by fitting a model (such as a regression or classification model) and then analyzing the residuals or prediction errors. After fitting the model, look for data points with unusually large residuals, leverage, or influence (e.g., using Cook’s Distance or studentized residuals). JMP also provides visual tools like the Leverage vs. Residual plot to help identify potential outliers. Once flagged, you can explore these points further to determine if they are true anomalies or data entry issues.
