Cook's D influence

Apr 16, 2019 11:39 AM
(6855 views)

In my regression analysis I saw a couple of outliers, so I reran the model with those points excluded to see whether my results would change, and they did. However, when I looked at the Cook's D influence values for those data points, they were all less than 1. Does that mean that those data points are not outliers and have no effect on the model?

Thanks

2 ACCEPTED SOLUTIONS

What was the basis for your initial conclusion that you 'saw a couple of outliers?'

How much did the results of your regression analysis change? In what way did they change?

Also, how many observations do you have? How many predictors did you use in the model? Are any of them collinear with others? Does your model include transformed terms, such as squared (X^2) or crossed (X1*X2) terms?

Cook's D for an observation is based on the sum, over every observation, of the squared differences between the response predicted from the full data set and the response predicted from the leave-one-out set, scaled by the number of model parameters times the mean squared error. So one or two highly influential observations might not result in a large Cook's D if there are many observations.
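To make that definition concrete, here is a minimal sketch in plain Python (not JSL, and not how JMP computes it internally). It assumes a simple one-predictor model y = b0 + b1*x and uses made-up data; the function names `fit_line` and `cooks_d_leave_one_out` are hypothetical helpers for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def cooks_d_leave_one_out(xs, ys):
    """Cook's D for each point, computed by refitting without that point."""
    n, p = len(xs), 2  # p = number of fitted parameters (intercept + slope)
    b0, b1 = fit_line(xs, ys)
    full_pred = [b0 + b1 * x for x in xs]
    mse = sum((y - f) ** 2 for y, f in zip(ys, full_pred)) / (n - p)
    distances = []
    for i in range(n):
        # Refit the model with observation i left out.
        a0, a1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        # Squared shift in the prediction for EVERY observation, summed.
        shift = sum((f - (a0 + a1 * x)) ** 2 for x, f in zip(xs, full_pred))
        distances.append(shift / (p * mse))
    return distances


# Made-up data: points near y = 2x, with the last point pulled far off the line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 10.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 12.0]
for x, d in zip(xs, cooks_d_leave_one_out(xs, ys)):
    print(f"x = {x:4.1f}  Cook's D = {d:.3f}")
```

In this toy data set the last point should come out with by far the largest value, since removing it shifts the fitted line for every other observation.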

Learn it once, use it forever!

Created: Apr 16, 2019 1:23 PM | Last Modified: Apr 16, 2019 1:31 PM (6832 views) | Posted in reply to message from billi 04-16-2019

A general rule of thumb for a cutoff on Cook's D is 4/n, where n is the number of observations. If your data had 40 data points, for example, a Cook's D > 0.1 would be considered influential. Using 1 may not be the best choice.
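As a quick illustration of that rule of thumb, here is a small Python sketch. The helper names `cooks_d_cutoff` and `flag_influential` are hypothetical, and the Cook's D values in the example are invented purely to show the arithmetic.

```python
def cooks_d_cutoff(n):
    """Rule-of-thumb cutoff 4/n for n observations."""
    if n <= 0:
        raise ValueError("n must be a positive number of observations")
    return 4 / n


def flag_influential(distances):
    """Return the indices whose Cook's D exceeds 4/n."""
    cutoff = cooks_d_cutoff(len(distances))
    return [i for i, d in enumerate(distances) if d > cutoff]


print(cooks_d_cutoff(40))  # 0.1, the 40-point example above

# Invented Cook's D values for 40 observations: all are below 1,
# yet two of them exceed the 4/n cutoff of 0.1.
example = [0.01] * 38 + [0.35, 0.6]
print(flag_influential(example))  # [38, 39]
```

With a cutoff of 1, none of these invented values would be flagged, which is the situation described in the original question.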

And to echo one of Mark Bailey's comments: an outlier is not always influential.

Dan Obermiller

4 REPLIES 4

Re: Cook's D influence

Cook's Distance is a measure of a data point's influence. It doesn't flag outliers, just data points that contribute strongly to the model. I've used Cook's D as part of investigations before I remove outliers, but I wouldn't call it justification unto itself for excluding an inconvenient data point.
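A rough way to see the difference between "outlier" and "influential" is the standard least-squares identity that Cook's D equals the squared residual, divided by p times the mean squared error, multiplied by h/(1-h)^2, where h is the point's leverage. The sketch below is plain Python under the assumption of a one-predictor model with invented data; `cooks_d_closed_form` is a hypothetical helper name. It puts the same 6-unit error once in the middle of the x range and once at the edge.

```python
def cooks_d_closed_form(xs, ys):
    """Cook's D for y = b0 + b1*x from residuals and leverages (no refitting)."""
    n, p = len(xs), 2  # p = intercept + slope
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    mse = sum(e * e for e in resid) / (n - p)
    # Leverage of each point in simple linear regression.
    lev = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]
    return [e * e / (p * mse) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
base = [2 * x for x in xs]  # invented data lying exactly on y = 2x

# Same 6-unit error, placed at the centre of the x range vs. at its edge.
mid_outlier = base[:]
mid_outlier[4] += 6.0
edge_outlier = base[:]
edge_outlier[8] += 6.0

d_mid = cooks_d_closed_form(xs, mid_outlier)[4]
d_edge = cooks_d_closed_form(xs, edge_outlier)[8]
print(f"outlier at centre: Cook's D = {d_mid:.4f}")
print(f"outlier at edge:   Cook's D = {d_edge:.4f}")
```

The error is identical in both cases, but the edge point has more leverage, so its Cook's D is several times larger. An unusual response value alone does not make a point influential.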

This article is a pretty good summary: Cook's Distance

M

Thank you all for your responses. That an outlier is not always influential is what I wanted to know. Thank you, Dan, for the viz. This helps.
