- RE: Distribution and Data Representation

Created: Aug 20, 2018 4:13 PM | Last Modified: Aug 20, 2018 5:43 PM (4031 views)

Hi,

I have a large dataset that I've been asked to group and present with error bars. I'd like some advice on interpreting distributions in general.

1. From the dataset, some reject H0 and some don't. Is it fair to say that those with a small p-value are NOT from a normally distributed set, while those with a large (>0.05) p-value are likely from a normally distributed set (given a large enough sample size)?

2. When a dataset is NOT normally distributed, does it mean there are some factors that are systematically influencing the dataset?

3. Does the dataset being normally distributed have any relevance to how I present the data with error bars? Say, mean/median ± (max/min - median).

Thanks,

Gary

3 REPLIES

Hi,

A lot of the answers to your questions depend on the objective you have for analysing the data.

In the examples you have shown, even the one that is apparently non-normal looks close to being normal. You will find that as the data size increases a lot of things start to become statistically significant. You have quite a lot of data. An effect or difference, like this deviation from normal, can be statistically significant but can be so small as to be practically unimportant. Hence, it depends on your objectives.
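As a rough illustration of the sample-size effect described above, here is a minimal Python sketch. It is not what JMP does: it swaps in a hand-written Jarque-Bera test (chosen only because it needs nothing beyond the standard library), and the data are simulated, so the helper names, the seed, and the size of the deviation are all assumptions made for the example. The same slightly skewed distribution is sampled at two sizes, and the p-value is printed for each.

```python
import math
import random


def jarque_bera_p(xs):
    """Return (skewness, excess kurtosis, p-value) for a Jarque-Bera normality test."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    # Under H0 (normality), JB is asymptotically chi-square with 2 degrees of
    # freedom, whose upper-tail probability is exp(-jb / 2).
    return skew, excess_kurt, math.exp(-jb / 2.0)


def nearly_normal_sample(n, rng):
    """Hypothetical data: standard normal plus a small centred exponential term,
    which gives a slight right skew."""
    return [rng.gauss(0.0, 1.0) + 0.3 * (rng.expovariate(1.0) - 1.0) for _ in range(n)]


rng = random.Random(2018)  # arbitrary seed, only for repeatability
for n in (200, 200_000):
    skew, kurt, p = jarque_bera_p(nearly_normal_sample(n, rng))
    print(f"n={n:>7}  skewness={skew:+.3f}  excess kurtosis={kurt:+.3f}  p={p:.3g}")
```

The underlying skewness is the same at both sizes and is small. What changes is how precisely it is estimated, so with very large n even this mild deviation gives a tiny p-value.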

"Is it fair to say that those with a small p-value are NOT from a normally distributed set, while those with a large (>0.05) p-value are likely from a normally distributed set (given a large enough sample size)?" - that is fair to say but a statistician might be more pedantic. You could more properly say that where p < 0.05 there is evidence to reject the null hypothesis that the data are from the normal distribution. Where p > 0.05 there is not enough evidence to reject it, which is not quite the same as showing that the data are normal.

Whether it is normal or not, the median is still a meaningful summary statistic - it is the mid-point of the distribution. The more the data deviate from normal, the less useful the mean is as an estimate of location.
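To make the mean-versus-median point concrete, here is a small Python sketch on made-up numbers (the `values` list is hypothetical, not taken from the dataset in question). It also shows one possible way to draw asymmetric error bars around the median using the quartiles. That choice is an assumption for illustration, not the only reasonable one.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, one is very large.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]

mean = statistics.mean(values)
median = statistics.median(values)
q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points

# Asymmetric error bar around the median, spanning the interquartile range.
lower_err = median - q1
upper_err = q3 - median

print(f"mean = {mean}, median = {median}")
print(f"error bar around median: -{lower_err} / +{upper_err}")
```

The single large value pulls the mean above the upper quartile, while the median stays in the middle of the bulk of the data. Quartile-based bars are also much less sensitive to one extreme point than bars based on max/min.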

I hope this all helps,

Phil

Thanks Phil,

This is very helpful. Would it be fair to say it's safer to use the median rather than the mean to represent a non-normally distributed dataset?

Gary

I would say that the median is a more useful representation when the data are non-normal.
