bgeaibreyi
Level I

RE: Distribution and Data Representation

Hi, 

I have a large dataset that I've been tasked to group and present with error bars. I'd like to get some advice on general distribution interpretation.

 

1. From the dataset, some reject the Ho and some don't. Is it a fair statement to say those have small p-value are NOT from a normally distributed set while those have large (>0.05) p-value are likely from a normally distributed set (given large enough sample size)?

[Image: Ho fail to reject]

2. When a dataset is NOT normally distributed does it mean there are some factors that are systematically influencing the dataset?

[Image: Ho reject]

3. Does the dataset being normally distributed have any relevance to how I should present the data with error bars? Say, mean/median + (max/min - median)

 

Thanks,

 

Gary

3 REPLIES
phil_kay
Staff

RE: Distribution and Data Representation

Hi,

 

A lot of the answers to your questions depend on the objective you have for analysing the data.

 

In the examples you have shown, even the one that is apparently non-normal looks close to being normal. You will find that as the data size increases a lot of things start to become statistically significant. You have quite a lot of data. An effect or difference, like this deviation from normal, can be statistically significant but can be so small as to be practically unimportant. Hence, it depends on your objectives.
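
To illustrate the point about sample size, here is a minimal Python sketch (not JMP output, and not necessarily the normality test used in the screenshots above). It assumes a hypothetical, mildly skewed population and uses the Jarque-Bera statistic as a stand-in, because its asymptotic p-value can be computed with the standard library alone. The names `jarque_bera_p` and `slightly_skewed_sample` are made up for this example.

```python
import math
import random


def jarque_bera_p(xs):
    """Asymptotic Jarque-Bera p-value for H0: the data are normal."""
    n = len(xs)
    mean = sum(xs) / n
    # Central moments (population form, divided by n).
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurtosis ** 2 / 4.0)
    # Under H0, JB is asymptotically chi-square with 2 degrees of freedom,
    # whose survival function is exp(-x / 2).
    return math.exp(-jb / 2.0)


def slightly_skewed_sample(n, rng):
    """Hypothetical population: normal noise plus a small exponential tail."""
    return [rng.gauss(0.0, 1.0) + 0.5 * rng.expovariate(1.0) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (50, 500, 5000, 50000):
    p = jarque_bera_p(slightly_skewed_sample(n, rng))
    print(f"n={n:>6}  p={p:.4g}")
```

The population has the same shape at every n; only the amount of data changes. Small samples will often fail to reject normality, while the largest sample should reject it decisively, even though the deviation itself has not grown. The asymptotic p-value is only a rough approximation at small n.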

 

"Is it a fair statement to say those have small p-value are NOT from a normally distributed set while those have large (>0.05) p-value are likely from a normally distributed set (given large enough sample size)?" - that is fair to say, but a statistician might be more pedantic. More properly, where p < 0.05 there is evidence to reject the null hypothesis that the data are from a normal distribution. Where p > 0.05 there is not enough evidence to reject it, which is not the same as proving that the data are normal.

 

Whether it is normal or not, the median is still a meaningful summary statistic - it is the mid-point of the distribution. The more the data deviate from normal, the less useful the mean is as an estimate of location.
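
A small Python sketch of the mean-versus-median point, using made-up numbers (the two lists are hypothetical, not taken from the dataset in this thread):

```python
import statistics

# Hypothetical, roughly symmetric measurements.
symmetric = [8, 9, 10, 10, 10, 11, 12]
# The same values plus a long right tail (two large readings).
skewed = symmetric + [40, 90]

for name, data in (("symmetric", symmetric), ("skewed", skewed)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{name:>9}: mean = {mean:.2f}, median = {median}")
```

In the symmetric list the mean and median agree. In the skewed list the median stays at the bulk of the data while the mean is pulled above seven of the nine values, so it no longer describes a typical observation.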

 

I hope this all helps,

Phil


bgeaibreyi
Level I

RE: Distribution and Data Representation

Thanks Phil,

This is very helpful. Would it be fair to say it's safer to use the median rather than the mean to represent a non-normally distributed dataset?

Gary
phil_kay
Staff

RE: Distribution and Data Representation

I would say that the median is a more useful representation when the data is non-normal.
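
For the error-bar question, one possible sketch in Python (not JMP, and with hypothetical group data) is to plot the median with asymmetric bars. `median_with_range_bars` follows the max/min - median idea from the original question. `median_with_quartile_bars` is an assumed alternative, not something proposed in this thread, that uses the quartiles so that a single extreme value does not set the bar length. Both function names are made up for this example.

```python
import statistics


def median_with_range_bars(values):
    """Return (median, lower_bar, upper_bar) with bars reaching min and max."""
    med = statistics.median(values)
    return med, med - min(values), max(values) - med


def median_with_quartile_bars(values):
    """Return (median, lower_bar, upper_bar) with bars reaching Q1 and Q3."""
    med = statistics.median(values)
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return med, med - q1, q3 - med


# Hypothetical groups: "A" is roughly symmetric, "B" has a long right tail.
groups = {
    "A": [8, 9, 10, 10, 10, 11, 12],
    "B": [8, 9, 10, 10, 10, 11, 12, 40, 90],
}

for name, values in groups.items():
    print(name, "range bars:", median_with_range_bars(values))
    print(name, "quartile bars:", median_with_quartile_bars(values))
```

For the skewed group the upper bar is much longer than the lower one, which a symmetric mean ± error bar could not show. Which bar definition is appropriate depends on the objective of the presentation.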
