jeff_kolton1
Level II

Significance of data?

My ANOVA analysis indicates that there is no statistical difference between groups.  However, when I plot as a simple chart, there is obviously a downward trend to the data.  Which do I "believe"?

[Attached images: 10496_ANOVA.jpg, 10497_Simple.jpg]

1 ACCEPTED SOLUTION
susan_walsh1
Staff (Retired)

Re: Significance of data?

The ANOVA is used to determine if there is a statistically significant difference between the means of the groups. In this case you can see that the means diamond for every group overlaps the overall mean of the data, so it is not surprising that a statistically significant difference is not found.

 

If you have the data to create a continuous variable for age rather than a categorical variable, then a regression analysis can test whether the slope of the line is significantly different from zero. This is the analysis you would need to identify a trend.
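
The slope test described above can be sketched in plain Python for readers who want to see the idea outside JMP. This is only an illustration under assumptions: the data are made up, the names `slope` and `slope_permutation_p` are hypothetical, and a permutation test stands in for the t-test on the slope that a regression platform would normally report.

```python
import random

def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def slope_permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the null hypothesis slope == 0."""
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        # Shuffling y breaks any real link between x and y.
        rng.shuffle(y_shuffled)
        if abs(slope(x, y_shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)

# Made-up data: age in arbitrary units, response drifting downward with noise.
data_rng = random.Random(1)
age = [float(a) for a in range(30, 430, 10)]
y = [1.0 - 0.001 * a + data_rng.gauss(0, 0.05) for a in age]

print("slope:", slope(age, y))
print("permutation p-value:", slope_permutation_p(age, y))
```

A small p-value here says that a slope this steep rarely arises when the response is unrelated to age, which is the question a trend analysis asks and a comparison of group means does not.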


8 REPLIES
Kevin_Anderson
Level VI

Re: Significance of data?

Hi, Jeff!

You are making a critical error when you say "My ANOVA analysis indicates that there is no statistical difference between groups."  Don't feel bad.  You are not alone.  There is a tempest swirling around significance testing in science now, specifically regarding p value thresholds.

The correct statement is along the lines of "My ANOVA analysis barely fails to reject the null hypothesis."  It does not say "There is no difference"; there clearly IS a difference, and you can see it.  These data do not provide enough evidence to declare that difference statistically significant, although it is very close.

You should design the test to be powerful enough to statistically detect the difference about which you technically care.  If you care about the difference you see with your eyes in these data, believe that.  And next time, design a balanced test that provides adequate power to detect that difference a priori.
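
As a rough illustration of designing for power, the sketch below uses the standard normal-approximation formula for the per-group sample size in a balanced, two-sided comparison of two means. It is a simplification under assumptions: a full one-way ANOVA with many groups needs a different calculation, the function name `n_per_group` is hypothetical, and the `delta` and `sigma` values are placeholders rather than numbers from this thread.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate per-group n to detect a mean difference `delta`.

    Assumes two balanced groups, a common standard deviation `sigma`,
    a two-sided test, and the normal approximation.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)

# Placeholder inputs: a difference of half a standard deviation.
print(n_per_group(delta=0.5, sigma=1.0))
```

The useful part is the shape of the relationship: halving the difference you care about roughly quadruples the sample size needed to detect it.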

Re: Significance of data?

Hi Jeff,

Look at your connecting letters report and your LSD threshold matrix further down the page from what you are showing above.  From what I can see, you have at least 4 groups that are significantly different from 0391-0420, based on the overlap (or lack thereof) of the mean diamonds at the upper and lower overlap marks.  The R-square also indicates there is some difference among the groups. If there were little difference, the R-square would be closer to zero.
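
For reference, here is a minimal sketch of how the R-square of a one-way analysis relates to the sums of squares and to the F ratio. The function name `anova_summary` and the data are made up for illustration; JMP reports these quantities directly.

```python
def anova_summary(groups):
    """Return (r_squared, f_statistic) for a one-way layout.

    groups: list of lists, one list of observations per group.
    """
    all_values = [v for g in groups for v in g]
    n = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    # Between-group sum of squares: group size times squared deviation
    # of each group mean from the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    ss_within = ss_total - ss_between
    r_squared = ss_between / ss_total
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return r_squared, f_stat

# Made-up example with three small groups.
r2, f = anova_summary([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print("R-square:", r2, "F:", f)
```

Because F also depends on the number of groups and observations, the same R-square can be significant in one data set and not in another, so it is worth reading the two together.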

Best,

Bill

ms
Super User (Alumni)

Re: Significance of data?

If there really is a downward trend, a regression model should be more effective for detecting it: Y versus a continuous variable such as individual age or group mean age.
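
If only the binned age labels are available, one way to get a continuous x is to use the midpoint of each bin as a stand-in for group mean age. The sketch below assumes labels shaped like 0391-0420; the other labels, the group means, and the names `bin_midpoint` and `fit_line` are hypothetical. Individual ages, if you have them, would be the better choice.

```python
def bin_midpoint(label):
    """Midpoint of a range label such as '0391-0420'."""
    low, high = label.split("-")
    return (int(low) + int(high)) / 2

def fit_line(x, y):
    """Least-squares (slope, intercept) of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical bins and group means, for illustration only.
labels = ["0361-0390", "0391-0420", "0421-0450"]
group_means = [0.52, 0.50, 0.47]

ages = [bin_midpoint(label) for label in labels]
slope_est, intercept_est = fit_line(ages, group_means)
print("estimated change in Y per unit of age:", slope_est)
```

Fitting to group means ignores the spread within each group and the unequal group sizes, so treat the result as a first look rather than a final estimate.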

Steven_Moore
Level VI

Re: Significance of data?

Have you placed a fitted line through the data with the 95% confidence band?  If the line can be moved within the band into the horizontal position, then the "trend" is not significant - this is often the case.  So-called "trend lines" are often abused.  You might also look at the "connecting letters report" - very useful.
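
A related check, sketched here in Python, is whether a 95% interval for the slope includes zero. This is not the confidence band JMP draws: it is a bootstrap percentile interval, swapped in because it needs no distribution tables. The data are made up and the names `slope` and `bootstrap_slope_ci` are hypothetical.

```python
import random

def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def bootstrap_slope_ci(x, y, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the slope, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:
            continue  # slope is undefined when all resampled x are equal
        ys = [y[i] for i in idx]
        slopes.append(slope(xs, ys))
    slopes.sort()
    lower = slopes[int((1 - level) / 2 * n_boot)]
    upper = slopes[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper

# Made-up data with a downward drift plus noise.
data_rng = random.Random(1)
age = [float(a) for a in range(30, 430, 10)]
y = [1.0 - 0.001 * a + data_rng.gauss(0, 0.05) for a in age]

lo_ci, hi_ci = bootstrap_slope_ci(age, y)
print("95% bootstrap interval for the slope:", lo_ci, hi_ci)
```

If the interval straddles zero, a horizontal line is compatible with the data and the apparent trend should be treated with caution.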

Steve
Steven_Moore
Level VI

Re: Significance of data?

Another thought:  A properly constructed Process Behavior Chart or even a Run Chart might show that you have a shift rather than a trend in the data.
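
For anyone unfamiliar with process behavior charts, the limits of an individuals (XmR) chart can be sketched as below. The values are made up, the name `xmr_limits` is hypothetical, and the observations are assumed to be in time order.

```python
def xmr_limits(values):
    """Lower limit, center line, and upper limit for an individuals chart."""
    center = sum(values) / len(values)
    # Moving ranges: absolute differences between consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    # 2.66 is the usual XmR scaling constant (3 / 1.128).
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar

# Made-up measurements in time order.
measurements = [0.51, 0.53, 0.50, 0.52, 0.49, 0.47, 0.46, 0.48, 0.45, 0.46]
lower, center, upper = xmr_limits(measurements)
outside = [v for v in measurements if v < lower or v > upper]
print("limits:", lower, center, upper)
print("points outside the limits:", outside)
```

A run of consecutive points on one side of the center line suggests a shift, while a steady drift across the chart suggests a trend.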

Steve
Kevin_Anderson
Level VI

Re: Significance of data?

Perhaps I'm incorrect, but as a consulting statistician and a card-carrying member of the American Statistical Association, I feel I must strongly object to the direction I sense this thread may be taking.  Before we all start tripping through the Garden of Forking Paths, perhaps we should just take a breath, stop recommending different procedures, and understand some particulars about the data and the science behind it.

Where do the data come from?  What is "Y"?  Are they happenstance data, or the results of a designed experiment?  How was "Y" measured?  Does the measurement system discriminate acceptably?  How were the groups chosen?  From where does the imbalance in the group sample sizes arise?  There appear to be outliers in the data; are they correct and understood?  What is a technically significant difference?  What kind of decision are you trying to make from the results of this analysis?  Is the objective "data mining" or inference?  Was an ANOVA the planned analysis from the beginning?

I hope no one takes offense.  None is intended.  I think we all want to try to help.  But the answers to the questions (and more!) can mean the difference between help and harm.

jeff_kolton1
Level II

Re: Significance of data?

To answer your questions for a better understanding of your thought process:

· Data comes from actual measurements, completed on scales following removal of the equipment from the process after failure

· “Y” is metal loss in the process

· Discrimination is acceptable

· The group sizes were selected arbitrarily

· Imbalance in the group sizes comes from natural failure of the equipment, requiring replacement

· The outliers are correct, but not clearly understood

· Due to the high cost of the metal, differences of 0.005 or more are significant

· I’m trying to determine a rate of metal loss for budgeting purposes

· ANOVA was not the planned analysis, but was used to make some inferences about the process

Any additional help would be appreciated.

Jeff Kolton

Staff Engineer

Enterprise Excellence

Fiberglass

940 Washburn Switch Rd.

Shelby, NC, USA, 28150

Tel: 704-434-2261 ext. 2374

E-Mail: kolton@ppg.com

Web: www.ppg.com
