I am trying to determine whether fruit quality differs significantly across five locations (orchards) in WA state, assessed separately for each of five cultivars. Five quality indicators (Tannin, pH, TA, SSC, and SG), representative of fruit quality, were measured, each only once per location per cultivar per year. More specifically, ~50 fruit were pressed and juiced into one composite sample, and each indicator was measured once from that sample. I have three years of data, with measurements repeated annually from 2012 to 2014. Because I want to compare Locations within each Cultivar separately, I tried running an ANOVA using Fit Model: the five quality indicators in the Y role, Cultivar in the By role, and a Full Factorial of Year and Location in the Effects. Unfortunately, I am missing data for some locations in some years, so I received a lot of 'lost DFs' errors. When I tested the assumption of normality (Residuals by Cultivar), I found that most of the data were not normally distributed, even after log transformation.
Given the missing data and only one data point per indicator per location per cultivar per year, is the data set a lost cause to analyze? Are the sample size and power too small for me to draw any confident conclusions, whatever modifications I make? If not, does anyone have suggestions for dealing with the errors?
Thank you very much
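The 'lost DFs' messages in the question above can be made concrete by counting degrees of freedom. The following is a minimal sketch in Python, not JMP output: the function name `anova_df` is hypothetical, and the counts (5 locations, 3 years, one observation per cell) are taken from the post. It only illustrates why a full factorial of Year and Location leaves nothing for the error term when there is a single measurement per location per year.

```python
def anova_df(n_locations, n_years, n_obs, interaction=True):
    """Degrees of freedom for a Year x Location model fitted to one cultivar.

    Hypothetical helper for illustration only; it does not fit a model.
    """
    df_location = n_locations - 1
    df_year = n_years - 1
    # The interaction term is what a "Full Factorial" adds to the main effects.
    df_interaction = df_location * df_year if interaction else 0
    df_model = df_location + df_year + df_interaction
    # Whatever the model does not use is left to estimate the error variance.
    df_error = (n_obs - 1) - df_model
    return {
        "location": df_location,
        "year": df_year,
        "interaction": df_interaction,
        "error": df_error,
    }


# Complete data: 5 locations x 3 years = 15 observations per cultivar.
full = anova_df(5, 3, 15)
additive = anova_df(5, 3, 15, interaction=False)
print("full factorial:", full)
print("main effects only:", additive)
```

With complete data the full factorial uses all 14 available degrees of freedom, so no F test is possible; dropping the interaction leaves 8 for error. Each missing location-year combination reduces the total further, which is presumably what the 'lost DFs' messages reflect.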
ANOVA assumes that the residuals are approximately normally distributed. However, all is not lost! You could probably use a non-parametric method such as Mood's Median test to determine whether there are any differences.
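For readers unfamiliar with Mood's Median test, here is a minimal sketch of the calculation in plain Python. It rests on assumptions that are not in the original post: the pH values are made up for illustration, the function name `moods_median_test` is hypothetical, and each location is assumed to contribute several values (for example, one per year). The test counts how many values in each group fall above the pooled median and compares those counts with a chi-square statistic.

```python
from statistics import median


def moods_median_test(samples):
    """Return (chi-square statistic, degrees of freedom, 2 x k count table).

    `samples` maps a group label to a non-empty list of measurements.
    Values equal to the grand median are counted as "not above".
    """
    groups = list(samples.values())
    pooled = [x for values in groups for x in values]
    grand_median = median(pooled)

    above = [sum(x > grand_median for x in values) for values in groups]
    below = [len(values) - a for values, a in zip(groups, above)]

    n = len(pooled)
    total_above, total_below = sum(above), sum(below)
    if total_above == 0:
        raise ValueError("no values above the grand median; test undefined")

    # Pearson chi-square on the 2 x k table of above / not-above counts.
    stat = 0.0
    for values, a, b in zip(groups, above, below):
        expected_above = len(values) * total_above / n
        expected_below = len(values) * total_below / n
        stat += (a - expected_above) ** 2 / expected_above
        stat += (b - expected_below) ** 2 / expected_below
    return stat, len(groups) - 1, [above, below]


# Made-up pH values: five locations, three values each (illustration only).
ph_by_location = {
    "A": [3.2, 3.3, 3.1],
    "B": [3.6, 3.7, 3.5],
    "C": [3.4, 3.45, 3.35],
    "D": [3.9, 3.8, 4.0],
    "E": [3.0, 2.9, 3.05],
}
stat, df, table = moods_median_test(ph_by_location)
print(f"chi-square = {stat:.2f} on {df} df; counts = {table}")
```

The statistic is compared with a chi-square distribution on (number of groups - 1) degrees of freedom; for 4 degrees of freedom the 5% critical value is about 9.49. With only three values per group the expected counts are very small, so the chi-square approximation is rough and the result should be read with caution. SciPy users can get a library version of this test from `scipy.stats.median_test`.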
Thank you very much for taking the time to comment on my post; it is much appreciated. I have one question about the Median test before I run it, however. I believe the null hypothesis is that the medians of the populations from which two or more samples are drawn are identical. My issue, in case it wasn't clear from my explanation (or in case I misunderstand the test), is that I have only one observation per population, not two, three, etc. (here, the one data point I have for each cultivar at each of the five locations each year represents a population). However, if the one measurement from each composite juice, a representative mix of apples from the respective location/orchard, can be considered a mean or median measurement in itself, then I could see running the Median test. I hope I am making some sense.
Thank you again
I would start with Graph Builder: look at your data so that you understand what you have, then move on to modeling if needed/appropriate. If your variability is greater across locations than within location (where within location means across years within a location), then you could perhaps use ANOM (analysis of means) to compare quality by location by cultivar, where you now have 3 data points per location per cultivar (one per year). But again, I would start with Graph Builder to understand the relationships from location to location, within and between cultivars, and over time.
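The comparison of variability across locations with variability within a location can also be checked numerically. Below is a rough, descriptive sketch in Python, not a formal test and not ANOM itself. The function name `variability_summary` and the pH values are made up for illustration; one location deliberately has only two years to mimic missing data.

```python
from statistics import mean, variance


def variability_summary(values_by_location):
    """Return (between-location variance, average within-location variance).

    Between: sample variance of the location means.
    Within: mean of the per-location sample variances across years,
    using only locations that have at least two yearly values.
    """
    location_means = [mean(values) for values in values_by_location.values()]
    between = variance(location_means)
    within_variances = [
        variance(values)
        for values in values_by_location.values()
        if len(values) >= 2
    ]
    within = mean(within_variances)
    return between, within


# Made-up pH values for one cultivar: one value per year per location.
ph_by_location = {
    "A": [3.2, 3.3, 3.1],
    "B": [3.6, 3.7, 3.5],
    "C": [3.4, 3.5],  # one year missing
}
between, within = variability_summary(ph_by_location)
print(f"between locations: {between:.4f}, within locations: {within:.4f}")
```

If the between-location figure is clearly larger than the within-location figure, location differences stand out from year-to-year noise; if the two are similar, a formal comparison of locations is unlikely to show much.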
Thank you for the input, much appreciated. I will definitely work with Graph Builder. Missing data on top of an already small sample size (1 data point per location per cultivar per year) limits my freedom greatly.
We decided to compromise and count the years as replicates, so that we at least have a sample size of 3 rather than 1. Unfortunately, the assumption of normality is still rejected, whatever transformation I apply. We will be collecting another year of data, so the sample size will increase, though it will still be small. I was wondering whether you had any other advice on what, if anything, I could do about the normality. Pasted below is some of my output. I worked with Graph Builder, which was helpful, but my supervisor would like some ANOVA output for a report coming due, so I am in a bind.
Thank you again for all the help.
Travis, it seems that you are at a dead end. It appears that your data were not structured for meaningful analysis. You can torture the data (i.e., transformations, etc.) until it "confesses," but you still end up with a confession under duress, and you know what that means. ;-)
Until you are able to collect data under meaningful and structured conditions, all you have is "tribal knowledge" upon which to make conclusions.