Re: Choosing the right Comparison of means

bobmorrane · May 15, 2023 09:11 AM

Hello,

I'm analyzing biological data where the output is a continuous variable. The tests consist of different treatments on plants, which are then assessed for a level of disease (my output). If I plot a graph, all I get is a bunch of means with very large error bars that overlap each other. So I'm trying to do the analysis with the little letters, to be able to say confidently which treatments differ from each other.

Now the issue I have is that each repeat consists of two data points which are not independent of each other (two parts of the same plant). In this case, I heard that the usual comparisons like Tukey-Kramer don't apply.

So my question is, which comparison should I use, or which data transformation, to make sure I'm doing things right?

~~Bob~~

bobmorrane · May 23, 2023 06:39 AM

Hi @statman ,

1. Is the variation in your response of any practical value?

Good question. So it is a level of disease that can be measured relatively easily to the nearest 0.1 unit. That being said, in a real-life scenario, the end user would not see a difference that small. I'd say the average values of two treatments (what you call levels) would need to differ by at least 4-5 units to really be of use.

2. If I understand your data, it looks like you have one factor (Name) and it is tested at 10 "levels". You have 2 Plant types (crossed with Name). You have 10 plants (Rep) for each level of Name and each plant is measured twice (is this in different locations on the plant or the exact same location on the plant? Two different parts of the plant.) and measures of Disease over 3 time periods for "within" plant (you might consider the plants nested within Type and Name). We could debate whether the systematic sampling of time periods is crossed with Name. I have some questions regarding this structure. The numbering scheme for Rep looks questionable. Can the same plant be given a different level of Name? If not, there should be no repeated Rep numbers. How many plants are actually in the study? 10 plants per "Name". They are grown independently. Then, when the treatment is applied, two leaves of each plant are isolated and the disease is applied. Because the two leaves come from the same plant, my colleagues made the argument that they are NOT independent statistical units, which is why they are grouped under the same "rep". It's as if we assessed 10 patients on their right hand and left hand.
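One common way to respect that structure, sketched here under assumptions rather than as the method used in this trial, is to collapse the two leaf measurements into a single mean per plant before comparing treatments, so that each plant contributes one independent value. The record layout, the field order, and every number below are hypothetical and invented for illustration, and the sketch assumes a single time point. Plants are keyed on both treatment and plant number, because Rep numbers repeat across levels of Name.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format records: (treatment, plant_id, leaf, disease).
# All values are invented for illustration only.
records = [
    ("Untreated", 1, "leaf1", 22.0),
    ("Untreated", 1, "leaf2", 26.0),
    ("Untreated", 2, "leaf1", 30.0),
    ("Untreated", 2, "leaf2", 28.0),
    ("Active", 1, "leaf1", 18.0),
    ("Active", 1, "leaf2", 20.0),
    ("Active", 2, "leaf1", 15.0),
    ("Active", 2, "leaf2", 19.0),
]


def plant_means(rows):
    """Collapse leaf-level rows to one mean per (treatment, plant)."""
    by_plant = defaultdict(list)
    for treatment, plant_id, _leaf, disease in rows:
        # Key on treatment AND plant number, since plant numbers repeat
        # across treatments but refer to different physical plants.
        by_plant[(treatment, plant_id)].append(disease)
    return {key: mean(values) for key, values in by_plant.items()}


per_plant = plant_means(records)
for (treatment, plant_id), value in sorted(per_plant.items()):
    print(treatment, plant_id, value)
```

The resulting plant-level table has one row per independent unit, which is the shape that a standard one-way comparison of means expects. Averaging discards the leaf-to-leaf variation within a plant; a model with plant as a random effect would be the alternative if that variation is of interest.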

It also looks like you have 2 measures of disease for each plant? Yes, two measures on two different leaves. I have attached another version of the data table. It is imperative the data table match how the data was actually collected or any analysis will be suspect.

What is the intent of the study? (e.g., Are you trying to pick the "winner" (e.g., best Name to reduce max disease?) or trying to understand what contributes to disease (growth))? We're trying to see if we can find additives (what I called adjuvant in my table) that significantly improve disease reduction from the active. So, what we want to see:

The active alone must be better than the untreated
Hopefully, we can find some treatments of active + adjuvant that are significantly better than the active alone
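These two goals amount to a small set of planned comparisons rather than all pairwise ones. A minimal sketch of checking them against the practical threshold mentioned earlier (at least 4-5 units) could look like the following. The treatment labels, the plant-level values, and the `MIN_USEFUL_DIFF` constant are assumptions made up for illustration, and the check only looks at the size of the difference; it is not a significance test and would sit alongside a formal comparison, not replace it.

```python
from statistics import mean

# Hypothetical plant-level disease values (one number per plant),
# invented for illustration only.
plant_level = {
    "Untreated": [24.0, 29.0, 27.0],
    "Active": [19.0, 17.0, 21.0],
    "Active + adjuvant E": [13.0, 15.0, 14.0],
}

# Assumption: smallest difference of practical use, taken from the
# "at least 4-5 units" remark in the thread.
MIN_USEFUL_DIFF = 4.0


def reduction(reference, candidate, data):
    """Mean disease of reference minus mean disease of candidate.

    A positive value means the candidate shows less disease.
    """
    return mean(data[reference]) - mean(data[candidate])


# Planned comparisons as (reference, candidate) pairs.
comparisons = [
    ("Untreated", "Active"),
    ("Active", "Active + adjuvant E"),
]

results = {}
for reference, candidate in comparisons:
    diff = reduction(reference, candidate, plant_level)
    results[(reference, candidate)] = diff
    relevant = diff >= MIN_USEFUL_DIFF
    print(f"{candidate} vs {reference}: {diff:.1f} units, practically relevant: {relevant}")
```

Restricting the analysis to comparisons against a single reference is also the situation Dunnett's test is designed for, with the active alone taking the role of the control for the adjuvant treatments.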

Why 10 plants? Because we thought it would be a good number. Given the relatively high variability of the output and the relatively low efficacy of the treatments, it's kind of a minimum. Also, since they are grown plants, it's difficult to go higher than that, because there are already 200 plants in this trial. That takes a lot of space and manual work. We could have more plants per treatment, to maybe get better stats, but then we'd lose on the number of treatments, which is not desirable. So 10 is a good compromise.

Why 3 time periods, each a day apart? Because it's easy to do. Once you measure the disease at one particular time, given all the effort you've put in to get there, it doesn't cost much more to wait another day and make another observation. It's quite usual with this kind of study. It depends on the nature of the disease of course, but this one progresses fast enough that you can see some growth within a one day difference.

Are you interested in the rate of change of disease over time or just maximum disease or what? I guess the question is what do you hypothesize the Name will do to disease? Why? Not rate of change, just absolute value. Basically, you've got a good plant you want to protect against a bad disease. The idea is to minimize the amount of disease observed on treated plants. In an ideal world, you want 100% efficacy (aka zero disease). Or at least a significant reduction in the level of disease, so that the bad disease doesn't rob the yield from the good plants. But there is only so much the active can do on its own. That's where the additives (or adjuvants) come in. Ideally, we want to find additives that have a synergistic effect with the active and, in the best case, the additive doesn't have much of an effect on its own. But here, there's clearly nothing exciting, which is why I'm trying to dig deeper to see if we can draw any conclusions. Usually, I'd be happy just looking at a Student's and a Dunnett's test on the means. (I know Student's is not so suitable with more than 2 levels, but then Dunnett's can be a bit too restrictive, so a comparison of the two can be interesting.) Here Dunnett's says there are no differences between the different levels, and Student's says maybe additive E is bringing something.

3. Have you assessed the measurement system? Not sure what that means. The values are a physical measurement that is easily done, so I wouldn't worry about it too much.

Did you measure the disease multiple times on the same location on same plant at the same time period? Yes, it's measured at three different time periods on the same leaves (day 1, 2 and 3).

As it appears now, the measurement errors are likely confounded with within plant? no idea

~~Bob~~

bobmorrane · May 23, 2023 07:56 AM

Sorry, I meant Student and Tukey, not Student and Dunnett's.

~~Bob~~

statman · May 23, 2023 01:52 PM

Bob, nice of you to take the time to answer the questions. I don't think this forum is an efficient way to discuss/argue the issues and appropriate analysis so I'll bow out.

"All models are wrong, some are useful" G.E.P. Box

Byron_JMP · May 22, 2023 02:15 PM

Regarding: "By the way gents, it's quite a bit of a pain to generate these connecting letters reports in the Fit Y by X tool. Because I have 6 groups, that's 6 red triangles to click for the compare means. It's needlessly tedious. Do you know of a method to make things quicker?"

Yes, much quicker. Hold down your Control key (Command on Mac) when you do compare means for the first by group. The Control key broadcasts your command to all like objects in the window. It works for resizing graphs too.

(still looking at the bigger question)

JMP Systems Engineer, Health and Life Sciences (Pharma)

bobmorrane · May 23, 2023 07:55 AM

@Byron_JMP ,

Thanks for the tip with CTRL, it does save a lot of time. Though the bigger issue remains: it's still highly impractical to add the letters on a graph.

~~Bob~~

Byron_JMP · May 23, 2023 08:00 AM

I agree, it's very difficult to get the letters on the graph.

Please go to the Wishlist tab at the top of this web page and add a request.

In the meantime, here is a procedure and script to get the figure I think you are looking for.

https://community.jmp.com/t5/Byron-Wingerd-s-Blog/One-Way-ANOVA-Figure-for-Scientists/ba-p/265156

JMP Systems Engineer, Health and Life Sciences (Pharma)
