I have been working with a scientist who is running experiments on sanitizing various media to remove plant disease spores. The media are infested with disease spores and then receive mitigating treatments, after which the surviving spores are counted. An initial spore count is not available for each individual medium. Instead, a sample of untreated media is tested to get initial spore counts, and a mean initial spore count is calculated for each medium type. The data are analyzed in JMP using a GLM with a binomial distribution: the response is the after-treatment count of surviving spores on each individual medium, and the mean initial spore count is the second variable (the denominator) in the binomial analysis. With this analysis, the reported probability of a surviving spore is based on the mean spore count from a sample of the medium. Are there any issues with this approach? I realize it would be preferable to have an initial count for each medium. Is using the mean initial spore count an issue in this analysis? We plan to report that we are using the mean initial spore counts from a separate sample of the media.
If the method used to infest each of the individual media (samples) is consistent, then you might assume that the initial spore count is the same for each. You could then use a GLM with a Poisson distribution and a log link function, without an offset. Enable the over-dispersion option for a better fit. (Check whether the over-dispersion is significant.)
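As a rough illustration of the over-dispersion check mentioned above, here is a minimal Python sketch (not JMP output). It assumes a one-way design with treatment as the only factor, in which case the Poisson log-link fitted value for each observation is just its group mean. The counts and treatment names are hypothetical, not data from this thread.

```python
from statistics import mean

# Hypothetical post-treatment viable spore counts per treatment
# (illustration only; not data from the thread).
counts = {
    "control": [120, 95, 143, 110],
    "treatment_a": [12, 30, 8, 22],
    "treatment_b": [3, 0, 9, 4],
}

def pearson_dispersion(groups):
    """Pearson chi-square divided by residual df for a one-way Poisson model.

    With a single categorical factor and a log link, the fitted value for
    each observation is its group mean, so no iterative fit is needed.
    """
    chi_sq = 0.0
    n_obs = 0
    for values in groups.values():
        mu = mean(values)
        n_obs += len(values)
        if mu > 0:  # a group of all zeros contributes nothing
            chi_sq += sum((y - mu) ** 2 / mu for y in values)
    resid_df = n_obs - len(groups)
    return chi_sq / resid_df

dispersion = pearson_dispersion(counts)
print(f"Pearson dispersion estimate: {dispersion:.2f}")
```

A value near 1 is what a plain Poisson model expects; values well above 1 suggest over-dispersion, which widens the standard errors once it is accounted for.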
The binomial distribution that you chose assumes a dichotomous response, but as I understand it your ratio is not likely to be one. It is continuous, no?
Thanks for the comments and suggestions.
Some of my terminology was not the best. The use of the words "survival" and
"infected" is misleading. A lab populates the medium with viable spores.
The focus is viable spores on the medium, and specifically the probability
of viable spores on the medium.
Let me clarify what we are doing. This is a sanitation study. The
two media are steel washers and a square piece of wool, so no spores
would live on the medium if they sprouted. The treatments are meant to
sanitize. The spore counts are of viable spores. We want to know how
effective the treatments are at cleaning up the spores. The initial counts
on the wool averaged 2.2 million; on the steel washers they averaged 8500.
Knowing the probability of a viable spore on the medium after treatment is
valuable.
The counts are integers. They are not continuous. I have used a GLM with the
Poisson distribution as Mark suggested. I find significance, but not as much
as with the binomial. The probability of a viable spore on the medium is of
interest, along with the treatments that significantly affect it. Is the
binomial telling me this? A low probability of a viable spore tells us there
is a low likelihood of spreading viable spores.
If the binomial is not the correct approach, could the GLM Poisson treatment
estimates be divided by the initial spore estimates to produce the
probability of a viable spore on the medium? Would this be a better way to
estimate this probability?
Or do you have another approach to suggest? Thanks.
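To make the division described above concrete, here is a minimal Python sketch, assuming a one-way Poisson model in which the estimated mean count for a treatment is the sample mean. The initial means (2.2 million for wool, 8500 for steel washers) are the ones quoted in this post; the post-treatment counts are hypothetical. In a Poisson GLM with a log link, the same ratio comes out if log(mean initial count) is supplied as an offset.

```python
from statistics import mean

# Mean initial viable spore counts quoted in the thread.
initial_mean = {"wool": 2_200_000, "steel_washer": 8_500}

# Hypothetical post-treatment counts for a single treatment
# (illustration only; not data from the thread).
post_counts = {"wool": [410, 388, 455], "steel_washer": [2, 0, 1]}

def surviving_fraction(medium):
    """Estimated mean post-treatment count divided by the mean initial count."""
    return mean(post_counts[medium]) / initial_mean[medium]

for medium in initial_mean:
    print(f"{medium}: {surviving_fraction(medium):.2e}")
```

Note that this treats the mean initial count as a known constant, so any interval built on it ignores the sampling variability of the initial counts.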
I'm not exactly sure what you're trying to get out of the experiment or how the experiment is set up, so a few more details might be useful.
I suspect that you are comparing the effect of different treatment durations with different disinfectants (chemical, or physical such as heat or steam).
Often one is interested in the log decrease in spore or bacteria counts. For example, I spike my apple sauce with 10^8 cfu/ml of my favorite organism, and then run jars through several sterilization cycles (temperature and time). Then I open the jars, do counts on a couple of samples from each jar, and use the data to find a minimum time-temperature combination that gives me a 3-log (or 6-log) decrease in counts. The same kind of experiment might be run on a surface: dose the surface, recover with and without treatment, and compare the results.
In either case the response is not binomial (0 or 1) unless I'm only looking at whether I got greater than or less than a 6-log decrease, or maybe complete sterilization vs. not complete. In most cases scientists (and regulators) are more interested in the estimate of the log reduction and the confidence intervals around the estimate (and whether you can operate in a condition where the CI is below the required threshold).
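A minimal Python sketch of the log-reduction estimate described above, under stated assumptions: the counts are hypothetical, the interval is a normal approximation on the log10 scale (with only a few replicates a t-based interval would be wider), and "meeting the target" is read here as the lower confidence bound reaching 6 logs, which is one possible interpretation.

```python
from math import log10, sqrt
from statistics import NormalDist, mean, variance

# Hypothetical counts (cfu/ml); not data from the thread.
untreated = [1.2e8, 0.9e8, 1.1e8, 1.0e8]
treated = [85, 140, 60, 110]

def log_reduction(before, after, level=0.95):
    """Mean log10 reduction with a normal-approximation confidence interval."""
    log_before = [log10(x) for x in before]
    log_after = [log10(x) for x in after]  # assumes no zero counts
    estimate = mean(log_before) - mean(log_after)
    # Standard error of a difference of two independent means.
    se = sqrt(variance(log_before) / len(before)
              + variance(log_after) / len(after))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate, estimate - z * se, estimate + z * se

est, lower, upper = log_reduction(untreated, treated)
print(f"log10 reduction: {est:.2f} (approx. 95% CI {lower:.2f} to {upper:.2f})")
print("lower bound reaches 6 logs:", lower >= 6.0)
```

Zero counts after treatment would need separate handling (for example a detection-limit substitution), since log10(0) is undefined.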
I am not sure what you are trying to analyze as your dependent variable given your description above. It sounds like you are trying for a binary "contaminated/not contaminated" response, but then you are using a simulated initial spore count and an actual post-treatment spore count as your only input variables? I am a little confused as to your goal here because if there are any spores surviving post-treatment, then your medium is contaminated. Your probability is automatically 100%. Now, if some variables related to the treatment (e.g., time of treatment or temperature or...) were included in the model such that you were trying to determine the effectiveness of your treatment, and the final number of spores post-treatment were your response, then I could see the point of this exercise. And to Mark's point, if your response is the post-treatment spore counts, then you'd want the Poisson distribution and log link function for the GLM. Or am I missing something here?