Re: Selecting a test to determine when to end an experiment?

Jun 8, 2023 5:30 PM

We have an experiment testing a population's response to a chemical, with three sampling points in time. The measurement at each time point is the dosage that has a significant effect on the test population.

I want to know whether the measurements at the first and last time points (and the middle time point too) are typically the same or different. It would save money to end the experiment at sampling point 1 if those results are typically representative of the later time points (2 or 3). I have a large data set of tests on 100 chemicals that I would like to run this analysis on.

I am at a loss for how to structure this, and I have spent hours playing around with different approaches. Because the dosage values with an effect are not random (they depend on the range of dosage values selected in the experiment, and the effect magnitude is a function of the organism's response to that specific chemical), I would think that a repeated measures ANOVA is not correct. Also, the data are non-normal even after log transformation. I just care whether the value is the same or different over the duration of the experiment.

Maybe a chi-square test would be correct here, with a discrete outcome of "the same or not"? Maybe an agreement statistic, but that would only work with two time points. I just care whether the value is the same or different across the three time points.
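The thread never settles on a nonparametric test, so what follows is a swapped-in technique, not something proposed in the replies: a Friedman test treats each chemical as a block, ranks its three daily values, and asks whether some day is systematically ranked higher or lower. This is a minimal sketch in plain Python, assuming one value per chemical per day and exactly three days; `midranks` and `friedman` are hypothetical helper names. The p-value uses the large-sample chi-square approximation with 2 degrees of freedom, which is crude for three chemicals but more reasonable for 100.

```python
import math

# Values from the example table: rows are chemicals (blocks), columns are Day 10, 20, 30.
table = {
    "Chem x": [0.5, 0.5, 0.3],
    "Chem y": [2.0, 1.0, 0.2],
    "Chem z": [1.0, 0.15, 0.15],
}


def midranks(values):
    """Rank values within one chemical (1 = smallest); ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # mean of 1-based ranks i+1..j+1
        i = j + 1
    return ranks


def friedman(rows):
    """Tie-corrected Friedman statistic and its chi-square(2) p-value (three days only)."""
    n, k = len(rows), len(rows[0])
    if k != 3:
        raise ValueError("the closed-form p-value below assumes exactly three days")
    ranked = [midranks(r) for r in rows]
    rank_sums = [sum(r[j] for r in ranked) for j in range(k)]
    expected = n * (k + 1) / 2  # rank sum per day if the days were interchangeable
    numerator = (k - 1) * sum((rs - expected) ** 2 for rs in rank_sums)
    denominator = sum(x * x for r in ranked for x in r) - n * k * (k + 1) ** 2 / 4
    if denominator == 0:
        raise ValueError("every chemical has identical values on all days")
    q = numerator / denominator
    p = math.exp(-q / 2)  # chi-square survival function for 2 degrees of freedom
    return q, p, rank_sums


q, p, rank_sums = friedman(list(table.values()))
print(rank_sums, q, round(p, 4))  # [8.5, 6.0, 3.5] 5.0 0.0821
```

A chemical whose value never changes gets all-tied ranks and contributes nothing to the statistic, so the test detects a systematic shift across days rather than answering "did this particular chemical change?". If SciPy is available, `scipy.stats.friedmanchisquare(day10, day20, day30)` should report the same tie-corrected statistic.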

This is the current structure:

Dosage with an effect

Chemical   Day 10   Day 20   Day 30
Chem x     0.5      0.5      0.3
Chem y     2        1        0.2
Chem z     1        0.15     0.15

Mark_Bailey · Mar 11, 2021 09:35 AM

Set the analysis up as a two-way ANOVA, with Chem and Day as fixed effects, with an interaction. So three columns: Chem, Day, and Y. Then select Analyze > Fit Model. Select Y and click Y. Select Chem and Day and click Macros > Factorial. Click Run.

Evaluate the Effect Tests for the Day and Chem*Day terms.
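The instructions above assume the data are stacked into three columns (Chem, Day, Y) rather than the wide Day 10 / Day 20 / Day 30 layout in the original post. This is a minimal sketch of that reshaping in plain Python, using the three example chemicals; the variable names are hypothetical. Inside JMP, Tables > Stack should do the same job.

```python
# Wide table from the original post: one row per chemical, one entry per sampling day.
wide = {
    "Chem x": {10: 0.5, 20: 0.5, 30: 0.3},
    "Chem y": {10: 2.0, 20: 1.0, 30: 0.2},
    "Chem z": {10: 1.0, 20: 0.15, 30: 0.15},
}

# Stack into three columns (Chem, Day, Y): one row per observation, days in order.
long_rows = [
    (chem, day, y)
    for chem, by_day in wide.items()
    for day, y in sorted(by_day.items())
]

for row in long_rows:
    print(row)
```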

joemama985 · Mar 11, 2021 7:18 AM

I get overfitting when I try to do it that way. A few of the tests are missing the second time point, and many of the values are repeated when the measurement didn't change across sampling points. I am wondering if that is what led to this outcome. It is very odd that I don't have any DF for my Day variable. Any suggestions? Thank you!

Mark_Bailey · Mar 11, 2021 10:34 AM

I am sorry. The two-way ANOVA model may not be the most efficient given the limited data that you have. There is no replication, so my first approach results in a saturated model.
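To see why the first model is saturated, count parameters against observations. This is a minimal sketch, assuming a full factorial in Chem and a nominal Day with every Chem/Day cell filled; `factorial_df` is a hypothetical helper. With one value per cell the model uses every degree of freedom, leaving zero for error, so no F tests can be formed.

```python
def factorial_df(n_chem, n_day, reps_per_cell=1):
    """Observations, model parameters, and error DF for Y ~ Chem + Day + Chem*Day (Day nominal)."""
    n_obs = n_chem * n_day * reps_per_cell
    # Intercept + Chem effects + Day effects + Chem*Day interaction effects.
    model_params = 1 + (n_chem - 1) + (n_day - 1) + (n_chem - 1) * (n_day - 1)
    return n_obs, model_params, n_obs - model_params


print(factorial_df(3, 3))                   # (9, 9, 0): one value per cell, nothing left for error
print(factorial_df(100, 3))                 # (300, 300, 0): more chemicals do not help
print(factorial_df(3, 3, reps_per_cell=2))  # (18, 9, 9): replication would free up error DF
```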

It is reasonable to instead treat Day as a continuous linear fixed effect. Be sure that the modeling type for Day is Continuous, not Nominal or Ordinal. Then follow the same steps as I first described. You should get a result like this:

These results suggest that there is a difference in the response across Chem levels, but Day makes no significant difference.
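With Day continuous, the Chem, Day, and Chem*Day terms together give each chemical its own straight line, so the model spends two parameters per chemical instead of one per cell. This is a minimal sketch in plain Python that fits those per-chemical lines by least squares for the three example chemicals and reports the error DF; `fit_line` is a hypothetical helper. The parameterization JMP reports will differ (for example, it may center Day), but the fitted lines and the error DF should be the same.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope of Y on Day for one chemical."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


days = [10, 20, 30]
data = {
    "Chem x": [0.5, 0.5, 0.3],
    "Chem y": [2.0, 1.0, 0.2],
    "Chem z": [1.0, 0.15, 0.15],
}

n_obs = sum(len(ys) for ys in data.values())
n_params = 2 * len(data)  # one intercept and one Day slope per chemical
fits = {chem: fit_line(days, ys) for chem, ys in data.items()}

print("error DF:", n_obs - n_params)  # 9 observations - 6 parameters = 3
for chem, (b0, b1) in fits.items():
    print(chem, "intercept", round(b0, 3), "slope per day", round(b1, 4))
```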

joemama985 · Mar 11, 2021 10:52 AM

Even when I change Day to continuous, I get overfitting:

Interestingly, if I model Pesticide as a random effect nested within the type of pesticide (a nominal classification) and structure the comparison like a repeated measures mixed model, I get meaningful results:

I am conflicted, though, because part of me thinks that Pesticide shouldn't be random in this instance. But maybe for the question I am interested in, it should be.

joemama985 · Mar 11, 2021 10:57 AM

I guess a nested two-way ANOVA will work for my needs. But I wonder why it was overfitting with the non-nested model. Let me know if you have thoughts.

Mark_Bailey · Mar 11, 2021 11:23 AM

You keep changing the situation. What is Type? Is it really nested with Pesticide?

There is something more to your data than what you showed initially. You had a number of Pesticides observed on three separate Days. The model I showed should work unless there is a data problem. Missing data? Different values for Day?

Pesticide is not a random effect in this situation.
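The questions about missing data and different Day values point at a data check worth running before fitting any model. This is a minimal sketch in plain Python, assuming long-format rows of (Chem, Day, Y) and that every chemical should be sampled on Day 10, 20, and 30; "Chem w" and the Day value 21 are hypothetical, made up to illustrate each problem. It flags Day values outside the design and lists the chemicals a complete-case filter would drop.

```python
# Hypothetical long-format rows (Chem, Day, Y) illustrating two data problems.
rows = [
    ("Chem w", 10, 1.0), ("Chem w", 30, 0.5),                        # Day 20 never sampled
    ("Chem x", 10, 0.5), ("Chem x", 20, 0.5), ("Chem x", 30, 0.3),
    ("Chem y", 10, 2.0), ("Chem y", 21, 1.0), ("Chem y", 30, 0.2),  # Day typed as 21
]
expected_days = {10, 20, 30}

# Day values outside the design usually mean a data-entry problem.
unexpected_days = sorted({day for _, day, _ in rows} - expected_days)

# Collect the sampling days actually observed for each chemical.
days_seen = {}
for chem, day, _ in rows:
    days_seen.setdefault(chem, set()).add(day)

# Keep only chemicals observed on every expected day (a complete-case filter).
complete = {chem for chem, seen in days_seen.items() if expected_days <= seen}
kept = [row for row in rows if row[0] in complete]
dropped = sorted(set(days_seen) - complete)

print("unexpected Day values:", unexpected_days)
print("dropped chemicals:", dropped)
print("rows kept:", len(kept))
```

A mistyped Day is better corrected than dropped, and it is worth reporting how many chemicals the filter removes, since discarding them could bias the comparison across days.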

joemama985 · Mar 11, 2021 04:35 PM

Sorry for adding more confusion, Mark! Once I removed the pesticides that had missing data, it ran. I very much appreciate your help.

"Type" is a broader pesticide class into which each pesticide can be classified. I forgot to mention that variable. I think it would be good to nest each pesticide under its class, because Type explains a good deal of the variation based on other tests I have run. One last question: because my data are non-normal and I am running a nested regression model, should I test differences on the regression using a nonparametric test instead of an ANOVA? Is there a nonparametric test that is good for this type of comparison? Thanks again for all your help.

Mark_Bailey · Mar 12, 2021 07:05 AM

I like that you included Type and nested Pesticide within Type. It obviously improves the model and the interpretation of the data. An alternative is to 'discover' types instead of imposing them. For example, you might omit Type from the model, then save the parameter estimates to a new data table and chart them. Look for patterns while adding Type to the chart in the Color role.
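The "discover rather than impose" idea can be mimicked outside JMP: fit without Type, keep one row of estimates per chemical, and only then bring Type back in to see whether it separates those estimates. This is a minimal sketch in plain Python for the three example chemicals; the Type labels "A" and "B" are hypothetical, and the per-chemical mean and Day slope stand in for whatever estimates the saved table would contain. Grouping by Type here plays the role of putting Type in the chart's Color role.

```python
from statistics import mean


def slope(xs, ys):
    """Least-squares slope of Y on Day for one chemical."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)


days = [10, 20, 30]
data = {"Chem x": [0.5, 0.5, 0.3], "Chem y": [2.0, 1.0, 0.2], "Chem z": [1.0, 0.15, 0.15]}
# Hypothetical class labels; in practice they come from the Type column.
chem_type = {"Chem x": "A", "Chem y": "B", "Chem z": "B"}

# Step 1: estimates per chemical, computed without any reference to Type.
estimates = {chem: (mean(ys), slope(days, ys)) for chem, ys in data.items()}

# Step 2: attach Type afterwards and look for clustering within each class.
by_type = {}
for chem, (level, trend) in estimates.items():
    by_type.setdefault(chem_type[chem], []).append((chem, level, trend))

for t, members in sorted(by_type.items()):
    print(t, [(c, round(lvl, 3), round(s, 4)) for c, lvl, s in members])
```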

What do you mean by "my data is non-normal?" The related assumption in the linear mixed model is that the errors are normally distributed. Did you analyze the distribution of the residuals or the response to verify this assumption?
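The assumption concerns the errors, not the raw response, which here is spread out by large differences between chemicals. This is a minimal sketch of checking residuals instead, assuming the Chem by continuous Day model (one straight line per chemical) and the three example chemicals: it pairs sorted residuals with standard normal quantiles, the data behind a normal Q-Q plot, and summarizes them with a correlation, where values near 1 suggest roughly normal residuals. The plotting positions (i - 0.5)/n are one common convention, not necessarily the one JMP uses, and with three days per chemical each line leaves only one residual degree of freedom, so this is illustrative only.

```python
from statistics import NormalDist, mean

days = [10, 20, 30]
data = {"Chem x": [0.5, 0.5, 0.3], "Chem y": [2.0, 1.0, 0.2], "Chem z": [1.0, 0.15, 0.15]}


def line_residuals(xs, ys):
    """Residuals from a least-squares line of Y on Day for one chemical."""
    mx, my = mean(xs), mean(ys)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b0 = my - b1 * mx
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def corr(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den


residuals = sorted(r for ys in data.values() for r in line_residuals(days, ys))
n = len(residuals)
# Standard normal quantiles at plotting positions (i - 0.5) / n.
theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

print("Q-Q correlation:", round(corr(theoretical, residuals), 3))
```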

joemama985 · Mar 15, 2021 5:47 AM

I really like the idea of looking at parameter estimates while I am exploring the dataset. The untransformed values recorded each day already cluster according to type (one of the types has values notably lower than the other types), but I would like to know how to explore the data like this in the future. I reran the model without Type and the nesting:

I have whole-model parameter estimates for each chemical and one estimate for each chemical's interaction with the second sampling event (day2*chemical X). Is there a way to save the parameter estimates to my current data table instead of generating a new data table? I ask because I don't want to manually recode the Type for each of the 100 chemicals, and I figure there must be a way to save them to my current sheet (or not!).
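The underlying operation is a join on the chemical name: whichever table holds Type, the estimates can be looked up by the Chem key, so Type never has to be re-entered by hand. This is a minimal sketch in plain Python; the slope values and the column name "Day slope" are hypothetical placeholders for whatever the saved estimates table contains. In JMP, Tables > Join (matching on the chemical column) or Tables > Update should accomplish the same thing.

```python
# Hypothetical per-chemical estimates, as they might appear in a saved estimates table.
estimates = {"Chem x": -0.01, "Chem y": -0.09, "Chem z": -0.0425}

# Original long-format data table, which already carries the Type column.
rows = [
    {"Chem": "Chem x", "Type": "A", "Day": 10, "Y": 0.5},
    {"Chem": "Chem y", "Type": "B", "Day": 10, "Y": 2.0},
    {"Chem": "Chem z", "Type": "B", "Day": 10, "Y": 1.0},
]

# Direction 1: add each chemical's estimate to the existing data table.
for row in rows:
    row["Day slope"] = estimates.get(row["Chem"])  # None if the chemical was not fitted

# Direction 2: carry Type onto the estimates table (one row per chemical).
type_of = {row["Chem"]: row["Type"] for row in rows}
estimate_table = [
    {"Chem": chem, "Type": type_of.get(chem), "Day slope": s}
    for chem, s in estimates.items()
]
print(estimate_table)
```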

Re the normality:

I have been running this on the untransformed recorded values (non-normal), but even when log transformed the data are non-normal:

However, the eyeball test tells me that I should be running these tests on the log-transformed data, because they are certainly "more normal."
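The "eyeball test" can be backed by a number. This is a minimal sketch that computes the moment skewness of the pooled example values before and after a natural-log transform; `skewness` is a hypothetical helper. Pooling mixes chemicals, so this only quantifies the visual impression of the response distribution; the model assumption itself concerns the residuals.

```python
import math
from statistics import mean, pstdev

values = [0.5, 0.5, 0.3, 2.0, 1.0, 0.2, 1.0, 0.15, 0.15]  # all Y values from the example table


def skewness(xs):
    """Population (moment) skewness: 0 for a symmetric sample, > 0 for a long right tail."""
    m, s = mean(xs), pstdev(xs)
    return mean([((x - m) / s) ** 3 for x in xs])


raw_skew = skewness(values)
log_skew = skewness([math.log(v) for v in values])
print("raw:", round(raw_skew, 2), "log:", round(log_skew, 2))
```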

Are you sure that this would be a linear mixed model? My understanding is that because there are no random effects in the current whole model, this would be a linear model with both continuous and categorical explanatory variables (Day is continuous; Pesticide and Type are categorical):

I recall that least squares regression does not assume normally distributed residuals, but it might not be the best model for non-normal data. What are your thoughts on this?
