Good afternoon,
I am constructing an LMM and have attached a Q-Q plot in image 1. The conditional residuals deviate from normality a fair bit, so I transformed the response variable using log base 10. This improved my R^2 by about 2%, and the Q-Q plot looks a bit better (image 2). However, I'm not sure whether these slight improvements are worth the transformation: I would then have to back-transform the results to report them in my thesis, and the effects would become fold changes rather than actual arithmetic differences.
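The fold-change point can be made concrete with a small numeric sketch (the values below are illustrative, not from the poster's data): a difference of means on the log10 scale back-transforms to a multiplicative ratio, not an additive difference.

```python
import numpy as np

# Hypothetical group means on the log10 scale (illustrative values only)
mean_log_a = 2.0   # log10(100)
mean_log_b = 2.3   # log10 of roughly 200

diff_log = mean_log_b - mean_log_a   # difference estimated on the log scale
fold_change = 10 ** diff_log         # back-transform -> multiplicative effect

# A 0.3 difference in log10 units means group B is about 2x group A,
# not "0.3 units higher" on the original scale.
print(round(fold_change, 3))
```

So after a log transform, model effects are naturally reported as ratios ("B is about twice A"), which is often a perfectly interpretable summary, just a different one.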
Thank you!
I remember one of my instructors saying, "The only reason to do a data transformation is to simplify the model." That instructor, by the way, was G.E.P. Box. I suggest you read his papers on the subject:
Box, G.E.P., and P.W. Tidwell (1962), "Transformation of the Independent Variables", Technometrics, Vol. 4, No. 4, November.
Also see this paper and the attached discussion:
Draper, N.R., and W.G. Hunter (1969), "Transformations: Some Examples Revisited", Technometrics, Vol. 11, No. 1, February.
Hi @blip555555,
It's very difficult (if not impossible) to help you without an (anonymized) dataset illustrating the situation you're facing. Please read the post Getting correct answers to correct questions quickly.
To assess whether a transformation is needed, it's important to look at residual plots and check whether there is still a pattern in the residuals that the assumed model doesn't capture. Are you experiencing heteroscedasticity? Or strange patterns in your residuals? You can look at Regression Model Assumptions | Introduction to Statistics | JMP for more information.
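A quick simulated sketch of the diagnostic being described (in Python rather than JMP, and with made-up data): when the noise grows with the mean, a residual-vs-fitted comparison shows the spread widening at larger fitted values, which is the heteroscedasticity pattern a residual plot would reveal.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated data (not the poster's): noise standard deviation grows with x,
# the classic funnel shape a residual-vs-fitted plot exposes.
x = rng.uniform(1, 10, 500)
y = 2.0 * x + rng.normal(0.0, 0.5 * x)

# Ordinary least-squares fit via the normal equations
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted

# Compare residual spread at low vs high fitted values
lo = resid[fitted < np.median(fitted)].std()
hi = resid[fitted >= np.median(fitted)].std()
print(f"residual SD, low fitted: {lo:.2f}  high fitted: {hi:.2f}")
```

If the high-fitted residual SD is clearly larger than the low-fitted one, that is the kind of pattern that motivates either a transformation or a model with a different variance structure.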
I would also distinguish data transformation from Generalized Linear Mixed Models (GLMM) in JMP Pro, where the response distribution can be specified directly (which enables fitting models with different response distributions: normal, exponential, gamma, ...).
You can read Difference between "least square" and "generelized linear method" in the fit model for more information.
Hope this conversation starter helps,