Hi @Janneman,
There is a fairly long list of questions to answer first, to better understand the purpose of your model and its applicability/coverage/generalization:
- What is your objective behind the modeling: explanatory, predictive, both ... ?
- How were the data collected? Was any design involved in the data collection?
- More questions to help think about the expected "performance" of the model can be found in the "best model" discussion.
Besides the evaluation metrics you could use to compare models on explanatory performance (R² / adjusted R², unitless), predictive performance (RMSE, MAE, ...), statistical significance (whole-model p-value, individual term p-values, ...), and model complexity/fit (AICc, BIC, ...), I think it is important to first evaluate and "debug" the model through residual visualization and analysis. See Model reduction for more ideas about model comparison and selection.
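Since your question is about JMP, this is only an illustration, but here is a minimal Python/statsmodels sketch of those checks, assuming a data table `df` with a response `y` and predictors `x1`, `x2` (all placeholder names, not from your post):

```python
# Minimal sketch: fit a least-squares model, print comparison metrics,
# and look at the residuals before trusting any of those metrics.
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt

model = smf.ols("y ~ x1 + x2", data=df).fit()

# Explanatory performance and complexity
print("R2:", model.rsquared)
print("Adjusted R2:", model.rsquared_adj)
print("AIC:", model.aic, "BIC:", model.bic)   # plain AIC here; JMP reports AICc
print("Whole-model p-value:", model.f_pvalue)
print(model.pvalues)                           # individual term p-values

# Predictive performance (on the training data; use a hold-out set in practice)
resid = model.resid
print("RMSE:", np.sqrt(np.mean(resid ** 2)), "MAE:", np.mean(np.abs(resid)))

# Residual "debugging": look for curvature, funnel shapes, non-normality
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(model.fittedvalues, resid)
axes[0].axhline(0, color="grey")
axes[0].set(xlabel="Fitted values", ylabel="Residuals", title="Residuals vs fitted")
sm.qqplot(resid, line="45", fit=True, ax=axes[1])
plt.tight_layout()
plt.show()
```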
Transforming a response means that you transform both the mean of the response and its variance at the same time. You can then check whether the residuals of the transformed-response model respect the regression assumptions better than those of the original model. If the situation doesn't improve, you could instead use a Generalized Regression Model, which handles the mean and the variance independently: this type of model uses a link function to relate the mean to a linear function of the predictors, and a variance function to allow for variance heterogeneity, rather than trying to transform everything at once (for example through a log transform). Whether you transform the response completely (through a log transform or a Box-Cox Y Transformation) or use a Generalized Regression Model, you should end up with a model that is as simple as or simpler than your original one (fewer terms, so lower AICc/BIC).
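To make the distinction concrete, here is a small sketch contrasting the two approaches, again in Python/statsmodels with the same placeholder `df`, `y`, `x1`, `x2`; the Gamma family with a log link is only an example of a variance/link choice, not a recommendation for your data:

```python
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Option 1: transform the response itself, then fit ordinary least squares.
# This changes the modelled mean and the assumed error structure together.
ols_log = smf.ols("np.log(y) ~ x1 + x2", data=df).fit()

# Option 2: generalized model: the log link transforms only the mean,
# while the Gamma variance function lets the variance grow with the mean.
glm_gamma = smf.glm(
    "y ~ x1 + x2",
    data=df,
    family=sm.families.Gamma(link=sm.families.links.Log()),
).fit()

print(ols_log.summary())
print(glm_gamma.summary())
```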
I wouldn't transform the response unless I had a strong indication that it is needed (a simpler model with the transformed response and better residual patterns).
See: Difference between "least square" and "generelized linear method" in the fit model for more info about the differences between these two types of models.
Once this statistical evaluation is done, and if both models can be kept, you could then use normalized and/or unitless predictive metrics to compare them, so the comparison isn't biased by the transformation (an RMSE computed on log(y) is not directly comparable to an RMSE computed on y).
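As an illustration of such a comparison, here is a short sketch that scores both models from the previous snippet on the original response scale with a normalized RMSE; the simple exp back-transform used here is an assumption of mine and ignores retransformation bias:

```python
import numpy as np

def nrmse(y_true, y_pred):
    """Root mean squared error normalized by the mean of the observed response (unitless)."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / np.mean(y_true)

y = df["y"].to_numpy()

# Put both sets of predictions on the same (original) scale before comparing.
pred_ols_log = np.exp(ols_log.fittedvalues)   # back-transform the log-scale fit
pred_glm = glm_gamma.fittedvalues             # GLM predictions are already on the y scale

print("NRMSE, log-transformed OLS:", nrmse(y, pred_ols_log))
print("NRMSE, Gamma GLM:", nrmse(y, pred_glm))
```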
Hope this answer helps,
Victor GUILLER
"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)