blip555555
Level I

Suspiciously High R^2

Good afternoon, 

I fitted a linear mixed model (LMM) with strain, concentration, their interaction and experimental day as fixed effects (there are too few days to treat day as random), and with replicate nested within 96-well plate and plate nested within day as random effects. I ran the LMM in the Standard Least Squares platform, assigning the 'Random' attribute to the random effects. I am getting an R^2 of 0.99, which seems suspiciously high. I wondered whether the model was overfitting, but I think I have enough observations. Is this normal?

[Three screenshots of the model report were attached (2025-05-01).]
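One possible reason for a very high R^2 in a model like this, offered as an illustration rather than a diagnosis of the data above: if the reported R^2 is computed from predictions that include the random-effect terms, plate-to-plate variation counts as "explained". The Python sketch below (numpy only) simulates hypothetical data with a large plate effect and compares the R^2 of a fit on concentration alone with a fit that also includes one offset per plate. The plate dummy variables stand in for the random-effect predictions; with many wells per plate the two are close, but this is an ordinary least-squares approximation, not JMP's REML fit, and every number in it is invented.

```python
import numpy as np

def r_squared(y, fitted):
    """Ordinary R^2 = 1 - SS_residual / SS_total."""
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical layout: 12 plates x 75 wells = 900 rows.
rng = np.random.default_rng(1)
n_plates, wells = 12, 75
plate = np.repeat(np.arange(n_plates), wells)           # plate index for each row
conc = np.tile(np.linspace(0.0, 8.0, wells), n_plates)  # same concentrations on every plate
plate_shift = rng.normal(0.0, 300.0, n_plates)[plate]   # large plate-to-plate variation
y = 1000.0 - 40.0 * conc + plate_shift + rng.normal(0.0, 30.0, plate.size)

# Fit 1: intercept + concentration only.
X_fixed = np.column_stack([np.ones_like(conc), conc])
b_fixed, *_ = np.linalg.lstsq(X_fixed, y, rcond=None)
r2_fixed = r_squared(y, X_fixed @ b_fixed)

# Fit 2: one offset per plate (dummy columns replace the intercept) + concentration.
X_plate = np.column_stack([np.eye(n_plates)[plate], conc])
b_plate, *_ = np.linalg.lstsq(X_plate, y, rcond=None)
r2_plate = r_squared(y, X_plate @ b_plate)

print(f"R^2, concentration only:       {r2_fixed:.3f}")
print(f"R^2, with plate offsets added: {r2_plate:.3f}")
```

Most of the variance in this simulation lies between plates, so the second R^2 is close to 1 even though the concentration effect and the within-plate noise are the same in both fits.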

3 replies

Re: Suspiciously High R^2

Hi @blip555555,

A few thoughts. First, check whether Day is needed in the model, as it doesn't look significant in the LogWorth plot at the top of the model report. Second, consider applying validation sets to test the model's accuracy when determining the effects. Have you also looked at the studentised residuals to see whether any values are outliers?


Thanks,

Ben

“All models are wrong, but some are useful”
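As a minimal sketch of the residual check suggested above, assuming an ordinary least-squares fit rather than the REML mixed model in this thread, the internally studentised residual divides each raw residual by its estimated standard error, r_i = e_i / (s * sqrt(1 - h_ii)), where h_ii is the leverage of row i. The data, the planted outlier and the |r| > 3 cutoff below are hypothetical choices for illustration.

```python
import numpy as np

def studentized_residuals(X, y):
    """Internally studentised residuals of an OLS fit: e_i / (s * sqrt(1 - h_ii)).

    Assumes X has full column rank and already includes an intercept column.
    """
    n, p = X.shape
    q, _ = np.linalg.qr(X)                # reduced QR: q is n x p
    leverage = np.sum(q ** 2, axis=1)     # h_ii, the diagonal of the hat matrix
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s = np.sqrt(resid @ resid / (n - p))  # residual standard error
    return resid / (s * np.sqrt(1.0 - leverage))

# Hypothetical data: a straight line plus noise, with one planted outlier at row 10.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, x.size)
y[10] += 12.0
X = np.column_stack([np.ones_like(x), x])

r = studentized_residuals(X, y)
flagged = np.flatnonzero(np.abs(r) > 3.0)  # illustrative cutoff, not a fixed rule
print("rows flagged as possible outliers:", flagged)
```

A flagged row is a prompt to look at the observation, not an automatic reason to drop it.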
blip555555
Level I

Re: Suspiciously High R^2

Dear Ben, 

Thank you. I've looked at the studentised residuals, and the outliers seem reasonable to me: I'm testing the effect of different antibiotic concentrations on the fluorescence level of multiple replicates, so it makes biological sense that not all replicates perform equally at certain concentrations. I removed Day from the model, but the high R^2 remained. I have about 900 observations, so shouldn't that rule out overfitting?

Thanks!
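The number of observations alone does not rule out overfitting; what matters is how many parameters the model estimates relative to the number of rows, and a validation set checks this directly. The sketch below is a hypothetical simulation, not the data in this thread: 900 rows, one real predictor and 300 irrelevant ones, fitted on a random 600 rows and scored on the remaining 300. A large gap between training and validation R^2 indicates overfitting.

```python
import numpy as np

def r_squared(y, fitted):
    """Ordinary R^2 = 1 - SS_residual / SS_total."""
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(42)
n, n_junk = 900, 300
signal = rng.normal(size=n)                      # the one predictor that matters
junk = rng.normal(size=(n, n_junk))              # predictors unrelated to y
y = 3.0 * signal + rng.normal(0.0, 3.0, n)       # true R^2 is about 0.5
X = np.column_stack([np.ones(n), signal, junk])  # 302 coefficients in total

# Random split: 600 rows to fit the model, 300 rows held out for validation.
order = rng.permutation(n)
train, valid = order[:600], order[600:]

beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
r2_train = r_squared(y[train], X[train] @ beta)
r2_valid = r_squared(y[valid], X[valid] @ beta)

print(f"training R^2:   {r2_train:.3f}")
print(f"validation R^2: {r2_valid:.3f}")
```

Refitting with only the intercept and `signal` columns should bring the two figures close together, which is what a model that is not overfitting looks like.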


statman
Super User

Re: Suspiciously High R^2

First, I am not an SME (subject-matter expert) for your particular situation. I don't know what response you are modeling, and I have no context for the reported analysis output. Do the parameter estimates look reasonable? All of the statistics being reported are contingent on the model you have entered and on how the data were acquired. If either of these changes, so may the statistics.

When assessing a model's adequacy, there are a number of statistics to consider (R-square is just one).  For example, you might evaluate:

1. R-square minus R-square adjusted, to assess overfitting. The smaller the difference, the less chance of insignificant terms in the model.

2. R-square adjusted is a better measure than R-square; the larger, the better.

3. RMSE (standard deviation of the model).  Smaller is better. This is in response variable units, so is 33 reasonable?

4. p-values (most useful when you understand what constitutes the MSE).  Significance is a conditional statement.

5. Residuals (all kinds of plots).  You can assess outliers, multicollinearity, violations of assumptions (independence, random, mean of 0, constant variance)
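The first three statistics in the list can be computed directly from an ordinary least-squares fit. The sketch below uses the conventional definitions, R^2 = 1 - SSE/SST, adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p) and RMSE = sqrt(SSE/(n - p)), where p counts every estimated coefficient including the intercept. It assumes a fixed-effects-only fit and is not a reproduction of JMP's mixed-model report; the four data points are made up.

```python
import numpy as np

def summary_of_fit(X, y):
    """Return (R^2, adjusted R^2, RMSE) for an OLS fit.

    X must already contain the intercept column; p = X.shape[1] counts every
    estimated coefficient, so the error degrees of freedom are n - p.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sse = resid @ resid
    sst = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - sse / sst
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p)
    rmse = np.sqrt(sse / (n - p))  # same units as y
    return r2, r2_adj, rmse

# Tiny worked example: straight-line fit to four points.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0, 4.0])
X = np.column_stack([np.ones_like(x), x])

r2, r2_adj, rmse = summary_of_fit(X, y)
print(f"R^2 = {r2:.2f}, adjusted R^2 = {r2_adj:.2f}, RMSE = {rmse:.3f}")
# → R^2 = 0.64, adjusted R^2 = 0.46, RMSE = 0.949
```

Here R^2 minus adjusted R^2 is 0.18, a large gap driven by having only 2 error degrees of freedom; with 900 rows and a modest number of terms the same gap would be small.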


I notice you mention "replicate nested within 96-well plate and plate nested within day".  I do not see any nested terms in your model (e.g., plate[Day], Replicate[Plate])?
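To illustrate what a nested term such as Replicate[Plate] encodes: a replicate's level is only meaningful together with its parent's level, so replicate R1 on one plate and R1 on another are different levels. The small hypothetical layout below assumes plate labels (P1, P2) are reused each day, in which case a plate is identified by the (day, plate) pair and a replicate by the (day, plate, replicate) triple. The counts show how many random-effect levels each coding implies.

```python
# Hypothetical layout: 2 days x 2 plates x 2 replicates, plate labels reused each day.
day       = ["D1", "D1", "D1", "D1", "D2", "D2", "D2", "D2"]
plate     = ["P1", "P1", "P2", "P2", "P1", "P1", "P2", "P2"]
replicate = ["R1", "R2", "R1", "R2", "R1", "R2", "R1", "R2"]

# Un-nested coding: identical labels are pooled across their parents.
n_plate_levels = len(set(plate))          # 2
n_rep_levels = len(set(replicate))        # 2

# Nested coding: a level is identified together with all of its parents.
plate_in_day = list(zip(day, plate))
rep_in_plate = list(zip(day, plate, replicate))
n_plate_in_day = len(set(plate_in_day))   # 4
n_rep_in_plate = len(set(rep_in_plate))   # 8

print("plate levels:", n_plate_levels, "un-nested vs", n_plate_in_day, "nested")
print("replicate levels:", n_rep_levels, "un-nested vs", n_rep_in_plate, "nested")
```

Entering the un-nested terms would pool, for example, every "R1" well across all plates and days into a single random-effect level.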

"All models are wrong, some are useful" G.E.P. Box
