R-square

rambam54

Community Trekker

Joined:

Dec 23, 2014

Let’s say we want to determine whether Y (continuous) can be predicted by other metrics (X1, X2, X3, all continuous).
We have a table with 31 rows; each row contains data for X1, X2, X3 and Y.
We want to find out which metrics we should use as inputs to Y, with the requirement that they account for at least 81% of the variation in Y.
We fit the model and looked at the "Summary of Fit", where we see an R-square of 0.9; however, that is for the entire model with X1, X2 and X3 together.

How can we check the % of variation in Y accounted for by each of the metrics separately and verify that it is indeed >81%? Should we fit the model 3 times separately?
i.e.
1st time Y with X1 and check R-square
2nd time Y with X2 and check R-square
3rd time Y with X3 and check R-square
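
For readers who want to try the three separate fits outside JMP, here is a minimal Python sketch. The helper name `r_squared` and the data values are made up for illustration (they are not the 31 rows from this post). It relies on the fact that, for a model with a single predictor, R-square equals the squared Pearson correlation between that predictor and Y.

```python
def r_squared(x, y):
    """R-square of the least-squares line of y on a single predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # With one predictor, R-square is the squared Pearson correlation.
    return sxy ** 2 / (sxx * syy)


# Made-up illustrative values, NOT the data from the post.
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x3 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]

# One separate fit per candidate input, as in the list above.
for name, x in {"X1": x1, "X3": x3}.items():
    print(name, round(r_squared(x, y), 3))
```

Each call corresponds to one of the separate fits listed above; the value to compare against the 0.81 target is the number printed for each input.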

When we do it that way we get:
R-square for X1: 0.83
R-square for X2: 0.82
R-square for X3: 0.09

Does this mean that X1 and X2 can each be used separately as an input to Y and account for the required % of the variation in Y? Is this a correct analysis and statement?

Thanks

Ramon Bamer

1 ACCEPTED SOLUTION

Accepted Solutions
julian

Staff

Joined:

Jun 25, 2014

Solution

Hi rambam54,

You are correct; according to your results, X1 and X2 separately account for more than 81% of the variance in the sample data, so either could be a candidate as an input. However, remember that the sample R^2 is, like all sample estimates, subject to sampling error, so the true proportion of variance accounted for in the population could be more or less. So, if it is mission critical that you have evidence the proportion of variance accounted for by your input is above 0.81 in the population, you might want to put a confidence interval around the estimate and ensure that the lower bound of the confidence interval isn't below 0.81. If you happen to be using JMP Pro, bootstrapping would be a good approach given the size of your sample. Alternatively, the calculation is straightforward (given the usual parametric assumptions), and there are even online statistical calculators that will take simple input (sample size, observed R^2, number of predictors). With your sample, the 95% bounds for R^2 of Y regressed on X1 would be ~ 0.72 ≤ R^2 ≤ 0.93.
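
To illustrate the parametric calculation, here is a rough Python sketch. It uses one commonly cited large-sample approximation for the standard error of R^2; that this is the formula behind any particular online calculator is an assumption, as are the function name `r2_confidence_interval` and the use of a normal critical value.

```python
from math import sqrt
from statistics import NormalDist


def r2_confidence_interval(r2, n, k, level=0.95):
    """Approximate confidence interval for R^2 (large-sample formula).

    r2: observed R^2, n: sample size, k: number of predictors.
    Assumption: SE^2 = 4*R^2*(1-R^2)^2*(n-k-1)^2 / ((n^2-1)*(n+3)).
    """
    se = sqrt(4 * r2 * (1 - r2) ** 2 * (n - k - 1) ** 2
              / ((n ** 2 - 1) * (n + 3)))
    # Normal critical value (about 1.96 for a 95% interval).
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Clamp to [0, 1], since R^2 cannot fall outside that range.
    return max(0.0, r2 - z * se), min(1.0, r2 + z * se)


# Inputs from this thread: R^2 = 0.83 for X1, 31 rows, 1 predictor.
lower, upper = r2_confidence_interval(0.83, n=31, k=1)
print(f"{lower:.2f} <= R^2 <= {upper:.2f}")
```

With the inputs from this thread the result lands close to the bounds quoted above. Using a t critical value with n - k - 1 degrees of freedom instead of the normal one would widen the interval slightly. Either way, the check is whether the lower bound stays at or above 0.81.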

I hope this helps!

julian

2 REPLIES
rambam54

Community Trekker

Joined:

Dec 23, 2014

Thank you very much !!