Discussions

rambam54 · Dec 23, 2014 02:50 PM

Let’s say we want to determine whether Y (continuous) can be predicted by other metrics (X1, X2, X3, all continuous).
We have a table with 31 rows; each row contains values of X1, X2, X3, and Y.
We want to find out which metrics we should use as inputs to predict Y, given that we want to account for at least 81% of the variation in Y.
We fit the model and looked at the "Summary of Fit" report, where we see an R-square of 0.9; however, that is for the entire model, with X1, X2, and X3 together.

How can we check what percentage of the variation in Y each metric accounts for separately, and verify that it is indeed >81%? Should we fit the model 3 times separately?
i.e.:
1st time: Y with X1, check R-square
2nd time: Y with X2, check R-square
3rd time: Y with X3, check R-square
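As a sketch of what each of those fits computes: for an ordinary least-squares fit with an intercept, R-square is 1 - SS_res/SS_tot, and with a single predictor it equals the squared Pearson correlation between that predictor and Y. The Python/NumPy code below is only an illustration, not JMP's implementation; the data are randomly generated placeholders (not the poster's table), and the function name `r_squared` is hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on the column(s) of X, with an intercept (hypothetical helper)."""
    design = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = residuals @ residuals                  # residual sum of squares
    ss_tot = ((y - y.mean()) ** 2).sum()            # total sum of squares about the mean
    return 1.0 - ss_res / ss_tot

# Hypothetical data standing in for the 31-row table (NOT the data from the thread).
rng = np.random.default_rng(0)
n = 31
x1 = rng.normal(size=n)
x2 = x1 + 0.3 * rng.normal(size=n)   # placeholder: X2 made to track X1
x3 = rng.normal(size=n)              # placeholder: X3 unrelated to Y
y = 2.0 * x1 + 0.5 * rng.normal(size=n)

# Full model (all three predictors), as in the "Summary of Fit" report.
print("full model:", round(r_squared(np.column_stack([x1, x2, x3]), y), 3))

# One simple regression per predictor, as proposed above.
for name, x in [("X1", x1), ("X2", x2), ("X3", x3)]:
    print(name, round(r_squared(x, y), 3))
```

Because the single-predictor models are nested inside the full model, the full-model R-square can never be smaller than any of the single-predictor values.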

When we do it this way, we get:
R-square for X1: 0.83
R-square for X2: 0.82
R-square for X3: 0.09

Does this mean that X1 and X2 can each be used separately as inputs to predict Y and account for the required percentage of the variation in Y? Is this a correct analysis and statement?

Thanks

Ramon Bamer

julian · Dec 23, 2014 05:13 PM

Hi rambam54,

You are correct; according to your results, X1 and X2 separately account for more than 81% of the variance in the sample data, so either could be a candidate as an input. However, remember that the sample R^2 is, like all sample estimates, subject to sampling error, so the true proportion of variance accounted for in the population could be more or less. So, if it is mission critical that you have evidence that the proportion of variance accounted for by your input is above 0.81 in the population, you might want to put a confidence interval around the estimate and ensure that the lower bound of the confidence interval isn't below 0.81. If you happen to be using JMP Pro, bootstrapping would be a good approach given the size of your sample. Alternatively, the calculation is straightforward (given the usual parametric assumptions), and there are even online statistical calculators that take simple input (sample size, observed R^2, number of predictors). With your sample, the 95% bounds for R^2 of Y regressed onto X1 would be roughly 0.72 ≤ R^2 ≤ 0.93.
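A minimal sketch of both routes, in Python with NumPy. The analytic part assumes the Olkin and Finn large-sample standard error for R^2, a common textbook approximation; whether a given online calculator uses exactly this formula, and which critical value it uses, is an assumption, so its bounds can differ slightly from the ~0.72 quoted above. The bootstrap part is a plain percentile bootstrap on hypothetical data, standing in for (not reproducing) JMP Pro's bootstrapping. Both function names are hypothetical.

```python
import math
import numpy as np

def r2_ci_olkin_finn(r2, n, k, z=1.96):
    """Approximate CI for population R^2 via the Olkin-Finn large-sample SE (assumed formula).

    r2: sample R^2; n: sample size; k: number of predictors;
    z: normal critical value (1.96 for ~95%).
    """
    var = 4 * r2 * (1 - r2) ** 2 * (n - k - 1) ** 2 / ((n ** 2 - 1) * (n + 3))
    se = math.sqrt(var)
    # Clip to [0, 1] because the normal approximation can overshoot the bounds.
    return max(0.0, r2 - z * se), min(1.0, r2 + z * se)

def r2_ci_bootstrap(x, y, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for R^2 of a simple regression of y on x.

    Rows are resampled with replacement; with one predictor, R^2 equals the
    squared Pearson correlation, so no explicit model fit is needed.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample row indices
        stats[b] = np.corrcoef(x[idx], y[idx])[0, 1] ** 2
    alpha = (1 - level) / 2
    return tuple(np.quantile(stats, [alpha, 1 - alpha]))

# Analytic approximation with the figures from the thread: n = 31, R^2 = 0.83, one predictor.
lo, hi = r2_ci_olkin_finn(0.83, n=31, k=1)
print(f"Olkin-Finn approx. 95% CI: {lo:.2f} to {hi:.2f}")  # prints 0.73 to 0.93

# Bootstrap on hypothetical data (the original table is not available).
rng = np.random.default_rng(1)
x = rng.normal(size=31)
y = 2.0 * x + 0.9 * rng.normal(size=31)
print("bootstrap 95% CI:", r2_ci_bootstrap(x, y))
```

In either case, the decision rule from the reply is the same: check whether the lower bound stays at or above 0.81.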

I hope this helps!

julian

rambam54 · Dec 25, 2014 04:14 PM

Thank you very much !!
