Solved: Re: Color Map on Correlations creation

orthogonal · Sep 11, 2013 04:52 PM

I have been trying to better understand the 'Color Map on Correlations' plot, which is part of the DOE Design Evaluation, but I haven't been able to find out much about how it is made. The best I can figure out is that it is a scaled/transformed version of the variance-covariance matrix inv(X'*X). Could anyone point me to some literature or documentation on the subject?

orthogonal · Oct 2, 2013 09:41 AM

I finally was able to find a good explanation for the creation of the color map on correlations. I actually found it in the JMP documentation on fitting linear models. Fitting Linear Models.pdf > Chapter 3 Standard Least Squares Report and Options > Correlation of Estimates, page 189 in JMP 11 documentation.

Using these equations, with variance = 1, the color map on correlations plot is just the absolute value of Corr(Beta-hat).

drew_baumgartel · Dec 11, 2017 03:47 PM

The "color map on correlations" matrix is NOT the absolute value of the corr(beta-hat) matrix. Here's an example to demonstrate. Note I have added a column of 1's for the intercept term.

X = [1 0.53 -1 -1 -0.53 -0.53 1,
1 1 1 1 1 1 1,
1 -1 1 -1 -1 1 -1,
1 -1 -1 1 1 -1 -1,
1 1 1 -1 1 -1 -1,
1 -1 1 1 -1 -1 1,
1 1 -1 1 -1 1 -1];

The "color map on correlations" does not include the column of intercepts, but this is immaterial since correlations are pairwise. Anyhow, if you put the above design into the DOE platform (less the column of 1's), or do the same and use the multivariate --> correlations option, you get the following correlations matrix among the columns of the X matrix:

[1 -0.0926 -0.0926 0.2819 0.2819 0.0926,
-0.0926 1 -0.1667 0.0926 0.0926 0.1667,
-0.0926 -0.1667 1 0.0926 0.0926 0.1667,
0.2819 0.0926 0.0926 1 -0.2819 -0.0926,
0.2819 0.0926 0.0926 -0.2819 1 -0.0926,
0.0926 0.1667 0.1667 -0.0926 -0.0926 1]
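
For anyone without JMP at hand, the pairwise column correlations can be checked with a small sketch like the one below. It is plain Python rather than JSL, the names `DESIGN` and `column_correlations` are made up for illustration, and it assumes the ordinary Pearson correlation is what the multivariate platform reports.

```python
from math import sqrt

# Design from the post above, without the intercept column of 1's.
DESIGN = [
    [0.53, -1, -1, -0.53, -0.53, 1],
    [1, 1, 1, 1, 1, 1],
    [-1, 1, -1, -1, 1, -1],
    [-1, -1, 1, 1, -1, -1],
    [1, 1, -1, 1, -1, -1],
    [-1, 1, 1, -1, -1, 1],
    [1, -1, 1, -1, 1, -1],
]


def column_correlations(rows):
    """Pearson correlation between every pair of columns of `rows`."""
    n = len(rows)
    centered = []
    for col in zip(*rows):
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    # Length of each centered column; the (n - 1) divisor cancels in the ratio.
    norms = [sqrt(sum(v * v for v in col)) for col in centered]
    k = len(centered)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
            for j in range(k)
        ]
        for i in range(k)
    ]


corr = column_correlations(DESIGN)
for row in corr:
    print(" ".join(f"{v:8.4f}" for v in row))
```

The intercept column is left out on purpose: a constant column has zero variance, so its correlation with anything is undefined.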

If you look into the JMP documentation regarding the correlations of estimates, you'll find that it's defined as

corr(beta-hat) := V_inv*(X’X)_inv*V_inv, where V := sqrt(diag((X’X)_inv)), i.e. V is the diagonal matrix holding the square roots of the diagonal entries of (X’X)_inv

Using the X matrix as above, this gives

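// X is the 7x7 matrix defined above (intercept column included)
X_t = Transpose(X);
// V holds the square roots of the diagonal of inv(X'X)
V = sqrt(diag(Inverse(X_t*X)));
// scale inv(X'X) to a unit diagonal and round to 4 decimals
corr_beta_hat = Round(Inv(V)*Inverse(X_t*X)*Inv(V),4);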

[1 -0.3101 -0.3154 -0.3154 0.3101 0.3101 0.3154,
-0.3101 1 0.3101 0.3101 -0.5 -0.5 -0.3101,
-0.3154 0.3101 1 0.3154 -0.3101 -0.3101 -0.3154,
-0.3154 0.3101 0.3154 1 -0.3101 -0.3101 -0.3154,
0.3101 -0.5 -0.3101 -0.3101 1 0.5 0.3101,
0.3101 -0.5 -0.3101 -0.3101 0.5 1 0.3101,
0.3154 -0.3101 -0.3154 -0.3154 0.3101 0.3101 1]

This is exactly what you get when you use the "fit model" command in JMP. Note that the values in this matrix do not depend on either the response values or the MSE.
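
The same calculation can be sketched outside JMP. The Python below is only an illustration of the formula quoted from the documentation, not JMP's implementation; `invert` and `corr_of_estimates` are hypothetical helper names, and the inverse is a plain Gauss-Jordan routine so that nothing beyond the standard library is needed.

```python
from math import sqrt

# Design matrix from the post above, intercept column included.
X = [
    [1, 0.53, -1, -1, -0.53, -0.53, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, -1, 1, -1, -1, 1, -1],
    [1, -1, -1, 1, 1, -1, -1],
    [1, 1, 1, -1, 1, -1, -1],
    [1, -1, 1, 1, -1, -1, 1],
    [1, 1, -1, 1, -1, 1, -1],
]


def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment each row with the matching row of the identity matrix.
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(a)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def corr_of_estimates(x):
    """Rescale inv(X'X) to a unit diagonal: V_inv * inv(X'X) * V_inv."""
    cols = list(zip(*x))
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    c = invert(xtx)
    s = [sqrt(c[i][i]) for i in range(len(c))]
    return [[c[i][j] / (s[i] * s[j]) for j in range(len(c))] for i in range(len(c))]


C = corr_of_estimates(X)
for row in C:
    print(" ".join(f"{v:8.4f}" for v in row))
```

Neither the response nor the error variance appears anywhere in the function: sigma^2 multiplies every entry of inv(X'X) and cancels when the matrix is rescaled to a unit diagonal.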

So my question is this: if one were trying to decide if the amount of confounding among model terms (or estimates) was acceptable, should one review the correlations among the columns of the design matrix or should one review the correlations among the beta-hats? Why might one be better than the other? Is there an intuitive explanation describing the relationship between the two?

Mark_Bailey · Dec 12, 2017 07:13 AM

The correlation among the parameter estimates is the correlation that really matters. This quantity determines the inflation of the standard errors of these estimates. It reduces the power of your tests. It widens your confidence intervals. The correlation between the factor columns in the design is merely a 'means to an end.' Eliminating this correlation will eliminate the correlation of the estimates of the main effects, for example.
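
One way to make the link between the two matrices concrete, assuming a model with an intercept: the covariance of the non-intercept estimates is proportional to the inverse of the centered cross-product matrix, so the inverse of the column correlation matrix R carries both the variance inflation factors (its diagonal) and, once rescaled to a unit diagonal, the correlations among those estimates. The sketch below is plain Python with hypothetical helper names, and the correlation of 0.5 is a made-up illustration rather than a value from this thread.

```python
from math import sqrt


def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(a)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_and_estimate_corr(r):
    """From a column correlation matrix R, return (VIFs, correlations of estimates).

    VIFs are the diagonal of inv(R); the estimate correlations are inv(R)
    rescaled so that its diagonal is 1.
    """
    inv = invert(r)
    k = len(inv)
    vifs = [inv[i][i] for i in range(k)]
    corr = [[inv[i][j] / sqrt(vifs[i] * vifs[j]) for j in range(k)] for i in range(k)]
    return vifs, corr


# Two factors whose columns are correlated at a hypothetical r = 0.5.
r = 0.5
vifs, est_corr = vif_and_estimate_corr([[1.0, r], [r, 1.0]])
print("VIFs:", vifs)
print("correlation of the two estimates:", est_corr[0][1])
```

With only two factors the estimates are correlated at exactly -r, which is why the two views agree in magnitude there. With more columns the inverse mixes all the pairwise correlations, so the two matrices can differ noticeably, as in the example above.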

Note that there is no absolute cutoff for unacceptable correlation because the tolerable VIF does depend on the effect size and the RMSE. If you have a very small RMSE compared to the effect, then you can tolerate high VIF. On the other hand, if you have a small effect compared to the RMSE, then even a small correlation might adversely affect your power.
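
A rough arithmetic sketch of this tradeoff, assuming the usual least squares result that the standard error of a coefficient is RMSE * sqrt(VIF / SS_x), where SS_x is the centered sum of squares of that column. The function name `t_ratio` and all the numbers are hypothetical.

```python
from math import sqrt


def t_ratio(effect, rmse, ss_x, vif):
    """Coefficient divided by its standard error, se = rmse * sqrt(vif / ss_x)."""
    return effect / (rmse * sqrt(vif / ss_x))


# Large effect relative to the noise: a VIF of 4 still leaves a big t ratio.
print(t_ratio(effect=2.0, rmse=0.5, ss_x=8.0, vif=4.0))

# Small effect relative to the noise: even a mild VIF lowers an already small t ratio.
print(t_ratio(effect=0.5, rmse=1.0, ss_x=8.0, vif=1.2))
print(t_ratio(effect=0.5, rmse=1.0, ss_x=8.0, vif=1.0))
```

Because the standard error grows with sqrt(VIF), a VIF of 4 halves the t ratio whatever the effect size; whether that is tolerable depends on how large the ratio was to begin with.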

drew_baumgartel · Dec 12, 2017 12:27 PM

Thanks for your prompt response Mark! Your explanation makes good sense, and my guess was that the correlation among the beta-hats was the more important of the two. Personally, it's hard for me to directly interpret the correlations among the columns in the X matrix other than "estimates are independent" vs. "estimates are not independent". Also one does not estimate the design matrix X; one estimates the coefficients for the model terms. When I think about the correlations of the beta hats, I consider that the dot product of any two beta vectors is 0 iff the beta hat vectors are orthogonal, and as the correlation values among the beta hats are scaled versions of these dot products, they describe "how orthogonal" the beta hats are to one another in the model.

My particular question was spawned from running a 2^(8-4) fractional factorial, finding that one two-factor interaction was significant, and then using the augment design option to un-confound the four two-way interaction terms. The correlations among the two-way interaction columns in the new X matrix were ~0.7-0.8, whereas the correlations among the beta hats were less (~0.55).

And I agree that there is no correct cutoff for correlation values. The entirety of inferential statistics involves tradeoffs, and which tradeoff to make is dependent on the application, the data, and one's personal viewpoint.