ILoveJMP
Level III

Why there is a constant component in the PCA formula

I was surprised to see a constant term in the saved formula when performing PCA. For example, I saved the first two principal components for the following standardized (centered and scaled) data set:

Standardized (Centered + Scaled) Data

 SLength    SWidth   PLength    PWidth
    0.27      0.19     -0.36     -0.44
   -0.30     -1.14     -0.36     -0.44
   -0.88     -0.61     -0.94     -0.44
   -1.16     -0.87      0.22     -0.44
   -0.02      0.46     -0.36     -0.44
    1.13      1.26      1.38      1.48
   -1.16     -0.07     -0.36      0.52
   -0.02     -0.07      0.22     -0.44
   -1.74     -1.41     -0.36     -0.44
   -0.30     -0.87      0.22     -1.40
    1.13      0.72      0.22     -0.44
   -0.59     -0.07      0.80     -0.44
   -0.59     -1.14     -0.36     -1.40
   -2.02     -1.14     -2.11     -1.40
    2.28      1.52     -1.52     -0.44
    1.99      2.59      0.22      1.48
    1.13      1.26     -0.94      1.48
    0.27      0.19     -0.36      0.52
    1.99      0.99      1.38      0.52
    0.27      0.99      0.22      0.52
    1.13     -0.07      1.38     -0.44
    0.27      0.72      0.22      1.48
   -1.16      0.46     -2.69     -0.44
    0.27     -0.34      1.38      2.43
   -0.59     -0.07      2.55     -0.44
   -0.02     -1.14      0.80     -0.44
   -0.02     -0.07      0.80      1.48
    0.56      0.19      0.22     -0.44
    0.56     -0.07     -0.36     -0.44
   -0.88     -0.61      0.80     -0.44
   -0.59     -0.87      0.80     -0.44
    1.13     -0.07      0.22      1.48
    0.56      1.79      0.22     -1.40
    1.42      2.06     -0.36     -0.44
   -0.30     -0.87      0.22     -0.44
   -0.02     -0.61     -1.52     -0.44
    1.42      0.19     -0.94     -0.44
   -0.30      0.46     -0.36     -1.40
   -1.74     -1.14     -0.94     -0.44
    0.27     -0.07      0.22     -0.44
   -0.02      0.19     -0.94      0.52
   -1.45     -3.01     -0.94      0.52
   -1.74     -0.61     -0.94     -0.44
   -0.02      0.19      0.80      3.39
    0.27      0.99      2.55      1.48
   -0.59     -1.14     -0.36      0.52
    0.27      0.99      0.80     -0.44
   -1.16     -0.61     -0.36     -0.44
    0.84      0.72      0.22     -0.44
   -0.02     -0.34     -0.36     -0.44

 

The formulas are as follows:

 

Prin1: 0.59834170442161 * :SLength + 0.569834108206745 * :SWidth + 0.371661472844918 * :PLength + 0.39892861952586 * :PWidth + 2.15154543958667e-16

 

Prin2: -0.331623960696996 * :SLength + -0.436415344018397 * :SWidth + 0.620670317712319 * :PLength + 0.54252700661609 * :PWidth + (-1.4778627619204e-16)

 

Even though the two constant terms are very close to 0 (incidentally, I have seen constants much larger than 0 in other cases), I don't understand why they are part of the formula in the first place, since Prin1 and Prin2 should simply be the products of the data with the first and second eigenvectors.

 

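In MATLAB, the detailed calculation would be as follows:

% X is the 50-by-4 matrix of standardized data shown above
[U, S, V] = svd(cov(X));   % columns of U are the eigenvectors of the covariance matrix
Z2 = X * U(:, 1:2);        % project the data onto the first two eigenvectors
Prin1 = Z2(:, 1);
Prin2 = Z2(:, 2);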

 

I look forward to your explanation. Thanks in advance!

 

3 REPLIES

Re: Why there is a constant component in the PCA formula

The data that you provided do not have means of exactly 0 and standard deviations of exactly 1. Like the constant in the PCs, they are close, but not exact. That alone can give you the constant term in the PCs.

Even if the means were exactly 0 and the standard deviations exactly 1, you could still get a constant term that is VERY close to 0 due to round-off error and the estimation process being used. Regardless of your input data, JMP will scale your variables unless you tell it not to.
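A minimal sketch of why scaling introduces a constant, assuming the platform standardizes each column as (x - mean) / SD before applying the eigenvector (the exact internals are not shown in this thread). Expanding that product gives a coefficient of v / SD on each raw column plus an intercept of -sum(v * mean / SD). The loading vector `v` below is hypothetical, and only the first six rows of the posted table are used, so the means are deliberately far from 0 and the constant is easy to see.

```python
from statistics import mean, stdev

# First six rows of the table posted in the question
# (SLength, SWidth, PLength, PWidth). A subset, for illustration only.
rows = [
    [0.27, 0.19, -0.36, -0.44],
    [-0.30, -1.14, -0.36, -0.44],
    [-0.88, -0.61, -0.94, -0.44],
    [-1.16, -0.87, 0.22, -0.44],
    [-0.02, 0.46, -0.36, -0.44],
    [1.13, 1.26, 1.38, 1.48],
]

# Hypothetical unit-length loading vector (not taken from the thread).
v = [0.5, 0.5, 0.5, 0.5]

cols = list(zip(*rows))
m = [mean(c) for c in cols]    # column means
s = [stdev(c) for c in cols]   # sample standard deviations (N - 1 divisor)

# Expanded form: coefficients on the raw columns plus one constant.
coef = [vj / sj for vj, sj in zip(v, s)]
const = -sum(vj * mj / sj for vj, mj, sj in zip(v, m, s))


def score_standardized(x):
    """Loading vector applied to the centered and scaled row."""
    return sum(vj * (xj - mj) / sj for vj, xj, mj, sj in zip(v, x, m, s))


def score_formula(x):
    """Same score written as a saved formula: coef * raw value + constant."""
    return sum(cj * xj for cj, xj in zip(coef, x)) + const


for x in rows:
    # Both forms agree up to floating-point round-off.
    assert abs(score_standardized(x) - score_formula(x)) < 1e-12

print("coefficients:", coef)
print("constant:", const)
```

When the column means are within round-off of 0, the same expansion leaves a constant on the order of 1e-16, like the ones in the saved formulas.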

If you absolutely do not want the constant term, go to the red popup menu and choose to form the PCs "On Unscaled" variables. This is typically not recommended, but it will give you PCs with no constant term.

Dan Obermiller
ILoveJMP
Level III

Re: Why there is a constant component in the PCA formula

As indicated, the data were standardized (centered and scaled), so they do have a mean of 0 and a standard deviation of 1. I did try the "Unscaled" option, and the constant term did indeed disappear.

 

However, I still can't wrap my mind around it given that the principal components are derived from the product of eigenvectors and the data. Many thanks for the reply though.

 

Re: Why there is a constant component in the PCA formula

Using the data you provided, I used Distributions to look at it and got these results:

[Image: Capture.JPG]

That is enough of a difference to give round-off error. Again, even if the standard deviation is exactly 1, you will likely get a constant term that is very close to zero. The scaling is what prevents the PCs from being just a product of the eigenvectors and the data. You are not using the raw data; you are using scaled data. JMP does the scaling every time because, as your data show, you cannot always be sure that the standard deviation is exactly 1. Estimation can be a messy game sometimes.
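One way to see the scaling in the saved formulas posted in the question is to check the length of the coefficient vectors. Unit-length eigenvectors have squares that sum to 1; if each element has been divided by a column's standard deviation, the sum changes. The sketch below uses only the coefficients from the question. The interpretation is an assumption: it supposes all four columns share the same sample standard deviation and that the scaling uses the N - 1 divisor.

```python
import math

# Coefficients copied from the saved formulas in the question.
prin1 = [0.59834170442161, 0.569834108206745, 0.371661472844918, 0.39892861952586]
prin2 = [-0.331623960696996, -0.436415344018397, 0.620670317712319, 0.54252700661609]
n_rows = 50  # number of rows in the posted table


def sum_sq(coefs):
    """Sum of squared coefficients (1.0 for a unit-length eigenvector)."""
    return sum(c * c for c in coefs)


# If coefficient = eigenvector element / SD with a common SD,
# then sum_sq = 1 / SD**2, so SD = 1 / sqrt(sum_sq).
implied_sd = 1.0 / math.sqrt(sum_sq(prin1))

# Sample SD of a column that was standardized with the N divisor (assumption).
sd_if_n_divisor = math.sqrt(n_rows / (n_rows - 1))

print("sum of squares, Prin1:", sum_sq(prin1))
print("sum of squares, Prin2:", sum_sq(prin2))
print("implied SD:", implied_sd)
print("sqrt(N / (N - 1)):", sd_if_n_divisor)
```

Both sums of squares come out at about 0.98 rather than 1, which is 49/50. That is consistent with columns whose sample standard deviation is about 1.01, so the coefficients look like eigenvector elements divided by that value rather than the raw eigenvector.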

Dan Obermiller