
PCA Predicteds Method

rhh0004

Community Member


I am trying to understand the technique behind using a PCA model to return predicted values of the original input variables when the number of retained PCs is less than the number of predictors. Ultimately, I'll be looking at residuals (original values minus predicted values) for varying numbers of retained PCs.

 

I've read that, to return from PCA space back to X space, you'll need to calculate:

 

predicteds = scores*loadings'
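
(Stated a bit more fully, as I understand it: if Xc is the centered, and for PCA on correlations also scaled, data, P is the p x k matrix of loadings for the k retained components, and T = Xc*P is the n x k score matrix, then predicteds = T*P', after which the scaling is undone and the column means are added back. With k = p this reproduces X exactly; with k < p it is the best rank-k reconstruction.)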

 

Looking at the formula behind the 'Save Predicteds' output, is that linear equation representative of the concept above?

 

Thanks

 

P.S. I know PCA is not meant for predictive modeling; I'm ultimately using it for fault detection with Hotelling's T2 and SPE output. My SPE (squared prediction error) would hinge on this Save Predicteds output.
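
For reference, the SPE I have in mind is just the row-wise sum of squared reconstruction errors. A minimal JSL sketch, assuming x and xhat are n x p matrices holding the original and reconstructed values (both names are placeholders, not JMP output):

res = x - xhat;						// residuals, n x p
spe = (res :* res) * J( N Col( res ), 1, 1 );		// n x 1 column: sum of squared residuals per row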

ian_jmp

Staff


Looks like the PCA platform does not give you access to the score matrix directly. But the code below may help to give some confidence that the prediction formulas are as they should be. It generates random multivariate Gaussian data for three variables, and when using three principal components (that is, with no dimensionality reduction) the predictions are the same as the original data.

 

Names Default To Here( 1 );

// Function to generate n independent samples from a multivariate Gaussian
//		m is a column vector of means (one entry per variable)
//		v is the (square) variance-covariance matrix
// (Dimensions of m and v have to conform)
randomMultiGaussian = Function( {m, v, n},
	{Default Local},
	nDim = N Row( m );
	If( (N Row( v ) != nDim) | (N Col( v ) != nDim),
		Beep();
		Print( "ERROR in randomMultiGaussian" );
		Throw();
	);
	// Simulate random values . . .
	t = Cholesky( v );
	d = J( n, nDim, Random Normal() ) * t`;
	// Build mean values columnwise . . .
	mu = [];
	For( i = 1, i <= nDim, i++,
		mu ||= J( n, 1, m[i] )
	);
	//  . . . and add the mean to get the final result
	d = mu + d;
);

// Make some data
nPts = 100;
m  = [0, 0, 0];
v = [1.0 0.2 0.8, 0.2 1.0 0.5, 0.8 0.5 1.0];
dMat = randomMultiGaussian(m, v, nPts);
dt = AsTable(dMat, << Column Names({"x1", "x2", "x3"}));
dt << setName("Sample from Multivariate Gaussian");

// Use PCA
pca = dt << Principal Components(Y( Eval(dt << getColumnNames) ));

// Save predictions
pca << savePredicteds(1);		// Predictions with 1 PC
pca << savePredicteds(2);		// Predictions with 2 PCs
pca << savePredicteds(3);		// Predictions with 3 PCs

// See how the predicted x1 value varies with the number of PCs
dt << Graph Builder(
					Show Control Panel( 0 ),
					Variables(
						X( :x1 ),
						Y( :Predicted x1 3 ),
						Y( :Predicted x1 2 ),
						Y( :Predicted x1 )
					)
				);

// For 3 PCs, the maximum absolute difference between the original and predicted values should be essentially zero
Print(
	Maximum(
		Maximum(Abs(Column(dt, "x1")[1::nPts] - Column(dt, "Predicted x1 3")[1::nPts])),
		Maximum(Abs(Column(dt, "x2")[1::nPts] - Column(dt, "Predicted x2 3")[1::nPts])),
		Maximum(Abs(Column(dt, "x3")[1::nPts] - Column(dt, "Predicted x3 3")[1::nPts]))
	)
);
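
And to connect back to the scores*loadings` question, here is a rough sketch of the same reconstruction done by hand with matrix algebra. It assumes the platform's default PCA on correlations and that Eigen() returns eigenvalues largest first; it is meant only as a sanity check against the saved "Predicted" columns above, not as the platform's own code.

// Reconstruct by hand: standardize, project onto the first k eigenvectors, project back
k = 2;								// number of retained PCs
ones = J( nPts, 1, 1 );
means = ones` * dMat / nPts;					// 1 x 3 row of column means
c = dMat - ones * means;					// centered data
sds = Sqrt( Vec Diag( c` * c / (nPts - 1) ) )`;			// 1 x 3 row of standard deviations
z = c :/ (ones * sds);						// standardized data
{evals, evecs} = Eigen( z` * z / (nPts - 1) );			// eigen decomposition of the correlation matrix
ek = evecs[0, 1::k];						// eigenvectors for the first k components
zhat = (z * ek) * ek`;						// "scores * loadings`" (any scaling convention gives the same product)
xhat = zhat :* (ones * sds) + ones * means;			// back to the original units

// Should be essentially zero if Save Predicteds uses the same reconstruction
Print( Max( Abs( xhat[0, 1] - Column( dt, "Predicted x1 2" )[1::nPts] ) ) );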