PCA Predicteds Method

I am trying to understand the technique behind using a PCA model to return predicted values of the original input variables when fewer principal components are retained than there are predictors (retained PCs < predictor count). Ultimately, I'll be looking at the residuals (original values - predicteds) for varying numbers of retained PCs.


I've read that, to return from PCA space back to X space, you'll need to calculate:


predicteds = scores*loadings'


When I look at the formula behind the 'Save Predicteds' output, is that linear equation an implementation of the concept above?
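
For reference, here is how I would write the back-projection out in full. This is only a sketch under my own assumed notation (none of these symbols come from the platform): Z is the n x p data matrix after centering each column (and scaling to unit variance if the PCA is on correlations), P_k holds the first k unit-length eigenvectors as columns, mu_j and s_j are the mean and standard deviation of variable j, and "loadings" is taken to mean those unit-length eigenvectors. If the loadings are scaled by the square root of the eigenvalue instead, the scores have to be rescaled to match.

```latex
% Assumed notation (not from the original post):
%   Z   : n x p centered (and, for correlations, scaled) data matrix
%   P_k : p x k matrix of the first k unit-length eigenvectors
%   T_k : n x k score matrix
\begin{aligned}
T_k &= Z P_k \\
\hat{Z}_k &= T_k P_k^{\top} = Z P_k P_k^{\top} \\
\hat{x}_{ij} &= \mu_j + s_j \sum_{a=1}^{k} t_{ia}\, p_{ja},
\qquad t_{ia} = \sum_{l=1}^{p} \frac{x_{il} - \mu_l}{s_l}\, p_{la}
\end{aligned}
```

Written this way, each predicted variable is a linear function of the original variables, which is what a saved linear formula would look like. With k = p the product P_k P_k' is the identity, so the predicteds reproduce the original data exactly; with k < p it is a projection onto the retained components.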




P.S. I know PCA is not meant for predictive modeling; I'm ultimately using it for fault detection with Hotelling's T2 and SPE output. My SPE (squared prediction error) would hinge on this predicteds output.
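
To make clear what I mean by the two statistics, these are the textbook definitions I have in mind, stated as assumptions rather than as what any particular platform computes. Here z_i is the standardized observation for row i (a p x 1 vector), P_k is the p x k matrix of retained unit-length eigenvectors, t_i = P_k' z_i is the score vector, and lambda_a is the eigenvalue of component a. Whether SPE is taken in standardized or original units is a convention; standardized units are assumed here.

```latex
% Assumed definitions for row i with k retained components
\begin{aligned}
e_i &= z_i - P_k P_k^{\top} z_i \\
\mathrm{SPE}_i &= e_i^{\top} e_i = \sum_{j=1}^{p} \left( z_{ij} - \hat{z}_{ij} \right)^2 \\
T_i^2 &= \sum_{a=1}^{k} \frac{t_{ia}^2}{\lambda_a}
\end{aligned}
```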


Re: PCA Predicteds Method

Looks like the PCA platform does not give you access to the score matrix directly. But the code below may help to give some confidence that the prediction formulas are as they should be. It generates random multivariate Gaussian data for three variables, and when using three principal components (that is, with no dimensionality reduction) the predictions are the same as the original data.


Names Default To Here( 1 );

// Function to generate n independent samples from a multivariate Gaussian
//		m is the column vector of means
//		v is the (square) variance-covariance matrix
// (Dimensions of m and v have to conform)
randomMultiGaussian = Function( {m, v, n},
	{Default Local},
	nDim = N Row( m );
	If( (N Row( v ) != nDim) | (N Col( v ) != nDim),
		Print( "ERROR in randomMultiGaussian" )
	);
	// Simulate random values . . .
	t = Cholesky( v );
	d = J( n, nDim, Random Normal() ) * t`;
	// Build mean values columnwise . . .
	mu = [];
	For( i = 1, i <= nDim, i++,
		mu ||= J( n, 1, m[i] )
	);
	//  . . . and add the mean to get the final result
	d = mu + d
);

// Make some data
nPts = 100;
m  = [0, 0, 0];
v = [1.0 0.2 0.8, 0.2 1.0 0.5, 0.8 0.5 1.0];
dMat = randomMultiGaussian(m, v, nPts);
dt = AsTable(dMat, << Column Names({"x1", "x2", "x3"}));
dt << setName("Sample from Multivariate Gaussian");

// Use PCA
pca = dt << Principal Components(Y( Eval(dt << getColumnNames) ));

// Save predictions
pca << savePredicteds(1);		// Predictions with 1 PC
pca << savePredicteds(2);		// Predictions with 2 PCs
pca << savePredicteds(3);		// Predictions with 3 PCs

// See how the predicted x1 value varies with the number of PCs
dt << Graph Builder(
	Show Control Panel( 0 ),
	Variables(
		X( :x1 ),
		Y( :Predicted x1 3 ),
		Y( :Predicted x1 2 ),
		Y( :Predicted x1 )
	)
);

// For 3 PCs compare the predicted values with the original data values
// (largest absolute difference per variable; each should be essentially zero)
Print(
	Maximum( Abs( Column( dt, "x1" )[1 :: nPts] - Column( dt, "Predicted x1 3" )[1 :: nPts] ) ),
	Maximum( Abs( Column( dt, "x2" )[1 :: nPts] - Column( dt, "Predicted x2 3" )[1 :: nPts] ) ),
	Maximum( Abs( Column( dt, "x3" )[1 :: nPts] - Column( dt, "Predicted x3 3" )[1 :: nPts] ) )
);
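
If you want to see the scores and the back-projection explicitly, the same calculation can be done with JSL matrix functions, outside the platform. The following is only a minimal sketch, not the platform's own implementation. It assumes PCA on correlations (so columns are standardized with the sample standard deviation), and the names x, z, p, t, zHat, xHat and spe are all made up for illustration. It generates its own stand-in data so that it runs by itself.

```jsl
Names Default To Here( 1 );

// Hypothetical stand-in data: 100 rows, 3 columns of random normals
x = J( 100, 3, Random Normal() );
n = N Row( x );
ones = J( n, 1, 1 );

// Standardize each column (assumption: PCA on correlations)
mu = V Mean( x );	// 1 x 3 row vector of column means
sd = V Std( x );	// 1 x 3 row vector of column standard deviations
z = (x - ones * mu) :/ (ones * sd);

// Eigenvalues and eigenvectors of the correlation matrix
{evals, evecs} = Eigen( Correlation( x ) );

// Keep k components, then compute scores and back-project
k = 2;
p = evecs[0, 1 :: k];	// first k eigenvectors as columns
t = z * p;	// scores (n x k)
zHat = t * p`;	// predicteds in standardized units
xHat = (zHat :* (ones * sd)) + ones * mu;	// predicteds in original units

// Squared prediction error per row, in standardized units
e = z - zHat;
spe = (e :* e) * J( N Col( x ), 1, 1 );

// Sanity check: with all components retained the data are reproduced
xFull = ((z * evecs * evecs`) :* (ones * sd)) + ones * mu;
Show( Max( Abs( x - xFull ) ) );
```

If the assumptions hold, xHat for a given k should be comparable with the columns saved by the platform for the same number of components, and the final Show should report a value at the level of floating-point rounding error.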