Discussions

Mariana_Aguilar · Nov 13, 2024 01:28 PM

Hi all!

I'm working on Principal Component Analysis of spectral (FTIR) data, using the Functional Data Explorer platform. I have a few questions about the results reported after the analysis:

1. What criteria should I use to identify the best model: diagnostic plots or eigenvalues?

Before doing PCA, I'm fitting some models (B-spline, P-spline and direct functional PCA) to see which would be a better fit to my data.

As you know, the software produces diagnostic plots for both the model and the PCA. For P-spline, I believe the diagnostic plots are slightly better than for direct PCA (please see the attached ppt file), but my 2 principal components only explain 78% of the variation, whereas with direct PCA my 2 PCs explain 99%. So I'm unsure how to decide which is best.

2. Are shape functions the same as loading plots?

Direct PCA shape function:

P Spline shape function:

I'd like to check whether my interpretation is correct. I believe shape functions are the loading plots for my PCA analysis, right? Is it OK to say that shape functions 1 and 2 show the wavenumbers that contributed most to my PCA grouping?

And just out of curiosity... is there any reason that the shape functions for P-spline look kind of "dented", whereas for B-spline and direct PCA they look smooth?

Thank you!

Victor_G · Nov 13, 2024 02:18 PM

Hi @Mariana_Aguilar,

To follow up on your questions:

There are several metrics that you can use to better evaluate, compare and select models, and your model choice might be different depending on your objective.
In the Functional Data Explorer platform, several information criteria are displayed: AICc, BIC, and GCV (see Model Fit Report Options). These metrics are used to compare models, and the model with the lowest information criterion is selected (the best compromise between accuracy/predictive performance and complexity). See Basis Function Expansion Model Report for more information on model selection.
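To make the "lowest information criterion" idea concrete, here is a minimal Python sketch using the textbook AICc and BIC formulas for a least-squares fit with Gaussian errors. It is not how the platform computes its values internally (that is an assumption-free illustration only): the synthetic curve, the polynomial degrees, and the helper name `information_criteria` are all hypothetical.

```python
import numpy as np

def information_criteria(y, y_hat, k):
    """Textbook AICc and BIC for a least-squares fit with Gaussian errors.

    k is the number of estimated parameters (coefficients + error variance).
    """
    n = len(y)
    rss = float(np.sum((y - y_hat) ** 2))
    # -2 * log-likelihood evaluated at the ML variance estimate rss / n
    neg2_loglik = n * (np.log(2 * np.pi * rss / n) + 1)
    aic = neg2_loglik + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)   # small-sample correction
    bic = neg2_loglik + k * np.log(n)
    return aicc, bic

# Hypothetical noisy curve standing in for one measured function
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 60)
y = np.sin(np.pi * x) + rng.normal(scale=0.1, size=x.size)

# Fit polynomials of increasing flexibility and score each one
results = {}
for degree in (1, 3, 5, 9):
    coefs = np.polynomial.polynomial.polyfit(x, y, degree)
    y_hat = np.polynomial.polynomial.polyval(x, coefs)
    results[degree] = information_criteria(y, y_hat, k=degree + 2)

for degree, (aicc, bic) in results.items():
    print(f"degree {degree}: AICc = {aicc:8.2f}  BIC = {bic:8.2f}")

# Lower is better: the criterion rewards fit but penalises extra parameters
best_by_bic = min(results, key=lambda d: results[d][1])
print("lowest BIC at degree", best_by_bic)
```

The too-simple model is penalised through its large residuals, and the too-flexible one through the parameter count, which is the compromise described above.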
Visually, diagnostic plots are also very helpful. From your attached file, it seems P-Splines do a better job than Direct PCA, which might smooth and simplify the signal from spectral data too much (as can B-Splines, which are polynomial spline models). Since the signal seems (over-)simplified with Direct PCA, this may explain why 2 Functional PCs explain 99% of the variation with this model. Depending on the type of model you fit (B-Splines, P-Splines, Direct models, Wavelets, ...), the complexity (and predictive performance) will differ, and so will the complexity of the Shape Functions.
On spectral data, I would also recommend giving the Wavelets family models a try.
Shape Functions are Functional Principal Components, so they help you see which areas of your spectral data are most important for discriminating between your samples and explaining their differences. However, several Shape Functions can be linked to the same spectral area, so the influence of a given region is not always straightforward to assess.
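As a rough illustration of both points (shape functions acting like loadings, and smoothing inflating the variation explained by the first components), here is a small Python sketch on made-up spectra. Everything in it is an assumption for illustration: the wavenumber axis, the band positions, the noise level, and the moving-average smoother, which merely stands in for an over-smoothing basis model and is not what the platform does.

```python
import numpy as np

rng = np.random.default_rng(1)
wavenumber = np.linspace(400, 4000, 300)   # hypothetical FTIR axis (cm^-1)

def band(center, width):
    """Gaussian absorption band on the wavenumber axis."""
    return np.exp(-0.5 * ((wavenumber - center) / width) ** 2)

# 40 synthetic spectra: two broad bands and one narrow band vary
# independently across samples, plus measurement noise.
n_samples = 40
scores_true = rng.normal(size=(n_samples, 3))
spectra = (1.0 + band(1650, 120)
           + 0.5 * scores_true[:, [0]] * band(1650, 120)
           + 0.3 * scores_true[:, [1]] * band(2900, 150)
           + 0.2 * scores_true[:, [2]] * band(1050, 15)
           + rng.normal(scale=0.05, size=(n_samples, wavenumber.size)))

def pca(curves):
    """PCA via SVD: mean curve, shape functions, scores, variance shares."""
    mean_function = curves.mean(axis=0)
    centered = curves - mean_function
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return mean_function, vt, u * s, explained

def moving_average(curves, window):
    """Crude smoother applied to each spectrum (rows of `curves`)."""
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, curves)

_, shapes_raw, _, explained_raw = pca(spectra)
_, shapes_smooth, _, explained_smooth = pca(moving_average(spectra, 41))

share_raw = explained_raw[:2].sum()
share_smooth = explained_smooth[:2].sum()
print(f"variation explained by 2 PCs, raw spectra:      {share_raw:.3f}")
print(f"variation explained by 2 PCs, smoothed spectra: {share_smooth:.3f}")

# The first shape function is largest where the spectra vary most
peak = wavenumber[np.argmax(np.abs(shapes_raw[0]))]
print(f"shape function 1 peaks near {peak:.0f} cm^-1")
```

In this toy setup the smoothing removes the noise and the narrow band, so the first two components account for a larger share of what is left. A higher percentage explained therefore does not by itself mean a better model; it can also mean that detail was discarded before the PCA step.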

Hope this answer will help you,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

Mariana_Aguilar · Nov 13, 2024 02:24 PM

Thank you so much Victor!

In the shape functions report, the graph on the left is labelled "mean function"... could you explain why the Y axis for this graph is "data", whereas for the other two it is "weight"?

And one more question: is there a way to export these graphs to Excel? I'd like to put them in the same format as the rest of my material.

Thank you :)
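
One possible reading of the axis labels, assuming the platform follows the standard functional PCA decomposition (this is the textbook form, not something confirmed in this thread): each curve is written as the mean function plus a weighted sum of shape functions,

```latex
x_i(t) \approx \mu(t) + \sum_{k=1}^{K} c_{ik}\,\phi_k(t),
\qquad \int \phi_k(t)^2 \, dt = 1
```

where $x_i(t)$ is the spectrum of sample $i$, $\mu(t)$ is the mean function, $\phi_k(t)$ is the $k$-th shape function, and $c_{ik}$ is the score of sample $i$ on component $k$. Under this form, $\mu(t)$ is an average of the measured curves and so lives on the same scale as the data, while each $\phi_k(t)$ is normalised and only describes how strongly each wavenumber is weighted when a score is added to the mean.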

doubts about PCA via functional data explorer
