Mariana_Aguilar
Level III

doubts about PCA via functional data explorer

Hi all!

I'm working with Principal Component Analysis of spectral (FTIR) data. For that, I'm using the Functional Data Explorer Platform. I have a few questions about the results loaded after analysis:

1. What would be the criteria to identify which model is best? Diagnostic plots vs Eigenvalues

Before doing PCA, I'm fitting some models (B-spline, P-spline and direct functional PCA) to see which would be a better fit to my data.

As you know, the software produces diagnostic plots for both the model and the PCA. For P-spline, I believe the diagnostic plots are slightly better than for direct PCA (kindly refer to the attached ppt file), but my 2 principal components only explain 78% of the variation, whereas with direct PCA my 2 PCs explain 99%. So I'm unsure how to decide which is best.

 

2. Are shape functions the same as loading plots?

Direct PCA shape function:

Mariana_Aguilar_3-1731522350503.png

P Spline shape function:

Mariana_Aguilar_2-1731522154056.png

I'd like to check whether my interpretation is correct. I believe shape functions are the loading plots for my PCA analysis, right? Is it ok to say that shape functions 1 and 2 show the wavenumbers that contributed most to my PCA grouping?

And just out of curiosity... is there any reason the shape functions for P-spline look kinda "dented", whereas for B-spline and direct PCA they look smooth?

 

Thank you!

1 ACCEPTED SOLUTION
Victor_G
Super User

Re: doubts about PCA via functional data explorer

Hi @Mariana_Aguilar,

 

To follow up on your questions:

  1. There are several metrics you can use to evaluate, compare and select models, and the best choice may depend on your objective.
    In the Functional Data Explorer platform, several model selection criteria are displayed: AICc, BIC and GCV (see Model Fit Report Options). For each of them, lower is better: the model with the lowest value offers the best compromise between accuracy/predictive performance and complexity. See Basis Function Expansion Model Report for more information on model selection.
    Visually, the diagnostic plots are also very helpful. In your attached file, P-Splines seem to do a better job than Direct PCA, which may smooth and simplify the signal from the spectral data too much (as do B-Splines, which are polynomial spline models). That apparent over-simplification with Direct PCA may be why 2 Functional PCs explain 99% of the variation with this model. Depending on the type of model you fit (B-Splines, P-Splines, Direct models, Wavelets, ...), the complexity (and predictive performance) will differ, which leads to Shape Functions of different complexity.
    For spectral data, I would also recommend giving the Wavelets family of models a try.
  2. Shape Functions are Functional Principal Components, so they help you see which regions of your spectra matter most for discriminating between your samples and explaining their differences. However, several Shape Functions can involve the same spectral region, so the influence/importance of a given region is not so straightforward to assess.
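
To illustrate both points outside of JMP, here is a minimal Python sketch using NumPy on synthetic curves. It is not what the Functional Data Explorer does internally: the data, the function names (`fpca`, `moving_average`) and the moving-average smoother are all hypothetical stand-ins chosen for illustration. It shows that the share of variation explained by 2 components tends to rise when the curves are smoothed more heavily before the PCA, and that the shape functions are the unit-norm eigenvectors (weights over the grid), while the mean function stays in the units of the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic "spectra": 30 samples on a 200-point grid,
# built from two bands whose heights vary between samples, plus noise.
n_samples, n_points = 30, 200
x = np.linspace(0.0, 1.0, n_points)
peak_a = np.exp(-((x - 0.3) ** 2) / 0.002)
peak_b = np.exp(-((x - 0.7) ** 2) / 0.005)
height_a = rng.normal(1.0, 0.3, n_samples)
height_b = rng.normal(1.0, 0.2, n_samples)
noise = rng.normal(0.0, 0.05, (n_samples, n_points))
spectra = np.outer(height_a, peak_a) + np.outer(height_b, peak_b) + noise


def fpca(curves, n_components=2):
    """PCA of curves sampled on a common grid, via SVD of the centred data."""
    mean_function = curves.mean(axis=0)           # same units as the data
    centred = curves - mean_function
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    explained = s**2 / np.sum(s**2)               # share of variance per component
    shape_functions = vt[:n_components]           # unit-norm weights over the grid
    scores = u[:, :n_components] * s[:n_components]
    return mean_function, shape_functions, scores, explained


def moving_average(curves, width=15):
    """Crude smoother, an assumed stand-in for a heavily smoothed fit."""
    kernel = np.ones(width) / width
    return np.array([np.convolve(row, kernel, mode="same") for row in curves])


mean_raw, shapes_raw, scores_raw, ev_raw = fpca(spectra)
mean_sm, shapes_sm, scores_sm, ev_sm = fpca(moving_average(spectra))

print(f"raw curves:      2 components explain {ev_raw[:2].sum():.1%}")
print(f"smoothed curves: 2 components explain {ev_sm[:2].sum():.1%}")
```

In this sketch the higher percentage on the smoothed curves only reflects that less noise and detail is left to explain, so the explained-variation figures of two different fits are not directly comparable on their own.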

 

Hope this answer will help you,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

Mariana_Aguilar
Level III

Re: doubts about PCA via functional data explorer

Thank you so much Victor!

In the shape functions plot, the leftmost graph is labeled "mean function". Could you explain why the Y axis of this graph is "data", whereas for the other two it is "weight"?

 

One more question: is there a way to export these graphs to Excel, so that I can put them in the same format as the rest of my material?

Thank you