PCA for spectral data
Hello,
I am performing a PCA on a large data set (257 columns (wavenumbers) and about 8000 rows (samples)) and want to use it to recognise patterns among the spectra.
However, this warning always appears ‘Warning: The matrix correlation is not positive definite.’ Can someone help me to understand what this means and if it is relevant for my analysis of the data?
I am only a beginner in JMP but I appreciate any help on this matter.
Thank you!
Maike
Accepted Solutions
Re: PCA for spectral data
Hi @Maike_W ,
You may want to consider clustering your variables in the PCA and refining your number of columns to the most representative ones (see below):
I also took this data into our Functional Data Explorer platform (after stacking the data) and split the column names into Group (Erbse, Ackerborne) and the attached numeric value (40.5, 40.15; note that I added '0' where no value was stated). I got a good model explaining the difference in the peaks between the groups (see the gif and attached table).
Re: PCA for spectral data
Hi @Maike_W ,
Reducing the number of 'non-important' columns will help improve the PCA. I don't think you should place too much emphasis on the positive definite issue: the PCA will still help you find the related columns that could indicate the denaturation of the proteins, regardless of the correlation issue. You are likely to inspect them visually anyway to check whether the relationship is real in your samples, which will give you a clear 'true or not' answer on which wavelengths are important.
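As a rough illustration of why fewer, less redundant columns help, here is a minimal Python sketch. It is not JMP's variable clustering: it swaps in a simple greedy correlation filter, and the function name `drop_correlated_columns`, the 0.95 threshold, and the simulated data are all assumptions made for the example.

```python
import numpy as np

def drop_correlated_columns(data, threshold=0.95):
    """Greedy filter (hypothetical helper): keep a column only if its absolute
    correlation with every column kept so far is below the threshold."""
    corr = np.abs(np.corrcoef(data, rowvar=False))
    kept = []
    for j in range(corr.shape[0]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return kept

# Simulated stand-in for spectra: 12 columns driven by only 3 underlying
# factors, so many columns are nearly copies of each other.
rng = np.random.default_rng(2)
factors = rng.normal(size=(200, 3))
X = factors[:, np.arange(12) % 3] + 0.01 * rng.normal(size=(200, 12))

kept = drop_correlated_columns(X)

# Smallest eigenvalue of the correlation matrix before and after filtering.
min_eig_full = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[0]
min_eig_reduced = np.linalg.eigvalsh(np.corrcoef(X[:, kept], rowvar=False))[0]

print("kept columns:", kept)
print("smallest eigenvalue, all columns: ", min_eig_full)
print("smallest eigenvalue, kept columns:", min_eig_reduced)
```

With redundant columns removed, the smallest eigenvalue moves away from zero, which is the situation in which a "not positive definite" warning would be expected to disappear.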
Hope this helps!
Ben
Re: PCA for spectral data
Hi Maike,
Could you provide an example data set to see? The warning indicates that your correlation matrix is not positive definite, i.e. its eigenvalues aren't all strictly positive (some are zero or, through rounding, slightly negative). This typically happens when columns are linearly dependent or nearly so, which is common for neighbouring wavenumbers in spectra. It changes the type of tests that can be performed with JMP that require a positive definite matrix (for example Maximum Likelihood tests as described here) and can weaken the strength of your PCA. I would consider looking at your data and trying things like removing outliers/errors or even refining your predictors list to the actually important factors/columns.
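To make the warning concrete, here is a minimal Python sketch (not JMP output; the data and variable names are made up for illustration). It builds a small matrix in which one column is an exact linear combination of two others and checks the eigenvalues of its correlation matrix. The tolerance `tol` is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for spectra: 50 samples x 4 "wavenumber" columns.
X = rng.normal(size=(50, 4))
# Make the last column an exact linear combination of the first two.
X[:, 3] = 2.0 * X[:, 0] - X[:, 1]

# Correlation matrix between columns (rowvar=False: columns are variables).
R = np.corrcoef(X, rowvar=False)

# eigvalsh is meant for symmetric matrices and returns eigenvalues in
# ascending order, so eigvals[0] is the smallest one.
eigvals = np.linalg.eigvalsh(R)

tol = 1e-10  # assumed numerical tolerance for "strictly positive"
is_positive_definite = bool(eigvals[0] > tol)

print("eigenvalues:", eigvals)
print("positive definite:", is_positive_definite)
```

The smallest eigenvalue comes out at (numerically) zero, so the matrix is only positive semidefinite. The PCA itself can still be computed; the trailing components simply carry no variance.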
As you mentioned you're a beginner, I would recommend you look at these articles (here and here) around spectral data analysis - they're very inspiring and useful!
Thanks,
Ben
Re: PCA for spectral data
Hi Ben,
thanks for your reply and the articles.
In my data you can see the spectra of native and denatured proteins. I would like to use PCA to see if I can distinguish between the native and denatured proteins. Here is an example data set with some samples.
I have already normalised my data to the range 0 - 1. The warning appears for both the raw data and the normalised values.
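One likely reason the warning is unchanged: if the 0 - 1 normalisation is applied per column and the PCA is run on correlations, the correlation matrix is exactly the same before and after scaling. The following Python sketch illustrates this under that assumption (the helper `min_max_scale` and the simulated data are hypothetical; normalising each spectrum row-wise would behave differently).

```python
import numpy as np

def min_max_scale(data):
    """Scale every column to the range 0 - 1 (column-wise, an assumption)."""
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    return (data - col_min) / (col_max - col_min)

rng = np.random.default_rng(1)
X = rng.normal(loc=5.0, scale=2.0, size=(30, 5))  # simulated raw values
X_scaled = min_max_scale(X)

R_raw = np.corrcoef(X, rowvar=False)
R_scaled = np.corrcoef(X_scaled, rowvar=False)

# Correlation is unaffected by shifting and positively rescaling a column.
same_correlations = bool(np.allclose(R_raw, R_scaled))
print("correlation matrix unchanged by scaling:", same_correlations)
```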
Re: PCA for spectral data
Hi Maike,
In this example, how are you distinguishing the 'denaturing'? Is it the difference between the samples '0.5', '0.14' and '85'? I'm just trying to figure out how you intend to use PCA to then say 'these features are denatured'.
Thanks,
Ben
Re: PCA for spectral data
Hi Ben,
Thank you very much! Yes, exactly: the values without a number are the native proteins, and I treated the others at different temperatures and for different times. You can see the differences in the numbers.
My work is about whether you can see these structural differences using NIR spectroscopy. I want to see whether PCA reveals patterns and regularities, and which wavenumbers you need to look at.
But back to PCA and this warning ('The matrix correlation is not positive definite.'). Can I fix this by including only the columns with the important wavenumbers (from clustering or the loading matrix) in the PCA?
Thank you for the example in the Functional Data Explorer. I will have a closer look at it.
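For readers wondering how a loading matrix points to the relevant wavenumbers, here is a minimal Python sketch of PCA on standardised columns via SVD. Everything in it is an assumption for illustration: the data are simulated, the `group` labels (0 = native, 1 = denatured) are hypothetical, and the group difference is placed in columns 2 and 3 by construction.

```python
import numpy as np

rng = np.random.default_rng(3)
n_samples, n_cols = 100, 6

# Hypothetical labels: first half "native" (0), second half "denatured" (1).
group = np.repeat([0.0, 1.0], n_samples // 2)

# Simulated spectra: noise everywhere, group shift only in columns 2 and 3.
X = rng.normal(scale=0.1, size=(n_samples, n_cols))
X[:, 2] += group
X[:, 3] += group

# Standardise columns, which corresponds to PCA on the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Rows of Vt are the principal directions (loadings up to scaling).
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
explained_variance = S**2 / (n_samples - 1)

loadings_pc1 = Vt[0]
scores_pc1 = Z @ loadings_pc1

# Columns with the largest absolute loading on the first component.
top_two = sorted(np.argsort(np.abs(loadings_pc1))[::-1][:2].tolist())
score_gap = abs(scores_pc1[group == 0].mean() - scores_pc1[group == 1].mean())

print("columns with largest PC1 loadings:", top_two)
print("difference in mean PC1 score between groups:", score_gap)
```

In this toy setup the first component is dominated by the two columns that carry the group difference, and the scores on that component separate the two groups. Real spectra will be less clear-cut, so the loadings are a starting point for visual inspection rather than a final answer.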
Re: PCA for spectral data
Hi Ben,
Thank you so much. This has definitely helped me.
Maike