A case that showed me the importance of "preprocessing" in spectral data analysis

Masukawa_Nao · Dec 8, 2024 07:00 PM

Recently, we held a seminar for Japanese customers titled "Multivariate Analysis of Spectral/Chromatogram Data using JMP."

As the title suggests, this seminar covered niche topics such as spectral and chromatogram data, so I assumed there would be few participants. However, far more people attended than I had expected. I am very grateful!

In the seminar, we mainly dealt with the following Raman spectrum data.

Purpose of the analysis

Develop a model for quantifying lead content in turmeric using Raman spectroscopy (scattering mode)

Spectral Data Overview

Forty-two turmeric samples with different lead concentrations were prepared.
The samples were classified into six groups, A to F (seven samples in each group), based on lead concentration.

The figure below shows the Raman spectral data of 42 samples visualized using Graph Builder. As shown in the legend, each group is color-coded.

[Figure: Raman spectra of the 42 samples drawn in Graph Builder, color-coded by group]

Data source: Cruz, Jordi et al. (2020), “Data for: Quantitative models for detecting the presence of lead in Curcumin using Raman spectroscopy”, Mendeley Data

Classifying spectra with functional principal component analysis

In the seminar, we used this data to perform modeling using PCA and PLS regression, and also introduced a method for classifying spectra using functional principal component analysis (FPCA) as an application example.

Functional principal component analysis can be performed in the “Functional Data Explorer” platform in JMP Pro. We fit a wavelet model to the spectral data and examined the “score plot” from the functional principal component analysis, which lets you visually check how similar the samples are in terms of spectral shape.

Samples plotted close together have similar spectral shapes, while those plotted far apart have different shapes. Therefore, if the samples cluster by group on the score plot, they can be classified by lead concentration.

In the score plot below, the yellow points (group D) lie on the right side along component 1, and the other groups lie on the left. (To give a visual sense of the spectral shapes, the score plot also shows the pop-up spectra for three samples: A7, D4, and D7.)

[Figure: FPCA score plot of the raw spectra, with pop-up spectra for samples A7, D4, and D7]

This shows that group D was clearly distinguishable from the other groups (A, B, C, E, and F), but groups A, B, C, E, and F were not completely separated from one another.

Those of you who work with spectral data may already know this, but the above example is the result of attempting classification using the raw measured data without any preprocessing of the spectral data.

Spectral data preprocessing

Preprocessing is widely regarded as important in spectral data analysis. It can reduce noise, remove unwanted variation, and emphasize features, which is expected to improve analytical accuracy.

JMP Pro's "Functional Data Explorer" provides preprocessing functions that are commonly used with spectral data.

[Figure: preprocessing commands available in Functional Data Explorer]

The details of each command are as follows:

Technique	Description
SNV	Standard normal variate: standardizes each sample (mean 0, standard deviation 1)
MSC	Multiplicative scatter correction: corrects each sample by fitting a simple linear regression of that sample against a reference spectrum
Savitzky-Golay	Applies the SG filter, then takes the first or second derivative of the filtered result
Baseline correction	Corrects each function by subtracting a baseline function
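To make the first two rows of the table concrete, here is a minimal sketch of SNV and MSC in plain Python. It is not JMP's implementation: the function names `snv` and `msc`, the use of the sample standard deviation (n - 1), the choice of the mean spectrum as the MSC reference, and the toy data are all assumptions made for illustration.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: shift one spectrum to mean 0 and scale it to SD 1."""
    m = mean(spectrum)
    s = stdev(spectrum)  # assumption: sample SD (n - 1); a tool may use n instead
    return [(x - m) / s for x in spectrum]


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum (assumed reference).

    Each spectrum x is regressed on the reference r (x ~ a + b * r)
    and then corrected as (x - a) / b.
    """
    n_points = len(spectra[0])
    ref = [mean(s[i] for s in spectra) for i in range(n_points)]
    r_mean = mean(ref)
    sxx = sum((r - r_mean) ** 2 for r in ref)
    corrected = []
    for x in spectra:
        x_mean = mean(x)
        # Slope and intercept of the simple linear regression of x on ref.
        b = sum((r - r_mean) * (xi - x_mean) for r, xi in zip(ref, x)) / sxx
        a = x_mean - b * r_mean
        corrected.append([(xi - a) / b for xi in x])
    return corrected


# Toy data (made up): the same peak shape with a different offset and scale.
base = [0.0, 1.0, 4.0, 1.0, 0.0]
spectra = [[2.0 + 1.0 * v for v in base], [5.0 + 3.0 * v for v in base]]

snv_spectra = [snv(s) for s in spectra]
msc_spectra = msc(spectra)
```

In this toy case the two spectra differ only by an additive offset and a multiplicative scale, so both corrections map them onto the same curve. Real spectra also differ in shape, and that remaining difference is what the later analysis works with.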

Classification results after preprocessing

First, preprocessing was performed by applying a Savitzky-Golay filter (SG filter) to the raw data and taking the first derivative.

The SG filter smooths data by fitting a polynomial to a moving window of points. Because the fitted curve is a polynomial, it can be differentiated, and taking the first derivative emphasizes the slope (rate of change) of each peak.
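As a rough illustration of this idea, the sketch below applies the standard Savitzky-Golay convolution weights for a 5-point window and a quadratic fit. The window size, the polynomial degree, the helper name `sg_filter`, and the choice to drop the edge points are assumptions for this example, not the settings used in the seminar or in JMP.

```python
# Savitzky-Golay convolution weights for a 5-point window and a quadratic fit.
SMOOTH_5 = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]  # smoothed value
DERIV1_5 = [-2 / 10, -1 / 10, 0.0, 1 / 10, 2 / 10]        # first derivative


def sg_filter(y, weights, spacing=1.0, deriv_order=0):
    """Apply SG weights to the interior points of y.

    Assumption: the first and last len(weights) // 2 points are dropped;
    real implementations usually handle the edges separately.
    """
    half = len(weights) // 2
    out = []
    for i in range(half, len(y) - half):
        window = y[i - half : i + half + 1]
        value = sum(w * v for w, v in zip(weights, window))
        # Derivative weights assume unit spacing, so rescale by the spacing.
        out.append(value / spacing ** deriv_order)
    return out


# Toy signal (made up): y = x^2 sampled at x = 0, 1, ..., 6.
x = [float(i) for i in range(7)]
y = [xi ** 2 for xi in x]

smoothed = sg_filter(y, SMOOTH_5)
first_deriv = sg_filter(y, DERIV1_5, spacing=1.0, deriv_order=1)
```

Because the toy signal is itself a quadratic, the local fit is exact: the smoothed values reproduce `y` at x = 2, 3, 4, and the first derivative equals 2x at the same points. On noisy spectra the same weights give a smoothed estimate of the slope, which is why the derivative emphasizes where peaks rise and fall.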

We fit a wavelet model to the preprocessed data and checked the score plot from the functional principal component analysis. The samples in groups other than D are more spread out than before, but the groups still cannot be classified.

[Figure: FPCA score plot after applying the SG filter with the first derivative]

Next, we applied SNV to the SG-filtered data and checked the score plot using the same analysis.

This time, group A was clearly separated from the other groups.

[Figure: FPCA score plot after applying the SG filter followed by SNV]

Finally, we drew a three-dimensional plot that adds component 3 to principal components 1 and 2, and confirmed that groups D and A were separated from the other groups. Groups B and C, on the other hand, almost overlapped and could not be completely distinguished, while groups E and F could be classified to some extent along component 3.

[Figure: three-dimensional score plot of components 1, 2, and 3. Groups D and A are separated from the other groups, groups B and C overlap, and groups E and F are separated to some extent, though not completely, along component 3 (Y FPC3).]

Through this case study, we reaffirmed the importance of preprocessing of spectral data, which can improve data quality and lead to more accurate classification results.

Reference: About Functional Data Explorer

Functional Data Explorer Overview - Functional Principal Component Analysis and Various Use Cases - (Japanese) - JMP User Community

by Naohiro Masukawa (JMP Japan)

Naohiro Masukawa - JMP User Community