Developer Tutorial: Modeling Spectral Data Using JMP Pro 17

JMP Pro makes it easy to solve many kinds of problems involving data that is inherently functional in form, such as:

Time series data
Sensor streams from manufacturing processes
Measurements taken over a range of temperatures
Spectra: IR, Chromatography, Mass Spec, Nuclear Magnetic Resonance

See an overview of the basics of functional data analysis, with emphasis on analyzing functional response designed experiments, highlighting functionality that has been available in JMP Pro 16 using Functional Data Explorer. Functional Data Explorer uses data efficiently and yields results that are easy to interpret when the shapes of the curves or spectra are the primary object of interest.

Then, learn about capabilities added in JMP Pro 17 to facilitate the analysis of spectral data, which are especially useful for solving problems prevalent in the chemical, pharmaceutical and biotech industries.

Questions answered during the live webinar by Chris @chris_gotwalt1 and Ryan Parker @RyanParker:

Q: If you run PCA, JMP would still give you the same Eigenvalue?

A: Functional PCA is performed on the model coefficients to reconstruct the shape functions. So, it is a different operation from ordinary PCA, one that respects the ordering of the input space.
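
As a rough illustration of "PCA on the model coefficients", the sketch below runs an ordinary PCA on the coefficients of a basis expansion and then maps the eigenvectors back through the basis to obtain shape functions on a grid. It is not JMP's implementation: the function name, the polynomial basis, and the simulated coefficients are all assumptions made for the example, and it assumes an orthonormal basis (a non-orthonormal basis would need an extra weighting step).

```python
import numpy as np


def fpca_from_coefficients(coefs, basis_on_grid):
    """PCA on basis coefficients, mapped back to curves on a grid.

    coefs:          (n_curves, n_basis) fitted coefficients, one row per curve
    basis_on_grid:  (n_basis, n_grid) basis functions evaluated on the grid
    Assumes the basis rows are orthonormal (hypothetical setup for this sketch).
    """
    coefs = np.asarray(coefs, dtype=float)
    basis_on_grid = np.asarray(basis_on_grid, dtype=float)

    mean_coef = coefs.mean(axis=0)
    centered = coefs - mean_coef

    # SVD of the centered coefficients gives the principal directions (rows of vt).
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    eigenvalues = singular_values**2 / (coefs.shape[0] - 1)

    scores = centered @ vt.T                 # one row of FPC scores per curve
    mean_function = mean_coef @ basis_on_grid
    shape_functions = vt @ basis_on_grid     # eigenvectors expressed as curves
    return mean_function, shape_functions, scores, eigenvalues


# Made-up example: 10 curves described by 4 coefficients each.
grid = np.linspace(0.0, 1.0, 50)
q, _ = np.linalg.qr(np.vander(grid, 4, increasing=True))
basis_on_grid = q.T                          # orthonormal rows on this grid
rng = np.random.default_rng(0)
coefs = rng.normal(size=(10, 4))

mean_function, shape_functions, scores, eigenvalues = fpca_from_coefficients(
    coefs, basis_on_grid
)
curves = coefs @ basis_on_grid
rebuilt = mean_function + scores @ shape_functions  # mean + scores * shapes
```

Each curve is recovered as the mean function plus its scores times the shape functions, which is why the scores can stand in for the curve in later modeling.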

Q: If I am trying to maximize the entire curve, should I unlock the time? Or is there a different method to maximize the curve?

A: You might have to get creative with the desirability function or, to start, try a few selected time points to see if there is a consensus. It’s also possible that you might have to use “Save Summaries” and create a summary of the response to try to maximize in a Profiler.
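
To make the idea of "a summary of the response" concrete, here is a minimal sketch that collapses one curve into a few scalars that could then be maximized. It does not reproduce what “Save Summaries” computes in JMP; the function names and the choice of summaries (trapezoid-rule area, average level, minimum, maximum) are assumptions for illustration.

```python
def trapezoid_area(x, y):
    """Area under the piecewise-linear curve through (x, y), by the trapezoid rule."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    return sum(
        (x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0 for i in range(len(x) - 1)
    )


def curve_summaries(x, y):
    """Collapse one curve into scalar summaries (hypothetical choices)."""
    area = trapezoid_area(x, y)
    return {
        "area": area,
        "mean_level": area / (x[-1] - x[0]),  # area divided by the x range
        "minimum": min(y),
        "maximum": max(y),
    }


# Made-up curve on an irregular time grid.
time = [0.0, 1.0, 2.0, 4.0]
response = [1.0, 3.0, 3.0, 1.0]
summary = curve_summaries(time, response)
```

Maximizing the minimum pushes the whole curve up, while maximizing the area or mean level allows a dip in one region to be offset by a gain in another, so the choice of summary matters.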

Q: When modelling (NIR) spectra with concentrations of what was inside, would you recommend rather wavelets or splines?

A: We would recommend trying Wavelets first.

Q: I know in Fit Curve you can get similar capability for DOE factors. Is there a vision of adding an option in the drop down under Wavelets as Non-Linear Model? Are more non-linear models planned to be added? Currently in Fit Curve it is limited and is not in the Fit.

A: Agreed. The Fit Curve platform is fortunately easy for us to add new functions to. If you have suggestions, send them in to support@jmp.com or to the Wish List on the JMP User Community.

Q: Is the target function copied and pasted to the end (labeled as Target)? So, it is still used in the analysis, but also used as the target?

A: After you load a target function, it is treated as validation data. So, it is no longer used when fitting the models, and instead held out of the analysis.

Q: So only loading curves/spectra without any Z information would not use the full potential of the FDE? Or are there use cases for this?

A: Suppose you're looking at curves from a manufacturing process, and you want to identify the kinds of typical variation to start to understand a big multi-function data set. In that sense, you would use FDE in much the same way as principal components is used for vector multivariate data. There is also a kind of analysis that we're not talking about here today, where you might have the functions as inputs and you might not be trying to predict a function, but instead using functions to predict a key quality attribute. In that case, you would load up those as supplementary variables and then do a machine learning exercise at the end to predict the key quality attribute using the functions as inputs.

Q: When I size a regular DOE, I use Fraction of Design space and look at the 80% FDS value to determine the sample size required for a given starting model with a given level of random noise. When I’m doing functional DOE do I need to account for the fact that my response will be a curve instead of a scalar? In other words, do I need to have a larger sample size to account for the extra step of converting the data to smooth functions? I’m just trying to get a feel for determining how many curves I’m going to need for Functional DOE.

A: Behind the scenes, we're converting the functions into scalars. So, I think that the design diagnostics that I would use would be the same as before. If you're using FDS plots to assess the quality of the design that's being proposed by the Custom Designer, I think you can still use them, because we're building Functional DOE analysis on top of the scalar analyses that are available. Really, everything that you've learned about scalar DOEs should extend nicely to Functional DOEs.

Q: Is it possible to use the raw data, without preprocessing the data?

A: Yes, if you don't think the raw data need to be cleaned up, subset, or transformed, you can go straight to the Smoothing Step without making any changes to the data.

Q: What is a Regular Grid?

A: A regular grid is one where the gaps between consecutive X's are all the same. For example, the set of X values {0, .5, 1, 1.5, 2, 2.5, 3} is a regular grid. An irregular grid could be something like {0, .2, .7, 1.1, 1.2, 1.3, 2.4, 2.9}.
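
The definition can be checked mechanically. The helper below is a small sketch, not anything from JMP: the function name and the tolerance are assumptions, and it expects the X values sorted in ascending order.

```python
import math


def is_regular_grid(xs, rel_tol=1e-9):
    """Return True if consecutive gaps in xs are all equal (within a tolerance).

    Assumes xs is sorted in ascending order and has at least two points.
    """
    if len(xs) < 2:
        raise ValueError("need at least two X values")
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    first = gaps[0]
    return first > 0 and all(math.isclose(g, first, rel_tol=rel_tol) for g in gaps)


regular = is_regular_grid([0, .5, 1, 1.5, 2, 2.5, 3])
irregular = is_regular_grid([0, .2, .7, 1.1, 1.2, 1.3, 2.4, 2.9])
```

The tolerance is there because grids stored as floating-point numbers rarely have gaps that match to the last bit.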

Q: What's the difference between, say, Symlet6 and Symlet20?

A: Generally, a higher index, like 20 versus 6, means increasing complexity. If you go to https://wavelets.pybytes.com/wavelet/sym2/ you can see pictures of the different symlets.

Overall, I haven't really found it necessary to learn about the different wavelet types and orders. You can just let the data decide which is the best and look at the diagnostic plots to ensure that the fit is good. In general, you will want to spend more time investigating the number of Shape Functions that you want to work with.

Q: I believe all shown wavelets integrate to zero over the entire range because they dip below the zero-level baseline, whereas peaks integrate to a positive area under a peak. If I am correct, how do you close this qualitative gap between the modeled function and the model?

A: That comes about through the discrete wavelet transform of the data.
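
One way to see this: a discrete wavelet transform keeps scaling (approximation) coefficients alongside the wavelet (detail) coefficients. The wavelets themselves integrate to zero, but the scaling function does not, so the overall level of a positive peak is carried by the approximation part. The sketch below uses the Haar wavelet because it is short enough to write by hand; it is not a symlet and not JMP's implementation, and the function names and signal are made up for the example.

```python
import math


def haar_step(values):
    """One level of the orthonormal Haar transform: scaled pairwise sums and differences."""
    s = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / s for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / s for i in range(0, len(values), 2)]
    return approx, detail


def haar_dwt(signal):
    """Full Haar decomposition; the signal length must be a power of two."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = list(signal)
    details = []  # finest level first
    while len(approx) > 1:
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx[0], details


def haar_idwt(approx, details):
    """Invert haar_dwt: rebuild the signal from coarse to fine."""
    s = math.sqrt(2.0)
    values = [approx]
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(values, detail):
            rebuilt.extend([(a + d) / s, (a - d) / s])
        values = rebuilt
    return values


# A made-up, strictly non-negative peak.
peak = [0.0, 0.0, 1.0, 3.0, 3.0, 1.0, 0.0, 0.0]
approx, details = haar_dwt(peak)
reconstructed = haar_idwt(approx, details)
```

Here the single approximation coefficient equals the sum of the signal divided by the square root of its length, so it holds the total area, while the detail coefficients only describe how that area is distributed along the X axis.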

Q: To perform FDA in JMP, do all the curves need to have similar shapes? What if I have different types of shapes (e.g., normal, exponential, bimodal, skewed, ...) in one data set?

A: No. They don’t need to be similar. The smoothing model will fit the very different shapes without difficulty, and the big differences between the widely different shapes will turn up in the Functional Principal Components Analysis.

Q: Can we save the shape functions and predictive model to use with new (future) data (e.g., new spectra)?

A: Yes. In the wavelet model fit there is an outline node called "Function Summaries" that you can use to save out the predicted model. You can also save the Functional DOE prediction model, which lets you work with the Functional DOE model directly outside of the platform (e.g., in the standalone Graph->Profiler).

Q: Can we get confidence intervals from the Functional DOE?

A: That question has come up several times. We have some ideas for how we can do this, but we need to do more research to make uncertainty analysis like confidence intervals possible.

Q: Are the mixture variables set as such in the column info, with coding and all?

A: The Mixture Coding column property is there because the design was created by the Custom Designer, which creates them automatically for you.

Resources:

Functional Data Explorer Documentation
Wavelets in Functional Data Analysis by Morettin, Pinheiro and Vidakovic covers mathematical details of wavelets
Spectral Tools Add-In from Worley and Ash
Pre-Processing Spectral Data blog by Worley and Ash
Analyzing Spectral Data: Multivariate methods and advanced pre-processing blog by Worley and Ash

gail_massari · ‎12-12-2022

Chris @chris_gotwalt1 and Ryan @RyanParker ,

Michael Nazarkovsky @Nazarkovsky sent me an email, which I am putting here for you. Michael, thanks. We are trying to use the Comment area so others with the same issues, thoughts, etc. can be part of the dialog.

After opening the NIR.jmp (NIR Gluten Starch Study) and the journal (Mastering JMP - FDE in JMP Pro 17) I saw that the item "Blue spectra have no gluten, red spectra are all gluten, and gray is a mixture" seems to be wrong, since the blue spectra are full of gluten and the red ones are associated with high levels of starch, whereas both components are present in an inverse proportion according to the data.

Please, correct me, if I am wrong. It might be just a mechanical error, while preparing the JMP journal. If yes, it's worth correcting the item on the Community's website.

gail_massari · ‎12-12-2022

Hi Michael @Nazarkovsky From @RyanParker . New journal v2 attached!

Nazarkovsky · ‎12-12-2022

Great! Thanks!