Discussions

bi0 · Mar 17, 2020 9:38 AM

Hello all,

I am trying to run a multivariate partial least squares (PLS) analysis on my dataset in JMP. The dataset consists of 10 rows and 59,740 columns (FT-IR data, where many thousands of "discrete" measurements make up a continuous curve). I get the following error when I try to run the PLS with 59,738 columns assigned as X factors and two columns as Y responses.

This dataset is structured in the same way as the Baltic sample dataset but with many more measurements.

Does anyone know how to resolve this? It works if I select only a small number of the X factors rather than all 59,738 of them. Have I exceeded the amount of data JMP can handle?

Thank you

ian_jmp · Mar 17, 2020 01:52 PM

Probably best to email support@jmp.com referencing this thread.

From a methodological point of view, have you plotted the data? Do you expect all regions of wavelength/frequency to discriminate between the responses?

bi0 · Mar 20, 2020 10:00 AM

Hey Ian, thanks for the reply. I will email JMP support. I have plotted the data, and not all regions discriminate between the responses. I can get the PLS to run by manually eliminating the non-responsive X factors (which reduces the number of columns to consider from 59,738 to roughly 8,000). However, this is not ideal. I would much prefer a systematic, computed method for eliminating non-responsive X factors over doing it by hand, which is prone to overfitting and may end up modelling spectral noise.

Ideally, I would like to use scaled and centred model coefficients to decide on useful wavenumber ranges (with a criterion such as coefficient >= 5% of max?). I don't know how to do this, so any ideas would be greatly appreciated.
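The threshold criterion described above could be sketched as follows. This is a minimal illustration in Python rather than JSL, and it assumes the scaled and centred coefficients have already been exported from the fitted model as a plain list ordered by wavenumber. The function names and the sample coefficients are hypothetical.

```python
def select_by_coefficient(coefficients, fraction=0.05):
    """Return indices whose |coefficient| is at least `fraction` of the largest |coefficient|."""
    magnitudes = [abs(c) for c in coefficients]
    largest = max(magnitudes, default=0.0)
    if largest == 0.0:
        return []  # every coefficient is zero, so nothing stands out
    cutoff = fraction * largest
    return [i for i, m in enumerate(magnitudes) if m >= cutoff]


def contiguous_ranges(indices):
    """Group sorted indices into inclusive (start, end) runs, i.e. candidate wavenumber ranges."""
    ranges = []
    for i in indices:
        if ranges and i == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], i)  # extend the current run
        else:
            ranges.append((i, i))  # start a new run
    return ranges


# Made-up coefficients for eight adjacent wavenumbers (illustration only).
coefs = [0.001, 0.02, -0.5, 0.8, 0.03, 0.0, -0.06, 0.05]
kept = select_by_coefficient(coefs, fraction=0.05)
print(kept)
print(contiguous_ranges(kept))
```

Grouping the retained indices into runs makes it easier to see whether the selection corresponds to whole spectral bands or to isolated points, which are more likely to be noise.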

bi0 · Mar 20, 2020 12:14 PM

I realise now that I can do this by running the PLS and then making a model using VIP. However, I get some strange results when I do this on my X-reduced dataset. It decides that the number of factors that minimises PRESS is 0, so I cannot extract any meaningful information. This is contrary to the initial PLS, which settles on seven factors and has reasonably strong predictive power for X (97.7%) and Y (99.9%). It seems that removing unimportant X factors makes the model worse, which does not make sense to me. Perhaps I am misunderstanding this result?
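For context on what a zero-factor result can mean: under cross-validation, zero factors is usually the baseline that predicts each held-out response from the mean of the remaining rows, so a minimum at 0 suggests that no fitted model beat that baseline on held-out data. The sketch below assumes leave-one-out validation and a single response; JMP's own PRESS definition and validation scheme may differ, and the PRESS values in the second part are invented purely for illustration.

```python
def loo_press_mean_only(y):
    """Leave-one-out PRESS for the zero-factor model that predicts the mean of the other rows."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(y)
    press = 0.0
    for yi in y:
        mean_without_i = (total - yi) / (n - 1)  # mean of the rows that were not held out
        press += (yi - mean_without_i) ** 2
    return press


def factors_minimizing_press(press_by_factor_count):
    """Index 0 is the mean-only baseline; index k is a model with k factors."""
    return min(range(len(press_by_factor_count)), key=lambda k: press_by_factor_count[k])


baseline = loo_press_mean_only([1.0, 2.0, 3.0, 4.0])
print(baseline)

# Hypothetical PRESS values for 0, 1 and 2 factors (not taken from the thread).
print(factors_minimizing_press([baseline, 9.4, 10.1]))
```

With only 10 rows, each held-out prediction rests on very few observations, so the cross-validated choice of factor count can change a lot between runs or validation settings.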

Bill_Worley · Mar 17, 2020 02:33 PM

I agree with Ian on contacting support, but there are a couple of other options you can try out along the way.

It looks as if you are using JMP Pro. Have you tried Generalized Regression? You can limit the number of important intensities using a penalized regression approach like Elastic Net.

Also a good way to look at spectral data is to use Functional Data Explorer using Rows as Functions. Using your outputs of concentrations of imines and amines as "Z Supplementary" will allow you to use the Functional DOE capability (JMP Pro 15) to get a Generalized Regression model of your data. Make sure to put your sample ID column in to fit the individual curves and use a P-Spline to fit the model. This might take a little bit with 59,740 columns, but I believe it is worth a shot.

HTH

Bill

P_Bartell · Mar 17, 2020 06:03 PM

Another question for you...are the 'zero' responses really numerically zero, or just placeholders for maybe a 'missing value'? Zeros just look odd to me in the context of the other values in each column. JMP needs to know if they are truly zero (which it looks like that's how they'll get treated by your analysis so far) or if they are missing, then how would you like to handle the 'missingness'?

bi0 · Mar 20, 2020 10:03 AM

Thanks for highlighting this. They are true numerical zero values, not missing data points.

P_Bartell · Mar 20, 2020 10:17 AM

OK...the zeros are actual numeric data. Based on the error message, I'm wondering whether somewhere in the long list of over 50,000 predictor variables you have one or more columns whose values are entirely missing. If I'm interpreting the error message correctly, replacing missing values with the column mean is the chosen method for handling them. Have you run any of the missing value exploratory platforms to see whether this is in fact the root cause of the error? If a column consists entirely of missing values, its mean can't be calculated. If missing values aren't the issue...then I'm at a loss and think, along with @ian_jmp and @Bill_Worley, that reaching out to JMP Technical Support might be the best recourse.

I (during my tenure as a JMP systems engineer) once had a customer that was encountering a similar 'error' type message and the root cause was missing values scattered throughout the data table that rendered execution of the analysis platform she was trying to use impossible.
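Outside JMP, the check described above could be sketched like this. It assumes the table has been exported to a list of rows with `None` or NaN marking missing cells; the function names and the sample data are hypothetical. It also flags constant columns, since a column with zero variance cannot be scaled, although whether either condition is what triggers the error in this thread is not established.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def problem_columns(rows, names):
    """Return (all_missing, constant) column names for a row-oriented table."""
    all_missing, constant = [], []
    for j, name in enumerate(names):
        present = [row[j] for row in rows if not is_missing(row[j])]
        if not present:
            all_missing.append(name)  # no value from which to compute a mean
        elif len(set(present)) == 1:
            constant.append(name)  # zero variance, so scaling is undefined
    return all_missing, constant


# Tiny made-up table: x2 is entirely missing, x3 is constant, x4 is partly missing.
names = ["x1", "x2", "x3", "x4"]
rows = [
    [0.1, None, 0.0, 1.0],
    [0.3, float("nan"), 0.0, 2.0],
    [0.2, None, 0.0, None],
]
all_missing, constant = problem_columns(rows, names)
print(all_missing)
print(constant)
```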


Partial Least Squares
