Hi everyone!
Here's my situation: I have a dataset with 58 batches in rows and 213 columns corresponding to my parameters Xs (e.g. Glucose at culture day 1, Glucose at culture day 2, etc.). A few values are missing here and there depending on the X and the batch, but overall the dataset is quite good.
Now I want to use PLS with all my Xs to understand which of them impact my Y (I have 14 different Ys that I would like to understand individually, so all the Xs versus one Y at a time).
For some of the Ys it works fine, but for others the Root Mean PRESS plot reports "Note: The minimum root mean PRESS is 1.25 and the minimizing number of factors is 0.", so I cannot build any model (for example Y_1 in the attached dataset).
Why is it doing that? I thought it might be linked to the missing data, but then I do not understand why it works for the other Ys.
An advantage of PLS is that it models multiple responses with multiple predictors simultaneously, especially when the number of predictors exceeds the number of observations. You might instead use rotated PCA on the predictors to create factors for regressions with the individual responses.
I do not fully understand your data or your intended analysis from your brief initial description. It seems worth trying rotated PCA with the individual responses.
Multivariate models like PLS are built on large numbers of variables that exhibit correlations within X, within Y, and between X and Y. That result might be your goal; I could not tell. It might also be your goal to identify or select variables for predicting individual responses. Variable selection is possible with several different tools in JMP. Instead of variable selection, you might want to use variable synthesis with rotated PCA. That way you retain most of the original information while reducing the dimensionality of the problem.
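A minimal sketch of the rotated-PCA route, written in Python with NumPy only to illustrate the idea outside JMP. It assumes complete rows (missing values handled beforehand) and an arbitrary choice of k = 3 components: standardize the Xs, keep the first k principal components, apply a varimax rotation so that each factor loads mainly on a small group of variables, then regress one Y on the rotated factor scores. The helper names varimax and rotated_pca_regression and the synthetic data are hypothetical.

```python
import numpy as np


def varimax(A, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a p x k loading matrix A."""
    p, k = A.shape
    R = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        L = A @ R
        # gradient of the varimax criterion, mapped back to an orthogonal matrix via SVD
        G = A.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(G)
        R = u @ vt
        new_crit = s.sum()
        if new_crit < crit * (1 + tol):
            break
        crit = new_crit
    return A @ R, R


def rotated_pca_regression(X, y, k):
    """Standardize X, keep k principal components, varimax-rotate them,
    then fit ordinary least squares of y on the rotated factor scores."""
    n = X.shape[0]
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:k].T * S[:k] / np.sqrt(n - 1)   # correlations of each X with each component
    scores = U[:, :k] * np.sqrt(n - 1)              # unit-variance component scores
    rot_loadings, R = varimax(loadings)
    rot_scores = scores @ R                          # rotate the scores with the same matrix
    design = np.column_stack([np.ones(n), rot_scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return rot_loadings, coef, r2


rng = np.random.default_rng(0)
n = 58
f = rng.normal(size=(n, 3))                                    # three hidden drivers
X = np.repeat(f, 4, axis=1) + 0.3 * rng.normal(size=(n, 12))   # 4 noisy Xs per driver
y = 1.5 * f[:, 0] + 0.2 * rng.normal(size=n)                   # Y driven by the first group

rot_loadings, coef, r2 = rotated_pca_regression(X, y, k=3)
print(np.round(rot_loadings, 2))   # each factor should load mainly on one block of 4 Xs
print("coefficients:", np.round(coef, 2), "R^2:", round(r2, 3))
```

The rotation does not change how well the k components fit the Xs; it only redistributes the loadings so that each factor is easier to interpret as a group of related Xs, which is what makes the regression coefficients readable.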
Let me know more about your case and I will help you if you have more questions.
I want to take a step back first. Using all Ys at once clearly does not work, because once all Ys are taken into account, every row has at least one missing value. Looking at just Y1, for example, and simply plotting it against each X factor, you can see that almost every factor has some strange points that make a fit look quite wrong. So I suspect there are some potential outliers in the data that make it hard for PLS to reach a result, which is why you get the notification.
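The missingness point can be checked by counting complete rows. This pandas sketch assumes that rows with any missing value in the selected columns are dropped (listwise deletion), which is a common default but may not be exactly what JMP does; the small table and the helper complete_rows are hypothetical. Each Y on its own keeps most rows, but requiring all Ys at once keeps far fewer.

```python
import numpy as np
import pandas as pd


def complete_rows(df, x_cols, y_cols):
    """Number of rows with no missing value in any of x_cols + y_cols."""
    return int(df[x_cols + y_cols].notna().all(axis=1).sum())


# Hypothetical table: each Y is missing in different rows
df = pd.DataFrame({
    "X1": [1, 2, 3, 4, 5, 6],
    "X2": [2, 1, 4, 3, 6, 5],
    "Y1": [1.0, np.nan, 3.0, 4.0, 5.0, 6.0],
    "Y2": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
    "Y3": [1.0, 2.0, 3.0, np.nan, np.nan, 6.0],
})

x_cols = ["X1", "X2"]
y_cols = ["Y1", "Y2", "Y3"]
per_y = {y: complete_rows(df, x_cols, [y]) for y in y_cols}
all_y = complete_rows(df, x_cols, y_cols)
print(per_y)   # {'Y1': 5, 'Y2': 5, 'Y3': 4}
print(all_y)   # 2
```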
I have not tried any of Mark's suggestions, but I would first check the data to see whether this is the case.
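For the data check, one simple screen is a per-column modified z-score based on the median and the median absolute deviation (MAD), which a few extreme values cannot inflate the way they inflate the mean and standard deviation. The 3.5 cutoff is a common rule of thumb, not a JMP setting, and the helper name robust_outlier_flags and the example values are hypothetical. This only flags single extreme cells column by column; flagged points still need to be looked at in the plots before deciding anything.

```python
import numpy as np


def robust_outlier_flags(X, threshold=3.5):
    """Flag cells whose modified z-score (median/MAD based) exceeds threshold.

    NaNs are ignored when computing the statistics and are never flagged.
    """
    med = np.nanmedian(X, axis=0)
    mad = np.nanmedian(np.abs(X - med), axis=0)
    mad = np.where(mad == 0, np.nan, mad)   # constant columns: no flags instead of dividing by zero
    z = 0.6745 * (X - med) / mad            # 0.6745 makes MAD comparable to a standard deviation
    return np.abs(z) > threshold


# Hypothetical values: two Xs over six batches, one suspicious value in column 0
X = np.array([
    [5.1, 10.0],
    [5.0, 11.0],
    [4.9, 10.5],
    [5.2, np.nan],
    [5.0, 10.2],
    [9.8, 10.8],
])

flags = robust_outlier_flags(X)
rows, cols = np.nonzero(flags)
print(list(zip(rows.tolist(), cols.tolist())))   # [(5, 0)]
```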
Indeed, you're right. I'll check the data first and then see whether I can go ahead with Mark's suggestions!
Thanks a lot