Moukanni
Level I

PLS validation & variable loadings

Hello JMP community, 

 

I have a couple of questions about PLS:

- Do I have to hide a portion of my data manually, or is it done automatically when I choose cross-validation?

- Is there a threshold on variable loadings that distinguishes the most important variables captured by each factor (latent variable)?

 

Thank you for your assistance!

 

1 ACCEPTED SOLUTION

Victor_G
Super User

Re: PLS validation & variable loadings

Hello @Moukanni,

- I am not sure what your objective is in masking your data manually and/or choosing cross-validation.
If you want a test set (rows not used by the model for training and not seen during validation) to assess how the PLS model performs on new, unseen data, and provided you have a large dataset, then yes, you can manually hide a portion of your dataset: hide and exclude the rows, run the model, save the prediction formula, and compare predicted vs. actual responses on the hidden rows. Alternatively, if you have JMP Pro, you can create a validation column (Analyze > Predictive Modeling > Make Validation Column) in which you specify the proportion of rows assigned to the training, validation, and/or test sets.
If you want to validate your model through K-fold cross-validation, JMP automatically splits your dataset into K parts, trains the PLS model on K-1 parts, validates it on the remaining part, and repeats this so that each part (fold) serves once as the validation set and K-1 times as part of the training set. This is a good validation technique for assessing the robustness of your model (by comparing different training and validation sets) on a small dataset.
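
To make the K-fold mechanics concrete, here is a minimal Python sketch of the row splitting only. It is an illustration under my own assumptions, not JMP's actual implementation: the function name kfold_indices, the shuffling, and the seed are hypothetical, and no PLS model is fitted.

```python
import random


def kfold_indices(n_rows, k, seed=0):
    """Yield (train, validation) row-index lists for K-fold cross-validation.

    Hypothetical helper for illustration; JMP's internal fold assignment
    may differ.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    rows = list(range(n_rows))
    random.Random(seed).shuffle(rows)  # random assignment of rows to folds
    # Fold i takes every k-th shuffled row, so fold sizes differ by at most 1.
    folds = [rows[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        # Training rows are all rows from the other k-1 folds.
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield sorted(train), sorted(validation)


if __name__ == "__main__":
    for number, (train, validation) in enumerate(kfold_indices(10, 5), start=1):
        print(f"fold {number}: train={train} validation={validation}")
```

Each row lands in the validation set exactly once and in a training set K-1 times, which is the property described above. In practice you would fit the model on the train rows of each fold and score it on the validation rows.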

- I am not sure about the second question either. If you want to know which variables are the most important in the PLS model, you can look at the Variable Importance Plot and the computed VIP scores. See: Variable Importance Plot (jmp.com) and VIP vs Coefficients Plots (jmp.com)
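
For reference, the VIP score is usually defined as below for a single response. This is the common textbook form, and JMP's computation may differ in detail. Here p is the number of X variables, A the number of factors, w_{ja} the weight of variable j on factor a, and SS_a the response sum of squares explained by factor a.

```latex
\mathrm{VIP}_j = \sqrt{\, p \,
  \frac{\sum_{a=1}^{A} SS_a \left( w_{ja} / \lVert w_a \rVert \right)^2}
       {\sum_{a=1}^{A} SS_a} }
```

Because the squared VIP scores average to 1 across the p variables, cutoffs of roughly 0.8 to 1 are often quoted as rules of thumb rather than strict thresholds.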

I hope this helps!

Victor GUILLER
Scientific Expertise Engineer
L'Oréal - Data & Analytics


Moukanni
Level I

Re: PLS validation & variable loadings

Thank you, Victor! This helps a lot!

My objective is to validate the model through K-fold cross-validation. 

For the second question, I'm referring to the PLS X loadings on each factor: is there a commonly used threshold that highlights the variables belonging to the same system? For example, in exploratory factor analysis, variable loadings above 0.4 on a given factor suggest that these variables very likely come from the same system.

 

Thank you so much!