Hello @Moukanni,
- I am not sure what your objective is behind masking your data manually and/or choosing cross-validation.
If you want a test set (a set not used for training and not seen during validation) to assess how the PLS model performs on "new"/unseen data, and provided you have a large dataset, then yes: you can manually hide a portion of your dataset (hide & exclude the rows, run the model, save the prediction formula, and compare predicted vs. actual responses on the hidden rows). Or, if you have JMP Pro, create a validation column (in "Analyze", "Predictive Modeling", "Make Validation Column") where you specify the proportion of rows in your training, validation and/or test sets. The first sketch after this paragraph illustrates the hold-out idea.
If you want to validate your model through K-fold cross-validation, JMP will automatically split your dataset into K parts (folds), train the PLS model on K-1 folds, validate it on the remaining fold, and repeat so that each fold serves once as the validation set and K-1 times as part of the training set. This is a good validation technique for assessing the robustness of your model (the same model compared across different training and validation sets) when the dataset is small; the second sketch below shows the mechanics.
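To make the hold-out idea concrete outside JMP, here is a minimal sketch in Python with scikit-learn. The dataset and all variable names are invented for illustration; this is not how JMP does it internally, just the same train/hide/compare logic:

```python
# Minimal hold-out sketch (illustrative, not JMP): fit PLS on a training
# portion, then compare predicted vs. actual on rows the model never saw.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))                                  # hypothetical predictors
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.3, size=100)  # hypothetical response

# "Hide" 20% of the rows as a test set (analogous to hide & exclude in JMP)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

pls = PLSRegression(n_components=2).fit(X_train, y_train)
y_pred = pls.predict(X_test).ravel()
print(f"Test R^2 on hidden rows: {r2_score(y_test, y_pred):.3f}")
```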
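And here is a minimal K-fold sketch under the same assumptions (again purely illustrative, not JMP's implementation), showing how each fold is used once for validation:

```python
# Minimal K-fold sketch (illustrative): with K=5, each fold is used once
# for validation and four times for training, so you can check how stable
# the model's performance is across different splits.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))                                  # hypothetical predictors
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.3, size=100)  # hypothetical response

cv = KFold(n_splits=5, shuffle=True, random_state=0)
fold_r2 = cross_val_score(PLSRegression(n_components=2), X, y, cv=cv, scoring="r2")
print("Per-fold R^2:", np.round(fold_r2, 3))
print("Mean:", round(fold_r2.mean(), 3), " SD:", round(fold_r2.std(), 3))
```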
- I am also not sure about your second question, but if you want to know which factors are the most important in the PLS model, you can have a look at the Variable Importance Plot and the computed VIP scores. See: Variable Importance Plot (jmp.com) and VIP vs Coefficients Plots (jmp.com). The sketch below shows what the VIP computation looks like.
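JMP Pro reports VIP scores for you; purely to show what they measure, here is a sketch of the standard VIP formula applied to a fitted scikit-learn PLSRegression (synthetic data, hypothetical names; the common rule of thumb treats VIP > 1 as influential):

```python
# VIP (Variable Importance in Projection) sketch for a fitted
# sklearn PLSRegression.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def vip_scores(pls):
    """Standard VIP formula: per-variable importance, weighted by the
    share of Y-variance each PLS component explains."""
    t = pls.x_scores_    # (n, A) X scores
    w = pls.x_weights_   # (p, A) X weights
    q = pls.y_loadings_  # (q, A) Y loadings
    p, _ = w.shape
    # Y-variance explained by component a: q_a^2 * (t_a' t_a)
    ss = np.sum(q ** 2, axis=0) * np.sum(t ** 2, axis=0)  # (A,)
    w_norm = w / np.linalg.norm(w, axis=0)                # unit-norm weights
    return np.sqrt(p * (w_norm ** 2 @ ss) / ss.sum())     # (p,)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.3, size=100)
pls = PLSRegression(n_components=2).fit(X, y)
print(np.round(vip_scores(pls), 2))  # variables 0 and 1 should stand out
```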
I hope this helps!
Victor GUILLER
L'Oréal Data & Analytics
"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)