Hi @utkcito,
Now I understand what you originally meant by "bootstrap" -- running the bootstrap platform is a good approach, but you'll also want to run predictive models with the neural net, GenReg, and boosted trees, and even get the XGBoost add-in, to compare the different prediction formulas on a held-out validation set. When you do that, be sure to tune your settings so that you get the best fit -- the default fits are not always that good.
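If it helps, here's a rough JSL sketch of launching a few of those platforms side by side. I'm assuming JMP Pro and the Boston Housing sample table with placeholder columns just for illustration, so swap in your own table and variables; option names can also shift a bit between JMP versions, and the most reliable reference is always a saved script from each platform's red triangle menu.

```jsl
// Hedged JSL sketch: fit several predictive platforms on the same data
// so their prediction formulas can be compared on held-out rows.
// Assumes JMP Pro and the Boston Housing sample table -- swap in your
// own table and columns.
dt = Open( "$SAMPLE_DATA/Boston Housing.jmp" );

// Simple random holdback column (0 = training, 1 = validation)
dt << New Column( "Validation", Numeric, "Nominal",
	Formula( If( Random Uniform() < 0.75, 0, 1 ) )
);

bf = dt << Bootstrap Forest( Y( :mvalue ), X( :rooms, :lstat, :crim ),
	Validation( :Validation ) );
bt = dt << Boosted Tree( Y( :mvalue ), X( :rooms, :lstat, :crim ),
	Validation( :Validation ) );
nn = dt << Neural( Y( :mvalue ), X( :rooms, :lstat, :crim ),
	Validation( :Validation ) );
// Remember to tune each platform's settings; the defaults are rarely best.
```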
But one nice thing about the trees approach is that it is an aggregated approach: it uses many different decision trees to come up with a model. If factors don't contribute much, they automatically get pushed down and won't show a high contribution to the overall model. Still, because of the random starts and the decision process, the bootstrap forest and boosted tree approaches are not particularly stable -- if you run the same settings hundreds of times, you'll see that you actually get slightly different answers and formulas. That is why it's good to bootstrap the "SS" or "Portion" column in the output report (in bootstrap forest, for example) to get a better estimate of how strong that factor really is in the model.
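If you want to see that instability directly, here's a quick hedged sketch (reusing dt from above): refit under different seeds and compare the Column Contributions report each time. One caveat I'll flag as an assumption -- some platforms expose their own seed option rather than honoring the global JSL seed, so check how your version behaves.

```jsl
// Hedged sketch: refit the forest under different random seeds and
// compare the Column Contributions (SS / Portion) across the fits.
// Random Reset sets the global JSL random seed, which may or may not
// drive a given platform's internal random draws.
For( i = 1, i <= 3, i++,
	Random Reset( i );
	dt << Bootstrap Forest( Y( :mvalue ), X( :rooms, :lstat, :crim ) );
);
```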
That being said, you will want to use cross-validation methods to make sure you don't overfit. And with a relatively short data table (not that many measurements), it'll be hard to do that without a leave-one-out approach, unless you use the autovalidation method I mentioned in the previous reply. K-fold cross-validation might also be a good approach for you.
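Here's a hedged sketch of building a K-fold style indicator column by hand. I believe some platforms will treat a validation column with many levels as folds, but support varies, so treat this as an assumption and check your platform's validation options (or use Analyze > Predictive Modeling > Make Validation Column).

```jsl
// Hedged sketch: a K-fold style column (here K = 5); fold assignment
// cycles through the rows. For leave-one-out, K would equal the number
// of rows. Whether a platform accepts a multi-level fold column varies.
k = 5;
dt << New Column( "Fold", Numeric, "Nominal",
	Formula( Modulo( Row(), k ) )
);
```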
I'm not sure what your input data looks like, but it almost sounds like what JMP calls "functional data" -- spectra or other regularly spaced data where each wavelength (or x-value) is correlated with its neighbor. You might want to look into the Functional Data Explorer to generate a better subset of predictors. Or PCA is also a good approach: reduce the dimensionality of your predictors and use a smaller set of orthogonal vectors to model your data.
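For the PCA route, a minimal sketch on the same sample table; I believe the Save Principal Components message matches the red triangle option of the same name, but double-check on your version:

```jsl
// Hedged sketch: collapse correlated predictors into a few orthogonal
// scores, then model the response on the saved components instead.
pca = dt << Principal Components( Y( :crim, :indus, :nox, :rooms, :lstat ) );
pca << Save Principal Components( 3 );  // adds Prin1..Prin3 to the table
```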
As to your points:
1) Is there a theoretical (biological, chemical, physical) reason to include so many different cross terms from the degree-2 factorial approach? If you don't have a theoretical reason to include them, they aren't very meaningful and you could be introducing false predictors (see the Fit Model sketch after point 3).
2) In the PLS platform, it's important to also look at the cumulative Y response under "Percent Variation Explained" in the report. If the linear combinations of your predictors are not explaining much of the response, your % variation explained will be quite low, and a different platform will probably work out better. With so few response measurements, trying to fit such a large number of predictors will probably lead to overfitting because you don't have enough degrees of freedom. Along with PLS, you might want to review the Factor Analysis platform under Multivariate Analysis. I suspect your cytokine variables are simply too many, and you lose your degrees of freedom. Imagine fitting a line through two points, or a parabola through three: you get a perfect fit because that's exactly the minimum number of points those polynomials need, but you have no degrees of freedom left to estimate errors or fit quality. I'm thinking this is what's happening in the PLS platform (see the minimal PLS sketch after point 3).
3) The community pages are the best. I'm always checking them, especially for JSL scripts, since I code in JMP. Almost always there's a snippet of code where someone has already solved exactly what I need for my work. I also enjoy running through the questions and seeing what I can help with, especially when I get to learn something new.
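On point 1, here's the kind of lean model I mean -- a hedged Fit Model sketch with one deliberately chosen interaction instead of every degree-2 cross term (columns are placeholders from the sample table above):

```jsl
// Hedged sketch: keep only the cross terms you can justify on
// theoretical grounds, rather than the full degree-2 factorial.
fm = dt << Fit Model(
	Y( :mvalue ),
	Effects( :rooms, :lstat, :rooms * :lstat ),  // one justified interaction
	Personality( "Standard Least Squares" ),
	Run
);
```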
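And on point 2, a minimal PLS launch to go with the advice above; once it runs, open the Percent Variation Explained table in the report and check the cumulative Y column:

```jsl
// Hedged sketch: minimal PLS launch on the same placeholder columns.
// After fitting, check "Percent Variation Explained" (cumulative Y).
pls = dt << Partial Least Squares(
	Y( :mvalue ),
	X( :crim, :indus, :nox, :rooms, :lstat )
);
```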
Best of luck!
DS