Hi @Sloux ,
As far as I understand, the Response Screening platform only compares a large set of Y vs. X combinations; it does not test Y vs. X1*X1 (quadratic) or Y vs. X1*X2 (interaction) kinds of combinations. So it might not be the right platform for what you are doing. It might help to clarify which part of your data set is the Y (response) and which columns exactly are the Xs.
From what you describe, it sounds like you're actually hunting for the best predictors, and combinations of predictors, for your outcome. If that's the case, you might want to consider a few different approaches. Some of the comments below depend on whether or not your data came from a DOE, and some of the options require JMP Pro.
- You might consider using the Analyze > Screening > Predictor Screening platform to check which predictors matter most. The result can depend somewhat on the number of trees, so you may want to play around with that setting. Even then, you'll want to run bootstrap simulations on the Portion column to get a better grip on which factors really drive the outcome. Note that this approach also doesn't allow for crossed terms.
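If you want to see the idea behind Predictor Screening outside of JMP: it is essentially random-forest variable importance, where each factor's "Portion" is its share of the total importance. Here's a minimal Python sketch on made-up synthetic data (column roles and seeds are arbitrary, just for illustration):

```python
# Sketch of the Predictor Screening idea: fit a random forest and rank
# predictors by their share ("Portion") of total variable importance.
# Synthetic data -- only X1 and X3 actually drive y here.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 5))          # five candidate predictors X1..X5
y = 3.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.3, size=n)

rf = RandomForestRegressor(n_estimators=500, random_state=1).fit(X, y)
portion = rf.feature_importances_    # normalized to sum to 1, like Portion
ranking = np.argsort(portion)[::-1]  # strongest predictor first
```

In JMP you'd then bootstrap the Portion column; here you'd refit on resampled rows and watch how stable the ranking is.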
- Use the SLS platform (Fit Model with Personality Standard Least Squares), then run bootstrapping on the estimates to see which factors appear most frequently. This approach does let you look at crossed terms. However, unless you have some kind of evidence that a crossed term MUST be in there, it's generally recommended not to include it. (Here is where the DOE matters: if the DOE shows there must be crossed terms, then include them.)
- Use the GenReg platform (Fit Model with Personality Generalized Regression), again with bootstrapping on the estimates to see which factors appear most frequently. This also allows crossed terms, and the same caveat applies: only include them if you have evidence, e.g. from the DOE, that they belong.
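The "bootstrap the estimates" idea from the two bullets above can be sketched in plain Python as well: refit a least-squares model on resampled rows and check which coefficients' bootstrap intervals stay away from zero. Everything here is synthetic and the 95% cutoff is just one common choice:

```python
# Bootstrap least-squares estimates: resample rows, refit, and see which
# terms (including a crossed term) are consistently nonzero.
import numpy as np

rng = np.random.default_rng(7)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
# True model: strong x1 effect, no x2 main effect, real x1*x2 interaction
y = 2.0 * x1 + 1.5 * x1 * x2 + rng.normal(scale=0.5, size=n)

# Design matrix: intercept, main effects, crossed term
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])

boot = []
for _ in range(500):
    idx = rng.integers(0, n, n)                        # resample rows
    b, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    boot.append(b)
boot = np.array(boot)

# Keep a term if its 95% bootstrap interval excludes zero
lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)
keeps_term = (lo > 0) | (hi < 0)
```

In JMP Pro you get the same flavor of output by right-clicking the Parameter Estimates column and choosing Bootstrap.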
- Use PLS (Fit Model with Personality PLS). This will often help you cut back factors whose Variable Importance falls below 0.8 (the default threshold, which you can change).
- You also might try the Analyze > Specialized Modeling > Functional Data Explorer platform. This depends on whether your data is like a spectrum, i.e. some kind of output measurement at different wavelengths, voltages, etc. You can combine this with the Z, Supplementary optional column role to include a "DOE" kind of aspect in the analysis and get prediction profilers out of it. This can sometimes reduce a large set of wavelengths down to just a few quantities that best predict an outcome.
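The core trick FDE uses for that reduction is functional principal components: treat each row as a curve and summarize it with a few component scores instead of hundreds of wavelength columns. A rough NumPy-only sketch with fake spectra (one peak whose height varies by sample):

```python
# Functional-PCA flavor of what FDE does: reduce many wavelength columns
# to a few principal component scores per sample. Synthetic spectra.
import numpy as np

rng = np.random.default_rng(9)
n_samples, n_wavelengths = 80, 200
grid = np.linspace(0, 1, n_wavelengths)

# Each sample is a Gaussian peak at 0.5 with a random height, plus noise
heights = rng.uniform(1, 3, n_samples)
spectra = heights[:, None] * np.exp(-((grid - 0.5) ** 2) / 0.01)
spectra += rng.normal(scale=0.05, size=spectra.shape)

# PCA via SVD on the mean-centered curves
centered = spectra - spectra.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
scores = U[:, :3] * s[:3]                        # 3 scores per sample
explained = (s[:3] ** 2).sum() / (s ** 2).sum()  # variance captured
```

The handful of score columns then become the Xs (or Ys) in a normal model, which is how the "DOE aspect" gets attached via the supplementary variables.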
- DOE Autovalidation: there is an add-in tool you can get to help with this analysis. It is somewhat similar to bootstrapping the estimates in the other platforms, but works slightly differently.
These are just some general comments on how to approach what it sounds like you're trying to do. Depending on how many rows you have, you should also consider how you want to validate things, e.g. leave-one-out, k-fold, etc. I highly recommend using more than one approach and comparing the results. Because the algorithms differ slightly and use different random seeds, the results will often differ slightly, which can help you make the final call on whether or not to include certain factors and crossed terms.
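To make the validation point concrete, here is a small k-fold sketch comparing a main-effects-only model against one with a crossed term, on synthetic data where the crossed term is actually useless. Model and fold choices are arbitrary illustrations:

```python
# k-fold comparison: keep the crossed term only if it improves
# out-of-fold performance, not just the in-sample fit.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(5)
n = 120
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 2.0 * x1 + rng.normal(scale=0.5, size=n)   # no real interaction

main = np.column_stack([x1, x2])               # main effects only
crossed = np.column_stack([x1, x2, x1 * x2])   # plus crossed term

cv = KFold(n_splits=5, shuffle=True, random_state=5)
r2_main = cross_val_score(LinearRegression(), main, y, cv=cv).mean()
r2_crossed = cross_val_score(LinearRegression(), crossed, y, cv=cv).mean()
# If r2_crossed isn't meaningfully better, leave the crossed term out.
```

That mirrors the advice above: run the comparison a couple of ways (different folds, different seeds) before making the final call.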
Hope this helps!
DS