I routinely use response screening to quickly look at tons of comparisons of the effects of experimental factors (e.g., temperature) on protein concentrations. However, I now have binary (presence/absence) data (either 0 or 1). I see that you can change the distribution to Cauchy or Poisson at the bottom, but I don't see binomial. I also see "Same Y Scale", though I am not sure whether that would do it. Obviously, I could fit individual generalized linear models under Fit Model, but I feel there could be a faster option if response screening is inappropriate. In this case there are only 22 proteins, so it's not the end of the world to do it manually, but when I have hundreds of proteins it would be nice to use something akin to response screening. A GLM with a column switcher to quickly step through the proteins could be one slightly slower option. Any other ideas?
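For a single categorical factor such as temperature level, the "one GLM per protein" loop has a shortcut: the likelihood-ratio test of a binomial (logistic) GLM with that factor as its only effect equals the G-test on the protein's level × presence/absence table. The pure-Python sketch below runs that test for every 0/1 column and adds Benjamini-Hochberg FDR p-values, similar in spirit to an FDR column in a screening report. It is not JMP functionality; the function names, the toy data, and the choice of Benjamini-Hochberg are illustrative assumptions.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square(df) variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in powers of x
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df + 1) // 2):
        total += term
        term *= x / (2 * r + 1)
    return total


def g_test(groups, outcomes):
    """G (likelihood-ratio) test of a 0/1 outcome against one categorical factor.

    Equals the LR test of a logistic GLM with the factor as its only effect
    versus an intercept-only model. Returns (G, p-value).
    """
    n = len(outcomes)
    cells = Counter(zip(groups, outcomes))  # only non-empty cells appear
    level_tot, outcome_tot = Counter(groups), Counter(outcomes)
    g = 0.0
    for (level, y), obs in cells.items():
        expected = level_tot[level] * outcome_tot[y] / n
        g += 2 * obs * math.log(obs / expected)
    df = len(level_tot) - 1  # (levels - 1) * (2 - 1)
    return g, chi2_sf(g, df)


def bh_adjust(pvalues):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def screen_binary_responses(groups, responses):
    """Run g_test on every 0/1 column; return (name, p, FDR p) sorted by p."""
    names = list(responses)
    pvals = [g_test(groups, responses[name])[1] for name in names]
    return sorted(zip(names, pvals, bh_adjust(pvals)), key=lambda row: row[1])


# Hypothetical example: 12 samples at two temperatures, two proteins
temps = ["low"] * 6 + ["high"] * 6
proteins = {
    "P1": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    "P2": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
}
results = screen_binary_responses(temps, proteins)
for name, p, q in results:
    print(f"{name}: p = {p:.4g}, FDR p = {q:.4g}")
```

Because the test uses only the table of counts, it covers a single categorical factor; continuous factors, or several factors at once, would need an actual logistic-regression fit per protein.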
I believe you are running JMP Pro, and it sounds like your primary goal is variable selection? If so, you may also want to check out the penalized regression methods in the Fit Model -> Generalized Regression personality. Two families that may be especially useful for variable selection are the Lasso and the Elastic Net. You can also specify the underlying response distribution (e.g., binomial). Here's the online documentation for additional guidance:
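To illustrate why an L1 (Lasso) penalty performs variable selection with a binary response, here is a minimal pure-Python sketch of elastic-net-penalized logistic regression fitted by proximal gradient descent. It is a stand-in for the idea, not JMP's Generalized Regression algorithm; the function names, the solver, the step-size rule, and the toy data are all assumptions.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(z, t):
    # Proximal operator of t*|b|: shrink toward zero, exactly zero if |z| <= t
    return math.copysign(max(abs(z) - t, 0.0), z)


def penalized_logistic(X, y, lam, alpha=1.0, iters=5000):
    """Minimize mean log-loss + lam*(alpha*|beta|_1 + (1-alpha)/2*|beta|_2^2).

    alpha=1 is the lasso; 0 < alpha < 1 is an elastic net.
    The intercept b0 is not penalized. Returns (b0, beta).
    """
    n, p = len(X), len(X[0])
    # Safe step 1/L, with L bounded by trace([1, X]^T [1, X]) / (4n)
    trace = 1.0 + sum(sum(row[j] ** 2 for row in X) / n for j in range(p))
    step = 4.0 / trace
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        # Residuals p_i - y_i at the current coefficients
        resid = [sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        g0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        b0 -= step * g0
        # Elastic-net proximal step: soft-threshold, then ridge-style shrink
        beta = [soft_threshold(bj - step * gj, step * lam * alpha)
                / (1.0 + step * lam * (1.0 - alpha))
                for bj, gj in zip(beta, grad)]
    return b0, beta


# Toy data (hypothetical): x0 matches y in 6 of 8 rows, x1 is balanced noise
X = [[1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0], [0, 1], [1, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
b0, beta = penalized_logistic(X, y, lam=0.05)
print(b0, beta)  # the L1 penalty drives x1's coefficient to zero
```

With a multinomial response the same idea extends to a softmax model with one coefficient vector per response level; that extension is omitted here.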
You might also consider using Partition. Use cross-validation so that you can click Go. You are not selecting a model here, though it might seem so. Then examine the column contributions to get an idea of which predictors are useful.
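Column contributions summarize how much each column's splits reduce impurity across the tree. The sketch below is a simplified analogue under stated assumptions: a greedy tree on 0/1 predictors with a categorical response, crediting each split's likelihood-ratio G² to the column it split on. Unlike Partition with validation, it stops at a fixed depth rather than using cross-validation to decide when to stop, and the names and stopping rules (`max_depth`, `min_size`) are hypothetical.

```python
import math
from collections import Counter


def entropy_g2(labels):
    """2 * n * entropy (natural log): the G^2 impurity of a node's labels."""
    n = len(labels)
    return -2.0 * sum(c * math.log(c / n) for c in Counter(labels).values())


def column_contributions(X, y, max_depth=3, min_size=2):
    """Grow a greedy tree on 0/1 columns; return total G^2 credited to each column."""
    p = len(X[0])
    contrib = [0.0] * p

    def grow(rows, depth):
        if depth >= max_depth:
            return
        parent = entropy_g2([y[i] for i in rows])
        best_gain, best = 1e-12, None  # ignore zero or rounding-level gains
        for j in range(p):
            left = [i for i in rows if X[i][j] == 0]
            right = [i for i in rows if X[i][j] == 1]
            if len(left) < min_size or len(right) < min_size:
                continue
            gain = (parent - entropy_g2([y[i] for i in left])
                    - entropy_g2([y[i] for i in right]))
            if gain > best_gain:
                best_gain, best = gain, (j, left, right)
        if best is None:
            return
        j, left, right = best
        contrib[j] += best_gain
        grow(left, depth + 1)
        grow(right, depth + 1)

    grow(list(range(len(y))), 0)
    return contrib


# Hypothetical data: column 0 separates the treatments, column 1 is balanced noise
X = [[0, 1], [0, 0], [0, 1], [0, 0], [1, 1], [1, 0], [1, 1], [1, 0]]
y = ["ctrl"] * 4 + ["heat"] * 4
print(column_contributions(X, y))  # all of the G^2 is credited to column 0
```

A single split's G² here is the same likelihood-ratio statistic as a G-test of that column against the response, so the tree effectively accumulates such tests over the splits it chooses.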
Thank you both for your responses. I am indeed looking for the most interesting candidate proteins, and it does appear that response screening (even though it is potentially statistically dubious given that the data are binary), stepwise regression, and generalized regression all point to basically the same set of proteins as those most affected by the experimental treatment. I think generalized regression might actually be the most robust, since its penalized estimation copes better with collinear predictors.
My question, then, is more statistical in nature: is it valid to use such an approach if the predictors (proteins) are binary (0 or 1) and the response (Y) is multinomial (in some cases, since I have various interaction groups)? More generally, for generalized regression, does it matter that the predictors (the proteins) are not normally distributed (just 0s and 1s)? If it doesn't matter, then I think generalized regression could be a good solution. I've attached the data so people can see how it's formatted.