So, I have a large dataset of assay data covering 42 different elements. The concentrations and ratios in each assay vary with both the amount of each mineral and each mineral's (variable) chemistry. I am trying to predict one of these elements from variations in the rest of the elements. My gut says that only a few of them really have any influence, and that ratios of some elements relative to ratios of other elements will likely be the key to my problem. I have already done the brain-dead scatterplot matrices to see if there are any obvious linear relationships that I can pull from the data. Some very weak ones are there, but they are still largely shotgun patterns to which I apply wishful thinking.
So, what I am looking for is simply a recommendation on what kind of statistical analysis would, at a minimum, identify a pared-down list of the elements that have the greatest influence. From there I can work with random ratios. I am guessing that I will somehow have to do some sort of multidimensional multivariate analysis like MDA, cluster, or principal component type analysis? Any recommendations you could provide would be most helpful.
I think you are on the right track with your thinking so far. Three other thoughts:
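1. If you are running JMP Pro, I would definitely look at the Generalized Regression platform and the penalized regression methods of Ridge, Lasso and Elastic Net. These methods are very effective at variable identification and model selection. Lasso and Elastic Net in particular can shrink coefficients to exactly zero, whereas Ridge only shrinks them toward zero, so the first two are the ones that give you a pared-down list directly.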
2. Try partial least squares (PLS) as well. It can be effective at variable identification, dimensionality reduction, and model selection. The PLS deployment in JMP Pro is much richer than the PLS deployment in standard JMP for things like model specification, cross validation, etc.
Both platforms (Generalized Regression and PLS) have the usual profilers, simulators, etc. that many find useful for both evaluating a model and actually using it.
3. Also, again if you have JMP Pro, don't forget about the Model Comparison platform. Many find it a useful 'one-stop shopping hub' for simultaneously evaluating multiple models.
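For anyone who wants to see the penalized-regression idea outside JMP, here is a minimal sketch in Python using scikit-learn's LassoCV. It is not JMP's Generalized Regression platform, just the same general technique. Everything in it is an assumption made for illustration: the data are synthetic, the element names (El0, El1, ...) are hypothetical, and the target is deliberately built from the ratio El0/El1. Working on log concentrations is also my assumption; it is used here because log(a/b) = log(a) - log(b), so a ratio effect shows up as a pair of coefficients with opposite signs.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for assay data (assumption): 300 samples, 10 "elements".
n_samples, n_elements = 300, 10
names = [f"El{i}" for i in range(n_elements)]  # hypothetical element names
conc = rng.lognormal(mean=0.0, sigma=1.0, size=(n_samples, n_elements))

# Log-transform so that ratios become differences of columns.
log_conc = np.log(conc)

# Assumed ground truth for the demo: the target depends only on El0/El1.
y = 2.0 * (log_conc[:, 0] - log_conc[:, 1]) + rng.normal(scale=0.1, size=n_samples)

# Standardize predictors, then fit a lasso with the penalty chosen by 5-fold CV.
model = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
model.fit(log_conc, y)

# Rank elements by the magnitude of their fitted coefficient.
coefs = model.named_steps["lassocv"].coef_
ranked = sorted(zip(names, coefs), key=lambda t: abs(t[1]), reverse=True)
for name, c in ranked:
    print(f"{name}: {c:+.3f}")
```

In this toy setup the two largest coefficients belong to El0 (positive) and El1 (negative), which is the signature of a ratio. On real assay data the picture will be noisier, so treat the ranking as a shortlist to investigate rather than a final answer.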