Hi everyone, I'm trying to generate a prediction profiler for 86 components on its effect for 1 response "IVCC". I used the Fit Model tool in the Analyze menu, with 1 response being IVCC and 86 responses ranging from "Glycine" to "Vitamin D12" in the file. The resulting prediction profiler and maximizing for desirability in the Analysis window is able to output values for the predicted response, but no prediction curves are generated. Would anyone have ideas on how to better analyze this data? My ultimate goal is to determine the value for each of the 86 components that maximizes "IVCC". Thank you!
The prediction profiler is most useful after you select a model. Are 86 features necessary for the model?
Also, the algorithm to automatically scale the response axis is very good but it is not fool-proof. You might need to change the scale to see the profiles, especially after maximizing the desirability.
Sorry I'm a bit confused by your post. Are the 86 components input variables (x's) or are they response variables (y's)?
"86 components on its effect for 1 response "IVCC"" suggests they are x's, but "1 response being IVCC and 86 responses ranging..." suggests they are Y's. If you have that many Y's, I would first look at which ones correlate (Multivariate Methods>Multivariate). If they are x's, then, based on the principle of sparsity of effects, I would first screen the 86 components to determine which ones are most important (and are these just 1st-order effects?). There are many ways to do this (I would start with hypotheses). Once you have identified the significant factors, your prediction formula will be useful. I would think it would be virtually impossible to control/manage 86 variables simultaneously (think of the associated measurement error for each x).
It appears you are not seeing the results you expect in the prediction profiler because you have a model with more parameters to estimate than degrees of freedom available. The "Singularity Details" section at the top of your fit model report is an indication of this.
In other words, to fit a model as complex as the one you want (effect estimates for 86 variables, plus the model intercept, so 87 parameters in all) you will need at least 87 rows of data, far more than the 15 rows you have.
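To see why this is a hard limit, here is a quick illustration outside JMP, in plain Python with made-up numbers (not your data): a least-squares fit can estimate at most as many parameters as the rank of the design matrix, and the rank can never exceed the number of rows. So a design matrix with 15 rows and 87 columns (86 factors plus the intercept) is necessarily singular.

```python
from fractions import Fraction
import random

def matrix_rank(rows):
    """Rank of a matrix via exact Gaussian elimination (Fractions avoid round-off)."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Find a row at or below the current pivot row with a nonzero entry
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row
        for r in range(n_rows):
            if r != rank and m[r][col] != 0:
                f = m[r][col] / m[rank][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

random.seed(0)
n_rows, n_params = 15, 87  # 86 factors + intercept, as in the thread
# Hypothetical design matrix: an intercept column of 1s plus random factor settings
X = [[1] + [random.randrange(1, 100) for _ in range(n_params - 1)]
     for _ in range(n_rows)]
print(matrix_rank(X))  # 15: the rank is capped by the row count, so 72 parameters are inestimable
```

The rank comes out as 15 no matter what values the 86 factor columns hold, which is exactly why the report shows a Singularity Details section instead of a full set of estimates.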
An alternative is to use a method that tolerates having fewer rows than variables (such as Partial Least Squares), or to do variable selection to reduce the set of X variables.
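JMP has variable selection built in, but to illustrate the basic screening idea in plain Python (made-up data, not your file): score each candidate x by the strength of its marginal correlation with the response and keep only the strongest few for the model.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

random.seed(1)
n_rows, n_vars = 15, 86
# Hypothetical data: 86 candidate components measured over 15 runs
X = [[random.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_vars)]
# Hypothetical response driven by components 0 and 1 plus noise
y = [3.0 * X[0][i] - 2.5 * X[1][i] + random.gauss(0, 0.5) for i in range(n_rows)]

# Rank every component by the size of its marginal correlation with y
ranked = sorted(range(n_vars), key=lambda j: abs(pearson(X[j], y)), reverse=True)
print(ranked[:5])  # the true drivers (0 and 1) should surface near the top
```

With only 15 rows, spurious correlations among the 84 pure-noise columns can still be sizable, which is why a simple filter like this is only a starting point before fitting the reduced model.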
I hope this helps!
I like @julian 's answer, that's a good way to go.
I would probably use Analyze>Screening>Predictor Screening to narrow down the list of good candidate parameters. With only a few rows of data, it would probably be a good idea to run this platform multiple times to see if you get a stable set of predictors. When the response is mostly noise, the set of predictors will be fairly random from run to run.
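To sketch that repeat-and-compare idea outside JMP (in Python, with a made-up randomized screen standing in for Predictor Screening's bootstrap forest): run a screen whose internals involve some randomness several times, count how often each variable lands in the top set, and trust only the variables that show up consistently.

```python
import random
from collections import Counter

random.seed(3)
n_rows, n_vars, n_runs, k = 15, 86, 20, 5

# Hypothetical data: response driven by variable 0 only
X = [[random.gauss(0, 1) for _ in range(n_vars)] for _ in range(n_rows)]
y = [3.0 * row[0] + random.gauss(0, 0.5) for row in X]

def screen_once(X, y, k):
    """Score variables by |covariance with y| on a random row subsample
    (a stand-in for the randomness inside a bootstrap-forest screen)."""
    idx = random.sample(range(len(X)), 10)
    ybar = sum(y[i] for i in idx) / len(idx)
    def score(j):
        xbar = sum(X[i][j] for i in idx) / len(idx)
        return abs(sum((X[i][j] - xbar) * (y[i] - ybar) for i in idx))
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]

# Tally how often each variable makes the top-k across repeated runs
counts = Counter(v for _ in range(n_runs) for v in screen_once(X, y, k))
print(counts.most_common(3))  # a real driver should appear in most runs
```

A variable that appears in nearly every run is a credible candidate; if the top set reshuffles completely on each run, the response is probably mostly noise, as noted above.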
Alternatively, the Partition platform (one tree at a time) is a good way to get a handle on the key drivers of your response, too.
As far as methods for variable selection, PLS and the Tree methods do a pretty good job. If you happen to have access to JMP Pro, there are some other very good options using generalized regression.
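For intuition on why a tree finds key drivers, here is a stripped-down sketch in Python of the first step the Partition platform takes (a single-split "stump" search on made-up data, not the full recursive tree): for every variable, find the one cut point that most reduces the residual sum of squares, then keep the variable whose best cut wins.

```python
import random

def sse(v):
    """Sum of squared deviations from the mean."""
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v)

def stump_sse(x, y):
    """Best residual sum of squares achievable by one split on x (or no split)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ys = [y[i] for i in order]
    best = sse(ys)  # no split at all
    for cut in range(1, len(ys)):
        best = min(best, sse(ys[:cut]) + sse(ys[cut:]))
    return best

random.seed(4)
n_rows, n_vars = 15, 86
X = [[random.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_vars)]
# Hypothetical response: a step in component 0, plus a little noise
y = [(5.0 if X[0][i] > 0 else 0.0) + random.gauss(0, 0.3) for i in range(n_rows)]

baseline = sse(y)
scores = [stump_sse(X[j], y) for j in range(n_vars)]
best_var = min(range(n_vars), key=lambda j: scores[j])
print(best_var, round(scores[best_var], 2))  # the step variable should win by a wide margin
```

The winning variable is the tree's first split; Partition then repeats the search inside each branch, which is how the key drivers accumulate at the top of the tree.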