I have a database of levels for 67 genes in about 100 specimens. These genes were chosen for probable activity, and many of them interact with one another. I am trying to determine the group of genes that best correlates with each individual gene. Using various platforms (predictor screening utility, boosted tree/forest in partition, response screening in modeling, and screening in modeling), I get a similar hierarchy of genes, which makes biological sense. However, using response screening in fit model, I get a very different hierarchy, which makes much less biological sense. Any ideas why the response screening in fit model is so different?
Have you tried Neural Networks? I would definitely do that as one of my first steps, along with Partial Least Squares. I would run these and several other methods, like the ones you mentioned, and determine which method gives the best Actual vs. Predicted results, meaning the lowest RMSE. Hope this helps.
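Outside JMP, the comparison suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the data are synthetic (100 specimens, 66 predictor genes, one response gene), ordinary least squares and ridge regression stand in for the methods named in the thread, and the helper names (`rmse`, `fit_ridge`, `cross_validated_rmse`) are hypothetical. Each candidate is scored by its Actual vs. Predicted RMSE on specimens it was not fitted to.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    diff = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def fit_ridge(X, y, penalty):
    """Fit ridge regression and return a predict function.

    penalty=0.0 reduces to ordinary least squares (needs more rows than columns).
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Solve (Xc'Xc + penalty * I) beta = Xc'(y - mean(y))
    gram = Xc.T @ Xc + penalty * np.eye(X.shape[1])
    beta = np.linalg.solve(gram, Xc.T @ (y - y_mean))
    return lambda X_new: (X_new - x_mean) @ beta + y_mean

def cross_validated_rmse(fit, X, y, n_folds=5, seed=0):
    """Predict every specimen from a model fitted without it, then score."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    predictions = np.empty(len(y))
    for fold in np.array_split(order, n_folds):
        train = np.setdiff1d(order, fold)
        predict = fit(X[train], y[train])
        predictions[fold] = predict(X[fold])
    return rmse(y, predictions)

# Synthetic stand-in data (an assumption, not the poster's data):
# the response gene depends on three of the 66 predictor genes plus noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 66))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + X[:, 2] + rng.normal(size=100)

# Stand-in candidate methods; swap in whatever models are being compared.
candidates = {
    "least squares": lambda X_tr, y_tr: fit_ridge(X_tr, y_tr, 0.0),
    "ridge (penalty=10)": lambda X_tr, y_tr: fit_ridge(X_tr, y_tr, 10.0),
}

scores = {name: cross_validated_rmse(fit, X, y) for name, fit in candidates.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: cross-validated RMSE = {score:.3f}")
print("lowest RMSE:", best)
```

The same harness accepts any function that takes training data and returns a predict function, so other methods can be compared on equal footing.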
Rex, I think the answer lies in the fact that Fit Model platforms tend to report the results with the best fit (lowest RMSE), calculated after the fact from the same data used to build the model. Other methods are designed to give the best predictive model rather than the best fitted model.
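To make the fitted-versus-predictive distinction concrete, here is a minimal sketch on synthetic data. It is not a reproduction of what any JMP platform computes internally, and the 70/30 split, the simulated response, and all names are assumptions for illustration. A least-squares model with 66 predictors is fitted to 70 specimens, then its RMSE is computed both on those same specimens and on 30 held-out specimens.

```python
import numpy as np

# Synthetic stand-in data (an assumption): 100 specimens, 66 predictor genes,
# and a response gene that truly depends on only two of them.
rng = np.random.default_rng(0)
n_specimens, n_predictors = 100, 66
X = rng.normal(size=(n_specimens, n_predictors))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n_specimens)

# Fit on the first 70 specimens, hold out the last 30.
train = np.arange(70)
test = np.arange(70, n_specimens)

def add_intercept(M):
    """Prepend a column of ones so the model includes an intercept."""
    return np.column_stack([np.ones(len(M)), M])

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Ordinary least squares using all 66 predictors at once.
coef, *_ = np.linalg.lstsq(add_intercept(X[train]), y[train], rcond=None)

# RMSE on the data the model was fitted to ("after the fact").
in_sample_rmse = rmse(y[train], add_intercept(X[train]) @ coef)
# RMSE on specimens the model never saw.
held_out_rmse = rmse(y[test], add_intercept(X[test]) @ coef)

print(f"in-sample RMSE: {in_sample_rmse:.3f}")
print(f"held-out RMSE:  {held_out_rmse:.3f}")
```

With nearly as many coefficients as training specimens, the in-sample RMSE comes out far smaller than the held-out RMSE, so a ranking driven purely by fit to the same data can look quite different from one driven by predictive performance.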