Yes, 100% of the features are used to calculate p0. The system actually generates thousands of features, but we needed to be able to validate the algorithm for computing p0 offline, so we traced through the code to find every feature used in the p0 calculation and saved them. Mind you, the "calculation" of p0 as a function of the features is far too complex and implicit to ever write down (hundreds of binary classifiers in parallel and serial configurations; I didn't build the system). But we do know that we have captured all the necessary and sufficient features to run an offline simulation and validate the computation of p0.

My goal is not to modify anything about the model for calculating p0. It is to see whether, by chance, those very same features might contain some "magic" information that allows us to predict some other disease. My suspicion is that this is unlikely, especially since the features were engineered specifically to predict D0 in an optimal and minimal manner. There are probably information-theoretic ways to test this assumption. Again, I need to come up with a fair and principled way to show either that the concept just will not work or that it will. It has been easy to generate plenty of red flags (covariate drift, for example).

My attempt to use a modified "Bayesian" approach is my way of starting from the best statistic the features were designed to produce, and then seeing if there is some independent, residual information that might help predict D1. I'm a skeptic, but I'm among a group where others (management) have a dogmatic view that machine learning can magically solve any problem imaginable. They don't appreciate that most of these models (LASSO, ridge, PLS) are, at the end of the day, just simple main-effects (generalized) linear models.

Once again, I appreciate your engagement. It has been very helpful.
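One concrete way to operationalize the "residual information" question is a nested-model comparison: predict D1 from p0 alone, then from p0 plus the raw features, and see whether the features add any out-of-sample signal. This is a minimal sketch with entirely synthetic stand-in data; the names `X`, `p0`, and `d1` are placeholders for your saved feature matrix, the validated p0 scores, and the D1 labels, and the choice of logistic regression and AUC is just one reasonable instantiation, not your pipeline.

```python
# Hypothetical sketch: does X carry residual information about D1 beyond p0?
# All data below is synthetic; substitute your real features, p0, and D1 labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p = 2000, 50
X = rng.normal(size=(n, p))                    # stand-in for the saved features
p0 = 1 / (1 + np.exp(-X[:, :5].sum(axis=1)))   # stand-in for the D0 score
d1 = rng.binomial(1, 0.2, size=n)              # stand-in D1 labels (pure noise here)

# Baseline: D1 predicted from p0 alone.
auc_base = cross_val_score(
    LogisticRegression(max_iter=1000),
    p0.reshape(-1, 1), d1, cv=5, scoring="roc_auc").mean()

# Augmented: p0 plus the raw features (ridge-penalized via C).
auc_full = cross_val_score(
    LogisticRegression(max_iter=1000, C=0.1),
    np.column_stack([p0, X]), d1, cv=5, scoring="roc_auc").mean()

print(f"AUC(p0 only)       = {auc_base:.3f}")
print(f"AUC(p0 + features) = {auc_full:.3f}")
```

If the augmented model does not reliably beat the p0-only model out of sample (ideally judged with a permutation test on the AUC gap rather than a single split), that is fairly direct evidence the features hold no usable residual information for D1, which is exactly the kind of principled negative result you could show management.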