Yes! I believe, as usual, that you are right regarding the weird nature of trying to predict a categorical variable, in which you effectively need two equations, thereby making things trickier. What I hope to do is eventually have a large enough sample size that I have a true gradient, whether 1-10, 1-5, or 1-100; the idea is the same either way: 1 = terrible, the highest number = great. As things stand, I either have two categories (bleaching-resistant vs. bleaching-sensitive) or sometimes three (the former two + "intermediate"). I might play around with recoding them as 1, 50, and 100 for susceptible, intermediate, and resistant, respectively, and see what kinds of results I get.

I think the more samples I collect and analyze, the more sense it makes to have my Y be a continuous variable. With the 20 samples I'm playing with now, I essentially don't have enough data to go beyond 2-3 bins. It will be interesting to see whether my models look very different when I recode my categorical classifications into a continuous variable; it would surely make all the model-building more seamless. And in reality, nature will NOT be as simple as "healthy" vs. "sick"; I DO expect to pick up a gradient that spans a health spectrum like you'd see in humans (for instance, what you see in the JMP diabetes dataset).

The funny thing about this exercise, though, regarding the probability score, is that I never really looked at these much; I went directly to JMP's "guess." But now looking at the actual probabilities is super interesting, because that is ultimately what I want to report: coral A has a 95% chance of bleaching. So through this model averaging exercise, I accidentally picked up some cool nuances that I totally missed before! And like I said before, even if I am "stuck" with categorical responses for the time being, the model averaging feature of Formula Depot works fine for this.
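To make the recoding idea concrete, here is a minimal sketch outside of JMP. The category names, the 1/50/100 scores, and the reporting function are all illustrative assumptions, not anything from an actual analysis:

```python
# Hypothetical recoding of a 3-level bleaching phenotype onto a 1-100
# continuous health scale, as described above. The labels and the
# particular scores (1, 50, 100) are assumptions for illustration.
RECODE = {"susceptible": 1, "intermediate": 50, "resistant": 100}

def recode_phenotypes(labels):
    """Map categorical bleaching labels to continuous health scores."""
    return [RECODE[label] for label in labels]

def report_bleaching_risk(coral_id, p_bleach, threshold=0.5):
    """Report the predicted probability itself, rather than collapsing
    it to the model's hard 'guess' (the most-likely class)."""
    guess = "bleaching-sensitive" if p_bleach >= threshold else "bleaching-resistant"
    return f"{coral_id}: {p_bleach:.0%} chance of bleaching (hard call: {guess})"

print(recode_phenotypes(["susceptible", "intermediate", "resistant"]))
print(report_bleaching_risk("coral A", 0.95))
```

The second function mirrors the point about probability scores: the continuous probability carries more information than the binned "guess," and is what you would actually want to put in a report.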
You'll be happy to know that, thanks to your GUI, there are over 20,000 neural network models (21,450, but who's counting?!) in my recently submitted paper (which describes essentially phase 1 = the lab experiment). Obviously, only a handful passed all quality control, with even fewer discussed at length in the text, but it was extremely useful to see which NN parameters are most important for me to tune. Also, I want to build the simplest, (potentially) most parsimonious model, so if one NN has two hidden layers, each with three activation nodes, whereas a second model has a single layer with NTanH(3), all else being equal, I'd choose the latter, though I guess this isn't really critical since even the simplest one is pretty complex.

One thing I sought to learn this year at Discovery, which I did thanks to Chris Gotwalt, is whether the variable importance metrics under the NN Profiler can be used to tell you which analytes were most influential in the model. The old rule was that NNs are a black box, and it's way too complicated to mine which predictor variables were most important, but I'm glad to see this is not entirely true. Are they hard to interpret? Yes. Can you say one protein's behavior scales in a particular, easily defined way with the model's predictive power? Maybe not. But if a molecular biologist asked me, "Which coral proteins were most important in your best NN model?" I could at least list off a few from the variable importance feature of the NN Profiler. This is actually why I chose to discuss the gen-reg model in my talk; I wasn't sure if I could "trust" the Profiler for a NN (!), but now I see that I can.

But anyway, this is why, as a modeling novice, I benefit so much from these sorts of symposia. Thanks again for your well thought-out response and all your help in general.
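For readers outside of JMP: the Profiler's variable-importance idea has a rough, model-agnostic analogue in permutation importance (shuffle one predictor and measure how much the model's predictions move). This is a generic sketch with a toy stand-in model, not the Profiler's actual algorithm and not a real coral NN:

```python
import random

# Toy stand-in for a trained model: the prediction depends strongly on
# x1, weakly on x2, and not at all on x3. Purely illustrative.
def toy_model(row):
    x1, x2, x3 = row
    return 2.0 * x1 + 0.2 * x2

def permutation_importance(model, X, n_repeats=20, seed=0):
    """Mean absolute change in predictions when one column is shuffled.

    A larger value means the model leans more heavily on that predictor.
    """
    rng = random.Random(seed)
    baseline = [model(row) for row in X]
    importances = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + (v,) + row[j + 1:] for row, v in zip(X, col)]
            total += sum(abs(model(r) - b)
                         for r, b in zip(X_perm, baseline)) / len(X)
        importances.append(total / n_repeats)
    return importances

rng = random.Random(1)
X = [tuple(rng.random() for _ in range(3)) for _ in range(50)]
imp = permutation_importance(toy_model, X)
# Expect: x1 most important, x2 weakly important, x3 irrelevant.
```

Like the Profiler's metrics, this ranks predictors without claiming to describe *how* each one scales with the response, which matches the "hard to interpret, but rankable" point above.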
Anderson B. Mayfield