KarlA026
Level II

Best approach and alternative to simple logistic regression to model the probability of belonging to a group

Hello 

 

I need a little help picking the right approach to solve my problem.

 

I have a continuous variable X that can take any value in the interval [-45, 0].

I have a dataset of X values for 12 distinct groups; see their histograms below.

 

I'm trying to build two models:

1) A model that, for a given new value of X, gives me the probability of belonging to each group and identifies the most likely group.

2) A model that, for a new value of X that is supposed to belong to a given group, gives me the probability that this entry does not actually belong to that group.

 

I initially thought logistic regression was the best approach, at least for 1), but I'm not sure about 2), in particular for groups with a very narrow distribution, such as K.

 

Thanks in advance!

 

KarlA026_1-1718347298320.png

 

14 REPLIES

Re: Best approach and alternative to simple logistic regression to model the probability of belonging to a group

Depending on the size of the data set, you might look at decision tree methods, specifically Bootstrap Forest (called random forest in the literature).

KarlA026
Level II


Thanks @MikeD_Anderson. Will have a look!

 

I have about 800 entries, but some groups have fewer than 20 observations.

 

However, I only have access to JMP (not Pro), so I'm still interested in alternatives that give a continuous prediction formula. Would an ANN work?

 

Thanks

MRB3855
Super User


KarlA026
Level II


Thanks @MRB3855. I get better results with an ANN. Still, I can't help thinking I'm using tools that are too sophisticated when I have only one (continuous) predictor.

 

For approach 2), what if I fit a distribution (Normal or other) to a group and use the fitted model's formula as a prediction formula?

MRB3855
Super User


Hi @KarlA026, what do you mean by the ANN being “better” than LR?

And, wrt (2), you are flipping what is random from (1). In (1), group is random. In (2), the way you phrased it, X is random. That’s why I suggested you may want to consider DA.

 

I’m not sure what you mean by “fit a distribution model to a group (Normal or other) and use the model formula as a prediction formula?”. Can you clarify?

KarlA026
Level II


Thanks @MRB3855 and @MikeD_Anderson 

For ANN, I mean that when using X to predict the group, I get fewer misclassified results than with LDA. This is because LDA gives a model similar to the logit one and does not allow for a non-linear model (see the differences between the profilers in the attached images, using Group K as an example).

 


As you can see, the profiler for the ANN model is close to the distribution of Group K. My suggestion of fitting a distribution may not be relevant. It’s just that when I fitted a normal distribution to group K, for instance, I was able to save the model formula as a new column and use it as an indicator column to evaluate the probability that any new entry belongs to group K.
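The idea of fitting a distribution to each group and reusing the fitted formula can be sketched outside JMP as follows. This is only an illustration in Python, not a JMP feature: the sample values are made up, and combining the fitted densities with priors equal to each group's share of the data is an assumption (it amounts to a one-dimensional Gaussian generative classifier, close in spirit to quadratic discriminant analysis). Note that a fitted density evaluated at x is a likelihood p(x|G), not yet a probability of group membership; the normalization step is what turns it into one.

```python
from statistics import NormalDist

# Hypothetical X samples for three groups (made up for illustration only).
samples = {
    "D": [-24.0, -27.5, -22.0, -26.0, -25.5, -23.0],
    "K": [-10.2, -10.0, -9.8, -10.1, -9.9],
    "M": [-35.0, -28.0, -31.0, -25.0, -33.0, -29.0, -30.0],
}

# Fit one normal distribution per group (sample mean and sample std. dev.).
fits = {g: NormalDist.from_samples(xs) for g, xs in samples.items()}

# Assumed prior P(G): each group's share of the observations.
n_total = sum(len(xs) for xs in samples.values())
priors = {g: len(xs) / n_total for g, xs in samples.items()}


def posterior(x):
    """Return P(G | X = x) for every group via Bayes' rule."""
    # Prior times fitted density p(x | G) for each group.
    weighted = {g: priors[g] * fits[g].pdf(x) for g in fits}
    total = sum(weighted.values())
    # Normalize so the values sum to 1 across groups.
    return {g: w / total for g, w in weighted.items()}


post = posterior(-10.0)
best = max(post, key=post.get)  # most likely group for this x
print(post, best)
```

With a narrow group such as K, the fitted density is very peaked, so the posterior for K drops quickly as x moves away from the group mean. With fewer than 20 observations in some groups, the fitted mean and standard deviation will be noisy, which is worth keeping in mind.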

 

I just want to estimate, for a given fixed X value, the probability that the point belongs to a given group.

Hope I’m clear now!

Thanks!

 

KarlA026_0-1718612306141.png

KarlA026_1-1718612467686.png

 

 

KarlA026
Level II


@MRB3855 , @MikeD_Anderson,

 

Any thoughts? Thanks in advance!

MRB3855
Super User


Hi @KarlA026 : Unfortunately, I'm not sure I have a great solution. And, in case one of my comments above flew under the radar, I'll expand on it below. 

Above, a few posts up, I said:

"And, wrt (2), you are flipping what is random from (1). In (1), group is random. In (2), the way you phrased it, X is random.".

 

(1) In Logistic Regression or ANN you are estimating Prob(of being in a group given some value of X), which, for brevity, can be written P(G|X) and read "Probability of Group membership given X". So here, group is the random component. X is fixed (i.e., chosen). So, for a given X, you then estimate the probability of being in each group. 
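To make P(G|X) concrete, here is a minimal Python sketch of what a multinomial (nominal) logistic model computes for a single predictor. The group labels are borrowed from the thread, but the intercepts and slopes are hypothetical placeholders, not values fitted to the data shown above.

```python
import math

# Hypothetical (intercept, slope) per group; group "D" is the reference
# level, so its coefficients are fixed at zero.
coefs = {"D": (0.0, 0.0), "K": (6.0, 0.4), "M": (-9.0, -0.35)}


def group_probabilities(x):
    """Return P(G | X = x) for each group using the softmax of linear scores."""
    scores = {g: b0 + b1 * x for g, (b0, b1) in coefs.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {g: math.exp(s - top) for g, s in scores.items()}
    total = sum(exps.values())
    return {g: e / total for g, e in exps.items()}


probs = group_probabilities(-30.0)
most_likely = max(probs, key=probs.get)
print(probs, most_likely)
```

X is treated as fixed here and the output is a probability for every group, summing to 1. Because each score is linear in x, the model cannot produce a probability that rises and then falls for a group sitting in the middle of the X range unless extra terms (for example x squared) are added.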

 

(2) What you "just want to estimate" is very different. The way you've stated the problem, you are asking for Prob(X belongs to a group, given a group) = Prob(X|G), read "probability that X belongs to a given group". Here, X is random and group is fixed: you are choosing the group and asking what the probability is that X is in that group.

 

P(G|X) is very different than P(X|G), just like Prob(Person A has the disease, given Person A tested positive for the disease) is very different from Prob(Person A tested positive for the disease, given Person A has the disease).
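The disease-test comparison can be worked through numerically with Bayes' rule. The prevalence, sensitivity, and false-positive rate below are assumed round numbers chosen only for illustration.

```python
# Assumed illustrative numbers (not from any real test).
prevalence = 0.01           # P(disease)
sensitivity = 0.95          # P(positive | disease)
false_positive_rate = 0.05  # P(positive | no disease)

# Total probability of a positive test.
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)

# Bayes' rule: P(disease | positive).
p_disease_given_positive = sensitivity * prevalence / p_positive

print(f"P(positive | disease) = {sensitivity:.2f}")
print(f"P(disease | positive) = {p_disease_given_positive:.2f}")
```

Under these assumptions P(positive | disease) is 0.95 while P(disease | positive) is only about 0.16, because the disease is rare. The same reversal applies to P(X|G) and P(G|X): going from one to the other requires the prior probability of each group.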

 

All said: Looking at your plot at the top, if you choose an x value, say -30, and draw a line straight up, you can see that the percentage of area under the curve to the left of -30 is about 50% for group M, about 20% for group D, etc. Is that the sort of thing you were thinking of when you said “fit a distribution model to a group (Normal or other) and use the model formula as a prediction formula?”
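The "area to the left of x" reading is the cumulative distribution function of each group's fitted distribution. A minimal Python sketch, assuming normal fits with hypothetical means and standard deviations picked so that the areas roughly match the 50% and 20% read off the plot (they are not estimated from the actual data):

```python
from statistics import NormalDist

# Hypothetical fitted normals per group (parameters are assumptions).
groups = {
    "M": NormalDist(mu=-30.0, sigma=5.0),
    "D": NormalDist(mu=-25.0, sigma=6.0),
}

x = -30.0

# Area under each group's density to the left of x, i.e. P(X <= x | G).
area_left = {g: dist.cdf(x) for g, dist in groups.items()}

for g, area in area_left.items():
    print(f"Group {g}: P(X <= {x} | G) = {area:.2f}")
```

These are statements about X within a chosen group, so they do not sum to 1 across groups and are not probabilities of group membership.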

   

KarlA026
Level II


@MRB3855 thanks for the clarification.
I guess I had that in mind, but the way I formulated it was not very rigorous.

Anyway, knowing that, what would you advise to evaluate/model P(G|X) or P(X|G)?

Indeed, what you describe in the last paragraph seems close to what I had in mind.