KarlA026
Level II

Best approach and alternative to simple logistic regression to model the probability of belonging to a group

Hello 

 

I need a little help picking the right approach to solve my problem.

 

I have a continuous variable X that can take any value in [-45, 0].

I have a dataset of X values for 12 distinct groups; their histograms are shown below.

 

I'm trying to build two models:

1) A model that, for a given new value of X, gives me the probability of belonging to each group and the most likely group.

2) A model that, for a new X value that is supposed to belong to a given group, gives me the probability that this entry does not actually come from that group.

 

I initially thought logistic regression was the best approach, at least for 1), but for 2) I'm not sure, in particular for groups with a very narrow distribution, such as K.
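A minimal sketch of model 1, assuming multinomial (softmax) logistic regression with X as the only predictor, fitted by plain batch gradient descent in pure Python so it runs without extra libraries. The function names, learning rate, epoch count and three-group data are hypothetical, not the data in the figure. With only linear terms, the log-odds between any two groups is linear in X; for a narrow group such as K, adding an X² term (the form implied by normally distributed groups with unequal spreads) may fit better.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fit_multinomial_logistic(xs, labels, groups, lr=0.1, epochs=2000):
    """Fit logits a_k + b_k * z (z = standardized X) for each group k by
    batch gradient descent on the mean cross-entropy loss."""
    n = len(xs)
    center = sum(xs) / n
    scale = math.sqrt(sum((x - center) ** 2 for x in xs) / n)
    zs = [(x - center) / scale for x in xs]
    index = {g: i for i, g in enumerate(groups)}
    k = len(groups)
    a = [0.0] * k  # intercepts
    b = [0.0] * k  # slopes on standardized X
    for _ in range(epochs):
        grad_a = [0.0] * k
        grad_b = [0.0] * k
        for z, label in zip(zs, labels):
            p = softmax([a[j] + b[j] * z for j in range(k)])
            for j in range(k):
                # Gradient of -log p[label] w.r.t. logit j is p[j] - 1{j == label}.
                err = p[j] - (1.0 if j == index[label] else 0.0)
                grad_a[j] += err / n
                grad_b[j] += err * z / n
        for j in range(k):
            a[j] -= lr * grad_a[j]
            b[j] -= lr * grad_b[j]

    def predict_proba(x):
        """Return a dict group -> P(group | X = x) under the fitted model."""
        z = (x - center) / scale
        p = softmax([a[j] + b[j] * z for j in range(k)])
        return dict(zip(groups, p))

    return predict_proba


# Hypothetical reference data: three suppliers, not the data in the figure.
xs = [-40.0, -38.0, -36.0, -22.0, -20.0, -18.0, -6.0, -4.0, -2.0]
labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
predict_proba = fit_multinomial_logistic(xs, labels, ["A", "B", "C"])

probs = predict_proba(-37.0)
print(probs, "-> most likely:", max(probs, key=probs.get))
```

For a new X, `predict_proba` returns one probability per group, and the group with the highest probability is the most likely one, which covers both parts of model 1.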

 

Thanks in advance!

 

[Figure: histograms of X for each of the 12 groups]

 

MRB3855
Super User

Hi @KarlA026 : I'm not sure here. Why do you need to "estimate, for a given fixed X value, the probability of that point to belong to a given Group"? That is, how do you observe X? Is the precise value of X unpredictable (perhaps the output of some process with an uncontrolled source of variability), or can you select X as you please? And how is this model, whatever it turns out to be, to be used in practice? Can you provide some details?

KarlA026
Level II

Thanks @MRB3855!

I should have started with that. X is an intrinsic property of the raw material, obtained through a specific analysis. The groups are the different suppliers of this raw material.

The two scenarios correspond to the following cases:

1) We have a material whose supplier is unknown, and we need to determine the most probable supplier (group) based on X.


2) A supplier provides a sample of the raw material, and we want to evaluate the probability that this material really comes from that supplier or, conversely, the probability that it comes from somewhere else. Both evaluations are based on the sample's value of X and the reference database shown in the figure.
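One hedged way to make case 2 concrete without a full classifier is a typicality check of the new X against the stated supplier's own reference values. This is a swapped-in technique (an empirical two-sided tail probability), not something proposed in the thread, and the supplier values below are hypothetical. A small value only says that X is unusual for that supplier; turning it into a probability of the material coming from elsewhere needs Bayes' rule and a prior on substitution. With n reference values the smallest possible result is 1/(n+1), so a narrow group like K needs enough samples for the check to be informative.

```python
from statistics import median


def empirical_tail_probability(x_new, reference):
    """Two-sided empirical tail probability of x_new among one supplier's
    reference X values, using distance from the reference median.

    Small values mean x_new is unusual for this supplier. This is NOT the
    probability that the sample comes from a different supplier.
    """
    center = median(reference)
    d_new = abs(x_new - center)
    # Count reference values at least as far from the median as x_new;
    # the +1 in numerator and denominator avoids reporting exactly 0.
    n_extreme = sum(1 for x in reference if abs(x - center) >= d_new)
    return (n_extreme + 1) / (len(reference) + 1)


# Hypothetical reference values for a narrow supplier such as K.
ref_k = [-21.0, -20.8, -20.5, -20.4, -20.2, -20.1, -19.9, -19.7, -19.5, -19.2]
print(empirical_tail_probability(-20.0, ref_k))  # a value inside K's range
print(empirical_tail_probability(-12.0, ref_k))  # far outside K's range
```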

Hope this helps!
MRB3855
Super User

Hi @KarlA026 : Yes, that is helpful. Thanks!

 

The idea, as I understand it, is to build a probabilistic model based on the data in the figure. That model can then be used to estimate the probability that the sample comes from each of the suppliers, given the observed value of material property X.

 

Case 1 (material from unknown supplier): You measure X. Once you have the observed value of X, just apply the model you've already built (logistic, ANN, or whatever) to predict the most likely supplier (the supplier with the highest probability).
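One alternative to logistic regression for this step (swapped in here, not proposed in the thread) is to model each supplier's distribution of X directly and apply Bayes' rule, which with a single predictor amounts to quadratic discriminant analysis. The sketch assumes X is roughly normal within each supplier and uses equal prior probabilities unless others are supplied; the supplier names and values are hypothetical. Because each supplier gets its own standard deviation, a narrow group like K automatically gets a sharp, narrow density.

```python
import math
from statistics import mean, stdev


def fit_group_normals(data):
    """data: dict supplier -> list of reference X values (at least 2 each).
    Returns dict supplier -> (mean, sample standard deviation)."""
    return {g: (mean(v), stdev(v)) for g, v in data.items()}


def log_normal_pdf(x, mu, sigma):
    """Log density of a normal distribution at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def posterior(x, params, priors=None):
    """P(supplier | X = x) by Bayes' rule with per-supplier normal densities.
    Equal priors are used unless `priors` (supplier -> probability) is given."""
    if priors is None:
        priors = {g: 1 / len(params) for g in params}
    log_w = {g: math.log(priors[g]) + log_normal_pdf(x, mu, sd)
             for g, (mu, sd) in params.items()}
    m = max(log_w.values())  # work in log space to avoid underflow
    w = {g: math.exp(v - m) for g, v in log_w.items()}
    total = sum(w.values())
    return {g: v / total for g, v in w.items()}


# Hypothetical reference data (not the values in the figure).
data = {
    "D": [-42.0, -40.0, -39.0, -41.0, -38.0],
    "F": [-26.0, -22.0, -24.0, -20.0, -23.0],
    "K": [-12.4, -12.1, -12.0, -11.8, -12.2],
}
params = fit_group_normals(data)
post = posterior(-12.0, params)
print(post, "-> most likely:", max(post, key=post.get))
```

If a histogram is clearly skewed or multimodal, the normal fit could be replaced by a per-supplier kernel density estimate; only the density function would change.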

 

Case 2 (material from known supplier): If you know the supplier and you know X, the probability of this material being from that supplier (or not) is irrelevant, though it can be calculated the same way as in Case 1 above.

Here is an example. Suppose you look outside and it is raining, but an hour ago you checked the weather forecast and it said the probability of rain today was 10%. From a binary-prediction point of view, all you can really say is that the model got it wrong; from a probabilistic point of view, however, the model wasn't wrong (it didn't say the chance of rain was 0%). Either way, it's raining, and probability doesn't enter into it. Once something has occurred or is known, there can be no probability associated with it.

In your case, once you know the supplier, there can be no probability associated with it, though you can still calculate it (just as in Case 1 above) to see how well the model predicts. So if the model predicts supplier M but you know the material actually came from supplier F, all you can say, from a prediction point of view, is that the model got it wrong.

That said, do you have some reason to distrust a supplier? For example, is there a chance you get a sample "from supplier D" that is actually from supplier B? If so, that is a different situation, with an easy solution (don't use supplier D anymore!).

KarlA026
Level II

@MRB3855 Many thanks!

To answer your question: yes, in our scenario we assume some raw materials may come from a supplier other than the one stated (or may be mixed with material from another source). The aim here is to obtain some kind of certification based on X.
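If substitution is the concern, one way to put a number on it (a hedged sketch, not a method from the thread) is Bayes' rule with an explicit prior probability that a sample labeled with one supplier actually came from another. `likelihoods` holds each supplier's density at the observed X, for example from per-supplier normal fits; `p_substitution`, and spreading it equally over the other suppliers, are assumptions you would set from experience. The function name and numbers are hypothetical. It only covers suppliers present in the reference database and does not model blended material.

```python
def prob_stated_supplier(likelihoods, stated, p_substitution):
    """P(sample really comes from `stated` | observed X), by Bayes' rule.

    likelihoods: dict supplier -> density of the observed X under that
        supplier's reference distribution (from any fitted model).
    stated: the supplier named on the sample.
    p_substitution: assumed prior probability that the material actually
        comes from one of the other suppliers, spread equally among them.
    """
    others = [s for s in likelihoods if s != stated]
    prior_other = p_substitution / len(others)
    numerator = (1 - p_substitution) * likelihoods[stated]
    denominator = numerator + sum(prior_other * likelihoods[s] for s in others)
    return numerator / denominator


# Hypothetical density values at the observed X for three suppliers.
likelihoods = {"D": 0.02, "B": 0.10, "F": 0.08}
print(prob_stated_supplier(likelihoods, "D", 0.05))
```

When X is equally likely under every supplier, the result is simply 1 - p_substitution: the measurement adds no information, which is what happens where the groups overlap.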


Also, since I have only one predictor here, is an ANN an acceptable method? I always thought ANNs were for multivariate problems only.

MRB3855
Super User

Hi @KarlA026 : Ah, I see. Good luck! 

 

Certainly, if X = -5 and the sample is "from supplier D", I'd be very suspicious based on your plot above. But what if X = -24? Then several suppliers may be plausible, with little ability to discriminate accurately between them. I think X alone is, in general, not a good discriminator. Perhaps there are other material properties you could measure?
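One way to make the "several suppliers are plausible" situation explicit is a classifier with a reject option: name a supplier only when its posterior probability is high enough, and otherwise report the set of plausible suppliers. This is a hedged sketch; the function name, `threshold` and `plausible_cutoff` are hypothetical and should reflect the cost of a wrong attribution. `post` can come from any of the models discussed (logistic, ANN, or per-supplier densities), and the numbers below are made up.

```python
def classify_with_reject(post, threshold=0.8, plausible_cutoff=0.1):
    """Return (best_supplier, plausible_list) from a dict supplier -> posterior.

    best_supplier is None when no supplier reaches `threshold`, i.e. X alone
    cannot discriminate; plausible_list then holds every supplier whose
    posterior is at least `plausible_cutoff`, most probable first.
    """
    best = max(post, key=post.get)
    if post[best] >= threshold:
        return best, [best]
    plausible = sorted(
        (g for g, p in post.items() if p >= plausible_cutoff),
        key=post.get,
        reverse=True,
    )
    return None, plausible


# Hypothetical posteriors: one clear-cut X value and one in an overlap region.
clear = {"D": 0.95, "B": 0.03, "F": 0.02}
ambiguous = {"B": 0.40, "F": 0.30, "M": 0.25, "D": 0.05}
print(classify_with_reject(clear))
print(classify_with_reject(ambiguous))
```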

 

Sure, an ANN is OK; from a purely statistical perspective, I see no reason why you can't use a single predictor. Though, as I said above, the X you are measuring is, in general, not adequate to discriminate between groups.