Discussions

dkraeuter_sunne · Sep 21, 2023 02:58 PM

There are parts that have a normal two mixture distribution (I have included two examples of the type of distributions that I am seeing in the attachment below). I would like to label the parts based on whether they are included in the first or second part of the distribution. Is there a way to do this?

Victor_G · Sep 22, 2023 02:32 AM

Hi @dkraeuter_sunne,

May I suggest to have a look at Normal Mixtures clustering ? It may be very appropriate and relevant for what you intend to do, as you would like to group points based on distribution similarity.

I tried to reproduce the type of distribution you have :

When you go to Analyze, Clustering, Normal Mixtures and specify your distribution as Y and the number of clusters to 2, you can find these results :

From there, you can save the probabilities of each point belonging to each cluster (in the red triangle options, "save cluster formula" to get the labels about first or second part of the distribution and "save mixture probabilities" to have more details about the cluster decision).

More info and explanations on Normal Mixtures clustering : Clustering methods for unsupervised machine learning (jmp.com)

Attached you'll find the toy dataset I created and used, I hope it will help you.

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

View solution in original post

peng_liu · Sep 21, 2023 10:40 PM

After you fit a 2-normal mixture, you should get mu1, mu2, sigma1, sigma2, p1, p2, such that the fitted distribution is defined as: F(x) = p1 * Normal Distribution(x, mu1, sigma1) + p2 * Normal Distribution(x, mu2, sigma2).

The Density function is: f(x) = p1 * Normal Density(x, mu1, sigma1) + p2 * Normal Density(x, mu2, sigma2).

One approach to define the label is by comparing p1 * Normal Density(x, mu1, sigma1) and p2 * Normal Density(x, mu2, sigma2). If the first one is larger, assign 1, otherwise, 2.

Therefore, you can create a formula column to calculate that.

Meanwhile, that approach has been implemented in Life Distribution's "Fit Mixture". After fitting a mixture, use "Save Predictions" to get the p1 * Normal Density(x, mu1, sigma1) and p2 * Normal Density(x, mu2, sigma2). And it is up to you how to handle it. You can use a simple comparison like what I described. Or you can use the values in more sophisticated ways.

Victor_G · Sep 22, 2023 02:32 AM

Hi @dkraeuter_sunne,

May I suggest to have a look at Normal Mixtures clustering ? It may be very appropriate and relevant for what you intend to do, as you would like to group points based on distribution similarity.

I tried to reproduce the type of distribution you have :

When you go to Analyze, Clustering, Normal Mixtures and specify your distribution as Y and the number of clusters to 2, you can find these results :

From there, you can save the probabilities of each point belonging to each cluster (in the red triangle options, "save cluster formula" to get the labels about first or second part of the distribution and "save mixture probabilities" to have more details about the cluster decision).

More info and explanations on Normal Mixtures clustering : Clustering methods for unsupervised machine learning (jmp.com)

Attached you'll find the toy dataset I created and used, I hope it will help you.

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

peng_liu · Sep 22, 2023 08:56 AM

Thanks, @Victor_G Good to know that function. And the clustering bakes the decision rule directly in the result!

Discussions

Normal Two Mixture Distribution Labeling

Re: Normal Two Mixture Distribution Labeling

Re: Normal Two Mixture Distribution Labeling

Re: Normal Two Mixture Distribution Labeling

Re: Normal Two Mixture Distribution Labeling

Recommended Articles