JMP 11 Pro: Distribution Platform - Normal Mixtures. How does JMP decide how to label the distributions?

camth

Community Trekker

Joined:

Jul 28, 2015

I fit a mixture of 3 normal distributions to a single response. JMP labels the 3 fitted distributions 1, 2 and 3 in its output, i.e. (π1, μ1, σ1), (π2, μ2, σ2) and (π3, μ3, σ3). How does JMP decide which distribution gets which label? Is it based on the relative values of the μ's? Or is it such that I can be most 'confident' in the distribution labelled 3 and least 'confident' in the distribution labelled 1? That is, is the fit iterative: first the distribution labelled 3 is identified as the most 'obvious' distribution in the data and 'filtered out'; then JMP identifies the distribution labelled 2 in what remains; finally, the distribution labelled 1 is identified after distributions 3 and 2 are 'filtered out'?

4 REPLIES
volker_kraft

Staff

Joined:

May 29, 2014

"The answer is that there is no ordering given to the distributions. Fitting the normal mixtures model is solved as an optimization problem in the mixture proportions, means, and variance matrices. The user sees the parameters that are wherever that optimization algorithm lands at its last iteration, i.e. no further processing happens. As a user I wouldn't read much into the ordering that is given, and I would focus attention on the clusters with larger mixture probabilities."

(answer provided by Chris Gotwalt - thanks, Chris!)
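
To illustrate why the labels of an optimization-based fit carry no meaning, here is a minimal Python sketch. It uses a plain EM algorithm, which is an assumption made for illustration only: the thread does not say which algorithm JMP uses, and this is not JMP code. The function name `fit_mixture_em`, the simulated data, and the start values are all hypothetical. The same data is fitted twice with the start means in opposite order, and the fitted components come back in opposite order.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_mixture_em(data, start_means, n_iter=100):
    """Fit a univariate normal mixture by EM (illustrative, not JMP's algorithm).

    Returns a list of (proportion, mean, sd) tuples, one per component,
    in whatever order the start values put them.
    """
    k, n = len(start_means), len(data)
    props = [1.0 / k] * k
    means = list(start_means)
    grand_mean = sum(data) / n
    grand_sd = math.sqrt(sum((x - grand_mean) ** 2 for x in data) / n)
    sds = [grand_sd] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            w = [props[j] * normal_pdf(x, means[j], sds[j]) for j in range(k)]
            total = sum(w)
            resp.append([wj / total for wj in w])
        # M-step: re-estimate proportions, means and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            props[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            sds[j] = max(math.sqrt(var), 1e-6)  # guard against a collapsed component
    return list(zip(props, means, sds))


# Simulated data: three well-separated clusters (hypothetical example).
rng = random.Random(0)
data = ([rng.gauss(0.0, 0.5) for _ in range(100)]
        + [rng.gauss(5.0, 0.5) for _ in range(100)]
        + [rng.gauss(10.0, 0.5) for _ in range(100)])

fit_a = fit_mixture_em(data, start_means=[0.0, 5.0, 10.0])
fit_b = fit_mixture_em(data, start_means=[10.0, 5.0, 0.0])

for label, (p, mu, sd) in enumerate(fit_a, start=1):
    print("run A, component", label, round(p, 3), round(mu, 3), round(sd, 3))
for label, (p, mu, sd) in enumerate(fit_b, start=1):
    print("run B, component", label, round(p, 3), round(mu, 3), round(sd, 3))
```

Both runs describe the same mixture; only the component numbers differ, so the number by itself says nothing about how 'confident' one can be in a component.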


camth

Community Trekker

Joined:

Jul 28, 2015

Thank you very much!

Is it possible to get a reference to the algorithm used?

MathStatChem

Community Trekker

Joined:

Sep 11, 2013

If I remember correctly, for the Distribution Platform, the cluster labels are based on the estimated cluster mean.  The cluster with the lowest cluster mean gets the label "1", next highest gets label "2", and so on. 
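
If you want to enforce that convention yourself rather than rely on it, a small Python sketch like the one below relabels fitted components by ascending mean. This is only an illustration, not JMP code; the function name `relabel_by_mean` and the parameter estimates are made up for the example.

```python
def relabel_by_mean(components):
    """Return a dict mapping labels 1..k to (proportion, mean, sd) tuples,
    ordered so that label 1 has the lowest mean."""
    ordered = sorted(components, key=lambda comp: comp[1])
    return {label: comp for label, comp in enumerate(ordered, start=1)}


# Hypothetical estimates in arbitrary order: (proportion, mean, sd).
estimates = [(0.5, 4.8, 0.6), (0.2, 10.1, 0.4), (0.3, -1.2, 0.5)]

relabeled = relabel_by_mean(estimates)
for label, (p, mu, sd) in relabeled.items():
    print(label, p, mu, sd)
```

After relabeling, component 1 is the one with mean -1.2, component 2 the one with mean 4.8, and component 3 the one with mean 10.1, regardless of the order in which the estimates were reported.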

You can see an example of where I took advantage of that in this blog post http://blogs.sas.com/content/jmp/2013/05/01/is-your-data-too-precise/

Unfortunately, the link to the JMP add-in I created that "bins"  rows according to their most likely cluster doesn't work any more.  I will see if I can upload the add-in to the JMP User Community.
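
For anyone who wants the general idea in the meantime, here is a rough Python sketch of binning values by their most likely cluster. It is not the add-in's code; the function name `most_likely_cluster` and the parameter values are hypothetical. Each value goes to the component with the largest value of proportion times normal density, which is equivalent to the largest posterior probability.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def most_likely_cluster(x, components):
    """Return the 1-based label of the component most likely to have produced x.

    components is a list of (proportion, mean, sd) tuples.
    """
    weights = [p * normal_pdf(x, mu, sd) for p, mu, sd in components]
    return weights.index(max(weights)) + 1


# Hypothetical fitted parameters, ordered by mean: (proportion, mean, sd).
components = [(0.3, 0.0, 1.0), (0.5, 5.0, 1.0), (0.2, 10.0, 1.0)]
values = [-0.4, 4.2, 9.7, 2.0]

labels = [most_likely_cluster(v, components) for v in values]
print(labels)
```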

MathStatChem

Community Trekker

Joined:

Sep 11, 2013

Just uploaded the add-in.  You can find it here:  Univariate Binning using the Normal Mixtures Distribution