
Has anybody used JMP's hierarchical clustering on all-ordinal data and tested how well the non-Ward options work?

sibylle_herzer

Community Trekker

Joined:

Oct 1, 2014

So I am trying to build clusters in JMP with the cluster analysis, but since my data is all categorical, I was wondering if some of the other options would work better. There is blissfully little in the JMP manual on which drop-down option to use, and the internet was not particularly helpful beyond suggesting that different algorithms should be used for all-categorical data. Any suggestions would be greatly appreciated. Thank you.

ian_jmp

Staff

Joined:

Jun 23, 2011

From the thread I couldn't tell whether your data are categorical or ordinal (or perhaps a mixture of both), nor whether you are using 'cluster' in a colloquial or a technical sense.

If the data are categorical, I would certainly take a look at Multiple Correspondence Analysis (MCA): http://www.jmp.com/support/help/Multiple_Correspondence_Analysis.shtml. If you have a mix of categorical and continuous and/or ordinal variables, some would advocate converting the latter into categorical variables (for example, by binning continuous values) so that they too can be included in the MCA. Although this loses information, it can sometimes be better than leaving those variables out altogether. And, as you may have found through your own searches, some advocate using the output from MCA as the input to clustering, and you could certainly try this too.
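MCA works on the complete disjunctive (indicator) table, in which every level of every categorical variable becomes its own 0/1 column. The minimal stdlib sketch below only builds that table; it stops short of the decomposition that an MCA tool, such as JMP's platform, performs on it. The function name `indicator_matrix` and the example data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def indicator_matrix(
    rows: Sequence[Sequence[str]],
) -> Tuple[List[Tuple[int, str]], List[List[int]]]:
    """Build the complete disjunctive (one-hot) table used as MCA input.

    Each (variable index, level) pair becomes one 0/1 column.
    """
    n_vars = len(rows[0])
    # Observed levels of each variable, sorted so the column order is stable.
    levels = [sorted({row[j] for row in rows}) for j in range(n_vars)]
    columns = [(j, lvl) for j in range(n_vars) for lvl in levels[j]]
    index: Dict[Tuple[int, str], int] = {col: k for k, col in enumerate(columns)}
    table = []
    for row in rows:
        out = [0] * len(columns)
        for j, value in enumerate(row):
            # Exactly one column per variable is set to 1 for each row.
            out[index[(j, value)]] = 1
        table.append(out)
    return columns, table


if __name__ == "__main__":
    markets = [["A", "low"], ["B", "high"], ["A", "high"]]  # hypothetical data
    cols, tab = indicator_matrix(markets)
    print(cols)  # [(0, 'A'), (0, 'B'), (1, 'high'), (1, 'low')]
    print(tab)   # [[1, 0, 0, 1], [0, 1, 1, 0], [1, 0, 1, 0]]
```

Each row sums to the number of variables, which is a quick sanity check on the coding before handing the table to an MCA routine.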

Ultimately, though, any 'best', or even workable, method depends on the specific objectives you have set (how would the results actually be used?) and on how well the data you have to hand do (or do not . . . ) support those objectives. It goes without saying that simple graphical and descriptive summaries will help point you in the right direction.

sibylle_herzer


Hi Ian,

Thank you for the quick response. I should have spelled out "ordinal" rather than just tagging it. So yes, my data is all ordinal: categorical, with no continuous and no nominal columns. I will likely have as many as a couple of hundred columns, and I am trying to use the cluster function to determine the natural groupings in the data.

This is a complex situation. I have a product that is sold globally, but the way it was filed is slightly different for each country/market, and the product has to be sold in the way it was filed. Each filing may have subtle differences in how the product can be manufactured and analyzed before being ready for sale. I am trying to find the natural groupings that markets/countries fall into in terms of supply, and then work out which strategy best aligns product status/filing across countries, so that there are no supply shortages and all products become as similar as possible as quickly as possible. Hope that makes sense?
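For a setup like this (markets as rows, filing attributes as ordered levels), one common way to respect the ordering is a scaled rank difference in the spirit of Gower's coefficient for ordinal variables. A minimal sketch, assuming each attribute is coded as an integer rank 0, 1, 2, … within its own number of levels; that coding, the function name `ordinal_distance_matrix`, and the example values are all hypothetical.

```python
from typing import List, Sequence


def ordinal_distance_matrix(
    data: Sequence[Sequence[int]], n_levels: Sequence[int]
) -> List[List[float]]:
    """Pairwise distances between rows of ordinal codes (0..n_levels[j]-1).

    Per variable, |a - b| is scaled by the largest possible difference
    (n_levels[j] - 1); the scaled differences are averaged over variables,
    so every distance lies in [0, 1].
    """
    n = len(data)
    p = len(n_levels)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1, n):
            total = 0.0
            for j in range(p):
                span = n_levels[j] - 1
                # A single-level variable carries no information and contributes 0.
                if span > 0:
                    total += abs(data[i][j] - data[k][j]) / span
            dist[i][k] = dist[k][i] = total / p
    return dist


if __name__ == "__main__":
    # Hypothetical: 3 markets, 2 attributes, each with 3 ordered levels.
    d = ordinal_distance_matrix([[0, 2], [2, 2], [0, 0]], [3, 3])
    print(d)  # [[0.0, 0.5, 0.5], [0.5, 0.0, 1.0], [0.5, 1.0, 0.0]]
```

Unlike a simple match/mismatch count, adjacent levels end up closer than distant ones, which is the information an ordinal coding is meant to carry.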

So which algorithm is best for something like that? I assume I shouldn't use the default, "Ward", correct?
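On the Ward question: Ward's method merges the pair of clusters that least increases the within-cluster sum of squares, a criterion defined in terms of squared Euclidean distances, so it fits less naturally on dissimilarities built from ordinal or categorical codes. Average linkage (UPGMA), one of the non-Ward options, only needs a dissimilarity matrix. The sketch below is a naive, hand-written average-linkage routine, not JMP's implementation; `average_linkage` and the example matrix are hypothetical. It takes a precomputed symmetric distance matrix and stops when `n_clusters` groups remain.

```python
from typing import List, Sequence


def average_linkage(dist: Sequence[Sequence[float]], n_clusters: int) -> List[List[int]]:
    """Naive agglomerative clustering with average (UPGMA) linkage.

    Repeatedly merges the two clusters with the smallest mean pairwise
    distance until n_clusters remain. Roughly O(n^3) overall, which is
    fine for a few dozen markets.
    """
    clusters: List[List[int]] = [[i] for i in range(len(dist))]

    def avg_dist(a: List[int], b: List[int]) -> float:
        # Mean of all pairwise distances between members of a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = (float("inf"), -1, -1)
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = avg_dist(clusters[x], clusters[y])
                if d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        merged = sorted(clusters[x] + clusters[y])
        # Delete the higher index first so the lower index stays valid.
        del clusters[y]
        del clusters[x]
        clusters.append(merged)
    return sorted(clusters)


if __name__ == "__main__":
    dist = [  # hypothetical distances between 4 markets
        [0.0, 0.1, 0.8, 0.9],
        [0.1, 0.0, 0.7, 0.8],
        [0.8, 0.7, 0.0, 0.2],
        [0.9, 0.8, 0.2, 0.0],
    ]
    print(average_linkage(dist, 2))  # [[0, 1], [2, 3]]
```

Replacing the mean in `avg_dist` with `max` gives complete linkage, and with `min` gives single linkage, which makes it easy to check whether the market groupings stay stable across linkage choices.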

Thank you for your help!

ian_jmp


Many thanks for the background, and apologies for the broken link to Multiple Correspondence Analysis (MCA). Given what you say, I would definitely take a look and see what emerges: http://www.jmp.com/support/help/Multiple_Correspondence_Analysis.shtml