How to classify groups with categorical data?

ondatra_berkele (Community Trekker, joined Dec 14, 2015)

Hi,

   Here is an easy one. I have a group of rodent skulls. They come from several named groups, which are also geographic regions. One group is classified as "unknown". I have metric data, which I analyzed with Discriminant Analysis, plotting the Canonical 1 and 2 scores and using the values in the Discriminant Scores table to show predictions.

   I also have categorical data: briefly, things like the number of holes in the skull for nerves and veins. They vary geographically from 2 to 4 (e.g. 1 on one side, a double hole on the other side). I am looking at the data in the Fit Y by X analysis, which is showing me Mosaic Plots and chi-square tests for each variable. I need to retrace my steps, but some command also compares means with a Student's t test, such that groups that are statistically different are labeled A, B, C, etc. Any advice there would be appreciated.

    However, I would like to find a statistically valid test for classifying the unknown group on these categorical variables, or at least create a table suggesting which known group it most resembles, treating it as a group rather than as individuals. I know there is a clustering method, but it is not clear to me how to use it for groups rather than individuals.

Thanks,

Chris

1 REPLY
stephen_pearson (Community Trekker, joined Oct 6, 2014)

Depending on the version of JMP you are using, Multiple Correspondence Analysis might be the technique you want (it is not available in every version).
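
Correspondence-style methods compare category profiles using a chi-square distance. As a rough illustration of that idea at the group level, here is a minimal Python sketch done outside JMP. It is not JMP's Multiple Correspondence Analysis and not a formal test; the group names, the counts, and the helper names `group_profiles` and `chi_square_distance` are all invented for the example. It only produces a similarity table for the unknown group against each known group on a single categorical variable.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical records: (group, number of foramina). Invented values, illustration only.
records = [
    ("North", 2), ("North", 2), ("North", 3), ("North", 2),
    ("South", 4), ("South", 4), ("South", 3), ("South", 4),
    ("Unknown", 2), ("Unknown", 3), ("Unknown", 2), ("Unknown", 3),
]

def group_profiles(records):
    """Return ({group: {category: proportion}}, {category: overall proportion})."""
    counts = defaultdict(Counter)
    totals = Counter()
    for group, category in records:
        counts[group][category] += 1
        totals[category] += 1
    n = sum(totals.values())
    column_mass = {c: k / n for c, k in totals.items()}
    profiles = {}
    for group, ctr in counts.items():
        size = sum(ctr.values())
        # Categories a group never shows get proportion 0 (Counter returns 0).
        profiles[group] = {c: ctr[c] / size for c in column_mass}
    return profiles, column_mass

def chi_square_distance(p, q, column_mass):
    """Chi-square distance between two profiles, weighting rare categories more."""
    return sqrt(sum((p[c] - q[c]) ** 2 / column_mass[c] for c in column_mass))

profiles, column_mass = group_profiles(records)

# Distance from the unknown group's profile to each known group's profile.
distances = {
    g: chi_square_distance(profiles["Unknown"], prof, column_mass)
    for g, prof in profiles.items()
    if g != "Unknown"
}
nearest = min(distances, key=distances.get)

for g, d in sorted(distances.items(), key=lambda kv: kv[1]):
    print(f"{g}: {d:.3f}")
print("Most similar known group:", nearest)
```

With several categorical variables, one option would be to compute such a table per variable and compare them side by side. With small samples the proportions are noisy, so the table is better read as a description than as evidence.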


A more manual approach might be to:

  1. Label the points based on the group column.
  2. Carry out cluster analysis.
  3. Turn on colour clusters from the hotspot (the red triangle menu).
  4. Turn on the constellation plot (optional) from the hotspot.
  5. Vary the number of clusters until your known groups are roughly all coloured the same. Hopefully your unknowns will lie within a group or on the same branch as a known group.
  6. The save options on the hotspot include constellation coordinates, distance matrix and formula for cluster.


Perhaps one of these saved scores could be a suitable basis for a test?
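
For anyone who wants to see the logic of the manual approach outside JMP, the following is a minimal Python sketch, under stated assumptions. The specimens and trait values are invented, the distance is a simple matching (mismatch) distance, and the linkage is average linkage, which is not necessarily what JMP's clustering platform uses. The helpers `mismatch`, `average_linkage` and `cluster` are hypothetical names written for this example.

```python
from itertools import combinations

# Hypothetical specimens: (group label, tuple of categorical traits). Invented values.
specimens = [
    ("North", (2, "single", "open")),
    ("North", (2, "single", "open")),
    ("North", (3, "single", "open")),
    ("South", (4, "double", "closed")),
    ("South", (4, "double", "closed")),
    ("South", (3, "double", "closed")),
    ("Unknown", (2, "single", "closed")),
    ("Unknown", (3, "single", "open")),
]

def mismatch(a, b):
    """Simple matching distance: fraction of traits on which two specimens differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def average_linkage(c1, c2):
    """Mean pairwise mismatch between the members of two clusters."""
    total = sum(mismatch(specimens[i][1], specimens[j][1]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def cluster(n_clusters):
    """Agglomerative clustering: merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(specimens))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: average_linkage(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

# Vary the number of clusters and see which known groups the unknowns sit with.
for k in (3, 2):
    print(f"{k} clusters:")
    for members in cluster(k):
        print("  ", sorted(specimens[i][0] for i in members))
```

The labels play no part in the clustering itself; they are only used afterwards to read off which known group shares a cluster with the unknown specimens, which mirrors steps 1 and 5 above.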