Typical dataset: 120-200 rows, 15 classes.
Rows are known to be ordered by class. What is unknown is where the 'fences' between classes are.
As-is -- i.e., without taking row order into account -- LCA correctly classifies about 94% of rows. My intuition is that if I could tell the algorithm that the input rows are already grouped by class, accuracy would get to 100%.
To illustrate, in case my use of 'grouped by' is unclear, a correct classification would look like this:
1a
1a
1a
1a
1a
1b
1b
1b
1b
1b
1b
1b
1b
1c
1c
1c
1c
1c
1c
1c
All 1a rows are always adjacent to each other in row order, as are all 1b rows, and so on.
Currently, I sometimes get:
1a
1a
1a
1c
1c
1b
1b
1b
1b
1b
1b
1b
1b
1c
1c
1c
1c
1c
1c
1c
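One possible way to impose the ordering after the fact, sketched under assumptions rather than offered as a tested solution: if the classifier can output a per-row probability for each class, and if the order in which the classes appear is known (so only the fences are unknown), then a small dynamic program can pick the most probable labelling in which the class index never decreases from one row to the next. The function name `monotone_labels` and the example probabilities are hypothetical.

```python
import math

def monotone_labels(probs, eps=1e-12):
    """Most probable non-decreasing class sequence (hypothetical helper).

    probs[i][c] is the assumed probability of class c for row i,
    with classes indexed in the order they are known to appear.
    """
    n, k = len(probs), len(probs[0])
    logp = [[math.log(max(p, eps)) for p in row] for row in probs]
    score = [logp[0][:]]   # score[i][c]: best total log-prob with row i in class c
    back = [[0] * k]       # back[i][c]: class of row i-1 on that best path
    for i in range(1, n):
        row_score, row_back = [], []
        best_prev, best_cls = -math.inf, 0
        for c in range(k):
            # Running max over previous classes <= c enforces the ordering.
            if score[i - 1][c] > best_prev:
                best_prev, best_cls = score[i - 1][c], c
            row_score.append(best_prev + logp[i][c])
            row_back.append(best_cls)
        score.append(row_score)
        back.append(row_back)
    # Walk back from the best final class to recover the full labelling.
    c = max(range(k), key=lambda j: score[-1][j])
    labels = [c]
    for i in range(n - 1, 0, -1):
        c = back[i][c]
        labels.append(c)
    labels.reverse()
    return labels

# Made-up probabilities: row 2 on its own would be assigned class 2.
example = [
    [0.90, 0.05, 0.05],
    [0.80, 0.15, 0.05],
    [0.10, 0.40, 0.50],
    [0.05, 0.90, 0.05],
    [0.05, 0.85, 0.10],
    [0.05, 0.10, 0.85],
    [0.05, 0.05, 0.90],
]
print(monotone_labels(example))  # [0, 0, 1, 1, 1, 2, 2]
```

In this example the stray class-2 row is pulled back into class 1, because any labelling that keeps it in class 2 would force the following rows into class 2 as well, at a much lower total probability. If the class order were not known in advance, this sketch would not apply as written.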
Ideas?
Potentially, I could add a column with Row() as its value, but my worry is that most of the clustering columns hold 'class'-type (categorical) values. With 15 classes, each column in use might have between 2 and 8 unique values. Row() would have one unique value per row, so 120 or more, which feels like throwing a wildcard into the mix.
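A possible way to soften the cardinality concern, purely as a sketch: instead of the raw row number, use a coarse position bin so the new column has about as many distinct values as the other columns. The helper name `row_position_bins` and the choice of 8 bins are assumptions for illustration, not something taken from the tool in use.

```python
def row_position_bins(n_rows, n_bins=8):
    """Map each 0-based row index to one of n_bins equal-width position bins."""
    return [i * n_bins // n_rows for i in range(n_rows)]

bins = row_position_bins(120)
print(sorted(set(bins)))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

The bin edges will not line up with the real class fences, so this only gives the algorithm a rough hint about position rather than a guarantee of contiguous classes.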