Is there a way to tell LCA that the rows are ordered by class?

mtowle419 · Aug 15, 2024 08:03 PM

Typical dataset: 120-200 rows, 15 classes.

Rows are known to be ordered by class. What is unknown is where the 'fences' between classes are.

As-is -- i.e., without taking row order into account -- LCA correctly classifies about 94% of rows. My intuition is that if I knew how to tell the algorithm that the rows are grouped by class on input, we'd be at 100%.

Visual, in case my use of 'grouped by' is unclear:

Correct would look like this:

1a
1a
1a
1a
1a
1b
1b
1b
1b
1b
1b
1b
1b
1c
1c
1c
1c
1c
1c
1c

All 1a rows are always adjacent in row order, as are all 1b rows, all 1c rows, and so on.

Currently, I sometimes get this, where a stray run of 1c lands between the 1a and 1b rows:

1a
1a
1a
1c
1c
1b
1b
1b
1b
1b
1b
1b
1b
1c
1c
1c
1c
1c
1c
1c
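
To pin down what I mean by 'grouped', here is a minimal sketch in Python (used only for illustration, not as anything the LCA platform provides). The helper name is_contiguous is made up; it just checks that each label occupies a single unbroken run of rows, using the two listings above as input.

```python
from itertools import groupby

def is_contiguous(labels):
    """Return True if every class label occupies one unbroken run of rows."""
    # Collapse consecutive duplicates: one entry per run of identical labels.
    runs = [key for key, _ in groupby(labels)]
    # If a label starts more than one run, its rows are not all adjacent.
    return len(runs) == len(set(runs))

# The "correct" listing: 5 x 1a, 8 x 1b, 7 x 1c.
correct = ["1a"] * 5 + ["1b"] * 8 + ["1c"] * 7
# The listing I sometimes get: 3 x 1a, 2 x 1c, 8 x 1b, 7 x 1c.
observed = ["1a"] * 3 + ["1c"] * 2 + ["1b"] * 8 + ["1c"] * 7

print(is_contiguous(correct))   # True
print(is_contiguous(observed))  # False
```

This only detects a violation after the fact. What I am after is a way to make the classifier respect the constraint in the first place.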

Ideas?

One option is to add a column with Row() as its value, but my worry is that most of the clustering columns hold categorical ('class'-type) values. With 15 classes, each column in use might have between 2 and 8 unique values. Row() would have one unique value per row (120 or more), which feels like throwing a wildcard into the mix.