In examining my term clusters in JMP Pro, I discovered that some of my stems made zero sense beyond pattern recognition.
For example, ration is considered a stem for the following and more:
These words all contain "ration" in their spelling, but are otherwise completely different words that should not be lumped together.
I found my way to the Manage Stem Exceptions window, but it gives no guidance on how to enter terms or stems. I looked up the documentation, found nothing specific, and am now scratching my head.
You see, I want ration to stay a stem for ration and rations, and I want to separately track the other terms it's currently considered a stem for--possibly as stems themselves for their plural forms. The only thing I want to remove is the connection between ration and those unrelated terms. Ideally, the stem exceptions would be entered in some kind of stem-to-term format, but I see no examples of how to do this, if it's even possible in JMP Pro.
So does anyone know?
The stem word, oper, is the same for operation, operator, operating, operated:
The trailing middle-dot on the terms tells you a suffix was removed; administration did not have a suffix removed because I chose stem for combining, not stem all terms, and there were no other administr... words to combine with administration.
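To make the "stem for combining" behavior described above concrete, here is a minimal Python sketch. It is not JMP's stemmer: the suffix list, `toy_stem`, and `stem_for_combining` are hypothetical names made up for illustration, and the only rule modeled is "show the stem (with a trailing middle-dot) when two or more terms share it, otherwise leave the term alone".

```python
from collections import defaultdict

# Hypothetical toy suffix list (an assumption for illustration only);
# a real stemmer, including JMP's, uses far richer rules.
SUFFIXES = ("ation", "ator", "ating", "ated")


def toy_stem(term):
    """Strip the first matching suffix, or return the term unchanged."""
    for suffix in SUFFIXES:
        if term.endswith(suffix) and len(term) > len(suffix):
            return term[: -len(suffix)]
    return term


def stem_for_combining(terms):
    """Combine terms under a stem only when at least two terms share it."""
    groups = defaultdict(list)
    for term in terms:
        groups[toy_stem(term)].append(term)

    result = {}
    for stem, members in groups.items():
        if len(members) > 1:
            # Shared stem: label it with a trailing middle-dot.
            result[stem + "·"] = members
        else:
            # Nothing to combine with: keep the original term as-is.
            result[members[0]] = members
    return result


terms = ["operation", "operator", "operating", "operated", "administration"]
combined = stem_for_combining(terms)
print(combined)
```

In this sketch the four oper... words end up under one combined entry, while administration stays unstemmed because nothing else shares its stem.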
I'm looking at unstructured text, not a list of hand-selected terms. I also did not ask for an explanation of how stems work. I'm well aware of how they work and have used them in other software packages, including the Text Miner within SAS Enterprise Miner, which does have options for "editing synonyms".
I'm asking for guidance on how to edit the connections in the automatically generated stem list within JMP Pro.
When I noticed the stem variations for ration, it was through the "Show Text" option on a term cluster.
I double-checked my Stemming list to see what came up under ration- and did not see the same words I found for ration in "Show Text".
So JMP appears to be inconsistent between stem definitions and identifying stems in the text.
I'm guessing, since the term cluster included docs with the unexpected variations of ration, that someone made a goof in programming the term search within the unstructured text: it looks like a plain word-find with no consideration for word boundaries. This would explain why my stem list looks great but there are a ton of mix-ups in the Show Text portion.
If I'm correct, please fix that. The clustering tool is not helpful if it's grouping completely unrelated terms. The ration example was just one of dozens I found.
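For anyone trying to picture the failure mode guessed at above, here is a small Python sketch of the difference between a plain substring find and a search that respects word boundaries. This is only an illustration of the hypothesis, not JMP's actual code; the function names are made up, and the sample sentence (including "operation" and "celebration") is my own example rather than text from the data.

```python
import re


def substring_hits(stem, text):
    """Naive search: any word that merely contains the stem counts as a hit."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if stem in w]


def whole_word_hits(stem, text):
    """Boundary-aware search: only the stem itself or its plain plural."""
    pattern = re.compile(rf"\b{re.escape(stem)}s?\b")
    return pattern.findall(text.lower())


# Illustrative sentence (an assumption, not taken from the original documents).
text = "The ration was small. Rations ran out before the operation and the celebration."

print(substring_hits("ration", text))
print(whole_word_hits("ration", text))
```

The naive version picks up operation and celebration along with ration and rations, which is the kind of mix-up described above; the boundary-aware version returns only ration and rations.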
I think the Show Text button in the SVD Plots is at the heart of the problem; it only applies to the left-hand documents graph, not the right-hand terms graph. It looks like the right way to show documents for the selected terms is to go back to the term list and, without changing the selection, right-click a term in the term column and pick Show Text from there. I'm talking with another developer here about how to make the UI work better.