Can I remove terms from the Document Term Matrix after running a text analysis?
Feb 22, 2018 7:31 AM | Last Modified: Aug 18, 2020 10:53 AM
Some background: Text Explorer analysis options are based on the document term matrix (DTM). A term or token is the smallest piece of text, similar to a word in a sentence. A document is the collection of words in a cell. Each row in the DTM corresponds to a document (a cell in a text column of a JMP data table). Each column in the DTM corresponds to a term from the curated term list. Analysis ignores word ordering. In its simplest form, each cell of the DTM contains the frequency (number of occurrences) of the column’s term in the row’s document.
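To make the structure concrete, here is a minimal Python sketch of a frequency-count DTM. It is not how JMP builds its matrix. Tokenization is deliberately simplified to "lowercase and split on whitespace" (an assumption; Text Explorer's tokenizer, stemming, and phrase handling are more elaborate), and `build_dtm` is a hypothetical helper name.

```python
from collections import Counter


def build_dtm(documents):
    """Return (terms, matrix): one row per document, one column per term.

    Simplified tokenization (assumption): lowercase, then split on whitespace.
    """
    token_lists = [doc.lower().split() for doc in documents]
    # The term list: every distinct token, sorted for a stable column order.
    terms = sorted({tok for tokens in token_lists for tok in tokens})
    # Each cell holds the count of the column's term in the row's document.
    # Counter discards token positions, so word order is ignored.
    matrix = []
    for tokens in token_lists:
        counts = Counter(tokens)
        matrix.append([counts[term] for term in terms])
    return terms, matrix


docs = ["the pump failed", "pump noise the the", "failed seal"]
terms, dtm = build_dtm(docs)
print(terms)
for row in dtm:
    print(row)
```

Because only counts are kept, "the pump failed" and "failed the pump" would produce identical rows.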
To remove terms after analysis: After running Text Explorer (Analyze > Text Explorer, then select the text column(s) to analyze), you can remove terms from the Term and Phrase Lists. Select the term(s) in the Term List, right-click, and choose Add Stop Word. Because the DTM columns come from the curated term list, a term added as a stop word no longer appears as a DTM column.
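Conceptually, adding a stop word drops that term's column while leaving every document row in place. The Python sketch below shows that effect on a plain frequency-count DTM; `remove_terms` is a hypothetical helper, not a JMP function. For simple counts, dropping a column gives the same result as rebuilding the matrix without that term, but JMP may also recompute other parts of the report (for example, phrases), which this sketch does not model.

```python
def remove_terms(terms, matrix, stop_words):
    """Drop the DTM columns whose term is in stop_words.

    Rows (documents) are unchanged; only columns are removed.
    """
    keep = [i for i, term in enumerate(terms) if term not in stop_words]
    new_terms = [terms[i] for i in keep]
    new_matrix = [[row[i] for i in keep] for row in matrix]
    return new_terms, new_matrix


terms = ["failed", "noise", "pump", "seal", "the"]
dtm = [[1, 0, 1, 0, 1], [0, 1, 1, 0, 2], [1, 0, 0, 1, 0]]

# Treat "the" as a stop word, as if chosen via Add Stop Word.
terms2, dtm2 = remove_terms(terms, dtm, {"the"})
print(terms2)
print(dtm2)
```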
You can then save the updated Document Term Matrix to the data table. This adds one column per remaining term to the data table, which is useful for subsequent analysis.
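As a rough illustration of what saving the DTM does to the table's shape, the Python sketch below models a data table as a list of row dictionaries and appends one column per term. The table model, the `save_dtm_to_table` helper, and the use of the bare term as the column name are all assumptions for illustration; JMP's own column naming and storage differ.

```python
def save_dtm_to_table(table, terms, matrix):
    """Append one column per term to table (a list of row dicts).

    Row i of matrix must describe the document in table[i].
    Column names are the raw terms (hypothetical naming scheme).
    """
    if len(table) != len(matrix):
        raise ValueError("DTM must have exactly one row per table row")
    for row, counts in zip(table, matrix):
        for term, count in zip(terms, counts):
            row[term] = count
    return table


table = [{"Text": "the pump failed"}, {"Text": "pump noise"}, {"Text": "failed seal"}]
# DTM after "the" was added as a stop word.
terms = ["failed", "noise", "pump", "seal"]
dtm = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1]]
save_dtm_to_table(table, terms, dtm)
for row in table:
    print(row)
```

Each original row keeps its text and gains one numeric column per term, so the counts can feed later models directly.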