4MUL8R
Community Trekker

Text Analysis -- Manually Grouping Similar Phrases

I have spent a bit of time, probably two hours, trying to use the JMP training info, but I must be missing something.

 

I have responses to unstructured survey questions and have run the basic text analysis.  Looking at the phrases, I can see about five that have the same meaning.  I would like to group these into one and only one phrase for analysis.  In JMP, can I select phrases and easily create a "super phrase"?  I have tried several times and can't figure out how.

 

"request tests"

"request a test"

"request testing"

"test requests"

"test request"

"testing requests"

All of these would be grouped as "request tests", since they all have that meaning.

Accepted Solutions
gzmorgan0
Super User

Re: Text Analysis -- Manually Grouping Similar Phrases

You did not mention which version of JMP you are using. In JMP 12 or higher, there is a column utility called Recode.

The script below creates a dummy table and calls Recode for column User1.

 

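// The six phrase variants from the question
ex = {"request tests", "request a test", "request testing", "test requests", "test request",
"testing requests"};

// Dummy table: 50 rows, each filled with a randomly chosen variant
dt = New Table( "Survey", Add Rows( 50 ), New Column( "User1", Character,
   << Set Each Value( ex[Random Integer( 1, 6 )] ) ) );

// Select the column, then open the interactive Recode utility
dt << Go To( :User1 );
dt << Recode;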

The attached screenshot shows the table and the interactive interface. The red box around Formula marks a menu with three options: New Column (of values), Formula (column), or In Place (replace values).  I chose Formula and named the column Recode User1. Next, select all the values and right-click; a pop-up menu of options appears. You can group them to one of the existing phrases or to a new value. Make your selection and press the Recode button.

 

Note: If there are multiple groups of answers, select each group in turn and specify its common term. When done, press Recode.

 

 

image.png

The formula created by this action is

Match( :User1,
	"request a test", "request tests",
	"request testing", "request tests",
	"test request", "request tests",
	"test requests", "request tests",
	"testing requests", "request tests",
	:User1
)

I would have used the following formula

t0 = Trim( Lowercase( :User1 ) );
If( Contains( t0, "request" ) & Contains( t0, "test" ),
	"request test",
	:User1
);

 

Neither of these formulas handles typos and misspellings.  JMP has a function called shortest edit distance, I have a script for computing the Levenshtein distance, and there are other algorithms to "score" the degree of matching (or non-matching) of words and phrases.

 

However, if you are working with your data interactively, Recode is very nice to use. 

 

Look up Recode Data in the online book Using JMP.  (Main Menu > Help > Books > Using JMP).
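
On the edit-distance point above: for anyone curious what such a score looks like, here is a minimal sketch of the Levenshtein distance in JSL. It is not the script mentioned in this post and not a JMP built-in; the function name levDist is made up for illustration, and the comparison is case-sensitive as written.

```jsl
// Hypothetical helper: Levenshtein distance between two strings.
// d[i, j] holds the distance between the first i-1 characters of a
// and the first j-1 characters of b.
levDist = Function( {a, b},
	{Default Local},
	n = Length( a );
	m = Length( b );
	If( n == 0, Return( m ) );
	If( m == 0, Return( n ) );
	d = J( n + 1, m + 1, 0 );
	// Distance from the empty string is just the number of characters
	For( i = 1, i <= n + 1, i++, d[i, 1] = i - 1 );
	For( j = 1, j <= m + 1, j++, d[1, j] = j - 1 );
	For( i = 2, i <= n + 1, i++,
		For( j = 2, j <= m + 1, j++,
			// 0 if the two characters match, otherwise 1 for a substitution
			cost = If( Substr( a, i - 1, 1 ) == Substr( b, j - 1, 1 ), 0, 1 );
			d[i, j] = Min(
				d[i - 1, j] + 1,        // deletion
				d[i, j - 1] + 1,        // insertion
				d[i - 1, j - 1] + cost  // match or substitution
			);
		)
	);
	d[n + 1, m + 1];
);

// "reqest test" is missing one letter, so the distance is 1
Show( levDist( "request test", "reqest test" ) );
```

A small distance relative to the phrase length suggests a likely typo of the same phrase; where to set that cutoff is a judgment call for your data.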

6 REPLIES
dale_lehman
Community Trekker

Re: Text Analysis -- Manually Grouping Similar Phrases

There may be a more elegant way to do this (and I'd be interested if someone knows of it), but you can accomplish this by creating a new column with an IF formula containing several OR clauses: if the text field CONTAINS any of the phrases you listed, then 1, otherwise 0.  This is even easier with the Rows > Row Selection > Select Where option: add multiple conditions, each of which is the text field "contains" one of the phrases on your list (make sure to check "if any condition is met").  Once those rows are selected, Rows > Row Selection > Name Selection in Column will create the same column the formula would give you.
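
A rough sketch of the formula-column version of this idea, assuming the text column is called User1 as in the accepted solution. The table name, its three rows, and the new column name are invented for illustration.

```jsl
// Tiny illustrative table; these rows are made up
dt = New Table( "Contains sketch",
	New Column( "User1", Character,
		Set Values( {"request a test", "test requests", "report a bug"} )
	)
);

// 1 if the response contains any of the listed phrases, otherwise 0.
// Contains() returns the match position, or 0 when there is no match.
dt << New Column( "Request Test Flag", Numeric, Nominal,
	Formula(
		If(
			(Contains( :User1, "request tests" ) > 0) |
			(Contains( :User1, "request a test" ) > 0) |
			(Contains( :User1, "request testing" ) > 0) |
			(Contains( :User1, "test requests" ) > 0) |
			(Contains( :User1, "test request" ) > 0) |
			(Contains( :User1, "testing requests" ) > 0),
			1,
			0
		)
	)
);
```

With these three rows the first two are flagged 1 and the last 0. Wrapping :User1 in Lowercase() would make the test case-insensitive.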

Re: Text Analysis -- Manually Grouping Similar Phrases

I do not understand why "a" is in your term list and therefore in your phrase list. Are one-character tokens really informative? Also, the stop words include "a", so it should have been removed automatically.

You could first create stems for request/requests and for test/tests so you are down to just two phrases, request test and test request. You could add them to the term list and then recode them to the one desired level.
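
The idea above (stem request/requests and test/tests, then recode the two remaining phrases to one level) can be sketched outside Text Explorer as a small JSL function. This is only an illustration: the function name normalizePhrase and the hand-written stem table are assumptions, not Text Explorer's stemmer.

```jsl
// Hypothetical helper: drop the stop word "a", map each token to a
// hand-picked stem, then recode "test request" to "request test".
normalizePhrase = Function( {phrase},
	{Default Local},
	stemmed = {};
	tokens = Words( Lowercase( phrase ), " " );
	For( i = 1, i <= N Items( tokens ), i++,
		tok = tokens[i];
		If( tok != "a",
			// Hand-written stem table covering only this example
			tok = Match( tok,
				"requests", "request",
				"tests", "test",
				"testing", "test",
				tok
			);
			Insert Into( stemmed, tok );
		);
	);
	joined = Concat Items( stemmed, " " );
	// Two phrases remain after stemming; recode them to one level
	Match( joined, "test request", "request test", joined );
);

phrases = {"request tests", "request a test", "request testing",
	"test requests", "test request", "testing requests"};
For( k = 1, k <= N Items( phrases ), k++,
	Show( normalizePhrase( phrases[k] ) )
);
```

All six variants from the question collapse to "request test" here; any phrase not covered by the stem table passes through lowercased but otherwise unchanged.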

Learn it once, use it forever!

Re: Text Analysis -- Manually Grouping Similar Phrases

Oh, we don't have training for Text Explorer yet. We will premiere a new course at the JMP Discovery Summit in October!

Learn it once, use it forever!

Re: Text Analysis -- Manually Grouping Similar Phrases

I could be mistaken but it appears to me that the various forms of the phrase to be dealt with are found in the phrase list of Text Explorer. These phrases are not the original character string values in the text data column. I think that this case is unstructured text, not structured character values. So the recode must be done within Text Explorer after parsing and terming.

Learn it once, use it forever!
4MUL8R
Community Trekker

Re: Text Analysis -- Manually Grouping Similar Phrases

I have learned that this is a two-step process in Text Explorer in JMP 14 Pro.  First, create a new phrase by selecting multiple stemmed phrases in the right-hand box entitled Phrase.  Once that has been done, go to the left-hand box entitled Term and Phrase Lists.  Find the new phrase and right-click it.  Select "Recode" and give each of the phrases you grouped the same single descriptive name.  The phrases disappear as the new name is given to each.  Then, in that left-hand box, you will see the new "superphrase", and you can view it in the Pareto.
