Need some quick help with Text Explorer

miguello

Community Trekker

Joined:

Jan 27, 2016

Folks, 

 

I need some quick help with the Text Explorer platform. 

I have a huge table with, let's say, parts replacement history. I need to analyze it. The Parts Description field is entered manually and is only vaguely structured, so I can have different entries for the same part. Let's say: "ABC DE", "ABC-DE", "New DE", "ABC of type (DE)", "Automatic Bull Catcher DE" and so on. It is very hard to turn that into an algorithm. I can start going through the table and do a lot of "Select Similar" and "Name Selected in Column", but as I said the table is huge and the technicians' imagination when typing info in is enormous. 

I know I should be able to do that with Text Explorer, but I have never worked with it. I'm looking through the Books, but I'm either looking in the wrong place or just can't recognize what I need. Meanwhile, whoever gave me this task also gave me a very tight deadline. Can somebody nudge me in the right direction? Name of the method, option, chapter in the book, anything?

 

Thanks,

M.

1 ACCEPTED SOLUTION

markbailey

Staff

Joined:

Jun 23, 2011

Solution

I am afraid that there is usually no such thing as "quick help" when it comes to using Text Explorer. I am currently writing a two-day course as an introduction to this methodology with the Text Explorer platform. (It will premiere at the JMP Discovery Summit in October at SAS Headquarters in Cary, NC.) There simply is so many aspects to

Because the data is somewhat structured, even if it is messy, you might first try using the Cols > Recode command. This facility is very powerful. You can learn about it and examine many examples in Help > Books > Using JMP. This way might still take a lot of work due to the size of the data set and the creativity of the engineers, but it is likely to be the most straightforward approach.
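To make the idea behind grouping similar values concrete, here is a minimal sketch in Python (not JSL, and not how Recode is implemented). It assumes that much of the variation is only case, punctuation, and spacing. The function names and the `manual_map` dictionary are hypothetical; the sample strings come from the question above.

```python
import re
from collections import defaultdict

def normalize(text):
    """Lowercase, turn runs of punctuation/whitespace into one space, and trim."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def group_variants(values):
    """Group the original spellings under their normalized form."""
    groups = defaultdict(list)
    for value in values:
        groups[normalize(value)].append(value)
    return dict(groups)

descriptions = ["ABC DE", "ABC-DE", "New DE", "ABC of type (DE)"]
groups = group_variants(descriptions)

# Variants that normalization cannot merge still need a hand-made mapping
# (hypothetical entries, decided by someone who knows the parts).
manual_map = {"new de": "abc de", "abc of type de": "abc de"}
recoded = [manual_map.get(normalize(d), normalize(d)) for d in descriptions]
```

Normalization merges "ABC DE" and "ABC-DE" automatically. The remaining spellings are a judgment call, which is the part that takes the time.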

If Recode is not satisfactory or successful, then Text Explorer might be useful for this problem. Please read the chapter about Text Explorer first in Help > Books > Basic Analysis. I expect that the default method of tokenizing the unstructured text (regular expressions) will be the most useful in this case. I would try values from 1 to 5 characters for the minimum word length; the initial processing is fast, so you should be able to compare the resulting term lists quickly. This parameter can save you a lot of work later, but it can also eliminate terms that might have meaning.
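The trade-off in the minimum word length can be illustrated outside JMP with a rough Python sketch. The regular expression and the `term_counts` function are assumptions made for illustration and are not Text Explorer's actual tokenizer; the sample texts are the ones from the question.

```python
import re
from collections import Counter

def term_counts(texts, min_chars=1):
    """Count lowercase alphanumeric tokens that have at least min_chars characters."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            if len(token) >= min_chars:
                counts[token] += 1
    return counts

texts = [
    "ABC DE",
    "ABC-DE",
    "New DE",
    "ABC of type (DE)",
    "Automatic Bull Catcher DE",
]

counts_min1 = term_counts(texts, min_chars=1)  # keeps every token
counts_min3 = term_counts(texts, min_chars=3)  # drops "of", but also "de"
```

With a minimum of 3 characters the filler word "of" disappears, but so does "de", which here is the term that identifies the part. That is the kind of loss to watch for when comparing term lists.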

Once you have selected the best value for this parameter, examine the term list. (First click the red triangle at the top and select Display Options > Show Term and Phrase Options to reveal buttons for common commands.) The most frequent and the least frequent terms are usually uninformative because they are either common to most texts or found too rarely. Now carefully examine the list and select all of the forms for the same term. (For example, "r", "rt", and "right".) Click Recode and then enter the designation that you want to use (e.g., "right"). If you find anything in the term list that is simply uninformative junk, select it and click Add Stop Word. All this work is just cleaning up the term list. Now examine the phrase list. Some combinations mean more than the individual words, like "monkey bars." Select these phrases and click Add Phrase to use them as terms.
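The three clean-up steps above (recoding variants, dropping stop words, and keeping phrases together) can be sketched in a few lines of Python. This is only an illustration of the logic, not what Text Explorer does internally. The dictionaries, the `clean_terms` function, and the sample sentence are hypothetical; only "r"/"rt"/"right" and "monkey bars" come from the examples above.

```python
import re

RECODES = {"r": "right", "rt": "right"}           # variant -> chosen designation
STOP_WORDS = {"the", "of"}                        # uninformative terms to drop
PHRASES = {("monkey", "bars"): "monkey bars"}     # word pairs kept as one term

def clean_terms(text):
    """Tokenize, keep known two-word phrases whole, recode variants, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    terms = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in PHRASES:
            terms.append(PHRASES[pair])
            i += 2
            continue
        token = RECODES.get(tokens[i], tokens[i])
        if token not in STOP_WORDS:
            terms.append(token)
        i += 1
    return terms

cleaned = clean_terms("Replaced the RT monkey bars")
```

For the sample sentence, `cleaned` is `["replaced", "right", "monkey bars"]`. Deciding what belongs in each of the three dictionaries is the manual work described above.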

I suggest that you save your work as a table script. Also, see the platform menu commands Show Stop Words and Show Recodes. These commands provide options about the 'scope' of your customizations. This way all your effort will be saved!

I am honestly not trying to make a big deal out of this problem. I wish that I could give you quick help. But this job is a lot of work and it involves many careful, thoughtful decisions that cannot be automated.

Learn it once, use it forever!
4 REPLIES
miguello

Community Trekker

Joined:

Jan 27, 2016

Oh, totally forgot to say what I need - just to clean up this mess. Run some phrase/word analysis, click all that apply, rename them to one thing.
Don't see a straight path now...
miguello

Community Trekker

Joined:

Jan 27, 2016

Ok, I think I found it - right click on the column, choose Recode... There you can group and recode terms. No need to run the Text Explorer platform.
miguello

Community Trekker

Joined:

Jan 27, 2016

Thanks a lot for the quick reply! This is a lot of help, actually. I am currently trying to use the Recode... platform without Text Explorer. Will see if I even need to use Text Explorer at all at this stage. If I start bumping into too many exceptions, I will do tokenizing.