Dear all,
I am new to using this software and am struggling to figure out how to use it to calculate the TF-IDF values for documents in my data set.
I am sure it is included in the functionality of JMP but it must be under a different name. Would anyone be able to point me in the right direction?
Thanks
Mark
You may want to have a look at Text Explorer (main menu: Analyze).
See the following example script; similar examples are in the Scripting Index and the Statistics Index (main menu: Help).
Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/Consumer Preferences.jmp" );
obj = dt << Text Explorer( TextColumns( :Reasons Not to Floss ) );
My approach would be to import the text into a data table first (JMP can even import entire folders via Multiple File Import),
and then run Text Explorer on that data table.
So the details depend on the type and structure of your files.
It may also be worth checking out the following add-in: Text Importer - Text, PDF, Word Documents, and Powerpoint
TF-IDF is one of the built-in transforms that can be applied to the document-term matrix when you save it back to the data table from the Text Explorer platform.
Have you looked at the JMP documentation, Text Explorer section? Here's a link: Text Explorer
Much will depend on how you curate your documents, so pay attention to that first, before any analytics.
Hi Mark,
I've got the same issue. I need to solve this as part of an assignment for my degree. Did you find a solution?
Best,
Martin
Thanks, everyone. With your collective help I have managed to find the solution:
In Text Explorer, choose Save Document Term Matrix; the TF-IDF option is found under Weighting.
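For anyone wondering what a TF-IDF weighting does to a document-term matrix, here is a minimal Python sketch of the general idea. It is not JMP's implementation. The raw-count term frequency, the base-10 logarithm, the whitespace tokenizer, and the function name tf_idf are all assumptions made for illustration; check the Text Explorer documentation for the exact formula JMP uses.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return (terms, matrix); matrix[i][j] is the weight of terms[j] in docs[i].

    Assumed formula (illustrative only): tf * log10(n_docs / doc_freq),
    where tf is the raw count of the term in the document.
    """
    # Naive tokenizer: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    terms = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(t for toks in tokenized for t in set(toks))

    matrix = []
    for toks in tokenized:
        tf = Counter(toks)  # a missing term counts as 0
        matrix.append([tf[t] * math.log10(n_docs / doc_freq[t]) for t in terms])
    return terms, matrix


# Made-up example documents.
docs = ["floss hurts my gums", "no time to floss", "floss is boring"]
terms, weights = tf_idf(docs)
for doc, row in zip(docs, weights):
    print(doc, [round(w, 3) for w in row])
```

A term that occurs in every document (here "floss") gets weight 0, while a term that occurs in only one document gets the largest weight. That is the point of the transform: it down-weights terms that do not help distinguish documents.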