Cosine Similarity Measure in Text Explorer


Jan 31, 2017 11:44 AM

Is it possible to compute a cosine similarity measure in Text Explorer to identify documents that are "close" to each other? I see that we can cluster documents and do latent semantic analysis, but I don't see any way to compute cosine similarity. Any help will be appreciated.

Rahul

Accepted Solutions


Feb 2, 2017 1:25 AM

Solution

Because JMP allows you to save the document term matrix (DTM), you can always calculate this directly. The table attached was made from the 'Aircraft Incidents' sample data. The code below (which took a few seconds to run on my laptop) produced the similarity matrix in the second attached table.

    // https://en.wikipedia.org/wiki/Cosine_similarity
    NamesDefaultToHere(1);

    dt = CurrentDataTable();
    m = dt << getAsMatrix;
    n = NRow(dt);

    // Make some column headings for the final table
    cols = {};
    for(i=1, i<=n, i++,
        InsertInto(cols, "Document in Row "||Char(i));
    );

    // Get the modulus of each feature vector
    modulus = J(n, 1, .);
    for(i=1, i<=n, i++,
        modulus[i] = sqrt(ssq(m[i,0]));
    );

    // Get the cosine of the angle between each pair of feature vectors
    cosTheta = J(n, n, .);
    for(i=1, i<=n, i++,
        for(j=1, j<=i, j++,
            cosTheta[i,j] = Sum(m[i, 0] :* m[j, 0])/(modulus[i] * modulus[j]);
        );
    );

    dt2 = AsTable(cosTheta, << ColumnNames(cols));
    dt2 << setName("Cosine between feature vectors in "||(dt << getName));
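
For reference, the quantity the script computes for each pair of rows can be written out as follows. The symbols are notation introduced here for illustration, not taken from the post: d_i is row i of the DTM (the feature vector of document i), m_ik is the entry for term k in that row, and T is the number of term columns.

```latex
\cos\theta_{ij}
  = \frac{d_i \cdot d_j}{\lVert d_i \rVert \, \lVert d_j \rVert}
  = \frac{\sum_{k=1}^{T} m_{ik}\, m_{jk}}
         {\sqrt{\sum_{k=1}^{T} m_{ik}^{2}} \; \sqrt{\sum_{k=1}^{T} m_{jk}^{2}}}
```

The numerator corresponds to Sum(m[i, 0] :* m[j, 0]) and each factor in the denominator to sqrt(ssq(m[i,0])). Assuming the DTM entries are nonnegative counts or weights, the values lie between 0 and 1, with 1 on the diagonal. The matrix is symmetric, which is why the script fills only the entries with j <= i. The ratio is undefined for a document whose row is all zeros.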

2 REPLIES



Feb 2, 2017 8:41 AM

Thanks for the help. That is what I was thinking of doing: get the matrix and do it myself. I was wondering whether it is built in.

Rahul