Cosine Similarity Measure in Text Explorer

Rahul

New Contributor

Joined:

Jan 17, 2017

Is it possible to compute a cosine similarity measure in Text Explorer to identify documents that are "close" to each other? I see that we can cluster documents and do latent semantic analysis, but I don't see any way to compute cosine similarity. Any help will be appreciated.


Rahul

1 ACCEPTED SOLUTION

ian_jmp

Staff

Joined:

Jun 23, 2011

Solution

Because JMP lets you save the document term matrix (DTM), you can always calculate cosine similarity directly from it. The attached table was made from the 'Aircraft Incidents' sample data. The code below, which took a few seconds to run on my laptop, produced the similarity matrix in the second attached table. Run it with the saved DTM table as the current data table.

// https://en.wikipedia.org/wiki/Cosine_similarity
NamesDefaultToHere(1);

dt = CurrentDataTable();
m = dt << getAsMatrix;
n = NRow(dt);

// Make some column headings for the final table
cols = {};
for(i=1, i<=n, i++,
	InsertInto(cols, "Document in Row "||Char(i));
);

// Get the modulus of each feature vector
modulus = J(n, 1, .);
for(i=1, i<=n, i++,
	modulus[i] = sqrt(ssq(m[i,0]));
);

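// Get the cosine of the angle between each pair of feature vectors.
// The matrix is symmetric, so only the lower triangle (j <= i) is filled;
// the upper triangle is left missing.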
cosTheta = J(n, n, .);
for(i=1, i<=n, i++,
	for(j=1, j<=i, j++,
		cosTheta[i,j] = Sum(m[i, 0] :* m[j, 0])/(modulus[i] * modulus[j]);
	);
);
dt2 = AsTable(cosTheta, << ColumnNames(cols));
dt2 << setName("Cosine between feature vectors in "||(dt << getName));
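
For reference, here is a sketch of the quantity the script evaluates for each pair of rows. The notation is an assumption of this note, not part of the original post: m_i is row i of the DTM and k indexes the terms (columns). The small worked example uses two hypothetical term-count vectors, not rows from the attached tables.

```latex
\cos\theta_{ij}
  = \frac{\mathbf{m}_i \cdot \mathbf{m}_j}
         {\lVert \mathbf{m}_i \rVert \, \lVert \mathbf{m}_j \rVert}
  = \frac{\sum_{k} m_{ik}\, m_{jk}}
         {\sqrt{\sum_{k} m_{ik}^{2}}\; \sqrt{\sum_{k} m_{jk}^{2}}}

% Hypothetical example: m_1 = (1, 1, 0), m_2 = (1, 0, 1)
\cos\theta_{12}
  = \frac{1 \cdot 1 + 1 \cdot 0 + 0 \cdot 1}{\sqrt{2}\,\sqrt{2}}
  = \frac{1}{2}
```

When the DTM entries are non-negative, as term counts are, every value falls between 0 and 1, and each document has similarity 1 with itself. A row with no terms has zero modulus, so its similarity to any other row is undefined (0/0).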

 

2 REPLIES
Rahul

New Contributor

Joined:

Jan 17, 2017

Thanks for the help. That is what I was thinking of doing: get the matrix and do it myself. I was wondering whether it is built in?


Rahul