sigma_field
Level I

Clustering results change every time I conduct latent class analysis (LCA) on my text data

Hi there,

 

I used LCA in the Text Explorer platform to cluster my text data, but every time I run it, it gives me different results. Does anyone know why this happens?

Also, what is the difference between the two clustering methods in the Text Explorer platform: latent class analysis (LCA) and clustering documents in latent semantic analysis (LSA)?

8 REPLIES

Re: Clustering results change every time I conduct latent class analysis (LCA) on my text data

LCA uses random seeds to begin the clustering process. I think you can set the random seed before each LCA run to reproduce a previous fit.
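
As a minimal sketch of what resetting the seed does, the JSL below uses plain random draws rather than LCA itself. The seed value 123 and the variable names are arbitrary choices for illustration; the assumption is that the same seeding mechanism also drives the random starting values in LCA.

```jsl
// Illustration only: the seed value and variable names are arbitrary.
Random Reset( 123 );   // fix the seed
a = Random Uniform();  // first draw after the reset
Random Reset( 123 );   // reset to the same seed
b = Random Uniform();  // should repeat the first draw
Show( a == b );        // 1 (true) if the two draws match
```

Without the second Random Reset(), the two draws would normally differ, which is the same reason two unseeded LCA runs can disagree.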

sigma_field
Level I

Re: Clustering results change every time I conduct latent class analysis (LCA) on my text data

Thank you! It works!

 

Also, I'm curious about the difference between LCA in the Text Explorer platform and clustering documents in Latent Semantic Analysis. Any ideas?

Re: Clustering results change every time I conduct latent class analysis (LCA) on my text data

The LCA platform and the LCA available within the Text Explorer platform accomplish the same task. The LCA platform is a general tool for any multivariate data set. The LCA embedded within Text Explorer (TE), however, has been customized for text analysis. First, the clustering results are presented in the context of finding similar documents in the corpus. Second, the sparse document-term matrix requires a new solution to the singular value decomposition.

Re: Clustering results change every time I conduct latent class analysis (LCA) on my text data

I did not answer one of your original questions about the difference between latent class analysis (LCA) and latent semantic analysis (LSA). Both methods produce clusters, and both are based on the expression of latent variables. LCA clusters documents based on the weighted document-term matrix (DTM), so the question is about similar documents. LSA clusters terms, also based on the weighted DTM, so the question is about similar terms. The clusters from LSA can identify latent topics.

sigma_field
Level I

Re: Clustering results change every time I conduct latent class analysis (LCA) on my text data

I also noticed that LSA can cluster documents. Does it give different results from the clusters in LCA?

Re: Clustering results change every time I conduct latent class analysis (LCA) on my text data

Well, both methods use random seeds for the initial clusters, so both show the run-to-run difference that you observed.

 

The dedicated LCA method in TE can handle much larger matrices. Apart from the random seed, differences in the numerical methods might also lead to different results.

 

Have you tried it? You can save the DTM with weighting from TE and then analyze it with the LCA platform separate from TE.

 

Please note that the identity of the clusters is random, but the composition of each cluster should be stable, though not necessarily identical. That is, cluster 1 in one run might become cluster 10 in another run or on another platform, but the constituents should be essentially the same. If there is not much similarity among documents, then there might be large changes in the clusters from run to run or platform to platform. The choice of the number of clusters can also affect the stability of the cluster composition.
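
One way to check this is to cross-tabulate the cluster labels from two runs. The sketch below uses made-up labels for six documents (the vectors `run1` and `run2` and the cluster count `k` are hypothetical, not output from Text Explorer) and only illustrates the relabeling idea.

```jsl
// Hypothetical cluster labels for the same six documents from two runs.
run1 = [1, 1, 2, 2, 3, 3];
run2 = [3, 3, 1, 1, 2, 2];
k = 3; // number of clusters assumed in both runs

// Cross-tabulate: rows are run-1 labels, columns are run-2 labels.
xtab = J( k, k, 0 );
For( i = 1, i <= N Rows( run1 ), i++,
	r = run1[i];
	c = run2[i];
	xtab[r, c] = xtab[r, c] + 1;
);
Show( xtab );
```

If each row of `xtab` has a single nonzero count, as it does for these made-up labels, the two runs found the same groups under different labels. Counts spread across several columns in a row would indicate that the composition itself changed.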

Re: Clustering results change every time I conduct latent class analysis (LCA) on my text data

You can also find a lot of answers in the JMP documentation.

 

See Help > Books > Basic Analysis > Text Explorer.

 

See Help > Books > Multivariate Methods > Latent Class Analysis.

Re: Clustering results change every time I conduct latent class analysis (LCA) on my text data

I thought I would post a response I got from JMP Technical Support on setting the random seed.

 

"To generate reproducible results from Latent Class Analysis in Text Explorer, you must set the random seed before each run using the Random Reset() JSL function.

 

Here is an example using the Pet Survey sample data that fits the LCA five times, resetting the random seed before each fit. All five LCA results should be identical."

 

 

dt = Open( "$SAMPLE_DATA/Pet Survey.jmp" );

// Launch Text Explorer on the survey text column
te = Text Explorer(
	Text Columns( :Survey Response ),
	Set Regex( Library( "Words" ) ),
	Language( "English" )
);

// Fit the LCA five times, resetting the random seed before each fit
For( i = 1, i <= 5, i++,
	Random Reset( 123 ); // same seed each time, so all five fits should match
	lca = te << Latent Class Analysis(
		Number of Clusters( 5 ),
		Maximum Number of Terms( 143 ),
		Minimum Term Frequency( 2 )
	);
);