Discussions

sigma_field · Feb 14, 2019 10:32 AM

Hi there,

I used LCA in the Text Explorer platform to cluster my text data, but every time I run it, it gives me different results. Does anyone know why this happens?

Also, what is the difference between these two clustering methods in the Text Explorer platform: latent class analysis (LCA) and clustering documents in latent semantic analysis?

Mark_Bailey · Feb 14, 2019 10:49 AM

LCA uses random seeds to begin the clustering process. I think you can set the random seed before each LCA run and reproduce the previous fit.
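As a minimal sketch of the idea, independent of Text Explorer, resetting the seed to the same value makes the random number stream start over, so anything driven by it repeats. The seed value 123 and the variable names are arbitrary choices for illustration.

```jsl
// Reset the seed, then draw one uniform random number.
Random Reset( 123 );
first = Random Uniform();

// Reset to the same seed and draw again: the stream restarts.
Random Reset( 123 );
second = Random Uniform();

// The two draws should match because both follow the same seed.
Show( first, second, first == second );
```

The same principle applies to a fit that starts from random values: call Random Reset() with a fixed seed immediately before launching it.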

sigma_field · Feb 18, 2019 11:42 AM

Thank you! It works!

Also, I'm curious about the difference between LCA in the Text Explorer platform and Cluster Documents in Latent Semantic Analysis. Any idea?

Mark_Bailey · Feb 19, 2019 05:45 AM

The LCA platform and the LCA available within the Text Explorer platform accomplish the same task. The LCA platform is a general tool for any multivariate data set. The LCA embedded within TE, however, has been customized for text analysis. First, the clustering results are presented in the context of finding similar documents in the corpus. Second, the sparse document-term matrix requires a new solution to the singular value decomposition.

Mark_Bailey · Feb 19, 2019 05:50 AM

I did not answer one of your original questions about the difference between latent class analysis and latent semantic analysis. Both of these methods produce clusters. Both methods are based on the expression of latent variables. LCA clusters documents based on the weighted document-term matrix, so the question is about similar documents. LSA clusters terms, also based on the weighted DTM, so the question is about terms. The clusters from LSA can identify latent topics.

sigma_field · Feb 19, 2019 09:27 AM

I also noticed that LSA can cluster documents. Does it give different results from the clusters in LCA?

Mark_Bailey · Feb 19, 2019 10:35 AM

Well, both methods use random seeds for the initial clusters so there is the run-to-run difference that you observed.

The dedicated LCA method in the TE can handle much bigger matrices. The numerics might result in a difference, aside from the random seed aspect.

Have you tried it? You can save the DTM with weighting from TE and then analyze it with the LCA platform separate from TE.

Please note that the identity of the clusters is random but the composition of each cluster should be stable, though not necessarily identical. That is, cluster 1 in one run might become cluster 10 in another run or another platform, but the constituents should be essentially the same. If there is not much similarity among documents, then there might be large changes in the clusters from run to run or platform to platform. The choice for the number of clusters can also affect the stability of the cluster composition.
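One way to check whether two runs agree apart from the cluster numbering is to cross-tabulate their assignments. The sketch below uses made-up data: run1 and run2 are hypothetical cluster assignments for the same eight documents, and k is the assumed number of clusters. In practice you would substitute the assignments saved from each run.

```jsl
// Hypothetical cluster assignments for the same eight documents from two runs.
run1 = [1, 1, 2, 2, 3, 3, 3, 1];
run2 = [3, 3, 1, 1, 2, 2, 2, 3];
k = 3; // assumed number of clusters in both runs

// counts[i, j] = number of documents in cluster i of run 1 and cluster j of run 2.
counts = J( k, k, 0 );
For( d = 1, d <= N Rows( run1 ), d++,
	r = run1[d];
	c = run2[d];
	counts[r, c] = counts[r, c] + 1;
);
Show( counts );
```

In this made-up example every row of counts has a single nonzero entry, so the two runs contain the same groups of documents and only the labels differ. Rows with counts spread across several columns would indicate that the composition itself changed.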

Mark_Bailey · Feb 19, 2019 10:37 AM

You can also find a lot of answers in the JMP documentation.

See Help > Books > Basic Analysis > Text Explorer.

See Help > Books > Multivariate Methods > Latent Class Analysis.

wendytseng · May 31, 2019 10:40 AM

I thought I would post a response I got from JMP Technical Support on setting the random seed.

"To generate reproducible results from Latent Class Analysis in Text Explorer, you must set the random seed before each run using the Random Reset() JSL function.

Here is an example using the Pet Survey sample data that fits the LCA five times, resetting the random seed before each fit. All 5 LCA results should be identical."

dt = Open( "$SAMPLE_DATA/Pet Survey.jmp" );

te = Text Explorer(
	Text Columns( :Survey Response ),
	Set Regex( Library( "Words" ) ),
	Language( "English" )
);

For( i = 1, i <= 5, i++, // run LCA five times
	Random Reset( 123 ); // set the random seed before each fit
	lca = te << Latent Class Analysis(
		Number of Clusters( 5 ),
		Maximum Number of Terms( 143 ),
		Minimum Term Frequency( 2 )
	);
);

Clustering result change everytime I contuct latent class analysis(LCA) on my text data