ar2, Level III

Text Explorer - Need help with Topic Analysis

Dear all, I am using Text Explorer to analyse some interesting "incident" data from a transport environment. I am using Topic Analysis and have identified about 15 sensible "topics". Is it possible to find out how many documents in my sample set "include" each topic? I haven't found a way to do that.

Any guidance welcome

Thanks

Accepted Solution
ih, Super User (Alumni)

Re: Text Explorer - Need help with Topic Analysis

You should be able to use the document topic vectors. Maybe someone knows of a quantifiable way to choose the decision point for each vector; I have done that visually and by checking documents:

Names Default To Here( 1 );

dt = Open( "$Sample_data/Aircraft Incidents.jmp" );

te = dt << Text Explorer(
	Text Columns( :Final Narrative ),
	Latent Semantic Analysis(
		1,
		Maximum Number of Terms( 2128 ),
		Minimum Term Frequency( 10 ),
		Weighting( "TF IDF" ),
		Number of Singular Vectors( 100 ),
		Centering and Scaling( "Centered and Scaled" )
	),
	Topic Analysis( 1, Number of Topics( 10 ) ),
	Tokenizing( "Basic Words" ),
	Language( "English" ),
	SendToReport(
		Dispatch( {}, "Term and Phrase Lists", OutlineBox, {Close( 1 )} ),
		Dispatch( {}, "SVD Plots", OutlineBox, {Close( 1 )} ),
		Dispatch( {}, "Topic Terms", OutlineBox, {Close( 1 )} ),
		Dispatch( {}, "Topic Scores Plots", OutlineBox, {Close( 0 )} )
	)
);

// Save the document topic vectors as new columns (Topic 1 ... Topic 10)
te << Save Document Topic Vectors;

// Inspect each topic's score distribution to decide which values indicate that a document contains the topic:
dt << Distribution(
	Continuous Distribution( Column( :Topic 1 ) ),
	Continuous Distribution( Column( :Topic 2 ) ),
	Continuous Distribution( Column( :Topic 3 ) ),
	Continuous Distribution( Column( :Topic 4 ) ),
	Continuous Distribution( Column( :Topic 5 ) ),
	Continuous Distribution( Column( :Topic 6 ) ),
	Continuous Distribution( Column( :Topic 7 ) ),
	Continuous Distribution( Column( :Topic 8 ) ),
	Continuous Distribution( Column( :Topic 9 ) ),
	Continuous Distribution( Column( :Topic 10 ) )
);

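// Select the rows whose Topic 1 score exceeds the visually chosen cut-off of 5
dt << Select Where( :Topic 1 > 5 );

// Or count those rows (Get Values returns a matrix; "> 5" gives 0/1 per row):
Sum( (Column( dt, "Topic 1" ) << Get Values) > 5 );
// returns 169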
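The original question asked for a count per topic, so the last step of the script can be repeated for every topic. This is a minimal sketch, assuming the saved columns are named "Topic 1" through "Topic 10" as in the script, and applying the same visually chosen cut-off of 5 to every topic (in practice each topic may need its own cut-off). The variable names nTopics, cutoff, topicNames and docCounts, and the table name "Documents per Topic", are illustrative only.

```jsl
// Sketch: count documents per topic using one shared cut-off.
// Assumes dt already holds the saved columns "Topic 1" ... "Topic 10".
nTopics = 10;   // number of topics requested in Topic Analysis
cutoff = 5;     // visually chosen cut-off; may need to differ per topic
topicNames = {};
docCounts = {};
For( i = 1, i <= nTopics, i++,
	colName = "Topic " || Char( i );
	vals = Column( dt, colName ) << Get Values;     // topic scores as a column matrix
	Insert Into( topicNames, colName );
	Insert Into( docCounts, Sum( vals > cutoff ) )  // number of rows above the cut-off
);
// Collect the counts in a small summary table
New Table( "Documents per Topic",
	New Column( "Topic", Character, Values( topicNames ) ),
	New Column( "Documents", Numeric, Values( docCounts ) )
);
```

Because each topic is thresholded separately, one document can be counted under several topics (or none), so the counts need not add up to the number of rows.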

ar2, Level III

Re: Text Explorer - Need help with Topic Analysis

Looks like a good approach. If anyone out there knows a quantifiable way to choose "cut-off" points for the relevance of each topic vector, that would be great.
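One possible data-driven starting point, offered as a heuristic rather than an established rule, is to flag a document as containing a topic when its score lies more than k standard deviations above that topic's mean. The multiplier k = 2 is an arbitrary assumption, not a JMP default, and the resulting cut-offs should still be checked by reading documents near the boundary. The sketch assumes the same dt and "Topic 1" ... "Topic 10" columns produced by the script in the accepted solution.

```jsl
// Heuristic sketch: per-topic cut-off = mean + k * standard deviation.
// k is a hypothetical choice, not a validated rule.
k = 2;
For( i = 1, i <= 10, i++,
	colName = "Topic " || Char( i );
	vals = Column( dt, colName ) << Get Values;
	cut = Mean( vals ) + k * Std Dev( vals );  // topic-specific cut-off
	n = Sum( vals > cut );                     // documents above that cut-off
	Show( colName, cut, n )                    // writes the results to the log
);
```

A rule like this gives a reproducible cut-off per topic, but it assumes the "topic present" documents sit in the upper tail of each score distribution; if a Distribution plot shows a clear gap or second mode, a cut-off placed in that gap may be more meaningful.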