Text Explorer - Need help with Topic Analysis

ar2 — Tue, 23 Jan 2018 08:52:24 GMT

Dear all - I am using Text Explorer to analyse some interesting "incident" data from a transport environment. I am using Topic Analysis and have identified about 15 sensible "topics". Is it possible to find out how many documents in my sample set "include" each topic? I haven't found a way to do that.

Any guidance welcome

Thanks

Re: Text Explorer - Need help with Topic Analysis

ih — Tue, 23 Jan 2018 17:15:50 GMT

You should be able to use the document topic vectors. Maybe someone knows of a quantifiable way to choose the decision point for each vector; I have done that visually and by checking documents:

Names default to here( 1 );

dt = Open( "$Sample_data/Aircraft Incidents.jmp" );

te = dt << Text Explorer(
	Text Columns( :Final Narrative ),
	Latent Semantic Analysis(
		1,
		Maximum Number of Terms( 2128 ),
		Minimum Term Frequency( 10 ),
		Weighting( "TF IDF" ),
		Number of Singular Vectors( 100 ),
		Centering and Scaling( "Centered and Scaled" )
	),
	Topic Analysis( 1, Number of Topics( 10 ) ),
	Tokenizing( "Basic Words" ),
	Language( "English" ),
	SendToReport(
		Dispatch( {}, "Term and Phrase Lists", OutlineBox, {Close( 1 )} ),
		Dispatch( {}, "SVD Plots", OutlineBox, {Close( 1 )} ),
		Dispatch( {}, "Topic Terms", OutlineBox, {Close( 1 )} ),
		Dispatch( {}, "Topic Scores Plots", OutlineBox, {Close( 0 )} )
	)
);

//Save the topic vectors
te << Save Document Topic Vectors;

//Decide what values relate to documents that contain the topic:
dt << Distribution(
	Continuous Distribution( Column( :Topic 1 ) ),
	Continuous Distribution( Column( :Topic 2 ) ),
	Continuous Distribution( Column( :Topic 3 ) ),
	Continuous Distribution( Column( :Topic 4 ) ),
	Continuous Distribution( Column( :Topic 5 ) ),
	Continuous Distribution( Column( :Topic 6 ) ),
	Continuous Distribution( Column( :Topic 7 ) ),
	Continuous Distribution( Column( :Topic 8 ) ),
	Continuous Distribution( Column( :Topic 9 ) ),
	Continuous Distribution( Column( :Topic 10 ) )
);

//Select rows with topic 1 (the cutoff of 5 was chosen visually from the distribution)
dt << Select where( :Topic 1 > 5 );

//Or, count rows with topic 1:
Sum( (Column( dt, "Topic 1" ) << Get values) > 5 );
//returns 169
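
To get the counts for all topics in one go, the same comparison can be looped over the saved columns. This is only a minimal sketch: it assumes the saved columns are named Topic 1 through Topic 10 as above, and it uses a single placeholder cutoff of 5 for every topic, whereas in practice each topic probably needs its own cutoff chosen from its distribution. The names cutoff, nTopics and counts are just illustrative.

```jsl
Names Default To Here( 1 );

dt = Current Data Table(); // assumes the table with the saved topic vectors is current

cutoff = 5;   // placeholder; replace with a cutoff chosen per topic
nTopics = 10; // matches Number of Topics( 10 ) in the script above

// One count per topic: number of documents whose score exceeds the cutoff
counts = J( nTopics, 1, 0 );
For( i = 1, i <= nTopics, i++,
	vals = Column( dt, "Topic " || Char( i ) ) << Get Values;
	counts[i] = Sum( vals > cutoff );
);

Show( counts );
```

With per-topic cutoffs, cutoff could be made a list or matrix and indexed with i inside the loop.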

Re: Text Explorer - Need help with Topic Analysis

ar2 — Tue, 23 Jan 2018 18:20:30 GMT

Looks like a good approach. If anyone out there knows a quantifiable way to choose "cut-off" points for the relevance of each topic vector, that would be great.
