pmroz
Super User

Using Text Explorer for Custom Regex Searching

I'm having trouble figuring out how to make use of Text Explorer for the following situation:

  • Patient narratives in one dataset. Each narrative describes the adverse events a patient experienced, along with medications, etc. A narrative can be up to 20,000 characters long, and there can be up to 1000 narratives.
  • About 2400 regular expressions in another dataset. Each regular expression searches for a particular condition.

I need to search all the narratives using all of the regular expressions.  We have a solution in Excel + VBA but I'd like to implement something in JMP, possibly using Text Explorer.  I played around with a small set of regular expressions using Text Explorer and Customize Regex, but the resulting display was disappointing.

[Screenshot: text explorer output.png]

Ideally what I'd like is a column next to the FULL NARRATIVE column with the found regular expressions highlighted.

Our VBA solution is quite fast. By contrast, when I tried JSL with For loops and Regex to search a small set of narratives (23) using all of the regular expressions, it was very slow.
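For reference, the brute-force approach looks roughly like the sketch below. This is not my exact script: the table names "Narratives" and "Regexes" and the column name "Pattern" are placeholders, and both tables are assumed to be open already. It builds a new column holding the text matched by each regular expression, which is close to the output I'm after, minus the highlighting.

```jsl
// Sketch only: table names and the "Pattern" column are hypothetical placeholders.
dtNarr = Data Table( "Narratives" );
dtRx = Data Table( "Regexes" );

// Read both character columns into lists once, rather than touching
// the data tables inside the loops.
narratives = Column( dtNarr, "FULL NARRATIVE" ) << Get Values;
patterns = Column( dtRx, "Pattern" ) << Get Values;

nN = N Items( narratives );
nP = N Items( patterns );
hits = {};

For( i = 1, i <= nN, i++,
	found = {};
	For( j = 1, j <= nP, j++,
		// Regex Match returns a list: the whole match first, then any groups.
		// The list is empty when the pattern does not match.
		m = Regex Match( narratives[i], patterns[j] );
		If( N Items( m ) > 0,
			Insert Into( found, m[1] )
		);
	);
	// One string per narrative; empty when nothing matched.
	Insert Into( hits, If( N Items( found ) > 0, Concat Items( found, " | " ), "" ) );
);

// Put the matched text in a new column alongside FULL NARRATIVE.
dtNarr << New Column( "Matched Text", Character, Set Values( hits ) );
```

With the full data this is up to 1000 x 2400 = 2.4 million Regex Match calls, each against a narrative of up to 20,000 characters, which is presumably why it crawls.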

Here's the code for a text explorer search with 5 regular expressions:


Text Explorer(
	Text Columns( :FULL NARRATIVE ),
	Set Regex(
		Custom(
			Title( "R1" ),
			Example( "x" ),
			Regex( "\[\bexperienc[^.]*\b(\w{4,})\b.*\bexperienc[^.]*\1\b]\" ),
			Result( "\[\0]\" ),
			Comment( "x" ),
			Locale( "" ),
			ColorStyle( 10 )
		),
		Custom(
			Title( "R2" ),
			Example( "x" ),
			Regex( "\[\bexperienc[^.]*\.?[^.]*again[^.]*\.?[^.]*second[^.]*\.]\" ),
			Result( "\[\0]\" ),
			Comment( "x" ),
			Locale( "" ),
			ColorStyle( 9 )
		),
		Custom(
			Title( "R3" ),
			Example( "x" ),
			Regex( "\[\bexperienc.*experienc[^.]*second[^.]*infusion[^.]*\.]\" ),
			Result( "\[\0]\" ),
			Comment( "x" ),
			Locale( "" ),
			ColorStyle( 8 )
		),
		Custom(
			Title( "R4" ),
			Example( "x" ),
			Regex(
				"\[\bexperienc[a-z]{0,3}\W+(\w+\W+){0,25}resolv[a-z]{0,3}\W+(\w+\W+){0,25}receiv[a-z]{0,3}\W+(\w+\W+){0,25}experienc[^.]*\.]\"
			),
			Result( "\[\0]\" ),
			Comment( "x" ),
			Locale( "" ),
			ColorStyle( 7 )
		),
		Custom(
			Title( "R5" ),
			Example( "x" ),
			Regex( "\[\bexperienc[^.]*again[^.]*\.]\" ),
			Result( "\[\0]\" ),
			Comment( "x" ),
			Locale( "" ),
			ColorStyle( 6 )
		)
	),
	Layout( "Ordered" ),
	Customize Regex( 0 ),
	Language( "English" ),
	SendToReport(
		Dispatch(
			{"Term and Phrase Lists"},
			"",
			TableBox,
			{Set Summary Behavior( "Collapse" )}
		)
	)
);

Thanks for any suggestions on better ways to use Text Explorer and/or speed up regex searching.
