markschahl
Level V

Parsing phrases separated by ;

So, I saw a link to the January 6 (1/6) cases that have already been charged. I got curious and used File > Internet Open:

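Local( {dt},
	// Import the first HTML table on the page: row 1 holds the column names, data starts on row 2
	dt = Open(
		"https://www.justice.gov/usao-dc/capitol-breach-cases",
		HTML Table( 1, Column Names( 1 ), Data Starts( 2 ) )
	);
	dt << Set Name( "capitol-breach-cases" );
	// Return the table as the result of the script
	dt;
)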

Thankfully, the charges in each case are separated by semicolons (;). I used the formula Words( :"Charge(s)"n, ";" ), which created a column with a list like this in each row:

{"Entering and Remaining in a Restricted Building", " Disorderly and Disruptive Conduct in a Restricted Building", " Violent Entry and Disorderly Conduct in a Capitol Building", " Parading, Demonstrating, or Picketing in a Capitol Building"}

Now that I have that, how can I count how often each charge phrase occurs? It would make a great packed bar chart.
Or can I do this with Text Explorer, perhaps with a custom Regex?
Any other ideas?
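One way to get these counts directly in JSL is an associative array keyed by charge phrase. This is a minimal sketch, assuming the imported table is the current data table and still has the "Charge(s)" column; the table name "Charge Counts" is hypothetical. Trim() is applied because, as the list above shows, every phrase after the first starts with a space, and without it the same charge would be counted under two spellings.

```jsl
// Sketch: count each charge phrase across all rows.
// Assumes the current data table contains the "Charge(s)" column.
dt = Current Data Table();
counts = Associative Array();
For( i = 1, i <= N Rows( dt ), i++,
	// Split the cell on semicolons, as the Words() formula does
	charges = Words( Column( dt, "Charge(s)" )[i], ";" );
	For( j = 1, j <= N Items( charges ), j++,
		// Remove the space left after each ";" so identical charges match
		c = Trim( charges[j] );
		If( Contains( counts, c ),
			counts[c] = counts[c] + 1,
			counts[c] = 1
		);
	);
);
// One row per distinct charge (keys come back sorted), ready for Graph Builder
New Table( "Charge Counts",
	New Column( "Charge", Character, Values( counts << Get Keys ) ),
	New Column( "Count", Numeric, Values( counts << Get Values ) )
);
```

The resulting table has one row per distinct charge, so it can go straight into Graph Builder for a bar chart.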

jthi
Super User


There are most likely many ways to do this, but you could make the Charge(s) column a Multiple Response column -> create a Distribution of it (or use another platform that properly supports multiple response columns) -> make a data table from the Frequencies table -> build a graph from that table in Graph Builder:

[Screenshot: jthi_0-1641237844913.png]

-Jarmo
markschahl
Level V


Jarmo:
Thanks! I did not know about the Multiple Response modeling type. Cool thing: Recode works on a multiple response column, treating each response as a separate value. That leaves 490 values that need to be recoded, so I will be busy for a while...

markschahl
Level V


It took a lot of recoding, but here is the summary. I'm not done with this dataset yet: I want to learn Regex() so I can extract the arrest dates and plot how fast the DoJ worked.

[Image: summary bar chart of the charges (capitol-breach-cases-summary-bar-chart.png)]

ih
Super User (Alumni)


You could start with something like this to pull out the arrest date, but it would need to be expanded to handle other cases, such as dates without years or 'arrested on ...'.

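New Column( "Date Arrested",
	Numeric,
	"Continuous",
	Format( "m/d/y", 10 ),
	Input Format( "m/d/y" ),
	Formula(
		// Capture the m/d/y date that directly follows "Arrested "
		d = Regex( :Case Status, "Arrested (\d+/\d+/\d+)", "\1" );
		// Convert the captured text to a JMP date; rows without a match stay missing
		If( !Is Missing( d ),
			Parse Date( d ),
			.
		);
	),
	Set Display Width( 89 )
);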
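The formula above only matches "Arrested" immediately followed by a date. A minimal sketch of one extension, assuming the other rows differ only by the word "on" or by capitalization: an optional "on " group plus the IGNORECASE flag of Regex(). Dates without years are still not handled. The column name "Date Arrested 2" is hypothetical, and Informat() with an explicit "m/d/y" format is used here in place of Parse Date().

```jsl
// Sketch: also match "arrested on m/d/y", in any letter case.
// Assumes the date itself uses the same m/d/y layout as above.
New Column( "Date Arrested 2",
	Numeric,
	"Continuous",
	Format( "m/d/y", 10 ),
	Formula(
		// Group 1 is the optional "on ", group 2 is the date text
		d = Regex( :Case Status, "arrested (on )?(\d+/\d+/\d+)", "\2", IGNORECASE );
		// Convert to a JMP date; rows without a match stay missing
		If( !Is Missing( d ),
			Informat( d, "m/d/y" ),
			.
		);
	)
);
```

Comparing this column with "Date Arrested" shows which rows the extra pattern picked up and which still need another rule.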