Build a data table from a string using pattern matching

Problem

You have a text string that has a repeating pattern. You want to extract some data from the string into a data table. Each repetition in the string represents a row in the table. There might be a lot of data to skip over because it doesn't belong in the table.

Solution

Use the JSL Pat Match function, and build the pattern from the Pat functions (Pat Break, Pat Arb, Pat Repeat, Pat Fence, Pat Test, and so on). The following example makes a data table of the links found on a web page.

site = "https://en.wikipedia.org";
url = "/wiki/JMP_(statistical_software)";
html = Load Text File( site || url ); // Dec 2017: 200+ links here
quote = "\!""; // simplify escaping of quotation mark elsewhere

dt = New Table( url, New Column( "link", Character ), New Column( "text", Character ) );

// A typical link on Wikipedia looks like this:
// <a href="//st.wikipedia.org/" lang="st">Sesotho</a>
// The following pattern will need tweaking for other sites.

rc = Pat Match(	html, //
	Pat Repeat( // until there are no more
		Pat Break( "<" ) + // match up to, but not including, the <
		(// either we've found a link that starts with <a
			(// this will fail if the HTML link format changes much:
			// HTML does not require href to follow "<a" after exactly one space,
			// but this simple pattern assumes it does. The href value is between
			// quotation marks; anything after it, such as lang=..., is thrown away
			"<a href=" + quote + Pat Break( quote ) >> vlink + Pat Break( ">" ) // store match in vlink
			// throw away the closing >, then capture everything up to the
			// closing </a>. This may include other tags
			+ ">" + Pat Arb() >> vtext + "</a>" + // store match in vtext
			Pat Fence() + // fence off previously parsed data, back-up-and-retry is pointless
			Pat Test( // inject some JSL into the match to save the results
				If( Starts With( vlink, "#" ), // ignore in-page anchors
					{} // nothing
				, // else
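					If(
						Starts With( vlink, "//" ), // protocol-relative link: keep its own host
						vlink = "https:" || vlink,
						Starts With( vlink, "/" ), // within-site links begin with a single /
						vlink = site || vlink // fully qualified
					);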
					dt << addrows( 1 ); // extend the table by one row
					dt:link = vlink; // vlink and vtext are the JSL variables
					dt:text = vtext; // link and text are the table's columns
				); //
				1; // Pat Test needs a true result to let the match continue
			) //
			) //
		| // or we found some other tag and can just skip it
			Pat Break( ">" ) // skip up to, but not including, the next >; the next Pat Break( "<" ) skips the >
		) //
	) //
);
Figure: Data table of links and associated text
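Before pointing the script at a live page, it can help to try the pattern on a short string you control. The sketch below is a minimal offline version of the same structure: Pat Repeat around Pat Break, an alternation between "a link" and "any other tag", then Pat Fence and Pat Test. The HTML snippet, the table name, and the explicit row indexing with N Rows are assumptions for illustration, not part of the original recipe. It sends protocol-relative links (starting with //, like the Sesotho example in the comments) to https: instead of prepending the site. It also uses JSL's "\[ ... ]\" string delimiters, so the quotation marks in the snippet need no escaping.

```jsl
// Offline sketch: the article's link pattern applied to a short, hypothetical
// HTML string, so the pattern can be checked without downloading a page.
site = "https://en.wikipedia.org";
quote = "\!"";
// "\[ ... ]\" delimits a JSL string in which quotation marks need no escaping
html = "\[<p>Intro</p><a href="/wiki/JMP">JMP</a> skip <a href="#top">Top</a><a href="//st.wikipedia.org/" lang="st">Sesotho</a><a href="https://www.jmp.com/">JMP home</a>]\";

dt = New Table( "links sketch", New Column( "link", Character ), New Column( "text", Character ) );

rc = Pat Match( html,
	Pat Repeat(
		Pat Break( "<" ) + ( // move to the next tag
			( // a link: capture the href value and the link text
				"<a href=" + quote + (Pat Break( quote ) >> vlink) + Pat Break( ">" )
				+ ">" + (Pat Arb() >> vtext) + "</a>"
				+ Pat Fence() // this link is done; don't back up into it
				+ Pat Test(
					If( !Starts With( vlink, "#" ), // skip in-page anchors
						If(
							Starts With( vlink, "//" ), vlink = "https:" || vlink, // protocol-relative
							Starts With( vlink, "/" ), vlink = site || vlink // within-site
						);
						dt << Add Rows( 1 );
						// index the new row explicitly instead of relying on the current row
						dt:link[N Rows( dt )] = vlink;
						dt:text[N Rows( dt )] = vtext;
					);
					1; // a true result lets the match continue
				)
			)
			| Pat Break( ">" ) // any other tag: skip it
		)
	)
);
Show( rc, N Rows( dt ), Column( dt, "link" ) << Get Values );
```

Once the table looks right for the snippet, you can swap the literal string for the Load Text File result.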

Discussion

Your pattern will be different. Here, the < and > characters delimit tags in HTML, and the pattern uses them to find the links in the text; your data will follow some other pattern. Keep the Pat Fence and the Pat Test, but change the special cases inside Pat Test. In this HTML example, links that begin with a # sign are only anchors within the page (clicking one scrolls the page to a section), so the code ignores them; every other link is saved. Saved links that are relative to the site, starting with /, have the site prepended. You might need to handle similar details in your own data.
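To make that advice concrete with a non-HTML format, here is a minimal sketch. It assumes hypothetical log text made of semicolon-terminated records, with no spaces between records, and it keeps only the ERROR records. The record layout, the column names, and the use of Num() to convert the code are illustrative assumptions. The structure follows the article: one alternative captures a record and ends with Pat Fence and Pat Test, and the other alternative skips any other record.

```jsl
// Sketch: a hypothetical log format of semicolon-terminated records;
// only the ERROR records become rows in the table.
text = "INFO start;ERROR code=17 msg=Disk full;INFO ok;ERROR code=42 msg=Timeout;";

dt = New Table( "errors sketch",
	New Column( "code", Numeric ),
	New Column( "message", Character )
);

rc = Pat Match( text,
	Pat Repeat(
		( // an ERROR record: capture the code and the message
			"ERROR code=" + (Pat Break( " " ) >> vcode)
			+ " msg=" + (Pat Break( ";" ) >> vmsg) + ";"
			+ Pat Fence() // this record is done; don't back up into it
			+ Pat Test(
				dt << Add Rows( 1 );
				dt:code[N Rows( dt )] = Num( vcode ); // text "17" -> number 17
				dt:message[N Rows( dt )] = vmsg;
				1; // a true result lets the match continue
			)
		)
		| // any other record: skip through its closing semicolon
		(Pat Break( ";" ) + ";")
	)
);
Show( rc, N Rows( dt ) );
```

Here the only action in Pat Test is adding a row. A filter such as the article's # check would also go inside that Pat Test. A message that itself contains a semicolon would break this simple pattern.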


See Also

https://community.jmp.com/t5/Uncharted/Pattern-Matching/ba-p/21005

JSL Cookbook

If you’re looking for a code snippet or design pattern that performs a common task for your JSL project, the JSL Cookbook is for you.

This knowledge base contains building blocks of JSL code that you can use to reduce the amount of coding you have to do yourself.

It's also a great place to learn from the experts how to use JSL in new ways, with best practices.