Problem
You have a text string that has a repeating pattern. You want to extract some data from the string into a data table. Each repetition in the string represents a row in the table. There might be a lot of data to skip over because it doesn't belong in the table.
Solution
Use the JSL Pat Match function with a pattern built from the Pat functions (Pat Repeat, Pat Break, Pat Arb, Pat Fence, Pat Test, and so on). The following example makes a data table of links found on a web page.
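site = "https://en.wikipedia.org";
url = "/wiki/JMP_(statistical_software)";
html = Load Text File( site || url ); // download the page as one text string
quote = "\!""; // a one-character string holding a double quote
dt = New Table( url, New Column( "link", Character ), New Column( "text", Character ) );
rc = Pat Match( html,
	Pat Repeat(
		Pat Break( "<" ) + // skip ahead to the next tag
		(
			(
				// >? (Pat Immediate) stores each capture right away, so Pat Test can read it during the match
				"<a href=" + quote + Pat Break( quote ) >? vlink + Pat Break( ">" )
				+ ">" + Pat Arb() >? vtext + "</a>" +
				Pat Fence() + // do not backtrack into a link that has already matched
				Pat Test(
					If( Starts With( vlink, "#" ),
						{} // anchor within the page: ignore it
					,
						If( Starts With( vlink, "/" ),
							vlink = site || vlink; // make a site-relative link absolute
						);
						dt << addrows( 1 );
						dt:link = vlink;
						dt:text = vtext;
					);
					1; // always succeed so the repeat continues
				)
			)
			|
			Pat Break( ">" ) // any other tag: skip to its end
		)
	)
);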
The script produces a data table of links and associated text, with one row per link saved from the page.
Discussion
Your pattern will be different. Here the < and > characters come from the HTML specification, and the pattern matcher uses them to find the links in the text; your data will follow some other pattern. Keep the Pat Fence and the Pat Test, but change the special cases handled inside Pat Test. Pat Fence keeps the matcher from backtracking into a repetition that has already been processed, and Pat Test runs the JSL that adds the row and then returns 1 so the match continues.
For this HTML example, links beginning with # are anchors within the page (clicking one scrolls the page to a section), so the code ignores them. Every other link is saved, and a link that starts with / has the site prepended so the saved link is a complete URL. Your data might need similar special handling.
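As an illustration of adapting the recipe to other data, here is a minimal sketch that applies the same repeat, fence, and test structure to a made-up string of bracketed key=value records. The input text, the table name "readings", the column names, the variables vkey and vvalue, and the "skip" special case are all hypothetical. The sketch uses Pat Immediate, the function form of the >? operator, so the captured values are available when Pat Test runs, and it assumes every bracket in the text is a well-formed record.

```jsl
// Hypothetical input: records shaped like [key=value], surrounded by text to skip
txt = "boot ok [temp=21.5] noise [rpm=900] [skip=me] tail";
dt = New Table( "readings",
	New Column( "key", Character ),
	New Column( "value", Character )
);
rc = Pat Match( txt,
	Pat Repeat(
		Pat Break( "[" ) + "[" + // skip everything before the next record
		Pat Immediate( Pat Break( "=" ), vkey ) + "=" +
		Pat Immediate( Pat Break( "]" ), vvalue ) + "]" +
		Pat Fence() + // do not backtrack into a finished record
		Pat Test(
			If( vkey != "skip", // special case for this made-up data
				dt << Add Rows( 1 );
				r = N Rows( dt ); // index of the row just added
				dt:key[r] = vkey;
				dt:value[r] = vvalue;
			);
			1; // always succeed so the repeat continues
		)
	)
);
```

With this input the table should end up with two rows (temp and rpm), because the skip record is filtered out inside Pat Test. Real data that can contain malformed records would likely need a fallback alternative, like the Pat Break( ">" ) branch in the HTML example.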
See Also
https://community.jmp.com/t5/Uncharted/Pattern-Matching/ba-p/21005