Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- JMP User Community
- :
- Discussions
- :
- Discussions
- :
- Re: regex recursion

Topic Options

- Start Article
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Dec 21, 2017 4:14 PM
(1176 views)

Does JSL support recursive regex?

I've tried ?R and \g<0> and neither of them seem to work.

`regex("test(1, (2, 3));", "\(([^()]|(?R))*\");`

1 ACCEPTED SOLUTION

Accepted Solutions

Highlighted

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

No, but the pattern matcher loves recursion. Here's a complete example of an expression interpreter. I'm not a grammar expert; I think this is a recursive descent parser, and I think it does a lot of back tracking on moderately long expressions, making it noticeably slow for expressions with more than about 10-20 operators.

PatMatch returns 1 if the pattern matches, 0 if it fails (unbalanced parens, invalid operators, invalid numbers).

```
text = "3+4*((5+7)*(9+11*2+2)+13)"; // plus, star, paren, integer. pattern does not handle spaces.
stack = {}; // the "working" stack... holds the RPN numbers
// the following add and mul routines pop two numbers and push the result...
addstack = Function( {},
{num1 = Num( Remove From( stack )[1] ), num2 = Num( Remove From( stack )[1] )},
Insert Into( stack, Char( num1 + num2 ) )
);
mulstack = Function( {},
{num1 = Num( Remove From( stack )[1] ), num2 = Num( Remove From( stack )[1] )},
Insert Into( stack, Char( num1 * num2 ) )
);
dummy = {}; // dummy variable with a subscript that will evaluate as needed
// BNF-like grammar of Expression, Term, Factor, Constant
// >? is the conditional assignment from LHS to RHS. The >>
// immediate assignment will not work because of back-up and retry
// which would push incorrect items to stack. The RHS of the >? is either
// the dummy[] list or working stack. The dummy stack reference uses the
// calculation of its subscript to manipulate the working stack.
// As the grammar/pattern runs, the expression is evaluated.
E = (Expr( T ) + "+" + Expr( E )) >? dummy[addstack(); 1] | Expr( T );
T = (Expr( T ) + "*" + Expr( F )) >? dummy[mulstack(); 1] | Expr( F );
F = ("(" + Expr( E ) + ")") | Expr( C );
C = Pat Span( "0123456789" ) >? stack[N Items( stack ) + 1];
rc = Pat Match( text, Pat Pos( 0 ) + E + Pat R Pos( 0 ) );
If( rc,
Show( rc, Num( stack[1] ), Eval( Parse( text ) ) ); // use JMP's parse to verify the answer
,
Show( rc )
);
```

Craige

2 REPLIES 2

Highlighted

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

No, but the pattern matcher loves recursion. Here's a complete example of an expression interpreter. I'm not a grammar expert; I think this is a recursive descent parser, and I think it does a lot of back tracking on moderately long expressions, making it noticeably slow for expressions with more than about 10-20 operators.

PatMatch returns 1 if the pattern matches, 0 if it fails (unbalanced parens, invalid operators, invalid numbers).

```
text = "3+4*((5+7)*(9+11*2+2)+13)"; // plus, star, paren, integer. pattern does not handle spaces.
stack = {}; // the "working" stack... holds the RPN numbers
// the following add and mul routines pop two numbers and push the result...
addstack = Function( {},
{num1 = Num( Remove From( stack )[1] ), num2 = Num( Remove From( stack )[1] )},
Insert Into( stack, Char( num1 + num2 ) )
);
mulstack = Function( {},
{num1 = Num( Remove From( stack )[1] ), num2 = Num( Remove From( stack )[1] )},
Insert Into( stack, Char( num1 * num2 ) )
);
dummy = {}; // dummy variable with a subscript that will evaluate as needed
// BNF-like grammar of Expression, Term, Factor, Constant
// >? is the conditional assignment from LHS to RHS. The >>
// immediate assignment will not work because of back-up and retry
// which would push incorrect items to stack. The RHS of the >? is either
// the dummy[] list or working stack. The dummy stack reference uses the
// calculation of its subscript to manipulate the working stack.
// As the grammar/pattern runs, the expression is evaluated.
E = (Expr( T ) + "+" + Expr( E )) >? dummy[addstack(); 1] | Expr( T );
T = (Expr( T ) + "*" + Expr( F )) >? dummy[mulstack(); 1] | Expr( F );
F = ("(" + Expr( E ) + ")") | Expr( C );
C = Pat Span( "0123456789" ) >? stack[N Items( stack ) + 1];
rc = Pat Match( text, Pat Pos( 0 ) + E + Pat R Pos( 0 ) );
If( rc,
Show( rc, Num( stack[1] ), Eval( Parse( text ) ) ); // use JMP's parse to verify the answer
,
Show( rc )
);
```

Craige

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: regex recursion

A few minutes on wikipedia gives a set of rules without left recursion...that runs fast.

```
text = "(1+2)*(3+4+5)*6*7*8*21*(0+(1+(2*3))+4*(5+6)*7*8+(((12+13+14)*15+16+17)*18+19+20))*22"; // plus, star, paren, integer. pattern does not handle spaces.
stack = {}; // the "working" stack... holds the RPN numbers
// the following add and mul routines pop two numbers and push the result...
addstack = Function( {},
{num1 = Num( Remove From( stack )[1] ), num2 = Num( Remove From( stack )[1] )},
Insert Into( stack, Char( num1 + num2 ) )
);
mulstack = Function( {},
{num1 = Num( Remove From( stack )[1] ), num2 = Num( Remove From( stack )[1] )},
Insert Into( stack, Char( num1 * num2 ) )
);
dummy = {}; // dummy variable with a subscript that will evaluate as needed
// BNF-like grammar of Expression, Term, Factor, Constant
// >? is the conditional assignment from LHS to RHS. The >>
// immediate assignment will not work because of back-up and retry
// which would push incorrect items to stack. The RHS of the >? is either
// the dummy[] list or working stack. The dummy stack reference uses the
// calculation of its subscript to manipulate the working stack.
// As the grammar/pattern runs, the expression is evaluated.
/*
E = (Expr( T ) + "+" + Expr( E )) >? dummy[addstack(); 1] | Expr( T );
T = (Expr( T ) + "*" + Expr( F )) >? dummy[mulstack(); 1] | Expr( F );
F = ("(" + Expr( E ) + ")") | Expr( C );
C = Pat Span( "0123456789" ) >? stack[N Items( stack ) + 1];
*/
// re-written without left recursion
// https://en.wikipedia.org/wiki/Left_recursion#Removing_left_recursion
//
E = Expr( T ) + Expr( EP );
EP = ("+" + Expr( T ) >? dummy[addstack(); 1] + Expr( EP )) | "";
T = Expr( F ) + Expr( TP );
TP = ("*" + Expr( F ) >? dummy[mulstack(); 1] + Expr( TP )) | "";
F = ("(" + Expr( E ) + ")") | Expr( C );
C = Pat Span( "0123456789" ) >? stack[N Items( stack ) + 1];
rc = Pat Match( text, Pat Pos( 0 ) + E + Pat R Pos( 0 ), NULL, FULLSCAN );
If( rc,
Show( rc, Num( stack[1] ), Eval( Parse( text ) ) ); // use JMP's parse to verify the answer
,
Show( rc )
);
```

If you are following closely you are wondering how the first version runs at all, with left recursion. Why doesn't it loop forever, or at least until the recursion uses up available memory? By default the pattern matcher assumes that each expr(...) must match at least one character. At some point, the recursion is deep enough that there is not enough text left and the match backs up. All that back up and retry takes time on a long string. In the second example, without the left recursion, the patmatch() requires the FULLSCAN option to turn off the assumption because the expr() can now match "" (zero characters).

Craige