BookmarkSubscribe
Choose Language Hide Translation Bar
vince_faller
Super User

regex recursion

Does JSL support recursive regex? 
I've tried ?R and \g<0> and neither of them seem to work.  

regex("test(1, (2, 3));", "\(([^()]|(?R))*\");
Vince Faller - Predictum
0 Kudos
1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted
Craige_Hales
Staff (Retired)

Re: regex recursion

No, but the pattern matcher loves recursion. Here's a complete example of an expression interpreter. I'm not a grammar expert; I think this is a recursive descent parser, and I think it does a lot of back tracking on moderately long expressions, making it noticeably slow for expressions with more than about 10-20 operators.

PatMatch returns 1 if the pattern matches, 0 if it fails (unbalanced parens, invalid operators, invalid numbers).

text = "3+4*((5+7)*(9+11*2+2)+13)"; // plus, star, paren, integer.  pattern does not handle spaces. 

stack = {}; // the "working" stack... holds the RPN numbers
// the following add and mul routines pop two numbers and push the result...
addstack = Function( {},
    {num1 = Num( Remove From( stack )[1] ), num2 = Num( Remove From( stack )[1] )},
    Insert Into( stack, Char( num1 + num2 ) )
);
mulstack = Function( {},
    {num1 = Num( Remove From( stack )[1] ), num2 = Num( Remove From( stack )[1] )},
    Insert Into( stack, Char( num1 * num2 ) )
);
dummy = {}; // dummy variable with a subscript that will evaluate as needed

// BNF-like grammar of Expression, Term, Factor, Constant
// >? is the conditional assignment from LHS to RHS. The >>
// immediate assignment will not work because of back-up and retry
// which would push incorrect items to stack. The RHS of the >? is either
// the dummy[] list or working stack. The dummy stack reference uses the 
// calculation of its subscript to manipulate the working stack.
// As the grammar/pattern runs, the expression is evaluated.
E = (Expr( T ) + "+" + Expr( E )) >? dummy[addstack(); 1] | Expr( T );
T = (Expr( T ) + "*" + Expr( F )) >? dummy[mulstack(); 1] | Expr( F );
F = ("(" + Expr( E ) + ")") | Expr( C );
C = Pat Span( "0123456789" ) >? stack[N Items( stack ) + 1];

rc = Pat Match( text, Pat Pos( 0 ) + E + Pat R Pos( 0 ) );
If( rc,
    Show( rc, Num( stack[1] ), Eval( Parse( text ) ) ); // use JMP's parse to verify the answer
,
    Show( rc )
);
Craige

View solution in original post

2 REPLIES 2
Highlighted
Craige_Hales
Staff (Retired)

Re: regex recursion

No, but the pattern matcher loves recursion. Here's a complete example of an expression interpreter. I'm not a grammar expert; I think this is a recursive descent parser, and I think it does a lot of back tracking on moderately long expressions, making it noticeably slow for expressions with more than about 10-20 operators.

PatMatch returns 1 if the pattern matches, 0 if it fails (unbalanced parens, invalid operators, invalid numbers).

text = "3+4*((5+7)*(9+11*2+2)+13)"; // plus, star, paren, integer.  pattern does not handle spaces. 

stack = {}; // the "working" stack... holds the RPN numbers
// the following add and mul routines pop two numbers and push the result...
addstack = Function( {},
    {num1 = Num( Remove From( stack )[1] ), num2 = Num( Remove From( stack )[1] )},
    Insert Into( stack, Char( num1 + num2 ) )
);
mulstack = Function( {},
    {num1 = Num( Remove From( stack )[1] ), num2 = Num( Remove From( stack )[1] )},
    Insert Into( stack, Char( num1 * num2 ) )
);
dummy = {}; // dummy variable with a subscript that will evaluate as needed

// BNF-like grammar of Expression, Term, Factor, Constant
// >? is the conditional assignment from LHS to RHS. The >>
// immediate assignment will not work because of back-up and retry
// which would push incorrect items to stack. The RHS of the >? is either
// the dummy[] list or working stack. The dummy stack reference uses the 
// calculation of its subscript to manipulate the working stack.
// As the grammar/pattern runs, the expression is evaluated.
E = (Expr( T ) + "+" + Expr( E )) >? dummy[addstack(); 1] | Expr( T );
T = (Expr( T ) + "*" + Expr( F )) >? dummy[mulstack(); 1] | Expr( F );
F = ("(" + Expr( E ) + ")") | Expr( C );
C = Pat Span( "0123456789" ) >? stack[N Items( stack ) + 1];

rc = Pat Match( text, Pat Pos( 0 ) + E + Pat R Pos( 0 ) );
If( rc,
    Show( rc, Num( stack[1] ), Eval( Parse( text ) ) ); // use JMP's parse to verify the answer
,
    Show( rc )
);
Craige

View solution in original post

Craige_Hales
Staff (Retired)

Re: regex recursion

A few minutes on wikipedia gives a set of rules without left recursion...that runs fast.

text = "(1+2)*(3+4+5)*6*7*8*21*(0+(1+(2*3))+4*(5+6)*7*8+(((12+13+14)*15+16+17)*18+19+20))*22"; // plus, star, paren, integer.  pattern does not handle spaces. 

stack = {}; // the "working" stack... holds the RPN numbers
// the following add and mul routines pop two numbers and push the result...
addstack = Function( {},
    {num1 = Num( Remove From( stack )[1] ), num2 = Num( Remove From( stack )[1] )},
    Insert Into( stack, Char( num1 + num2 ) )
);
mulstack = Function( {},
    {num1 = Num( Remove From( stack )[1] ), num2 = Num( Remove From( stack )[1] )},
    Insert Into( stack, Char( num1 * num2 ) )
);
dummy = {}; // dummy variable with a subscript that will evaluate as needed

// BNF-like grammar of Expression, Term, Factor, Constant
// >? is the conditional assignment from LHS to RHS. The >>
// immediate assignment will not work because of back-up and retry
// which would push incorrect items to stack. The RHS of the >? is either
// the dummy[] list or working stack. The dummy stack reference uses the 
// calculation of its subscript to manipulate the working stack.
// As the grammar/pattern runs, the expression is evaluated.
/*
E = (Expr( T ) + "+" + Expr( E )) >? dummy[addstack(); 1] | Expr( T );
T = (Expr( T ) + "*" + Expr( F )) >? dummy[mulstack(); 1] | Expr( F );
F = ("(" + Expr( E ) + ")") | Expr( C );
C = Pat Span( "0123456789" ) >? stack[N Items( stack ) + 1];
*/
// re-written without left recursion 
// https://en.wikipedia.org/wiki/Left_recursion#Removing_left_recursion
//
E = Expr( T ) + Expr( EP );
EP = ("+" + Expr( T ) >? dummy[addstack(); 1] + Expr( EP )) | "";
T = Expr( F ) + Expr( TP );
TP = ("*" + Expr( F ) >? dummy[mulstack(); 1] + Expr( TP )) | "";
F = ("(" + Expr( E ) + ")") | Expr( C );
C = Pat Span( "0123456789" ) >? stack[N Items( stack ) + 1];

rc = Pat Match( text, Pat Pos( 0 ) + E + Pat R Pos( 0 ), NULL, FULLSCAN );
If( rc,
    Show( rc, Num( stack[1] ), Eval( Parse( text ) ) ); // use JMP's parse to verify the answer
,
    Show( rc )
);

If you are following closely you are wondering how the first version runs at all, with left recursion. Why doesn't it loop forever, or at least until the recursion uses up available memory? By default the pattern matcher assumes that each expr(...) must match at least one character. At some point, the recursion is deep enough that there is not enough text left and the match backs up. All that back up and retry takes time on a long string. In the second example, without the left recursion, the patmatch() requires the FULLSCAN option to turn off the assumption because the expr() can now match "" (zero characters).

Craige