jetpeach
Level II

Regex backreference followed by a number (unambiguous backreferences)

I'm scripting something to replace column names based on a regex match. I iterate through all columns and replace each regex match with a new string. This works well and is handy for shortening column names, except when the replacement string starts with a number, because the backreference gets messed up (\1 becomes \133 in the case below).

In Python, there is an 'unambiguous backreference' to work around this. Is there an equivalent in JSL? I read somewhere that it may use Perl-style backreferences, but I couldn't find where this is documented.

Thanks,

Peach

 

colName = "COLUMNNAMETEST LONGNAMESHORTEN2339E0HS111 KEEPTHIS";
Rename_regx = "LONGNAMESHORTEN2339E0HS111";
Rename_to = "39E0H";

newname=Regex( colName,
	"(.*?)(" || Rename_regx || ")(.*)", // match
	"\1" || Rename_to || "\3") // replace match with

//	"\g<1>" || Rename_to || "\g<3>") // python-like unambiguous form doesn't work
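
For comparison, here is a minimal Python sketch of the unambiguous form mentioned above. It only illustrates Python's `re` module, not JSL. The variable names mirror the script above, and the column name is an assumed value chosen so that the pattern actually matches.

```python
import re

# Assumed sample values, mirroring the JSL script above.
col_name = "COLUMNNAMETEST LONGNAMESHORTEN2339E0HS111 KEEPTHIS"
rename_regx = "LONGNAMESHORTEN2339E0HS111"
rename_to = "39E0H"

pattern = "(.*?)(" + rename_regx + ")(.*)"

# Ambiguous: "\1" directly followed by the digits of rename_to is not read
# as "group 1, then text", so this either fails or gives the wrong result.
try:
    ambiguous = re.sub(pattern, "\\1" + rename_to + "\\3", col_name)
except re.error as err:
    ambiguous = None
    print("ambiguous form failed:", err)

# Unambiguous: \g<1> ends the group number explicitly.
new_name = re.sub(pattern, "\\g<1>" + rename_to + "\\g<3>", col_name)
print(new_name)
```

The `\g<1>` syntax is specific to Python's replacement templates, which is why it has no effect when pasted into a JSL replacement string.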
Craige_Hales
Super User

Re: Regex backreference followed by a number (unambiguous backreferences)

colName = "COLUMNNAMETEST LONG\1NAMESHORTEN2339E0HS111 KEEPTHIS";
Rename_regx = substitute("LONG\1NAMESHORTEN2339E0HS111","\","\\");
Rename_to = "39E \1 0H";

// \Q starts a sequence that ignores escapes
newname=Regex( colName,
	"(.*?)(" || Rename_regx || ")(.*)", // match
	"\1\Q" || Rename_to || "\E\3"); // replace match with
show(newname); // newname = "COLUMNNAMETEST 39E \1 0H KEEPTHIS";
// notice: 
//   \Q begins "quoting" and \E ends it
//   the \Q has a side effect of ending the \1 (which is what you asked)
//   the \1 in the replacement is NOT expanded inside the \Q...\E
//   the substitute() at the very top doubles every backslash in the search
//   text so the pattern treats it literally; an unescaped backslash there
//   *could* be an issue

// what else? upper and lower case:
// \L begin lower casing 
// \U begin upper casing
// \E ends *all* \L, \U, or \Q
show(regex("aBcDEf","(..)(..)(..)","\1\L \2 \U\3"));//"aB cd EF"
// \l (lower case L) lower case the next character
// \u upper case the next character
show(regex("ABcdEF","(..)(..)(..)","\1 \u\2 \l\3"));//"AB Cd eF"
// the \Q, \L, and \U modes share a common \E (end)
// the \u and \l are not persistent and only apply to the next character.
// upper/lower casing is probably very ASCII specific, no Unicode magic.

// this is poorly documented here: https://www.jmp.com/support/help/en/17.2/index.shtml#page/jmp/escaped-characters-in-regular-expressions.shtml
// somehow all the escapes have been lumped together in the worst possible way.

Edit: there are two parameters to the regex() function that use escapes. Parameter 2 is the pattern and uses escapes the way most people expect; the \Q \U \L \E \u \l escapes do not belong there. Parameter 3 is the replacement, and the \Q \U \L \E \u \l escapes can be used there. Backreferences \1 \2 \3 ... can be used in both parameter 2 and parameter 3.

Craige
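
For readers comparing with Python, the same two ideas (escape the search text for the pattern, and keep the replacement text literal) can be sketched as below. This is an illustration under assumptions, not a JSL equivalent: Python replacement templates have no `\Q...\E`, so the sketch swaps in `re.escape` for the pattern and a function replacement for the literal text. The sample values mirror the reply above.

```python
import re

# Assumed sample values, mirroring the JSL reply above.
col_name = r"COLUMNNAMETEST LONG\1NAMESHORTEN2339E0HS111 KEEPTHIS"
rename_regx = r"LONG\1NAMESHORTEN2339E0HS111"
rename_to = r"39E \1 0H"

# re.escape makes the search text literal in the pattern, similar in
# purpose to doubling the backslashes with substitute() in the JSL.
pattern = "(.*?)(" + re.escape(rename_regx) + ")(.*)"

# The return value of a function replacement is inserted verbatim, so the
# "\1" inside rename_to is not expanded as a backreference.
new_name = re.sub(
    pattern,
    lambda m: m.group(1) + rename_to + m.group(3),
    col_name,
)
print(new_name)
```

Building the result from `m.group(1)` and `m.group(3)` also sidesteps the digit ambiguity, since no backreference is ever written next to the replacement text.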