HSS
Level IV

Regex help -

Hello, I need some help using the Regex function.

The string looks like this:

"Fine.2020.Optional.(Not)"

The required result is:

"Fine_2020_Optional"

I have tried a couple of combinations, but I could not get the result with a single regex. I am using this kind of Regex call inside a For loop in my script.

Any help?

Thanks.

Craige_Hales
Super User

Re: Regex help -

Hi @Byron_JMP!

I might do it like this:

source = "(aaa).Fine.2020.Optional.(Not).more.(leave out).last.(bbb)";
//
// remove text between parens, change . to _
//
parts = {}; // accumulator list of the individual parts
ipart = 0; // index to the parts list
rc = Pat Match( // test the rc (below) to make sure the match worked
	source,
	Pat Pos( 0 ) // PatRepeat must start at the beginning.
	+
	Pat Repeat( 
		( "." | Pat Pos( 0 ) ) // advance past a required period, or begin of string
		+ // followed by either...
		(
			( "(" + Pat Break( ")" ) + ")" ) 
			// above: match open paren and stop on a close paren. Parens and everything
			// between is rejected by not adding anything to the list of parts.
		| // or
			( ( Pat Break( "." ) | Pat Rem() ) >> parts[ipart += 1] ) 
			// above: scan forward up to, but not including, a period, or if
			// that doesn't work,  to the end (remainder) of the string. 
			// ipart += 1 adds 1 before subscripting. It is OK to add an item
			// to a list just beyond the end of the list. The >> operator stores
			// the matched value on its left into the variable on its right.
		)
	)
	+
	Pat R Pos( 0 ) // PatRepeat must stop at the end.
);
// re-join the parts with the _ separator
result = If( rc, Concat Items( parts, "_" ), "problem with match" );
Show( result );

result = "Fine_2020_Optional_more_last";

It's pretty long, and pretty opaque (not as opaque as regex, maybe). But...


  • Regex() is really targeted at making multiple replacements; GLOBALREPLACE is done efficiently. You'll probably need a second regex to remove (text), also using GLOBALREPLACE if there can be more than one.
  • Word(), as used in Jeff's example, is also good if you know how many words to keep and where the (text) appears. You can't beat it for simplicity and maintainability, and maybe for speed as well.
  • PatMatch() shines elsewhere; it is really for extracting information, or at most munging a string in a single location (there is no GLOBALREPLACE). The example shows PatMatch() extracting information to make a list of parts, then reassembling the parts.
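To compare the options in the bullets above, here is a minimal sketch of the two-regex route (first remove the (text) groups, then replace the periods) next to a split-and-filter route. It is written in Python with the standard re module rather than JSL, so the function names and the exact pattern are illustrative assumptions; the same patterns should carry over to JSL's Regex() with GLOBALREPLACE, but that has not been checked in JMP. The split version is only loosely analogous to a Word() approach and assumes no group contains a period.

```python
import re

# Hypothetical helper names; Python's re module stands in for JSL's Regex().
# Alternative 1: a group at the very start of the string, plus the period after it.
# Alternative 2: a period followed by a group (the period is removed too).
GROUP_PATTERN = re.compile(r"^\([^)]*\)\.?|\.\([^)]*\)")


def with_two_regexes(source: str) -> str:
    """Two global replacements, mirroring the 'second regex' idea."""
    # Pass 1: delete "(text)" groups that sit at the start of the string or
    # right after a period, together with that one adjacent period.
    without_groups = GROUP_PATTERN.sub("", source)
    # Pass 2: turn every remaining period into an underscore.
    return re.sub(r"\.", "_", without_groups)


def with_split(source: str) -> str:
    """Split on '.', drop fields that are a whole "(text)" group, re-join.

    Loosely similar to a Word()-style approach; it assumes no group
    contains a period of its own.
    """
    kept = [
        field
        for field in source.split(".")
        if not (field.startswith("(") and field.endswith(")"))
    ]
    return "_".join(kept)


if __name__ == "__main__":
    for text in ["Fine.2020.Optional.(Not)",
                 "(aaa).Fine.2020.Optional.(Not).more.(leave out).last.(bbb)"]:
        print(with_two_regexes(text), with_split(text))
    # prints:
    # Fine_2020_Optional Fine_2020_Optional
    # Fine_2020_Optional_more_last Fine_2020_Optional_more_last
```

The regex version also copes with a period inside a group, such as "A.(x.y).B", because [^)]* only stops at the closing paren; the split version would cut that group apart.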


On the other hand, the Pat Match() script above is a great introduction to using pattern matching to extract information from a string. The real benefit comes with longer strings that can be parsed in a single match (as above, on a short string), because the match pays its startup overhead only once. By making the PatRepeat walk the string in one pass, and by not modifying the string, the match can be pretty fast.
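The single-pass idea can be sketched outside JMP as well. The Python function below (split_skipping_parens is a hypothetical name) walks the string with one forward-moving cursor, skips fields that are a complete (text) group, keeps the rest, and leaves the re-joining to "_".join(). It is a simplified analogue of the PatRepeat in the JSL script, not a line-for-line translation of Pat Match(), which can backtrack between its alternatives; the fallback rule for a "(" that does not form a clean group is only an approximation of that.

```python
from typing import List


def split_skipping_parens(source: str) -> List[str]:
    """Walk `source` once with a single forward-moving cursor.

    Each field starts at the beginning of the string or just after a '.'.
    A field that is a complete "(...)" group followed by '.' or the end of
    the string is skipped; any other field is kept, up to the next '.'.
    The string itself is never modified.
    """
    parts: List[str] = []
    n = len(source)
    pos = 0
    while True:
        skipped_group = False
        if source.startswith("(", pos):
            close = source.find(")", pos + 1)
            # Only treat it as a group when the ')' ends the field; otherwise
            # fall back to ordinary text (a rough stand-in for the JSL
            # alternation falling through to its second branch).
            if close != -1 and (close + 1 == n or source[close + 1] == "."):
                pos = close + 1
                skipped_group = True
        if not skipped_group:
            end = source.find(".", pos)
            if end == -1:
                end = n  # like Pat Rem(): take the rest of the string
            parts.append(source[pos:end])
            pos = end
        if pos == n:
            return parts
        pos += 1  # source[pos] is '.', so step over the separator


source = "(aaa).Fine.2020.Optional.(Not).more.(leave out).last.(bbb)"
print("_".join(split_skipping_parens(source)))  # Fine_2020_Optional_more_last
```

As in the JSL version, collecting parts into a list and joining once at the end avoids rewriting the string at every step.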

Craige
Byron_JMP
Staff

Re: Regex help -

@Craige_Hales Kudos x 10^6

Thank you

JMP Systems Engineer, Health and Life Sciences (Pharma)