Hari
Level III

Regex help -

Hello, I need some help using the Regex function.

The input string looks like this:

"Fine.2020.Optional.(Not)"

 

Required output: "Fine_2020_Optional"

 

I have tried a couple of combinations, but I have not been able to get the result in a single regex. I am using this kind of Regex call inside a for loop in my script.

 

Any help?

 

Thanks.

 

 

1 ACCEPTED SOLUTION


Re: Regex help -

It would probably help if we had more examples of input and expected output and where you are getting hung up. The JSL below works for your single case, but I'm not sure if it's applicable to all of your data in general.

 

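// Capture the three period-separated parts, skip the trailing "(...)" group,
// and join the captured parts with underscores.
Regex("Fine.2020.Optional.(Not)",
	"(\w+?)\.(\d+?)\.(\w+?)\.\(\w+?\)", // \. \( \) match a literal period and parentheses
	"\1_\2_\3" // \1, \2, \3 refer to the three captured groups
);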


Hari
Level III

Re: Regex help -

Thanks Paul,

There were a few simpler ones that I was able to do myself, but this was the one that gave me some trouble. Next time I run into an issue, I will get back to you with more details. Do you have a reference or link on Regex with good examples to share with us? That would be very helpful.

Thanks again.

Re: Regex help -

This website has some great general information about Regex:

 

https://www.regular-expressions.info/quickstart.html

 

There is also a tool for generating a regex based on your inputs and expected outputs:

 

http://regex.inginf.units.it/

Hari
Level III

Re: Regex help -

This is really very helpful. Thanks.
Byron_JMP
Staff

Re: Regex help -

Regex( "this.is.jmp", "\.", "_", GLOBALREPLACE );

JMP Systems Engineer, Pharm and BioPharm Sciences
Byron_JMP
Staff

Re: Regex help -

OK, so offline I got a couple of comments about the very short solution.

Here's how to get there on your own. It might be cheating, but it's so much easier.

 

I made a table with the example in it, selected the column, and used Recode.

In Recode I used Replace String and turned on regular expressions.

In the Recode dialog I turned on both scripting options (bottom right).

Then I saved the script to the script window, copied the relevant bit, and pasted it here (after verifying that it worked).

 

Huge thanks to @ErnestPasour  : ) 

 

Screen Shot 2020-02-14 at 9.19.49 AM.png

 

Screen Shot 2020-02-14 at 9.20.09 AM.png

 

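Oh, and there is that little bit about using "\." rather than "." alone. In a regex, an unescaped "." matches any single character, so it would replace everything with "_". The backslash is an escape character that makes the pattern match only a literal period. That's the only bit of regex knowledge needed to make this work.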
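To see the difference for yourself, here is a minimal sketch that runs both patterns on the same string. The variable name str is just for illustration.

```jsl
str = "this.is.jmp";

// Unescaped: "." matches any single character,
// so every character in the string gets replaced.
Show( Regex( str, ".", "_", GLOBALREPLACE ) );

// Escaped: "\." matches only a literal period.
Show( Regex( str, "\.", "_", GLOBALREPLACE ) );
```

The second call is the one from the short solution above; the first is the mistake the backslash protects against.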

JMP Systems Engineer, Pharm and BioPharm Sciences
Jeff_Perkinson
Community Manager

Re: Regex help -

Don't forget the built-in JMP functions for doing string manipulation, like Substitute(), Munger() and, my favorite, Word(). The syntax for those might be easier to deal with for something like this. 

 

str="Fine.2020.Optional.(Not)";

show(substitute(str, ".", "_"));

show(word(1, str, ".") || "_" || word(2, str, ".") || "_" || word(3, str, "."));

show(concat items(words(str, ".")[1::3], "_"));

Log:

Substitute(str, ".", "_") = "Fine_2020_Optional_(Not)";
Word(1, str, ".") || "_" || Word(2, str, ".") || "_" || Word(3, str, ".") = "Fine_2020_Optional";
Concat Items(Words(str, ".")[Index(1, 3)], "_") = "Fine_2020_Optional";

 

-Jeff
Byron_JMP
Staff

Re: Regex help -

Pattern matching could also be a good solution, but I keep getting tripped up on the syntax. If only there were someone around who was fluent in SNOBOL...  @Craige_Hales 

JMP Systems Engineer, Pharm and BioPharm Sciences
Craige_Hales
Staff (Retired)

Re: Regex help -

Hi @Byron_JMP !

I might do it like this:

source = "(aaa).Fine.2020.Optional.(Not).more.(leave out).last.(bbb)";
//
// remove text between parens, change . to _
//
parts = {}; // accumulator list of the individual parts
ipart = 0; // index to the parts list
rc = Pat Match( // test the rc (below) to make sure the match worked
	source,
	patpos(0) // PatRepeat must start at the beginning.
	+
	Pat Repeat( 
		( "." | Pat Pos( 0 ) ) // advance past a required period, or begin of string
		+ // followed by either...
		(
			( "(" + Pat Break( ")" ) + ")" ) 
			// above: match open paren and stop on a close paren. Parens and everything
			// between is rejected by not adding anything to the list of parts.
		| // or
			( ( Pat Break( "." ) | Pat Rem() ) >> parts[ipart += 1] ) 
			// above: scan forward up to, but not including, a period, or if
			// that doesn't work,  to the end (remainder) of the string. 
			// ipart += 1 adds 1 before subscripting. It is OK to add an item
			// to a list just beyond the end of the list. The >> operator stores
			// the matched value on its left into the variable on its right.
		)
	)
	+
	patrpos(0) // PatRepeat must stop at the end.
);
// re-join the parts with the _ separator
result = if(rc,Concat Items( parts, "_" ),"problem with match");
show(result);

result = "Fine_2020_Optional_more_last";

It's pretty long, and pretty opaque (not as opaque as regex, maybe). But...

 

  • Regex() is really targeted at making multiple replacements; GLOBALREPLACE is done efficiently. You'll probably need a second regex to remove (text), also using GLOBALREPLACE if there can be more than one.
  • Word(), in Jeff's example, is also good if you know how many parts to keep and where the (text) appears. You can't beat it for simplicity and maintainability, and maybe speed as well.
  • PatMatch() shines elsewhere; it is really for extracting information, or at most munging a string in a single location (there is no GLOBALREPLACE). The example shows PatMatch() extracting information to make a list of parts, then reassembling the parts.
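For the Regex() route in the first bullet, a two-step version might look like the sketch below. It uses the string from the original question and assumes every parenthesized group is preceded by a period, which is not true of the longer source string above (it starts with "(aaa)"). The names str, noParens, and result are only for illustration.

```jsl
str = "Fine.2020.Optional.(Not)";

// Step 1: remove each "(text)" group together with the period in front of it.
// [^)]* matches everything up to the closing parenthesis.
noParens = Regex( str, "\.\([^)]*\)", "", GLOBALREPLACE );

// Step 2: change the remaining periods to underscores.
result = Regex( noParens, "\.", "_", GLOBALREPLACE );

Show( result );
```

Both steps use GLOBALREPLACE so that the same two calls still apply when there is more than one (text) group or more than a few periods.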

 

On the other hand, this is a great introduction on how to use pattern matching to extract information from a string. The real benefit comes from longer strings that can be parsed in a single match (as above, on a short string) because the match only has startup overhead once. By making the PatRepeat walk the string in one pass, and by not modifying the string, the match can be pretty fast.
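If the long example is too much to take in at once, a much smaller match may help as a starting point. This sketch only uses pieces that already appear in the example above (Pat Pos, Pat Break, and the >> operator) to pull out the text before the first period. The variable names str, first, and rc are only for illustration.

```jsl
str = "Fine.2020.Optional.(Not)";
first = ""; // will receive the captured text

rc = Pat Match(
	str,
	Pat Pos( 0 ) // anchor the match at the start of the string
	+
	( Pat Break( "." ) >> first ) // scan up to, but not including, the first period
);

// rc tells you whether the match worked; check it before trusting first.
Show( rc, first );
```

If the match succeeds, first should hold the part before the first period, "Fine" for this string.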

Craige