cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar

Extract number adjacent to String

Hi, can anybody help me to extract numbers adjacent to strings?

I have column1 shown below, and expect to extract numbers adjacent to "HFA" (column2).

 

I tried the following JSL, but could not get column2 that I expect

Num( Regex( :Column 1, "(\d\d\d)HFA | HFA(\d\d\d)", "\1" ) )

 

column1column2
400HFA isa300400
isa300 HFA400400

 

Thank you for considering the question.

1 ACCEPTED SOLUTION

Accepted Solutions
ngambles
Level III

Re: Extract number adjacent to String

Given the small example set, the following formula works:

Num( Regex( :column1, "(\d*)HFA(\d*)", "\1\2" ) )

This solution will run into problems if you have instances like "400HFA300".  It will return 400300.

View solution in original post

7 REPLIES 7
ngambles
Level III

Re: Extract number adjacent to String

Given the small example set, the following formula works:

Num( Regex( :column1, "(\d*)HFA(\d*)", "\1\2" ) )

This solution will run into problems if you have instances like "400HFA300".  It will return 400300.

Re: Extract number adjacent to String

Hi Ngambels,

 

Thank you very much for your help. It resolved the issue.

 

I did try 

"(\d*)HFA(\d*)"

but apparently I missed (I did not know I can and must use both \1 and \2)

"\1\2"

 

I learned new things from you and through this discussion.

 

Once again, thank you very much!

jthi
Super User

Re: Extract number adjacent to String

If for some reason there is a need to tackle the issue @ngambles mentioned, one fairly simple way would be to use Regex Match instead of Regex

Names Default To Here(1);
a = "400HFA isa300";
b = "isa300 HFA400";
c = "400HFA300";
d = "400HF300";

rgx_pattern = "(\d*)(?:HFA)(\d*)";
show(Regex Match(a, rgx_pattern));
show(Regex Match(b, rgx_pattern));
show(Regex Match(c, rgx_pattern));
show(Regex Match(d, rgx_pattern));

Regex Match will though require some extra checks as it returns a list of groups captured.

-Jarmo
Craige_Hales
Super User

Re: Extract number adjacent to String

Thanks @AprioriFish3134  for pointing out the original issue (no \2), and @ngambles  for a good solution.

 

If you need even more levels of nested parens, or the sequential parens in your example, just count the open-parens from left-to-right to discover the back reference number. The back reference for unused alternatives is an empty string.

 

Regex( "abac", "(((a)|(b)|(c))(c))+", "1=\1 2=\2 3=\3 4=\4 5=\5 6=\6" )

"1=ac 2=a 3=a 4= 5= 6=c"

 

One thing that might not be obvious: the back reference can also be used in the pattern. This could be useful for making sure a leading and trailing quotation mark or apostrophe are used consistently:

 

Write( Regex( 
	"before 'a \!"b\!" c' after", // before 'a "b" c' after
	"(\!"|')(.*?)\1", // ("|')(.*?)\1
	"\2" 
) )

a "b" c

(The \!" escape puts a quotation mark inside the strings; the comment shows the text of the string.) In this example, \1 group can match either " or ' and \2 matches all of the text up to another occurrence of \1.

 

Use some comments if you write regex like that! The next maintainer will appreciate it.

Craige
jthi
Super User

Re: Extract number adjacent to String

Good place to practice and test regex is https://regex101.com/ .

jthi_0-1656140400727.png

Those comments with JMP escape characters removed were very helpful, I have to start adding those when I write more complicated regex patterns (Regex issue with escaped characters )

-Jarmo

Re: Extract number adjacent to String

HI @jthi 

 

Thank you very much for your alternative solution to the issue, and also for the link on regex.

Re: Extract number adjacent to String

Hi @Craige_Hales ,

 

Thank you very much for enriching the discussion.

 

In fact I first learned about JMP's regex from your old article !
https://community.jmp.com/t5/Uncharted/Regex/ba-p/21008