uday_guntupalli
Level VIII

Regex help

All,

I am trying to get more comfortable with regular expressions. I can achieve what I want with a simple Contains(), but I would like to see what the equivalent would be using a regular expression.

The regex I have doesn't work the way I expect it to. Can somebody point me to the equivalent regex and explain what I have gotten wrong?

Clear Log(); Clear Globals();
dt = Open( "$SAMPLE_DATA/Cities.jmp" );
dt:city[2] = "aLBUQUERQUE";
CityList = dt:City << Get Values;
Des = List();

For( i = 1, i <= N Items( CityList ), i++,
	If( !Is Missing( Regex( Char( CityList[i] ), "^a|A[LB]*" ) ),
		Insert Into( Des, CityList[i] );
	);
);

/*For( i = 1, i <= N Items( CityList ), i++,
	If( Contains( CityList[i], "aLB" ) | Contains( CityList[i], "ALB" ),
		Insert Into( Des, CityList[i] );
	);
);*/

Show( Des );
//Close All( Data Tables, "No Save" );
Best
Uday
Accepted Solutions
ih
Super User (Alumni)

Re: Regex help

Right now you are matching either an 'a' at the beginning of the string ('^a') or an 'A' followed by zero or more letters 'L' or 'B' ('A[LB]*').  Some notes:

  1. There are no parentheses around the alternation, so the '|' splits the whole pattern: the match is either everything to the left of the '|' or everything to the right.  If the string starts with 'a', the left side matches and the match stops there.
  2. The square brackets mean match any one of the characters inside them.
  3. The asterisk means match zero or more of the preceding character or group; in this case that is the group inside the square brackets.
  4. Together, the right side of the alternation will match any of these, anywhere inside your string (not just at the beginning):
    1. A
    2. AL
    3. AB
    4. ALB
    5. ABL
    6. ALLLLLLLLLLBLLBBBBLLBLLB
  5. The string aA would actually contain two separate matches: the 'a' (left side) and the 'A' (right side).
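
To see these notes in action, here is a minimal JSL sketch that runs the original pattern against a few strings. The test strings are made up for illustration (they are not read from Cities.jmp), and the comments describe what I would expect Regex() to return given the rules above.

```jsl
// Hypothetical test strings, chosen to exercise each side of the '|'
tests = {"ALBANY", "aLBUQUERQUE", "SANTA FE", "Boston"};

For( i = 1, i <= N Items( tests ), i++,
	// Regex() returns the matched text, or missing if nothing matches
	m = Regex( tests[i], "^a|A[LB]*" );
	Show( tests[i], m );
);

// Expected, following the notes above:
//   "ALBANY"      -> "ALB"   (right side: A followed by L and B)
//   "aLBUQUERQUE" -> "a"     (left side: 'a' at the start, then done)
//   "SANTA FE"    -> "A"     (right side is not anchored, so the A in the middle matches)
//   "Boston"      -> missing (no leading 'a' and no uppercase 'A')
```

The "SANTA FE" case is why the original loop collects far more cities than intended: any name containing an uppercase A passes the Is Missing() test.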

You probably want this: "^(a|A)LB.*$", or this: "^[aA]LB.*$"

  1. Starting at the beginning of the string, match an a or A, followed by
  2. The characters LB, in order, followed by
  3. Zero or more of any character (.*), where the period stands for any character, followed by
  4. The end of the string.  This is probably not necessary: if you only test whether a match exists, "^[aA]LB" is enough.
Des = {"ALBANY", "aLBUQUERQUE"};
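
For completeness, a sketch of the original loop with only the pattern swapped for the shorter form "^[aA]LB". It assumes the same Cities.jmp sample table and the same edit to row 2 as in the question; since the trailing .*$ only consumes the rest of the string, this should produce the same Des as shown above.

```jsl
dt = Open( "$SAMPLE_DATA/Cities.jmp" );
dt:city[2] = "aLBUQUERQUE";
CityList = dt:City << Get Values;
Des = {};

For( i = 1, i <= N Items( CityList ), i++,
	// Anchored at the start: 'a' or 'A', then the literal characters LB
	If( !Is Missing( Regex( CityList[i], "^[aA]LB" ) ),
		Insert Into( Des, CityList[i] )
	)
);

Show( Des );
```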

 

Check out regexr.com. It not only helps check your code but also has good 'reference' and 'cheatsheet' sections on the left.  I almost always turn multiline on (flags in the upper right).

 

Edited to clarify that match can be anywhere inside the string.
