swu2
Level III

regex function to match letter and letter with one digit that doesn't have space in between them

Hello all,

I have a string "9 035 Assay Activity Testing For Characterization STD4 1 1", and I want to match only "Assay Activity Testing For Characterization STD4".

I tried Regex( :Sample Name, "\d", "", GLOBALREPLACE ); but it also dropped the digit from STD4. Any suggestions?

 

Thanks

2 ACCEPTED SOLUTIONS

Re: regex function to match letter and letter with one digital that doesn't have space in between th

Instead of doing a global replace, you could get a match for the part of the string you want to keep.

This RegEx works for the string you gave, but it may not work if your other strings are formatted too differently.

 

Trim(
	Regex(
		"9 035 Assay Activity Testing For Characterization STD4 1 1",
		"([a-z]+\d?\s)+",
		"\0",
		IGNORECASE
	)
);

Log:

"Assay Activity Testing For Characterization STD4"

 

Justin

Craige_Hales
Super User

Re: regex function to match letter and letter with one digital that doesn't have space in between th

Justin's example might be perfect, but without knowing the rule you are using, it is hard to tell. The rule might be to drop all leading digits and spaces, or it might be to start at "Assay". On the other end, the rule might be to drop the two final space-delimited numbers, or to drop all but the last digit, or to drop everything after the last digit connected to a word. These are just some of the possible rules. Once you know the rule, writing a regex that follows it becomes easier.
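To make the point concrete, here is a minimal sketch of one possible pair of rules: "drop all leading digits and spaces" followed by "drop every trailing space-delimited number". The two patterns and the variable names are illustrative assumptions; they are not necessarily the rule that fits your data.

```jsl
str = "9 035 Assay Activity Testing For Characterization STD4 1 1";

// Assumed rule 1: remove any run of digits and spaces at the start of the string.
noLead = Regex( str, "^[\d\s]+", "", GLOBALREPLACE );

// Assumed rule 2: remove one or more space-delimited numbers at the end of the string.
// "STD4" survives because its digit is attached to letters, not preceded by a space.
noTrail = Regex( noLead, "(\s+\d+)+$", "", GLOBALREPLACE );

Show( noTrail );
```

For the sample string this should leave "Assay Activity Testing For Characterization STD4". A different rule (for example, "start at Assay") would need a different pattern.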

If you are puzzling over Justin's example, read it like this:

( ... )+  means match what is inside the parens one or more times

[a-z]+  means match one or more letters

\d?   means match one or zero digits 

\s   means match a whitespace character (a space, in this string)

Each 'word' matched by the parens includes an optional trailing digit and a required trailing space, even the last word. Justin used the trim function to remove the final space from the result. If your data might not have a space after the last word, the last word won't be included.
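The trailing-space caveat can be seen with a shortened string that ends right at the last word. This is only a sketch: the sample string and the `(\s|$)` variant are assumptions added for illustration and are not part of Justin's answer.

```jsl
// Hypothetical sample: the last word "STD4" is not followed by a space.
str = "9 035 Assay Activity STD4";

// Original pattern: every word must be followed by whitespace, so "STD4" is left out.
withSpace = Trim( Regex( str, "([a-z]+\d?\s)+", "\0", IGNORECASE ) );

// Assumed variant: allow either whitespace or the end of the string after each word.
spaceOrEnd = Trim( Regex( str, "([a-z]+\d?(\s|$))+", "\0", IGNORECASE ) );

Show( withSpace, spaceOrEnd );
```

Under these assumptions, the first result should be "Assay Activity" and the second "Assay Activity STD4".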

Craige

swu2
Level III

Re: regex function to match letter and letter with one digital that doesn't have space in between th

Thank you so much for the help; the explanation of the rules helped. I figured out that I need to trim every free-standing number, and the following works perfectly.

 

"(\D+(\w+\w))+"