Shlomizr

Occasional Contributor

Joined:

Sep 13, 2018

how to extract string between specific pattern using regexp?


Hi,
I am new to regex. I need to extract users' names from their IDs. For example, I have the following long string:
123450_dan; 154396_kelli; 198756_dev;
The result needs to be: dan
What regex do I need to extract the name dan, which has the ID 123450?

Thanks in advance.
9 REPLIES
markbailey

Staff

Joined:

Jun 23, 2011

Solution

Re: how to extract string between specific pattern using regexp?

You did not specify much detail about the ID, so here are a few examples to get started. I used a JMP script, but this extraction could be used in a column formula, too.

 

Names Default to Here( 1 );

// test string
string = "123450_dan; 154396_kelli; 198756_dev;";

// assume specific ID
name = Regex( string, "123450_(\w+\b)", "\1" );
Show( name );

// assume first ID with 6 digits
name = Regex( string, "\d{6}_(\w+\b)", "\1" );
Show( name );

// assume first ID with any number of digits
name = Regex( string, "\d+_(\w+\b)", "\1" );
Show( name );
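
A minimal sketch of what happens when the ID is not in the string, assuming (as the JMP documentation describes) that Regex() returns a missing value when nothing matches. The ID 999999 is made up for illustration and does not occur in the test string.

```jsl
Names Default To Here( 1 );

// test string
string = "123450_dan; 154396_kelli; 198756_dev;";

// 999999 is a hypothetical ID that is not present in the string
name = Regex( string, "999999_(\w+\b)", "\1" );

// guard against the no-match case before using the result
If( Is Missing( name ),
	Show( "ID not found" ),
	Show( name )
);
```

Checking for a missing result this way keeps a column formula or a longer script from silently carrying an empty name forward.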
Learn it once, use it forever!
Shlomizr

Occasional Contributor

Joined:

Sep 13, 2018

Re: how to extract string between specific pattern using regexp?

Thanks Mark,
Works perfectly!
Shlomizr

Occasional Contributor

Joined:

Sep 13, 2018

Re: how to extract string between specific pattern using regexp?

Hi Mark,
Just one question: what do I need to add if there is a space in the user name?

For example: "123450_dan fox; 154396_kelli; 198756_dev;"
I tried the following regex:

string = "123450_dan fox; 154396_kelli; 198756_dev;";

name = Regex( string, "123450_(\w+ | \s+)", "\1" );
Show( name );

Result: "dan" instead of "dan fox"

Thanks in advance.
markbailey

Staff

Joined:

Jun 23, 2011

Solution

Re: how to extract string between specific pattern using regexp?

Use this form instead:

 

name = Regex( string, "123450_([a-zA-Z ]+);", "\1" );
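
Some background on why this form helps: in the attempted pattern, the spaces around | are most likely treated as literal characters, so the group stops after the first word. A character class that includes the space avoids that. As a hedged alternative (not from the original reply), the sketch below captures everything up to the next semicolon with [^;]+, which also allows digits or punctuation in a name. It assumes every entry ends with a semicolon, as in the sample string.

```jsl
Names Default To Here( 1 );

// test string with multi-word names
string = "123450_dan fox; 154396_kelli ann marie; 198756_dev;";

// [^;]+ means "one or more characters that are not a semicolon"
name1 = Regex( string, "123450_([^;]+);", "\1" );
name2 = Regex( string, "154396_([^;]+);", "\1" );

Show( name1, name2 );
```

With the sample string this should give "dan fox" and "kelli ann marie".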
Shlomizr

Occasional Contributor

Joined:

Sep 13, 2018

Re: how to extract string between specific pattern using regexp?

Thanks, works perfectly.
markbailey

Staff

Joined:

Jun 23, 2011

Re: how to extract string between specific pattern using regexp?

Assuming that you want the name associated with a specific ID in the pattern "id_name;", this version will work, too.

 

Names Default to Here( 1 );

// test string
string = "123450_dan; 154396_kelli; 198756_dev;";

// target ID
id = "123450";

// assume specific ID
name = Regex( string, id || "_(\w+);", "\1" );
Show( name );
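
If the lookup is needed for several IDs, the concatenation above can be wrapped in a small function. This is only a sketch: getName is a hypothetical helper name, not a built-in JMP function, and the leading \b is an added assumption so that a shorter ID cannot match the tail of a longer one.

```jsl
Names Default To Here( 1 );

// getName is a hypothetical helper, not part of JMP
getName = Function( {str, id},
	{Default Local},
	// \b requires the ID to start at a word boundary
	Regex( str, "\b" || id || "_(\w+);", "\1" )
);

// test string
string = "123450_dan; 154396_kelli; 198756_dev;";

Show( getName( string, "123450" ), getName( string, "198756" ) );
```

The ID is passed as a string here because || concatenates character values.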
gzmorgan0

Community Trekker

Joined:

Jul 25, 2016

Re: how to extract string between specific pattern using regexp?

Just for fun, and because regular expressions can be tough for others to read (or for you, two weeks later), here is an alternative using JMP functions.

 

Names Default to Here( 1 );

// test string
string = "123450_dan; 154396_kelli; 198756_dev;";

// create a list of id_name items
id = Words(string, ";");

// create a list of IDs, a list of names, and an associative array (keyed list)
idlist={};
namelist={};
lookup = [=>""];  // associative array that returns an empty string if the ID is not valid

for(i=1, i<=nitems(id), i++,
   InsertInto(idlist, num(word(1,Trim(id[i]),"_")) );
   InsertInto(namelist, Titlecase(word(2,Trim(id[i]),"_")) ); // Titlecase is optional
   lookup[idlist[i]]=namelist[i]  // IDs are the keys and names are the values
);
show(id, idlist, namelist, lookup, lookup[198756], lookup[128954]); // the last ID is not valid

 

gzmorgan0

Community Trekker

Joined:

Jul 25, 2016

Solution

Re: how to extract string between specific pattern using regexp?

The script that I provided previously, using JMP functions, would not require any modifications. The regular expression to capture one or more "names" after the underscore would be (\w+\s*)+. The parentheses group the pattern: \w+ matches one or more word characters, \s* matches zero or more whitespace characters, and the trailing plus repeats the group one or more times.

 

Names Default to Here(1);
string="123450_dan fox; 154396_kelli ann marie; 198756_dev;";

name1 = Regex( string, "123450_((\w+\s*)+\b)", "\1" );
name2 = Regex( string, "154396_((\w+\s*)+\b)", "\1" );

Show( name1, name2 ); 
/*: name1 = "dan fox"; name2 = "kelli ann marie"; */
Shlomizr

Occasional Contributor

Joined:

Sep 13, 2018

Re: how to extract string between specific pattern using regexp?

Thanks, works perfectly!