Solved: how to extract string between specific pattern using regexp?

Shlomizr · Sep 14, 2018 05:17 AM

Hi,
I am new to regex and need to extract users' names from their IDs. For example, I have the following long string:
123450_dan; 154396_kelli; 198756_dev;
The result needs to be: dan
What regex do I need to use to extract the name dan, which has the ID 123450?

Thanks in advance.

Mark_Bailey · Sep 14, 2018 09:22 AM

You did not specify much detail about the ID, so here are a few examples to get started. I used a JMP script, but this extraction could be used in a column formula, too.

Names Default to Here( 1 );

// test string
string = "123450_dan; 154396_kelli; 198756_dev;";

// assume specific ID
name = Regex( string, "123450_(\w+\b)", "\1" );
Show( name );

// assume first ID with 6 digits
name = Regex( string, "\d{6}_(\w+\b)", "\1" );
Show( name );

// assume first ID with any number of digits
name = Regex( string, "\d+_(\w+\b)", "\1" );
Show( name );

Shlomizr · Sep 14, 2018 09:56 AM

Thanks Mark,
Works perfectly!

Shlomizr · Sep 16, 2018 02:55 AM

Hi Mark,
Just one question: what do I need to add if there is a space in the user name?

For example: "123450_dan fox; 154396_kelli; 198756_dev;"
I tried the following regex:

string = "123450_dan fox; 154396_kelli; 198756_dev;";

name = Regex( string, "123450_(\w+ | \s+)", "\1" );
Show( name );

Result: "dan" instead of "dan fox"

Thanks in advance.

Mark_Bailey · Sep 16, 2018 11:53 AM

Use this form instead:

name = Regex( string, "123450_([a-zA-Z ]+);", "\1" );
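
A minimal sketch of why this form handles the space, assuming names contain only letters and spaces as in the example string. The character class [a-zA-Z ] accepts a space as part of the name, and the trailing ; marks where the capture stops. The second pattern, [^;]+, is an assumption that is not from the thread: it captures everything up to the next semicolon, which may help if names can contain digits, hyphens, or apostrophes.

```jsl
Names Default To Here( 1 );

// test string with a space inside the first name
string = "123450_dan fox; 154396_kelli; 198756_dev;";

// letters and spaces only, stopping at the ";" that ends the entry
name = Regex( string, "123450_([a-zA-Z ]+);", "\1" );
Show( name );   // expected: "dan fox"

// broader variant (assumption, not from the thread): anything that is not ";"
name2 = Regex( string, "123450_([^;]+);", "\1" );
Show( name2 );  // expected: "dan fox"
```

Which variant fits depends on what characters real names may contain; the stricter class rejects unexpected input, while the broader one accepts it.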

Shlomizr · Sep 17, 2018 02:32 AM

Thanks, works perfectly.

Mark_Bailey · Sep 14, 2018 09:27 AM

Assuming that you want the name associated with a specific ID in the pattern "id_name;", then this version will work, too.

Names Default to Here( 1 );

// test string
string = "123450_dan; 154396_kelli; 198756_dev;";

// target ID
id = "123450";

// assume specific ID
name = Regex( string, id || "_(\w+);", "\1" );
Show( name );
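
The same idea can be wrapped in a small lookup function. This is only a sketch: getName is a hypothetical helper that is not from the thread, and it relies on two assumptions. First, Regex() returns a missing value when nothing matches, which is my understanding of the documented behavior. Second, a leading \b is added so that a partial ID such as "23450" does not match inside "123450".

```jsl
Names Default To Here( 1 );

// hypothetical helper: returns the name stored for an ID, or "" if the ID is absent
getName = Function( {str, id},
	{Default Local},
	// \b keeps a partial ID from matching inside a longer one
	result = Regex( str, "\b" || id || "_([^;]+);", "\1" );
	If( Is Missing( result ), "", result );
);

// test string
string = "123450_dan; 154396_kelli; 198756_dev;";

Show( getName( string, "154396" ) );  // expected: "kelli"
Show( getName( string, "000000" ) );  // expected: "" (ID not present)
Show( getName( string, "23450" ) );   // expected: "" (partial ID rejected by \b)
```

Returning an empty string for an unknown ID mirrors the default value used by the associative-array approach elsewhere in this thread.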

gzmorgan0 · Sep 14, 2018 02:13 PM

Just for fun, and because regular expressions can be tough for others to read (or for you, two weeks later), here is an alternative using JMP functions.

Names Default to Here( 1 );

// test string
string = "123450_dan; 154396_kelli; 198756_dev;";

//create a list of id_name items
id = Words(string, ";");

//create a list of ids, a list of names, and an associative array (keyed list)
idlist={};
namelist={};
lookup = [=>""];  //Associative array that returns an empty string if the id is not valid

for(i=1, i<=nitems(id), i++,
   InsertInto(idlist, num(word(1,Trim(id[i]),"_")) );
   InsertInto(namelist, Titlecase(word(2,Trim(id[i]),"_")) ); //don't need TitleCase
   lookup[idlist[i]]=namelist[i]  //ids are the keys and names are the values
);
show(id, idlist, namelist, lookup, lookup[198756], lookup[128954]); //last one is not valid

gzmorgan0 · Sep 16, 2018 05:17 AM

The script that I provided previously using JMP functions would not require any modifications. The regular expression to capture one or more "names" after the underscore would be (\w+\s*)+. The parentheses ( ) group the pattern, \w+ matches one or more word characters, \s* matches zero or more whitespace characters, and the trailing plus repeats the group one or more times.

Names Default to Here(1);
string="123450_dan fox; 154396_kelli ann marie; 198756_dev;";

name1 = Regex( string, "123450_((\w+\s*)+\b)", "\1" );
name2 = Regex( string, "154396_((\w+\s*)+\b)", "\1" );

Show( name1, name2 ); 
/*: name1 = "dan fox";  name2 = "kelli ann marie"; */
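
As a rough illustration of what the trailing \b contributes, the sketch below uses a modified test string (my own, not from the thread) with a space before the first semicolon. Because \s* is greedy, the group alone should keep that trailing space; requiring a word boundary at the end forces the match to back up to the last word character.

```jsl
Names Default To Here( 1 );

// modified test string: note the space before the first ";"
string = "123450_dan fox ; 154396_kelli ann marie; 198756_dev;";

// with \b, the capture must end right after a word character
withBoundary = Regex( string, "123450_((\w+\s*)+\b)", "\1" );

// without \b, the greedy \s* keeps the trailing space
withoutBoundary = Regex( string, "123450_((\w+\s*)+)", "\1" );

Show( withBoundary, withoutBoundary );
// expected: withBoundary = "dan fox"; withoutBoundary = "dan fox ";
```

With the original test string, where the semicolon directly follows the name, both patterns should return the same result.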

Shlomizr · Sep 17, 2018 02:34 AM

Thanks, works perfectly!
