- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Extract text from string
Hi,
If possible I would like extract particular text from the strings:
\\swirt.misc@CAS-K-CLERK\K10447-Wit_OG lo hi TT-YYT (CTL1083.1)_172-7-10053 -303
\\swirt.misc@CAS-K-CLERK\K10445-Wit_OG lo hi ES WERT (CIL0858.1)_172-7-10046-600
\\swirt.misc@CAS-K-CLERK\K10437-Wit_OG lo hi TT-YYT (CTL1077.1)_172-7-10053-303
\\swirt.misc@CAS-K-CLERK\K10420-Wit_OG lo hi ES TRTR (CMA0851.1)_172-7-10047-600
\\swirt.misc@CAS-K-CLERK\K10377-Wit_OG_lo hi_ES_SI (CMS0499.1)_ 172-7-10041-600
\\swirt.misc@CAS-K-CLERK\K10451-Wit_OG lo hi TT-YYT (CTL1078
\\swirt.misc@CAS-K-CLERK\K10468-Wit_OG lo hi YYT DEV-020856 (CTLR1073
\\swirt.misc@CAS-K-CLERK\K10022 -Wit_OG lo hi YYT DEV-020414 (CTLR0999.1)
\\swirt.misc@CAS-K-CLERK\K10030-Wit_OG lo hi TT-YYT (CTL1008.1)
and I want to extract the IDs from above:
CTL1083.1
CIL0858.1
CTL1077.1
CMA0851.1
CMS0499.1
CTL1078
CTLR1073
CTLR0999.1
CTL1008.1
Any help appreciated.
Thanks,
Tomasz
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Extract text from string
The solution from @pmroz is very robust, nice!
A little different approach might be to use formulas in the data table
The location of the start delimiter could be located with a "Contains" argument
Contains(:text, "(")
The location of the end delimiter could be located the same way
Contains( :text, ")" )
An obscure but really fun argument, "munger", could be used to extract the string between the delimiter locations.
In this case we want the string after the delimiter character, that ends before the ending delimiter character, so 1 is added to the start and subtracted from the end.
The function for munger in this case uses the arguments, whole text, where to start, how many characters to return. Munger can also do some other interesting things, like find and replace.
Munger( :text, :start + 1, (:stop - :start) - 1 )
Wrapping all that up into one formula would look like this
Munger(
:text,
Contains( :text, "(" ) + 1,
(Contains( :text, ")" ) - Contains( :text, "(" )) - 1
)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Extract text from string
Use the WORDS function with ( and ) as delimiters. Use the second element in the returned list. Run the code below and check the output in the log window.
dt = New Table( "Test Words", Add Rows( 9 ),
New Column( "Column 1", Character, "Nominal",
Set Values(
{
"\\swirt.misc@CAS-K-CLERK\K10447-Wit_OG lo hi TT-YYT (CTL1083.1)_172-7-10053 -303",
"\\swirt.misc@CAS-K-CLERK\K10445-Wit_OG lo hi ES WERT (CIL0858.1)_172-7-10046-600",
"\\swirt.misc@CAS-K-CLERK\K10437-Wit_OG lo hi TT-YYT (CTL1077.1)_172-7-10053-303",
"\\swirt.misc@CAS-K-CLERK\K10420-Wit_OG lo hi ES TRTR (CMA0851.1)_172-7-10047-600",
"\\swirt.misc@CAS-K-CLERK\K10377-Wit_OG_lo hi_ES_SI (CMS0499.1)_ 172-7-10041-600",
"\\swirt.misc@CAS-K-CLERK\K10451-Wit_OG lo hi TT-YYT (CTL1078",
"\\swirt.misc@CAS-K-CLERK\K10468-Wit_OG lo hi YYT DEV-020856 (CTLR1073",
"\\swirt.misc@CAS-K-CLERK\K10022 -Wit_OG lo hi YYT DEV-020414 (CTLR0999.1)",
"\\swirt.misc@CAS-K-CLERK\K10030-Wit_OG lo hi TT-YYT (CTL1008.1)"}
),
Set Display Width( 583 )
)
);
for (i = 1, i <= nrows(dt), i++,
one_value = dt:column 1[i];
word_list = words(one_value, "()");
print(word_list[2]);
);
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Extract text from string
The solution from @pmroz is very robust, nice!
A little different approach might be to use formulas in the data table
The location of the start delimiter could be located with a "Contains" argument
Contains(:text, "(")
The location of the end delimiter could be located the same way
Contains( :text, ")" )
An obscure but really fun argument, "munger", could be used to extract the string between the delimiter locations.
In this case we want the string after the delimiter character, that ends before the ending delimiter character, so 1 is added to the start and subtracted from the end.
The function for munger in this case uses the arguments, whole text, where to start, how many characters to return. Munger can also do some other interesting things, like find and replace.
Munger( :text, :start + 1, (:stop - :start) - 1 )
Wrapping all that up into one formula would look like this
Munger(
:text,
Contains( :text, "(" ) + 1,
(Contains( :text, ")" ) - Contains( :text, "(" )) - 1
)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Extract text from string
I use SUBSTR for similar effect.
for (i = 1, i <= nrows(dt), i++,
one_value = dt:column 1[i];
start = contains(one_value, "(") + 1;
end = contains(one_value, ")") - 1;
len = end - start + 1;
result = substr(one_value, start, len);
print(result);
);
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Extract text from string
In case you are into regular expressions... (Note: cleaned up hastily written code and replaced listing below on 1/12/22)
Names Default To Here( 1 );
// make list of example original strings
sample =
"\\swirt.misc@CAS-K-CLERK\K10447-Wit_OG lo hi TT-YYT (CTL1083.1)_172-7-10053 -303
\\swirt.misc@CAS-K-CLERK\K10445-Wit_OG lo hi ES WERT (CIL0858.1)_172-7-10046-600
\\swirt.misc@CAS-K-CLERK\K10437-Wit_OG lo hi TT-YYT (CTL1077.1)_172-7-10053-303
\\swirt.misc@CAS-K-CLERK\K10420-Wit_OG lo hi ES TRTR (CMA0851.1)_172-7-10047-600
\\swirt.misc@CAS-K-CLERK\K10377-Wit_OG_lo hi_ES_SI (CMS0499.1)_ 172-7-10041-600
\\swirt.misc@CAS-K-CLERK\K10451-Wit_OG lo hi TT-YYT (CTL1078
\\swirt.misc@CAS-K-CLERK\K10468-Wit_OG lo hi YYT DEV-020856 (CTLR1073
\\swirt.misc@CAS-K-CLERK\K10022 -Wit_OG lo hi YYT DEV-020414 (CTLR0999.1)
\\swirt.misc@CAS-K-CLERK\K10030-Wit_OG lo hi TT-YYT (CTL1008.1)";
sample = Words( sample, "\!U000D" );
// Show( sample );
n = N Items( sample );
// iterate over list and return target strings
id = List();
For Each( {string}, sample, Insert Into( id, Regex( string, If( Contains( string, ")" ), "(.+\()(.+)(\).*)", "(.+\()(.+)" ), "\2" ) ) );
Show( id );
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Extract text from string
I like the "Word()" function for such string extractions
Names Default To Here( 1 );
dt = New Table( "Test Words",
Add Rows( 9 ),
New Column( "Column 1",
Character,
"Nominal",
Set Values(
{"\\swirt.misc@CAS-K-CLERK\K10447-Wit_OG lo hi TT-YYT (CTL1083.1)_172-7-10053 -303",
"\\swirt.misc@CAS-K-CLERK\K10445-Wit_OG lo hi ES WERT (CIL0858.1)_172-7-10046-600",
"\\swirt.misc@CAS-K-CLERK\K10437-Wit_OG lo hi TT-YYT (CTL1077.1)_172-7-10053-303",
"\\swirt.misc@CAS-K-CLERK\K10420-Wit_OG lo hi ES TRTR (CMA0851.1)_172-7-10047-600",
"\\swirt.misc@CAS-K-CLERK\K10377-Wit_OG_lo hi_ES_SI (CMS0499.1)_ 172-7-10041-600",
"\\swirt.misc@CAS-K-CLERK\K10451-Wit_OG lo hi TT-YYT (CTL1078",
"\\swirt.misc@CAS-K-CLERK\K10468-Wit_OG lo hi YYT DEV-020856 (CTLR1073",
"\\swirt.misc@CAS-K-CLERK\K10022 -Wit_OG lo hi YYT DEV-020414 (CTLR0999.1)",
"\\swirt.misc@CAS-K-CLERK\K10030-Wit_OG lo hi TT-YYT (CTL1008.1)"}
),
Set Display Width( 583 )
)
);
For( i = 1, i <= N Rows( dt ), i++,
Print( Word( 2, dt:column 1[i], "()" ) )
);
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Extract text from string
@pmroz , your code always looks so much less haphazard than mine ; )
Substring (substr) is a more straightforward approach, I just can't resist a good case for munger.
Craig Hales probably has some even more efficient approaches using pattern matching. I'm sure there are already some examples in the community pages or his blog page.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Extract text from string
Wow lots of ways to slice a loaf of bread in JMP.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Extract text from string
Pattern matching you say? How about this
Names Default to Here(1);
sample = "\\swirt.misc@CAS-K-CLERK\K10447-Wit_OG lo hi TT-YYT (CTL1083.1)_172-7-10053 -303
\\swirt.misc@CAS-K-CLERK\K10445-Wit_OG lo hi ES WERT (CIL0858.1)_172-7-10046-600
\\swirt.misc@CAS-K-CLERK\K10437-Wit_OG lo hi TT-YYT (CTL1077.1)_172-7-10053-303
\\swirt.misc@CAS-K-CLERK\K10420-Wit_OG lo hi ES TRTR (CMA0851.1)_172-7-10047-600
\\swirt.misc@CAS-K-CLERK\K10377-Wit_OG_lo hi_ES_SI (CMS0499.1)_ 172-7-10041-600
\\swirt.misc@CAS-K-CLERK\K10451-Wit_OG lo hi TT-YYT (CTL1078
\\swirt.misc@CAS-K-CLERK\K10468-Wit_OG lo hi YYT DEV-020856 (CTLR1073
\\swirt.misc@CAS-K-CLERK\K10022 -Wit_OG lo hi YYT DEV-020414 (CTLR0999.1)
\\swirt.misc@CAS-K-CLERK\K10030-Wit_OG lo hi TT-YYT (CTL1008.1)";
charSet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890.";
idList = {};
Pat Match(sample,"(" + Pat Span(charSet)>>nextID + (")" | "\!r") + Pat Test(Insert Into(idList,nextID);0));
It's unfortunate there isn't more support documentation for learning how to use pattern matching. I was a "Why bother with pattern matching when you have Regex?" person until I started to gain a better understand and appreciation for it (thank you @Craige_Hales!). It is blazingly fast and compact. To explain the code above, Pat Match is looking for an open parenthesis followed by one or more characters from charSet followed by the close parenthesis or a carriage return. The results from Pat Span are stored in the variable nextID. Each time the pattern is found, Pat Test is used to insert it into idList. The 0 at the end of Pat Test lets you continue searching the input string from the position immediately after the last character in the matched string (but before the next character). If the last statement in Pat Test had evaluated to true (anything not 0) the search would have ended, returning just the first match.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Extract text from string
very nice, all of the suggestions works the magic.
I've used the Munger in my script and works perfect.
Thanks a lot
Tomasz