Solved: Re: Extract text from string

Report Inappropriate Content · Jun 9, 2023 11:08 AM

Hi,

If possible I would like extract particular text from the strings:

\\swirt.misc@CAS-K-CLERK\K10447-Wit_OG lo hi TT-YYT (CTL1083.1)_172-7-10053 -303

\\swirt.misc@CAS-K-CLERK\K10445-Wit_OG lo hi ES WERT (CIL0858.1)_172-7-10046-600

\\swirt.misc@CAS-K-CLERK\K10437-Wit_OG lo hi TT-YYT (CTL1077.1)_172-7-10053-303

\\swirt.misc@CAS-K-CLERK\K10420-Wit_OG lo hi ES TRTR (CMA0851.1)_172-7-10047-600

\\swirt.misc@CAS-K-CLERK\K10377-Wit_OG_lo hi_ES_SI (CMS0499.1)_ 172-7-10041-600

\\swirt.misc@CAS-K-CLERK\K10451-Wit_OG lo hi TT-YYT (CTL1078

\\swirt.misc@CAS-K-CLERK\K10468-Wit_OG lo hi YYT DEV-020856 (CTLR1073

\\swirt.misc@CAS-K-CLERK\K10022 -Wit_OG lo hi YYT DEV-020414 (CTLR0999.1)

\\swirt.misc@CAS-K-CLERK\K10030-Wit_OG lo hi TT-YYT (CTL1008.1)

and I want to extract the IDs from above:

CTL1083.1

CIL0858.1

CTL1077.1

CMA0851.1

CMS0499.1

CTL1078

CTLR1073

CTLR0999.1

CTL1008.1

Any help appreciated.

Thanks,

Tomasz

Byron_JMP · Jan 11, 2022 08:37 AM

The solution from @pmroz is very robust, nice!

A little different approach might be to use formulas in the data table

Screen Shot 2022-01-11 at 8.29.19 AM.png

The location of the start delimiter could be located with a "Contains" argument

Contains(:text, "(")

The location of the end delimiter could be located the same way

Contains( :text, ")" )

An obscure but really fun argument, "munger", could be used to extract the string between the delimiter locations.

In this case we want the string after the delimiter character, that ends before the ending delimiter character, so 1 is added to the start and subtracted from the end.

The function for munger in this case uses the arguments, whole text, where to start, how many characters to return. Munger can also do some other interesting things, like find and replace.

Munger( :text, :start + 1, (:stop - :start) - 1 )

Wrapping all that up into one formula would look like this

Munger(
	:text,
	Contains( :text, "(" ) + 1,
	(Contains( :text, ")" ) - Contains( :text, "(" )) - 1
)

JMP Systems Engineer, Health and Life Sciences (Pharma)

View solution in original post

pmroz · Jan 11, 2022 08:17 AM

Use the WORDS function with ( and ) as delimiters. Use the second element in the returned list. Run the code below and check the output in the log window.

dt = New Table( "Test Words", Add Rows( 9 ),
	New Column( "Column 1", Character, "Nominal",
		Set Values(
			{
			"\\swirt.misc@CAS-K-CLERK\K10447-Wit_OG lo hi TT-YYT (CTL1083.1)_172-7-10053 -303",
			"\\swirt.misc@CAS-K-CLERK\K10445-Wit_OG lo hi ES WERT (CIL0858.1)_172-7-10046-600",
			"\\swirt.misc@CAS-K-CLERK\K10437-Wit_OG lo hi TT-YYT (CTL1077.1)_172-7-10053-303",
			"\\swirt.misc@CAS-K-CLERK\K10420-Wit_OG lo hi ES TRTR (CMA0851.1)_172-7-10047-600",
			"\\swirt.misc@CAS-K-CLERK\K10377-Wit_OG_lo hi_ES_SI (CMS0499.1)_ 172-7-10041-600",
			"\\swirt.misc@CAS-K-CLERK\K10451-Wit_OG lo hi TT-YYT (CTL1078",
			"\\swirt.misc@CAS-K-CLERK\K10468-Wit_OG lo hi YYT DEV-020856 (CTLR1073",
			"\\swirt.misc@CAS-K-CLERK\K10022 -Wit_OG lo hi YYT DEV-020414 (CTLR0999.1)",
			"\\swirt.misc@CAS-K-CLERK\K10030-Wit_OG lo hi TT-YYT (CTL1008.1)"}
		),
		Set Display Width( 583 )
	)
);

for (i = 1, i <= nrows(dt), i++,
	one_value = dt:column 1[i];
	
	word_list = words(one_value, "()");
	print(word_list[2]);
);

Byron_JMP · Jan 11, 2022 08:37 AM

The solution from @pmroz is very robust, nice!

A little different approach might be to use formulas in the data table

Screen Shot 2022-01-11 at 8.29.19 AM.png

The location of the start delimiter could be located with a "Contains" argument

Contains(:text, "(")

The location of the end delimiter could be located the same way

Contains( :text, ")" )

An obscure but really fun argument, "munger", could be used to extract the string between the delimiter locations.

In this case we want the string after the delimiter character, that ends before the ending delimiter character, so 1 is added to the start and subtracted from the end.

The function for munger in this case uses the arguments, whole text, where to start, how many characters to return. Munger can also do some other interesting things, like find and replace.

Munger( :text, :start + 1, (:stop - :start) - 1 )

Wrapping all that up into one formula would look like this

Munger(
	:text,
	Contains( :text, "(" ) + 1,
	(Contains( :text, ")" ) - Contains( :text, "(" )) - 1
)

JMP Systems Engineer, Health and Life Sciences (Pharma)

pmroz · Jan 11, 2022 09:42 AM

I use SUBSTR for similar effect.

for (i = 1, i <= nrows(dt), i++,
	one_value = dt:column 1[i];
	start  = contains(one_value, "(") + 1;
	end    = contains(one_value, ")") - 1;
	len    = end - start + 1;
	result = substr(one_value, start, len);
	print(result);
);

Mark_Bailey · Jan 12, 2022 8:01 AM

In case you are into regular expressions... (Note: cleaned up hastily written code and replaced listing below on 1/12/22)

Names Default To Here( 1 );

// make list of example original strings
sample =
"\\swirt.misc@CAS-K-CLERK\K10447-Wit_OG lo hi TT-YYT (CTL1083.1)_172-7-10053 -303

\\swirt.misc@CAS-K-CLERK\K10445-Wit_OG lo hi ES WERT (CIL0858.1)_172-7-10046-600

\\swirt.misc@CAS-K-CLERK\K10437-Wit_OG lo hi TT-YYT (CTL1077.1)_172-7-10053-303

\\swirt.misc@CAS-K-CLERK\K10420-Wit_OG lo hi ES TRTR (CMA0851.1)_172-7-10047-600

\\swirt.misc@CAS-K-CLERK\K10377-Wit_OG_lo hi_ES_SI (CMS0499.1)_ 172-7-10041-600

\\swirt.misc@CAS-K-CLERK\K10451-Wit_OG lo hi TT-YYT (CTL1078

\\swirt.misc@CAS-K-CLERK\K10468-Wit_OG lo hi YYT DEV-020856 (CTLR1073

\\swirt.misc@CAS-K-CLERK\K10022 -Wit_OG lo hi YYT DEV-020414 (CTLR0999.1)

\\swirt.misc@CAS-K-CLERK\K10030-Wit_OG lo hi TT-YYT (CTL1008.1)";
sample = Words( sample, "\!U000D" );
// Show( sample );
n = N Items( sample );

// iterate over list and return target strings
id = List();
For Each( {string}, sample, Insert Into( id, Regex( string, If( Contains( string, ")" ), "(.+\()(.+)(\).*)", "(.+\()(.+)" ), "\2" ) ) );

Show( id );

txnelson · Jan 11, 2022 10:59 AM

I like the "Word()" function for such string extractions

Names Default To Here( 1 );
dt = New Table( "Test Words",
	Add Rows( 9 ),
	New Column( "Column 1",
		Character,
		"Nominal",
		Set Values(
			{"\\swirt.misc@CAS-K-CLERK\K10447-Wit_OG lo hi TT-YYT (CTL1083.1)_172-7-10053 -303",
			"\\swirt.misc@CAS-K-CLERK\K10445-Wit_OG lo hi ES WERT (CIL0858.1)_172-7-10046-600",
			"\\swirt.misc@CAS-K-CLERK\K10437-Wit_OG lo hi TT-YYT (CTL1077.1)_172-7-10053-303",
			"\\swirt.misc@CAS-K-CLERK\K10420-Wit_OG lo hi ES TRTR (CMA0851.1)_172-7-10047-600",
			"\\swirt.misc@CAS-K-CLERK\K10377-Wit_OG_lo hi_ES_SI (CMS0499.1)_ 172-7-10041-600",
			"\\swirt.misc@CAS-K-CLERK\K10451-Wit_OG lo hi TT-YYT (CTL1078",
			"\\swirt.misc@CAS-K-CLERK\K10468-Wit_OG lo hi YYT DEV-020856 (CTLR1073",
			"\\swirt.misc@CAS-K-CLERK\K10022 -Wit_OG lo hi YYT DEV-020414 (CTLR0999.1)",
			"\\swirt.misc@CAS-K-CLERK\K10030-Wit_OG lo hi TT-YYT (CTL1008.1)"}
		),
		Set Display Width( 583 )
	)
);

For( i = 1, i <= N Rows( dt ), i++,
	Print( Word( 2, dt:column 1[i], "()" ) )
);

Jim

Byron_JMP · Jan 11, 2022 11:33 AM

@pmroz , your code always looks so much less haphazard than mine ; )

Substring (substr) is a more straightforward approach, I just can't resist a good case for munger.

Craig Hales probably has some even more efficient approaches using pattern matching. I'm sure there are already some examples in the community pages or his blog page.

JMP Systems Engineer, Health and Life Sciences (Pharma)

pmroz · Jan 11, 2022 11:36 AM

Wow lots of ways to slice a loaf of bread in JMP.

DonMcCormack · Jan 11, 2022 05:04 PM

Pattern matching you say? How about this

Names Default to Here(1);

sample = "\\swirt.misc@CAS-K-CLERK\K10447-Wit_OG lo hi TT-YYT (CTL1083.1)_172-7-10053 -303
\\swirt.misc@CAS-K-CLERK\K10445-Wit_OG lo hi ES WERT (CIL0858.1)_172-7-10046-600
\\swirt.misc@CAS-K-CLERK\K10437-Wit_OG lo hi TT-YYT (CTL1077.1)_172-7-10053-303
\\swirt.misc@CAS-K-CLERK\K10420-Wit_OG lo hi ES TRTR (CMA0851.1)_172-7-10047-600
\\swirt.misc@CAS-K-CLERK\K10377-Wit_OG_lo hi_ES_SI (CMS0499.1)_ 172-7-10041-600
\\swirt.misc@CAS-K-CLERK\K10451-Wit_OG lo hi TT-YYT (CTL1078
\\swirt.misc@CAS-K-CLERK\K10468-Wit_OG lo hi YYT DEV-020856 (CTLR1073
\\swirt.misc@CAS-K-CLERK\K10022 -Wit_OG lo hi YYT DEV-020414 (CTLR0999.1)
\\swirt.misc@CAS-K-CLERK\K10030-Wit_OG lo hi TT-YYT (CTL1008.1)";

charSet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890.";

idList = {};
Pat Match(sample,"(" + Pat Span(charSet)>>nextID + (")" | "\!r") + Pat Test(Insert Into(idList,nextID);0));

It's unfortunate there isn't more support documentation for learning how to use pattern matching. I was a "Why bother with pattern matching when you have Regex?" person until I started to gain a better understand and appreciation for it (thank you @Craige_Hales!). It is blazingly fast and compact. To explain the code above, Pat Match is looking for an open parenthesis followed by one or more characters from charSet followed by the close parenthesis or a carriage return. The results from Pat Span are stored in the variable nextID. Each time the pattern is found, Pat Test is used to insert it into idList. The 0 at the end of Pat Test lets you continue searching the input string from the position immediately after the last character in the matched string (but before the next character). If the last statement in Pat Test had evaluated to true (anything not 0) the search would have ended, returning just the first match.

swiergi11 · Jan 11, 2022 11:38 AM

very nice, all of the suggestions works the magic.

I've used the Munger in my script and works perfect.

Thanks a lot

Tomasz

Extract text from string

Re: Extract text from string

Re: Extract text from string

Re: Extract text from string

Re: Extract text from string

Re: Extract text from string

Re: Extract text from string

Re: Extract text from string

Re: Extract text from string

Re: Extract text from string

Re: Extract text from string

Recommended Articles

Get Going with JMP: Essentials for Using JMP

New Features in Process Screening for JMP 17

Manage Limits for JMP 17

EWMA New Features in JMP 17

CUSUM New Features in JMP 17