cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
swiergi11
Level III

Extract text from string

Hi,

If possible I would like extract particular text from the strings:

\\swirt.misc@CAS-K-CLERK\K10447-Wit_OG lo hi TT-YYT (CTL1083.1)_172-7-10053 -303

\\swirt.misc@CAS-K-CLERK\K10445-Wit_OG lo hi ES WERT (CIL0858.1)_172-7-10046-600

\\swirt.misc@CAS-K-CLERK\K10437-Wit_OG lo hi TT-YYT (CTL1077.1)_172-7-10053-303

\\swirt.misc@CAS-K-CLERK\K10420-Wit_OG lo hi ES TRTR (CMA0851.1)_172-7-10047-600

\\swirt.misc@CAS-K-CLERK\K10377-Wit_OG_lo hi_ES_SI (CMS0499.1)_ 172-7-10041-600

\\swirt.misc@CAS-K-CLERK\K10451-Wit_OG lo hi TT-YYT (CTL1078

\\swirt.misc@CAS-K-CLERK\K10468-Wit_OG lo hi YYT DEV-020856 (CTLR1073

\\swirt.misc@CAS-K-CLERK\K10022 -Wit_OG lo hi YYT DEV-020414 (CTLR0999.1)

\\swirt.misc@CAS-K-CLERK\K10030-Wit_OG lo hi TT-YYT (CTL1008.1)

 

and I want to extract the IDs from above:

CTL1083.1

CIL0858.1

CTL1077.1

CMA0851.1

CMS0499.1

CTL1078

CTLR1073

CTLR0999.1

CTL1008.1

 

Any help appreciated.

Thanks,

Tomasz

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Byron_JMP
Staff

Re: Extract text from string

The solution from @pmroz is very robust, nice!

A little different approach might be to use formulas in the data table

 

Screen Shot 2022-01-11 at 8.29.19 AM.png

 The location of the start delimiter could be located with a "Contains" argument

Contains(:text, "(")

The location of the end delimiter could be located the same way

Contains( :text, ")" )

An obscure but really fun argument, "munger", could be used to extract the string between the delimiter locations. 

In this case we want the string after the delimiter character, that ends before the ending delimiter character, so 1 is added to the start and subtracted from the end.

The function for munger in this case uses the arguments, whole text, where to start, how many characters to return. Munger can also do some other interesting things, like find and replace.

Munger( :text, :start + 1, (:stop - :start) - 1 )

Wrapping all that up into one formula would look like this

Munger(
	:text,
	Contains( :text, "(" ) + 1,
	(Contains( :text, ")" ) - Contains( :text, "(" )) - 1
)

 

 

 

JMP Systems Engineer, Health and Life Sciences (Pharma)

View solution in original post

11 REPLIES 11
pmroz
Super User

Re: Extract text from string

Use the WORDS function with ( and ) as delimiters.  Use the second element in the returned list.  Run the code below and check the output in the log window.

dt = New Table( "Test Words", Add Rows( 9 ),
	New Column( "Column 1", Character, "Nominal",
		Set Values(
			{
			"\\swirt.misc@CAS-K-CLERK\K10447-Wit_OG lo hi TT-YYT (CTL1083.1)_172-7-10053 -303",
			"\\swirt.misc@CAS-K-CLERK\K10445-Wit_OG lo hi ES WERT (CIL0858.1)_172-7-10046-600",
			"\\swirt.misc@CAS-K-CLERK\K10437-Wit_OG lo hi TT-YYT (CTL1077.1)_172-7-10053-303",
			"\\swirt.misc@CAS-K-CLERK\K10420-Wit_OG lo hi ES TRTR (CMA0851.1)_172-7-10047-600",
			"\\swirt.misc@CAS-K-CLERK\K10377-Wit_OG_lo hi_ES_SI (CMS0499.1)_ 172-7-10041-600",
			"\\swirt.misc@CAS-K-CLERK\K10451-Wit_OG lo hi TT-YYT (CTL1078",
			"\\swirt.misc@CAS-K-CLERK\K10468-Wit_OG lo hi YYT DEV-020856 (CTLR1073",
			"\\swirt.misc@CAS-K-CLERK\K10022 -Wit_OG lo hi YYT DEV-020414 (CTLR0999.1)",
			"\\swirt.misc@CAS-K-CLERK\K10030-Wit_OG lo hi TT-YYT (CTL1008.1)"}
		),
		Set Display Width( 583 )
	)
);

for (i = 1, i <= nrows(dt), i++,
	one_value = dt:column 1[i];
	
	word_list = words(one_value, "()");
	print(word_list[2]);
);
Byron_JMP
Staff

Re: Extract text from string

The solution from @pmroz is very robust, nice!

A little different approach might be to use formulas in the data table

 

Screen Shot 2022-01-11 at 8.29.19 AM.png

 The location of the start delimiter could be located with a "Contains" argument

Contains(:text, "(")

The location of the end delimiter could be located the same way

Contains( :text, ")" )

An obscure but really fun argument, "munger", could be used to extract the string between the delimiter locations. 

In this case we want the string after the delimiter character, that ends before the ending delimiter character, so 1 is added to the start and subtracted from the end.

The function for munger in this case uses the arguments, whole text, where to start, how many characters to return. Munger can also do some other interesting things, like find and replace.

Munger( :text, :start + 1, (:stop - :start) - 1 )

Wrapping all that up into one formula would look like this

Munger(
	:text,
	Contains( :text, "(" ) + 1,
	(Contains( :text, ")" ) - Contains( :text, "(" )) - 1
)

 

 

 

JMP Systems Engineer, Health and Life Sciences (Pharma)
pmroz
Super User

Re: Extract text from string

I use SUBSTR for similar effect.

for (i = 1, i <= nrows(dt), i++,
	one_value = dt:column 1[i];
	start  = contains(one_value, "(") + 1;
	end    = contains(one_value, ")") - 1;
	len    = end - start + 1;
	result = substr(one_value, start, len);
	print(result);
);

Re: Extract text from string

In case you are into regular expressions...   (Note: cleaned up hastily written code and replaced listing below on 1/12/22)

Names Default To Here( 1 );

// make list of example original strings
sample =
"\\swirt.misc@CAS-K-CLERK\K10447-Wit_OG lo hi TT-YYT (CTL1083.1)_172-7-10053 -303

\\swirt.misc@CAS-K-CLERK\K10445-Wit_OG lo hi ES WERT (CIL0858.1)_172-7-10046-600

\\swirt.misc@CAS-K-CLERK\K10437-Wit_OG lo hi TT-YYT (CTL1077.1)_172-7-10053-303

\\swirt.misc@CAS-K-CLERK\K10420-Wit_OG lo hi ES TRTR (CMA0851.1)_172-7-10047-600

\\swirt.misc@CAS-K-CLERK\K10377-Wit_OG_lo hi_ES_SI (CMS0499.1)_ 172-7-10041-600

\\swirt.misc@CAS-K-CLERK\K10451-Wit_OG lo hi TT-YYT (CTL1078

\\swirt.misc@CAS-K-CLERK\K10468-Wit_OG lo hi YYT DEV-020856 (CTLR1073

\\swirt.misc@CAS-K-CLERK\K10022 -Wit_OG lo hi YYT DEV-020414 (CTLR0999.1)

\\swirt.misc@CAS-K-CLERK\K10030-Wit_OG lo hi TT-YYT (CTL1008.1)";
sample = Words( sample, "\!U000D" );
// Show( sample );
n = N Items( sample );

// iterate over list and return target strings
id = List();
For Each( {string}, sample, Insert Into( id, Regex( string, If( Contains( string, ")" ), "(.+\()(.+)(\).*)", "(.+\()(.+)" ), "\2" ) ) );

Show( id );

 

 

txnelson
Super User

Re: Extract text from string

I like the "Word()" function for such string extractions

Names Default To Here( 1 );
dt = New Table( "Test Words",
	Add Rows( 9 ),
	New Column( "Column 1",
		Character,
		"Nominal",
		Set Values(
			{"\\swirt.misc@CAS-K-CLERK\K10447-Wit_OG lo hi TT-YYT (CTL1083.1)_172-7-10053 -303",
			"\\swirt.misc@CAS-K-CLERK\K10445-Wit_OG lo hi ES WERT (CIL0858.1)_172-7-10046-600",
			"\\swirt.misc@CAS-K-CLERK\K10437-Wit_OG lo hi TT-YYT (CTL1077.1)_172-7-10053-303",
			"\\swirt.misc@CAS-K-CLERK\K10420-Wit_OG lo hi ES TRTR (CMA0851.1)_172-7-10047-600",
			"\\swirt.misc@CAS-K-CLERK\K10377-Wit_OG_lo hi_ES_SI (CMS0499.1)_ 172-7-10041-600",
			"\\swirt.misc@CAS-K-CLERK\K10451-Wit_OG lo hi TT-YYT (CTL1078",
			"\\swirt.misc@CAS-K-CLERK\K10468-Wit_OG lo hi YYT DEV-020856 (CTLR1073",
			"\\swirt.misc@CAS-K-CLERK\K10022 -Wit_OG lo hi YYT DEV-020414 (CTLR0999.1)",
			"\\swirt.misc@CAS-K-CLERK\K10030-Wit_OG lo hi TT-YYT (CTL1008.1)"}
		),
		Set Display Width( 583 )
	)
);

For( i = 1, i <= N Rows( dt ), i++,
	Print( Word( 2, dt:column 1[i], "()" ) )
);
Jim
Byron_JMP
Staff

Re: Extract text from string

@pmroz , your code always looks so much less haphazard than mine ; ) 

Substring (substr) is a more straightforward approach, I just can't resist a good case for munger.

 

Craig Hales probably has some even more efficient approaches using pattern matching. I'm sure there are already some examples in the community pages or his blog page.

JMP Systems Engineer, Health and Life Sciences (Pharma)
pmroz
Super User

Re: Extract text from string

Wow lots of ways to slice a loaf of bread in JMP.

Re: Extract text from string

Pattern matching you say? How about this

Names Default to Here(1);

sample = "\\swirt.misc@CAS-K-CLERK\K10447-Wit_OG lo hi TT-YYT (CTL1083.1)_172-7-10053 -303
\\swirt.misc@CAS-K-CLERK\K10445-Wit_OG lo hi ES WERT (CIL0858.1)_172-7-10046-600
\\swirt.misc@CAS-K-CLERK\K10437-Wit_OG lo hi TT-YYT (CTL1077.1)_172-7-10053-303
\\swirt.misc@CAS-K-CLERK\K10420-Wit_OG lo hi ES TRTR (CMA0851.1)_172-7-10047-600
\\swirt.misc@CAS-K-CLERK\K10377-Wit_OG_lo hi_ES_SI (CMS0499.1)_ 172-7-10041-600
\\swirt.misc@CAS-K-CLERK\K10451-Wit_OG lo hi TT-YYT (CTL1078
\\swirt.misc@CAS-K-CLERK\K10468-Wit_OG lo hi YYT DEV-020856 (CTLR1073
\\swirt.misc@CAS-K-CLERK\K10022 -Wit_OG lo hi YYT DEV-020414 (CTLR0999.1)
\\swirt.misc@CAS-K-CLERK\K10030-Wit_OG lo hi TT-YYT (CTL1008.1)";

charSet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890.";

idList = {};
Pat Match(sample,"(" + Pat Span(charSet)>>nextID + (")" | "\!r") + Pat Test(Insert Into(idList,nextID);0));

It's unfortunate there isn't more support documentation for learning how to use pattern matching. I was a "Why bother with pattern matching when you have Regex?" person until I started to gain a better understand and appreciation for it (thank you @Craige_Hales!). It is blazingly fast and compact. To explain the code above, Pat Match is looking for an open parenthesis followed by one or more characters from charSet followed by the close parenthesis or a carriage return. The results from Pat Span are stored in the variable nextID. Each time the pattern is found, Pat Test is used to insert it into idList. The 0 at the end of Pat Test lets you continue searching the input string from the position immediately after the last character in the matched string (but before the next character). If the last statement in Pat Test had evaluated to true (anything not 0) the search would have ended, returning just the first match.

swiergi11
Level III

Re: Extract text from string

very nice, all of the suggestions works the magic.

 

I've used the Munger in my script and works perfect.

 

Thanks a lot

Tomasz