cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
Onjai
Level III

Searching and counting substrings within a text field

JMP Add-in did a great job importing hundreds of MSWord documents into a table. 

I have a column called "comment (original)" that I wish to count the number of "no comment" phrases. 

 

When I use the formula,

N Rows(
Loc(
Words( :"comment (white space collapsed)"n, " " ),
"comments"
)
)

it returns the number of occurrences for the term "comments".  However, the Loc function will not accept "no comments" as an argument.

 

N Rows(
	Loc(
		Words( :"comment (white space collapsed)"n, " " ),
		"comments"
	)
)

JMP v17.1

 

Interestingly, the formula works correctly if the comment column was recoded to collapse whitespace.  Not sure why.

Any thoughts or assistance would be appreciated!

Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
jthi
Super User

Re: Searching and counting substrings within a text field

If you want to use Loc, you have to figure out other way of splitting the string into a list than Words(). With Words() you won't get "no comments" string in the list as those will be split into "no" and "comments.

 

One option would be to substitute all "no comments" with empty strings, then compare length of the original and new string and divide with the length of comparison string. Something like this might work:

Names Default To Here(1);

str = "Friday, December 9, 2022 10/19/2022 9:37 AM flagged with a comment 10/21/2022 6:15PM : no comment 11/15/2022 1:03 PM 3: no comments 11/16/2022 11:06 AM 4: no comments 11/18/2022 2:49 PM : no comments 11/30/2022 3:52 PM : comments 12/09/2022 9:07 AM : no comment";
str_to_search = "no comment";
new_str = Substitute(str, str_to_search, "");

matches = (Length(str) - Length(new_str)) / Length(str_to_search);
-Jarmo

View solution in original post

5 REPLIES 5
jthi
Super User

Re: Searching and counting substrings within a text field

If you want to use Loc, you have to figure out other way of splitting the string into a list than Words(). With Words() you won't get "no comments" string in the list as those will be split into "no" and "comments.

 

One option would be to substitute all "no comments" with empty strings, then compare length of the original and new string and divide with the length of comparison string. Something like this might work:

Names Default To Here(1);

str = "Friday, December 9, 2022 10/19/2022 9:37 AM flagged with a comment 10/21/2022 6:15PM : no comment 11/15/2022 1:03 PM 3: no comments 11/16/2022 11:06 AM 4: no comments 11/18/2022 2:49 PM : no comments 11/30/2022 3:52 PM : comments 12/09/2022 9:07 AM : no comment";
str_to_search = "no comment";
new_str = Substitute(str, str_to_search, "");

matches = (Length(str) - Length(new_str)) / Length(str_to_search);
-Jarmo
Onjai
Level III

Re: Searching and counting substrings within a text field

Interesting approach.  Thank you.

 

When I ran the script, it returned a count of 5.  Three (3) of the cases are "no comments" and two (2) are "no comment".

Not sure if or how the Substitute function allowed the plural? i.e. str_to_search = "no comment".   Also the Length function would have been 1 more for the plural "no comments".

I am curious to know if there is another approach to validate this?

jthi
Super User

Re: Searching and counting substrings within a text field

I expected that no comment and no comments would be the same and because no comment is in both of those, the replacement will work (for no comments it will just leave the s there). If you need more complex replacements Regex would be a good option while using GLOBALREPLACE.

-Jarmo
Onjai
Level III

Re: Searching and counting substrings within a text field

Got it.  I saw that in the script when hovering over the variables.  This works.  The only issue is that it is case sensitive.

I may run this as a column formula.  Adding 2 more columns for the case types and sum the columns by row.

(Length( :"comment (white space collapsed)"n ) - Length(
	Substitute( :"comment (white space collapsed)"n, "no comment", "" )
)) / Length( "no comment" )

Thank you.  I'm back in business.

Cheers

Onjai.

jthi
Super User

Re: Searching and counting substrings within a text field

Substitute does allow you to use << IGNORECASE

Names Default To Here(1);
Substitute("Apple,APPLE,apple", "apple", "orange", <<IGNORECASE);

jthi_0-1685390944820.png

https://www.jmp.com/support/help/en/17.0/#page/jmp/list-functions.shtml?os=win&source=application#ww...

-Jarmo