cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Check out the JMP® Marketplace featured Capability Explorer add-in
Choose Language Hide Translation Bar
agaddis
Level I

Count number of occurrences of specific words in a string

Hello I'm looking to create a new column in my dataset that counts the number of times a specfic word appears in a string. The words are separated by "*" in each row. 

Example table:

Animal List

1  dog*dog*cat*bird*dog

2  dog*dog*cat*bird*dog*dog

 

 

Output should be

Animal List                                        Dog Count

1  dog*dog*cat*bird*dog                         3

2  dog*dog*cat*bird*dog*dog                  4

 

 

Can provide additional information if needed.

17 REPLIES 17
Craige_Hales
Super User

Re: Count number of occurrences of specific words in a string

It looks like your data is numeric (base 10 already!) so you might need to use char(value) to get it back to a string representation.

Craige
ian_jmp
Level X

Re: Count number of occurrences of specific words in a string

Please find one of the alternatives. Run the script to make a table.

New Table( "Counting Ones",
	Add Rows( 5 ),
	New Column( "Number",
		Numeric,
		"Continuous",
		Format( "Best", 17 ),
		Set Values( [111000010100001, 10101, 11111, 0, 101010101] ),
		Set Display Width( 132 )
	),
	New Column( "Count of One",
		Numeric,
		"Ordinal",
		Format( "Best", 12 ),
		Formula(
			Length( Char( :Number ) ) - Length(
				Substitute( Char( :Number ), "1", "" )
			)
		),
		Set Display Width( 102 )
	)
)
hogi
Level XII

Re: Count number of occurrences of specific words in a string

many thanks, @ian_jmp .

Actually, I did not expect the Length - Length to be 4x  faster than N rows(Loc ())...

 

// https://community.jmp.com/t5/Discussions/Count-number-of-occurrences-of-specific-words-in-a-string/td-p/47053

dt = Open( "$SAMPLE_DATA/Wafer Stacked.jmp" );

	For Each Row(
		dt,
		dt:Lot_Wafer Label[] = Substitute( dt:Lot_Wafer Label, "1", "xxx_" )
	);

dt_long=dt << Concatenate(dt,dt,dt,dt);

t0=hptime();
dt_long <<new column("new", set each value((Length(:Lot_Wafer Label)-Length(Substitute(:Lot_Wafer Label,"xxx","")))/3));
Write("via Length - Length:\!t",(hptime() - t0)/1000000,"s\!n");

t0=hptime();
dt_long << new column("new", set each value(N Rows(loc(Words(:Lot_Wafer Label,"_"),"xxx"))));
Write ("via N Rows(Loc()):\!t", (hptime() - t0)/1000000,"s\!n"); 

 

hogi_1-1708163913574.png

 

I added this topic to  Tips and Tricks - best practice with JMP/JSL 

hogi
Level XII

Re: Count number of occurrences of specific words in a string

As I won't remember the  trick in 6 months - and as I don't want to invest the time to tell it every colleague, I tried to set up a custom function:

 

Add Custom Functions(
	New Custom Function(
		"test",
		"countWord",
		Function( {myString, myWord},
			If( Length( myWord ) > 0,
				Return( (Length( myString ) - Length( Substitute( myString, myWord, "" ) )) / Length( myWord ) ),
				Return( 0 )
			)
		),
		<<Description( "count words in a String" ),
		<<Example( Expr( test:countWord( "cat, dog, Dogs, cat, Cat", "cat" ) ) )
	)
);

t0=hptime();
dt_long << new column("new", set each value(test:countWord(:Lot_Wafer Label,"xxx")));
Write ("via custom function:\!t",(hptime() - t0)/1000000,"s\!n");

unfortunately:

hogi_0-1708163680692.png

 

How do I have to adjust the custom function to make it as fast as the original version by @ian_jmp ?
[removing the If(...) helps a bit - but how to get it < 1s?]

ms
Super User (Alumni) ms
Super User (Alumni)

Re: Count number of occurrences of specific words in a string

Two more alternatives.

Use the Words() function with an empty string as delimiter:

 

b = 111001010;
N Row(Loc(Words(Char(b), ""), "1"));

Or just with math (i.e. without converting number into string). For example:

 

 

b = 111001010;
Sum(Mod(Round(b / 10 ^ (Floor(Log10(b + 1)) :: 0)), 10));

 

 

 

Craige_Hales
Super User

Re: Count number of occurrences of specific words in a string

I like that last one! It looks like it will actually work up to 16 bits; 17 consecutive ones can't be stored in a double precision number. The :: operator makes a matrix, and there is a lot of matrix math after that.

 

The words function might be the best one for general use...being easy to explain...

 

Craige
Onjai
Level III

Re: Count number of occurrences of specific words in a string

Hi,

How would I change the function to count number of occurrences of both dog and cat?

I tried the Or, but this does not work properly.

Thank you

Onjai

txnelson
Super User

Re: Count number of occurrences of specific words in a string

N Rows( Loc( Words( :Column 1, "*" ), "dog" ) ) +
N Rows( Loc( Words( :Column 1, "*" ), "cat" ) )
Jim