cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
JMP is taking Discovery online, April 16 and 18. Register today and join us for interactive sessions featuring popular presentation topics, networking, and discussions with the experts.
Choose Language Hide Translation Bar
agaddis
Level I

Count number of occurrences of specific words in a string

Hello I'm looking to create a new column in my dataset that counts the number of times a specfic word appears in a string. The words are separated by "*" in each row. 

Example table:

Animal List

1  dog*dog*cat*bird*dog

2  dog*dog*cat*bird*dog*dog

 

 

Output should be

Animal List                                        Dog Count

1  dog*dog*cat*bird*dog                         3

2  dog*dog*cat*bird*dog*dog                  4

 

 

Can provide additional information if needed.

17 REPLIES 17
Craige_Hales
Super User

Re: Count number of occurrences of specific words in a string

It looks like your data is numeric (base 10 already!) so you might need to use char(value) to get it back to a string representation.

Craige
ian_jmp
Staff

Re: Count number of occurrences of specific words in a string

Please find one of the alternatives. Run the script to make a table.

New Table( "Counting Ones",
	Add Rows( 5 ),
	New Column( "Number",
		Numeric,
		"Continuous",
		Format( "Best", 17 ),
		Set Values( [111000010100001, 10101, 11111, 0, 101010101] ),
		Set Display Width( 132 )
	),
	New Column( "Count of One",
		Numeric,
		"Ordinal",
		Format( "Best", 12 ),
		Formula(
			Length( Char( :Number ) ) - Length(
				Substitute( Char( :Number ), "1", "" )
			)
		),
		Set Display Width( 102 )
	)
)
hogi
Level XI

Re: Count number of occurrences of specific words in a string

many thanks, @ian_jmp .

Actually, I did not expect the Length - Length to be 4x  faster than N rows(Loc ())...

 

// https://community.jmp.com/t5/Discussions/Count-number-of-occurrences-of-specific-words-in-a-string/td-p/47053

dt = Open( "$SAMPLE_DATA/Wafer Stacked.jmp" );

	For Each Row(
		dt,
		dt:Lot_Wafer Label[] = Substitute( dt:Lot_Wafer Label, "1", "xxx_" )
	);

dt_long=dt << Concatenate(dt,dt,dt,dt);

t0=hptime();
dt_long <<new column("new", set each value((Length(:Lot_Wafer Label)-Length(Substitute(:Lot_Wafer Label,"xxx","")))/3));
Write("via Length - Length:\!t",(hptime() - t0)/1000000,"s\!n");

t0=hptime();
dt_long << new column("new", set each value(N Rows(loc(Words(:Lot_Wafer Label,"_"),"xxx"))));
Write ("via N Rows(Loc()):\!t", (hptime() - t0)/1000000,"s\!n"); 

 

hogi_1-1708163913574.png

 

I added this topic to  Tips and Tricks - best practice with JMP/JSL 

hogi
Level XI

Re: Count number of occurrences of specific words in a string

As I won't remember the  trick in 6 months - and as I don't want to invest the time to tell it every colleague, I tried to set up a custom function:

 

Add Custom Functions(
	New Custom Function(
		"test",
		"countWord",
		Function( {myString, myWord},
			If( Length( myWord ) > 0,
				Return( (Length( myString ) - Length( Substitute( myString, myWord, "" ) )) / Length( myWord ) ),
				Return( 0 )
			)
		),
		<<Description( "count words in a String" ),
		<<Example( Expr( test:countWord( "cat, dog, Dogs, cat, Cat", "cat" ) ) )
	)
);

t0=hptime();
dt_long << new column("new", set each value(test:countWord(:Lot_Wafer Label,"xxx")));
Write ("via custom function:\!t",(hptime() - t0)/1000000,"s\!n");

unfortunately:

hogi_0-1708163680692.png

 

How do I have to adjust the custom function to make it as fast as the original version by @ian_jmp ?
[removing the If(...) helps a bit - but how to get it < 1s?]

ms
Super User (Alumni) ms
Super User (Alumni)

Re: Count number of occurrences of specific words in a string

Two more alternatives.

Use the Words() function with an empty string as delimiter:

 

b = 111001010;
N Row(Loc(Words(Char(b), ""), "1"));

Or just with math (i.e. without converting number into string). For example:

 

 

b = 111001010;
Sum(Mod(Round(b / 10 ^ (Floor(Log10(b + 1)) :: 0)), 10));

 

 

 

Craige_Hales
Super User

Re: Count number of occurrences of specific words in a string

I like that last one! It looks like it will actually work up to 16 bits; 17 consecutive ones can't be stored in a double precision number. The :: operator makes a matrix, and there is a lot of matrix math after that.

 

The words function might be the best one for general use...being easy to explain...

 

Craige
Onjai
Level III

Re: Count number of occurrences of specific words in a string

Hi,

How would I change the function to count number of occurrences of both dog and cat?

I tried the Or, but this does not work properly.

Thank you

Onjai

txnelson
Super User

Re: Count number of occurrences of specific words in a string

N Rows( Loc( Words( :Column 1, "*" ), "dog" ) ) +
N Rows( Loc( Words( :Column 1, "*" ), "cat" ) )
Jim