Hello I'm looking to create a new column in my dataset that counts the number of times a specfic word appears in a string. The words are separated by "*" in each row.
Example table:
Animal List
1 dog*dog*cat*bird*dog
2 dog*dog*cat*bird*dog*dog
Output should be
Animal List Dog Count
1 dog*dog*cat*bird*dog 3
2 dog*dog*cat*bird*dog*dog 4
Can provide additional information if needed.
It looks like your data is numeric (base 10 already!) so you might need to use char(value) to get it back to a string representation.
Please find one of the alternatives. Run the script to make a table.
New Table( "Counting Ones",
Add Rows( 5 ),
New Column( "Number",
Numeric,
"Continuous",
Format( "Best", 17 ),
Set Values( [111000010100001, 10101, 11111, 0, 101010101] ),
Set Display Width( 132 )
),
New Column( "Count of One",
Numeric,
"Ordinal",
Format( "Best", 12 ),
Formula(
Length( Char( :Number ) ) - Length(
Substitute( Char( :Number ), "1", "" )
)
),
Set Display Width( 102 )
)
)
many thanks, @ian_jmp .
Actually, I did not expect the Length - Length to be 4x faster than N rows(Loc ())...
// https://community.jmp.com/t5/Discussions/Count-number-of-occurrences-of-specific-words-in-a-string/td-p/47053
dt = Open( "$SAMPLE_DATA/Wafer Stacked.jmp" );
For Each Row(
dt,
dt:Lot_Wafer Label[] = Substitute( dt:Lot_Wafer Label, "1", "xxx_" )
);
dt_long=dt << Concatenate(dt,dt,dt,dt);
t0=hptime();
dt_long <<new column("new", set each value((Length(:Lot_Wafer Label)-Length(Substitute(:Lot_Wafer Label,"xxx","")))/3));
Write("via Length - Length:\!t",(hptime() - t0)/1000000,"s\!n");
t0=hptime();
dt_long << new column("new", set each value(N Rows(loc(Words(:Lot_Wafer Label,"_"),"xxx"))));
Write ("via N Rows(Loc()):\!t", (hptime() - t0)/1000000,"s\!n");
I added this topic to Tips and Tricks - best practice with JMP/JSL
As I won't remember the trick in 6 months - and as I don't want to invest the time to tell it every colleague, I tried to set up a custom function:
Add Custom Functions(
New Custom Function(
"test",
"countWord",
Function( {myString, myWord},
If( Length( myWord ) > 0,
Return( (Length( myString ) - Length( Substitute( myString, myWord, "" ) )) / Length( myWord ) ),
Return( 0 )
)
),
<<Description( "count words in a String" ),
<<Example( Expr( test:countWord( "cat, dog, Dogs, cat, Cat", "cat" ) ) )
)
);
t0=hptime();
dt_long << new column("new", set each value(test:countWord(:Lot_Wafer Label,"xxx")));
Write ("via custom function:\!t",(hptime() - t0)/1000000,"s\!n");
unfortunately:
How do I have to adjust the custom function to make it as fast as the original version by @ian_jmp ?
[removing the If(...) helps a bit - but how to get it < 1s?]
Two more alternatives.
Use the Words() function with an empty string as delimiter:
b = 111001010;
N Row(Loc(Words(Char(b), ""), "1"));
Or just with math (i.e. without converting number into string). For example:
b = 111001010;
Sum(Mod(Round(b / 10 ^ (Floor(Log10(b + 1)) :: 0)), 10));
I like that last one! It looks like it will actually work up to 16 bits; 17 consecutive ones can't be stored in a double precision number. The :: operator makes a matrix, and there is a lot of matrix math after that.
The words function might be the best one for general use...being easy to explain...
Hi,
How would I change the function to count number of occurrences of both dog and cat?
I tried the Or, but this does not work properly.
Thank you
Onjai
N Rows( Loc( Words( :Column 1, "*" ), "dog" ) ) +
N Rows( Loc( Words( :Column 1, "*" ), "cat" ) )