JMP for Mac
Version 17.2.0 (701896)
I have a column "Data" with rows of number_number like this:
1_3
3_7
6_5
...
I want to test this column to see if some of the rows contain values from a list of values and then flag or label the rows which do.
I know I could construct an elaborate formula like this:
If( :Die X_Y == "3_9" | :Die X_Y == "3_4",
1,
0
)But that's very laborious for long lists of different values to check. I'd like to just be able to paste a list of values to check for in a formula, delimited by a space or comma.
I found the function "Contains Item." It works fine for one item but the help entry doesn't offer any insight on how to construct the "list" option. I tried this (specifying the optional delimiter as a space):
If( Contains Item( :Die X_Y, "3_4 3_9", " " ),
"exclude",
"not exclude"
)and this:
If( Contains Item( :Data, "3_4,3_9", "," ),
"exclude",
"not exclude"
)I understand the default delimiter is a comma but wanted to enforce to find anything that works. Neither Contains Item function is returning the sought-for behavior (logical 1 if there is a match, 0 if not). If there is a better function to use please advise.
This is a generalized problem I'd like to be able to solve -- inspecting a column to find matches with long lists of items.
Thanks
Usually in cases like this I tend to use Words() to split the string into a list and then compare using Contains() (or in some rare cases Contains Item()). But Contains Item() should also work and you just seem to have your parameters in incorrect order
New Table("Untitled",
Add Rows(3),
Compress File When Saved(1),
New Column("Column 1", Character, "Nominal", Set Values({"1_3", "3_7", "6_5"})),
New Column("Column 2",
Character,
"Nominal",
Formula(
If(Contains Item("1_3 3_9", :Column 1, " "),
"exclude",
"not exclude"
)
),
Set Selected
)
)
Second example from Scripting Index
or JSL Syntax Reference have okeish examples https://www.jmp.com/support/help/en/18.0/#page/jmp/character-functions-2.shtml?os=win&source=applica... but the documentation is bit lacking.
Here is the approach I would take
names default to here(1);
dt=current data table();
dt << clear select;
find = {"3_1","4_4"};
for each( {compare}, find,
dt<<select where( dt:Die X_Y == compare, current selection("Extend"));
);
Once the rows are selected, then excluding, deleting etc. can be done.
Usually in cases like this I tend to use Words() to split the string into a list and then compare using Contains() (or in some rare cases Contains Item()). But Contains Item() should also work and you just seem to have your parameters in incorrect order
New Table("Untitled",
Add Rows(3),
Compress File When Saved(1),
New Column("Column 1", Character, "Nominal", Set Values({"1_3", "3_7", "6_5"})),
New Column("Column 2",
Character,
"Nominal",
Formula(
If(Contains Item("1_3 3_9", :Column 1, " "),
"exclude",
"not exclude"
)
),
Set Selected
)
)
Second example from Scripting Index
or JSL Syntax Reference have okeish examples https://www.jmp.com/support/help/en/18.0/#page/jmp/character-functions-2.shtml?os=win&source=applica... but the documentation is bit lacking.
I did some timing tests comparing Jarmo's solution compared to my solution, and the Contains Item solution is spectacularly faster than the Select Where solution.
-
names default to here(1);
dt = Open( "$SAMPLE_DATA/Wafer Stacked.jmp" );
New Column( "x_y",
Character,
Formula( Concat Items( {Char( :X_Die ), Char( :Y_Die )}, "_" ) )
);
wait(0);
t0=hptime();
New Column( "contains item",
Character,
Formula( If( Contains Item( "1_3 3_9", :x_y, " " ), "exclude", "not exclude" ) )
);
dt << run formulas();
t= (hptime()-t0)/1000000;
print("formula - contains item", t);
t0=hptime();
New Column( "contains item v2",
Character,
Formula( If( Contains Item( :x_y, {"3_1","4_4"}) , "exclude", "not exclude" ) )
);
dt << run formulas();
t= (hptime()-t0)/1000000;
print("1) formula - contains item v2", t);
t0=hptime();
New Column( "get rows where",
Character,
set each value("not excluded") );
myrows=[];
for each( {compare}, {"3_1","4_4"},
myrows = V concat to (myrows, dt << get rows where(:x_y == compare));
);
dt << select rows(myrows); // just for fun - it's fast
dt[myrows,"get rows where"] = Repeat({"excluded"},n items(myrows) );
t= (hptime()-t0)/1000000;
print("2) get rows where", t);
t0=hptime();
for each( {compare}, {"3_1","4_4"},
dt<<select where( dt:X_Y == compare, current selection("Extend"));
);
t= (hptime()-t0)/1000000;
print("select where - extend", t);
t0=hptime();
for each( {compare}, {"3_1","4_4"},
dt<<select where( dt:X_Y == compare);
);
t= (hptime()-t0)/1000000;
print("select where", t);
t0=hptime();
New Column( "select where + contains item",
Character,
set each value("not excluded") );
dt<<select where( Contains Item( "1_3 3_9", :x_y, " " ));
myrows = dt << get selected rows ();
dt[myrows,"select where + contains item"] = Repeat({"excluded"},n items(myrows) );
t= (hptime()-t0)/1000000;
print("3) select where + contains item (v1)", t);
t0=hptime();
New Column( "get rows where & contains item",
Character,
set each value("not excluded") );
myrows = dt << get rows where(Contains Item( "1_3 3_9", :x_y, " " ));
dt[myrows,"get rows where & contains item"] = Repeat({"excluded"},n items(myrows) );
t= (hptime()-t0)/1000000;
print("4) get rows where + contains item (v1)", t);
I think it's worth adding that the Where() function is much faster (> an order of magnitude!) than <<Select Where or <<Get Rows Where.
Names Default to Here( 1 );
dt = Data Table( "SATByYear" );
Show( N Rows( dt ) ); //N Rows(dt) = 21245376;
now = HP Time();
dt << Select Where( :State == "Wisconsin" | :State == "Alabama" );
Wait( 0 );
total = HP Time() - now;
Print( "<<Select Where took " || Char( total / 1e6 ) || " seconds." ); //"<<Select Where took 3.241594 seconds."
dt << Clear Select;
now = HP Time();
wh = Where( dt, :State == "Wisconsin" | :State == "Alabama" );
dt << Select Rows( wh );
Wait( 0 );
total = HP Time() - now;
Print( "Where() took " || Char( total / 1e6 ) || " seconds." ); //"Where() took 0.284793 seconds."
dt << Clear Select;
now = HP Time();
wh = dt << Get Rows Where( dt, :State == "Wisconsin" | :State == "Alabama" );
dt << Select Rows( wh );
Wait( 0 );
total = HP Time() - now;
Print( "<<Get Rows Where took " || Char( total / 1e6 ) || " seconds." ); //"<<Get Rows Where took 7.059092 seconds."
Ah, great!
In the beginning I had quite bad experience with Where() - but cannot remember why. *)
Maybe I used a wrong syntax - or the issue got fixed in the mean time ...
*) edit:
bad experience: doesn't work with contains()
Thanks @jthi and @txnelson and @hogi and @mmarchandFSLR for answers and effort to my question.
Jarmo found my problem -- switching the list and item in my formula. Thanks for the bit of code illustrating the correct usage.
re: "Documentation a bit lacking"
I concur. The instructions and example in the scripting index do seem a little ambiguous to me. Here is the entry for the scripting index:
"
Contains Item(x, <item | {list} | pattern>, <delimiter>)
...
Returns a Boolean that indicates whether the word (item), one of a list of words (list), or pattern (pattern) matches one of the words in the text represented by x. Words are delimited by the characters in the optional quoted delimiter (delimiter) string. A comma is the default delimiter. Blanks are trimmed from the ends of each extracted word from the input text string (x).
"
So (to my simplistic reading) either x or the {list} could contain the words to be searched or the list of words to compare it with. A few more examples showing some alternatives that included a Column as in Jarmo's example might help clarify.
I've put the solution to use already! Thanks again.