JMP for Mac
Version 17.2.0 (701896)
I have a column "Data" with rows of number_number like this:
1_3
3_7
6_5
...
I want to test this column to see if some of the rows contain values from a list of values and then flag or label the rows which do.
I know I could construct an elaborate formula like this:
If( :Die X_Y == "3_9" | :Die X_Y == "3_4",
1,
0
)But that's very laborious for long lists of different values to check. I'd like to just be able to paste a list of values to check for in a formula, delimited by a space or comma.
I found the function "Contains Item." It works fine for one item but the help entry doesn't offer any insight on how to construct the "list" option. I tried this (specifying the optional delimiter as a space):
If( Contains Item( :Die X_Y, "3_4 3_9", " " ),
"exclude",
"not exclude"
)and this:
If( Contains Item( :Data, "3_4,3_9", "," ),
"exclude",
"not exclude"
)I understand the default delimiter is a comma but wanted to enforce to find anything that works. Neither Contains Item function is returning the sought-for behavior (logical 1 if there is a match, 0 if not). If there is a better function to use please advise.
This is a generalized problem I'd like to be able to solve -- inspecting a column to find matches with long lists of items.
Thanks
the reading is: 3 arguments
@datanaut wrote:
So (to my simplistic reading) either x or the {list} could contain the words to be searched or the list of words to compare it with.
-> yes.
included a Column as in Jarmo's example
For a column formula, this is always possible, along the idea:
For every row, JMP will take automatically the entry of the column in the respective row.
When you use Formula editor, you do this trick all the time - just without noting.
A nice side effect:
This discussion made me think:
does Contains also increase the speed of Col .. aggregations?
And believe it or not - it does. wow, it does!!!!
Any clue why the character comparison is SOOO slow?
I wondered if deleting rows without selecting them is much faster:
dt << select rows(myRows) << delete rows();
dt << delete rows();
[actually, I wondered why the corresponding << exclude(myRows) is missing.]
So, here is another speed test:
Names Default to here(1);
get timing = Function( {myExpr},
dt = New Table( "test", add rows( 10000000 ), New Column( "rnd", set each value( Random Integer( 1000 ) ) ) );
myRows = Random Shuffle( Index( 1, N Rows() ) )[Index( 1, N Rows() / 2 )];
t0 = HP Time();
myExpr;
//print(n rows()); // same timing without this line
If( N arg( myExpr ) > 2,
Print( Char( Head( Arg( myExpr, 2 ) ) ) || " + " || Char( Head( Arg( myExpr, 3 ) ) ) ),
Print( Char( Head( Arg( myExpr, 2 ) ) ) );
);
Print( Char( (HP Time() - t0) / 1000000 ) || " s\!n" );
Close( dt, noSave );
);
get timing( Expr( dt << select rows( myRows ) << delete rows ) );
get timing( Expr( dt << delete rows( myRows ) ) );
get timing( Expr( dt << select rows( myRows ) << exclude( 1 ) ) );
Print("-------------");
get timing( Expr( dt << select where( :rnd > 500 ) ) );
get timing( Expr( dt << get rows where( :rnd > 500 ) ) );
get timing( Expr( myRows = Where( dt, :rnd > 500 ) ) );
Print("_____________");
So, deleting rows without selecting them is just marginally faster.
-> no wonder there is no corresponding function for Exclude.
by the way:
Print("_________");
Print("__X______");
Print("X________");
why?
My guess is that Print is doing some parsing/evaluations as it is meant to be used for displaying values of arguments. And when you start -"variable" with two underscores it is considered hidden -> possibly some interaction with Print() to not make it leak those values.
show("__");
myString="__";
show(myString);
show("_"|| "_");
__x="secret";
show(__x);
show(name Expr(__x));
show(Eval(__x));
a second layer of ulra-tight protection ; )