cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
datanaut
Level III

Formula to check if any of a list of items is contained in the rows of a column

JMP for Mac 

Version 17.2.0 (701896)

 

I have a column "Data" with rows of number_number like this:

1_3

3_7

6_5

...

 

I want to test this column to see if some of the rows contain values from a list of values and then flag or label the rows which do.

 

I know I could construct an elaborate formula like this:

If( :Die X_Y == "3_9" | :Die X_Y == "3_4",
	1,
	0
)

But that's very laborious for long lists of different values to check.  I'd like to just be able to paste a list of values to check for in a formula, delimited by a space or comma.  

 

I found the function "Contains Item."  It works fine for one item but the help entry doesn't offer any insight on how to construct the "list" option.  I tried this (specifying the optional delimiter as a space):

If( Contains Item( :Die X_Y, "3_4 3_9", " " ),
	"exclude",
	"not exclude"
)

and this:

If( Contains Item( :Data, "3_4,3_9", "," ),
	"exclude",
	"not exclude"
)

I understand the default delimiter is a comma but wanted to enforce to find anything that works.  Neither Contains Item function is returning the sought-for behavior (logical 1 if there is a match, 0 if not).  If there is a better function to use please advise.  

 

This is a generalized problem I'd like to be able to solve -- inspecting a column to find matches with long lists of items.  

 

Thanks

 

1 ACCEPTED SOLUTION

Accepted Solutions
jthi
Super User

Re: Formula to check if any of a list of items is contained in the rows of a column

Usually in cases like this I tend to use Words() to split the string into a list and then compare using Contains() (or in some rare cases Contains Item()). But Contains Item() should also work and you just seem to have your parameters in incorrect order

New Table("Untitled",
	Add Rows(3),
	Compress File When Saved(1),
	New Column("Column 1", Character, "Nominal", Set Values({"1_3", "3_7", "6_5"})),
	New Column("Column 2",
		Character,
		"Nominal",
		Formula(
			If(Contains Item("1_3 3_9", :Column 1, " "),
				"exclude",
				"not exclude"
			)
		),
		Set Selected
	)
)

jthi_0-1731165057018.png

Second example from Scripting Index

jthi_1-1731165235628.png

jthi_2-1731165277236.png

or JSL Syntax Reference have okeish examples https://www.jmp.com/support/help/en/18.0/#page/jmp/character-functions-2.shtml?os=win&source=applica... but the documentation is bit lacking.

 

-Jarmo

View solution in original post

11 REPLIES 11
txnelson
Super User

Re: Formula to check if any of a list of items is contained in the rows of a column

Here is the approach I would take

names default to here(1);
dt=current data table();

dt << clear select;

find = {"3_1","4_4"};

for each( {compare}, find,
	dt<<select where( dt:Die X_Y == compare, current selection("Extend"));
);

Once the rows are selected, then excluding, deleting etc. can be done.

Jim
jthi
Super User

Re: Formula to check if any of a list of items is contained in the rows of a column

Usually in cases like this I tend to use Words() to split the string into a list and then compare using Contains() (or in some rare cases Contains Item()). But Contains Item() should also work and you just seem to have your parameters in incorrect order

New Table("Untitled",
	Add Rows(3),
	Compress File When Saved(1),
	New Column("Column 1", Character, "Nominal", Set Values({"1_3", "3_7", "6_5"})),
	New Column("Column 2",
		Character,
		"Nominal",
		Formula(
			If(Contains Item("1_3 3_9", :Column 1, " "),
				"exclude",
				"not exclude"
			)
		),
		Set Selected
	)
)

jthi_0-1731165057018.png

Second example from Scripting Index

jthi_1-1731165235628.png

jthi_2-1731165277236.png

or JSL Syntax Reference have okeish examples https://www.jmp.com/support/help/en/18.0/#page/jmp/character-functions-2.shtml?os=win&source=applica... but the documentation is bit lacking.

 

-Jarmo
txnelson
Super User

Re: Formula to check if any of a list of items is contained in the rows of a column

I did some timing tests comparing Jarmo's solution compared to my solution, and the Contains Item solution is spectacularly faster than the Select Where solution. 

Jim
hogi
Level XII

Re: Formula to check if any of a list of items is contained in the rows of a column

Thanks for the comment.
digging deeper - actually it's not contains item versus Select Where.

 

some findings:

  1. Contains item() is faster with the inverse "v2" syntax
  2. Select where() gets much faster by replacing it with get rows where() + select rows().
  3. Select Where() gets significantly faster by replacing the string comparison with the  contains item().
  4. and combining (2) and (3) makes it ... factor 2 slower than (2)  (?!?!)
  5. so, the winner (for this data set) is #2:

in total:

hogi_1-1731241317667.png

 

 

 

details:

View more...
names default to here(1);
dt = Open( "$SAMPLE_DATA/Wafer Stacked.jmp" );

New Column( "x_y",
	Character,
	Formula( Concat Items( {Char( :X_Die ), Char( :Y_Die )}, "_" ) )
);

t0=hptime();
New Column( "contains item",
	Character,
	Formula( If( Contains Item( "1_3 3_9", :x_y, " " ), "exclude", "not exclude" ) )
);
dt << run formulas();
t= (hptime()-t0)/1000000;
print("formula - contains item", t);


t0=hptime();
New Column( "contains item v2",
	Character,
	Formula( If( Contains Item( :x_y, {"3_1","4_4"}) , "exclude", "not exclude" ) )
);
dt << run formulas();
t= (hptime()-t0)/1000000;
print("1) formula - contains item v2", t);



t0=hptime();
New Column( "get rows where",
	Character,
	set each value("not excluded") );
myrows=[];
for each( {compare}, {"3_1","4_4"},	
	myrows = V concat to (myrows,  dt << get rows where(:x_y == compare));	
);
dt << select rows(myrows); // just for fun - it's fast
dt[myrows,"get rows where"] = Repeat({"excluded"},n items(myrows) );
t= (hptime()-t0)/1000000;
print("2) get rows where", t);



t0=hptime();
for each( {compare}, {"3_1","4_4"},
	dt<<select where( dt:X_Y == compare, current selection("Extend"));
);
t= (hptime()-t0)/1000000;
print("select where - extend", t);



t0=hptime();
for each( {compare}, {"3_1","4_4"},
	dt<<select where( dt:X_Y == compare);
);
t= (hptime()-t0)/1000000;
print("select where", t);



t0=hptime();
dt<<select where( Contains Item( :x_y, {"3_1","4_4"}));
t= (hptime()-t0)/1000000;
print("3) select where + contains item", t);



t0=hptime();
New Column( "get rows where & contains item",
	Character,
	set each value("not excluded") );
myrows = dt << get rows where(Contains Item( :x_y, {"3_1","4_4"}));	
dt[myrows,"get rows where & contains item"] = Repeat({"excluded"},n items(myrows) );
t= (hptime()-t0)/1000000;
print("4) get rows where + contains item v2", t);

 

hogi
Level XII

Re: Formula to check if any of a list of items is contained in the rows of a column

added it to Tips and Tricks - best practice with JMP/JSL 
#4 is surprising. At the moment, I'd say:

hogi_0-1731241802910.png

 

hogi
Level XII

Re: Formula to check if any of a list of items is contained in the rows of a column

Sorry, I missed a wait() in the previous runs - which inverts answer 1 ... and leads to new possibilities for #3, #4 & #5

the new findings:

  1. Contains item() is faster with @jthi 's  syntax - v2 is slower
  2. Select where() gets much faster by replacing it with get rows where() + select rows().
  3. Select Where() gets significantly (!!) faster by replacing the string comparison with the  contains item().
  4. combining (2) and (3) is quite comparable
  5. ... also for larger data sets.

hogi_0-1731261697022.png



View more...

 

names default to here(1);
dt = Open( "$SAMPLE_DATA/Wafer Stacked.jmp" );

New Column( "x_y",
	Character,
	Formula( Concat Items( {Char( :X_Die ), Char( :Y_Die )}, "_" ) )
);

wait(0);

t0=hptime();
New Column( "contains item",
	Character,
	Formula( If( Contains Item( "1_3 3_9", :x_y, " " ), "exclude", "not exclude" ) )
);
dt << run formulas();
t= (hptime()-t0)/1000000;
print("formula - contains item", t);



t0=hptime();
New Column( "contains item v2",
	Character,
	Formula( If( Contains Item( :x_y, {"3_1","4_4"}) , "exclude", "not exclude" ) )
);
dt << run formulas();
t= (hptime()-t0)/1000000;
print("1) formula - contains item v2", t);




t0=hptime();
New Column( "get rows where",
	Character,
	set each value("not excluded") );
myrows=[];
for each( {compare}, {"3_1","4_4"},	
	myrows = V concat to (myrows,  dt << get rows where(:x_y == compare));	
);
dt << select rows(myrows); // just for fun - it's fast
dt[myrows,"get rows where"] = Repeat({"excluded"},n items(myrows) );
t= (hptime()-t0)/1000000;
print("2) get rows where", t);



t0=hptime();
for each( {compare}, {"3_1","4_4"},
	dt<<select where( dt:X_Y == compare, current selection("Extend"));
);
t= (hptime()-t0)/1000000;
print("select where - extend", t);



t0=hptime();
for each( {compare}, {"3_1","4_4"},
	dt<<select where( dt:X_Y == compare);
);
t= (hptime()-t0)/1000000;
print("select where", t);



t0=hptime();
New Column( "select where + contains item",
	Character,
	set each value("not excluded") );
dt<<select where( Contains Item( "1_3 3_9", :x_y, " " ));
myrows = dt << get selected rows ();
dt[myrows,"select where + contains item"] = Repeat({"excluded"},n items(myrows) );
t= (hptime()-t0)/1000000;
print("3) select where + contains item (v1)", t);





t0=hptime();
New Column( "get rows where & contains item",
	Character,
	set each value("not excluded") );
myrows = dt << get rows where(Contains Item( "1_3 3_9", :x_y, " " ));	
dt[myrows,"get rows where & contains item"] = Repeat({"excluded"},n items(myrows) );
t= (hptime()-t0)/1000000;
print("4) get rows where + contains item (v1)", t);  

hogi_0-1731305493836.png

 

mmarchandFSLR
Level IV

Re: Formula to check if any of a list of items is contained in the rows of a column

I think it's worth adding that the Where() function is much faster (> an order of magnitude!) than <<Select Where or <<Get Rows Where.

 

Names Default to Here( 1 );
dt = Data Table( "SATByYear" );
Show( N Rows( dt ) );		//N Rows(dt) = 21245376;
now = HP Time();
dt << Select Where( :State == "Wisconsin" | :State == "Alabama" );
Wait( 0 );
total = HP Time() - now;
Print( "<<Select Where took " || Char( total / 1e6 ) || " seconds." );		//"<<Select Where took 3.241594 seconds."
dt << Clear Select;
now = HP Time();
wh = Where( dt, :State == "Wisconsin" | :State == "Alabama" );
dt << Select Rows( wh );
Wait( 0 );
total = HP Time() - now;
Print( "Where() took " || Char( total / 1e6 ) || " seconds." );		//"Where() took 0.284793 seconds."
dt << Clear Select;
now = HP Time();
wh = dt << Get Rows Where( dt, :State == "Wisconsin" | :State == "Alabama" );
dt << Select Rows( wh );
Wait( 0 );
total = HP Time() - now;
Print( "<<Get Rows Where took " || Char( total / 1e6 ) || " seconds." );		//"<<Get Rows Where took 7.059092 seconds."
hogi
Level XII

Re: Formula to check if any of a list of items is contained in the rows of a column

Ah, great!

In the beginning I had quite bad experience with Where() - but cannot remember why.
Maybe I used a wrong syntax - or the issue got fixed in the mean time ...

datanaut
Level III

Re: Formula to check if any of a list of items is contained in the rows of a column

Thanks @jthi and @txnelson and @hogi and @mmarchandFSLR for answers and effort to my question.

 

Jarmo found my problem -- switching the list and item in my formula.  Thanks for the bit of code illustrating the correct usage.

 

re: "Documentation a bit lacking"
I concur.  The instructions and example in the scripting index do seem a little ambiguous to me.  Here is the entry for the scripting index:

"

Contains Item(x, <item | {list} | pattern>, <delimiter>)

...

Returns a Boolean that indicates whether the word (item), one of a list of words (list), or pattern (pattern) matches one of the words in the text represented by x. Words are delimited by the characters in the optional quoted delimiter (delimiter) string. A comma is the default delimiter. Blanks are trimmed from the ends of each extracted word from the input text string (x).

"

 

So (to my simplistic reading) either x or the {list} could contain the words to be searched or the list of words to compare it with.  A few more examples showing some alternatives that included a Column as in Jarmo's example might help clarify.  

 

I've put the solution to use already!  Thanks again.