cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Djtjhin
Level IV

Column Search using list of Partial String

I'm trying to get rows where the value has a match from a defined list of partial strings. Example below.

 

The partial string which I'm interested is either "aaa" or "aa bb". I've started with the JSL below. Ideally, the result is [1,2,4,5,7,8,11]. The script below was able to get everything except for the last one (11). Nevertheless, perhaps there are more elegant solutions than what I have as well. Appreciate the help! 

 

Djtjhin_0-1647527212715.png

 

 

names default to here(1);

keys_list = {"aaa", "aa bb"};

dt = current data table();

Row_list = dt <<get rows where(contains(keys_list, lowercase(as column("TEST"))));

for(i=1,i<=nrows(dt),i++,
	revised_string = lowercase(regex(:Test[i],"\W",",",GLOBALREPLACE));
	for(j=1,j<=nitems(keys_list),j++,
		if(patmatch(revised_string,keys_list[j]) & nitems(Loc(Row_list,i))==0,
			Insertinto(Row_list,i)
		)
	)
	
);

show(sort ascending(Row_list))
1 ACCEPTED SOLUTION

Accepted Solutions
jthi
Super User

Re: Column Search using list of Partial String

I think I would do it exactly as you say, either loop over the keys_list or just enter them there with "wild" card character (in case of regex . should work). Edit: using . might be matchj too many different characters, so it is better most likely to build a string which has all the allowed strings. One more source for regex regex101

Names Default To Here(1);

keys_list = {"aaa", "aa bb"};

new_keys_list = Transform Each({new_key}, keys_list, Substitute(new_key, " ", "."));

dt = Current Data Table();
regex_pattern = "(" || Concat Items(new_keys_list, "|") || ")";

row_list = [];
For Each Row(dt,
	If(!IsMissing(Regex(lowercase(:TEST), regex_pattern)),
		Insert Into(row_list, Row())
	);
);

show(row_list); //row_list = [1, 2, 4, 5, 7, 8, 11];

@vince_faller I think I remember seeing something about that, but I'm not sure anymore in which case it was and which function, at least with JMP16 the IsMissing() does work with strings. Maybe it was with comparisons like "a" == ., as it will throw error:

jthi_0-1647534634634.png

 

 

-Jarmo

View solution in original post

9 REPLIES 9
jthi
Super User

Re: Column Search using list of Partial String

I would most likely use regular regex here with something like this. If it gets too slow, then I would maybe start thinking about something more complicated

Names Default To Here(1);

keys_list = {"aaa", "aa bb"};

dt = Current Data Table();
regex_pattern = "(" || Concat Items(keys_list, "|") || ")";

row_list = [];
For Each Row(dt,
	If(!IsMissing(Regex(lowercase(:TEST), regex_pattern)),
		Insert Into(row_list, Row())
	);
);

show(row_list); //row_list = [1, 2, 4, 5, 7, 8, 11];
-Jarmo
Djtjhin
Level IV

Re: Column Search using list of Partial String

@jthi  , @txnelson  , @vince_faller  Thanks for the solutions! I actually forgot to mention one more scenario. The second element in the keys_list is "aa bb" which is two words separated with a space delimiter. In the dataset which I'm working on, there could be instances where the delimiter is some other special characters. For instance, if there's a 12th row with a value of "aa-bb", I would like that row to be called as well, which leads to the final result of [1,2,4,5,7,8,11,12].

 

How should I update the scripts to account for this as well ? I'm guessing there should be a step which replaces all delimiter pattern in keys_list with a regex code for special characters ?

jthi
Super User

Re: Column Search using list of Partial String

I think I would do it exactly as you say, either loop over the keys_list or just enter them there with "wild" card character (in case of regex . should work). Edit: using . might be matchj too many different characters, so it is better most likely to build a string which has all the allowed strings. One more source for regex regex101

Names Default To Here(1);

keys_list = {"aaa", "aa bb"};

new_keys_list = Transform Each({new_key}, keys_list, Substitute(new_key, " ", "."));

dt = Current Data Table();
regex_pattern = "(" || Concat Items(new_keys_list, "|") || ")";

row_list = [];
For Each Row(dt,
	If(!IsMissing(Regex(lowercase(:TEST), regex_pattern)),
		Insert Into(row_list, Row())
	);
);

show(row_list); //row_list = [1, 2, 4, 5, 7, 8, 11];

@vince_faller I think I remember seeing something about that, but I'm not sure anymore in which case it was and which function, at least with JMP16 the IsMissing() does work with strings. Maybe it was with comparisons like "a" == ., as it will throw error:

jthi_0-1647534634634.png

 

 

-Jarmo
Djtjhin
Level IV

Re: Column Search using list of Partial String

Thanks @jthi !

 

regex101 is a fantastic resource!

 

txnelson
Super User

Re: Column Search using list of Partial String

Here is my approach to solve this issue

names default to here(1);

keys_list = {"aaa", "aa bb"};

dt = current data table();

row_list=[];
For( i = 1, i <= N Rows( dt ), i++,
	For( k = 1, k <= N Items( keys_list ), k++,
		If( Contains( :test[i], keys_list[k] ),
			row_list = row_list || Matrix( i );
			Break();
		)
	)
);
show(sort ascending(Row_list));
Jim
vince_faller
Super User (Alumni)

Re: Column Search using list of Partial String

If you want to do it in a column formula

 

Names default to here(1);
dt = current data table();
dt << Add Rows({:Test = "aa-bb"});
dt << New Column("Match", Formula(isstring(Regex(:TEST, "(aaa)|(aa.bb)", "\0", IGNORECASE))));
dt << Run Formulas(); // just to make sure the formula runs
match_rows = loc(dt:Match << Get Values());
dt << Delete Columns(:Match);
show(match_rows); // match_rows = [1, 2, 4, 5, 7, 8, 11];

 

Here's a regex resource I like if you don't know it very well. 

 

Also @jthi didn't `ismissing()` not used to work on strings?  I remember it always killing my regexs (hence why I used `isstring()` here).  Though I tried it all the way back to 14.3 and it seems to be fine.  

 

*Edit* made the wildcard. 

Vince Faller - Predictum
Djtjhin
Level IV

Re: Column Search using list of Partial String

@jthi , @vince_faller , @txnelson . Thanks! These are great solutions!

 

Actually, I forgot to include one more scenario which I might see in the data that I'm working on. The second item in the keys_list is "aa bb" which is two words separated by a space delimiter. In my dataset, there might be instances where the words are separated with other special characters but I would still like to have them included. For instance, if there's a 12th row with a value of "aa-bb", I would like that to be called out also, hence the final result will be [1,2,4,5,7,8,11,12]

 

How would you go about updating the scripts to accommodate this ? I'm guessing there's a step where all delimiters in keys_list are replaced with a special code for special characters which can be recognized in a pattern by regex ?

 

Thanks for the help!

vince_faller
Super User (Alumni)

Re: Column Search using list of Partial String

I updated my script to accommodate this.  It's just a `.` wildcard. You might want to be more specific about your wildcard.  It's usually not the best to just allow anything.  Especially if that's not what you truly want.  

 

 

Vince Faller - Predictum
Djtjhin
Level IV

Re: Column Search using list of Partial String

@jthi , @txnelson , @vince_faller Thanks for the solutions!

 

I actually forgot to mention one more scenario. The second element in the keys_list is "aa bb" which is two words separated with a space delimiter. In the dataset which I'm working on, there could be instances where the delimiter is some other special characters. For instance, if there's a 12th row with a value of "aa-bb", I would like that row to be called as well, which leads to the final result of [1,2,4,5,7,8,11,12].

 

How should I update the scripts to account for this as well ? I'm guessing there should be a step which replaces all delimiter pattern in keys_list with a regex code for special characters ?