I'm trying to get rows where the value has a match from a defined list of partial strings. Example below.
The partial string which I'm interested is either "aaa" or "aa bb". I've started with the JSL below. Ideally, the result is [1,2,4,5,7,8,11]. The script below was able to get everything except for the last one (11). Nevertheless, perhaps there are more elegant solutions than what I have as well. Appreciate the help!
names default to here(1);
keys_list = {"aaa", "aa bb"};
dt = current data table();
Row_list = dt <<get rows where(contains(keys_list, lowercase(as column("TEST"))));
for(i=1,i<=nrows(dt),i++,
revised_string = lowercase(regex(:Test[i],"\W",",",GLOBALREPLACE));
for(j=1,j<=nitems(keys_list),j++,
if(patmatch(revised_string,keys_list[j]) & nitems(Loc(Row_list,i))==0,
Insertinto(Row_list,i)
)
)
);
show(sort ascending(Row_list))
I think I would do it exactly as you say, either loop over the keys_list or just enter them there with "wild" card character (in case of regex . should work). Edit: using . might be matchj too many different characters, so it is better most likely to build a string which has all the allowed strings. One more source for regex regex101
Names Default To Here(1);
keys_list = {"aaa", "aa bb"};
new_keys_list = Transform Each({new_key}, keys_list, Substitute(new_key, " ", "."));
dt = Current Data Table();
regex_pattern = "(" || Concat Items(new_keys_list, "|") || ")";
row_list = [];
For Each Row(dt,
If(!IsMissing(Regex(lowercase(:TEST), regex_pattern)),
Insert Into(row_list, Row())
);
);
show(row_list); //row_list = [1, 2, 4, 5, 7, 8, 11];
@vince_faller I think I remember seeing something about that, but I'm not sure anymore in which case it was and which function, at least with JMP16 the IsMissing() does work with strings. Maybe it was with comparisons like "a" == ., as it will throw error:
I would most likely use regular regex here with something like this. If it gets too slow, then I would maybe start thinking about something more complicated
Names Default To Here(1);
keys_list = {"aaa", "aa bb"};
dt = Current Data Table();
regex_pattern = "(" || Concat Items(keys_list, "|") || ")";
row_list = [];
For Each Row(dt,
If(!IsMissing(Regex(lowercase(:TEST), regex_pattern)),
Insert Into(row_list, Row())
);
);
show(row_list); //row_list = [1, 2, 4, 5, 7, 8, 11];
@jthi , @txnelson , @vince_faller Thanks for the solutions! I actually forgot to mention one more scenario. The second element in the keys_list is "aa bb" which is two words separated with a space delimiter. In the dataset which I'm working on, there could be instances where the delimiter is some other special characters. For instance, if there's a 12th row with a value of "aa-bb", I would like that row to be called as well, which leads to the final result of [1,2,4,5,7,8,11,12].
How should I update the scripts to account for this as well ? I'm guessing there should be a step which replaces all delimiter pattern in keys_list with a regex code for special characters ?
I think I would do it exactly as you say, either loop over the keys_list or just enter them there with "wild" card character (in case of regex . should work). Edit: using . might be matchj too many different characters, so it is better most likely to build a string which has all the allowed strings. One more source for regex regex101
Names Default To Here(1);
keys_list = {"aaa", "aa bb"};
new_keys_list = Transform Each({new_key}, keys_list, Substitute(new_key, " ", "."));
dt = Current Data Table();
regex_pattern = "(" || Concat Items(new_keys_list, "|") || ")";
row_list = [];
For Each Row(dt,
If(!IsMissing(Regex(lowercase(:TEST), regex_pattern)),
Insert Into(row_list, Row())
);
);
show(row_list); //row_list = [1, 2, 4, 5, 7, 8, 11];
@vince_faller I think I remember seeing something about that, but I'm not sure anymore in which case it was and which function, at least with JMP16 the IsMissing() does work with strings. Maybe it was with comparisons like "a" == ., as it will throw error:
Here is my approach to solve this issue
names default to here(1);
keys_list = {"aaa", "aa bb"};
dt = current data table();
row_list=[];
For( i = 1, i <= N Rows( dt ), i++,
For( k = 1, k <= N Items( keys_list ), k++,
If( Contains( :test[i], keys_list[k] ),
row_list = row_list || Matrix( i );
Break();
)
)
);
show(sort ascending(Row_list));
If you want to do it in a column formula
Names default to here(1);
dt = current data table();
dt << Add Rows({:Test = "aa-bb"});
dt << New Column("Match", Formula(isstring(Regex(:TEST, "(aaa)|(aa.bb)", "\0", IGNORECASE))));
dt << Run Formulas(); // just to make sure the formula runs
match_rows = loc(dt:Match << Get Values());
dt << Delete Columns(:Match);
show(match_rows); // match_rows = [1, 2, 4, 5, 7, 8, 11];
Here's a regex resource I like if you don't know it very well.
Also @jthi didn't `ismissing()` not used to work on strings? I remember it always killing my regexs (hence why I used `isstring()` here). Though I tried it all the way back to 14.3 and it seems to be fine.
*Edit* made the wildcard.
@jthi , @vince_faller , @txnelson . Thanks! These are great solutions!
Actually, I forgot to include one more scenario which I might see in the data that I'm working on. The second item in the keys_list is "aa bb" which is two words separated by a space delimiter. In my dataset, there might be instances where the words are separated with other special characters but I would still like to have them included. For instance, if there's a 12th row with a value of "aa-bb", I would like that to be called out also, hence the final result will be [1,2,4,5,7,8,11,12]
How would you go about updating the scripts to accommodate this ? I'm guessing there's a step where all delimiters in keys_list are replaced with a special code for special characters which can be recognized in a pattern by regex ?
Thanks for the help!
I updated my script to accommodate this. It's just a `.` wildcard. You might want to be more specific about your wildcard. It's usually not the best to just allow anything. Especially if that's not what you truly want.
@jthi , @txnelson , @vince_faller Thanks for the solutions!
I actually forgot to mention one more scenario. The second element in the keys_list is "aa bb" which is two words separated with a space delimiter. In the dataset which I'm working on, there could be instances where the delimiter is some other special characters. For instance, if there's a 12th row with a value of "aa-bb", I would like that row to be called as well, which leads to the final result of [1,2,4,5,7,8,11,12].
How should I update the scripts to account for this as well ? I'm guessing there should be a step which replaces all delimiter pattern in keys_list with a regex code for special characters ?