- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Fastest way to filter a list based on partial matches.
I'm trying to see if there's a faster way to filter a list based off partial matches. See the code below. This is for JMP 12 by the way, before all of the list improvements.
names default to here(1);
dt = New table("Test",
New Column("SomeList", Set Values(1::100000)),
private
);
lista = Column(dt, "SomeList") << Data Type(Character) << Get Values;
close(dt, no save);
filter_by_partial_match = Function({list, value},
dt = New Table("Faster?",
New Column("funlist", Character, Set Values(list)),
New Column("Bool", Formula(Contains( lowercase(:funlist), lowercase(value) )) ),
private
);
bools = loc(Column(dt, "Bool")<<Get Values);
filtered_list = :funlist[bools];
close(dt, no save);
filtered_list;
);
start_time = HPTime();
filtered_list = filter_by_partial_match(lista, "2");
t_time = HPTime() - start_time;
show(t_time, nitems(filtered_list));
anyone know of any better way?
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Fastest way to filter a list based on partial matches.
Have you tried it in version 13, yet? Here are my results using the private table approach:
t_time = 115828;
N Items(filtered_list) = 40951;
Here are the results using a loop:
t_time = 61895;
N Items(filtered_list) = 40951;
By contrast, this is what I got in version 12, respectively:
t_time = 127356;
N Items(filtered_list) = 40951;
t_time = 161896168;
N Items(filtered_list) = 40951;
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Fastest way to filter a list based on partial matches.
That's fast! Loop or regex approaches wouldn't even come close (I tried and failed...).
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Fastest way to filter a list based on partial matches.
I get a slight improvement if I remove the lowercase function from the column formula, and instead send in a lowercase list and set the value to be lowercase. Otherwise this looks pretty fast.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Fastest way to filter a list based on partial matches.
Have you tried it in version 13, yet? Here are my results using the private table approach:
t_time = 115828;
N Items(filtered_list) = 40951;
Here are the results using a loop:
t_time = 61895;
N Items(filtered_list) = 40951;
By contrast, this is what I got in version 12, respectively:
t_time = 127356;
N Items(filtered_list) = 40951;
t_time = 161896168;
N Items(filtered_list) = 40951;
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Fastest way to filter a list based on partial matches.
Yeah, I love the 13 list improvement. But unfortunately, I have to use 12 for the customer. I love telling them to go to 13 but it's probably a futile attempt.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Fastest way to filter a list based on partial matches.
How do you make the entire list lowercase? Wouldn't the cost of doing that outweigh you any benefit you gain from it?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Fastest way to filter a list based on partial matches.
Yes there would definitely be overhead to making the entire list lowercase. You could at least make the value lowercase. But again the performance gain is not much.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Fastest way to filter a list based on partial matches.
I believe the data table enjoys the benefit of multithreading (col formulas, etc). So, I would be surprised if there's a faster way to do this in JMP 12.