Subscribe Bookmark RSS Feed

Fastest way to filter a list based on partial matches.

vince_faller

Super User

Joined:

Mar 17, 2015

I'm trying to see if there's a faster way to filter a list based off partial matches.  See the code below. This is for JMP 12 by the way, before all of the list improvements.

names default to here(1);

dt = New table("Test",

       New Column("SomeList", Set Values(1::100000)),

       private

);

lista = Column(dt, "SomeList") << Data Type(Character) << Get Values;

close(dt, no save);

filter_by_partial_match = Function({list, value},

              dt = New Table("Faster?",

                     New Column("funlist", Character, Set Values(list)),

                     New Column("Bool", Formula(Contains( lowercase(:funlist), lowercase(value) )) ),

                     private

              );

              bools = loc(Column(dt, "Bool")<<Get Values);

              filtered_list = :funlist[bools];

              close(dt, no save);

             

              filtered_list;

);

start_time = HPTime();

filtered_list = filter_by_partial_match(lista, "2");

t_time = HPTime() - start_time;

show(t_time, nitems(filtered_list));

anyone know of any better way?

1 ACCEPTED SOLUTION

Accepted Solutions
Solution

 

Have you tried it in version 13, yet? Here are my results using the private table approach:

t_time = 115828;

N Items(filtered_list) = 40951;

Here are the results using a loop:

t_time = 61895;

N Items(filtered_list) = 40951;

By contrast, this is what I got in version 12, respectively:

t_time = 127356;

N Items(filtered_list) = 40951;

t_time = 161896168;

N Items(filtered_list) = 40951;

 

7 REPLIES
ms

Super User

Joined:

Jun 23, 2011

That's fast! Loop or regex approaches wouldn't even come close (I tried and failed...).

pmroz

Super User

Joined:

Jun 23, 2011

I get a slight improvement if I remove the lowercase function from the column formula, and instead send in a lowercase list and set the value to be lowercase.  Otherwise this looks pretty fast.

Solution

 

Have you tried it in version 13, yet? Here are my results using the private table approach:

t_time = 115828;

N Items(filtered_list) = 40951;

Here are the results using a loop:

t_time = 61895;

N Items(filtered_list) = 40951;

By contrast, this is what I got in version 12, respectively:

t_time = 127356;

N Items(filtered_list) = 40951;

t_time = 161896168;

N Items(filtered_list) = 40951;

 

vince_faller

Super User

Joined:

Mar 17, 2015

Yeah, I love the 13 list improvement. But unfortunately, I have to use 12 for the customer.  I love telling them to go to 13 but it's probably a futile attempt.

vince_faller

Super User

Joined:

Mar 17, 2015

How do you make the entire list lowercase?  Wouldn't the cost of doing that outweigh you any benefit you gain from it?

pmroz

Super User

Joined:

Jun 23, 2011

Yes there would definitely be overhead to making the entire list lowercase.  You could at least make the value lowercase.  But again the performance gain is not much.

Phil_Brown

Super User

Joined:

Mar 20, 2012

I believe the data table enjoys the benefit of multithreading (col formulas, etc). So, I would be surprised if there's a faster way to do this in JMP 12.

PDB