BookmarkSubscribeRSS Feed
vince_faller

Super User

Joined:

Mar 17, 2015

Fastest way to filter a list based on partial matches.

I'm trying to see if there's a faster way to filter a list based off partial matches.  See the code below. This is for JMP 12 by the way, before all of the list improvements.

names default to here(1);

dt = New table("Test",

       New Column("SomeList", Set Values(1::100000)),

       private

);

lista = Column(dt, "SomeList") << Data Type(Character) << Get Values;

close(dt, no save);

filter_by_partial_match = Function({list, value},

              dt = New Table("Faster?",

                     New Column("funlist", Character, Set Values(list)),

                     New Column("Bool", Formula(Contains( lowercase(:funlist), lowercase(value) )) ),

                     private

              );

              bools = loc(Column(dt, "Bool")<<Get Values);

              filtered_list = :funlist[bools];

              close(dt, no save);

             

              filtered_list;

);

start_time = HPTime();

filtered_list = filter_by_partial_match(lista, "2");

t_time = HPTime() - start_time;

show(t_time, nitems(filtered_list));

anyone know of any better way?

1 ACCEPTED SOLUTION

Accepted Solutions
jerry_cooper

Staff

Joined:

Jul 10, 2014

Solution

Re: Fastest way to filter a list based on partial matches.

 

Have you tried it in version 13, yet? Here are my results using the private table approach:

t_time = 115828;

N Items(filtered_list) = 40951;

Here are the results using a loop:

t_time = 61895;

N Items(filtered_list) = 40951;

By contrast, this is what I got in version 12, respectively:

t_time = 127356;

N Items(filtered_list) = 40951;

t_time = 161896168;

N Items(filtered_list) = 40951;

 

7 REPLIES
ms

Super User

Joined:

Jun 23, 2011

Re: Fastest way to filter a list based on partial matches.

That's fast! Loop or regex approaches wouldn't even come close (I tried and failed...).

pmroz

Super User

Joined:

Jun 23, 2011

Re: Fastest way to filter a list based on partial matches.

I get a slight improvement if I remove the lowercase function from the column formula, and instead send in a lowercase list and set the value to be lowercase.  Otherwise this looks pretty fast.

jerry_cooper

Staff

Joined:

Jul 10, 2014

Solution

Re: Fastest way to filter a list based on partial matches.

 

Have you tried it in version 13, yet? Here are my results using the private table approach:

t_time = 115828;

N Items(filtered_list) = 40951;

Here are the results using a loop:

t_time = 61895;

N Items(filtered_list) = 40951;

By contrast, this is what I got in version 12, respectively:

t_time = 127356;

N Items(filtered_list) = 40951;

t_time = 161896168;

N Items(filtered_list) = 40951;

 

vince_faller

Super User

Joined:

Mar 17, 2015

Re: Fastest way to filter a list based on partial matches.

Yeah, I love the 13 list improvement. But unfortunately, I have to use 12 for the customer.  I love telling them to go to 13 but it's probably a futile attempt.

vince_faller

Super User

Joined:

Mar 17, 2015

Re: Fastest way to filter a list based on partial matches.

How do you make the entire list lowercase?  Wouldn't the cost of doing that outweigh you any benefit you gain from it?

pmroz

Super User

Joined:

Jun 23, 2011

Re: Fastest way to filter a list based on partial matches.

Yes there would definitely be overhead to making the entire list lowercase.  You could at least make the value lowercase.  But again the performance gain is not much.

Phil_Brown

Super User

Joined:

Mar 20, 2012

Re: Fastest way to filter a list based on partial matches.

I believe the data table enjoys the benefit of multithreading (col formulas, etc). So, I would be surprised if there's a faster way to do this in JMP 12.

PDB