cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
vince_faller
Super User (Alumni)

Fastest way to filter a list based on partial matches.

I'm trying to see if there's a faster way to filter a list based off partial matches.  See the code below. This is for JMP 12 by the way, before all of the list improvements.

names default to here(1);

dt = New table("Test",

       New Column("SomeList", Set Values(1::100000)),

       private

);

lista = Column(dt, "SomeList") << Data Type(Character) << Get Values;

close(dt, no save);

filter_by_partial_match = Function({list, value},

              dt = New Table("Faster?",

                     New Column("funlist", Character, Set Values(list)),

                     New Column("Bool", Formula(Contains( lowercase(:funlist), lowercase(value) )) ),

                     private

              );

              bools = loc(Column(dt, "Bool")<<Get Values);

              filtered_list = :funlist[bools];

              close(dt, no save);

             

              filtered_list;

);

start_time = HPTime();

filtered_list = filter_by_partial_match(lista, "2");

t_time = HPTime() - start_time;

show(t_time, nitems(filtered_list));

anyone know of any better way?

Vince Faller - Predictum
1 ACCEPTED SOLUTION

Accepted Solutions
jerry_cooper
Staff (Retired)

Re: Fastest way to filter a list based on partial matches.

 

Have you tried it in version 13, yet? Here are my results using the private table approach:

t_time = 115828;

N Items(filtered_list) = 40951;

Here are the results using a loop:

t_time = 61895;

N Items(filtered_list) = 40951;

By contrast, this is what I got in version 12, respectively:

t_time = 127356;

N Items(filtered_list) = 40951;

t_time = 161896168;

N Items(filtered_list) = 40951;

 

View solution in original post

7 REPLIES 7
ms
Super User (Alumni) ms
Super User (Alumni)

Re: Fastest way to filter a list based on partial matches.

That's fast! Loop or regex approaches wouldn't even come close (I tried and failed...).

pmroz
Super User

Re: Fastest way to filter a list based on partial matches.

I get a slight improvement if I remove the lowercase function from the column formula, and instead send in a lowercase list and set the value to be lowercase.  Otherwise this looks pretty fast.

jerry_cooper
Staff (Retired)

Re: Fastest way to filter a list based on partial matches.

 

Have you tried it in version 13, yet? Here are my results using the private table approach:

t_time = 115828;

N Items(filtered_list) = 40951;

Here are the results using a loop:

t_time = 61895;

N Items(filtered_list) = 40951;

By contrast, this is what I got in version 12, respectively:

t_time = 127356;

N Items(filtered_list) = 40951;

t_time = 161896168;

N Items(filtered_list) = 40951;

 

vince_faller
Super User (Alumni)

Re: Fastest way to filter a list based on partial matches.

Yeah, I love the 13 list improvement. But unfortunately, I have to use 12 for the customer.  I love telling them to go to 13 but it's probably a futile attempt.

Vince Faller - Predictum
vince_faller
Super User (Alumni)

Re: Fastest way to filter a list based on partial matches.

How do you make the entire list lowercase?  Wouldn't the cost of doing that outweigh you any benefit you gain from it?

Vince Faller - Predictum
pmroz
Super User

Re: Fastest way to filter a list based on partial matches.

Yes there would definitely be overhead to making the entire list lowercase.  You could at least make the value lowercase.  But again the performance gain is not much.

Phil_Brown
Super User (Alumni)

Re: Fastest way to filter a list based on partial matches.

I believe the data table enjoys the benefit of multithreading (col formulas, etc). So, I would be surprised if there's a faster way to do this in JMP 12.

PDB