aallman
Level III

Search a list of strings for a partial string

Hello

 

I am trying to find a way to search through a list of strings for all items that contain a partial string. I know this can be done using a For loop, but the list that I will be running this script on is very large and For loops take a very long time. I am wondering if there is some function out there similar to Loc(list, string) that will find the items that have the partial string.

 

Just as an example:

 

list = {"football", "hockey", "baseball", "tennis"};

Loc(list, "ball");

Would ideally return [1, 3].

 

Thanks!

 

Craige_Hales
Super User

Re: Search a list of strings for a partial string

I think the loop may be a good choice; most of the work is in the contains function, not the loop overhead.

// Build a large test list: start with 3 strings and double the list 19 times.
x={"34565467456745673456345634563456abca","34563456456745674534563456346accca","34456745674556345634563456accda"};
for(i=1,i<20,i+=1,
x=x||x;
);
nitems(x); // 1572864
result={};
start=tickseconds();
// Collect the index of every item that contains the search pattern.
for(i=1,i<=nitems(x),i+=1,
	if(contains(x[i],"ccc"),insertinto(result,i))
);
stop=tickseconds();
show(nitems(x)/nitems(result),stop-start);// 3:1, <1 second

One of the three strings contains the search pattern, so the result list is 1/3 the size of the source. About 1 second for 1.5 million items seems reasonable. How large is your list, how long is a typical item, and what is your time requirement?
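If the loop needs to be reused, it could be wrapped in a small function. This is only a sketch and was not part of the timing test above: the name locContains is hypothetical (it is not a built-in), and it returns a list of indices rather than the matrix that Loc returns.

```jsl
// Hypothetical helper (not a JMP built-in): returns the indices of the
// items in lst that contain pattern as a substring.
locContains = Function( {lst, pattern}, {Default Local},
	result = {};
	For( i = 1, i <= N Items( lst ), i += 1,
		// Contains() returns the match position, or 0 when there is no match
		If( Contains( lst[i], pattern ), Insert Into( result, i ) )
	);
	// the last expression evaluated is the function's return value
	result
);

sports = {"football", "hockey", "baseball", "tennis"};
hits = locContains( sports, "ball" );
Show( hits ); // hits = {1, 3};
```

The body is the same For/Contains/Insert Into loop as above, so it should perform about the same, apart from whatever it costs to pass the list into the function.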

 

Another approach, though not as good, avoids the explicit loop by copying the data into a table and using row selection:

x={"34565467456745673456345634563456abca","34563456456745674534563456346accca","34456745674556345634563456accda"};
for(i=1,i<20,i+=1,
x=x||x;
);
nitems(x); // 1572864

start=tickseconds();
dt = New Table( "Untitled",
	New Column( "Column 1", Character, "Nominal", Set Values( x ) )
);
stop=tickseconds();
show(stop-start); // 1.3 sec

start=tickseconds();
dt<<selectwhere(contains(column1,"ccc"));
list=dt<<getselectedrows;
stop=tickseconds();
show(stop-start); // 1 sec

JMP added better list support in JMP 13. If you are using an older version, this post explains how to make lists faster in JMP < 13:

https://community.jmp.com/t5/Uncharted/Fast-List/ba-p/28947

Craige