Solved: Search For Strings Using Partial String

uday_guntupalli · Aug 15, 2016 01:13 PM

All,

I am trying to search for string matches with partial string as the search criteria . I have tried multiple ways as I show below - I believe there must be a better way using RegEx or Pat Match to do this - but I am unable to get it right. Any help will be appreciated .

For the Sake of an example , I am using Air Traffic Sample Data table :

// Method # 1 Using Pat Match

dt = Current Data Table();

ColNames = dt << Get COlumn Names();

Match = list();

sp = Pat Span("ID");

Pat Match(ColNames,sp>>Match);

Match;

// This yields no matches . However in the same code - if ID itself was a column name - this works . So I want to understand how to use part of the search string to do the search .

// Method # 2 Using Contains()

for( i=1, i<= N Items(ColNames),i++,

If(Contains(ColNames,"ID")== 1,

Insert Into(Match,i);

);

Best
uday

Best
Uday

Craige_Hales · Aug 15, 2016 02:11 PM

maybe like this:

dt = Current Data Table();

ColNames = dt << Get COlumn Names();

colstrings = char(colnames);

match = regex(colstrings, "[{,]\s*([^{,}]*ID[^{,}]*)[,}]", "\1");

show(Match);

Match = "Flight ID";

which depends on the column names not using commas or curly braces. The char function turns the list of names into a string:

{Airline, Carrier Code, Flight Number, Tail Number, Flight ID, Event, Airport, Time,

Original Time, Longitude, Latitude}

becomes

"{Airline, Carrier Code, Flight Number, Tail Number, Flight ID, Event, Airport, Time, Original Time, Longitude, Latitude}"

I chose regex for this because...well it won't be beautiful either way. Breaking down the regex pattern:

"[{,] either an open curly brace or a comma

\s* optional white space

([^{,}]*ID[^{,}]*) the outer parens are the \1 match (more below)

[,}]" either a comma or a close curly brace

breaking down the \1 part:

[^{,}]* zero or more characters that are not curly braces or commas

ID the word you are hunting in a column name

[^{,}]* zero or more characters that are not curly braces or commas

the third arg to regex, "\1", means use the first open paren to identify the replacement text.

Craige

View solution in original post

Craige_Hales · Aug 15, 2016 02:11 PM

maybe like this:

dt = Current Data Table();

ColNames = dt << Get COlumn Names();

colstrings = char(colnames);

match = regex(colstrings, "[{,]\s*([^{,}]*ID[^{,}]*)[,}]", "\1");

show(Match);

Match = "Flight ID";

which depends on the column names not using commas or curly braces. The char function turns the list of names into a string:

{Airline, Carrier Code, Flight Number, Tail Number, Flight ID, Event, Airport, Time,

Original Time, Longitude, Latitude}

becomes

"{Airline, Carrier Code, Flight Number, Tail Number, Flight ID, Event, Airport, Time, Original Time, Longitude, Latitude}"

I chose regex for this because...well it won't be beautiful either way. Breaking down the regex pattern:

"[{,] either an open curly brace or a comma

\s* optional white space

([^{,}]*ID[^{,}]*) the outer parens are the \1 match (more below)

[,}]" either a comma or a close curly brace

breaking down the \1 part:

[^{,}]* zero or more characters that are not curly braces or commas

ID the word you are hunting in a column name

[^{,}]* zero or more characters that are not curly braces or commas

the third arg to regex, "\1", means use the first open paren to identify the replacement text.

Craige

uday_guntupalli · Aug 15, 2016 03:09 PM

Craige@JMP :
While the solution you provided works - if I may I have 2 follow up questions :

1. How would you get the indices when there are matches found ?

2. If the column names look like this - which they do in my case

Name( "IPhone 7, California, Yes" ) - ColNames
Name(\!"IPhone 7, California, Yes\!") - ColString

Please advice .

Best

Uday

Best
Uday

Craige_Hales · Aug 15, 2016 04:16 PM

Sounds like an explicit loop will be easier and more clear.

for( icol = 1, icol <= nitems(colnames), icol++,

// use regex on char(colnames) and break out of this loop when you get the match you need.

)

Since the column names DO have commas, and you need the icol index, and the regex can be a lot simpler. This isn't tested, but something like

if( ! ismissing( regex( char(colnames[ icol ]), "ID" ) ), break() );

would test each icol for ID in the name. regex returns missing if the match fails.

Craige

Search For Strings Using Partial String

Re: Search For Strings Using Partial String

Re: Search For Strings Using Partial String

Re: Search For Strings Using Partial String

Re: Search For Strings Using Partial String