Subscribe Bookmark RSS Feed

Search For Strings Using Partial String

uday_guntupalli

Community Trekker

Joined:

Sep 15, 2014

All,

     I am trying to search for string matches with partial  string as the search criteria . I have tried multiple ways as I show below - I believe there must be a better way using RegEx or Pat Match to do this - but I am unable to get it right. Any help will be appreciated .


For the Sake of an example , I am using Air Traffic Sample Data table :

// Method # 1 Using Pat Match

dt  = Current Data Table();

ColNames = dt << Get COlumn Names();

Match = list();

sp = Pat Span("ID");

Pat Match(ColNames,sp>>Match);

Match; 

// This yields no matches . However in the same code - if ID itself was a column name - this works . So I want to understand how to use part of the search string to do the search .

// Method # 2 Using Contains()

for(     i=1, i<= N Items(ColNames),i++,

                 If(Contains(ColNames,"ID")== 1,

                           Insert Into(Match,i);

                   );

   );

Best
uday

1 ACCEPTED SOLUTION

Accepted Solutions
Solution

maybe like this:

dt  = Current Data Table();

ColNames = dt << Get COlumn Names();

colstrings = char(colnames);

match = regex(colstrings, "[{,]\s*([^{,}]*ID[^{,}]*)[,}]", "\1");

show(Match);

Match = "Flight ID";

which depends on the column names not using commas or curly braces.  The char function turns the list of names into a string:

{Airline, Carrier Code, Flight Number, Tail Number, Flight ID, Event, Airport, Time,

Original Time, Longitude, Latitude}

becomes

"{Airline, Carrier Code, Flight Number, Tail Number, Flight ID, Event, Airport, Time, Original Time, Longitude, Latitude}"

I chose regex for this because...well it won't be beautiful either way.  Breaking down the regex pattern:

"[{,]     either an open curly brace or a comma

\s*      optional white space

([^{,}]*ID[^{,}]*)    the outer parens are the \1 match (more below)

[,}]"     either a comma or a close curly brace

breaking down the \1 part:

[^{,}]*     zero or more characters that are not curly braces or commas

ID          the word you are hunting in a column name

[^{,}]*     zero or more characters that are not curly braces or commas


the third arg to regex, "\1", means use the first open paren to identify the replacement text.

Craige
3 REPLIES
Solution

maybe like this:

dt  = Current Data Table();

ColNames = dt << Get COlumn Names();

colstrings = char(colnames);

match = regex(colstrings, "[{,]\s*([^{,}]*ID[^{,}]*)[,}]", "\1");

show(Match);

Match = "Flight ID";

which depends on the column names not using commas or curly braces.  The char function turns the list of names into a string:

{Airline, Carrier Code, Flight Number, Tail Number, Flight ID, Event, Airport, Time,

Original Time, Longitude, Latitude}

becomes

"{Airline, Carrier Code, Flight Number, Tail Number, Flight ID, Event, Airport, Time, Original Time, Longitude, Latitude}"

I chose regex for this because...well it won't be beautiful either way.  Breaking down the regex pattern:

"[{,]     either an open curly brace or a comma

\s*      optional white space

([^{,}]*ID[^{,}]*)    the outer parens are the \1 match (more below)

[,}]"     either a comma or a close curly brace

breaking down the \1 part:

[^{,}]*     zero or more characters that are not curly braces or commas

ID          the word you are hunting in a column name

[^{,}]*     zero or more characters that are not curly braces or commas


the third arg to regex, "\1", means use the first open paren to identify the replacement text.

Craige
uday_guntupalli

Community Trekker

Joined:

Sep 15, 2014

Craige@JMP​ :
    While the solution you provided works - if I may I have 2 follow up questions :

1. How would you get the indices when there are matches found ?

2. If the column names look like this - which they do in my case

Name( "IPhone 7, California, Yes" )  - ColNames
Name(\!"IPhone 7, California, Yes\!") - ColString

    Please advice .

Best

Uday

Craige_Hales

Staff

Joined:

Mar 21, 2013

Sounds like an explicit loop will be easier and more clear.

for( icol = 1, icol <= nitems(colnames), icol++,

// use regex on char(colnames) and break out of this loop when you get the match you need.

)

Since the column names DO have commas, and you need the icol index, and the regex can be a lot simpler.  This isn't tested, but something like

if( ! ismissing( regex( char(colnames[ icol ]), "ID" ) ), break() );

would test each icol for ID in the name.  regex returns missing if the match fails.

Craige