BookmarkSubscribe
Choose Language Hide Translation Bar
tarkan_bih
Community Trekker

Rename columns with Regex

Hi,

 

I have a list of a huge number of columns to stack that have column format name of TEST_BOUROUGH_GENDER, as shown below. Now the BOUROUGH can be missing from the column name and that imples the value from the column is from all the BOUROUGHs (ie NYC). I stacking all of the TEST_BOUROUGH_GENDER column names and then creating new columns called test, location and gender (see below). When I run into the case of a column that is missing a BOUROUGH name I would like to add NYC to that column name. I would also prefer if this is could be done with regular expressions. Any thoughts or input is greatly appreciated. 

 

column format name: CALC1_BROOKLYN_M, CALC2_QUEENS_F, CALC2_M, ..

 

 

ref = Current Data Table();

stat stack << ref stack(Source Label Column("Label"), Stacked Data Column("Data"));

stat stack << New Column("test", Formula(Right( :Label, 5 )));

stat stack << New Column("location", Formula(Regex( :Label, "_(.*)_", "\1" )));

stat stack << New Column("gender", Formula(Right( :Label, 1)))

0 Kudos
1 ACCEPTED SOLUTION

Accepted Solutions
txnelson
Super User

Re: Rename columns with Regex

This solution on changing the names doesn't use regular expressions, but it is a very simple piece of code:

Names Default To Here( 2 );
dt = Current Data Table();
colNamesList = dt << get column names( string );

For( i = 1, i <= N Items( colNamesList ), i++,
	If( Word( 3, colNamesList[i], "_" ) == "",
		Column( colNamesList[i] ) << set name(
			Word( 1, colNamesList[i], "_" ) || "_NYC_" || Word( 2, colNamesList[i], "_" )
		)
	)
);
Jim
5 REPLIES 5
txnelson
Super User

Re: Rename columns with Regex

This solution on changing the names doesn't use regular expressions, but it is a very simple piece of code:

Names Default To Here( 2 );
dt = Current Data Table();
colNamesList = dt << get column names( string );

For( i = 1, i <= N Items( colNamesList ), i++,
	If( Word( 3, colNamesList[i], "_" ) == "",
		Column( colNamesList[i] ) << set name(
			Word( 1, colNamesList[i], "_" ) || "_NYC_" || Word( 2, colNamesList[i], "_" )
		)
	)
);
Jim
tarkan_bih
Community Trekker

Re: Rename columns with Regex

This is what I want! However when I try to put this in my script it is crashing. The values I used in my example (TEST_BOUROUGH_GENDER) are not real due to IP issues but mearly are used as an example. Also after the last delimiter "_" the column will either start M or P. Is there a way with startswith function to include two seperate characters it can start with?

 

allColNames = dt << getcolumnnames;
pulledCols = {};

Ncols = N items(allColNames);
for(i = 1, i <= Ncols, i++,
	cname = Column(allColNames[i]) << get Name;
	if(startswith(Word(5,cname , "_"),"M"),
		Column(cname) << set name(
			Word(1, cname, "_") || "_"|| Word(2, cname, "_") || "_" || Word(3, cname, "_") || "_" || Word(4, cname, "_") ||"_ALL_" || Word(5,cname,"_")
		)
	),
	if(startswith(Word(5,cname , "_"),"P"),
		Column(cname) << set name(
			Word(1, cname, "_") || "_"|| Word(2, cname, "_") || "_" || Word(3, cname, "_") || "_" || Word(4, cname, "_") ||"_ALL_" || Word(5,cname,"_")
		)
	)
);
0 Kudos
txnelson
Super User

Re: Rename columns with Regex

If I am understanding what you want, you can use an OR clause to get what you want

allColNames = dt << getcolumnnames;
pulledCols = {};

for(i = 1, i <= N items(allColNames), i++,
	cname = Column(allColNames[i]) << get Name;
	if(startswith(Word(5,cname , "_"),"M") | startswith(Word(5,cname , "_"),"P"),
		Column(cname) << set name(
			Word(1, cname, "_") || "_"|| Word(2, cname, "_") || "_" || Word(3, cname, "_") || "_" || Word(4, cname, "_") ||"_ALL_" || Word(5,cname,"_")
		)
	)
);
Jim
0 Kudos
tarkan_bih
Community Trekker

Re: Rename columns with Regex

This fixes everything and now it works! Thanks for all the support. One final improvement I see is that instead of having a long string of concatentation with the word function is there a way to return everything before the delimiter? So for example is there a way to write it as <all values before Word(4, cname, "_")  including the underscores> || "_ALL_" || Word(5, cname, "_"). So basically I want column name string before the 4th delimiter.

 

Column(cname) << set name(
	Word(1, cname, "_") || "_"|| Word(2, cname, "_") || "_" || Word(3, cname, "_") || "_" || Word(4, cname, "_") ||"_ALL_" || Word(5,cname,"_")

0 Kudos
ms
Super User ms
Super User

Re: Rename columns with Regex

This should work as long as Bourough don't contains an underscore.

dt = New Table("test",
    Add Rows(3),
    New Column("Label", Character, "Nominal", Set Selected, Set Values({"CALC1_BROOKLYN_M", "CALC2_QUEENS_F", "CALC2_M"}))
);

dt << New Column("test", Formula(Left(:Label, 5)));
dt << New Column("location", Formula(If(Is Missing(Regex(:Label, "_(.*)_")), "NYC", Regex(:Label, "_(.*)_", "\1"))));
dt << New Column("gender", Formula(Right(:Label, 1)));