cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
JMP is taking Discovery online, April 16 and 18. Register today and join us for interactive sessions featuring popular presentation topics, networking, and discussions with the experts.
Choose Language Hide Translation Bar
tarkan_bih
Level III

Rename columns with Regex

Hi,

 

I have a list of a huge number of columns to stack that have column format name of TEST_BOUROUGH_GENDER, as shown below. Now the BOUROUGH can be missing from the column name and that imples the value from the column is from all the BOUROUGHs (ie NYC). I stacking all of the TEST_BOUROUGH_GENDER column names and then creating new columns called test, location and gender (see below). When I run into the case of a column that is missing a BOUROUGH name I would like to add NYC to that column name. I would also prefer if this is could be done with regular expressions. Any thoughts or input is greatly appreciated. 

 

column format name: CALC1_BROOKLYN_M, CALC2_QUEENS_F, CALC2_M, ..

 

 

ref = Current Data Table();

stat stack << ref stack(Source Label Column("Label"), Stacked Data Column("Data"));

stat stack << New Column("test", Formula(Right( :Label, 5 )));

stat stack << New Column("location", Formula(Regex( :Label, "_(.*)_", "\1" )));

stat stack << New Column("gender", Formula(Right( :Label, 1)))

1 ACCEPTED SOLUTION

Accepted Solutions
txnelson
Super User

Re: Rename columns with Regex

This solution on changing the names doesn't use regular expressions, but it is a very simple piece of code:

Names Default To Here( 2 );
dt = Current Data Table();
colNamesList = dt << get column names( string );

For( i = 1, i <= N Items( colNamesList ), i++,
	If( Word( 3, colNamesList[i], "_" ) == "",
		Column( colNamesList[i] ) << set name(
			Word( 1, colNamesList[i], "_" ) || "_NYC_" || Word( 2, colNamesList[i], "_" )
		)
	)
);
Jim

View solution in original post

5 REPLIES 5
txnelson
Super User

Re: Rename columns with Regex

This solution on changing the names doesn't use regular expressions, but it is a very simple piece of code:

Names Default To Here( 2 );
dt = Current Data Table();
colNamesList = dt << get column names( string );

For( i = 1, i <= N Items( colNamesList ), i++,
	If( Word( 3, colNamesList[i], "_" ) == "",
		Column( colNamesList[i] ) << set name(
			Word( 1, colNamesList[i], "_" ) || "_NYC_" || Word( 2, colNamesList[i], "_" )
		)
	)
);
Jim
tarkan_bih
Level III

Re: Rename columns with Regex

This is what I want! However when I try to put this in my script it is crashing. The values I used in my example (TEST_BOUROUGH_GENDER) are not real due to IP issues but mearly are used as an example. Also after the last delimiter "_" the column will either start M or P. Is there a way with startswith function to include two seperate characters it can start with?

 

allColNames = dt << getcolumnnames;
pulledCols = {};

Ncols = N items(allColNames);
for(i = 1, i <= Ncols, i++,
	cname = Column(allColNames[i]) << get Name;
	if(startswith(Word(5,cname , "_"),"M"),
		Column(cname) << set name(
			Word(1, cname, "_") || "_"|| Word(2, cname, "_") || "_" || Word(3, cname, "_") || "_" || Word(4, cname, "_") ||"_ALL_" || Word(5,cname,"_")
		)
	),
	if(startswith(Word(5,cname , "_"),"P"),
		Column(cname) << set name(
			Word(1, cname, "_") || "_"|| Word(2, cname, "_") || "_" || Word(3, cname, "_") || "_" || Word(4, cname, "_") ||"_ALL_" || Word(5,cname,"_")
		)
	)
);
txnelson
Super User

Re: Rename columns with Regex

If I am understanding what you want, you can use an OR clause to get what you want

allColNames = dt << getcolumnnames;
pulledCols = {};

for(i = 1, i <= N items(allColNames), i++,
	cname = Column(allColNames[i]) << get Name;
	if(startswith(Word(5,cname , "_"),"M") | startswith(Word(5,cname , "_"),"P"),
		Column(cname) << set name(
			Word(1, cname, "_") || "_"|| Word(2, cname, "_") || "_" || Word(3, cname, "_") || "_" || Word(4, cname, "_") ||"_ALL_" || Word(5,cname,"_")
		)
	)
);
Jim
tarkan_bih
Level III

Re: Rename columns with Regex

This fixes everything and now it works! Thanks for all the support. One final improvement I see is that instead of having a long string of concatentation with the word function is there a way to return everything before the delimiter? So for example is there a way to write it as <all values before Word(4, cname, "_")  including the underscores> || "_ALL_" || Word(5, cname, "_"). So basically I want column name string before the 4th delimiter.

 

Column(cname) << set name(
	Word(1, cname, "_") || "_"|| Word(2, cname, "_") || "_" || Word(3, cname, "_") || "_" || Word(4, cname, "_") ||"_ALL_" || Word(5,cname,"_")

ms
Super User (Alumni) ms
Super User (Alumni)

Re: Rename columns with Regex

This should work as long as Bourough don't contains an underscore.

dt = New Table("test",
    Add Rows(3),
    New Column("Label", Character, "Nominal", Set Selected, Set Values({"CALC1_BROOKLYN_M", "CALC2_QUEENS_F", "CALC2_M"}))
);

dt << New Column("test", Formula(Left(:Label, 5)));
dt << New Column("location", Formula(If(Is Missing(Regex(:Label, "_(.*)_")), "NYC", Regex(:Label, "_(.*)_", "\1"))));
dt << New Column("gender", Formula(Right(:Label, 1)));