Hi,
I have a list of a huge number of columns to stack that have column format name of TEST_BOUROUGH_GENDER, as shown below. Now the BOUROUGH can be missing from the column name and that imples the value from the column is from all the BOUROUGHs (ie NYC). I stacking all of the TEST_BOUROUGH_GENDER column names and then creating new columns called test, location and gender (see below). When I run into the case of a column that is missing a BOUROUGH name I would like to add NYC to that column name. I would also prefer if this is could be done with regular expressions. Any thoughts or input is greatly appreciated.
column format name: CALC1_BROOKLYN_M, CALC2_QUEENS_F, CALC2_M, ..
ref = Current Data Table();
stat stack << ref stack(Source Label Column("Label"), Stacked Data Column("Data"));
stat stack << New Column("test", Formula(Right( :Label, 5 )));
stat stack << New Column("location", Formula(Regex( :Label, "_(.*)_", "\1" )));
stat stack << New Column("gender", Formula(Right( :Label, 1)))
This solution on changing the names doesn't use regular expressions, but it is a very simple piece of code:
Names Default To Here( 2 );
dt = Current Data Table();
colNamesList = dt << get column names( string );
For( i = 1, i <= N Items( colNamesList ), i++,
If( Word( 3, colNamesList[i], "_" ) == "",
Column( colNamesList[i] ) << set name(
Word( 1, colNamesList[i], "_" ) || "_NYC_" || Word( 2, colNamesList[i], "_" )
)
)
);
This solution on changing the names doesn't use regular expressions, but it is a very simple piece of code:
Names Default To Here( 2 );
dt = Current Data Table();
colNamesList = dt << get column names( string );
For( i = 1, i <= N Items( colNamesList ), i++,
If( Word( 3, colNamesList[i], "_" ) == "",
Column( colNamesList[i] ) << set name(
Word( 1, colNamesList[i], "_" ) || "_NYC_" || Word( 2, colNamesList[i], "_" )
)
)
);
This is what I want! However when I try to put this in my script it is crashing. The values I used in my example (TEST_BOUROUGH_GENDER) are not real due to IP issues but mearly are used as an example. Also after the last delimiter "_" the column will either start M or P. Is there a way with startswith function to include two seperate characters it can start with?
allColNames = dt << getcolumnnames;
pulledCols = {};
Ncols = N items(allColNames);
for(i = 1, i <= Ncols, i++,
cname = Column(allColNames[i]) << get Name;
if(startswith(Word(5,cname , "_"),"M"),
Column(cname) << set name(
Word(1, cname, "_") || "_"|| Word(2, cname, "_") || "_" || Word(3, cname, "_") || "_" || Word(4, cname, "_") ||"_ALL_" || Word(5,cname,"_")
)
),
if(startswith(Word(5,cname , "_"),"P"),
Column(cname) << set name(
Word(1, cname, "_") || "_"|| Word(2, cname, "_") || "_" || Word(3, cname, "_") || "_" || Word(4, cname, "_") ||"_ALL_" || Word(5,cname,"_")
)
)
);
If I am understanding what you want, you can use an OR clause to get what you want
allColNames = dt << getcolumnnames;
pulledCols = {};
for(i = 1, i <= N items(allColNames), i++,
cname = Column(allColNames[i]) << get Name;
if(startswith(Word(5,cname , "_"),"M") | startswith(Word(5,cname , "_"),"P"),
Column(cname) << set name(
Word(1, cname, "_") || "_"|| Word(2, cname, "_") || "_" || Word(3, cname, "_") || "_" || Word(4, cname, "_") ||"_ALL_" || Word(5,cname,"_")
)
)
);
This fixes everything and now it works! Thanks for all the support. One final improvement I see is that instead of having a long string of concatentation with the word function is there a way to return everything before the delimiter? So for example is there a way to write it as <all values before Word(4, cname, "_") including the underscores> || "_ALL_" || Word(5, cname, "_"). So basically I want column name string before the 4th delimiter.
Column(cname) << set name(
Word(1, cname, "_") || "_"|| Word(2, cname, "_") || "_" || Word(3, cname, "_") || "_" || Word(4, cname, "_") ||"_ALL_" || Word(5,cname,"_")
This should work as long as Bourough don't contains an underscore.
dt = New Table("test",
Add Rows(3),
New Column("Label", Character, "Nominal", Set Selected, Set Values({"CALC1_BROOKLYN_M", "CALC2_QUEENS_F", "CALC2_M"}))
);
dt << New Column("test", Formula(Left(:Label, 5)));
dt << New Column("location", Formula(If(Is Missing(Regex(:Label, "_(.*)_")), "NYC", Regex(:Label, "_(.*)_", "\1"))));
dt << New Column("gender", Formula(Right(:Label, 1)));