- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
delete CHAR cols if there's only a single value, no string variation
Wondering how I can script something to look at all the columns to see if there's at least 2 different values and delete all the single-value cols? I have thousands of 'feature' cols, all character-type, but a noticeable amount of them only have a single record in them...aka the distribution level = 1.
how to detect and delete 1-record cols?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: delete CHAR cols if there's only a single value, no string variation
Here is a simple little script that finds all character columns and then goes from the end of the list to the beginning and if there is only a single value not counting blanks is found then delete the column
Names Default To Here( 1 );
dt = Current Data Table();
charCols = dt << get column names( string, character );
For( i = N Items( charCols ), i >= 1, i--,
Summarize( dt, bygroup = by( As Column( charCols[i] ) ) );
If( bygroup[1] == "",
Remove From( bygroup, 1, 1 )
);
If( N Items( bygroup ) <= 1,
dt << delete columns( Column( dt, charCols[i] ) )
);
);
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: delete CHAR cols if there's only a single value, no string variation
Similar idea as Jim (use summarize to take unique values and then remove missing if there is such) but a bit different take: use filter each to find the columns to remove and then remove them all at the same time using delete columns.
Names Default To Here(1);
dt = Current Data Table();
char_cols = dt << get column names(Character, String);
cols_to_remove = Filter Each({colname}, char_cols,
Summarize(dt, uniq = by(Column(dt, colname)));
Remove From(uniq, Contains(uniq, "")); // to remove missing values from list
N Items(uniq) == 1;
);
dt << Delete Columns(cols_to_remove);
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: delete CHAR cols if there's only a single value, no string variation
One thing to note: both me and Jim remove missing value from our list of unique values. Depending on your data/model/use case you might want to (or have to) consider that as a separate value so take that into account. Both our scripts can take that into account BUT they do need small changes.