cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
The Discovery Summit 2025 Call for Content is open! Submit an abstract today to present at our premier analytics conference.
Choose Language Hide Translation Bar
View Original Published Thread

delete CHAR cols if there's only a single value, no string variation

TriangularLlama
Level II

Wondering how I can script something to look at all the columns to see if there's at least 2 different values and delete all the single-value cols? I have thousands of 'feature' cols, all character-type, but a noticeable amount of them only have a single record in them...aka the distribution level = 1.

 

how to detect and delete 1-record cols?

3 REPLIES 3
txnelson
Super User


Re: delete CHAR cols if there's only a single value, no string variation

Here is a simple little script that finds all character columns and then goes from the end of the list to the beginning and if there is only a single value not counting blanks is found then delete the column

Names Default To Here( 1 );
dt = Current Data Table();

charCols = dt << get column names( string, character );

For( i = N Items( charCols ), i >= 1, i--,
	Summarize( dt, bygroup = by( As Column( charCols[i] ) ) );
	If( bygroup[1] == "",
		Remove From( bygroup, 1, 1 )
	);
	If( N Items( bygroup ) <= 1,
		dt << delete columns( Column( dt, charCols[i] ) )
	);
);
Jim
jthi
Super User


Re: delete CHAR cols if there's only a single value, no string variation

Similar idea as Jim (use summarize to take unique values and then remove missing if there is such) but a bit different take: use filter each to find the columns to remove and then remove them all at the same time using delete columns.

Names Default To Here(1);

dt = Current Data Table();

char_cols = dt << get column names(Character, String);

cols_to_remove = Filter Each({colname}, char_cols,
	Summarize(dt, uniq = by(Column(dt, colname)));
	Remove From(uniq, Contains(uniq, "")); // to remove missing values from list
	N Items(uniq) == 1;
);


dt << Delete Columns(cols_to_remove);
-Jarmo
jthi
Super User


Re: delete CHAR cols if there's only a single value, no string variation

One thing to note: both me and Jim remove missing value from our list of unique values. Depending on your data/model/use case you might want to (or have to) consider that as a separate value so take that into account. Both our scripts can take that into account BUT they do need small changes.

-Jarmo