cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Check out the JMP® Marketplace featured Capability Explorer add-in
Choose Language Hide Translation Bar

delete CHAR cols if there's only a single value, no string variation

Wondering how I can script something to look at all the columns to see if there's at least 2 different values and delete all the single-value cols? I have thousands of 'feature' cols, all character-type, but a noticeable amount of them only have a single record in them...aka the distribution level = 1.

 

how to detect and delete 1-record cols?

3 REPLIES 3
txnelson
Super User

Re: delete CHAR cols if there's only a single value, no string variation

Here is a simple little script that finds all character columns and then goes from the end of the list to the beginning and if there is only a single value not counting blanks is found then delete the column

Names Default To Here( 1 );
dt = Current Data Table();

charCols = dt << get column names( string, character );

For( i = N Items( charCols ), i >= 1, i--,
	Summarize( dt, bygroup = by( As Column( charCols[i] ) ) );
	If( bygroup[1] == "",
		Remove From( bygroup, 1, 1 )
	);
	If( N Items( bygroup ) <= 1,
		dt << delete columns( Column( dt, charCols[i] ) )
	);
);
Jim
jthi
Super User

Re: delete CHAR cols if there's only a single value, no string variation

Similar idea as Jim (use summarize to take unique values and then remove missing if there is such) but a bit different take: use filter each to find the columns to remove and then remove them all at the same time using delete columns.

Names Default To Here(1);

dt = Current Data Table();

char_cols = dt << get column names(Character, String);

cols_to_remove = Filter Each({colname}, char_cols,
	Summarize(dt, uniq = by(Column(dt, colname)));
	Remove From(uniq, Contains(uniq, "")); // to remove missing values from list
	N Items(uniq) == 1;
);


dt << Delete Columns(cols_to_remove);
-Jarmo
jthi
Super User

Re: delete CHAR cols if there's only a single value, no string variation

One thing to note: both me and Jim remove missing value from our list of unique values. Depending on your data/model/use case you might want to (or have to) consider that as a separate value so take that into account. Both our scripts can take that into account BUT they do need small changes.

-Jarmo