Choose Language Hide Translation Bar
Highlighted
TRR21
Level III

Deleting columns that fit a numeric criteria

Hi

I have several thousand columns in my table and would like to trim it to only those columns that mostly have non-zero numbers. ie delete or group and collapse those columns what have zeros for more than half the rows. Any tips?

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted
ms
Super User ms
Super User

Re: Deleting columns that fit a numeric criteria

For big data tables a matrix solution may be faster than iterating over all columns.

 

Example:

dt = Current Data Table();
m = dt << get all columns as matrix; // Get table as matrix
// Get list of columns with > 50% zeroes
zero_cols = (dt << get column names)[(Loc(V Sum(m == 0) / N Row(m) > 0.5))];
// Delete them (or group them)
dt << Delete Columns(zero_cols);
// dt << Group Columns("zero", zero_cols);

View solution in original post

6 REPLIES 6
Highlighted

Re: Deleting columns that fit a numeric criteria

I did not test it but this script gives you an idea.

 

Names Default to Here( 1 );

// assume that data table is open and reference to it is stored in variable dt

For( c = N Col( dt ), c > 0, c--,
	n zero = dt << Get Rows Where( Column( c )[] == 0 );
	If( n zero / N Row( dt ) > 0.5,
		dt << Delete Columns( Column( c ) );
	);
);
Learn it once, use it forever!
Highlighted
TRR21
Level III

Re: Deleting columns that fit a numeric criteria

Thanks - the table is open, but would need some help with the synatx for how to refer to it in the dt variable.

// assume that data table is open and reference to it is stored in variable dt

 

Highlighted

Re: Deleting columns that fit a numeric criteria

dt = Current Data Table();
Learn it once, use it forever!
Highlighted
TRR21
Level III

Re: Deleting columns that fit a numeric criteria

It is rendering JMP non-responsive! Not by CPU activity -- but just can't interact with the table window for minutes...

Highlighted
ms
Super User ms
Super User

Re: Deleting columns that fit a numeric criteria

For big data tables a matrix solution may be faster than iterating over all columns.

 

Example:

dt = Current Data Table();
m = dt << get all columns as matrix; // Get table as matrix
// Get list of columns with > 50% zeroes
zero_cols = (dt << get column names)[(Loc(V Sum(m == 0) / N Row(m) > 0.5))];
// Delete them (or group them)
dt << Delete Columns(zero_cols);
// dt << Group Columns("zero", zero_cols);

View solution in original post

Highlighted
TRR21
Level III

Re: Deleting columns that fit a numeric criteria

My table had 55K cols and this removed ~17K of them. So it worked.
Article Labels

    There are no labels assigned to this post.