Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

- JMP User Community
- :
- Discussions
- :
- Deleting columns that fit a numeric criteria

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

Highlighted

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Jul 18, 2019 1:05 PM
(1754 views)

Hi

I have several thousand columns in my table and would like to trim it to only those columns that mostly have non-zero numbers. ie delete or group and collapse those columns what have zeros for more than half the rows. Any tips?

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions

Highlighted

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

For big data tables a matrix solution may be faster than iterating over all columns.

Example:

```
dt = Current Data Table();
m = dt << get all columns as matrix; // Get table as matrix
// Get list of columns with > 50% zeroes
zero_cols = (dt << get column names)[(Loc(V Sum(m == 0) / N Row(m) > 0.5))];
// Delete them (or group them)
dt << Delete Columns(zero_cols);
// dt << Group Columns("zero", zero_cols);
```

6 REPLIES 6

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Deleting columns that fit a numeric criteria

I did not test it but this script gives you an idea.

```
Names Default to Here( 1 );
// assume that data table is open and reference to it is stored in variable dt
For( c = N Col( dt ), c > 0, c--,
n zero = dt << Get Rows Where( Column( c )[] == 0 );
If( n zero / N Row( dt ) > 0.5,
dt << Delete Columns( Column( c ) );
);
);
```

Learn it once, use it forever!

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Deleting columns that fit a numeric criteria

Thanks - the table is open, but would need some help with the synatx for how to refer to it in the dt variable.

`// assume that data table is open and reference to it is stored in variable dt`

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Deleting columns that fit a numeric criteria

`dt = Current Data Table();`

Learn it once, use it forever!

Highlighted
##

It is rendering JMP non-responsive! Not by CPU activity -- but just can't interact with the table window for minutes...

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Deleting columns that fit a numeric criteria

Highlighted

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

For big data tables a matrix solution may be faster than iterating over all columns.

Example:

```
dt = Current Data Table();
m = dt << get all columns as matrix; // Get table as matrix
// Get list of columns with > 50% zeroes
zero_cols = (dt << get column names)[(Loc(V Sum(m == 0) / N Row(m) > 0.5))];
// Delete them (or group them)
dt << Delete Columns(zero_cols);
// dt << Group Columns("zero", zero_cols);
```

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Deleting columns that fit a numeric criteria

My table had 55K cols and this removed ~17K of them. So it worked.

Article Labels

There are no labels assigned to this post.