Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- JMP User Community
- :
- Discussions
- :
- remove almost empty categorical columns

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

Highlighted

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Mar 16, 2017 10:18 AM
(8100 views)

Some one can help me?

From a database i need to remove categorical columns with a certain percentage of missing character and at the same time I would like to remove the numeric columns with a percentage of stagnant values (not necessary 100%).

Thanks. Felice

1 ACCEPTED SOLUTION

Accepted Solutions

Highlighted

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Yeah it all depends on how you set it up. The first check is finding the most common value in each column and determines if it is above or below the threshold percentage. Unfortunately, a missing value isn't a value, so it can't be the most common. That's why it doesn't work on the numeric columns. It does work on the Character columns b/c "" is both missing and an empty string value.

Hope that helps explain things.

If you want more granularity, just run multiple loops with different percentages using different strategies.

Hope that helps explain things.

If you want more granularity, just run multiple loops with different percentages using different strategies.

9 REPLIES 9

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: remove almost empty categorical columns

Here is a quick and dirty script that gets rid of columns with less than a specified percent of data

```
Names Default To Here( 1 );
dt = Current Data Table();
percent = 90;
colList = dt << get column names( character, string );
NumRows = N Rows( dt );
For( i = N Items( colList ), i >= 1, i--,
If( Col Number( Column( dt, i ) ) / NumRows * 100 < percent,
dt << delete columns( colList[i] )
)
);
```

Jim

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: remove almost empty categorical columns

Another method, I believe slightly more robust.

```
Names Default To Here( 1 );
dt = Current Data Table();
percent = 90;
delcols = {};
numRows = nrows(dt);
for(i=1; maxi=NCols(dt), i<=maxi, i++,
col = column(dt, i);
nmissing = col N missing(col);
if(nmissing > numRows * percent/100,
insert into(delcols, i);
);
);
if(nitems(delcols) > 0, dt << delete columns(delcols));
```

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: remove almost empty categorical columns

I think you are also looking for something like this:

Which will delete a numeric column if all data is only one value.

```
Names Default To Here( 1 );
dt = Current Data Table();
delcols = {};
numRows = nrows(dt);
for(i=1; maxi=NCols(dt), i<=maxi, i++,
col = column(dt, i);
try(summarize(dt, max = max(col), min = min(col));
if((max == min) | (ismissing(max)),
insertinto(delcols, i);
)
);
);
if(nitems(delcols) > 0, dt << delete columns(delcols));
```

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: remove almost empty categorical columns

you might be able to use the method you posted before but with the mode of the column for something less than 100%

Names Default To Here( 1 ); dt = Current Data Table(); percent = 90; delcols = {}; numRows = nrows(dt); for(i=1; maxi=NCols(dt), i<=maxi, i++, if(nrows(loc(aslist(column(dt,i)<<getasmatrix),mode(column(dt,i)<<getasmatrix))) > numRows * percent/100, insert into(delcols, i); ); ); if(nitems(delcols) > 0, dt << delete columns(delcols));

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: remove almost empty categorical columns

Hi

I tried yor script on the second table below and the result is here.

As you can see it was able to remove almost all desired columns but second one (col2) not. How I can do to remove it too?

Thanks Felice

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: remove almost empty categorical columns

Just combine the strategies. Something like:

```
Names Default To Here( 1 );
dt = Current Data Table();
percent = 90;
delcols = {};
numRows = nrows(dt);
for(i=1; maxi=NCols(dt), i<=maxi, i++,
if(nrows(loc(aslist(column(dt,i)<<getasmatrix),mode(column(dt,i)<<getasmatrix))) > numRows * percent/100,
insert into(delcols, i);
,
if(Col N Missing(column(dt,i)) > numRows * percent/100,
insert into(delcols, i);
);
);
);
if(nitems(delcols) > 0, dt << delete columns(delcols));
```

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: remove almost empty categorical columns

Hi msharp thanks for your reply. Let'm ask another question. The percent variable work for both on categorical and numerical columns? If yes can I have two different value for missing and stagnant cases?

Sorry for maybe stupid question but I'm not expert in jsl language.

Felice

Highlighted
Yeah it all depends on how you set it up. The first check is finding the most common value in each column and determines if it is above or below the threshold percentage. Unfortunately, a missing value isn't a value, so it can't be the most common. That's why it doesn't work on the numeric columns. It does work on the Character columns b/c "" is both missing and an empty string value.

Hope that helps explain things.

If you want more granularity, just run multiple loops with different percentages using different strategies.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Hope that helps explain things.

If you want more granularity, just run multiple loops with different percentages using different strategies.

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: remove almost empty categorical columns

Perfect thanks.

Felice