- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
remove almost empty categorical columns
Some one can help me?
From a database i need to remove categorical columns with a certain percentage of missing character and at the same time I would like to remove the numeric columns with a percentage of stagnant values (not necessary 100%).
Thanks. Felice
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: remove almost empty categorical columns
Hope that helps explain things.
If you want more granularity, just run multiple loops with different percentages using different strategies.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: remove almost empty categorical columns
Here is a quick and dirty script that gets rid of columns with less than a specified percent of data
Names Default To Here( 1 );
dt = Current Data Table();
percent = 90;
colList = dt << get column names( character, string );
NumRows = N Rows( dt );
For( i = N Items( colList ), i >= 1, i--,
If( Col Number( Column( dt, i ) ) / NumRows * 100 < percent,
dt << delete columns( colList[i] )
)
);
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: remove almost empty categorical columns
Another method, I believe slightly more robust.
Names Default To Here( 1 );
dt = Current Data Table();
percent = 90;
delcols = {};
numRows = nrows(dt);
for(i=1; maxi=NCols(dt), i<=maxi, i++,
col = column(dt, i);
nmissing = col N missing(col);
if(nmissing > numRows * percent/100,
insert into(delcols, i);
);
);
if(nitems(delcols) > 0, dt << delete columns(delcols));
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: remove almost empty categorical columns
I think you are also looking for something like this:
Which will delete a numeric column if all data is only one value.
Names Default To Here( 1 );
dt = Current Data Table();
delcols = {};
numRows = nrows(dt);
for(i=1; maxi=NCols(dt), i<=maxi, i++,
col = column(dt, i);
try(summarize(dt, max = max(col), min = min(col));
if((max == min) | (ismissing(max)),
insertinto(delcols, i);
)
);
);
if(nitems(delcols) > 0, dt << delete columns(delcols));
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: remove almost empty categorical columns
you might be able to use the method you posted before but with the mode of the column for something less than 100%
Names Default To Here( 1 ); dt = Current Data Table(); percent = 90; delcols = {}; numRows = nrows(dt); for(i=1; maxi=NCols(dt), i<=maxi, i++, if(nrows(loc(aslist(column(dt,i)<<getasmatrix),mode(column(dt,i)<<getasmatrix))) > numRows * percent/100, insert into(delcols, i); ); ); if(nitems(delcols) > 0, dt << delete columns(delcols));
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: remove almost empty categorical columns
Hi
I tried yor script on the second table below and the result is here.
As you can see it was able to remove almost all desired columns but second one (col2) not. How I can do to remove it too?
Thanks Felice
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: remove almost empty categorical columns
Just combine the strategies. Something like:
Names Default To Here( 1 );
dt = Current Data Table();
percent = 90;
delcols = {};
numRows = nrows(dt);
for(i=1; maxi=NCols(dt), i<=maxi, i++,
if(nrows(loc(aslist(column(dt,i)<<getasmatrix),mode(column(dt,i)<<getasmatrix))) > numRows * percent/100,
insert into(delcols, i);
,
if(Col N Missing(column(dt,i)) > numRows * percent/100,
insert into(delcols, i);
);
);
);
if(nitems(delcols) > 0, dt << delete columns(delcols));
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: remove almost empty categorical columns
Hi msharp thanks for your reply. Let'm ask another question. The percent variable work for both on categorical and numerical columns? If yes can I have two different value for missing and stagnant cases?
Sorry for maybe stupid question but I'm not expert in jsl language.
Felice
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: remove almost empty categorical columns
Hope that helps explain things.
If you want more granularity, just run multiple loops with different percentages using different strategies.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: remove almost empty categorical columns
Perfect thanks.
Felice