How to create a selectable list based on column statistics (JSL)
Jul 29, 2019 5:55 AM
I am currently struggling to implement an idea I had for further automating my daily work.
The starting point is an automated data-retrieval script I wrote that fetches manufacturing data from an SQL server for a given set of IDs. I have already implemented code that automatically removes all columns containing only missing values or only "0" values. The next step would be to filter out all columns that contain only one distinct value, and this is where it gets complicated.
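The cleanup step already in place (dropping columns that are entirely missing or entirely zero) can be sketched in a language-neutral way. This is a minimal Python sketch, not JSL and not the poster's actual script: it assumes a hypothetical in-memory table stored as a dict that maps column names to lists of cell values, uses None for a missing cell, and treats 0, 0.0 and the string "0" as zero.

```python
# Hypothetical stand-in for a JMP data table:
# column name -> list of cell values, with None marking a missing cell.


def is_empty_or_zero(values):
    """Return True if every cell is missing (None) or a zero (0, 0.0 or "0")."""
    return all(v is None or v == 0 or v == "0" for v in values)


def drop_empty_or_zero_columns(table):
    """Return a new table without the all-missing / all-zero columns."""
    return {name: vals for name, vals in table.items() if not is_empty_or_zero(vals)}


data = {
    "ID": [101, 102, 103],
    "unused_sensor": [None, None, None],
    "zero_flag": [0, "0", None],
    "temperature": [180.0, 181.5, 180.0],
}
print(list(drop_empty_or_zero_columns(data)))  # ['ID', 'temperature']
```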
I don't want to remove every column with a standard deviation of zero, because for some process values (columns) a single distinct value is expected and still carries information (e.g. the recipe number the IDs have seen in a certain process step). I was thinking of a list of checkboxes, or a selectable list, showing the column names together with basic column statistics (especially the standard deviation), plus an action button that deletes the selected columns from the data table. (I found the Column Viewer functionality very interesting for this, but unfortunately I could not find any of the code behind this feature.) Ideally, it would be possible to restrict which columns appear in this selectable list based on their column statistics (StdDev = 0).
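The statistics-driven pick list could start from a per-column summary in which only the columns with a standard deviation of exactly 0 are offered for deletion. The following is a minimal Python sketch of that filtering logic, not JSL: the dict-based table and the function names are hypothetical, and the user's actual choice among the candidates is left out. In JSL the corresponding statistic is available through `Col Std Dev()`, but this sketch makes no attempt to mirror JMP's API.

```python
import statistics


def numeric_column_stats(table):
    """Return {column: {"N", "Mean", "Std Dev"}} for numeric columns only.

    Columns with any non-missing value that is not an int/float are treated as
    character columns and skipped, since a standard deviation is undefined there.
    "Std Dev" is None when fewer than two non-missing values exist.
    """
    stats = {}
    for name, values in table.items():
        present = [v for v in values if v is not None]
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
        if not present or not is_numeric:
            continue
        std_dev = statistics.stdev(present) if len(present) >= 2 else None
        stats[name] = {
            "N": len(present),
            "Mean": statistics.mean(present),
            "Std Dev": std_dev,
        }
    return stats


def zero_std_dev_candidates(stats):
    """Columns offered in the pick list: those whose std dev is exactly 0."""
    return [name for name, s in stats.items() if s["Std Dev"] == 0]


data = {
    "ID": [101, 102, 103],
    "recipe_nr": [7, 7, 7],
    "temperature": [180.0, 181.5, 180.0],
    "operator": ["A", "B", "A"],
}
stats = numeric_column_stats(data)
for name, s in stats.items():
    print(name, s["N"], s["Mean"], s["Std Dev"])
print(zero_std_dev_candidates(stats))  # ['recipe_nr']
```

One caveat of filtering on StdDev = 0: a column with only one non-missing value has no standard deviation at all, so it never reaches the pick list, and character columns are not covered either.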
So right now I have the idea but no clue how to implement it. Any help or ideas are greatly appreciated.
From what you are saying, I think your idea is to use the condition 'standard deviation == 0' to discover columns with only a single value. You don't need to do that: you can write code that determines the number of unique levels in each column and then identify the columns that have a single level. I'm not sure whether that will help you, but if it does, check out the Summarize function.
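Counting distinct levels instead of testing the standard deviation also covers character columns and columns with a single non-missing value. This is a minimal Python sketch of the level-counting idea, not the JSL Summarize function: the dict-based table (None marks a missing cell) and the function names are hypothetical.

```python
def n_levels(values):
    """Number of distinct non-missing values in a column."""
    return len({v for v in values if v is not None})


def single_level_columns(table):
    """Columns with exactly one distinct non-missing value (numeric or character)."""
    return [name for name, values in table.items() if n_levels(values) == 1]


data = {
    "ID": [101, 102, 103],
    "recipe_nr": ["R7", "R7", None],      # character column: std dev is undefined
    "line": [2, 2, 2],
    "temperature": [180.0, 181.5, 180.0],
    "single_reading": [5.2, None, None],  # only one non-missing value
}
print(single_level_columns(data))  # ['recipe_nr', 'line', 'single_reading']
```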
I tried experimenting with different Tabulate/Summarize/Summary scenarios, but didn't manage to find a combination that produces a table which actually gives me feedback (i.e. returns information about the selected rows, like the "table" function does) that could be used to delete the selected columns. Right now I'm working on putting together a list of columns with single values (via Summarize) and compiling them into a custom "table" function, hoping to get the feedback I'm looking for. This approach might work but is still somewhat cumbersome, so if anyone has a more convenient approach, please let me know :)
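One way to get the "feedback" from a summary table is to have each of its rows stand for one candidate column, so that a set of selected row indices maps directly back to column names. This is a minimal Python sketch of that mapping, not JSL: the dict-based table, the function names and the hard-coded selection standing in for the user's clicks are all hypothetical.

```python
def build_summary_rows(table):
    """One summary row per single-level column: (column name, its single level)."""
    rows = []
    for name, values in table.items():
        levels = {v for v in values if v is not None}
        if len(levels) == 1:
            rows.append((name, next(iter(levels))))
    return rows


def delete_selected(table, summary_rows, selected_rows):
    """Map selected summary-row indices back to column names and drop those columns."""
    to_delete = {summary_rows[i][0] for i in selected_rows}
    return {name: vals for name, vals in table.items() if name not in to_delete}


data = {
    "ID": [101, 102, 103],
    "recipe_nr": ["R7", "R7", "R7"],
    "line": [2, 2, 2],
    "temperature": [180.0, 181.5, 180.0],
}
rows = build_summary_rows(data)
for i, (name, level) in enumerate(rows):
    print(i, name, level)

# Simulated selection: keep the informative recipe_nr, tick only row 1 ("line").
cleaned = delete_selected(data, rows, selected_rows=[1])
print(list(cleaned))  # ['ID', 'recipe_nr', 'temperature']
```

Keeping the summary rows and the column names in the same order is what makes the selection usable: the row index is the only thing the selection has to report back.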