I have a dataset with a categorical variable X. I want to select rows where the frequency count of that row's value of X is at or above a threshold. For example, I only want to see rows where the count is 5 or higher. There could be 100-200 different values of X, so a simple distribution plot would be too busy to make a graphical selection from.
I've tried various things:
1. Tabulation where I put X into the drop zone for rows. After converting the results to a table I have X in column 1 and the count (N) in column 2. I join this back to the original table using X as my join variable. Now I have the counts for X back in my original table and I can select rows using Rows > Row Selection > Select Where (N > threshold). This works but has too many steps to sell to users.
2. Pareto plot of X. It sorts by the count, which is nice. Is there any way to rotate it? The brush tool doesn't seem to work here. So it's good to look at, but I can't seem to conveniently select bars having a count of (threshold) or higher.
One possible solution: Tables --> Summary, with X as the Group variable. This will result in a table of X and the frequency of each value of X (the N Rows column). This table will be linked to the original table.
In the summary table, "Select Where" N Rows < 5 (N Rows is the count column; X itself is categorical). This will highlight all the rows where the count is below 5 on the summary table AND the matching rows on the original table. Now you can Rows --> Exclude or Rows --> Hide on the summary table, and that will be reflected on the original table, too.
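For anyone who wants to check the result outside JMP, the same count-then-filter logic can be sketched in plain Python. This is only an illustration of the idea, not JMP's own implementation: the function name `rows_meeting_threshold`, the column name "X", and the sample data are all made up for the example.

```python
from collections import Counter


def rows_meeting_threshold(rows, key, threshold):
    """Return the rows whose value of `key` occurs at least `threshold` times."""
    # Equivalent of the summary table: one count per distinct value of X.
    counts = Counter(row[key] for row in rows)
    # Equivalent of selecting on the count and keeping the matching source rows.
    return [row for row in rows if counts[row[key]] >= threshold]


# Hypothetical data: "a" occurs 5 times, "b" twice, "c" 6 times.
rows = [{"id": i, "X": x} for i, x in enumerate("aaaaabbcccccc")]

kept = rows_meeting_threshold(rows, "X", 5)
print(sorted({row["X"] for row in kept}))
```

With a threshold of 5, the rows for "b" drop out and the rows for "a" and "c" remain, which is the same set you should end up with after excluding the low-count rows in JMP.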