Unable to use 'Declutter' button on K-Means Clustering Control Panel
Mar 9, 2011 4:18 PM(970 views)
I want to eliminate outliers from my cluster analysis. I know there is a Declutter option (Outlier Clean Up) for this, but I can't get it to work.
I clicked Declutter and entered 5 when prompted for "How many neighbors to go upto?". It produced five distance plots, and I can see some outliers in the plots that I want to exclude.
Below the plots there is a 'Reset Excluded' button. I'm not sure what it does. I selected some points on a plot and clicked the button, but nothing changed: the selected observations are still being assigned to a cluster.
Can someone explain how to use Declutter?
Re: Unable to use 'Declutter' button on K-Means Clustering Control Panel
Mar 15, 2011 10:39 AM(928 views)
"Reset Excluded" actually means "Do not include the selected points in the clustering procedure."
So, after selecting the points you consider outliers on the plot, right-click the plot and choose "Row Exclude". This excludes the corresponding rows in your data table. Then click "Reset Excluded" (it appears to do nothing, but it filters the excluded rows out of the subsequent clustering) and click the "Go" button. The clustering result is then based on a subset of your original data that contains only the non-excluded rows.
Here is why the "Reset Excluded" button exists. Go back to the plot and select any further points that you want to exclude in addition to the earlier ones. Right-click and choose "Row Exclude" again, then click "Reset Excluded" and "Go". A new clustering result is generated from the current set of excluded rows. You can repeat these steps as many times as your cluster analysis needs.
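For anyone who wants to see the idea behind this workflow outside of JMP, here is a minimal Python sketch. It is not JMP's implementation: the data, the threshold of 3.0, the function names, and the farthest-point initialization are all assumptions made for illustration. It computes each row's distance to its k-th nearest neighbor, marks rows with a large distance as excluded, and then runs a plain k-means on the remaining rows only.

```python
import math

def kth_neighbor_distances(points, k):
    """Distance from each point to its k-th nearest neighbor."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result

def kmeans(points, n_clusters, n_iter=20):
    """Plain k-means with a simple farthest-point initialization (an assumption)."""
    centers = [points[0]]
    while len(centers) < n_clusters:
        # Add the point that is farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers

# Made-up data: two tight groups plus one obvious outlier (row index 6).
rows = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
        (5.0, 5.0), (5.1, 5.2), (5.2, 5.1),
        (20.0, -20.0)]

excluded = set()       # plays the role of the excluded rows in the data table
threshold = 3.0        # arbitrary cutoff chosen by eye for this toy data

# "Select outliers and exclude them": flag rows far from their 2nd nearest neighbor.
dists = kth_neighbor_distances(rows, k=2)
excluded |= {i for i, d in enumerate(dists) if d > threshold}

# "Go": cluster only the rows that are not excluded.
kept = [r for i, r in enumerate(rows) if i not in excluded]
centers = kmeans(kept, n_clusters=2)
print(sorted(excluded), centers)
```

To mimic a second pass, add more indices to `excluded` and rebuild `kept` before calling `kmeans` again; the excluded set accumulates across passes in the same way as repeated "Row Exclude" steps.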
Another tip for excluding rows: after selecting the outlier points on the plot, go to the data table and choose "Rows" -> "Row Selection" -> "Invert Row Selection" from the top menu, so that every row except the outliers is selected. Then choose "Tables" -> "Subset" from the top menu and click "OK". You can now run your cluster analysis on this subset, which contains only the selected rows from your original data and therefore none of the outliers.
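The invert-then-subset logic is easy to get backwards, so here is a small Python sketch of it. This only illustrates the set logic and is not how JMP stores selections; the table contents and the `selected` indices are made up.

```python
# A tiny made-up "data table": one dict per row.
rows = [
    {"id": 1, "x": 0.1, "y": 0.2},
    {"id": 2, "x": 0.2, "y": 0.1},
    {"id": 3, "x": 5.0, "y": 5.1},
    {"id": 4, "x": 20.0, "y": -20.0},  # the outlier picked on the plot
]

# Row indices selected on the plot as outliers (hypothetical selection).
selected = {3}

# "Invert Row Selection": select everything that was not selected.
selected = set(range(len(rows))) - selected

# "Subset": copy the selected rows into a new table; the original is untouched.
subset = [dict(rows[i]) for i in sorted(selected)]

print([r["id"] for r in subset])
```

Because the subset is a separate copy, the original table still holds all rows, which matches working from a new subset table rather than deleting data.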