I know this has been discussed before, but I'm looking for suggestions for my particular situation. I have a very large data table (~50 columns x 50,000+ rows). I need to check for "duplicate rows", where duplicate means the rows match in three columns (e.g., ColA, ColB, and ColC). When duplicates exist, I need to delete all except the first of the matching rows.
Ideally, I'd like to do this with a script, as I frequently need to re-pull and re-analyze the updated table. I suspect I can use a summary table to help with this (at least it will identify and select the duplicate rows). However, from there I'm not sure how to automate moving through each set of "matched" rows and deleting all but the first.
I would use a summary on the three matching columns. Then join the summary table to the original table, matching on the same three columns with the drop duplicates option checked, and select only the columns you want in the output table.
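In case it helps to see the logic spelled out, here is a rough sketch of the same idea in plain Python rather than in the table tool's own scripting language. It does not reproduce the summary or join dialogs; it only illustrates "count rows per key combination, then keep the first row of each combination". The function names `summarize` and `drop_duplicates`, the `Value` column, and the sample rows are made up for illustration; only ColA, ColB, and ColC come from the question.

```python
from collections import Counter

# Columns that define a "duplicate" (from the question).
KEY_COLUMNS = ("ColA", "ColB", "ColC")


def summarize(rows, key_columns=KEY_COLUMNS):
    """Count rows per key combination, similar in spirit to a summary table."""
    return Counter(tuple(row[c] for c in key_columns) for row in rows)


def drop_duplicates(rows, key_columns=KEY_COLUMNS):
    """Keep only the first row for each key combination, preserving row order."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical sample data; "Value" stands in for the other ~47 columns.
rows = [
    {"ColA": 1, "ColB": "x", "ColC": "p", "Value": 10},
    {"ColA": 1, "ColB": "x", "ColC": "p", "Value": 11},
    {"ColA": 2, "ColB": "y", "ColC": "q", "Value": 20},
    {"ColA": 1, "ColB": "x", "ColC": "p", "Value": 12},
    {"ColA": 2, "ColB": "y", "ColC": "r", "Value": 30},
]

counts = summarize(rows)
duplicated_keys = [key for key, n in counts.items() if n > 1]
deduped = drop_duplicates(rows)
```

Because this makes a single pass with a set lookup per row, 50,000+ rows should not be a problem, and it can be re-run each time the table is re-pulled. Rows are kept in their original order, so "first" means first in the table as loaded.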