Is it possible to find and get rid of duplicate rows easily?

Olivia Lippincott shares a tip from Byron Wingerd for an easy way to find and get rid of duplicate rows. Join the table to itself (Tables>Join), match all columns, drop multiples for both the Main and With tables, and then save the joined table with a new name. (Saving with a new name is almost always better than renaming or changing in place because it lets you examine the changes and ensure they reflect what you really wanted to do.)
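
For readers who want to see the logic outside of JMP, here is a minimal Python sketch of the same idea: treat two rows as duplicates only when every column matches, keep the first occurrence, and write the result to a new file rather than overwriting the original. This is not the JMP Join dialog or JSL. The function names drop_duplicate_rows and dedupe_csv are hypothetical, and the sketch assumes the data is a CSV file with a header row.

```python
import csv


def drop_duplicate_rows(rows):
    """Return rows with exact duplicates removed, keeping the first occurrence.

    Two rows count as duplicates only if every column matches.
    """
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # the whole row is the match key (all columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def dedupe_csv(src_path, dst_path):
    """Read src_path, drop duplicate rows, and save the result to dst_path.

    Writing to a new path leaves the original untouched so the two
    files can be compared afterwards. Returns the number of rows kept.
    """
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)  # None if the file is empty
        unique = drop_duplicate_rows(reader)
    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        if header is not None:
            writer.writerow(header)
        writer.writerows(unique)
    return len(unique)
```

A call such as dedupe_csv("data.csv", "data_deduped.csv") (both file names are placeholders) would leave data.csv as it was and put the de-duplicated rows in the new file, which mirrors the advice above about saving under a new name.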

Comments
WHTseng

Easy, efficient, and clear.

Thank you, gail_massari.

vhuac

Hi Gail,

This method is nice, but it may not work for a large data table with more than 50 million rows or more than 10 GB in file size. The reason is that it creates an additional instance of the data table, which can use up all the memory and cause the computer to become very slow or even hang and crash.

Is there a more efficient way to remove duplicates that does not create an additional table and does not take too long?
