Subscribe Bookmark RSS Feed

Remove Rows where 3 out of 5 column values are identical

dmocatta

Community Member

Joined:

Sep 9, 2014

I having been trying to find a way to remove rows that have identical cells in some columns but not in others.

For example I want to find rows where values for columns PartNo, SerialNum and Site are identical but not Customer and Value. Then I want to delete one of these rows. In
this case I would be left with row 1,3 and 4-see below.

Thanks

Part NoSerial NumSiteCustomerValue
123a1.5
123b2
345c3
234d5
234e6
2 REPLIES
Wendy_Murphrey

Joined:

Jun 23, 2011

Hello,

You can use Join to join the table to itself, matching on the Part No, Serial Num and Site columns, and drop multiples.  Here is what the Join dialog would look like for this case.7234_JoinDropDups.PNG

Hope that helps!

Wendy
ian_jmp

Staff

Joined:

Jun 23, 2011

If you want to do it through scripting, the following code shows an example which retains only the first instance of a duplicate row. It exploits the fact that the summary table is linked to the detail table.

// Example data

dt1 = Open("$SAMPLE_DATA/Big Class.jmp");

// Which columns are used to define duplicates?

keyCols = {"Age", "Sex"};

// Make an invisible, linked summary table, dt2

dt2 = dt1 << Summary(Group(Eval(keyCols)), Invisible);

// Loop over the rows of dt2...

rows2Keep = [];

for (r=1, r<=NRows(dt2), r++,

  // ...select each row in turn

  dt2 << SelectRows({r});

  // Find the corresponding rows in dt1

  duplicateRows = dt1 << GetSelectedRows;

  // Keep only the first

  rows2Keep = VConcat(rows2Keep, duplicateRows[1]);

  // Remove the current selection

  dt2 << ClearSelect;

);

Close(dt2, NoSave);

// Delete the rows in dt1 that are duplicates

rows2Delete = (1::NRow(dt1))`;

rows2Delete[rows2Keep] = [];

dt1 << DeleteRows(rows2Delete);