cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
  • Register to attend Discovery Summit 2025 Online: Early Users Edition, Sept. 24-25.
  • New JMP features coming to desktops everywhere this September. Sign up to learn more at jmp.com/launch.
Choose Language Hide Translation Bar
SKHDS
Level I

Duplicate Row Selection Including the Original Row

Hello. I have a data table that contains 960K rows and 49 columns. There are duplicate rows (from 2-4 rows) based on one of the 49 columns. I've used the Row Selection - Select Duplicate Rows function after selecting this one column that contains the duplicate values. When the results came back, the duplicate rows became highlighted but the original row above the duplicate row/s is not. Is there a way for JMP to select ALL rows containing the duplicated values including the original row with the identical value? I don't have knowledge of writing a script but I would think the built-in functions in JMP would allow me to do this row selection. I'm using JMP16 on Windows 10. If someone knows the solution or a trick to do this, that 'll be greatly appreciated.

4 REPLIES 4
txnelson
Super User

Re: Duplicate Row Selection Including the Original Row

The reason the Select Duplicate Rows function does not select all of the matching rows, is because it is assumed that you want to retain one of the matching rows. So when the Select Duplicate Rows function returns with the selected rows, all you have to do is to right click on one of the selected rows, and select, Delete Rows. You will then be left with only one copy of the duplicated rows.
What is your need to see all matching rows?
You can select the cells you want matched for a given row, and then right click and choose, Select Matching Cells, and you will see all of the rows where exact matches are found.
Jim

Re: Duplicate Row Selection Including the Original Row

Hi - I have a similar need.

 

My dataset is recording measurements.  Sometimes, the measurement is invalid and not all columns of data are present in a row; alternatively, a measurement will be made twice resulting in an extra column.  In this case, that measurement is repeated.

 

So, my dataset has some repeated rows...some directly following the row with the same UID, sometimes farther down the dataset.

 

I'd like to select all the rows that have duplicate UID, not just the duplicate rows, but also the row that was duplicated.  I'd also like to choose only the final row for my data analysis.

 

Thoughts?

Tim

txnelson
Super User

Re: Duplicate Row Selection Including the Original Row

This will find all duplicate rows

     Tables=>Summary

and specifying for grouping, all of the columns that will detect matching rows.

Then delete all rows in the Summary table that found only a single row for the grouping columns(N Rows==1 indicates no duplicates found).

Then use 

     Tables=>Update

to merge the Summary table back into the original table, matching on the grouping columns.

You can then use

    dt << select where(:nrows<0)

to find all of the duplicate rows. 

 

You can also use the Select Duplicate Rows, leaving the last row found as unselected.  Then by deleting all of the selected rows, you will delete all of the duplicate rows except the last row found.  To do this:

Create a new column.  Call it RowNum.  Set the formula for the new column to

Row()

Then go to the Col Info window for the new row, and Remove the formula.

Next sort the data in descending order based upon the RowNum column.

Now run the Select Duplicate Rows.  It will find all duplicate, leaving the first matching row unselected, which because of the sorting, is actually the last duplicate row.

Delete the selected rows.

Sort the data again, this time sort it by ascending RowNum, which returns the table back to the original order.

Then delete the RowNum column 

Jim

Re: Duplicate Row Selection Including the Original Row

Hi,

 

An easy way to do this interactively is with the Summary platform, in the tables menu. Place the column(s) of interest in the Group role, then hit OK.

 

A summary table, linked to the original table, is created. Selecting any row in the summary table selects the source rows in the original table from which the summary row derives.

 

In your case, you likely want to select rows with an entry other than "1" in the N Rows column. If you'd like to select all of these at once:

- right-click on a "1" in the N Row column and select "Select matching rows" from the context-sensitive menu

- now, right-click on any colored field to the left of the first column (that is, on any selected row) and select "Invert Selection". This selects all rows NOT containing a 1 in the N Rows column... which is all rows where duplicates exist in the main table. Since the tables are linked, all corresponding rows in the main table are now selected.

 

Cheers,

Brady

Recommended Articles