cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Check out the JMP® Marketplace featured Capability Explorer add-in
Choose Language Hide Translation Bar
CMG
CMG
Level II

Find and identify duplicates for unsorted data

Fairly new to JMP, so my knowledge of how to use formulas is pretty limited.

 

If I have a column of values (character) and I want to identify them as duplicates in a different column, what formula would I use? (My data has 10 columns already sorted by a different column's values)

 

In the past, I have used the formula below (when the data that needs duplicates identified has been sorted alphabetically). 

If(
	Row() == 1, 1,
	:Column1 != Lag( :Column1 ), 1,
	0
)

Since I am not allowed to present data sorted in any other way than the way it comes in, I am kind of at a loss as to how to indicate duplicates.

 

Any and all help is appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
jthi
Super User

Re: Find and identify duplicates for unsorted data

Below are at least two ways to do this with scripting:

Names Default To Here(1);

dt = New Table("Untitled",
	Add Rows(7),
	Compress File When Saved(1),
	New Column("Column 1",
		Numeric,
		"Continuous",
		Format("Best", 12),
		Set Values([1, 1, 2, 3, 2, 5, 1]),
		Set Display Width(60)
	)
);


dt << New column("Dublicate_formula", Numeric, <<Formula(
	If(Col Rank(1, :column 1) > 1, 1,0))
);

dt << Select duplicate rows(Match(:column 1));
dubRows = dt << Get Selected Rows;
dt << New Column("Dublicates", Numeric, Nominal);
Column(dt, "Dublicates")[dubRows] = 1;
Column(dt, "Dublicates")[dt << Get Rows Where(IsMissing(:Dublicates))] = 0;

You can also do this without any scripting:

  1. Select columns you are interested in.
  2. From Rows menu: Row Selection -> Select Dublicate Rows
  3. Rows menu: Row Selection -> Name Selection In Column
  4. Choose name, Selected as 1 and Unselected as 0
-Jarmo

View solution in original post

6 REPLIES 6
Mauro_Gerber
Level IV

Re: Find and identify duplicates for unsorted data

 

Maybe this line helps

 

 

dt = Current Data Table();

dt << Select duplicate rows( Match( :column_1, :column_2, :column_3) );

 

 

This will select rows that are not unique in the upper combination of column 1 to 3.

With the selection you can now delete, hide and exclude, fill an additional column or what you want to do with them.

 

 

r_select = dt << select rows( dt << get selected rows ); // this gets you a vector of row number.

r_select << Delete Rows;

// or
r_select << Hide and Exclude;

 

 

"I thought about our dilemma, and I came up with a solution that I honestly think works out best for one of both of us"
- GLaDOS
CMG
CMG
Level II

Re: Find and identify duplicates for unsorted data

Thank you for the response.

 

I don't think I explained myself well. What I want is a helper column to indicate if a certain value is duplicate, maybe with a "1" indicating all duplicates, and a "0" indicating no duplicates. I do not want to hide or delete the duplicates.

 

jthi
Super User

Re: Find and identify duplicates for unsorted data

Below are at least two ways to do this with scripting:

Names Default To Here(1);

dt = New Table("Untitled",
	Add Rows(7),
	Compress File When Saved(1),
	New Column("Column 1",
		Numeric,
		"Continuous",
		Format("Best", 12),
		Set Values([1, 1, 2, 3, 2, 5, 1]),
		Set Display Width(60)
	)
);


dt << New column("Dublicate_formula", Numeric, <<Formula(
	If(Col Rank(1, :column 1) > 1, 1,0))
);

dt << Select duplicate rows(Match(:column 1));
dubRows = dt << Get Selected Rows;
dt << New Column("Dublicates", Numeric, Nominal);
Column(dt, "Dublicates")[dubRows] = 1;
Column(dt, "Dublicates")[dt << Get Rows Where(IsMissing(:Dublicates))] = 0;

You can also do this without any scripting:

  1. Select columns you are interested in.
  2. From Rows menu: Row Selection -> Select Dublicate Rows
  3. Rows menu: Row Selection -> Name Selection In Column
  4. Choose name, Selected as 1 and Unselected as 0
-Jarmo
CMG
CMG
Level II

Re: Find and identify duplicates for unsorted data

Thank you very much. I was able to mark without using the scripting.

Appreciate your help!

txnelson
Super User

Re: Find and identify duplicates for unsorted data

You can also do this as a formula column using the below formula which is using the Big Class data table as an example

If( Row() == 1,
	Current Data Table() << select duplicate rows(
		Match( :age, :sex, Empty() )
	)
);
If( Selected( Row State( Row() ) ),
	"Group 1",
	"Group 2"
);

 

Jim
ms
Super User (Alumni) ms
Super User (Alumni)

Re: Find and identify duplicates for unsorted data

Here's another example using a column formula. 

 

dt = Open( "$SAMPLE_DATA/Big Class.jmp" );

// Create a column that indicates duplicate names (there are two "Robert")

dt << New Column( "Duplicate", Formula( If( Col Number( Row(), :name ) > 1, 1, 0 ) ) );