cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Choose Language Hide Translation Bar
lisaash
Level I

Duplicate values in a columns

I would like to create a formula in a new column to count duplicate rows in another column how do I do this?

 

5 REPLIES 5
Victor_G
Super User

Re: Duplicate values in a colums

Hello @lisaash,

 

If I understand your problem correctly, you may find different options to create a count column with the number of duplicates in this topic : Solved: What is the easiest way to count the number of occurrences in a column when rows... - JMP Us...

Hope it helps you,

Victor GUILLER
Scientific Expertise Engineer
L'Oréal - Data & Analytics
txnelson
Super User

Re: Duplicate values in a colums

The 
 
Col Number( xCol, <byVar, ...> )

 

function will work for this.


Using the Big Class sample data table, counting all of the individuals that have the same weight, one can create a new column and apply this formula

Col Number( :Weight, :Weight );

to get the results

txnelson_0-1665405877906.png

 

Jim
Hegedus1
Level III

Re: Duplicate values in a colums

Hi, I am interested in a more generic solution/script.

Early in my work flow I would like to identify columns with only a single value since in essence they contain no information and will actually throw some errors if used in the wrong analysis.  For example some of the data files I am working with have 600+ columns of which as many as 40-50 may contain only the same repeated values.  Note they may be different from file to file so I need to keep them.

 

Now I need to identify them but I do not want to delete them, but I would like to get them out of the way.

For a given data table, I would like to identify all columns that have only a single value and group them into group with a label such as "no info".

How would you approach a generic script that I could add to the add-ins?

 

Andy

 

txnelson
Super User

Re: Duplicate values in a colums

Here is a little script that will get you started on what you want to do.

Names Default To Here( 1 );
dt = Current Data Table();

// Loop across the columns
For( i = N Cols( dt ), i >= 1, i--,
	Summarize( dt, bygroup = by( As Column( i ) ) );
	// Remove blank/missing category
	If( bygroup[1] == "" | bygroup[1] == ".",
		Remove From( bygroup, 1, 1 )
	);
	// If only 1 or less values are found, delete the column
	If( N Items( bygroup ) <= 1,
		dt << delete columns( i )
	);
);

Please take the time to make sure you understand how the above code works.  You will need to do that to make the changes to the code to get it to do exactly what you want the final product to work.

Scripting documentation is in the Scripting Guide and in the Scripting Index.

Jim
Jeff_Perkinson
Community Manager Community Manager

Re: Duplicate values in a columns

I'm afraid I don't understand exactly what you're looking for.

 

Could you give an example table showing what this new column would contain?

-Jeff