cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Check out the JMP® Marketplace featured Capability Explorer add-in
Choose Language Hide Translation Bar
lisaash
Level I

Duplicate values in a columns

I would like to create a formula in a new column to count duplicate rows in another column how do I do this?

 

9 REPLIES 9
Victor_G
Super User

Re: Duplicate values in a colums

Hello @lisaash,

 

If I understand your problem correctly, you may find different options to create a count column with the number of duplicates in this topic : Solved: What is the easiest way to count the number of occurrences in a column when rows... - JMP Us...

Hope it helps you,

Victor GUILLER
L'Oréal Data & Analytics

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
txnelson
Super User

Re: Duplicate values in a colums

The 
 
Col Number( xCol, <byVar, ...> )

 

function will work for this.


Using the Big Class sample data table, counting all of the individuals that have the same weight, one can create a new column and apply this formula

Col Number( :Weight, :Weight );

to get the results

txnelson_0-1665405877906.png

 

Jim
Hegedus1
Level III

Re: Duplicate values in a colums

Hi, I am interested in a more generic solution/script.

Early in my work flow I would like to identify columns with only a single value since in essence they contain no information and will actually throw some errors if used in the wrong analysis.  For example some of the data files I am working with have 600+ columns of which as many as 40-50 may contain only the same repeated values.  Note they may be different from file to file so I need to keep them.

 

Now I need to identify them but I do not want to delete them, but I would like to get them out of the way.

For a given data table, I would like to identify all columns that have only a single value and group them into group with a label such as "no info".

How would you approach a generic script that I could add to the add-ins?

 

Andy

 

txnelson
Super User

Re: Duplicate values in a colums

Here is a little script that will get you started on what you want to do.

Names Default To Here( 1 );
dt = Current Data Table();

// Loop across the columns
For( i = N Cols( dt ), i >= 1, i--,
	Summarize( dt, bygroup = by( As Column( i ) ) );
	// Remove blank/missing category
	If( bygroup[1] == "" | bygroup[1] == ".",
		Remove From( bygroup, 1, 1 )
	);
	// If only 1 or less values are found, delete the column
	If( N Items( bygroup ) <= 1,
		dt << delete columns( i )
	);
);

Please take the time to make sure you understand how the above code works.  You will need to do that to make the changes to the code to get it to do exactly what you want the final product to work.

Scripting documentation is in the Scripting Guide and in the Scripting Index.

Jim
Phuong4692
Level I

Re: Duplicate values in a colums

 

Thank you for your great help. But could you help me a little more?

how can I count duplicated "Weight" if it happens with 2 different ages or 3 as below?

Phuong4692_0-1722417149304.png

 

 

txnelson
Super User

Re: Duplicate values in a colums

Assuming you want to know the count across all ages for each unique weight, here is one way to do it

names default to here(1);
dt=
// Open Data Table: Big Class.jmp
// → Data Table( "Big Class" )
Open( "$SAMPLE_DATA/Big Class.jmp" );

// Create a table that has only unique values of weight for each age
dtOne = dt << Summary(
	Group( :age, :weight ),
	Freq( "None" ),
	Weight( "None" ),
	output table name( "Summary of Big Class grouped by age, weight" )
);

// Create a table that has the counts of the different weights across ages
dtTwo = dtOne <<
Summary(
	Group( :weight ),
	Freq( "None" ),
	Weight( "None" ),
	output table name(
		"Summary of Summary of Big Class grouped by age, weight grouped by weight"
	)
);

// Add the counts to the original data table
dt << Update(
	With(
		Data Table(dtTwo
		)
	),
	Match Columns( :weight = :weight )
);

close( dtOne, nosave );
close( dtTwo, nosave );

txnelson_0-1722424880622.png

 

Jim
Phuong4692
Level I

Re: Duplicate values in a colums

thank you so much but it's a little different with my target. My homework is like that below. But so far, I don't have ideal how to deal with that

 

Phuong4692_0-1722439993685.png

 
txnelson
Super User

Re: Duplicate values in a colums

Use 

     Tables=>Subset

with the By option based on Process

Then use

     Tables=>Join

To join matching on Model and you can select the Inner Join as you indicate

Jim
Jeff_Perkinson
Community Manager Community Manager

Re: Duplicate values in a columns

I'm afraid I don't understand exactly what you're looking for.

 

Could you give an example table showing what this new column would contain?

-Jeff