Solved: find duplicates and drop by state

Jul 23, 2023 02:05 PM

Hi,

I have a data table that contains duplicates.

I want to find the duplicates.

Duplicates are rows with the same Col, Row, Scan number, and ID values.

Then, for duplicates with the same Col, Row, Scan number, and ID values, I want to keep only the row with the maximum absolute OffsetX (I only have the raw OffsetX values, so I need to apply an Abs formula).

Each Col, Row, and Scan number combination should have 4 rows, with ID 0-3.

My final table should include all the unique rows plus the rows that remain after removing duplicates.

jthi · Jul 23, 2023 02:43 PM

I'm not 100% sure this will work, as I didn't test it:

1. Create a new column for Abs(OffsetX). You should be able to do this by selecting the column and right-clicking.
2. Sort descending by the new column.
3. Select the Col, Row, Scan and ID columns.
4. Go to Rows / Row Selection and choose Select Duplicate Rows.
5. The selected rows should now be the ones you want to delete.

-Jarmo
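
For anyone who prefers to script it, the steps above can be sketched in JSL roughly as follows. This is an untested sketch, not a confirmed solution: it assumes the table name "raw data-start" and the column names from the other reply in this thread, a JMP version that supports the Select Duplicate Rows message, and that this message leaves the first row of each matching group unselected. The script deletes rows in place, so run it on a copy of the table.

```jsl
Names Default To Here( 1 );

// Assumption: table name taken from the other reply in this thread
dt = Data Table( "raw data-start" );

// Step 1: helper column with static values of Abs( OffsetX )
dt << New Column( "Abs OffsetX", Numeric, "Continuous",
	Set Each Value( Abs( :OffsetX ) )
);

// Step 2: sort in place so the largest absolute offset comes first
dt << Sort( By( :Abs OffsetX ), Order( Descending ), Replace Table );

// Steps 3-4: select rows that repeat an earlier Col/Row/Scan Number/ID combination
dt << Select Duplicate Rows( Match( :Col, :Row, :Scan Number, :ID ) );

// Step 5: delete the selected rows, if any were found
If( N Rows( dt << Get Selected Rows ) > 0,
	dt << Delete Rows
);

// Optional: remove the helper column afterwards
// dt << Delete Columns( :Abs OffsetX );
```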

txnelson · Jul 23, 2023 03:01 PM

Here is a script that uses the Summary platform to find the unique combinations of Col, Row, ID and Scan Number. It also uses a little trick to find, for each combination, the OffsetX value that deviates most from zero. However, it is not finding a consistent number of rows for the unique combinations.

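Names Default To Here( 1 );
dt = Data Table( "raw data-start" );

// One row per unique Col/Row/ID/Scan Number combination, holding the
// minimum OffsetX and the maximum Abs( OffsetX ) of each group
dtSum = dt << Summary(
	Group( :Col, :Row, :ID, :Scan Number ),
	Min( :OffsetX ),
	Max( Transform Column( "Abs OffsetX", Formula( Abs( :OffsetX ) ) ) ),
	Freq( "None" ),
	Weight( "None" ),
	Link to Original Data Table( 0 ),
	statistics column name format( "column" )
);

// If the group minimum is smaller in magnitude than the maximum absolute
// value, the largest deviation is positive, so use it. Otherwise keep the minimum.
For Each Row( dtSum,
	If( Abs( :OffsetX ) < :Abs OffsetX,
		:OffsetX = :Abs OffsetX
	)
);

// dtSum << Delete Columns( :Abs OffsetX );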

Jim
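
To see which combinations do not have the expected four rows, a quick count per Col, Row and Scan Number can help. The following is a minimal, untested sketch. It assumes the deduplicated table is the current data table, and `dtDedup`, `dtCount` and `badGroups` are illustrative names only. It checks the row count, not whether the IDs are exactly 0-3.

```jsl
Names Default To Here( 1 );

// Assumption: the deduplicated table (one row per Col/Row/ID/Scan Number)
// is the current data table
dtDedup = Current Data Table();

// Summary adds an "N Rows" column holding the row count of each group
dtCount = dtDedup << Summary(
	Group( :Col, :Row, :Scan Number ),
	Link to Original Data Table( 0 )
);

// Select the groups that do not have exactly four rows
dtCount << Select Where( :N Rows != 4 );
badGroups = dtCount << Get Selected Rows;

// Number of Col/Row/Scan Number groups with an unexpected row count
Show( N Rows( badGroups ) );
```

Any rows left selected in the count table point to combinations with missing or extra IDs in the source data.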

Ohad_s · Jul 24, 2023 03:39 AM

Thank you, this works fine!
