Solved: Selecting duplicates from different source columns

Stijnw91 · Jul 4, 2023 08:09 AM

Hello,

I'm trying to select (and delete) duplicates after combining two data tables with a source column, I have 2 criteria for a selection:

-The duplicate values should match in 2 columns (Match :col1, :col2), works fine.

-The duplicate values should not have the same value in the generated source column.

I'm not sure how I can code the 2nd criterium, is there some simple != line or 'does not' match ':Source Table'?

dt1 << concatenate (dt1, dt2, append to first table(1), Create Source Column ) ;		
dt1 << Select Duplicate Rows (Match (:col1, :col2)) ;

dt1 << Delete Rows();

txnelson · Jul 4, 2023 01:10 PM

I think you need to take a different approach. Your submitted JSL has a couple of issues, and a misinterpretation of the Select Duplicate Rows function.

Your Concatenation statement creates a new data table where your data table dt1 is replicated twice in the output table.

The Select Duplicate Rows function only finds duplicates within the same column.

I suggest you look into using the Join platform, which has the capability of determining which rows between the 2 data tables have duplicate values.

Here is an example of comparing between 2 data tables and finding that the person named Robert is in both tables.

Names Default To Here( 1 );
dt1 = New Table( "one",
	Add Rows( 20 ),
	New Column( "name",
		Character,
		"Nominal",
		Set Values(
			{"KATIE", "LOUISE", "JANE", "JACLYN", "LILLIE", "TIM", "JAMES", "ROBERT", "BARBARA", "ALICE",
			"SUSAN", "JOHN", "JOE", "MICHAEL", "DAVID", "JUDY", "ELIZABETH", "LESLIE", "CAROL", "PATTY"}
		)
	)
);

dt2 = New Table( "Two",
	Add Rows( 20 ),
	New Column( "name",
		Character,
		"Nominal",
		Set Values(
			{"FREDERICK", "ALFRED", "HENRY", "LEWIS", "EDWARD", "CHRIS", "JEFFREY", "MARY", "AMY", "ROBERT",
			"WILLIAM", "CLAY", "MARK", "DANNY", "MARTHA", "MARION", "PHILLIP", "LINDA", "KIRK", "LAWRENCE"}
		)
	)
);

dt3 = dt1 << Join(
	With( dt2 ),
	Merge Same Name Columns,
	By Matching Columns( :name = :name ),
	Drop multiples( 1, 0 ),
	Include Nonmatches( 1, 1 ),
	Preserve main table order( 1 ),
	Output Table( "together" )
);

dt3 << select where( :Match Flag == 3 );

//dt3 << Delete Rows();
//dt3 << delete columns(match flag);

Jim

View solution in original post

txnelson · Jul 4, 2023 01:10 PM

I think you need to take a different approach. Your submitted JSL has a couple of issues, and a misinterpretation of the Select Duplicate Rows function.

Your Concatenation statement creates a new data table where your data table dt1 is replicated twice in the output table.

The Select Duplicate Rows function only finds duplicates within the same column.

I suggest you look into using the Join platform, which has the capability of determining which rows between the 2 data tables have duplicate values.

Here is an example of comparing between 2 data tables and finding that the person named Robert is in both tables.

Names Default To Here( 1 );
dt1 = New Table( "one",
	Add Rows( 20 ),
	New Column( "name",
		Character,
		"Nominal",
		Set Values(
			{"KATIE", "LOUISE", "JANE", "JACLYN", "LILLIE", "TIM", "JAMES", "ROBERT", "BARBARA", "ALICE",
			"SUSAN", "JOHN", "JOE", "MICHAEL", "DAVID", "JUDY", "ELIZABETH", "LESLIE", "CAROL", "PATTY"}
		)
	)
);

dt2 = New Table( "Two",
	Add Rows( 20 ),
	New Column( "name",
		Character,
		"Nominal",
		Set Values(
			{"FREDERICK", "ALFRED", "HENRY", "LEWIS", "EDWARD", "CHRIS", "JEFFREY", "MARY", "AMY", "ROBERT",
			"WILLIAM", "CLAY", "MARK", "DANNY", "MARTHA", "MARION", "PHILLIP", "LINDA", "KIRK", "LAWRENCE"}
		)
	)
);

dt3 = dt1 << Join(
	With( dt2 ),
	Merge Same Name Columns,
	By Matching Columns( :name = :name ),
	Drop multiples( 1, 0 ),
	Include Nonmatches( 1, 1 ),
	Preserve main table order( 1 ),
	Output Table( "together" )
);

dt3 << select where( :Match Flag == 3 );

//dt3 << Delete Rows();
//dt3 << delete columns(match flag);

Jim

Stijnw91 · Jul 5, 2023 03:49 AM

Thanks Jim, this works well!

Happy to learn more about the Join function.

Regards,

Stijn

Selecting duplicates from different source columns

Re: Selecting duplicates from different source columns

Re: Selecting duplicates from different source columns

Re: Selecting duplicates from different source columns