Subscribe Bookmark RSS Feed

Comparing Rows for equality

bille

Community Trekker

Joined:

Jun 23, 2011

I have 2 data tables with the same column format, but different numbers of rows. Some of the column types are strings and others are integers.

I want to write a script that goes through table_1 a row at a time and see if there is an identical row in table_2 (based on the column elements in the row).

Do I have to compare for equality column element by element, or is there a quick way to compare the whole row? If it makes a difference, I don't need to know which table_2 row is equal, I only need to know if there is a row with equal values.

Thanks,
Bill
4 REPLIES
If you want to avoid having to run a double loop on all the rows in table A vs all the rows in table B, you could add one extra column to each, to indicate from which table all the rows in it come (i.e. 'A' or 'B'), concatenate the two tables into a single table, and then sort the combined table by one or maybe two columns that contain the largest ranges of values. Then run down the combined table row by row looking for any matches between each row and as many rows ahead as have identical values in those one or two columns. If you get a match anywhere, but find also that the extra column of those rows has an 'A' in one position and a 'B' in the other, then (and only then) compare all the other columns too, item by item. You'd then only need to search the table with a single pass (plus a few extra comparisons if you find several rows almost identical), as opposed to having to do the whole thing with a double loop.

If there is a way to compare an entire row of two tables in a single statement, I don't know what it is, but at least the above should probably speed things up a little if you're dealing with relatively large tables.
bille

Community Trekker

Joined:

Jun 23, 2011

Thanks David,

You got me thinking of a similar approach. For one table I created a string for each row comprised of all the row elements. I then inserted the string into a list. Similarly I created a string for each row in the second table and checked to see if it was contained in the list.

dtOld = Data Table( "oldDelta" );
dtNew = Data Table( "Delta" );


nr = N Rows( dtOld );
nc = N Cols( dtOld );
oldList = {};
For( j = 1, j <= nr, j++,
oldString = "";
For( k = 1, k <= nc, k++,
oldString = oldString || Char( Column( dtOld, k ) )
);
Insert Into( oldList, oldString );
);

nr = N Rows( dtNew );
For( j = 1, j <= nr, j++,
newString = "";
For( k = 1, k <= nc, k++,
newString = newString || Char( Column( dtNew, k ) )
);
If( Contains( oldList, newString ),
Print( j )
);
);

Message was edited by: Bill E
pmroz

Super User

Joined:

Jun 23, 2011

You could also try joining the two tables to form a new table. The "where" statement would be

col1A = col1B
and col2A = col2B
and col3A = col3B
etc.

Your resulting table would be the duplicates. Rather tedious if you have a lot of columns though.
ms

Super User

Joined:

Jun 23, 2011

Luckily, Join() accepts lists as arguments for Matching Columns().
So if you just need a simple answer (yes or no) to the question if there are equal rows, this "one-liner" should work for comparing two tables (here dt1 & dt2) with any number of columns, as long as they have equal number of columns in the same order (although column names may differ).