
Quick way to compare two lists and identify the uncommon elements

djhanson

Community Trekker

Joined:

Jun 23, 2011

Does anyone know a quick way to compare two lists and identify the elements that differ (i.e. the uncommon elements)? I can do this with two nested loops, but as with many things in JSL, I was hoping there might be some obscure, easier way. dj

 

My lists are much larger than this of course:

list1={"apple","pear","orange","kiwi","watermelon"};

list2={"apple","pear","orange","kiwi","strawberry","watermelon"};

 

Find any different/uncommon elements, in this case "strawberry".

Accepted Solutions
Craige_Hales

Staff

Joined:

Mar 21, 2013

Solution
// an associative array built from a list uses the list items as keys, so it acts as a set
set1=associativearray({"apple","grape","orange","kiwi","watermelon"});
set2=associativearray({"apple","pear","orange","kiwi","strawberry","watermelon"});
set3=associativearray({"apple","pear","watermelon"});
// intersection: copy set1, then keep only the keys that are also in set2
intersection=set1;intersection<<Intersect(set2);
print("common (intersection) items:",intersection<<getkeys);
// union: copy set1, then add every key from set2
union=set1;union<<insert(set2);
print("all (union) items", union<<getkeys);
// subset test: 1 if every key of set3 is in the set, otherwise 0
set1HasSet3=set1;show(set1HasSet3<<contains(set3));
set2HasSet3=set2;show(set2HasSet3<<contains(set3));
/*:

"common (intersection) items:"
{"apple", "kiwi", "orange", "watermelon"}
"all (union) items"
{"apple", "grape", "kiwi", "orange", "pear", "strawberry", "watermelon"}
set1HasSet3 << Contains(set3) = 0;
set2HasSet3 << Contains(set3) = 1;

The associative arrays are very good for millions of values, especially compared to writing a JSL script that loops over millions of values.
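
For values that live in data table columns rather than lists, the same idea may apply without a temporary combined table. The sketch below is only an illustration: the tables dt1 and dt2, the column name id, and the values are hypothetical, and it assumes Associative Array() accepts a column reference and uses the column's values as keys.

```jsl
// hypothetical tables and column, created only for this illustration
dt1 = New Table( "dt1", New Column( "id", Numeric, Set Values( [1, 2, 3, 4, 5] ) ), invisible );
dt2 = New Table( "dt2", New Column( "id", Numeric, Set Values( [2, 3, 4, 6] ) ), invisible );

// assumption: a column reference is accepted, and its values become the keys
set1 = Associative Array( dt1:id );
set2 = Associative Array( dt2:id );

// values found only in dt1:id, and values found only in dt2:id
only1 = set1; only1 << Remove( set2 );
only2 = set2; only2 << Remove( set1 );
Show( only1 << Get Keys, only2 << Get Keys );

Close( dt1, nosave );
Close( dt2, nosave );
```

With these sample values, the first result should hold 1 and 5 and the second should hold 6. Duplicate values in a column collapse to a single key, so this compares distinct values, not row counts.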

 

The Scripting Index has the details for using an associative array as a set. Here are the uncommon items:

set1=associativearray({"apple","grape","orange","kiwi","watermelon"});
set2=associativearray({"apple","pear","orange","kiwi","strawberry","watermelon"});
intersection=set1;intersection<<Intersect(set2);
print("common (intersection) items:",intersection<<getkeys);
uncommon1 = set1; uncommon1<<remove(intersection);show(uncommon1<<getkeys);
uncommon2 = set2; uncommon2<<remove(intersection);show(uncommon2<<getkeys);
/*:

"common (intersection) items:"
{"apple", "kiwi", "orange", "watermelon"}
uncommon1 << getkeys = {"grape"};
uncommon2 << getkeys = {"pear", "strawberry"};

 

Craige
7 REPLIES
ms

Super User

Joined:

Jun 23, 2011

I would like to know that too. Common items can be found with Intersect() (see below), but I am not aware of a corresponding function for uncommon items.

There are more options for data tables than for lists. The script below identifies the uncommon elements via a temporary data table: it stacks both lists into one column, counts how often each value occurs, and keeps the values that occur exactly once. It may be faster than looping through long lists, but I have not compared the two.

list1 = {"apple", "pear", "orange", "kiwi", "watermelon", "mango"};
list2 = {"apple", "pear", "kiwi", "orange", "strawberry", "watermelon", "melon"};
dt = New Table( "temp",
  New Column( "gr", character, setvalues( list1 || list2 ) ),
  New Column( "nr", numeric, formula( 1 ), evalformula ),
  invisible
);
Summarize( g = by( dt:gr ), n = Count( dt:nr ) );
not_in_common = g[Loc( As List( n ), 1 )];
common = Associative Array( list1 ) << intersect( Associative Array( list2 ) ) << getkeys;
Show( common, not_in_common );
Close( dt, nosave );

uday_guntupalli

Community Trekker

Joined:

Sep 15, 2014

@ms:

The solution you offered has been extremely helpful to me in many ways, and I want to thank you for it first.

I would also like to ask about really large data sets (e.g. a couple of million items each). This approach will require a large amount of memory, and inserting the elements into a new table does not seem efficient. Is there a more effective way to identify uncommon elements between two columns in two data tables? That is, if Column 1 in Data Table 1 (dt1) and Column 1 in Data Table 2 (dt2) each hold a large list of numbers, can we compare them more effectively?

 

Best 
uday 

Craige_Hales

Staff

Joined:

Mar 21, 2013

Also: if the members of the set are named with consecutive integers, starting at 1, this problem could be done very efficiently with a matrix.  But, as presented with named set members, an associative array is probably better than sorting, and definitely better than an N^2 pair of nested loops.
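
As a rough illustration of the matrix idea, here is a minimal sketch. The IDs and the universe size n are made up for the example, and it assumes every member is an integer from 1 to n.

```jsl
// hypothetical data: members are identified by integers 1..n
n = 8;
ids1 = [1, 2, 3, 5, 8];
ids2 = [2, 3, 4, 5];

// indicator vectors: element i is 1 when ID i belongs to the set
in1 = J( n, 1, 0 );
in1[ids1] = 1;
in2 = J( n, 1, 0 );
in2[ids2] = 1;

// an ID is uncommon when it is in exactly one of the two sets
uncommon = Loc( in1 != in2 );
Show( uncommon );
```

For these sample IDs the uncommon members are 1, 4 and 8. The work is a handful of vectorized matrix operations, with no JSL loop over the members.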

Craige
Craige_Hales

Staff

Joined:

Mar 21, 2013

Also, finding the uncommon items does not require computing the intersection first. Each result is just a set difference.

set1=associativearray({"apple","grape","orange","kiwi","watermelon"});
set2=associativearray({"apple","pear","orange","kiwi","strawberry","watermelon"});
uncommon1 = set1; uncommon1<<remove(set2);show(uncommon1<<getkeys);
uncommon2 = set2; uncommon2<<remove(set1);show(uncommon2<<getkeys);
/*:

uncommon1 << getkeys = {"grape"};
uncommon2 << getkeys = {"pear", "strawberry"};
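
If a single list of all uncommon items is wanted, the two differences can be combined. The following is only a sketch: the function name symmetricDifference is made up for this example, and it relies on the same Remove and Insert messages used earlier in this thread.

```jsl
// hypothetical helper: items that appear in exactly one of the two lists
symmetricDifference = Function( {listA, listB},
	{Default Local},
	setA = Associative Array( listA );
	setB = Associative Array( listB );
	onlyA = setA;            // assignment copies the associative array
	onlyA << Remove( setB ); // keys in A but not in B
	onlyB = setB;
	onlyB << Remove( setA ); // keys in B but not in A
	onlyA << Insert( onlyB );  // merge the two differences
	onlyA << Get Keys;       // last expression is the return value
);

list1 = {"apple", "pear", "orange", "kiwi", "watermelon"};
list2 = {"apple", "pear", "orange", "kiwi", "strawberry", "watermelon"};
result = symmetricDifference( list1, list2 );
Show( result );
```

With the two lists from the original question, result should contain only "strawberry".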
Craige
vince_faller

Super User

Joined:

Mar 17, 2015

Quick question for this, is the time saved using the associative array vs the loop pretty much always larger than the overhead it takes to create the associative array vs the list?  

Craige_Hales

Staff

Joined:

Mar 21, 2013

You would have to benchmark it for different sizes and for different JSL loop approaches. I'd guess the associative array becomes faster somewhere between 2 and 10 items in the set. At size 1, the associative array is overkill, but it still constructs pretty fast, about a million per second:

start = tickseconds();
for(i=1,i<=1e6,i++,
a=associativearray({"a","b"});
b=associativearray({"a","b"});
c=associativearray({"a","b"});
d=associativearray({"a","b"});
e=associativearray({"a","b"});
);
stop=tickseconds();
aatime=(stop-start);
write("\!ntime for 5,000,000 associative arrays:",aatime);

start = tickseconds();
for(i=1,i<=1e6,i++,
a=1;
b=2;
c=3;
d=4;
e=5;
);
stop=tickseconds();
satime=(stop-start);
write("\!ntime for 5,000,000 simple assignments:",satime);

write("\!nsimple/associative=", satime/aatime);

/*:

time for 5,000,000 associative arrays:5.8166666666657
time for 5,000,000 simple assignments:0.283333333325572
simple/associative=0.0487106017178716

In other words, you probably can't run more than about 20-30 simple JSL statements in the time it takes to create one associative array.
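
To find the break-even point on a particular machine, something like the sketch below could be timed at several sizes. The list size n and the generated item names are hypothetical, and the loop version shown (a linear search with Contains for each item) is just one of many possible loop approaches.

```jsl
n = 1000; // hypothetical list size; vary this to look for the break-even point

// build two test lists that differ at both ends
list1 = {};
list2 = {};
For( i = 1, i <= n, i++,
	Insert Into( list1, "item" || Char( i ) );
	Insert Into( list2, "item" || Char( i + 1 ) );
);

// associative array difference: items in list1 that are not in list2
start = Tick Seconds();
only1 = Associative Array( list1 );
only1 << Remove( Associative Array( list2 ) );
aaTime = Tick Seconds() - start;

// loop version: search list2 once for every item of list1
start = Tick Seconds();
loopOnly1 = {};
For( i = 1, i <= N Items( list1 ), i++,
	If( !Contains( list2, list1[i] ),
		Insert Into( loopOnly1, list1[i] )
	)
);
loopTime = Tick Seconds() - start;

Show( aaTime, loopTime, only1 << Get Keys, loopOnly1 );
```

Both approaches should report "item1" as the only item unique to list1. Tick Seconds() has a coarse resolution, so for small n it may be necessary to repeat each section many times, as in the benchmark above, before the two times can be compared.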

Craige