Quick way to compare two lists and identify the uncommon elements
Does anyone know a quick way to compare two lists and identify the elements that differ (i.e., uncommon elements)? I can do this with two nested loops, but as with many things in JSL, I was hoping there might be some obscure, easier way. dj
My lists are much larger than this, of course:
list1={"apple","pear","orange","kiwi","watermelon"};
list2={"apple","pear","orange","kiwi","strawberry","watermelon"};
Find any different/uncommon elements, in this case "strawberry".
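For comparison, here is a minimal JSL sketch of the loop approach the question mentions. It uses the built-in Contains(), which returns the position of a value in a list (or 0), in place of the inner loop. Contains() still scans list1 once for every item of list2, so the cost grows with the product of the two list lengths. As written, it only reports items that are in list2 but missing from list1. The variable name onlyIn2 is illustrative.

```jsl
// Baseline: items of list2 that do not appear in list1
list1 = {"apple", "pear", "orange", "kiwi", "watermelon"};
list2 = {"apple", "pear", "orange", "kiwi", "strawberry", "watermelon"};
onlyIn2 = {};
For( i = 1, i <= N Items( list2 ), i++,
	// Contains() does a linear search of list1, so it acts as the "inner loop"
	If( !Contains( list1, list2[i] ),
		Insert Into( onlyIn2, list2[i] )
	)
);
Show( onlyIn2 );
```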
Accepted Solutions
Re: Quick way to compare two lists (uncommon elements)?
set1=associativearray({"apple","grape","orange","kiwi","watermelon"});
set2=associativearray({"apple","pear","orange","kiwi","strawberry","watermelon"});
set3=associativearray({"apple","pear","watermelon"});
// Intersection: keys in both set1 and set2 (assignment copies set1, so set1 itself is unchanged)
intersection=set1;intersection<<Intersect(set2);
print("common (intersection) items:",intersection<<getkeys);
// Union: keys in either set
union=set1;union<<insert(set2);
print("all (union) items", union<<getkeys);
// Subset test: does the set contain every key of set3?
set1HasSet3=set1;show(set1HasSet3<<contains(set3));
set2HasSet3=set2;show(set2HasSet3<<contains(set3));
/*:
"common (intersection) items:"
{"apple", "kiwi", "orange", "watermelon"}
"all (union) items"
{"apple", "grape", "kiwi", "orange", "pear", "strawberry", "watermelon"}
set1HasSet3 << Contains(set3) = 0;
set2HasSet3 << Contains(set3) = 1;
*/
Associative arrays handle millions of values well, especially compared with a JSL script that loops over millions of values.
The Scripting Index has the details on using an associative array as a set. Here are the uncommon items:
set1=associativearray({"apple","grape","orange","kiwi","watermelon"});
set2=associativearray({"apple","pear","orange","kiwi","strawberry","watermelon"});
intersection=set1;intersection<<Intersect(set2);
print("common (intersection) items:",intersection<<getkeys);
// Uncommon items: remove the shared keys from each set
uncommon1 = set1; uncommon1<<remove(intersection);show(uncommon1<<getkeys);
uncommon2 = set2; uncommon2<<remove(intersection);show(uncommon2<<getkeys);
/*:
"common (intersection) items:"
{"apple", "kiwi", "orange", "watermelon"}
uncommon1 << getkeys = {"grape"};
uncommon2 << getkeys = {"pear", "strawberry"};
*/
Re: Quick way to compare two lists (uncommon elements)?
I would like to know that too. Common items can be found with Intersect() (see below), but I am not aware of a corresponding function for uncommon items.
There are more options for data tables than for lists. The script below identifies the uncommon elements via a temporary data table. It may be faster than looping through long lists, but I have not compared them.
list1 = {"apple", "pear", "orange", "kiwi", "watermelon", "mango"};
list2 = {"apple", "pear", "kiwi", "orange", "strawberry", "watermelon", "melon"};
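// Stack both lists into one character column of an invisible temporary table
dt = New Table( "temp",
New Column( "gr", character, setvalues( list1 || list2 ) ),
New Column( "nr", numeric, formula( 1 ), evalformula ),
invisible
);
// Count each value; a count of 1 means it occurs in only one list
// (this assumes no value is repeated within a single list)
Summarize( g = by( dt:gr ), n = Count( dt:nr ) );
not_in_common = g[Loc( As List( n ), 1 )];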
common = Associative Array( list1 ) << intersect( Associative Array( list2 ) ) << getkeys;
Show( common, not_in_common );
Close( dt, nosave );
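The count-equals-1 test treats a value that is repeated within one list as if it were shared by both lists. The variant below assumes within-list duplicates should be ignored. It first reduces each list to its unique values by taking the keys of an associative array (u1 and u2 are illustrative names), then applies the same temporary-table idea.

```jsl
list1 = {"apple", "pear", "orange", "kiwi", "watermelon", "mango", "kiwi"};
list2 = {"apple", "pear", "kiwi", "orange", "strawberry", "watermelon", "melon"};
// Unique values of each list: associative array keys cannot repeat
u1 = Associative Array( list1 ) << Get Keys;
u2 = Associative Array( list2 ) << Get Keys;
dt = New Table( "temp",
	New Column( "gr", Character, Set Values( u1 || u2 ) ),
	New Column( "nr", Numeric, Formula( 1 ), Eval Formula ),
	Invisible
);
// Each value now appears at most once per list, so a count of 1 means "only in one list"
Summarize( g = By( dt:gr ), n = Count( dt:nr ) );
not_in_common = g[Loc( As List( n ), 1 )];
Show( not_in_common );
Close( dt, NoSave );
```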
Re: Quick way to compare two lists (uncommon elements)?
@ms:
First, thank you: the solution you offered has already been extremely helpful to me in many different ways.
I would also like to ask about really large sets of data (e.g., a couple of million items each). This approach will require a lot of memory, and inserting the elements into a new table doesn't seem efficient. Is there a more efficient way to identify uncommon elements between two columns in two data tables? That is, if Column 1 in Data Table 1 (dt1) and Column 1 in Data Table 2 (dt2) each hold a long list of numbers, can we compare them more efficiently?
Best
uday
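The following sketch applies the associative-array answer from this thread to two data table columns. The tables and the numeric column name ID are made up for illustration. Column values are pulled out with << Get Values (which returns a matrix) and converted with As List() before each set is built. Values are matched exactly, so floating-point numbers that differ in the last digit count as different. No rows are copied into a new table, but each set still holds every distinct value in memory, and this has not been benchmarked here at millions of rows.

```jsl
// Two small example tables standing in for dt1 and dt2
dt1 = New Table( "dt1", New Column( "ID", Numeric, Values( [1, 2, 3, 5, 8] ) ), Invisible );
dt2 = New Table( "dt2", New Column( "ID", Numeric, Values( [2, 3, 5, 7, 11] ) ), Invisible );
// Column values -> matrix -> list -> associative array used as a set
set1 = Associative Array( As List( dt1:ID << Get Values ) );
set2 = Associative Array( As List( dt2:ID << Get Values ) );
// Set differences, as in the accepted answer
only1 = set1; only1 << Remove( set2 );  // in dt1 but not in dt2
only2 = set2; only2 << Remove( set1 );  // in dt2 but not in dt1
Show( only1 << Get Keys, only2 << Get Keys );
Close( dt1, NoSave );
Close( dt2, NoSave );
```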
Re: Quick way to compare two lists (uncommon elements)?
Also: if the set members are consecutive integers starting at 1, this problem can be solved very efficiently with a matrix. But with named set members, as presented here, an associative array is probably better than sorting, and definitely better than an O(N^2) pair of nested loops.
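Here is a minimal sketch of the matrix idea, assuming the set members are the integers 1 to K (K = 10 here is arbitrary). Each set is marked in a K-by-1 indicator vector built with J(), and Loc() then finds the positions marked in one vector but not the other. The work is proportional to K plus the set sizes, with no searching. Loc( inA != inB ) would return all uncommon members at once.

```jsl
K = 10;                       // members are assumed to be integers 1..K
a = [1, 2, 3, 5, 8];          // set A as a column vector of member numbers
b = [2, 3, 5, 7];             // set B
// Indicator vectors: 1 where the integer is a member, 0 elsewhere
inA = J( K, 1, 0 );
inA[a] = J( N Rows( a ), 1, 1 );
inB = J( K, 1, 0 );
inB[b] = J( N Rows( b ), 1, 1 );
// Positions marked in one vector but not the other
onlyA = Loc( inA > inB );
onlyB = Loc( inB > inA );
Show( onlyA, onlyB );
```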
Re: Quick way to compare two lists (uncommon elements)?
And the uncommon items do not require finding the intersection first. They are just the set difference.
set1=associativearray({"apple","grape","orange","kiwi","watermelon"});
set2=associativearray({"apple","pear","orange","kiwi","strawberry","watermelon"});
// Difference: keys of set1 that are not in set2, and vice versa
uncommon1 = set1; uncommon1<<remove(set2);show(uncommon1<<getkeys);
uncommon2 = set2; uncommon2<<remove(set1);show(uncommon2<<getkeys);
/*:
uncommon1 << getkeys = {"grape"};
uncommon2 << getkeys = {"pear", "strawberry"};
*/
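If a single list of all uncommon items is wanted (the symmetric difference), the two differences can be combined with << Insert, the same message used for the union earlier in the thread. The sketch below wraps that in a user-defined function. The name symDiff is hypothetical, not a JMP built-in, and the keys come back in the sorted order that << Get Keys returns.

```jsl
// Hypothetical helper: items that appear in exactly one of the two lists
symDiff = Function( {listA, listB},
	{Default Local},
	a = Associative Array( listA );
	b = Associative Array( listB );
	onlyA = a;                 // assignment copies the associative array
	onlyA << Remove( b );      // keys in A but not in B
	onlyB = b;
	onlyB << Remove( a );      // keys in B but not in A
	onlyA << Insert( onlyB );  // union of the two differences
	onlyA << Get Keys;         // last expression is the return value
);
r = symDiff( {"apple", "grape", "kiwi"}, {"apple", "pear", "kiwi", "strawberry"} );
Show( r );
```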
Re: Quick way to compare two lists (uncommon elements)?
Quick question about this: is the time saved by using the associative array instead of a loop almost always larger than the overhead of creating the associative array from the list?
Re: Quick way to compare two lists (uncommon elements)?
You would have to benchmark it for different sizes and for different JSL loop approaches. I'd guess the associative array becomes faster somewhere between 2 and 10 items in the set. At size 1 the associative array is overkill, but it still constructs pretty fast, about a million per second:
// Time 1,000,000 iterations, each building five small associative arrays
start = tickseconds();
for(i=1,i<=1e6,i++,
a=associativearray({"a","b"});
b=associativearray({"a","b"});
c=associativearray({"a","b"});
d=associativearray({"a","b"});
e=associativearray({"a","b"});
);
stop=tickseconds();
aatime=(stop-start);
write("\!ntime for 5,000,000 associative arrays:",aatime);
// Same loop with five simple assignments, for comparison
start = tickseconds();
for(i=1,i<=1e6,i++,
a=1;
b=2;
c=3;
d=4;
e=5;
);
stop=tickseconds();
satime=(stop-start);
write("\!ntime for 5,000,000 simple assignments:",satime);
write("\!nsimple/associative=", satime/aatime);
/*:
time for 5,000,000 associative arrays:5.8166666666657
time for 5,000,000 simple assignments:0.283333333325572
simple/associative=0.0487106017178716
*/
In other words, in the time it takes to create one associative array, you can probably run only about 20-30 simple JSL statements.
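To answer the crossover question for this specific task, one could time both approaches directly. The sketch below is rough and rests on a few assumptions. Items are strings generated with Char(), and the size n = 2000 is arbitrary, so vary it to look for the crossover. list2 is list1 shifted by one, so both results should be the single item "2001". Tick Seconds() has coarse resolution (the timings above are multiples of 1/60 second), so small sizes may time as zero.

```jsl
n = 2000;  // illustrative size
list1 = {};
list2 = {};
For( i = 1, i <= n, i++,
	Insert Into( list1, Char( i ) );
	Insert Into( list2, Char( i + 1 ) );  // list2 = list1 shifted by one
);
// Loop approach: linear Contains() search for each item
start = Tick Seconds();
onlyIn2 = {};
For( i = 1, i <= N Items( list2 ), i++,
	If( !Contains( list1, list2[i] ), Insert Into( onlyIn2, list2[i] ) )
);
loopTime = Tick Seconds() - start;
// Associative-array approach, including the cost of building both arrays
start = Tick Seconds();
aaDiff = Associative Array( list2 );
aaDiff << Remove( Associative Array( list1 ) );
onlyIn2AA = aaDiff << Get Keys;
aaTime = Tick Seconds() - start;
Write( "\!nloop: ", loopTime, "  associative array: ", aaTime );
```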