Solved: Re: Quick way to compare two lists (uncommon elements)?

djhanson · Sep 20, 2017 1:30 PM

Does anyone know a quick way to compare two lists and identify any different list elements (i.e. uncommon elements)? I can do this using two, nested loops, but like many things in JSL I was hoping there might be some obscure, easier way . dj

My lists are much larger than this of course:

list1={"apple","pear","orange","kiwi","watermelon"};

list2={"apple","pear","orange","kiwi","strawberry","watermelon"};

Find any different/uncommon elements, in this case "strawberry".

Craige_Hales · Jan 10, 2017 09:41 AM

set1=associativearray({"apple","grape","orange","kiwi","watermelon"});
set2=associativearray({"apple","pear","orange","kiwi","strawberry","watermelon"});
set3=associativearray({"apple","pear","watermelon"});
intersection=set1;intersection<<Intersect(set2);
print("common (intersection) items:",intersection<<getkeys);
union=set1;union<<insert(set2);
print("all (union) items", union<<getkeys);
set1HasSet3=set1;show(set1HasSet3<<contains(set3));
set2HasSet3=set2;show(set2HasSet3<<contains(set3));

"common (intersection) items:"
{"apple", "kiwi", "orange", "watermelon"}
"all (union) items"
{"apple", "grape", "kiwi", "orange", "pear", "strawberry", "watermelon"}
set1HasSet3 << Contains(set3) = 0;
set2HasSet3 << Contains(set3) = 1;

The associative arrays are very good for millions of values, especially compared to writing a JSL script that loops over millions of values.

The scripting index has the details for the associative array being used as a set. Here's the uncommon items:

set1=associativearray({"apple","grape","orange","kiwi","watermelon"});
set2=associativearray({"apple","pear","orange","kiwi","strawberry","watermelon"});
intersection=set1;intersection<<Intersect(set2);
print("common (intersection) items:",intersection<<getkeys);
uncommon1 = set1; uncommon1<<remove(intersection);show(uncommon1<<getkeys);
uncommon2 = set2; uncommon2<<remove(intersection);show(uncommon2<<getkeys);
/*:

"common (intersection) items:"
{"apple", "kiwi", "orange", "watermelon"}
uncommon1 << getkeys = {"grape"};
uncommon2 << getkeys = {"pear", "strawberry"};

Craige

View solution in original post

ms · Oct 19, 2012 08:06 PM

I would like to know that too. Common items are can be found with Intersect() (see below) but I am not aware of a corresponding function for uncommon items.

There are more options for data tables than for lists. The script below identifies the uncommon elements via a temporary datatable. It may be faster than looping through long lists but I have not compared

list1 = {"apple", "pear", "orange", "kiwi", "watermelon", "mango"};

list2 = {"apple", "pear", "kiwi", "orange", "strawberry", "watermelon", "melon"};

dt = New Table( "temp",

New Column( "gr", character, setvalues( list1 || list2 ) ),

New Column( "nr", numeric, formula( 1 ), evalformula ),

invisible

);

Summarize( g = by( dt:gr ), n = Count( dt:nr ) );

not_in_common = g[Loc( As List( n ), 1 )];

common = Associative Array( list1 ) << intersect( Associative Array( list2 ) ) << getkeys;

Show( common, not_in_common );

Close( dt, nosave );

uday_guntupalli · Jan 10, 2017 09:05 AM

@ms :

This solution that you have offered has been extremely helpful to me in many different ways so far and I want to thank you for it first.

I would also like to inquire that when we are dealing with really large sets of data ( for e.g. a couple of million items each ) , this approach will require large amounts of memory and inserting the elements into a new table doesn't seem efficient. Is there a more effective way to identify uncommon elements between two columns in two data tables i.e. If Column 1 in Data Table 1 (dt1) and Column 1 in Data Table 2 (dt2) have a large list of numbers, can we compare them in a more effective way ?

Best
uday

Best
Uday

Craige_Hales · Jan 10, 2017 09:41 AM

set1=associativearray({"apple","grape","orange","kiwi","watermelon"});
set2=associativearray({"apple","pear","orange","kiwi","strawberry","watermelon"});
set3=associativearray({"apple","pear","watermelon"});
intersection=set1;intersection<<Intersect(set2);
print("common (intersection) items:",intersection<<getkeys);
union=set1;union<<insert(set2);
print("all (union) items", union<<getkeys);
set1HasSet3=set1;show(set1HasSet3<<contains(set3));
set2HasSet3=set2;show(set2HasSet3<<contains(set3));

"common (intersection) items:"
{"apple", "kiwi", "orange", "watermelon"}
"all (union) items"
{"apple", "grape", "kiwi", "orange", "pear", "strawberry", "watermelon"}
set1HasSet3 << Contains(set3) = 0;
set2HasSet3 << Contains(set3) = 1;

The associative arrays are very good for millions of values, especially compared to writing a JSL script that loops over millions of values.

The scripting index has the details for the associative array being used as a set. Here's the uncommon items:

set1=associativearray({"apple","grape","orange","kiwi","watermelon"});
set2=associativearray({"apple","pear","orange","kiwi","strawberry","watermelon"});
intersection=set1;intersection<<Intersect(set2);
print("common (intersection) items:",intersection<<getkeys);
uncommon1 = set1; uncommon1<<remove(intersection);show(uncommon1<<getkeys);
uncommon2 = set2; uncommon2<<remove(intersection);show(uncommon2<<getkeys);
/*:

"common (intersection) items:"
{"apple", "kiwi", "orange", "watermelon"}
uncommon1 << getkeys = {"grape"};
uncommon2 << getkeys = {"pear", "strawberry"};

Craige

Craige_Hales · Jan 10, 2017 09:48 AM

Also: if the members of the set are named with consecutive integers, starting at 1, this problem could be done very efficiently with a matrix. But, as presented with named set members, an associative array is probably better than sorting, and definitely better than an N^2 pair of nested loops.

Craige

Craige_Hales · Jan 10, 2017 10:08 AM

And the uncommon method does not require finding the intersection first. It's just the difference.

set1=associativearray({"apple","grape","orange","kiwi","watermelon"});
set2=associativearray({"apple","pear","orange","kiwi","strawberry","watermelon"});
uncommon1 = set1; uncommon1<<remove(set2);show(uncommon1<<getkeys);
uncommon2 = set2; uncommon2<<remove(set1);show(uncommon2<<getkeys);
/*:

uncommon1 << getkeys = {"grape"};
uncommon2 << getkeys = {"pear", "strawberry"};

Craige

vince_faller · Jan 20, 2017 02:54 PM

Quick question for this, is the time saved using the associative array vs the loop pretty much always larger than the overhead it takes to create the associative array vs the list?

Vince Faller - Predictum

Craige_Hales · Jan 20, 2017 05:10 PM

You would have to benchmark it for different sizes and for different JSL loop approaches. I'd guess somewhere between 2 and 10 items in the set would be faster with the associative array. At size 1, the associative array is overkill, but it still constructs pretty fast...about a million per second...

start = tickseconds();
for(i=1,i<=1e6,i++,
a=associativearray({"a","b"});
b=associativearray({"a","b"});
c=associativearray({"a","b"});
d=associativearray({"a","b"});
e=associativearray({"a","b"}); 
);
stop=tickseconds();
aatime=(stop-start);
write("\!ntime for 5,000,000 associative arrays:",aatime);

start = tickseconds();
for(i=1,i<=1e6,i++,
a=1;
b=2;
c=3;
d=4;
e=5; 
);
stop=tickseconds();
satime=(stop-start);
write("\!ntime for 5,000,000 simple assingments:",satime);

write("\!nsimple/associative=", satime/aatime);

/*:

time for 5,000,000 associative arrays:5.8166666666657
time for 5,000,000 simple assingments:0.283333333325572
simple/associative=0.0487106017178716

You probably can't run more than about 20-30 simple JSL statements for each associative array you create.

Craige

Quick way to compare two lists and identify the uncommon elements

Re: Quick way to compare two lists (uncommon elements)?

Re: Quick way to compare two lists (uncommon elements)?

Re: Quick way to compare two lists (uncommon elements)?

Re: Quick way to compare two lists (uncommon elements)?

Re: Quick way to compare two lists (uncommon elements)?

Re: Quick way to compare two lists (uncommon elements)?

Re: Quick way to compare two lists (uncommon elements)?

Re: Quick way to compare two lists (uncommon elements)?