djhanson
Level V

Quick way to compare two lists and identify the uncommon elements

Does anyone know a quick way to compare two lists and identify the elements that differ (i.e. the uncommon elements)? I can do this with two nested loops, but as with many things in JSL, I was hoping there might be some obscure, easier way. dj

 

My lists are much larger than this of course:

list1={"apple","pear","orange","kiwi","watermelon"};

list2={"apple","pear","orange","kiwi","strawberry","watermelon"};

 

The goal is to find any elements that are not common to both lists, in this case "strawberry".

1 ACCEPTED SOLUTION

Craige_Hales
Super User

Re: Quick way to compare two lists (uncommon elements)?

set1=associativearray({"apple","grape","orange","kiwi","watermelon"});
set2=associativearray({"apple","pear","orange","kiwi","strawberry","watermelon"});
set3=associativearray({"apple","pear","watermelon"});
intersection=set1;intersection<<Intersect(set2);
print("common (intersection) items:",intersection<<getkeys);
union=set1;union<<insert(set2);
print("all (union) items", union<<getkeys);
set1HasSet3=set1;show(set1HasSet3<<contains(set3));
set2HasSet3=set2;show(set2HasSet3<<contains(set3));
/*:

"common (intersection) items:"
{"apple", "kiwi", "orange", "watermelon"}
"all (union) items"
{"apple", "grape", "kiwi", "orange", "pear", "strawberry", "watermelon"}
set1HasSet3 << Contains(set3) = 0;
set2HasSet3 << Contains(set3) = 1;

Associative arrays handle millions of values well, especially compared with a JSL script that loops over millions of values.

 

The Scripting Index has the details on using an associative array as a set. Here are the uncommon items:

set1=associativearray({"apple","grape","orange","kiwi","watermelon"});
set2=associativearray({"apple","pear","orange","kiwi","strawberry","watermelon"});
intersection=set1;intersection<<Intersect(set2);
print("common (intersection) items:",intersection<<getkeys);
uncommon1 = set1; uncommon1<<remove(intersection);show(uncommon1<<getkeys);
uncommon2 = set2; uncommon2<<remove(intersection);show(uncommon2<<getkeys);
/*:

"common (intersection) items:"
{"apple", "kiwi", "orange", "watermelon"}
uncommon1 << getkeys = {"grape"};
uncommon2 << getkeys = {"pear", "strawberry"};

 

Craige


7 REPLIES
ms
Super User (Alumni)

Re: Quick way to compare two lists (uncommon elements)?

I would like to know that too. Common items can be found with Intersect() (see below), but I am not aware of a corresponding function for uncommon items.

There are more options for data tables than for lists. The script below identifies the uncommon elements via a temporary data table. It may be faster than looping through long lists, but I have not compared the two.

list1 = {"apple", "pear", "orange", "kiwi", "watermelon", "mango"};
list2 = {"apple", "pear", "kiwi", "orange", "strawberry", "watermelon", "melon"};

dt = New Table( "temp",
  New Column( "gr", character, setvalues( list1 || list2 ) ),
  New Column( "nr", numeric, formula( 1 ), evalformula ),
  invisible
);

Summarize( g = by( dt:gr ), n = Count( dt:nr ) );
not_in_common = g[Loc( As List( n ), 1 )];
common = Associative Array( list1 ) << intersect( Associative Array( list2 ) ) << getkeys;

Show( common, not_in_common );
Close( dt, nosave );

uday_guntupalli
Level VIII

Re: Quick way to compare two lists (uncommon elements)?

@ms :

The solution you offered has been extremely helpful to me in many ways, and I want to thank you for it first.

I would also like to ask about very large data sets (e.g. a couple of million items each). This approach requires a large amount of memory, and inserting the elements into a new table doesn't seem efficient. Is there a more effective way to identify the uncommon elements between two columns in two data tables? That is, if Column 1 in Data Table 1 (dt1) and Column 1 in Data Table 2 (dt2) each hold a large list of numbers, can we compare them more efficiently?

 

Best 
uday 
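
For the two-column case described in this question, one possible sketch builds the sets directly from the column values instead of combining them in a temporary table. The tables dt1 and dt2, the column name "id", and the sample values are all hypothetical, and the sketch assumes that Associative Array() accepts the matrix returned by Get Values for a numeric column; check this against the Scripting Index for your JMP version.

```jsl
// Hypothetical example tables; dt1, dt2, and the column name "id" are assumptions.
dt1 = New Table( "dt1", New Column( "id", Numeric, Set Values( [1, 2, 3, 4, 5] ) ), Invisible );
dt2 = New Table( "dt2", New Column( "id", Numeric, Set Values( [2, 3, 4, 5, 6, 7] ) ), Invisible );

// Build one set per column; no combined table is created.
set1 = Associative Array( dt1:id << Get Values );
set2 = Associative Array( dt2:id << Get Values );

// Set difference in each direction: keys present in one column but not the other.
onlyIn1 = set1; onlyIn1 << Remove( set2 );
onlyIn2 = set2; onlyIn2 << Remove( set1 );
Show( onlyIn1 << Get Keys, onlyIn2 << Get Keys );

Close( dt1, No Save );
Close( dt2, No Save );
```

Duplicate values within a column collapse to a single key, so this reports which distinct values differ, not how often each occurs.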

Craige_Hales
Super User

Re: Quick way to compare two lists (uncommon elements)?

Also: if the members of the set are named with consecutive integers, starting at 1, this problem could be done very efficiently with a matrix.  But, as presented with named set members, an associative array is probably better than sorting, and definitely better than an N^2 pair of nested loops.

Craige
Craige_Hales
Super User

Re: Quick way to compare two lists (uncommon elements)?

And the uncommon method does not require finding the intersection first.  It's just the difference.

set1=associativearray({"apple","grape","orange","kiwi","watermelon"});
set2=associativearray({"apple","pear","orange","kiwi","strawberry","watermelon"});
uncommon1 = set1; uncommon1<<remove(set2);show(uncommon1<<getkeys);
uncommon2 = set2; uncommon2<<remove(set1);show(uncommon2<<getkeys);
/*:

uncommon1 << getkeys = {"grape"};
uncommon2 << getkeys = {"pear", "strawberry"};
Craige
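
Applied to the lists from the original question, the two differences above can be merged into a single list of uncommon elements. This is only a sketch that reuses the Remove and Insert messages already shown in this thread; the names onlyIn1, onlyIn2, and uncommon are illustrative.

```jsl
list1 = {"apple", "pear", "orange", "kiwi", "watermelon"};
list2 = {"apple", "pear", "orange", "kiwi", "strawberry", "watermelon"};

set1 = Associative Array( list1 );
set2 = Associative Array( list2 );

// Items found only in list1, and items found only in list2.
onlyIn1 = set1; onlyIn1 << Remove( set2 );
onlyIn2 = set2; onlyIn2 << Remove( set1 );

// Merge the two differences into one set (the symmetric difference).
uncommon = onlyIn1; uncommon << Insert( onlyIn2 );
Show( uncommon << Get Keys );
```

Here onlyIn1 is empty and onlyIn2 holds only "strawberry", so the merged set should contain just that one key.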
vince_faller
Super User (Alumni)

Re: Quick way to compare two lists (uncommon elements)?

Quick question on this: is the time saved by using the associative array instead of the loop pretty much always larger than the overhead of creating the associative array from the list?

Vince Faller - Predictum
Craige_Hales
Super User

Re: Quick way to compare two lists (uncommon elements)?

You would have to benchmark it for different sizes and for different JSL loop approaches. I'd guess the break-even point is somewhere between 2 and 10 items in the set; beyond that, the associative array would be faster. At size 1, the associative array is overkill, but it still constructs pretty fast, about a million per second:

start = tickseconds();
for(i=1,i<=1e6,i++,
a=associativearray({"a","b"});
b=associativearray({"a","b"});
c=associativearray({"a","b"});
d=associativearray({"a","b"});
e=associativearray({"a","b"});
);
stop=tickseconds();
aatime=(stop-start);
write("\!ntime for 5,000,000 associative arrays:",aatime);

start = tickseconds();
for(i=1,i<=1e6,i++,
a=1;
b=2;
c=3;
d=4;
e=5;
);
stop=tickseconds();
satime=(stop-start);
write("\!ntime for 5,000,000 simple assignments:",satime);

write("\!nsimple/associative=", satime/aatime);

/*:

time for 5,000,000 associative arrays:5.8166666666657
time for 5,000,000 simple assignments:0.283333333325572
simple/associative=0.0487106017178716

In other words, you probably can't run more than about 20-30 simple JSL statements in the time it takes to create one associative array.

Craige
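
To benchmark the actual comparison rather than just construction cost, a sketch along the following lines could be run at several sizes. The list size n, the generated item names, and the variable names are all assumptions for illustration, and only one direction of the difference (items in list2 that are missing from list1) is timed.

```jsl
// Hypothetical test data: two lists of n strings that differ by one item each.
n = 2000; // assumed size; vary it to look for the break-even point on your machine
list1 = {};
list2 = {};
For( i = 1, i <= n, i++,
	Insert Into( list1, "item" || Char( i ) );
	Insert Into( list2, "item" || Char( i + 1 ) );
);

// Loop approach: Contains() scans list1 once for every item in list2.
start = Tick Seconds();
loopResult = {};
For( i = 1, i <= N Items( list2 ), i++,
	If( !Contains( list1, list2[i] ),
		Insert Into( loopResult, list2[i] )
	)
);
loopTime = Tick Seconds() - start;

// Associative array approach, including the cost of building both arrays.
start = Tick Seconds();
set1 = Associative Array( list1 );
aaResult = Associative Array( list2 );
aaResult << Remove( set1 );
aaKeys = aaResult << Get Keys;
aaTime = Tick Seconds() - start;

Show( loopResult, aaKeys, loopTime, aaTime );
```

Both results should hold the same single item that list1 lacks. The timings depend on the machine and JMP version, so none are quoted here.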