Quick way to compare two lists (uncommon elements)...


Oct 19, 2012 2:18 PM
(884 views)

Does anyone know a quick way to compare two lists and identify the elements that differ (i.e., uncommon elements)? I can do this with two nested loops, but as with many things in JSL, I was hoping there might be some obscure, easier way. dj

My lists are much larger than this of course:

```jsl
list1 = {"apple", "pear", "orange", "kiwi", "watermelon"};
list2 = {"apple", "pear", "orange", "kiwi", "strawberry"};
```

Find any different/uncommon elements, in this case "strawberry".

7 REPLIES


Oct 19, 2012 5:06 PM
(499 views)

I would like to know that too. Common items can be found with Intersect() (see below), but I am not aware of a corresponding function for uncommon items.

There are more options for data tables than for lists. The script below identifies the uncommon elements via a temporary data table. It may be faster than looping through long lists, but I have not compared the two.

```jsl
list1 = {"apple", "pear", "orange", "kiwi", "watermelon", "mango"};
list2 = {"apple", "pear", "kiwi", "orange", "strawberry", "watermelon", "melon"};
// Stack both lists into one column of a hidden temporary table.
dt = New Table( "temp",
	New Column( "gr", character, setvalues( list1 || list2 ) ),
	New Column( "nr", numeric, formula( 1 ), evalformula ),
	invisible
);
// Count how often each item occurs across the two lists.
Summarize( g = by( dt:gr ), n = Count( dt:nr ) );
// Items counted once are in only one list (assumes no duplicates within a list).
not_in_common = g[Loc( As List( n ), 1 )];
common = Associative Array( list1 ) << intersect( Associative Array( list2 ) ) << getkeys;
Show( common, not_in_common );
Close( dt, nosave );
```


Jan 10, 2017 6:05 AM
(456 views)

@ms :

First, thank you: the solution you offered has been extremely helpful to me in many different ways so far.

I would also like to ask about really large sets of data (e.g., a couple of million items each). This approach will require large amounts of memory, and inserting the elements into a new table doesn't seem efficient. Is there a more effective way to identify uncommon elements between two columns in two data tables? That is, if Column 1 in Data Table 1 (dt1) and Column 1 in Data Table 2 (dt2) each hold a large list of numbers, can we compare them more efficiently?

Best

uday


Jan 10, 2017 6:41 AM
(452 views)

```jsl
set1=associativearray({"apple","grape","orange","kiwi","watermelon"});
set2=associativearray({"apple","pear","orange","kiwi","strawberry","watermelon"});
set3=associativearray({"apple","pear","watermelon"});
intersection=set1;intersection<<Intersect(set2);
print("common (intersection) items:",intersection<<getkeys);
union=set1;union<<insert(set2);
print("all (union) items", union<<getkeys);
set1HasSet3=set1;show(set1HasSet3<<contains(set3));
set2HasSet3=set2;show(set2HasSet3<<contains(set3));
/*:
"common (intersection) items:"
{"apple", "kiwi", "orange", "watermelon"}
"all (union) items"
{"apple", "grape", "kiwi", "orange", "pear", "strawberry", "watermelon"}
set1HasSet3 << Contains(set3) = 0;
set2HasSet3 << Contains(set3) = 1;
```

The associative arrays are very good for millions of values, especially compared to writing a JSL script that loops over millions of values.

The Scripting Index has the details on using an associative array as a set. Here are the uncommon items:

```jsl
set1=associativearray({"apple","grape","orange","kiwi","watermelon"});
set2=associativearray({"apple","pear","orange","kiwi","strawberry","watermelon"});
intersection=set1;intersection<<Intersect(set2);
print("common (intersection) items:",intersection<<getkeys);
uncommon1 = set1; uncommon1<<remove(intersection);show(uncommon1<<getkeys);
uncommon2 = set2; uncommon2<<remove(intersection);show(uncommon2<<getkeys);
/*:
"common (intersection) items:"
{"apple", "kiwi", "orange", "watermelon"}
uncommon1 << getkeys = {"grape"};
uncommon2 << getkeys = {"pear", "strawberry"};
```
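For the two-column case asked about earlier in the thread, the same set idea might look like the sketch below. It rests on an assumption: that `Associative Array()` accepts a column reference and uses the column's values as keys. The table names, the column name, and the sample values are made up for illustration, and nothing here has been timed on millions of rows.

```jsl
// Hypothetical sample tables standing in for dt1 and dt2.
dt1 = New Table( "dt1",
	New Column( "Column 1", Numeric, Set Values( [1, 2, 3, 4, 5] ) ),
	invisible
);
dt2 = New Table( "dt2",
	New Column( "Column 1", Numeric, Set Values( [3, 4, 5, 6] ) ),
	invisible
);
// Assumption: a column reference yields one key per distinct value.
set1 = Associative Array( dt1:Column 1 );
set2 = Associative Array( dt2:Column 1 );
// Copy each set, then remove the other set's keys from the copy.
only1 = set1; only1 << Remove( set2 ); // values found only in dt1
only2 = set2; only2 << Remove( set1 ); // values found only in dt2
Show( only1 << Get Keys, only2 << Get Keys );
Close( dt1, No Save );
Close( dt2, No Save );
```

This avoids stacking both columns into a third table, though whether it actually uses less memory would need to be measured.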

Craige


Jan 10, 2017 6:48 AM
(450 views)

Also: if the members of the set are named with consecutive integers, starting at 1, this problem could be done very efficiently with a matrix. But, as presented with named set members, an associative array is probably better than sorting, and definitely better than an N^2 pair of nested loops.

Craige


Jan 10, 2017 7:08 AM
(445 views)

And the uncommon method does not require finding the intersection first. It's just the difference.

```jsl
set1=associativearray({"apple","grape","orange","kiwi","watermelon"});
set2=associativearray({"apple","pear","orange","kiwi","strawberry","watermelon"});
uncommon1 = set1; uncommon1<<remove(set2);show(uncommon1<<getkeys);
uncommon2 = set2; uncommon2<<remove(set1);show(uncommon2<<getkeys);
/*:
uncommon1 << getkeys = {"grape"};
uncommon2 << getkeys = {"pear", "strawberry"};
```
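If a single list of every uncommon item is wanted, regardless of which side it came from, the two differences can be merged. This is a minimal sketch that uses only the messages already shown in this thread (`remove`, `insert`, `getkeys`); the names `only1`, `only2`, and `uncommon` are introduced here for illustration.

```jsl
set1 = Associative Array( {"apple", "grape", "orange", "kiwi", "watermelon"} );
set2 = Associative Array( {"apple", "pear", "orange", "kiwi", "strawberry", "watermelon"} );
// One-sided differences, as above.
only1 = set1; only1 << Remove( set2 ); // in set1 but not in set2
only2 = set2; only2 << Remove( set1 ); // in set2 but not in set1
// Union of the two differences: every item that is in exactly one set.
uncommon = only1; uncommon << Insert( only2 );
Show( uncommon << Get Keys );
```

With the sample sets above, the merged result should hold grape, pear, and strawberry.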

Craige


Jan 20, 2017 11:54 AM
(345 views)


Jan 20, 2017 2:10 PM
(328 views)

You would have to benchmark it for different sizes and for different JSL loop approaches. I'd guess somewhere between 2 and 10 items in the set would be faster with the associative array. At size 1, the associative array is overkill, but it still constructs pretty fast...about a million per second...

```jsl
// Time 1e6 loop passes that each build five small associative arrays.
start = tickseconds();
for(i=1,i<=1e6,i++,
	a=associativearray({"a","b"});
	b=associativearray({"a","b"});
	c=associativearray({"a","b"});
	d=associativearray({"a","b"});
	e=associativearray({"a","b"});
);
stop=tickseconds();
aatime=(stop-start);
write("\!ntime for 5,000,000 associative arrays:",aatime);
// Time the same loop with five simple assignments as a baseline.
start = tickseconds();
for(i=1,i<=1e6,i++,
	a=1;
	b=2;
	c=3;
	d=4;
	e=5;
);
stop=tickseconds();
satime=(stop-start);
write("\!ntime for 5,000,000 simple assignments:",satime);
write("\!nsimple/associative=", satime/aatime);
/*:

time for 5,000,000 associative arrays:5.8166666666657
time for 5,000,000 simple assignments:0.283333333325572
simple/associative=0.0487106017178716
```

In the time it takes to create one associative array, you probably can't run more than about 20-30 simple JSL statements.

Craige