Quick way to compare two lists (uncommon elements)?


Oct 19, 2012 2:18 PM

Does anyone know a quick way to compare two lists and identify any differing elements (i.e., uncommon elements)? I can do this using two nested loops, but as with many things in JSL, I was hoping there might be some obscure, easier way. dj

My lists are much larger than this, of course:

list1={"apple","pear","orange","kiwi","watermelon"};

list2={"apple","pear","orange","kiwi","strawberry","watermelon"};

Find any different/uncommon elements, in this case "strawberry".
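For reference, a minimal sketch of the two-nested-loops approach mentioned in the question might look like this. It is an illustration, not dj's actual script; the names `onlyIn2` and `found` are hypothetical, and it only reports items of list2 that are missing from list1. Because every item of list2 can be compared against every item of list1, the work grows with the product of the two list lengths.

```jsl
// Hypothetical baseline: nested loops, reporting items of list2 not found in list1
list1 = {"apple", "pear", "orange", "kiwi", "watermelon"};
list2 = {"apple", "pear", "orange", "kiwi", "strawberry", "watermelon"};
onlyIn2 = {};
For( i = 1, i <= N Items( list2 ), i++,
	found = 0;
	// inner loop stops early once a match is found
	For( j = 1, j <= N Items( list1 ) & !found, j++,
		If( list2[i] == list1[j], found = 1 )
	);
	If( !found, Insert Into( onlyIn2, list2[i] ) );
);
Show( onlyIn2 );
```

With the example lists, `onlyIn2` should end up as `{"strawberry"}`.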

1 ACCEPTED SOLUTION


Jan 10, 2017 6:41 AM
Posted in reply to message from uday_guntupalli 01/10/2017 09:05 AM

```
// Build keys-only associative arrays (sets) from lists
set1=associativearray({"apple","grape","orange","kiwi","watermelon"});
set2=associativearray({"apple","pear","orange","kiwi","strawberry","watermelon"});
set3=associativearray({"apple","pear","watermelon"});
// Assigning copies the array, so Intersect and Insert leave set1 unchanged
intersection=set1;intersection<<Intersect(set2);
print("common (intersection) items:",intersection<<getkeys);
union=set1;union<<insert(set2);
print("all (union) items", union<<getkeys);
// Contains returns 1 only if every key of set3 is in the set
set1HasSet3=set1;show(set1HasSet3<<contains(set3));
set2HasSet3=set2;show(set2HasSet3<<contains(set3));
```

"common (intersection) items:" {"apple", "kiwi", "orange", "watermelon"}

"all (union) items" {"apple", "grape", "kiwi", "orange", "pear", "strawberry", "watermelon"}

set1HasSet3 << Contains(set3) = 0;

set2HasSet3 << Contains(set3) = 1;

Associative arrays handle millions of values very well, especially compared with a JSL script that loops over millions of values.

The Scripting Index documents how to use an associative array as a set. Here are the uncommon items:

```
set1=associativearray({"apple","grape","orange","kiwi","watermelon"});
set2=associativearray({"apple","pear","orange","kiwi","strawberry","watermelon"});
intersection=set1;intersection<<Intersect(set2);
print("common (intersection) items:",intersection<<getkeys);
uncommon1 = set1; uncommon1<<remove(intersection);show(uncommon1<<getkeys);
uncommon2 = set2; uncommon2<<remove(intersection);show(uncommon2<<getkeys);
/*:
"common (intersection) items:"
{"apple", "kiwi", "orange", "watermelon"}
uncommon1 << getkeys = {"grape"};
uncommon2 << getkeys = {"pear", "strawberry"};
*/
```

Craige
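The original question asks for any element that is not shared by both lists, which is the symmetric difference of the two sets. A minimal sketch that gets it in a single set by combining the messages used above (`Intersect`, `Insert`, `Remove`); the variable names `inBoth` and `inOne` are illustrative only.

```jsl
// Sketch: items that appear in exactly one of the two sets,
// computed as (union) minus (intersection)
set1 = Associative Array( {"apple", "grape", "orange", "kiwi", "watermelon"} );
set2 = Associative Array( {"apple", "pear", "orange", "kiwi", "strawberry", "watermelon"} );
// Assignment copies the array, so set1 and set2 are not modified below
inBoth = set1;
inBoth << Intersect( set2 );
inOne = set1;
inOne << Insert( set2 );   // union
inOne << Remove( inBoth ); // drop the shared keys
Show( inOne << Get Keys );
```

With these example sets the keys should be `{"grape", "pear", "strawberry"}`, i.e. `uncommon1` and `uncommon2` from the script above merged into one list.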

7 REPLIES


I would like to know that too. Common items can be found with Intersect() (see below), but I am not aware of a corresponding function for uncommon items.

There are more options for data tables than for lists. The script below identifies the uncommon elements via a temporary data table. It may be faster than looping through long lists, but I have not compared.

list1 = {"apple", "pear", "orange", "kiwi", "watermelon", "mango"};

list2 = {"apple", "pear", "kiwi", "orange", "strawberry", "watermelon", "melon"};

dt = New Table( "temp",

New Column( "gr", character, setvalues( list1 || list2 ) ),

New Column( "nr", numeric, formula( 1 ), evalformula ),

invisible

);

Summarize( g = by( dt:gr ), n = Count( dt:nr ) );

not_in_common = g[Loc( As List( n ), 1 )];

common = Associative Array( list1 ) << intersect( Associative Array( list2 ) ) << getkeys;

Show( common, not_in_common );

Close( dt, nosave );
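One caveat with the counting approach: it treats a count of 1 as "present in only one list", which assumes neither list contains duplicates. If "apple" appeared twice in list1 and never in list2, its count would be 2 and it would be missed. A sketch of one way around this, assuming you dedupe each list first by round-tripping it through an associative array (the names `u1`, `u2`, and `combined` are hypothetical):

```jsl
// Sketch: make each list duplicate-free before concatenating,
// so a count of 1 really means "in only one list"
list1 = {"apple", "pear", "apple", "kiwi"}; // "apple" repeated, absent from list2
list2 = {"pear", "kiwi", "melon"};
u1 = Associative Array( list1 ) << Get Keys; // unique, sorted keys
u2 = Associative Array( list2 ) << Get Keys;
combined = u1 || u2;
Show( combined );
```

`u1 || u2` could then be passed to `setvalues()` in place of `list1 || list2`; in this example "apple" and "melon" each appear exactly once in `combined`, so the count test would flag both.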


@ms:

First, thank you: the solution you offered has already been extremely helpful to me in many different ways.

I would also like to ask about really large data sets (for example, a couple of million items each). This approach requires large amounts of memory, and inserting the elements into a new table doesn't seem efficient. Is there a more effective way to identify uncommon elements between two columns in two data tables? For example, if Column 1 in Data Table 1 (dt1) and Column 1 in Data Table 2 (dt2) each hold a large list of numbers, can we compare them more efficiently?

Best

uday

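A sketch of applying the associative-array idea to two data table columns, as asked above. It assumes two hypothetical tables, `dt1` and `dt2`, each with a numeric column named `x`; the small tables are created only so the example runs. Each set is built with one pass over the rows, so the work stays roughly proportional to the number of rows rather than to the product of the two table sizes, and no concatenated temporary table is needed. The row loop is a conservative choice; there may be faster bulk ways to build the sets that are not shown here.

```jsl
// Hypothetical example tables, each with a numeric column "x"
dt1 = New Table( "dt1", New Column( "x", Numeric, Set Values( [1, 2, 3, 5, 8] ) ) );
dt2 = New Table( "dt2", New Column( "x", Numeric, Set Values( [2, 3, 5, 7] ) ) );

// Build a set from each column: keys are the cell values
set1 = Associative Array();
For( i = 1, i <= N Rows( dt1 ), i++, set1[dt1:x[i]] = 1 );
set2 = Associative Array();
For( i = 1, i <= N Rows( dt2 ), i++, set2[dt2:x[i]] = 1 );

// Set differences in each direction
only1 = set1;
only1 << Remove( set2 );
only2 = set2;
only2 << Remove( set1 );
Show( only1 << Get Keys, only2 << Get Keys );

Close( dt1, NoSave );
Close( dt2, NoSave );
```

Here `only1` should hold the keys `{1, 8}` and `only2` the key `{7}`.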


Jan 10, 2017 6:48 AM
Posted in reply to message from Craige_Hales 01/10/2017 09:41 AM

Also: if the members of the set are consecutive integers starting at 1, this problem could be solved very efficiently with a matrix. For named set members, as presented here, an associative array is probably better than sorting, and definitely better than an O(N^2) pair of nested loops.

Craige
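A sketch of the matrix idea for the special case where set members are the integers 1 to n. The IDs, `n`, and the flag-vector names are hypothetical. Each set becomes a 0/1 column vector, and `Loc()` on an element-wise comparison returns the positions flagged in one vector but not the other, with no per-item JSL loop.

```jsl
// Sketch: members are integer IDs in 1..n (hypothetical values)
n = 10;
ids1 = [1, 2, 4, 7];
ids2 = [2, 4, 5, 7, 9];
flags1 = J( n, 1, 0 ); // n x 1 vector of zeros
flags1[ids1] = 1;      // mark members of set 1
flags2 = J( n, 1, 0 );
flags2[ids2] = 1;      // mark members of set 2
only1 = Loc( flags1 > flags2 ); // in set 1 but not set 2
only2 = Loc( flags2 > flags1 ); // in set 2 but not set 1
Show( only1, only2 );
```

With these IDs, `only1` should contain 1, and `only2` should contain 5 and 9. For the combined "uncommon" set, `Loc( flags1 != flags2 )` would return both groups at once.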


Jan 10, 2017 7:08 AM
Posted in reply to message from Craige_Hales 01/10/2017 09:48 AM

Also, finding the uncommon items does not require computing the intersection first. It's just the set difference.

```
set1=associativearray({"apple","grape","orange","kiwi","watermelon"});
set2=associativearray({"apple","pear","orange","kiwi","strawberry","watermelon"});
uncommon1 = set1; uncommon1<<remove(set2);show(uncommon1<<getkeys);
uncommon2 = set2; uncommon2<<remove(set1);show(uncommon2<<getkeys);
/*:
uncommon1 << getkeys = {"grape"};
uncommon2 << getkeys = {"pear", "strawberry"};
*/
```

Craige


Jan 20, 2017 11:54 AM
Posted in reply to message from Craige_Hales 01/10/2017 09:41 AM

Quick question about this: is the time saved by using the associative array instead of the loop almost always larger than the overhead of creating the associative array from the list?


Jan 20, 2017 2:10 PM
Posted in reply to message from vince_faller 01/20/2017 02:54 PM

You would have to benchmark it for different sizes and for different JSL loop approaches. I'd guess the associative array becomes faster somewhere between 2 and 10 items in the set. At size 1, the associative array is overkill, but it still constructs quickly: about a million per second.

start = tickseconds();

for(i=1,i<=1e6,i++,

a=associativearray({"a","b"});

b=associativearray({"a","b"});

c=associativearray({"a","b"});

d=associativearray({"a","b"});

e=associativearray({"a","b"});

);

stop=tickseconds();

aatime=(stop-start);

write("\!ntime for 5,000,000 associative arrays:",aatime);

start = tickseconds();

for(i=1,i<=1e6,i++,

a=1;

b=2;

c=3;

d=4;

e=5;

);

stop=tickseconds();

satime=(stop-start);

write("\!ntime for 5,000,000 simple assignments:",satime);

write("\!nsimple/associative=", satime/aatime);

/*:

time for 5,000,000 associative arrays:5.8166666666657

time for 5,000,000 simple assignments:0.283333333325572

simple/associative=0.0487106017178716

*/

In other words, building one small associative array costs about as much as 20-30 simple JSL statements; a loop-based approach that needs more statements than that per array will be slower.

Craige
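The benchmark above measures only construction cost. To answer the original question more directly, a rough sketch like the following times both approaches on the same data. The size `n`, the generated values, and the variable names are assumptions for illustration, and the timings will depend on your machine and JMP version. `Contains()` scans the list, so the loop version still performs on the order of N^2 comparisons.

```jsl
// Rough timing sketch: list loop vs. associative-array difference
n = 5000;
list1 = {};
list2 = {};
For( i = 1, i <= n, i++,
	Insert Into( list1, Char( i ) );
	Insert Into( list2, Char( i + 10 ) ); // shifted so 10 items differ each way
);

// Loop approach: Contains() searches list1 linearly for each item
start = Tick Seconds();
onlyIn2Loop = {};
For( i = 1, i <= N Items( list2 ), i++,
	If( !Contains( list1, list2[i] ), Insert Into( onlyIn2Loop, list2[i] ) )
);
loopTime = Tick Seconds() - start;

// Associative-array approach, including construction cost
start = Tick Seconds();
onlyIn2AA = Associative Array( list2 );
onlyIn2AA << Remove( Associative Array( list1 ) );
aaTime = Tick Seconds() - start;

Write( "\!nlist loop: ", loopTime, " s; associative array: ", aaTime, " s" );
```

Both versions should find the same 10 items ("5001" through "5010"); rerunning with several values of `n` would show where the crossover falls on a given machine.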