fastest way to get distinct items from a matrix/li...

Mar 8, 2017 10:21 AM
(2171 views)

I'm trying to get the distinct items in a fairly large vector and I was wondering if anyone knew of a faster way than making an associative array. I tried the following.

```
Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA\Probe.jmp" );
rows = dt << Get Rows Where( Num( :Wafer Number ) <= 10 );
times = Column( dt, "Start Time" )[rows];

// Option 1: load the values as associative array keys, then read the keys back
st = HPTime();
distinct1 = Associative Array( As List( times ) );
distinct1 = distinct1 << Get Keys;
opt1 = HPTime() - st;
Show( opt1 );

// Option 2: subset the rows into a new table, then Summarize by the column
st = HPTime();
dt_sub = dt << Subset(
	Selected Rows( 0 ),
	Rows( rows ),
	Selected Columns Only( 0 )
);
Summarize( dt_sub, distinct2 = By( :Start Time ) );
Close( dt_sub, No Save );
opt2 = HPTime() - st;
Show( opt2 );

// Option 3: grow a matrix, appending each value not already in it
st = HPTime();
distinct3 = [];
For( i = 1, i <= N Rows( times ), i++,
	If( !Any( distinct3 == times[i] ),
		distinct3 ||= times[i]
	)
);
opt3 = HPTime() - st;
Show( opt3 );

// Option 4: grow a list, inserting each value not already in it
st = HPTime();
distinct4 = {};
For( i = 1, i <= N Rows( times ), i++,
	If( !Contains( distinct4, times[i] ),
		Insert Into( distinct4, times[i] )
	)
);
opt4 = HPTime() - st;
Show( opt4 );

// All four approaches should report the same number of distinct values
Show( N Items( distinct1 ), N Items( distinct2 ), N Rows( distinct3` ), N Items( distinct4 ) );
```

Which gave the following output (HPTime() values are in microseconds):

opt1 = 1082;

opt2 = 40914;

opt3 = 5853;

opt4 = 8001;

N Items(distinct1) = 211;

N Items(distinct2) = 211;

N Rows(distinct3`) = 211;

N Items(distinct4) = 211;

*Edit* Okay, definitely DON'T use an associative array, because it doesn't allow floating point keys; it rounds everything. The for loop seems pretty slow for this operation, so does anyone have anything better?

1 ACCEPTED SOLUTION

Mar 9, 2017 6:12 AM
(4245 views)

Solution

Nice use of Summary.

The best solution may be different for different size problems; a large setup overhead might pay off on a large enough problem.

I was investigating how to turn floating point numbers into keys for associative arrays and only came up with slower answers involving strings made from the numbers.

You might scale the floating point numbers into integers between +/- 2^52 and use the integers to index associative arrays. Yes, 2^52, not 2^32 and not 2^64: the 8-byte double precision floating point numbers JMP uses have a 52-bit fraction (see Wikipedia). This would be a lossy conversion but could keep most of the information.
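
A minimal sketch of that scaling idea, not a tested or tuned implementation. The sample vector and the scale factor of 1000 (three decimal places kept) are assumptions for illustration; pick the scale so that every scaled value stays inside +/- 2^52.

```
Names Default To Here( 1 );
// Hypothetical sample data: floating point values with repeats
x = [0.125, 2.5, 0.125, 7.75, 2.5];
// Assumed scale factor: keeps three decimal places
scale = 1000;
// Lossy step: anything finer than 1/scale is rounded away
keys = Round( x * scale );
// Use the integers as associative array keys, as in option 1 of the question
aa = Associative Array( As List( keys ) );
// Read the keys back and undo the scaling
distinct = Matrix( aa << Get Keys ) / scale;
Show( distinct );
```

Values that differ only beyond the kept precision collapse into one key, which is the lossy part of the conversion.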

Craige

5 REPLIES

Mar 9, 2017 2:12 AM
(2141 views)

That still seems to work faster than using a for loop, since it is a bit of matrix manipulation followed by creating the associative array.

Mar 9, 2017 4:33 AM
(2137 views)

Here is another alternative that you might want to try:

```
Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA\Probe.jmp" );
// Select the rows that are not wanted and toggle their excluded state
dt << Select Where( Num( :Wafer Number ) > 10 );
dt << exclude;
// Build a private, unlinked summary table grouped by Start Time
dtSumm = dt << Summary(
	private,
	Group( :Start Time ),
	Freq( "None" ),
	Weight( "None" ),
	statistics column name format( "column" ),
	Link to original data table( 0 )
);
dtSumm << delete rows;
// Read the grouped values, then clean up
Distinct = dtSumm:Start Time << get values;
Close( dtSumm, nosave );
dt << clear row states;
```

Jim

Mar 9, 2017 1:39 PM
(2104 views)

Not sure how it compares in speed, but it's faster to type...

`distinct = Design(times, <<levels)[2];`
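
As far as I understand it, `Design()` called with the `<<levels` option returns a two-item list, the design (indicator) matrix followed by the levels, so subscripting with `[2]` keeps only the levels. A small sketch on a made-up vector, in case it helps to see the pieces separately:

```
Names Default To Here( 1 );
// Hypothetical data with repeats
x = [3, 1, 3, 2, 1];
// Ask Design() to return the levels along with the design matrix
result = Design( x, <<levels );
Show( N Items( result ) );
// The second item holds the levels, i.e. the distinct values
distinct = result[2];
Show( distinct );
```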