I'm trying to get the distinct items in a fairly large vector, and I was wondering if anyone knows of a faster way than building an associative array. I tried the following four options.
Names Default to here(1);
dt = open("$SAMPLE_DATA/Probe.jmp");
rows = dt << Get rows Where(Num(:Wafer Number) <= 10);
times = Column(dt, "Start Time")[rows];
//option 1
st = HPTime();
distinct1 = associative array(as list(times));
distinct1 = distinct1 << Get Keys;
opt1 = HPTime()-st;
show(opt1);
//option 2
st = HPTime();
dt_sub = dt << Subset(
    Selected Rows( 0 ),
    Rows( rows ),
    Selected columns only( 0 )
);
Summarize(dt_sub, distinct2 = by(:Start Time));
close(dt_sub, no save);
opt2 = HPTime()-st;
show(opt2);
//option 3
st = HPTime();
distinct3 = [];
for(i=1, i<=nrows(times), i++,
    if(!any(distinct3 == times[i]),
        distinct3 ||= times[i]
    )
);
opt3 = HPTime()-st;
show(opt3);
//option 4
st = HPTime();
distinct4 = {};
for(i=1, i<=nrows(times), i++,
    if(!Contains(distinct4, times[i]),
        insert into(distinct4, times[i])
    )
);
opt4 = HPTime()-st;
show(opt4);
show(nitems(distinct1), nitems(distinct2), nrows(distinct3`), nitems(distinct4));
This gave the following output (HPTime() values are in microseconds):
opt1 = 1082;
opt2 = 40914;
opt3 = 5853;
opt4 = 8001;
N Items(distinct1) = 211;
N Items(distinct2) = 211;
N Rows(distinct3`) = 211;
N Items(distinct4) = 211;
*Edit* Okay, definitely DON'T use an associative array for this: it doesn't allow floating-point keys, it rounds everything. The for loop seems pretty slow for this operation, so does anyone have anything better?
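One possibility that avoids both the associative array and the row-by-row loop is to sort the vector and keep only the elements that differ from their predecessor. The following is a minimal sketch, not a benchmarked answer. It assumes `times` is a numeric column vector with at least two elements and no missing values, and the names `sorted`, `changed`, `keep`, `distinct5`, and `opt5` are hypothetical additions that follow the naming of the options above.

```jsl
// Hypothetical option 5: sort, then compare each element with its neighbor.
// Assumes `times` is a numeric column vector, N Rows(times) >= 2, no missing values.
st = HPTime();
sorted = Sort Ascending( times );
n = N Rows( sorted );
// 1 wherever an element differs from the one before it in sorted order
changed = sorted[2 :: n] != sorted[1 :: (n - 1)];
// always keep the first element, plus every element that starts a new run
keep = [1] |/ (Loc( changed ) + 1);
distinct5 = sorted[keep];
opt5 = HPTime() - st;
Show( opt5, N Rows( distinct5 ) );
```

Because the comparison is exact numeric equality on the sorted values, no keys are rounded. The result comes back in ascending order rather than in order of first appearance, which may or may not matter for your use.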