<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic fastest way to get distinct items from a matrix/list in Discussions</title>
    <link>https://community.jmp.com/t5/Discussions/fastest-way-to-get-distinct-items-from-a-matrix-list/m-p/36959#M21686</link>
    <description>&lt;P&gt;I'm trying to get the distinct items in a fairly large vector, and I was wondering if anyone knows a faster way than making an associative array. I tried the following:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;Names Default to here(1);
dt = open("$SAMPLE_DATA\Probe.jmp");
rows = dt &amp;lt;&amp;lt; Get rows Where(Num(:Wafer Number) &amp;lt;= 10);
times = Column(dt, "Start Time")[rows];

//option 1: keys of an associative array
st = HPTime();
distinct1 = associative array(as list(times));
distinct1 = distinct1 &amp;lt;&amp;lt; Get Keys;
opt1 = HPTime()-st;
show(opt1);

//option 2: subset the rows, then use Summarize By
st = HPTime();
dt_sub = dt  &amp;lt;&amp;lt; Subset(
	Selected Rows( 0 ),
	Rows( rows ),
	Selected columns only( 0 )
);
Summarize(dt_sub, distinct2 = by(:Start Time));
close(dt_sub, no save);
opt2 = HPTime()-st;
show(opt2);

//option 3: For loop, appending unseen values to a row vector
st = HPTime();
distinct3 = [];
for(i=1, i&amp;lt;=nrows(times), i++, 
	if(!any(distinct3 == times[i]), 
		distinct3 ||= times[i]
	)
);
opt3 = HPTime()-st;
show(opt3);

//option 4: For loop, appending unseen values to a list
st = HPTime();
distinct4 = {};
for(i=1, i&amp;lt;=nrows(times), i++, 
	if(!Contains(distinct4, times[i]), 
		insert into(distinct4, times[i])
	)
);
opt4 = HPTime()-st;
show(opt4);

show(nitems(distinct1), nitems(distinct2), nrows(distinct3`), nitems(distinct4));&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Which gave this output (HP Time() differences are in microseconds):&lt;/P&gt;
&lt;P&gt;opt1 = 1082;&lt;BR /&gt;opt2 = 40914;&lt;BR /&gt;opt3 = 5853;&lt;BR /&gt;opt4 = 8001;&lt;BR /&gt;N Items(distinct1) = 211;&lt;BR /&gt;N Items(distinct2) = 211;&lt;BR /&gt;N Rows(distinct3`) = 211;&lt;BR /&gt;N Items(distinct4) = 211;&lt;/P&gt;
&lt;P&gt;*Edit:* Okay, definitely DON'T use an associative array here: it doesn't support floating-point keys; it rounds everything. The For loops seem pretty slow for this operation, so does anyone have anything better?&lt;/P&gt;</description>
    <pubDate>Wed, 08 Mar 2017 18:25:36 GMT</pubDate>
    <dc:creator>vince_faller</dc:creator>
    <dc:date>2017-03-08T18:25:36Z</dc:date>
    <item>
      <title>fastest way to get distinct items from a matrix/list</title>
      <link>https://community.jmp.com/t5/Discussions/fastest-way-to-get-distinct-items-from-a-matrix-list/m-p/36959#M21686</link>
      <description>&lt;P&gt;I'm trying to get the distinct items in a fairly large vector, and I was wondering if anyone knows a faster way than making an associative array. I tried the following:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;Names Default to here(1);
dt = open("$SAMPLE_DATA\Probe.jmp");
rows = dt &amp;lt;&amp;lt; Get rows Where(Num(:Wafer Number) &amp;lt;= 10);
times = Column(dt, "Start Time")[rows];

//option 1: keys of an associative array
st = HPTime();
distinct1 = associative array(as list(times));
distinct1 = distinct1 &amp;lt;&amp;lt; Get Keys;
opt1 = HPTime()-st;
show(opt1);

//option 2: subset the rows, then use Summarize By
st = HPTime();
dt_sub = dt  &amp;lt;&amp;lt; Subset(
	Selected Rows( 0 ),
	Rows( rows ),
	Selected columns only( 0 )
);
Summarize(dt_sub, distinct2 = by(:Start Time));
close(dt_sub, no save);
opt2 = HPTime()-st;
show(opt2);

//option 3: For loop, appending unseen values to a row vector
st = HPTime();
distinct3 = [];
for(i=1, i&amp;lt;=nrows(times), i++, 
	if(!any(distinct3 == times[i]), 
		distinct3 ||= times[i]
	)
);
opt3 = HPTime()-st;
show(opt3);

//option 4: For loop, appending unseen values to a list
st = HPTime();
distinct4 = {};
for(i=1, i&amp;lt;=nrows(times), i++, 
	if(!Contains(distinct4, times[i]), 
		insert into(distinct4, times[i])
	)
);
opt4 = HPTime()-st;
show(opt4);

show(nitems(distinct1), nitems(distinct2), nrows(distinct3`), nitems(distinct4));&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Which gave this output (HP Time() differences are in microseconds):&lt;/P&gt;
&lt;P&gt;opt1 = 1082;&lt;BR /&gt;opt2 = 40914;&lt;BR /&gt;opt3 = 5853;&lt;BR /&gt;opt4 = 8001;&lt;BR /&gt;N Items(distinct1) = 211;&lt;BR /&gt;N Items(distinct2) = 211;&lt;BR /&gt;N Rows(distinct3`) = 211;&lt;BR /&gt;N Items(distinct4) = 211;&lt;/P&gt;
&lt;P&gt;*Edit:* Okay, definitely DON'T use an associative array here: it doesn't support floating-point keys; it rounds everything. The For loops seem pretty slow for this operation, so does anyone have anything better?&lt;/P&gt;</description>
      <pubDate>Wed, 08 Mar 2017 18:25:36 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/fastest-way-to-get-distinct-items-from-a-matrix-list/m-p/36959#M21686</guid>
      <dc:creator>vince_faller</dc:creator>
      <dc:date>2017-03-08T18:25:36Z</dc:date>
    </item>
    <item>
      <title>Re: fastest way to get distinct items from a matrix/list</title>
      <link>https://community.jmp.com/t5/Discussions/fastest-way-to-get-distinct-items-from-a-matrix-list/m-p/36977#M21699</link>
      <description>I did the same using an associative array. As you note, to handle numbers less than zero or with many decimal places, you first need to take a sample of the data, work out a good scaling factor, and then scale the elements of the vector.&lt;BR /&gt;&lt;BR /&gt;That still seems to be faster than using a For loop, since it is just a bit of matrix manipulation followed by creating the associative array.</description>
      <pubDate>Thu, 09 Mar 2017 10:12:50 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/fastest-way-to-get-distinct-items-from-a-matrix-list/m-p/36977#M21699</guid>
      <dc:creator>stephen_pearson</dc:creator>
      <dc:date>2017-03-09T10:12:50Z</dc:date>
    </item>
    <item>
      <title>Re: fastest way to get distinct items from a matrix/list</title>
      <link>https://community.jmp.com/t5/Discussions/fastest-way-to-get-distinct-items-from-a-matrix-list/m-p/36979#M21701</link>
      <description>&lt;P&gt;Here is another alternative that you might want to try:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA\Probe.jmp" );
dt &amp;lt;&amp;lt; Select Where( Num( :Wafer Number ) &amp;gt; 10 );
dt &amp;lt;&amp;lt; exclude;

dtSumm = dt &amp;lt;&amp;lt; Summary(
	private,
	Group( :Start Time ),
	Freq( "None" ),
	Weight( "None" ),
	statistics column name format( "column" ),
	Link to original data table( 0 )
);
dtSumm &amp;lt;&amp;lt; delete rows;
Distinct = dtSumm:Start Time &amp;lt;&amp;lt; get values;

Close( dtSumm, nosave );
dt &amp;lt;&amp;lt; clear row states;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 09 Mar 2017 12:33:22 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/fastest-way-to-get-distinct-items-from-a-matrix-list/m-p/36979#M21701</guid>
      <dc:creator>txnelson</dc:creator>
      <dc:date>2017-03-09T12:33:22Z</dc:date>
    </item>
    <item>
      <title>Re: fastest way to get distinct items from a matrix/list</title>
      <link>https://community.jmp.com/t5/Discussions/fastest-way-to-get-distinct-items-from-a-matrix-list/m-p/36985#M21707</link>
      <description>&lt;P&gt;Nice use of Summary.&lt;/P&gt;
&lt;P&gt;The best solution may differ with problem size; a method with a large setup overhead might still pay off on a large enough problem.&lt;/P&gt;
&lt;P&gt;I was investigating how to turn floating point numbers into keys for associative arrays and only came up with slower answers involving strings made from the numbers.&lt;/P&gt;
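&lt;P&gt;If the distinct values must be exact rather than scaled, one key-free alternative is to sort the vector and keep each element that differs from its predecessor. This approach was not proposed elsewhere in this thread and is not timed here. The sketch assumes a numeric column vector with at least two elements and no missing values. The literal vector &lt;CODE&gt;x&lt;/CODE&gt; is made-up stand-in data for something like &lt;CODE&gt;times&lt;/CODE&gt; from the question.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;Names Default To Here( 1 );
// Made-up stand-in data; replace with your own numeric column vector.
x = [3.25, 1.5, 3.25, 2.125, 1.5];
// Sort so that equal values become adjacent.
s = Sort Ascending( x );
n = N Rows( s );
// Keep position 1, plus every position i + 1 where s[i + 1] differs from s[i].
// Assumes n is at least 2 and there are no missing values.
idx = 1 |/ (Loc( s[2 :: n] != s[1 :: (n - 1)] ) + 1);
distinct5 = s[idx];
Show( distinct5 );&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The cost is dominated by a single sort. Because the comparison is exact, no scaling factor has to be chosen.&lt;/P&gt;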
&lt;P&gt;You might scale the floating-point numbers into integers between +/- 2^52 and use the integers to index associative arrays. Yes, 2^52, not 2^32 and not 2^64: the 8-byte double-precision floating-point numbers JMP uses have a 52-bit fraction (&lt;A href="https://en.wikipedia.org/wiki/Double-precision_floating-point_format" target="_blank"&gt;Wikipedia&lt;/A&gt;). This would be a lossy conversion but could keep most of the information.&lt;/P&gt;
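&lt;P&gt;A minimal sketch of that scaling idea, under some assumptions:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;the data is a numeric column vector with no missing values and at least one nonzero value;&lt;/LI&gt;
&lt;LI&gt;&lt;CODE&gt;Abs()&lt;/CODE&gt; and &lt;CODE&gt;Round()&lt;/CODE&gt; apply element-wise to a matrix in your JMP version.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The vector &lt;CODE&gt;x&lt;/CODE&gt; and the variable names are made up for illustration. Values closer together than roughly max(|x|)/2^52 can collapse into a single key; that is the loss mentioned above.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;Names Default To Here( 1 );
// Made-up stand-in data; replace with your own numeric column vector.
x = [3.25, -1.5, 3.25, 2.125, -1.5];
// Scale so the largest magnitude maps to 2^52 (assumes at least one nonzero value).
scale = 2 ^ 52 / Max( Abs( x ) );
// Round to integers in [-2^52, 2^52]; this is the lossy step.
intKeys = Round( x * scale );
// Integer keys; duplicate values collapse into one key.
aa = Associative Array( As List( intKeys ) );
// Map the keys back to (approximately) the original values.
distinct6 = Matrix( aa &amp;lt;&amp;lt; Get Keys ) / scale;
Show( distinct6 );&lt;/CODE&gt;&lt;/PRE&gt;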
</description>
      <pubDate>Thu, 09 Mar 2017 14:12:33 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/fastest-way-to-get-distinct-items-from-a-matrix-list/m-p/36985#M21707</guid>
      <dc:creator>Craige_Hales</dc:creator>
      <dc:date>2017-03-09T14:12:33Z</dc:date>
    </item>
    <item>
      <title>Re: fastest way to get distinct items from a matrix/list</title>
      <link>https://community.jmp.com/t5/Discussions/fastest-way-to-get-distinct-items-from-a-matrix-list/m-p/36989#M21709</link>
      <description>&lt;P&gt;For most of the sets I've tested, scaling by 2^52 does seem to be the fastest approach (with acceptable loss). Thanks for all the feedback.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Mar 2017 16:50:09 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/fastest-way-to-get-distinct-items-from-a-matrix-list/m-p/36989#M21709</guid>
      <dc:creator>vince_faller</dc:creator>
      <dc:date>2017-03-09T16:50:09Z</dc:date>
    </item>
    <item>
      <title>Re: fastest way to get distinct items from a matrix/list</title>
      <link>https://community.jmp.com/t5/Discussions/fastest-way-to-get-distinct-items-from-a-matrix-list/m-p/37005#M21724</link>
      <description>&lt;P&gt;Not sure how it compares in speed, but it's faster to type...&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;distinct = Design(times, &amp;lt;&amp;lt;levels)[2];&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 09 Mar 2017 21:39:01 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/fastest-way-to-get-distinct-items-from-a-matrix-list/m-p/37005#M21724</guid>
      <dc:creator>ms</dc:creator>
      <dc:date>2017-03-09T21:39:01Z</dc:date>
    </item>
  </channel>
</rss>

