katharina_l
Level III

Get distinct entries from a list?

Is there a simple way to obtain the distinct entries in a list (similar to summarizing a table by a grouping column to get the distinct entries in that column)?
The result could be either a reduced list containing only the distinct entries (duplicates removed) or just the number of distinct entries.
Before writing a script, I would like to check whether there is a simple way / function / formula. I couldn't find anything in the JMP help ...

Accepted Solutions
txnelson
Super User

Re: Get distinct entries from a list?

One way is to use an Associative Array

names default to here(1);

// Create a list
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
genderList = :sex << get values;

// Get the distinct list of values
distinctList = (associative array(genderList))<<get keys;
// → {"F", "M"}
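The same idea also covers the second part of the question, the number of distinct entries: wrap the keys in N Items(). A minimal sketch on a small hard-coded list (`myList` is made up for illustration):

```jsl
Names Default To Here( 1 );

// Made-up example list with repeated entries
myList = {"b", "a", "c", "a", "b"};

// Keys of an associative array are unique, so duplicates collapse
aa = Associative Array( myList );
distinctList = aa << Get Keys;

// Number of distinct entries
nDistinct = N Items( distinctList );
Show( distinctList, nDistinct );
```

Note that the keys come back sorted, not in the order in which they first appear in the list.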

 

Jim

katharina_l
Level III

Re: Get distinct entries from a list?

Awesome, this works! Thank you very much, Jim!

hogi
Level XII

Re: Get distinct entries from a list?

The approach via associative array is easy to script / remember / apply.
Be careful, though, if you want to get the distinct values of a list with millions of entries.

 

A workaround:
save the values to a table and use either Summarize or Tables > Summary, which you mentioned above.
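A minimal sketch of this workaround, assuming the values are already in a JSL list (`myList` and the helper table name "distinct helper" are made up for illustration). Summarize with By() returns the distinct levels of the column as a list of strings:

```jsl
Names Default To Here( 1 );

// Made-up example list; in practice this may hold millions of values
myList = {"b", "a", "c", "a", "b"};

// Write the values into a private (hidden) helper table
dt = New Table( "distinct helper",
	Private,
	New Column( "entry", Character, Nominal, Set Values( myList ) )
);

// By() collects the distinct levels of the column as a list of strings
Summarize( dt, distinctList = By( dt:entry ) );
nDistinct = N Items( distinctList );

// The helper table is no longer needed
Close( dt, No Save );
Show( distinctList, nDistinct );
```

Tables > Summary with the same column as the grouping column gives the distinct values interactively.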

 

Maybe, in the future, there will be a direct / fast way in JMP to calculate unique values?
I added this wish as a subtopic to Col N Categories - and all the others ...
Please support the idea and vote :)

katharina_l
Level III

Re: Get distinct entries from a list?

Actually, my problem is that I want to detect unique entries in a sequence of strings in a table. I attached an example: originally I have the column "sequence" (arbitrary delimiter; here the delimiter is "|"). My current workaround is to transform this into a list and then apply Jim's solution with the associative array. But an easy way to calculate unique values would be highly appreciated ...
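If only the count is needed, the list step can be folded into a single formula. A minimal sketch, assuming a character column named `sequence` with "|" as the delimiter (the demo table and the column name "N unique entries" are made up for illustration): Words() splits the string, the associative array drops the duplicates, and N Items() counts the remaining keys.

```jsl
Names Default To Here( 1 );

// Made-up example table with one "|"-delimited string per row
dt = New Table( "sequence demo",
	New Column( "sequence", Character, Nominal,
		Set Values( {"a|b|a|c", "x|x|x", "p|q"} )
	)
);

// Split at "|", collapse duplicates via associative array keys, count the keys
dt << New Column( "N unique entries", Numeric, Continuous,
	Formula( N Items( Associative Array( Words( :sequence, "|" ) ) << Get Keys ) )
);
dt << Run Formulas;
```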

hogi
Level XII

Re: Get distinct entries from a list?

Oh yeah, this is where associative arrays hurt!
In the end you count the entries. Is that all you need, or do you also need the intermediate step (the list of unique entries)?


A quick and easy way to handle unique values - let's hope that the JMP developers recognize this topic as really useful, so that it gets implemented in the next release ...

katharina_l
Level III

Re: Get distinct entries from a list?

For my current application, I only need the number of distinct entries, not the values themselves. But I think there are many more situations where it would be helpful to reduce a list to its unique entries.


And yeah, let's hope for an implementation in the next release.

hogi
Level XII

Re: Get distinct entries from a list?

Regarding speed, I checked whether the intermediate step gets faster if I replace the associative array part with a simple

If( Not( Contains( ... ) ), Insert Into( myList, ... ) ) - no benefit.


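New Column( "unique entries",
	Expression,
	Formula(
		// myList is declared local so it does not leak into the global namespace
		Local( {nr = N Rows( Current Data Table() ), myList},
			// show a "done" caption when the last row is evaluated
			If( Row() == nr,
				Caption( "done" )
			);
			// :variant selects the method: 1 = associative array, 2 = Contains() + Insert Into()
			Match( :variant,
				1, Associative Array( :list from sequence[Empty()] ) << get keys,
				2,
					myList = {};
					For Each( {entry}, :list from sequence,
						If( !Contains( myList, entry ),
							Insert Into( myList, entry )
						)
					);
					myList;
			);
		)
	)
)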


For 1 million rows, I get:

[screenshot]

So it's 1:0 for associative arrays. Any better ideas?

 

[Another 2+2 seconds are needed for the columns :list from sequence and :N unique entries.
So there is no real benefit in getting the intermediate step down to milliseconds.]
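To compare the two variants outside a formula column, here is a minimal timing sketch on one long list using HP Time(), which returns microseconds. The test data (100,000 strings drawn from 50 made-up values) is hypothetical, and the numbers will depend on the data and the JMP version:

```jsl
Names Default To Here( 1 );

// Hypothetical test data: 100,000 strings drawn from 50 distinct values
vals = {};
For( i = 1, i <= 100000, i++,
	Insert Into( vals, "v" || Char( Random Integer( 1, 50 ) ) )
);

// Variant 1: associative array keys
t0 = HP Time();
keys1 = Associative Array( vals ) << Get Keys;
t1 = HP Time();

// Variant 2: Contains() check before Insert Into()
keys2 = {};
For Each( {v}, vals,
	If( !Contains( keys2, v ),
		Insert Into( keys2, v )
	)
);
t2 = HP Time();

// HP Time() is in microseconds; convert to seconds
Show( N Items( keys1 ), N Items( keys2 ), (t1 - t0) / 1000000, (t2 - t1) / 1000000 );
```

This times a single long list, whereas the formula above evaluates many short lists (one per row), so the two setups are not directly comparable.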

hogi
Level XII

Re: Get distinct entries from a list?

- easy
- fast

and:
- not greedy (memory-wise)


I just tried to check the timing with 10 million rows and got stuck.
JMP crashed, and the last 60 seconds looked like this:

[screenshot]

hogi
Level XII

Re: Get distinct entries from a list?

... so I tried it with 5 million rows and got an interesting insight:

[screenshot]

When I save the table, close JMP, and load the table again, the memory usage is significantly lower than what was needed to calculate the values!

At first sight, I thought it was due to the associative array, but the same thing happens when I use myList.

How can I prevent the column formula from eating my memory?

TS-00177710