Solved: count unique

Report Inappropriate Content · Jun 10, 2023 4:26 PM

Hi, I have a table with transaction data like the below.

transactionID, individualID, product

1, a, apple

2, a, orange

3, b, apple

4, b, banana

5, c, orange

I want to do in tabulate, not using script, to count how many unique individualID by product.

Does anyone know how to do this?

I can do summary table using individualID and product first, and then do a tabulate on it to get the uniqueID count by product, but this is 2 steps. Wondering is there a way we can do it in tabulate just one step cause i constantly need to do this for many different groups?

Thank you!

brady_brady · Feb 24, 2021 11:11 PM

I am unsure of how to use Tabulate to do this, but if you're open to using another platform, this is really easy to do using the JMP Query Builder:

1) Select Tables > JMP Query Builder from the main menu.

2) In the ensuing window, click the "Build Query" button, located in the window's lower right corner.

3) In the ensuing window:

Double-click on the product column, casting it into the "included columns" pane.
Double-click on the individual ID column, casting it into the "included columns" pane.
Right-click on the t1.product column, now in the "included columns" pane, and select "Order By".
From the drop-down menu in the t1.individualID column's Aggregation field in the "included columns" pane, select Count DISTINCT.
At the bottom of the window, click "Run Query", producing a table with what I believe you are after. (?)

Cheers,

Brady

View solution in original post

ih · Feb 25, 2021 08:37 AM

One more option is to calculate that in your original data table using a column with the col sum() function with a 'by' variable, but referencing the number 1 instead of a column, which turns it into a 'count' function. If you need it in the tabulate or table format, you could then tabulate using an average or median:

View solution in original post

brady_brady · Feb 24, 2021 11:11 PM

I am unsure of how to use Tabulate to do this, but if you're open to using another platform, this is really easy to do using the JMP Query Builder:

1) Select Tables > JMP Query Builder from the main menu.

2) In the ensuing window, click the "Build Query" button, located in the window's lower right corner.

3) In the ensuing window:

Double-click on the product column, casting it into the "included columns" pane.
Double-click on the individual ID column, casting it into the "included columns" pane.
Right-click on the t1.product column, now in the "included columns" pane, and select "Order By".
From the drop-down menu in the t1.individualID column's Aggregation field in the "included columns" pane, select Count DISTINCT.
At the bottom of the window, click "Run Query", producing a table with what I believe you are after. (?)

Cheers,

Brady

jthi · Feb 25, 2021 01:59 AM

It seems like Tabulate platform doesn't behave nicely with Character datatype for analysis. You could change individualID to numeric (recode or create formula to new column) and then use tabulate:

-Jarmo

joann · Feb 25, 2021 03:25 PM

Thank you! This is good and quick way as well!

WJ · Jul 17, 2022 02:33 AM

Thank you very much. This is the most efficient way to generate table results for counting unique values and proportions.

joann · Feb 25, 2021 03:21 PM

Thank you so much Brady!

Never used this query builder and it's super powerful!

I just tried taking 2 source tables and did query, and it's so smart that it matched the individual ID on its own and returned the correct unique counts.

2 questions to the query builder:

1. I could see 3 green triangles as attached after I ran the above query. To re-run the same query with updated data, which triangles should I use?

2. If I have many different queries, how do I store them in one place, and then later when I got the latest data, to apply the queries on it again?

brady_brady · Feb 25, 2021 04:29 PM

I could see 3 green triangles as attached after I ran the above query. To re-run the same query with updated data, which triangles should I use?

It depends... The "Source" script produces a new table when run, while the "Update From Database" script refreshes the present table (i.e., without creating a new table.)

Note: A handy feature of JMP data tables is that if you name a script "On Open", it will run whenever the JMP data table containing it is opened. So, you can rename the "Update from Database" script to "On Open" to assure that you're always starting off with updated results. Of course, this script can still be run whenever you wish, by clicking the green arrow.

If I have many different queries, how do I store them in one place, and then later when I got the latest data, to apply the queries on it again?

There are several ways to address this, I'll offer a couple of the easiest.

- You could use the "On Open" trick mentioned above if you have simple queries with no derivative dependencies. (i.e., no single query needs to be run prior to any other, to obtain correct results for its table)

- You could create a "master" script, by copying + pasting each of the scripts into it. The order may or may not matter, depending on your situation, and you may need to insert controls into the script to make sure results from prior queries are finished before proceeding.

I've not had to do much of this in practice, so it might be worthwhile to ask the community this question in a new discussion, to see what are actually the most popular approaches "in the field", and why.

Cheers,

Brady

joann · Mar 1, 2021 06:05 PM

Thank you so much Brady! These are really great knowledge and I'll practice each of them! Really appreciate the help!

ih · Feb 25, 2021 08:37 AM

One more option is to calculate that in your original data table using a column with the col sum() function with a 'by' variable, but referencing the number 1 instead of a column, which turns it into a 'count' function. If you need it in the tabulate or table format, you could then tabulate using an average or median:

joann · Feb 25, 2021 03:27 PM

Thank you! I just tried and it works! never thought I can use col sum to achieve this. Smart!

count unique

Re: count unique

Re: count unique

Re: count unique

Re: count unique

Re: count unique

Re: count unique

Re: count unique

Re: count unique

Re: count unique

Re: count unique

Re: count unique

Recommended Articles

Get Going with JMP: Essentials for Using JMP

New Features in Process Screening for JMP 17