new JSL function: Col N Categories

hogi · 01-24-2023

☑ cool new feature
☑ could help many users!

☐ removes a „bug“

☐ nice to have

☐ nobody needs it

What inspired this wish list request?

In JMP, Graph Builder makes it very easy to generate complicated plots, with Group X/Group Y/Wrap/Page and all the other ways to build hierarchical structures. The default quantity displayed in bar charts, heatmaps etc. is the number of corresponding rows, i.e. how often the respective event occurs. That's very convenient.

But what happens if the user is not interested in occurrences, but in probabilities?

Let's take

Open( "$SAMPLE_DATA/Airline Delays.jmp" );

as an example. Southwest has the largest number of large delays!!!

- but only because it has the largest number of flights in the table.

Clearly, one has to divide the number of occurrences by the total number of flights.

Actually, no problem.

But now, let's assume that we just have a list of pass/fail information for chips on wafers with different process parameters.
Alternative application: https://community.jmp.com/t5/Discussions/count-unique/m-p/592337/highlight/true#M79638

Important: the data was not intentionally generated by a DOE; therefore, the number of wafers is not the same for the different process parameters. In analogy to the first example, the wafers with process parameters "Southwest" have by far the largest number of failed devices.

Does Southwest also have the largest failure rate?

This time, there is no easily accessible number in the database which we can use for the normalization.

What we need to remove this bias from the data: the number of wafers per group.

Fortunately, there is a column with the wafer ID in the data table.

Unfortunately, there is no function in JSL which we can use to (directly) "count" the wafers and to distribute the results among the respective rows of the data table. To cite @Beaux:
Strange that that function is not available in the formula editor (statistical) or JSL... Maybe next...

The shortcut function "Count" from the New Formula Column right-click context menu sounds great, but it doesn't count distinct/unique values; it counts every single non-empty row. In the generated JSL code, Count is Col Number.
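To illustrate the difference between counting rows and counting distinct values, here is a minimal sketch on a hypothetical toy table (the table and its values are made up for illustration). It uses Col Number for the row count and the unique keys of an Associative Array as one possible stand-in for the missing function; this is a workaround sketch, not the requested Col N Categories.

```jsl
// Hypothetical toy table: 5 rows, but only 3 distinct wafer IDs
dt = New Table( "toy wafers",
	New Column( "wafer_ID", Numeric, Nominal,
		Set Values( [101, 101, 102, 102, 103] )
	)
);

// Col Number counts every non-missing row of the column
nRows = Col Number( dt:wafer_ID );

// An associative array keeps each key only once,
// so its size is the number of distinct values
nUnique = N Items( Associative Array( dt:wafer_ID ) );

Show( nRows, nUnique );
```

For this table the two numbers differ (5 rows versus 3 distinct IDs). Note that this only gives a single number for the whole column; it does not distribute a per-group result to the rows.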

What is the improvement you would like to see?

Follow @Beaux's wish and add a function Col N Categories. It should behave similarly to Col Number, but it should count only unique values: entries which show up multiple times should be counted just once.

NB:

1) There is already an N Categories statistic in Tables > Summary. So I hope the effort to provide the same functionality as a JSL function will be low.

2) Like Col Number, Col N Categories should also provide a GroupBy option, which forms groups of rows, runs the analysis for each group and distributes the results to the corresponding rows.
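Until such a function exists, the two points above can be combined into a workaround: let Tables > Summary compute N Categories per group, then write the result back to the original rows with Update. The sketch below assumes a hypothetical toy table whose column names (processVariant, wafer_ID, defects) follow the formulas in this post; the exact name of the column added by Update is an assumption and should be checked in your JMP version.

```jsl
// Hypothetical toy table: variant A has 2 distinct wafers, variant B has 1
dt = New Table( "toy wafers",
	New Column( "processVariant", Character, Nominal,
		Set Values( {"A", "A", "A", "B", "B"} ) ),
	New Column( "wafer_ID", Character, Nominal,
		Set Values( {"w1", "w1", "w2", "w3", "w3"} ) ),
	New Column( "defects", Numeric, Continuous,
		Set Values( [1, 0, 1, 0, 1] ) )
);

// One row per processVariant with the number of distinct wafer IDs
dtSum = dt << Summary(
	Group( :processVariant ),
	N Categories( :wafer_ID ),
	Freq( "None" ),
	Weight( "None" ),
	Link to original data table( 0 )
);

// Distribute the per-group result to the corresponding rows of dt
dt << Update(
	With( dtSum ),
	Match Columns( :processVariant = :processVariant )
);

Close( dtSum, No Save );
```

After the update, dt should contain the summary columns (N Rows and a column named along the lines of N Categories(wafer_ID)), which can then be used in formulas. Unlike a formula based on a real Col N Categories, this result is static and has to be regenerated when the data changes.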

Why is this idea important?

With N categories available, one can calculate probabilities from occurrences, like:

Percentage failChipsPerWafer = Col Sum( :defects, :processVariant ) / Col N Categories( :wafer_ID, :processVariant ) * 100
Percentage failWafers = Col N Categories( If( Col Sum( :defects, :wafer_ID ) > 0, :wafer_ID, . ), :processVariant ) / Col N Categories( :wafer_ID, :processVariant ) * 100


Sarah-Sylvestre · 02-08-2023

Hi @hogi, thank you for taking the time to submit this idea and provide examples! We will review your request and keep you updated on its status.

hogi · 10-28-2023

@XanGregg
Please also add N Categories to the Summary Statistics of Graph Builder.

hogi · 11-04-2023

Similar discussions in the community:

Add Counter for Unique Cases in Groups

count unique

Custom Function - how to reference the column

Column reference in custom function

count(distinct val)
Formula for Number of Unique Categories in Column

How Do I create a Column formula for N Categories?

It's OK that users have to create their own workarounds for everyday tasks which are not yet available as standard functions.
The danger: there are very slow workarounds out there, and it's quite likely that new users will just copy them... and later complain about the poor performance of "JMP".

Much better:

If many users need a feature, just implement it directly in JMP/JSL.
Then the experts can find the fastest approach and provide it to all users.

hogi

Please add an option to specify whether Null/Missing/Empty values are included in the count or not.

Add function which can be universally used to check if object/variable is empty/missing