☑ cool new feature
☑ could help many users!
☐ removes a „bug“
☐ nice to have
☐ nobody needs it
What inspired this wish list request?
In JMP, Graph Builder makes it very easy to generate complicated plots, using Group X/Group Y/Wrap/Page and all the other ways of building hierarchical structures. The default quantity displayed in bar charts, heatmaps, etc. is the number of corresponding rows, i.e. the number of occurrences of the respective event. That's very convenient.
But what happens if the user is not interested in occurrences, but in probabilities?
Let's take
Open( "$SAMPLE_DATA/Airline Delays.jmp" );
as an example. Southwest has the largest number of long delays,
but only because it has the largest number of flights in the table.
Clearly, one has to divide the number of occurrences by the total number of flights.
Here, that is no problem.
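To make the normalization concrete, here is a minimal Python sketch (not JSL). The carrier names and counts are hypothetical toy values, not taken from Airline Delays.jmp; they only show how the largest count and the largest rate can belong to different groups.

```python
from collections import Counter

# Hypothetical toy data: one entry per flight, (carrier, was_delayed).
flights = (
    [("big_carrier", True)] * 30 + [("big_carrier", False)] * 170
    + [("small_carrier", True)] * 20 + [("small_carrier", False)] * 30
)

# Occurrences: number of delayed flights per carrier.
delayed = Counter(carrier for carrier, was_delayed in flights if was_delayed)
# Normalization base: total number of flights per carrier.
total = Counter(carrier for carrier, _ in flights)
# Rate: occurrences divided by the total for the same carrier.
rate = {carrier: delayed[carrier] / total[carrier] for carrier in total}

print(delayed)  # big_carrier has more delayed flights ...
print(rate)     # ... but small_carrier has the higher delay rate
```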
But now, let's assume that we just have a list of pass/fail results for chips on wafers with different process parameters.
alternative application: https://community.jmp.com/t5/Discussions/count-unique/m-p/592337/highlight/true#M79638
Important: The data was not intentionally generated by a DOE; therefore, the number of wafers is not the same for the different process parameters. In analogy to the first example, the wafers with the process parameters "Southwest" have by far the largest number of failed devices.
Does Southwest also have the largest failure rate?
This time, there is no easily accessible number in the database which we can use for the normalization.
What we need to remove this bias from the data: the number of wafers per group
Fortunately, there is a column with the wafer ID in the data table.
Unfortunately, there is no function in JSL which we can use to (directly) "count" the wafers and to distribute the results among the respective rows of the data table. To cite @Beaux:
Strange that that function is not available in the formula editor (statistical) or JSL... Maybe next...
The shortcut function "Count" from the New Formula Column right-click context menu sounds great, but it doesn't count different/unique values; it counts every single non-empty row. In the generated JSL code, Count is Col Number.
What is the improvement you would like to see?
Follow @Beaux's wish and add a function Col N Categories. It should behave similarly to Col Number, but it should count only unique values: entries which show up multiple times should be counted just once.
NB:
1) There is already an N Categories statistic in Tables/Summary. So I hope that the effort to provide the same functionality as a JSL function will be low.
2) Like Col Number, Col N Categories should also provide a GroupBy option, which generates groups of rows, executes the analysis for each group and distributes the results to the corresponding rows.
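The requested behavior can be sketched in a few lines of Python (not JSL). The function name `col_n_categories`, the use of `None` for missing values and the toy columns are assumptions made for illustration; this is one possible reading of the wish, not an existing JMP API.

```python
from collections import defaultdict

def col_n_categories(values, by=None):
    """Return, for every row, the number of distinct non-missing values
    found in that row's group (the whole column if `by` is omitted)."""
    if by is None:
        by = [None] * len(values)  # a single group spanning all rows
    seen = defaultdict(set)
    for value, group in zip(values, by):
        if value is not None:      # missing entries are not counted
            seen[group].add(value)
    # Distribute each group's result back to the rows of that group.
    return [len(seen[group]) for group in by]

# Hypothetical toy columns: one row per chip.
wafer_id        = ["w1", "w1", "w2", "w3", "w3", "w3"]
process_variant = ["A",  "A",  "A",  "B",  "B",  "B"]

print(col_n_categories(wafer_id, process_variant))  # [2, 2, 2, 1, 1, 1]
print(col_n_categories(wafer_id))                   # [3, 3, 3, 3, 3, 3]
```

A row count over the same columns would give 3 for both variants, which is exactly the bias the wish tries to avoid.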
Why is this idea important?
With Col N Categories available, one can calculate probabilities from occurrences, like:
Percentage failChipsPerWafer = Col Sum(:defects, :processVariant) / Col N Categories(:wafer_ID, :processVariant) * 100
Percentage failWafers = Col N Categories(If(Col Sum(:defects, :wafer_ID) > 0, :wafer_ID, .), :processVariant) / Col N Categories(:wafer_ID, :processVariant) * 100
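As a rough check of the second formula, the following Python sketch (not JSL) evaluates it on hypothetical toy data. The helpers `col_sum` and `col_n_categories` are illustrative stand-ins for Col Sum and the proposed function, and `None` plays the role of the missing value `.`.

```python
from collections import defaultdict

def col_sum(values, by):
    """Per-row sum of `values` within each row's `by` group."""
    totals = defaultdict(float)
    for value, group in zip(values, by):
        if value is not None:
            totals[group] += value
    return [totals[group] for group in by]

def col_n_categories(values, by):
    """Per-row count of distinct non-missing `values` within each `by` group."""
    seen = defaultdict(set)
    for value, group in zip(values, by):
        if value is not None:
            seen[group].add(value)
    return [len(seen[group]) for group in by]

# Hypothetical toy table: one row per chip, defects is 1 (fail) or 0 (pass).
wafer_id        = ["w1", "w1", "w2", "w2", "w3", "w3", "w4", "w4"]
process_variant = ["A",  "A",  "A",  "A",  "B",  "B",  "B",  "B"]
defects         = [1,    0,    0,    0,    1,    0,    1,    1]

# Keep the wafer ID only for wafers with at least one failing chip.
defects_per_wafer = col_sum(defects, wafer_id)
failed_wafer = [w if d > 0 else None for w, d in zip(wafer_id, defects_per_wafer)]

n_failed = col_n_categories(failed_wafer, process_variant)
n_wafers = col_n_categories(wafer_id, process_variant)
pct_fail_wafers = [100 * f / n for f, n in zip(n_failed, n_wafers)]

print(pct_fail_wafers)  # 50.0 for variant A rows, 100.0 for variant B rows
```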