Subscribe Bookmark RSS Feed

How to make a cumulative summary on a numerical column

a01

Community Trekker

Joined:

Nov 14, 2014

I would like a convenient way in JMP to make a cumulative summary on a certain column, where for each value of the target column, you are told how many rows (elements) met some boundary of that value in that column (less-than-or-equal-to, greater-than-or-equal-to, less-than, or greater-than).

For example, starting with this table:

Petri Dish IDN Bacteria
0100
1103
298
398
4102
597

I may want to generate the number of petri dishes that had less than or equal to each amount of bacteria, potentially like this:

At most N BacteriaN Petri Dishes
960
971
983
993
1004
1014
1025
1036
1046

Similarly, I might want columns for At least N Bacteria, Less than N Bacteria, and More than N Bacteria.

Is there a convenient way to do this using any of the menu tools and/or perhaps a formula column, or would it require a JSL script?  Would it require pre-sorting the original table on the target column (N Bacteria)?  Are there any add-ons that would do this?

Note: In MS Excel this could be done using the "COUNTIF" function adjacent to a manually-created "At most N Bacteria" (or whatever other boundary) column.

Thanks.

3 REPLIES
brady_brady

Staff

Joined:

Jun 9, 2012

You can do it with the following formula; simply change the inequality to reflect your needs.

Eval(Eval Expr(Col Sum(:N Bacteria <= Expr(:Threshold[Row()]))))

In this example, the summary columns are created in the same table as the raw data. (See attached file.)

Note that Threshold and Count in this formula correspond to the columns in the second table of your original post.

The Eval(Eval Expr()) construct is needed to force evaluation of the Threshold value at each given row; for example, if you try the formula below, it will not give you the results you intend.

Col Sum(:N Bacteria <= :Threshold[Row()])

Cheers,

Brady

a01

Community Trekker

Joined:

Nov 14, 2014

Thanks for this formula Brady.  I'm still confused about the particulars of why you needed to use a subscript ([]) for Threshold but not for N Bacteria, and why you need to use "Expr" and "Eval Expr" in these ways.  How does the unwanted value caching work, and how does this formula work around it to make it produce the correct result?  Can you please elaborate on these details?

brady_brady

Staff

Joined:

Jun 9, 2012

Eval(Eval Expr(Col Sum(:N Bacteria <= Expr(:Threshold[Row()]))))

For each given value in the Threshold column, we want to create a column vector of 0/1 flags that symbolize which rows in the :N bacteria column (the entire column) satisfy the inequality.

Suppose row 3 of the Threshold column is 98. Then the vector is produced by:

:N Bacteria <= 98, which compares each row of the :N Bacteria column to 98 and creates a 0 or 1 accordingly.

The Col Sum () function will sum the 0s and 1s in the vector produced by :N Bacteria <= 98. The sum represents the number of rows where :N Bacteria <= 98.

The :Threshold[row()] piece is :Threshold[1] for row 1, Threshold[2] for row 2, etc. The issue is that we need to evaluate the value of row(), substituting in the actual constant value contained in the data table, before evaluating the rest of the formula.

So, we use the construct Expr( :Threshold [Row()] ), which, when used in conjunction with theEval ( EvalExpr () )construct, causes :Threshold[row()] to be replaced with whatever value is contained in the :Threshold column's "rowth" row.

Cheers,

Brady