Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- JMP User Community
- :
- Discussions
- :
- How to make a cumulative summary on a numerical column

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

How to make a cumulative summary on a numerical column

Jan 11, 2015 10:12 PM
(2867 views)

I would like a convenient way in JMP to make a cumulative summary on a certain column, where for each value of the target column, you are told how many rows (elements) met some boundary of that value in that column (**less-than-or-equal-to**, **greater-than-or-equal-to**, **less-than**, or **greater-than**).

For example, starting with this table:

Petri Dish ID | N Bacteria |
---|---|

0 | 100 |

1 | 103 |

2 | 98 |

3 | 98 |

4 | 102 |

5 | 97 |

I may want to generate the **number of petri dishes that had less than or equal to** each amount of bacteria, potentially like this:

At most N Bacteria | N Petri Dishes |
---|---|

96 | 0 |

97 | 1 |

98 | 3 |

99 | 3 |

100 | 4 |

101 | 4 |

102 | 5 |

103 | 6 |

104 | 6 |

Similarly, I might want columns for **At least N Bacteria**, **Less than N Bacteria**, and **More than N Bacteria**.

Is there a convenient way to do this using any of the menu tools and/or perhaps a formula column, or would it require a JSL script? Would it require pre-sorting the original table on the target column (N Bacteria)? Are there any add-ons that would do this?

Note: In MS Excel this could be done using the "**COUNTIF**" function adjacent to a manually-created "At most N Bacteria" (or whatever other boundary) column.

Thanks.

3 REPLIES 3

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: How to make a cumulative summary on a numerical column

You can do it with the following formula; simply change the inequality to reflect your needs.

Eval(Eval Expr(Col Sum(:N Bacteria <= Expr(:Threshold[Row()]))))

In this example, the summary columns are created in the same table as the raw data. (See attached file.)

Note that Threshold and Count in this formula correspond to the columns in the second table of your original post.

The Eval(Eval Expr()) construct is needed to force evaluation of the Threshold value at each given row; for example, if you try the formula below, it will not give you the results you intend.

Col Sum(:N Bacteria <= :Threshold[Row()])

Cheers,

Brady

Highlighted
##

Thanks for this formula Brady. I'm still confused about the particulars of why you needed to use a subscript ([]) for Threshold but not for N Bacteria, and why you need to use "Expr" and "Eval Expr" in these ways. How does the unwanted value caching work, and how does this formula work around it to make it produce the correct result? Can you please elaborate on these details?

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: How to make a cumulative summary on a numerical column

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: How to make a cumulative summary on a numerical column

Eval(Eval Expr(Col Sum(:N Bacteria <= Expr(:Threshold[Row()]))))

For each given value in the Threshold column, we want to create a column vector of 0/1 flags that symbolize which rows in the :N bacteria column (the entire column) satisfy the inequality.

Suppose row 3 of the Threshold column is 98. Then the vector is produced by:

:N Bacteria <= 98, which compares each row of the :N Bacteria column to 98 and creates a 0 or 1 accordingly.

The Col Sum () function will sum the 0s and 1s in the vector produced by :N Bacteria <= 98. The sum represents the number of rows where :N Bacteria <= 98.

The :Threshold[row()] piece is :Threshold[1] for row 1, Threshold[2] for row 2, etc. The issue is that we need to evaluate the value of row(), substituting in the actual constant value contained in the data table, before evaluating the rest of the formula.

So, we use the construct Expr( :Threshold [Row()] ), which, when used in conjunction with theEval ( EvalExpr () )construct, causes :Threshold[row()] to be replaced with whatever value is contained in the :Threshold column's "rowth" row.

Cheers,

Brady