turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- JMP User Community
- :
- Discussions
- :
- Discussions
- :
- SUMIF: how to get the conditional sum across columns?

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Sep 22, 2015 7:13 AM
(4673 views)

I want to calculate the sum of 10 columns into a new column, whereby I only want to sum the positive values. Similarly, I want another column summing all negative values. In Excel I would use the function SUMIF: =SUMIF(A1:J1;">0")

How can I do something similar in JMP?

Using a simple example in excel (while obviously my dataset is a bigger), this is what I am aiming for:

A | B | C | D | positive sum | negative sum |

1,00 | 2,00 | -7,00 | 5,00 | 8,00 | -7,00 |

4,00 | 64,00 | 55,90 | 5,20 | 129,10 | 0,00 |

86,10 | -34,44 | 5,60 | 7,21 | 98,91 | -34,44 |

6,00 | 8,00 | -96,00 | 6,00 | 20,00 | -96,00 |

1,00 | -8,00 | 5,00 | 0,64 | 6,64 | -8,00 |

Many thanks for your help!

Yanne

- Tags:
- 条件求和

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

I like such simple and clever column formulas like the one in Brady's post above. But yes, they often scale terribly with table size.

An alternative would be for the column formula to get the current row directly without first getting the whole matrix in every row. The formula below is an attempt at that.

It is not as fast as Brady's JSL code or Ian's explicit sum over columns, but it is much faster, and scales better (more or less linearly with table size), than a formula relying on *dt<<getasmatrix()*. For a 10000x4 table it takes about 0.2 seconds to run on my laptop.

The example also shows an alternative method to generalize the formula using a using a local variable instead of building the formulas dynamically, which can be a challenge. (in this example it is assumed that the formula columns always remain the last columns).

//Example table

dt = New Table**(**"Conditional sum across columns", AddRows**(****10000****))**;

For**(**i = **1**, i <= **4**, i++,

dt << **New Column****(**"X", Numeric, Formula**(**Random Normal**(****0**, **1****)))**

**)**;

dt << **delete column****(****1****)**;

// Add conditional sums across all columns

dt << **New Column****(**"Positive Sum",

Numeric,

Formula**(**

Local**({**i = **0**, n = N Col**(**Current Data Table**())** - **1****}**,

Sum**(**Repeat**({**i++ ; Column**(**i**)[]** * **(**Column**(**i**)[]** > **0****)}**, n**))**

**)**

**)**

**)**;

dt << **New Column****(**"Negative Sum",

Numeric,

Formula**(**

Local**({**i = **0**, n = N Col**(**Current Data Table**())** - **2****}**,

Sum**(**Repeat**({**i++ ; Column**(**i**)[]** * **(**Column**(**i**)[]** < **0****)}**, n**))**

**)**

**)**

**)**;

7 REPLIES

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

You could use an approach like that in this table:

New Table**(** "Sum of Columns",

Add Rows**(** **0** **)**,

New Column**(** "Column 1",

Numeric,

"Continuous",

Format**(** "Fixed Dec", **10**, **3** **)**,

Formula**(**RandomNormal**(****0**, **1****))**

**)**,

New Column**(** "Column 2",

Numeric,

"Continuous",

Format**(** "Fixed Dec", **10**, **3** **)**,

Formula**(**RandomNormal**(****0**, **1****))**

**)**,

New Column**(** "Column 3",

Numeric,

"Continuous",

Format**(** "Fixed Dec", **10**, **3** **)**,

Formula**(**RandomNormal**(****0**, **1****))**

**)**,

New Column**(** "Sum of Positives",

Numeric,

"Continuous",

Format**(** "Fixed Dec", **10**, **3** **)**,

Formula**(**

:Column 1 * If**(** :Column 1 >= **0**, **1**, **0** **)** + :Column 2 *

If**(** :Column 2 >= **0**, **1**, **0** **)** + :Column 3 * If**(** :Column 3 >= **0**, **1**, **0** **)**

**)**

**)**,

New Column**(** "Sum of Negatives",

Numeric,

"Continuous",

Format**(** "Fixed Dec", **10**, **3** **)**,

Formula**(**

:Column 1 * If**(** :Column 1 < **0**, **1**, **0** **)** + :Column 2 *

If**(** :Column 2 < **0**, **1**, **0** **)** + :Column 3 * If**(** :Column 3 < **0**, **1**, **0** **)**

**)**

**)**,

AddRows**(****10****)**

**);**

With many columns, it may be easier to build the requisite 'sum' columns programmatically.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

As Ian mentions, this can get unwieldy for many columns. If you don't truly need a formula (i.e, the source columns will not change) you can do it quickly for all rows by running the following JSL with the desired data table current (the code assumes the numerics are in columns 1-4, per the indices in line 2 of the code... change as needed but keep them contiguous):

dt = Current Data Table**()**;

m = **(**dt << **Get as Matrix****)[****0**, **1**::**4****]**;

dt << **new column****(**"Positive Sum", << **set values****(**vsum**(**emult**(**m>**0**,m**)**`**)**`**))**;

dt << **new column****(**"Negative Sum", << **set values****(**vsum**(**emult**(**m<**0**,m**)**`**)**`**))**;

If you need a formula, you could use the following (best to copy/paste it into the editor), which is simple, but will run more slowly as your table grows very large because you're getting the whole matrix for each row. (Change the inequality for the negative sum. The :* operator is the elementwise multiplication used in the emult() function above.)

dt = Current Data Table**()**;

m = **(**dt << **getasmatrix****)[**Row**()**, Index**(** **1**, **4** **)]**;

Sum**(** **(**m > **0****)** :* m **)**;

The tradeoff here is that despite its performance limitations, as functions go, it is pretty short, simple, and easy to change if you add more columns. I'd only use it on relatively small tables, though. Otherwise, it would be best to use something like Ian suggests.

Comparing the two approaches I describe for a 10,000 row by 4-column table of random numbers:

Script approach: 0.002713 seconds

Formula approach: 6.050183 seconds

The script approach only takes 3 seconds on a 10,000,000 row by 4 column table. I stopped running the formula approach after an hour on only a 1,000,000 row table. It really comes down to whether you need the formula piece, or are willing to save a script into the table and just run it as needed. If so, you'll obtain a considerable performance benefit.

Cheers,

Brady

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Dear Ian and Brady,

Thanks both for your kind and quick help!!! I tried both approaches, and somehow got two different results :-(.

By testing some examples manually, I found that the approach from Ian gave the correct result, while the script from Brady returned far bigger sums. No idea why and unfortunately I am still a JMP beginner so cannot figure out what went wrong. But as the approach from Ian seemed to work out fine, I decided to do stick with that one (even though the formula gets pretty long for 10 columns).

So thanks both for your help!!!

(and let's hope if I ever need this for 100 columns, JMP will have a SUMIF feature by then ;-) )

Cheers,

Yanne

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Brady's results will match Ian's, but you have to change this line: m = **(**dt << **Get as Matrix****)[****0**, **1**::**4****]**;

to this: m = **(**dt << **Get as Matrix****)[****0**, **1**::3**]**;

The table Ian creates only has three columns in it, while Brady's example is for 4 columns.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

As Brady said already, if you can forgo the 'formula' aspect (which allows you to append new rows and get results automatically), the matrix approach scales much better.

But if you did want to dynamically create the formula columns, it would look something like this:

NamesDefaultToHere**(****1****)**;

// Table with six source columns

dt = New Table**(** "Sum of Columns",

New Column**(** "Column 1", Numeric, "Continuous", Format**(** "Fixed Dec", **10**, **3** **)**, Formula**(** Random Normal**(** **0**, **1** **)** **)** **)**,

New Column**(** "Column 2", Numeric, "Continuous", Format**(** "Fixed Dec", **10**, **3** **)**, Formula**(** Random Normal**(** **0**, **1** **)** **)** **)**,

New Column**(** "Column 3", Numeric, "Continuous", Format**(** "Fixed Dec", **10**, **3** **)**, Formula**(** Random Normal**(** **0**, **1** **)** **)** **)**,

New Column**(** "Column 4", Numeric, "Continuous", Format**(** "Fixed Dec", **10**, **3** **)**, Formula**(** Random Normal**(** **0**, **1** **)** **)** **)**,

New Column**(** "Column 5", Numeric, "Continuous", Format**(** "Fixed Dec", **10**, **3** **)**, Formula**(** Random Normal**(** **0**, **1** **)** **)** **)**,

New Column**(** "Column 6", Numeric, "Continuous", Format**(** "Fixed Dec", **10**, **3** **)**, Formula**(** Random Normal**(** **0**, **1** **)** **)** **)**,

AddRows**(** **10** **)**

**)**;

// List of columns (can be any number of columns!)

cList = dt << **getColumnNames**;

// Build the 'negative sum' formula expression

colContribution = Expr**(**colTBD * If**(** colTBD < **0**, **1**, **0** **))**;

formArg = Expr**(**Add**())**;

For**(**c=**1**, c<=NItems**(**cList**)**, c++,

thisColContribution = NameExpr**(**Substitute**(**NameExpr**(**colContribution**)**, Expr**(**colTBD**)**, EvalExpr**(**AsColumn**(**cList**[**c**]))))**;

InsertInto**(**formArg, thisColContribution**)**;

**)**;

nsExpr = Expr**(**Formula**(**fTBD**))**;

SubstituteInto**(**nsExpr, Expr**(**fTBD**)**, NameExpr**(**formArg**))**;

// Print(nsExpr);

// Build the 'positive sum' formula expression

colContribution = Expr**(**colTBD * If**(** colTBD >= **0**, **1**, **0** **))**;

formArg = Expr**(**Add**())**;

For**(**c=**1**, c<=NItems**(**cList**)**, c++,

thisColContribution = NameExpr**(**Substitute**(**NameExpr**(**colContribution**)**, Expr**(**colTBD**)**, EvalExpr**(**AsColumn**(**cList**[**c**]))))**;

InsertInto**(**formArg, thisColContribution**)**;

**)**;

psExpr = Expr**(**Formula**(**fTBD**))**;

SubstituteInto**(**psExpr, Expr**(**fTBD**)**, NameExpr**(**formArg**))**;

// Print(psExpr);

// Add the two formula columns to dt

nsCol = dt << **NewColumn****(**"Sum of Negatives", Numeric, Continuous, Format**(** "Fixed Dec", **10**, **3** **))**;

makeFormula = Expr**(**Send**(**nsCol**))**;

InsertInto**(**makeFormula, NameExpr**(**nsExpr**))**;

makeFormula;

psCol = dt << **NewColumn****(**"Sum of Positives", Numeric, Continuous, Format**(** "Fixed Dec", **10**, **3** **))**;

makeFormula = Expr**(**Send**(**psCol**))**;

InsertInto**(**makeFormula, NameExpr**(p**sExpr**))**;

makeFormula;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

I like such simple and clever column formulas like the one in Brady's post above. But yes, they often scale terribly with table size.

An alternative would be for the column formula to get the current row directly without first getting the whole matrix in every row. The formula below is an attempt at that.

It is not as fast as Brady's JSL code or Ian's explicit sum over columns, but it is much faster, and scales better (more or less linearly with table size), than a formula relying on *dt<<getasmatrix()*. For a 10000x4 table it takes about 0.2 seconds to run on my laptop.

The example also shows an alternative method to generalize the formula using a using a local variable instead of building the formulas dynamically, which can be a challenge. (in this example it is assumed that the formula columns always remain the last columns).

//Example table

dt = New Table**(**"Conditional sum across columns", AddRows**(****10000****))**;

For**(**i = **1**, i <= **4**, i++,

dt << **New Column****(**"X", Numeric, Formula**(**Random Normal**(****0**, **1****)))**

**)**;

dt << **delete column****(****1****)**;

// Add conditional sums across all columns

dt << **New Column****(**"Positive Sum",

Numeric,

Formula**(**

Local**({**i = **0**, n = N Col**(**Current Data Table**())** - **1****}**,

Sum**(**Repeat**({**i++ ; Column**(**i**)[]** * **(**Column**(**i**)[]** > **0****)}**, n**))**

**)**

**)**

**)**;

dt << **New Column****(**"Negative Sum",

Numeric,

Formula**(**

Local**({**i = **0**, n = N Col**(**Current Data Table**())** - **2****}**,

Sum**(**Repeat**({**i++ ; Column**(**i**)[]** * **(**Column**(**i**)[]** < **0****)}**, n**))**

**)**

**)**

**)**;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Goodness, that's gorgeous!