Subscribe Bookmark RSS Feed

SUMIF: how to get the conditional sum across columns?

yanne

Community Trekker

Joined:

Sep 22, 2015

I want to calculate the sum of 10 columns into a new column, whereby I only want to sum the positive values. Similarly, I want another column summing all negative values. In Excel I would use the function SUMIF:     =SUMIF(A1:J1;">0")

How can I do something similar in JMP?

Using a simple example in excel (while obviously my dataset is a bigger), this is what I am aiming for:

  

ABCDpositive sumnegative sum
1,002,00-7,005,008,00-7,00
4,0064,0055,905,20129,100,00
86,10-34,445,607,2198,91-34,44
6,008,00-96,006,0020,00-96,00
1,00-8,005,000,646,64-8,00

Many thanks for your help!

Yanne

7 REPLIES
ian_jmp

Staff

Joined:

Jun 23, 2011

You could use an approach like that in this table:

New Table( "Sum of Columns",

  Add Rows( 0 ),

  New Column( "Column 1",

  Numeric,

  "Continuous",

  Format( "Fixed Dec", 10, 3 ),

  Formula(RandomNormal(0, 1))

  ),

  New Column( "Column 2",

  Numeric,

  "Continuous",

  Format( "Fixed Dec", 10, 3 ),

  Formula(RandomNormal(0, 1))

  ),

  New Column( "Column 3",

  Numeric,

  "Continuous",

  Format( "Fixed Dec", 10, 3 ),

  Formula(RandomNormal(0, 1))

  ),

  New Column( "Sum of Positives",

  Numeric,

  "Continuous",

  Format( "Fixed Dec", 10, 3 ),

  Formula(

  :Column 1 * If( :Column 1 >= 0, 1, 0 ) + :Column 2 *

  If( :Column 2 >= 0, 1, 0 ) + :Column 3 * If( :Column 3 >= 0, 1, 0 )

  )

  ),

  New Column( "Sum of Negatives",

  Numeric,

  "Continuous",

  Format( "Fixed Dec", 10, 3 ),

  Formula(

  :Column 1 * If( :Column 1 < 0, 1, 0 ) + :Column 2 *

  If( :Column 2 < 0, 1, 0 ) + :Column 3 * If( :Column 3 < 0, 1, 0 )

  )

  ),

  AddRows(10)

);

With many columns, it may be easier to build the requisite 'sum' columns programmatically.

brady_brady

Staff

Joined:

Jun 9, 2012

As Ian mentions, this can get unwieldy for many columns. If you don't truly need a formula (i.e, the source columns will not change) you can do it quickly for all rows by running the following JSL with the desired data table current (the code assumes the numerics are in columns 1-4, per the indices in line 2 of the code... change as needed but keep them contiguous):

dt = Current Data Table();

m = (dt << Get as Matrix)[0, 1::4];

dt << new column("Positive Sum", << set values(vsum(emult(m>0,m)`)`));

dt << new column("Negative Sum", << set values(vsum(emult(m<0,m)`)`));

If you need a formula, you could use the following (best to copy/paste it into the editor), which is simple, but will run more slowly as your table grows very large because you're getting the whole matrix for each row. (Change the inequality for the negative sum. The :* operator is the elementwise multiplication used in the emult() function above.)


dt = Current Data Table();

m = (dt << getasmatrix)[Row(), Index( 1, 4 )];

Sum( (m > 0) :* m );


The tradeoff here is that despite its performance limitations, as functions go, it is pretty short, simple, and easy to change if you add more columns. I'd only use it on relatively small tables, though. Otherwise, it would be best to use something like Ian suggests.

Comparing the two approaches I describe for a 10,000 row by 4-column table of random numbers:


Script approach: 0.002713 seconds

Formula approach: 6.050183 seconds


The script approach only takes 3 seconds on a 10,000,000 row by 4 column table. I stopped running the formula approach after an hour on only a 1,000,000 row table. It really comes down to whether you need the formula piece, or are willing to save a script into the table and just run it as needed. If so, you'll obtain a considerable performance benefit.



Cheers,

Brady

yanne

Community Trekker

Joined:

Sep 22, 2015

Dear Ian and Brady,

Thanks both for your kind and quick help!!! I tried both approaches, and somehow got two different results :-(.

By testing some examples manually, I found that the approach from Ian gave the correct result, while the script from Brady returned far bigger sums. No idea why and unfortunately I am still a JMP beginner so cannot figure out what went wrong. But as the approach from Ian seemed to work out fine, I decided to do stick with that one (even though the formula gets pretty long for 10 columns).

So thanks both for your help!!!

(and let's hope if I ever need this for 100 columns, JMP will have a SUMIF feature by then ;-) )

Cheers,

Yanne

pmroz

Super User

Joined:

Jun 23, 2011

Brady's results will match Ian's, but you have to change this line: m = (dt << Get as Matrix)[0, 1::4];

to this: m = (dt << Get as Matrix)[0, 1::3];

The table Ian creates only has three columns in it, while Brady's example is for 4 columns.

ian_jmp

Staff

Joined:

Jun 23, 2011

As Brady said already, if you can forgo the 'formula' aspect (which allows you to append new rows and get results automatically), the matrix approach scales much better.

But if you did want to dynamically create the formula columns, it would look something like this:

NamesDefaultToHere(1);

// Table with six source columns

dt = New Table( "Sum of Columns",

  New Column( "Column 1", Numeric, "Continuous", Format( "Fixed Dec", 10, 3 ), Formula( Random Normal( 0, 1 ) ) ),

  New Column( "Column 2", Numeric, "Continuous", Format( "Fixed Dec", 10, 3 ), Formula( Random Normal( 0, 1 ) ) ),

  New Column( "Column 3", Numeric, "Continuous", Format( "Fixed Dec", 10, 3 ), Formula( Random Normal( 0, 1 ) ) ),

  New Column( "Column 4", Numeric, "Continuous", Format( "Fixed Dec", 10, 3 ), Formula( Random Normal( 0, 1 ) ) ),

  New Column( "Column 5", Numeric, "Continuous", Format( "Fixed Dec", 10, 3 ), Formula( Random Normal( 0, 1 ) ) ),

  New Column( "Column 6", Numeric, "Continuous", Format( "Fixed Dec", 10, 3 ), Formula( Random Normal( 0, 1 ) ) ),

  AddRows( 10 )

);

// List of columns (can be any number of columns!)

cList = dt << getColumnNames;

// Build the 'negative sum' formula expression

colContribution = Expr(colTBD * If( colTBD < 0, 1, 0 ));

formArg = Expr(Add());

For(c=1, c<=NItems(cList), c++,

  thisColContribution = NameExpr(Substitute(NameExpr(colContribution), Expr(colTBD), EvalExpr(AsColumn(cList[c]))));

  InsertInto(formArg, thisColContribution);

);

nsExpr = Expr(Formula(fTBD));

SubstituteInto(nsExpr, Expr(fTBD), NameExpr(formArg));

// Print(nsExpr);

// Build the 'positive sum' formula expression

colContribution = Expr(colTBD * If( colTBD >= 0, 1, 0 ));

formArg = Expr(Add());

For(c=1, c<=NItems(cList), c++,

  thisColContribution = NameExpr(Substitute(NameExpr(colContribution), Expr(colTBD), EvalExpr(AsColumn(cList[c]))));

  InsertInto(formArg, thisColContribution);

);

psExpr = Expr(Formula(fTBD));

SubstituteInto(psExpr, Expr(fTBD), NameExpr(formArg));

// Print(psExpr);

// Add the two formula columns to dt

nsCol = dt << NewColumn("Sum of Negatives", Numeric, Continuous, Format( "Fixed Dec", 10, 3 ));

makeFormula = Expr(Send(nsCol));

InsertInto(makeFormula, NameExpr(nsExpr));

makeFormula;

psCol = dt << NewColumn("Sum of Positives", Numeric, Continuous, Format( "Fixed Dec", 10, 3 ));

makeFormula = Expr(Send(psCol));

InsertInto(makeFormula, NameExpr(psExpr));

makeFormula;

ms

Super User

Joined:

Jun 23, 2011

I like such simple and clever column formulas like the one in Brady's post above. But yes, they often scale terribly with table size.

An alternative would be for the column formula to get the current row directly without first getting the whole matrix in every row. The formula below is an attempt at that.


It is not as fast as Brady's JSL code or Ian's explicit sum over columns, but it is much faster, and scales better (more or less linearly with table size), than a formula relying on dt<<getasmatrix(). For a 10000x4 table it takes about 0.2 seconds to run on my laptop.


The example also shows an alternative method to generalize the formula using a using a local variable instead of building the formulas dynamically, which can be a challenge. (in this example it is assumed that the formula columns always remain the last columns). 

//Example table

dt = New Table("Conditional sum across columns", AddRows(10000));

For(i = 1, i <= 4, i++,

    dt << New Column("X", Numeric, Formula(Random Normal(0, 1)))

);

dt << delete column(1);

// Add conditional sums across all columns

dt << New Column("Positive Sum",

    Numeric,

    Formula(

        Local({i = 0, n = N Col(Current Data Table()) - 1},

            Sum(Repeat({i++ ; Column(i)[] * (Column(i)[] > 0)}, n))

        )

    )

);

dt << New Column("Negative Sum",

    Numeric,

    Formula(

        Local({i = 0, n = N Col(Current Data Table()) - 2},

            Sum(Repeat({i++ ; Column(i)[] * (Column(i)[] < 0)}, n))

        )

    )

);

brady_brady

Staff

Joined:

Jun 9, 2012

Goodness, that's gorgeous!