BookmarkSubscribe
Choose Language Hide Translation Bar
ts2
ts2
Community Trekker

Rolling Cumulative Sum

I am trying to find an efficient way to calculate Rolling 3-day CUMULATIVE SUM. 

I have some for loops that work, but the data sets I am working with are ~500k rows with many more GROUPS, SUBGROUPS and VALUE columns so it takes ~30 minutes to calculate. The example data set is very stripped down and basic.

 

I moved the values into a matrix and that speeds it up a little, but not enough.

 

There is a GROUP and SUBGROUP columns that are to be CUMULATIVE SUM seperately from eachother. Also for each row the CUMULATIVESUM can only be for DATE - 3 days. Anything before :DATE[]-3 days, and after :DATE[] is not used for that row's CUMULATIVE SUM.

 

I have a feeling a column formula using LAG will probably be fastest, but I cannot get it to work properly. 

 

Speed is very important.

 

I have tried subsetting the tables, and that just made it slower.

See table and JSL. The top two JSL methods work perfectly, the third JSL method doesn't work properly.

 

Thank You!

[edit] JMP Pro 12.2.0

table.JPG

dt = Current Data Table();

For( i = 1, i <= N Rows( dt ), i = i + 1,
:CUMULATIVE_SUM_VALUE[i] = :VALUE[i];
For( c = i - 1, c > 0, c = c - 1,
If( :SUBGROUP[c] == :SUBGROUP[i] & :GROUP[c] == :GROUP[i] & (:DATE[i] - :DATE[c]) <= In Days( 3 ),
:CUMULATIVE_SUM_VALUE[i] = :CUMULATIVE_SUM_VALUE[i] + :VALUE[c],
c = -1
)
);
);



//////////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////////

//1=GROUP, 2=SUBGROUP, 3=DATE, 4=VALUE, 5=CUMULATIVE_SUM_VALUE

TT = dt << Get All Columns As Matrix;

For( i = 1, i <= N Rows( TT ), i = i + 1,
TT[i, 5] = TT[i, 4];
For( c = i - 1, c > 0, c = c - 1,
If( TT[c, 2] == TT[i, 2] & TT[c, 1] == TT[i, 1] & (TT[i, 3] - TT[c, 3]) <= In Days( 3 ),
TT[i, 5] = TT[i, 5] + TT[c, 4],
c = -1
)
);
);

For Each Row( :CUMULATIVE_SUM_VALUE = TT[Row(), 5] );


//////////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////////

dt << New Column( "CUMULATIVE_SUM_VALUE_2",
Formula(
//:VALUE;
If(
Row() == 1, :VALUE,
:GROUP == Lag( :GROUP, 1 ) & :SUBGROUP == Lag( :SUBGROUP, 1 ) & :DATE - Lag( :DATE, 1 ) <= In Days( 3 ), :VALUE + Lag( :CUMULATIVE_SUM_VALUE, 1 )
,
:GROUP != Lag( :GROUP, 1 ) & :SUBGROUP != Lag( :SUBGROUP, 1 ) & :DATE - Lag( :DATE, 1 ) <= In Days( 3 ), :VALUE
,
:GROUP == Lag( :GROUP, 1 ) & :SUBGROUP == Lag( :SUBGROUP, 1 ) & :DATE - Lag( :DATE, 1 ) > In Days( 3 ), :VALUE
,
:GROUP != Lag( :GROUP, 1 ) | :SUBGROUP != Lag( :SUBGROUP, 1 ), :VALUE

)
),
Eval Formula
);

 

 

0 Kudos
1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted
gzmorgan0
Super User

Re: Rolling Cumulative Sum

This should be a little faster. It is a simple offset counting routine, it hinges on the the power of the Loc() and matrix functions.

 

Items to Note:

  • change the path or user_id or set dt = current data table();
  • creating a constant like drng finds In Days(3) once possibly saving finding it 500k times.  This depends upon how JMP optimizes/parses JSL code.
  • In line 30 of the script, I used <= drng so that the results of this script would match your results.  However, consider whether this should be a strict less than (<). Row 37 in your table sums rows 33-37, Jan19 thru Jan22, that is 4 days. There are other instances. Suppose you wanted the last 30 days of January, that should be Jan02 thru Jan31.
  • :Date is formatted as m/d/y, however, the date value includes time. If you wanted to roll by calendar day (not time), comment out line 30 and uncomment line 31.  The value 86400 = 24*60*60 or In Days(1)

 

Names Default to Here(1);

dt = Open("c:\users\<user_id>\downloads\cumulative_sum_test_table.jmp");

//This assumes the data table is soted by Group and SubGroup and Date
scol = dt << new column("Sequence", numeric, continuous);
scol << set each value(Row());
ccol = dt << new Column("CUMULATIVE_SUM_VALUE_2", numeric, continuous);
summarize(Data Table(dt), _ttt = By(:GROUP, :SUBGROUP), _ibeg=Min(scol), _iend=Max(scol) );

dt << delete column(scol);  //no longer needed

dmat = :DATE  << get as matrix;
vmat = :VALUE << get as matrix;

//for each group
drng = In Days(3);  //259200
For(i=1, i<=nitems(_ttt[1]), i++,
    b = _ibeg[i];
    e = _iend[i];
	dsub = dmat[ b::e ];
	vsub = vmat[ b::e ];
	For(j = b, j <= e, j++,
	    //j=37
	    k= j-b+1;
//JMP12 converted a 1x1 matrix to a scalar
        if(k==1, 
           ccol[j] = vsub[1] 
           ,    //else
           idx = loc( 0<= -1 * dsub[1::k] + dmat[j] <= drng  );  //should this be < not <=
//idx = loc( 0<= Floor( (-1 * dsub[1::k] + dmat[j])/86400) <= 3 ); // < ?? ccol[j] = sum(vsub[idx]) ); //end if ); // end for j );
3 REPLIES 3
Highlighted
gzmorgan0
Super User

Re: Rolling Cumulative Sum

This should be a little faster. It is a simple offset counting routine, it hinges on the the power of the Loc() and matrix functions.

 

Items to Note:

  • change the path or user_id or set dt = current data table();
  • creating a constant like drng finds In Days(3) once possibly saving finding it 500k times.  This depends upon how JMP optimizes/parses JSL code.
  • In line 30 of the script, I used <= drng so that the results of this script would match your results.  However, consider whether this should be a strict less than (<). Row 37 in your table sums rows 33-37, Jan19 thru Jan22, that is 4 days. There are other instances. Suppose you wanted the last 30 days of January, that should be Jan02 thru Jan31.
  • :Date is formatted as m/d/y, however, the date value includes time. If you wanted to roll by calendar day (not time), comment out line 30 and uncomment line 31.  The value 86400 = 24*60*60 or In Days(1)

 

Names Default to Here(1);

dt = Open("c:\users\<user_id>\downloads\cumulative_sum_test_table.jmp");

//This assumes the data table is soted by Group and SubGroup and Date
scol = dt << new column("Sequence", numeric, continuous);
scol << set each value(Row());
ccol = dt << new Column("CUMULATIVE_SUM_VALUE_2", numeric, continuous);
summarize(Data Table(dt), _ttt = By(:GROUP, :SUBGROUP), _ibeg=Min(scol), _iend=Max(scol) );

dt << delete column(scol);  //no longer needed

dmat = :DATE  << get as matrix;
vmat = :VALUE << get as matrix;

//for each group
drng = In Days(3);  //259200
For(i=1, i<=nitems(_ttt[1]), i++,
    b = _ibeg[i];
    e = _iend[i];
	dsub = dmat[ b::e ];
	vsub = vmat[ b::e ];
	For(j = b, j <= e, j++,
	    //j=37
	    k= j-b+1;
//JMP12 converted a 1x1 matrix to a scalar
        if(k==1, 
           ccol[j] = vsub[1] 
           ,    //else
           idx = loc( 0<= -1 * dsub[1::k] + dmat[j] <= drng  );  //should this be < not <=
//idx = loc( 0<= Floor( (-1 * dsub[1::k] + dmat[j])/86400) <= 3 ); // < ?? ccol[j] = sum(vsub[idx]) ); //end if ); // end for j );
ts2
ts2
Community Trekker

Re: Rolling Cumulative Sum

This works great. I am seeing >200% speed increase using this method with my dataset.

 

There was a small bug where if there was only one row for a SubGroup (you had a JSL comment about a 1x1 matrix converted to scalar). I have no idea why this would happen.

I made sure these would stay a matrix as so:

	dsub = Matrix(dmat[b :: e]);
	vsub = Matrix(vmat[b :: e]);

 

gzmorgan0
Super User

Re: Rolling Cumulative Sum

Good catch. I made a rookie mistake not catching the corner case for a 1 row subgroup. I am so glad JMP no longer converts a 1x1 matrix to a scalar. 

//JMP12 converted a 1x1 matrix to a scalar

 

0 Kudos