Subscribe Bookmark RSS Feed

Efficient moving average algorithm

daneo

Community Trekker

Joined:

May 9, 2014

Hello,

Thanks to those who have posted on here before, I've found some useful answers!

I have several large data sets (20 million points each!) from an oscilloscope. I want to calculate a moving average. However, the algorithm I've devised is very slow. Here is the code for my formula:

"

If(Row() < 100, Empty(), Summation(i = Row() - 99, Row(), :Current[i,Empty()]) / 100)

"

6272_Screen Shot 2014-05-09 at 1.40.00 PM.png

As the width of the averaging window (in this case 100 samples) gets larger (say 500, 1000, or even 50000 or more), the calculation time becomes unreasonable. Does someone have a faster way of doing a moving average?

Thanks!

Dane

1 ACCEPTED SOLUTION

Accepted Solutions
Solution

I can offer a couple of strategies here.

I think both are more efficient that the Summation() operator you're currently using. I can't say they are the most efficient possible though.

1) Use the Sum() function on a matrix.


If(Row()<:Moving Avg Window,


  .,


  Sum( :Current[Index( Row() - (:Moving Avg Window - 1), Row() )] ) / :Moving Avg Window


)



6275_JMPScreenSnapz003.png

This takes advantage of the speed of the matrix operation. The subscript of the Current column in this formula is a matrix of the row numbers from the start of the moving average window to the current row number. My subscripting the column this way JMP returns a matrix and the Sum function is relatively efficient over a matrix.

An even faster way though is to calculate this moving average over two columns. The first column simply maintains a moving sum by adding the value of Current in one row to the value of the rolling sum in the previous row and then subtracting the value of Current from the row at the beginning of the window.

Here's the formula for Moving Sum.


If(Row()<=:Moving Avg Window,


  If( Row() == 1,


  :Current,


  Lag( :Name( "moving sum(x)" ), 1 ) + :Current


  ),


  (Lag( :Name( "moving sum(x)" ), 1 ) + :Current) -


  Lag( :Current, :Moving Avg Window )


)



6276_JMPScreenSnapz004.png

Then it's easy to have a column that divides Moving Sum by the Moving Avg Window.


If(Row()>=:Moving Avg Window,


  :Name( "moving sum(x)" ) / :Moving Avg Window,


  .


)



6277_JMPScreenSnapz005.png

I'm attaching a data table that shows both of these methods. In the data table the Moving Avg Window is a Data Table Variable.

To see the efficiency of the second method in this data table you may want to suppress the evaluation of the of the Moving Avg(x) column first.

-Jeff

-Jeff
4 REPLIES
Solution

I can offer a couple of strategies here.

I think both are more efficient that the Summation() operator you're currently using. I can't say they are the most efficient possible though.

1) Use the Sum() function on a matrix.


If(Row()<:Moving Avg Window,


  .,


  Sum( :Current[Index( Row() - (:Moving Avg Window - 1), Row() )] ) / :Moving Avg Window


)



6275_JMPScreenSnapz003.png

This takes advantage of the speed of the matrix operation. The subscript of the Current column in this formula is a matrix of the row numbers from the start of the moving average window to the current row number. My subscripting the column this way JMP returns a matrix and the Sum function is relatively efficient over a matrix.

An even faster way though is to calculate this moving average over two columns. The first column simply maintains a moving sum by adding the value of Current in one row to the value of the rolling sum in the previous row and then subtracting the value of Current from the row at the beginning of the window.

Here's the formula for Moving Sum.


If(Row()<=:Moving Avg Window,


  If( Row() == 1,


  :Current,


  Lag( :Name( "moving sum(x)" ), 1 ) + :Current


  ),


  (Lag( :Name( "moving sum(x)" ), 1 ) + :Current) -


  Lag( :Current, :Moving Avg Window )


)



6276_JMPScreenSnapz004.png

Then it's easy to have a column that divides Moving Sum by the Moving Avg Window.


If(Row()>=:Moving Avg Window,


  :Name( "moving sum(x)" ) / :Moving Avg Window,


  .


)



6277_JMPScreenSnapz005.png

I'm attaching a data table that shows both of these methods. In the data table the Moving Avg Window is a Data Table Variable.

To see the efficiency of the second method in this data table you may want to suppress the evaluation of the of the Moving Avg(x) column first.

-Jeff

-Jeff
daneo

Community Trekker

Joined:

May 9, 2014

Thanks, Jeff. These run much faster than the summation method!

tmenzer0

Community Member

Joined:

Nov 4, 2015

That's a great solution. Note that any missing values in the data column ("current" in this example) will create blank values in the calculated data column from the row where the first missing data occurs to all rows below. It is a small issue to be aware of.
vince_faller

Super User

Joined:

Mar 17, 2015

I usually find that if I can avoid the table all together it's faster.  This runs faster on my machine.  You'll notice that My numbers are different than Jeffs too.  That's simply because I mirror the numbers on the end then go x points out from the center (rather than last x points).  Also this has weighting of missing values so if you have any missing values it's a non issue.  At least I think so.  If you don't need a column at the end it's even faster too.  And the since it's just basically a concatted index matrix, it's pretty quick.  

 

Names Default to here(1);
dt = open("$SAMPLE_DATA\Big Class.jmp");
st1 = HPtime();
mat = :height << Get Values;
//number of rows you want to average
x= 5;
//+- gap
z = (x-1)/2;
//get the rows
rows = 1::nrows(mat);
//mirror the rows on the front and back
mirrored = ((z+1)::2)||rows||((ncol(rows)-1)::(ncol(rows)-z));
big_mat = [];
for(i=0, i<=x-1, i++, 
//this is cutting up the longer mirrored vector into x vectors of the same length as the original
big_mat ||= mat[mirrored[rows+i]];
);

//now we can just take a VSum of the transpose /x to get our average
Squasher = !ismissing(big_mat); //everywhere there's a non-missing, make 1 
Squasher = VSum(Squasher`); //count real numbers, this is to not weight non-missings
A = Vsum(big_mat`)/x;
New Column("Moving Average", Set Values(A)); //If you don't need this column 
//and you're just doing matrix operations this is actually about half the total time
en1 = HPTime()-st1;



// Data Table method
st2 = HPTime();
dt<< New Column("moving sum(x)", formula(If(Row()<=:Moving Avg Window,
If( Row() == 1,
:height,
Lag( :Name( "moving sum(x)" ), 1 ) + :height
),
(Lag( :Name( "moving sum(x)" ), 1 ) + :height) -
Lag( :height, :Moving Avg Window )
)));
dt<< New Column("Moving Avg", Formula(
If(Row()>=:Moving Avg Window,
:Name( "moving sum(x)" ) / :Moving Avg Window,
.
)));
en2 = HPTime()-st2;
show(en1, en2);