Subscribe Bookmark RSS Feed

Re: Moving Average with Missing Data

terapin

Community Trekker

Joined:

Jun 23, 2011

I'm trying to calculate a moving average of some data that has some holes in it.  I can't seem to figure out how to caculate a moving average that only includes non missing data and that only calculates the result for rows with non missing data.  Any suggestions would be appreciated.

 

New Table( "Untitled 5",
	Add Rows( 30 ),
	Compress File When Saved( 1 ),
	New Table Variable( "Moving Avg Window", 5 ),
	New Column( "DateTime",
		Numeric,
		"Continuous",
		Format( "m/d/y h:m", 19 ),
		Input Format( "m/d/y h:m" ),
		Set Values(
			[3225888000, 3225888900, 3225889800, 3225890700, 3225891600, 3225892500, 3225893400, 3225894300,
			3225895200, 3225896100, 3225897000, 3225897900, 3225898800, 3225899700, 3225900600, 3225901500,
			3225902400, 3225903300, 3225904200, 3225905100, 3225906000, 3225906900, 3225907800, 3225908700,
			3225909600, 3225910500, 3225911400, 3225912300, 3225913200, 3225914100]
		)
	),
	New Column( "data",
		Numeric,
		"Continuous",
		Format( "Best", Use thousands separator( 0 ), 15 ),
		Set Values( [5, 5, 5, 5, 5, ., 5, 5, 5, 5, 5, 5, 5, 5, ., 5, 5, 5, 5, 5, 5, 5, ., ., 5, 5, 5, 5, 5, 5] )
	),
	New Column( "expected run avg",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values( [., ., ., ., 5, ., 5, 5, 5, 5, 5, 5, 5, 5, ., 5, 5, 5, 5, 5, 5, 5, ., ., 5, 5, 5, 5, 5, 5] )
	),
	New Column( "run avg",
		Numeric,
		"Continuous",
		Format( "Fixed Dec", 12, 1 ),
		Formula(
			If( Row() < :Moving Avg Window,
				.,
				Sum( :data[Index( Row() - (:Moving Avg Window - 1), Row() )] ) / :Moving Avg Window
			)
		)
	)
);
1 ACCEPTED SOLUTION

Accepted Solutions
ms

Super User

Joined:

Jun 23, 2011

Solution

The Mean() function neglects any missing values. The below formula stick to the time window but do not include missing values when calculating the mean.

If(Row() < :Moving Avg Window,
    Mean(:data[Index(1, Row())]),
    Mean(:data[Index(Row() - (:Moving Avg Window - 1), Row())])
);

 

 

6 REPLIES
Highlighted
txnelson

Super User

Joined:

Jun 22, 2012

Here is a script that I believe will give you what you want

Names Default To Here( 1 );
dt = New Table( "Untitled 5",
	Add Rows( 30 ),
	Compress File When Saved( 1 ),
	New Table Variable( "Moving Avg Window", 5 ),
	New Column( "DateTime",
		Numeric,
		"Continuous",
		Format( "m/d/y h:m", 19 ),
		Input Format( "m/d/y h:m" ),
		Set Values(
			[3225888000, 3225888900, 3225889800, 3225890700, 3225891600, 3225892500, 3225893400,
			3225894300, 3225895200, 3225896100, 3225897000, 3225897900, 3225898800, 3225899700,
			3225900600, 3225901500, 3225902400, 3225903300, 3225904200, 3225905100, 3225906000,
			3225906900, 3225907800, 3225908700, 3225909600, 3225910500, 3225911400, 3225912300,
			3225913200, 3225914100]
		)
	),
	New Column( "data",
		Numeric,
		"Continuous",
		Format( "Best", Use thousands separator( 0 ), 15 ),
		Set Values(
			[5, 5, 5, 5, 5, ., 5, 5, 5, 5, 5, 5, 5, 5, ., 5, 5, 5, 5, 5, 5, 5, ., ., 5, 5, 5, 5,
			5, 5]
		)
	),
	New Column( "expected run avg",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values(
			[., ., ., ., 5, ., 5, 5, 5, 5, 5, 5, 5, 5, ., 5, 5, 5, 5, 5, 5, 5, ., ., 5, 5, 5, 5,
			5, 5]
		)
	),
	New Column( "run avg",
		Numeric,
		"Continuous",
		Format( "Fixed Dec", 12, 1 ),
		Formula(
			If( Row() == 1,
				NonMissingRows = dt << get rows where( Is Missing( :data ) == 0 )
			);
			If( Loc( As List( NonMissingRows ), Row() ) < :Moving Avg Window,
				.,
				TargetPosition = Loc( As List( NonMissingRows ), Row() );
				MatrixofMAW = NonMissingRows[Index(
					TargetPosition - (Moving Avg Window - 1),
					TargetPosition
				)];
				Show( Row(), MatrixofMAW );
				Sum( :data[MatrixofMAW] ) / :Moving Avg Window;
			);
		)
	)
);
Jim
mpl34

Community Trekker

Joined:

Feb 16, 2016

If you can't create a subset excluding the null values I think the formula below works--though I'm sure there is a better way. This gives a null value for rows without data but you can have it repeat the average by getting rid of the if statement.

 

If(Is Missing(:data), ., rsum = 0; rcount = 0; temprow = 0; (While(rcount < :Moving Avg Window, While(Is Missing(:data[Row() - temprow]) & Row() - temprow >= 1, temprow += 1); rsum += :data[Row() - temprow]; temprow += 1; rcount += 1); rsum / :Moving Avg Window))

 

 

ms

Super User

Joined:

Jun 23, 2011

Solution

The Mean() function neglects any missing values. The below formula stick to the time window but do not include missing values when calculating the mean.

If(Row() < :Moving Avg Window,
    Mean(:data[Index(1, Row())]),
    Mean(:data[Index(Row() - (:Moving Avg Window - 1), Row())])
);

 

 

vince_faller

Super User

Joined:

Mar 17, 2015

In the below link we talked about this and put a couple of faster algorithms for it. 

 

https://community.jmp.com/t5/Discussions/Efficient-moving-average-algorithm/td-p/8553

 

terapin

Community Trekker

Joined:

Jun 23, 2011

Thanks everyone for weighing in on this question.  MS' suggestion is certainly the cleanest and easiest for me to implement since it let's me define and specify backward and forward moving window size.  I certainly learned a lot from all the examples provided.  Thanks!

ian_jmp

Staff

Joined:

Jun 23, 2011

I didn't look at this case in detail. But I just wanted to also mention:

 

Col Moving Average(name, options, <By var, ...>)

 

which seems to have a lot of flexibility.