cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Choose Language Hide Translation Bar
ih
Super User (Alumni) ih
Super User (Alumni)

A time series (seasonal ARIMA) model for varying variability in different seasons

Is there a time series ARIMA model that can represents increased variation (expand confidence intervals) in values during one season?

 

As an example, attached is a data set showing the number of deaths in the United States.  A time series analysis of deaths prior to March of 2020 (settings below) gives a reasonable prediction of deaths, however its accuracy is reduced during Dec - Feb (flu season and dead of winter).  The error bars around this model do not increase during that season, so while relatively few points on the whole graph are outside the CI, 5 out of 6 years have points out of the expected range in Dec or Jan, and no points are out of range during any other season.

 

I think the confidence interval should be smaller in the summer and wider in the winter.  Is there a better model to represent this data? A simple linear regression of week x values from previous years would show this increased variation but I feel like something in the time series platform should be able to help.

 

See the confidence intervals (orange lines) in the bottom of the chart below, and note how prior to 2020 the blue dots (actual deaths) are often out of that range near the start of each year.

 

ih_1-1633114834691.png

 

Settings for the time series model used to draw this graph:

Time Series(
	X( :Week Ending Date ),
	Y( :All Cause ),
	Autocorrelation Lags( 52 ),
	Forecast Periods( 52 ),
	Seasonal ARIMA( 0, 0, 0, 1, 1, 1, 52, Prediction Interval( 0.99 ) ),
	SendToReport(
		Dispatch( {}, "Time Series Basic Diagnostics", OutlineBox, {Close( 1 )} ),
		Dispatch( {"Model Comparison"}, "", ScrollBox, {Background Color( 117 )} )
	)
)

You can recreate this data table, or see it for an individual state, using the script below.  Download the data (export as csv and save to your computer) from here:

https://data.cdc.gov/NCHS/Weekly-Counts-of-Deaths-by-State-and-Select-Causes/3yf8-kanr
https://data.cdc.gov/NCHS/Weekly-Provisional-Counts-of-Deaths-by-State-and-S/muzy-jte6

 

View more...
Names default to here(1);

//Choose Juristiction , either 'United States' or a state name, match a value in that column in csv files.
Juristiction Filter = "United States";
//Juristiction Filter = "Illinois";

//Export data to csv files from the sites below:
//  https://data.cdc.gov/NCHS/Weekly-Counts-of-Deaths-by-State-and-Select-Causes/3yf8-kanr
//  https://data.cdc.gov/NCHS/Weekly-Provisional-Counts-of-Deaths-by-State-and-S/muzy-jte6

dtFinal = Open(
	pick file("Open final file (Weekly counts of deaths by states)", "$Downloads"),
	columns(
		New Column( "Jurisdiction of Occurrence", Character, "Nominal" ),
		New Column( "MMWR Year", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "MMWR Week", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "Week Ending Date",
			Numeric,
			"Continuous",
			Format( "m/d/y", 10 ),
			Input Format( "m/d/y" )
		),
		New Column( "All  Cause", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "Natural Cause", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "Septicemia (A40-A41)",
			Numeric,
			"Continuous",
			Format( "Best", 12 )
		),
		New Column( "Malignant neoplasms (C00-C97)",
			Numeric,
			"Continuous",
			Format( "Best", 12 )
		),
		New Column( "Diabetes mellitus (E10-E14)",
			Numeric,
			"Continuous",
			Format( "Best", 12 )
		),
		New Column( "Alzheimer disease (G30)",
			Numeric,
			"Continuous",
			Format( "Best", 12 )
		),
		New Column( "Influenza and pneumonia (J10-J18)",
			Numeric,
			"Continuous",
			Format( "Best", 12 )
		),
		New Column( "Chronic lower respiratory diseases (J40-J47)",
			Numeric,
			"Continuous",
			Format( "Best", 12 )
		),
		New Column(
			"Other diseases of respiratory system (J00-J06,J30-J39,J67,J70-J98)",
			Numeric,
			"Continuous",
			Format( "Best", 12 )
		),
		New Column(
			"Nephritis, nephrotic syndrome and nephrosis (N00-N07,N17-N19,N25-N27)",
			Numeric,
			"Continuous",
			Format( "Best", 12 )
		),
		New Column(
			"Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified (R00-R99)",
			Numeric,
			"Continuous",
			Format( "Best", 12 )
		),
		New Column( "Diseases of heart (I00-I09,I11,I13,I20-I51)",
			Numeric,
			"Continuous",
			Format( "Best", 12 )
		),
		New Column( "Cerebrovascular diseases (I60-I69)",
			Numeric,
			"Continuous",
			Format( "Best", 12 )
		),
		New Column( "flag_allcause", Character, "Nominal" ),
		New Column( "flag_natcause", Character, "Nominal" ),
		New Column( "flag_sept", Character, "Nominal" ),
		New Column( "flag_neopl", Character, "Nominal" ),
		New Column( "flag_diab", Character, "Nominal" ),
		New Column( "flag_alz", Character, "Nominal" ),
		New Column( "flag_inflpn", Character, "Nominal" ),
		New Column( "flag_clrd", Character, "Nominal" ),
		New Column( "flag_otherresp", Character, "Nominal" ),
		New Column( "flag_nephr", Character, "Nominal" ),
		New Column( "flag_otherunk", Character, "Nominal" ),
		New Column( "flag_hd", Character, "Nominal" ),
		New Column( "flag_stroke", Character, "Nominal" )
	),
	Import Settings(
		End Of Line( CRLF, CR, LF ),
		End Of Field( Comma, CSV( 1 ) ),
		Strip Quotes( 1 ),
		Use Apostrophe as Quotation Mark( 0 ),
		Use Regional Settings( 0 ),
		Scan Whole File( 1 ),
		Treat empty columns as numeric( 0 ),
		CompressNumericColumns( 0 ),
		CompressCharacterColumns( 0 ),
		CompressAllowListCheck( 0 ),
		Labels( 1 ),
		Column Names Start( 1 ),
		Data Starts( 2 ),
		Lines To Read( "All" ),
		Year Rule( "20xx" )
	)
);

dtProv = Open(
	Pick File("Open provisional file (weekly provisional counts of deaths)", "$Downloads"),
	columns(
		New Column( "Data As Of",
			Numeric,
			"Continuous",
			Format( "m/d/y", 10 ),
			Input Format( "m/d/y" )
		),
		New Column( "Jurisdiction of Occurrence", Character, "Nominal" ),
		New Column( "MMWR Year", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "MMWR Week", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "Week Ending Date",
			Numeric,
			"Continuous",
			Format( "yyyy-mm-dd", 10 ),
			Input Format( "yyyy-mm-dd" )
		),
		New Column( "All Cause", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "Natural Cause", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "Septicemia (A40-A41)",
			Numeric,
			"Continuous",
			Format( "Best", 12 )
		),
		New Column( "Malignant neoplasms (C00-C97)",
			Numeric,
			"Continuous",
			Format( "Best", 12 )
		),
		New Column( "Diabetes mellitus (E10-E14)",
			Numeric,
			"Continuous",
			Format( "Best", 12 )
		),
		New Column( "Alzheimer disease (G30)",
			Numeric,
			"Continuous",
			Format( "Best", 12 )
		),
		New Column( "Influenza and pneumonia (J09-J18)",
			Numeric,
			"Continuous",
			Format( "Best", 12 )
		),
		New Column( "Chronic lower respiratory diseases (J40-J47)",
			Numeric,
			"Continuous",
			Format( "Best", 12 )
		),
		New Column(
			"Other diseases of respiratory system (J00-J06,J30-J39,J67,J70-J98)",
			Numeric,
			"Continuous",
			Format( "Best", 12 )
		),
		New Column(
			"Nephritis, nephrotic syndrome and nephrosis (N00-N07,N17-N19,N25-N27)",
			Numeric,
			"Continuous",
			Format( "Best", 12 )
		),
		New Column(
			"Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified (R00-R99)",
			Numeric,
			"Continuous",
			Format( "Best", 12 )
		),
		New Column( "Diseases of heart (I00-I09,I11,I13,I20-I51)",
			Numeric,
			"Continuous",
			Format( "Best", 12 )
		),
		New Column( "Cerebrovascular diseases (I60-I69)",
			Numeric,
			"Continuous",
			Format( "Best", 12 )
		),
		New Column( "COVID-19 (U071, Multiple Cause of Death)",
			Numeric,
			"Continuous",
			Format( "Best", 12 )
		),
		New Column( "COVID-19 (U071, Underlying Cause of Death)",
			Numeric,
			"Continuous",
			Format( "Best", 12 )
		),
		New Column( "flag_allcause", Character, "Nominal" ),
		New Column( "flag_natcause", Character, "Nominal" ),
		New Column( "flag_sept", Character, "Nominal" ),
		New Column( "flag_neopl", Character, "Nominal" ),
		New Column( "flag_diab", Character, "Nominal" ),
		New Column( "flag_alz", Character, "Nominal" ),
		New Column( "flag_inflpn", Character, "Nominal" ),
		New Column( "flag_clrd", Character, "Nominal" ),
		New Column( "flag_otherresp", Character, "Nominal" ),
		New Column( "flag_nephr", Character, "Nominal" ),
		New Column( "flag_otherunk", Character, "Nominal" ),
		New Column( "flag_hd", Character, "Nominal" ),
		New Column( "flag_stroke", Character, "Nominal" ),
		New Column( "flag_cov19mcod", Character, "Nominal" ),
		New Column( "flag_cov19ucod", Character, "Nominal" )
	),
	Import Settings(
		End Of Line( CRLF, CR, LF ),
		End Of Field( Comma, CSV( 1 ) ),
		Strip Quotes( 1 ),
		Use Apostrophe as Quotation Mark( 0 ),
		Use Regional Settings( 0 ),
		Scan Whole File( 1 ),
		Treat empty columns as numeric( 0 ),
		CompressNumericColumns( 0 ),
		CompressCharacterColumns( 0 ),
		CompressAllowListCheck( 0 ),
		Labels( 1 ),
		Column Names Start( 1 ),
		Data Starts( 2 ),
		Lines To Read( "All" ),
		Year Rule( "20xx" )
	)
);

dtProv:" All Cause"n << Set Name("All Cause");
dtFinal:"All  Cause"n << Set Name("All Cause");

dtAll = dtFinal << Concatenate(
	dtProv,
	Create source column
);

dtAll << Select Where(:Jurisdiction of Occurrence == Juristiction Filter );

dtSub = dtAll << Subset(
	Selected Rows( 1 ),
	Selected columns only( 0 )
);

dtSub << Select where( :Week Ending Date > date dmy( 1,3,2020) );

//Hide new data
dtSub << Hide and Exclude(1);

ts = dtSub << Time Series(
	X( :Week Ending Date ),
	Y( :All Cause ),
	Autocorrelation Lags( 52 ),
	Forecast Periods( 52 ),
	Seasonal ARIMA( 0, 0, 0, 1, 1, 1, 52, Prediction Interval( 0.99 ) ),
	SendToReport(
		Dispatch( {}, "Time Series Basic Diagnostics", OutlineBox, {Close( 1 )} ),
		Dispatch( {"Model Comparison"}, "", ScrollBox, {Background Color( 117 )} )
	)
);

dtPred = (((ts << XPath("//OutlineBox[@helpKey='ARIMA Model']"))[1]) << Get Scriptable object) << Save Prediction Formula;

dtJoined = dtSub << Join(
	With( dtPred ),
	Merge Same Name Columns,
	By Row Number
);

dtJoined << New Column("Excess v Expected", Numeric, "Continuous", Format("Best", 12), Formula(:All Cause - :Predicted All Cause), Set Display Width(100));
dtJoined << New Column("Cumulative[Excess v Expected]", Numeric, "Continuous", Format("Best", 12), Formula(Col Cumulative Sum(:Excess v Expected)), Set Display Width(173));

last row with actuals = max(dtJoined << Get rows where( !Is Missing( :All Cause ) ))-1;
last date with actuals = dtJoined:Week Ending Date[last row with actuals];

dtJoined << Graph Builder(
	Size( 1174, 786 ),
	Show Control Panel( 0 ),
	Show Subtitle( 1 ),
	Variables(
		X( :Week Ending Date ),
		Y( :"Cumulative[Excess v Expected]"n ),
		Y( :Excess v Expected ),
		Y( :All Cause ),
		Y( :Predicted All Cause, Position( 3 ) ),
		Y( :"Upper CL (0.99) All Cause"n, Position( 3 ) ),
		Y( :"Lower CL (0.99) All Cause"n, Position( 3 ) )
	),
	Relative Sizes( "Y", [229 204 387] ),
	Elements( Position( 1, 1 ), Points( X, Y, Legend( 18 ) ) ),
	Elements(
		Position( 1, 2 ),
		Points( X, Y, Legend( 20 ) ),
		Area( X, Y, Legend( 22 ), Interval Style( "Band" ) )
	),
	Elements(
		Position( 1, 3 ),
		Points( X, Y( 1 ), Legend( 13 ) ),
		Line( X, Y( 2 ), Y( 3 ), Y( 4 ), Legend( 15 ) )
	),
	Local Data Filter(
		Add Filter(
			columns( :Week Ending Date ),
			Where(
				:Week Ending Date >= 3471638400 & :Week Ending Date <= last date with actuals
			)
		)
	),
	SendToReport(
		Dispatch(
			{},
			"Week Ending Date",
			ScaleBox,
			{Min( 3462203520 ), Max( last date with actuals + in days(7) ), Interval( "Year" ),
			Inc( 1 ), Minor Ticks( 0 ), Add Ref Line(
				last date with actuals, "Solid", "Black", Format( last date with actuals, "ddMonyyyy"), 1
			), Label Row( Show Major Grid( 1 ) )}
		),
		Dispatch(
			{},
			"Cumulative[Excess v Expected]",
			ScaleBox,
			{Min( 0 ), //Max( 819328.125380856 ), Inc( 200000 ),
			Minor Ticks( 1 ), Add Ref Line( 0, "Solid", "Black", "", 1 ),
			Label Row( Show Major Grid( 1 ) )}
		),
		Dispatch(
			{},
			"Excess v Expected",
			ScaleBox,
			{Min( 0 ), //Max( 26451.6956907074 ), Inc( 10000 ),
			Minor Ticks( 1 ), Add Ref Line( 0, "Solid", "Black", "", 1 ),
			Label Row( {Show Major Grid( 1 ), Show Minor Grid( 1 )} )}
		),
		Dispatch(
			{},
			"All Cause",
			ScaleBox,
			{Min( 0 ), //Max( 90470.7642581624 ), Inc( 10000 ),
			Minor Ticks( 4 ), Label Row(
				{Show Major Grid( 1 ), Show Minor Grid( 1 )}
			)}
		),
		Dispatch(
			{},
			"400",
			ScaleBox,
			{Legend Model(
				22,
				Properties(
					0,
					{Fill Color( 20 ), Transparency( 0.5 )},
					Item ID( "Excess v Expected", 1 )
				)
			), Legend Model(
				13,
				Properties( 0, {Line Color( 21 )}, Item ID( "All Cause", 1 ) )
			), Legend Model(
				15,
				Properties(
					0,
					{Line Color( 32 )},
					Item ID( "Predicted All Cause", 1 )
				),
				Properties(
					1,
					{Line Color( 70 )},
					Item ID( "Upper CL (0.99) All Cause", 1 )
				),
				Properties(
					2,
					{Line Color( 70 )},
					Item ID( "Lower CL (0.99) All Cause", 1 )
				)
			)}
		),
		Dispatch(
			{},
			"graph title",
			TextEditBox,
			{Set Text( Juristiction Filter || " Actual Deaths v Expected Deaths (Excess Deaths)" )}
		),
		Dispatch(
			{},
			"graph 1 title",
			TextEditBox,
			{Set Text(
				"Source: https://data.cdc.gov/NCHS/Weekly-Counts-of-Deaths-by-State-and-Select-Causes/3yf8-kanr  and   https://data.cdc.gov/NCHS/Weekly-Provisional-Counts-of-Deaths-by-State-and-S/muzy-jte6"
			)}
		),
		Dispatch( {}, "X title", TextEditBox, {Set Text( "" )} ),
		Dispatch(
			{},
			"Y title",
			TextEditBox,
			{Set Text( "Cumulative Excess Deaths" )}
		),
		Dispatch(
			{},
			"Y 1 title",
			TextEditBox,
			{Set Text( "Excess v Expected\!NDeaths per Month" )}
		),
		Dispatch( {}, "Y 2 title", TextEditBox, {Set Text( "Deaths per Month" )} ),
		Dispatch(
			{},
			"400",
			LegendBox,
			{Legend Position(
				{18, [0], 20, [1], 22, [6, -3], 13, [2], 15, [3, 4, 5]}
			)}
		),
		Dispatch(
			{},
			"Graph Builder",
			FrameBox(3),
			{Add Graphics Script(
				3,
				Description( "Script" ),
				dt = Current Data Table();

				yMatrix = (dtJoined:All Cause << get values)[54::last row with actuals] |/
				Matrix( Reverse( As List( (dtJoined:Predicted All Cause << get values)[54::last row with actuals] ) ) );
				xMatrix = (dtJoined:Week Ending Date << get values)[54::last row with actuals] |/
				Matrix( Reverse( As List( (dtJoined:Week Ending Date << get values)[54::last row with actuals] ) ) );
				Fill Color( "Medium Dark Green" );
				Transparency( 0.5 );
				Polygon( xMatrix, yMatrix );
			), Grid Line Order( 1 ), Reference Line Order( 3 )}
		)
	)
);

dtFinal << Close Window;
dtProv << Close Window;
dtAll << Close Window;
dtSub << Close Window;
dtPred << Close Window;

 

 

4 REPLIES 4
peng_liu
Staff

Re: A time series (seasonal ARIMA) model for varying variability in different seasons

I cannot be sure this is variation issue or bias issue or both. If outliers are always or mostly always close to the upper side, it seems more like a bias issue. Transfer function model in Time Series platform might be helpful. E.g. use week of the year or month or whatever appropriate time resolution indicator as an Input variable.
If this is indeed a variation issue, some can be addressed in the platform. Box-Cox transformed time series modeling, added during JMP16, is one tool. But here, maybe simpler. I see the desired model doesn't have non-seasonal orders. That should be technically equivalent to fitting 52 separate models by week of the year but constraining on the same variance, I bet. If you just fit 52 models separately, you get different variances. But make sure you always have 52 data points in a year.
peng_liu
Staff

Re: A time series (seasonal ARIMA) model for varying variability in different seasons

I take back the last couple of sentences about fitting separate models. I didn't realize coefficients needs to be constrained across models as well. But seems that Box-Cox transformation is relevant, because the higher variability is tied to higher series value.
ih
Super User (Alumni) ih
Super User (Alumni)

Re: A time series (seasonal ARIMA) model for varying variability in different seasons

I feel like I've exhausted possible box cox transformation and while they improve the prediction (more values in winter are within the confidence interval), but I don't feel like this is going to get me to a model with realistic errors bars in the winter.  Using the platform options in time series didn't seem to help very much but using the cube root of (deaths - 45000) gives a fairly normal distribution. Even using this as an input though there is still way more value outside the confidence interval in the winter than in the summer.

 

Transformed 

ih_0-1634064953549.png

 

Seasonality of residuals, both raw (on bottom) and 'standardized' by dividing them by the confidence interval (top). Note how the top chart looks better, but that is still a prominent pattern.

ih_1-1634067238538.png

 

 

 

peng_liu
Staff

Re: A time series (seasonal ARIMA) model for varying variability in different seasons

Using your transformation, I fit a different model Seasonal ARIMA(4, 0, 4)(1, 1, 1)52, and this is the overlay residual plot I get.

peng_liu_3-1634346672660.png

 

This is what they look like if I connect the dots:

peng_liu_4-1634346714150.png

 

Variances still suspiciously seem clustered by seasons. But there may not be anything that ARIMA can do about it. Here are some thoughts:

  1. This is an aggregated time series, maybe one should model individual causes, then aggregate.
  2. Weekly resolution have problems around New Year. Some years have 52 weeks, others have 53. So seasonality = 52 is not perfect.