
Which Forecasts Should I Use? An Add-in for Comparing ARIMA and State Space Smoothing Models (2021-US-30MP-845)

Level: Beginner

 

Jacob Rhyne, Sr Analytics Software Tester, JMP

 

The Time Series platform in JMP enables users to explore time series data and to forecast future observations. Two popular model classes, ARIMA models and State Space Smoothing models, are both available in JMP 16. JMP users may want to fit a set of ARIMA models, as well as a set of State Space Smoothing models, and choose one model for forecasting. However, choosing between an ARIMA model and a State Space Smoothing (ETS) model can be a difficult task. While models of the same class can be compared to each other, ARIMA and State Space Smoothing models cannot be directly compared using a model selection criterion such as AIC. An add-in to help users choose between candidate ARIMA and State Space Smoothing models will be presented. The add-in enables JMP users to compare model classes using time series cross-validation or to evaluate candidate models using a test set. After giving an overview of ARIMA and State Space Smoothing models in JMP, this presentation demonstrates how the add-in can be used to recommend which model to use for forecasting.

 

 

Auto-generated transcript...

 



  Hello, thank you for taking the time to view this. My name is Jacob Rhyne, and I'm a software tester for JMP. Today I'm going to share with you an add-in that I wrote to help you choose between an ARIMA model and a State Space Smoothing model in the Time Series platform in JMP.
  I'll begin with a brief overview of fitting ARIMA and State Space Smoothing models in the Time Series platform.
  Then I'll introduce the need for an add-in to help choose between these two model classes and give an overview of time series cross-validation, which is the main tool of the add-in. I'll end with a demo of the add-in using JMP 16.1.
  The Time Series platform in JMP allows users to fit many different time series models. The focus of this presentation is ARIMA and State Space Smoothing models, and I'll show how to fit both of these using the Steel Shipments sample data.
  The Steel Shipments sample data contains monthly totals of US steel shipments from 1984 through 1991, and I'll explore it with the Time Series platform in the Specialized Modeling menu.
  The Steel Shipments column is my time series, and the Date column is my time ID. Let's suppose that I'm interested in forecasting the next two years of steel shipments. Since this is monthly data, I need 24 forecast periods. I'll click OK to launch my time series report.
  You can see that I have a plot of the time series, some basic statistics of the series, and some basic time series diagnostics.
  I'd like to fit an ARIMA model to this series, and when I fit an ARIMA model I need to decide whether to apply any transformations or differencing. So I'll use the Difference option in the Time Series menu to explore this.
  I'll start with nonseasonal differencing of order one and click Estimate.
  Now I have a plot of the differenced series. It looks more stationary, but there is still some seasonality, which I can see in the diagnostics. If I go down to lag 12,
  I can see that the autocorrelation value is high and the partial autocorrelation is also a little high. This suggests that I should fit a seasonal ARIMA model with both nonseasonal and seasonal differencing.
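To make the differencing and lag-12 diagnostic concrete, here is a minimal Python sketch of lag differencing and the textbook sample autocorrelation. It is an illustration under stated assumptions, not JMP's implementation: the helper names are hypothetical, the toy series are made up, and JMP's ACF computation may differ in detail.

```python
import math


def difference(series, lag=1):
    """Return y[t] - y[t - lag] for every t where the lagged value exists."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sample_acf(series, lag):
    """Textbook sample autocorrelation at one lag (hypothetical helper)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


# Nonseasonal differencing of order one, then seasonal differencing of order
# one. A period of 2 keeps the toy example readable; monthly data would use 12.
toy = [1, 3, 6, 10, 15, 21]
once = difference(toy, lag=1)    # [2, 3, 4, 5, 6]
twice = difference(once, lag=2)  # [2, 2, 2]

# A purely periodic monthly pattern has a large autocorrelation at lag 12,
# the kind of spike that suggests seasonal differencing.
monthly = [math.sin(2 * math.pi * t / 12) for t in range(96)]
print(round(sample_acf(monthly, 12), 3))  # 0.875
```

Seasonal differencing with lag 12 removes a repeating 12-month pattern in the same way that the period-2 step above flattens the toy series.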
  I'm going to close this Difference report to keep my report clean, and explore fitting several ARIMA models with the ARIMA Model Group option.
  I've decided to fit a seasonal ARIMA model with a nonseasonal differencing order of one and a seasonal differencing order of one, but I'll let the autoregressive and moving average orders be either zero or one.
  When I click Estimate, JMP fits these 16 different seasonal ARIMA models, one for each combination of the nonseasonal and seasonal autoregressive and moving average orders. I can scroll down to the Model Comparison report to view them. The models are sorted by AIC, so I'll select the model that minimizes the AIC.
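The count of 16 follows from two choices (zero or one) for each of four orders, p, q, P, and Q, with both differencing orders fixed at one. A minimal sketch of that enumeration follows; the tuple layout ((p, d, q), (P, D, Q, period)) and the function name are assumptions for illustration, not how JMP stores the model group.

```python
from itertools import product


def arima_model_group(p_values, d, q_values, P_values, D, Q_values, period):
    """Enumerate every (p, d, q)(P, D, Q)_period combination in a grid.

    The differencing orders d and D are fixed; the other four orders vary.
    """
    return [
        ((p, d, q), (P, D, Q, period))
        for p, q, P, Q in product(p_values, q_values, P_values, Q_values)
    ]


models = arima_model_group([0, 1], 1, [0, 1], [0, 1], 1, [0, 1], period=12)
print(len(models))  # 16 = 2 ** 4
print(models[0])    # ((0, 1, 0), (0, 1, 0, 12))
```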
  I can scroll down and take a look at the model in the report. I can see that it is stable and invertible. I can look through the parameter estimates table and take a look at the forecast and the forecast intervals.
  Let's suppose I've looked through all of these diagnostics and I'm satisfied with this seasonal ARIMA model as a candidate to consider for forecasting.
  So I'll remove these other seasonal ARIMA models from the report.
  Say I also want to consider using a State Space Smoothing model to create a forecast. I can do this by using the State Space Smoothing option from the Time Series menu.
  I need to set the period to 12 because this is monthly data, and I'll fit all of the recommended State Space Smoothing models for this time series.
  I'll click OK to run that. If I scroll down to the Model Comparison report, I can see that JMP has fit all of the recommended State Space Smoothing models, once again sorted by AIC.
  The model that minimized AIC was a multiplicative error model with no trend and multiplicative seasonality, so I'll select this model
  in the Model Comparison outline and scroll down to take a look at the model fit.
  I can look through the model summary and the parameter estimates table.
  I can view the forecast and the forecast intervals. Let's say that I've looked through all of these diagnostics and I'm happy with this State Space Smoothing model as a candidate to consider for forecasting.
  So I'll remove the other State Space Smoothing models from the report and from the Model Comparison window.
  So now the question is: how do I choose between the ARIMA model and the State Space Smoothing model that I'm considering for forecasting?
  You can see that I have a red text warning that tells me likelihood-based criteria are not comparable between the different model classes.
  This means I cannot simply scroll over, see that the AIC is lower for the ARIMA model, and decide to use it for forecasting.
  That's not a valid decision. So, what do I do? There is already an option available in JMP if your data is long enough
  to have a holdback set: you can use the Forecast on Holdback option when launching the Time Series platform.
  This fits all of the time series models you request to the training set. Then, instead of sorting the models by AIC, the Model Comparison window
  sorts them by the root mean squared error (RMSE) of the holdback set, and you would choose the model that minimizes the holdback RMSE.
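Unlike AIC, holdback RMSE is computed on the same observations for every model, so it can rank models from different classes together. A minimal sketch of that ranking, assuming each model's holdback forecasts are already available; the model names and numbers are placeholders, not results from the talk.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def rank_by_holdback_rmse(holdback_actual, forecasts_by_model):
    """Return model names sorted from smallest to largest holdback RMSE."""
    scores = {name: rmse(holdback_actual, fc) for name, fc in forecasts_by_model.items()}
    return sorted(scores, key=scores.get), scores


# Toy numbers for illustration only; "model_a" and "model_b" are hypothetical
# and may belong to different model classes.
holdback = [10.0, 12.0, 11.0]
ranking, scores = rank_by_holdback_rmse(
    holdback,
    {"model_a": [10.0, 11.0, 11.0], "model_b": [9.0, 13.0, 13.0]},
)
print(ranking[0])  # model_a
```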
  However, if your data is not long enough to use a holdback set, we can't use the Forecast on Holdback option, and we need another method to help us choose between an ARIMA model and a State Space Smoothing model.
  This is why I made the add-in. It helps you choose by using something called time series cross-validation.
  Time series cross-validation breaks your time series into multiple training and test sets, where each successive group expands the training set by one row.
  I'm going to give a detailed example of time series cross-validation momentarily, but I'd like to point out that I didn't invent this method. If you'd like to learn more about it
  after viewing this talk, a resource that I would recommend is "Forecasting: Principles and Practice" by Dr. Rob Hyndman.
  But let's look at an example of time series crossvalidation.
  Let's say I have data for the North American box office results for January through July of 2021 because I'm a movie fan.
  I'd like to forecast the next six weeks at the box office using either an ARIMA model or a State Space Smoothing model, and to decide which forecast to use, I'd like to use time series cross-validation.
  As I mentioned, time series cross-validation subsets your data into groups of training and test sets, where the training set expands by one row in each successive group.
  So I need to decide how many rows I want in my initial training set.
  By default, the add-in gives you six time series cross-validation groups, so it suggests a default number of rows for your initial training set, but you're welcome to change it if you want to.
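The talk does not show the add-in's exact default rule, so the sketch below is an assumption. With n rows, an initial training size m, and a forecast length h, an expanding window yields n - m - h + 1 groups, so six groups require m = n - h - 5. A 28-row series is consistent with the box office example (the last group trains on rows 1 through 22 and tests on the final six rows), and it gives the 17 initial rows used below. Both function names are hypothetical.

```python
def number_of_groups(n_rows, initial_rows, horizon):
    """Count expanding-window groups: the training set grows by one row per group."""
    return max(0, n_rows - initial_rows - horizon + 1)


def suggested_initial_rows(n_rows, horizon, n_groups=6):
    """Invert number_of_groups to get an initial training size (assumed rule)."""
    initial = n_rows - horizon - n_groups + 1
    if initial < 1:
        raise ValueError("series too short for the requested groups and horizon")
    return initial


print(suggested_initial_rows(28, 6))  # 17
print(number_of_groups(28, 17, 6))    # 6
```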
  I'm going to keep the default of 17 initial rows in the training set for this example.
  When you launch the time series cross-validation add-in, it gives you a standard time series report, but it also appends an outline box with time series cross-validation controls at the bottom of the report.
  So here, you have to specify the number of initial rows, which is the number of rows in your first training set, the forecast length you're interested in, and the period if you're going to fit seasonal models.
  I'm going to keep the default of 17 rows in the initial training set, which gives me six different time series cross-validation groups.
  I'll use a forecast length of six because I want to forecast the next six weeks at the box office. I'm going to keep the period set to missing because I'm not going to fit seasonal models for this particular data.
  So, let's explore the first time series cross-validation group.
  You can see that I have 17 rows in my first training set and the next six rows are my test set.
  The add-in fits whatever ARIMA models I ask for to the training set, and it fits the recommended State Space Smoothing models to the training set.
  It selects the best model of each class by AIC, uses the two selected models to forecast into the test set, and evaluates the forecasting error.
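The per-group step can be sketched as follows: AIC is compared only within a class, and the two winners are then compared through their errors on the same test rows. This is a minimal sketch, assuming each candidate has already been fit to the group's training rows; the candidate dictionaries, AIC values, and forecasts are hypothetical placeholders, and no actual model fitting happens here.

```python
def best_by_aic(candidates):
    """Return the candidate with the smallest AIC within one model class."""
    return min(candidates, key=lambda c: c["aic"])


def evaluate_group(test_actual, candidates_by_class):
    """For one CV group: pick each class's AIC winner, then store its test-set errors.

    AIC is only compared within a class; classes are compared later
    through their forecast errors on the same test rows.
    """
    errors = {}
    for model_class, candidates in candidates_by_class.items():
        best = best_by_aic(candidates)
        errors[model_class] = [a - f for a, f in zip(test_actual, best["forecast"])]
    return errors


# Placeholder candidates: the AIC values and forecasts are made up and would
# come from fitting each model to the group's training rows.
group_errors = evaluate_group(
    [5.0, 6.0],
    {
        "ARIMA": [{"aic": 10.0, "forecast": [5.0, 5.0]}, {"aic": 12.0, "forecast": [0.0, 0.0]}],
        "State Space Smoothing": [{"aic": 3.0, "forecast": [6.0, 6.0]}],
    },
)
print(group_errors)
```

Note that the State Space Smoothing candidate's smaller AIC plays no role in the comparison between classes; only the stored errors do.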
  The second time series cross-validation group expands the training set by one row, so now my training set is
  composed of rows 1 through 18, and the next six rows are my test set. Then I repeat the process from the first time series cross-validation group:
  I fit the ARIMA models requested by the user and the recommended State Space Smoothing models, select the best model of each class by AIC, forecast into the test set, and store the forecasting error.
  I continue doing this until I've used all of my data. The final time series cross-validation group for this example uses rows 1 through 22 for the training set and the final six rows as the test set.
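The expanding-window groups described above can be generated as index lists. A minimal sketch, assuming a 28-row series (inferred from the final group using rows 1 through 22 plus six test rows); the generator name is hypothetical, and the indices are 0-based while the talk counts rows from 1.

```python
def expanding_window_splits(n_rows, initial_rows, horizon):
    """Yield (train_rows, test_rows) as 0-based indices for each CV group.

    Group k trains on the first initial_rows + k rows and tests on the
    next `horizon` rows; groups stop once the test set would run past the data.
    """
    for train_end in range(initial_rows, n_rows - horizon + 1):
        yield list(range(train_end)), list(range(train_end, train_end + horizon))


splits = list(expanding_window_splits(n_rows=28, initial_rows=17, horizon=6))
first_train, first_test = splits[0]
last_train, last_test = splits[-1]
print(len(splits))                            # 6
print(len(first_train), first_test[0] + 1)    # 17 18  (1-based first test row)
print(last_train[-1] + 1, last_test[-1] + 1)  # 22 28  (1-based rows)
```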
  So I mentioned that I'm storing the forecasting error for every time series crossvalidation group.
  The add-in reports the overall RMSE, MAE, and MAPE for the two model classes, and it gives you a plot of the RMSE for every time series cross-validation group.
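For reference, RMSE, MAE, and MAPE can be computed from the stored errors as below. This is a sketch under one stated assumption: the overall figures pool every test-set error across groups, which may not be exactly how the add-in aggregates them. The per-group RMSE corresponds to what a plot by cross-validation group would show, and the toy numbers are illustrative only.

```python
import math


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent; undefined if any actual is 0."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def summarize(groups):
    """groups: list of (actual, forecast) pairs, one pair per CV group.

    Overall metrics pool every test-set error across groups (an assumption).
    """
    actual = [a for acts, _ in groups for a in acts]
    forecast = [f for _, fcs in groups for f in fcs]
    errors = [a - f for a, f in zip(actual, forecast)]
    return {
        "RMSE": rmse(errors),
        "MAE": mae(errors),
        "MAPE": mape(actual, forecast),
        "RMSE by group": [rmse([a - f for a, f in zip(acts, fcs)]) for acts, fcs in groups],
    }


summary = summarize([([10.0, 20.0], [12.0, 18.0]), ([10.0, 10.0], [10.0, 10.0])])
print(summary["RMSE by group"])  # [2.0, 0.0]
```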
  You can use this table and this plot to help you decide whether to use an ARIMA model or a State Space Smoothing model for forecasting.
  Let's take a look at the add-in in action with this data.
  I'll open my box office data, which I got from Box Office Mojo, and run this script, which launches the Time Series platform for this data because I just want to take a look at the time series.
  I'm forecasting the average gross at the box office by week, and I can see that as the weeks go on, the average gross and its variance get larger, so I'd like to apply a log transformation
  to the average gross. I could do that by creating a log formula column for the average gross, but
  it's not necessary: the add-in gives you the ability to do a Box Cox transformation, just like the standard Time Series platform. So I'll launch my add-in and click past the disclaimer. I want to forecast the average gross at the box office by week for the next six weeks,
  and I want to apply a log transformation to the average gross, so I'll use a Box Cox transformation with lambda equal to zero.
  And I will click OK.
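Setting lambda to zero gives a log transformation because the Box Cox family uses the natural log as its lambda = 0 case. A minimal sketch of the standard textbook form and its inverse follows; JMP may use a scaled variant, and the function names here are hypothetical.

```python
import math


def box_cox(y, lam):
    """Standard Box-Cox transform for positive y; lam == 0 is the natural log."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


def inverse_box_cox(z, lam):
    """Map a transformed value (for example, a forecast) back to the original scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1) ** (1 / lam)


print(box_cox(math.e, 0))                       # 1.0
print(inverse_box_cox(box_cox(4.0, 0.5), 0.5))  # 4.0
```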
  This launches the standard time series report for the Box Cox transformed response. I could fit any of the time series models available in JMP from this red triangle menu.
  But instead, I'm going to go to the bottom of the report and use time series cross-validation to get a recommendation as to whether I should fit an ARIMA model or a State Space Smoothing model.
  I'll keep the initial rows at 17, because that gives me the six time series cross-validation groups I showed in the previous example.
  I'll keep a forecast length of six because I want to forecast the next six weeks at the box office and I'm going to keep the period equal to missing for this example.
  I'll click Run, and now I have to specify which ARIMA models I'd like to fit in every time series cross-validation group. I'd like to use nonseasonal differencing of order one, but I'll let the autoregressive and moving average orders be either zero or one.
  When I click Estimate, the add-in fits these four ARIMA models and the recommended State Space Smoothing models in every time series cross-validation group.
  While it runs, the add-in is expanding this data set into my six time series cross-validation groups, fitting the requested models to the training set, forecasting into the test set, and storing the forecasting error.
  I'll scroll down and view the results. You can see from the table that overall, the ARIMA models produced lower RMSE, MAE, and MAPE than the State Space Smoothing models. And if I look at the plot,
  I can see that the ARIMA models produced lower RMSE in every time series cross-validation group.
  So this add-in would recommend using an ARIMA model instead of a State Space Smoothing model to create forecasts for this data. I can go up to the red triangle menu and fit an ARIMA model with the ARIMA Model Group option.
  Let's see another example of the add-in. For this next example, I'm going to use a subset of the M3C Quarterly sample data.
  The M3C Quarterly sample data contains a little over 100 different time series from the M3 forecasting competition. The series are all quarterly.
  I think this is an interesting data set because it has so many different time series in it. However, for time reasons, I took a random subset of three of the series from the table for this talk.
  So I will launch my add-in.
  And I'll set up the launcher: my Y column is my time series, and Time is the time ID. Since I have data for multiple series, I need to use the By option for the Series column. Let's again say that I want to forecast the next two years, so I'll enter eight forecast periods and click OK.
  This brings up a time series report for each series in my table, as you would expect, along with time series cross-validation controls for each series.
  So I could use time series cross-validation for each of these series to get a recommendation on whether to use an ARIMA model or a State Space Smoothing model for forecasting.
  In the interest of time, I'll only do it for the first series. I'm going to use a period of four because I'd like to fit seasonal models for this series, and I'll click Run.
  I'll use a nonseasonal differencing order of one, but I'll let the other autoregressive and moving average orders and the seasonal differencing order range from zero to one.
  I'll set the observations per period to four because, again, this is quarterly data. When I click Estimate, the add-in fits these 32 ARIMA models
  in each of the time series cross-validation groups. For series N 691, the add-in
  is expanding the data into six different time series cross-validation groups, fitting my requested ARIMA models and the recommended State Space Smoothing models to the training set,
  selecting the best model of each class by AIC, forecasting into the test set, and storing the forecasting error.
  I'll give it a second to finish.
  Here we go.
  Now I can scroll down and view the results. As you can see from the table, the ARIMA models produced lower RMSE, MAE, and MAPE than the State Space Smoothing models.
  And from the plot, I can see that in all but the last time series cross-validation group, the ARIMA models had lower RMSE
  than the State Space Smoothing models, and the two were about the same in the last group.
  So once again, time series cross-validation would recommend using an ARIMA model to create forecasts for series N 691. You can go to the other series and repeat the process, and for those it might recommend a State Space Smoothing model.
  I'll end the demo by showing another feature of the add-in that is actually not related to time series cross-validation.
  I've also given you the ability to use a test set to evaluate forecasts with the add-in, and I'll show this using the Steel Shipments sample data again. I'll launch my add-in. I want to forecast the steel shipments by date.
  Let's say I want to forecast the next two years, so I'll use 24 forecast periods. As I mentioned before, if your series is long enough for a holdback set, you can use the Forecast on Holdback option
  to select models. Since I'm doing that, I don't really need time series cross-validation, so I can use the training and test set option
  to have an independent test set to evaluate the model that I choose. I'll set the length of my test set to 12.
  This means the add-in takes the last 12 rows of my data and keeps them completely out of the analysis, so that I can use them to evaluate how my forecasts performed. I'll click OK.
  You can see that the add-in has added another column to the table. It's an indicator column for the test set, equal to one if the row is included in the test set and zero if it's not.
  And it's only going to fit these time series models to rows where the test set is equal to zero. This allows me to preserve independence of the test set.
  So, let's fit some ARIMA models to this data. I'll use the same settings as at the start of this talk, with seasonal and nonseasonal differencing orders of one, but I'll let every other order range from zero to one.
  I'll click Estimate to fit these 16 different seasonal ARIMA models, and I'll also fit the recommended State Space Smoothing models.
  I'll scroll down to the Model Comparison window, and I can see that the model that minimized the root mean squared error in the holdback set was the State Space Smoothing model with additive error, no trend, and additive seasonality.
  Let's say that I'm also just interested in how the best performing ARIMA model would perform in the test set. I can select it as well.
  And I'll remove the other models that I'm no longer considering for forecasting.
  So, again, I've used the Forecast on Holdback option, so I would choose this State Space Smoothing model for forecasting, but I'm curious how the ARIMA model would do in the test set.
  So I can scroll down to my test set analysis outline. Since I used the Forecast on Holdback option, this refits both of the selected models to the combined training and holdback data to get updated parameter estimates.
  It then forecasts into the test set and evaluates the forecasts against the actual test set data.
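The three time-ordered blocks involved here (training, holdback, and test) can be sketched as index ranges, with the selected models refit on training plus holdback before forecasting the test rows. The talk does not state the holdback length, so the 12-row holdback below is a hypothetical choice, as is the 96-row series length and the function name.

```python
def partition_rows(n_rows, holdback_length, test_length):
    """Split 0-based row indices into training, holdback, and test blocks, in time order."""
    test_start = n_rows - test_length
    holdback_start = test_start - holdback_length
    if holdback_start < 1 or holdback_length < 1 or test_length < 1:
        raise ValueError("not enough rows for the requested holdback and test lengths")
    train = list(range(holdback_start))
    holdback = list(range(holdback_start, test_start))
    test = list(range(test_start, n_rows))
    return train, holdback, test


# Hypothetical lengths: 96 monthly rows, a 12-row holdback, and a 12-row test set.
train, holdback, test = partition_rows(96, 12, 12)
refit_rows = train + holdback  # selected models are refit here, then forecast `test`
print(len(train), len(holdback), len(test), len(refit_rows))  # 72 12 12 84
```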
  So, I'll click Run to perform that analysis. And, I can see I have a table of the test set RMSE, MAE, and MAPE, and a plot of the test set compared to the forecast.
  I can see that not only did the State Space Smoothing Model perform better in the holdback set it also performed better in the test set. So this test set analysis option just gives you a tool to evaluate how your model fit performed.
  So, just to sum up: the add-in allows you to perform time series cross-validation.
  I think this add-in could be useful to you if you have time series data that's not long enough to use the Forecast on Holdback option in JMP, but you still want to consider different model classes.
  The add-in also gives you the ability to use a test set to evaluate forecasts. And, just as a disclaimer, as with all model selection measures for time series data, the add-in does not guarantee more accurate forecasts.
  Thank you very much for your time. I hope you try out the add-in, and I hope you enjoy the rest of Discovery. Thank you very much.