Forecasting Singapore’s Retail Sales Index

Keywords: R, random walk, Holt-Winters, ETS, STL, ARIMA, dynamic regression, bootstrapping

This project is hosted on GitHub. I completed it with my group members Heena Agarwal, Gopal Agarwal, and Manasi Murali as part of the requirements for the module HE3022 Econometric Modelling and Forecasting at NTU.

The Retail Sales Index (RSI) is an important indicator of the health of an economy. In this project, we used various time series forecasting methods in R to forecast Singapore's RSI. The project consisted of visualising the dataset, fitting various models to it, diagnosing the residuals, and comparing the models. The best models were combined into an ensemble with custom weights, which was used to make predictions for the next five years starting 2020. Bootstrapping was used to add prediction intervals to the forecasts.

A major part of the project was fitting established forecasting models to the dataset: random walk, exponential smoothing (Holt-Winters, ETS, STL-ETS), and auto-ARIMA. For each fitted model, we plotted the residuals and ran diagnostics to check that the model was efficient, meaning that it had used all the available information and left no predictable structure behind. Ideally, the residuals should be white noise and normally distributed.
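The white-noise check can be illustrated with the Ljung-Box Q statistic, which pools the squared residual autocorrelations over several lags. The project itself was done in R; what follows is only a minimal Python sketch using the standard library, and the function names and the two example series are hypothetical, not taken from the project.

```python
import random


def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )


# Hypothetical residuals: one series that resembles white noise,
# and one with obvious structure left in it (alternating signs).
rng = random.Random(42)
noise_like = [rng.gauss(0, 1) for _ in range(120)]
alternating = [1.0 if t % 2 == 0 else -1.0 for t in range(40)]

print("Q, noise-like residuals: ", ljung_box_q(noise_like, 12))
print("Q, alternating residuals:", ljung_box_q(alternating, 5))
```

A large Q relative to the appropriate chi-squared quantile suggests that the residuals are still autocorrelated, so the model has not captured all the structure in the data.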

The most interesting part of the project was building regression models for the RSI. The first step was finding appropriate predictors: we selected per capita GNI, the number of residents in Singapore, the average CPF contribution rates, the total Certificate of Entitlement (COE) quota, and dummy variables for months. After downloading the datasets, we used cubic spline interpolation to convert the lower-frequency series to the frequency of the RSI data.
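To show what the interpolation step does, here is a minimal Python sketch of a natural cubic spline that turns a handful of annual observations into monthly values. The natural boundary conditions, the function name, and the annual figures are all assumptions made for illustration; the project was done in R and may have used different spline settings.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys)."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives at the knots; zero at both ends
    if n >= 2:
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        rhs = [
            6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
            for i in range(1, n)
        ]
        # Forward elimination on the tridiagonal system (Thomas algorithm).
        for k in range(1, n - 1):
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        # Back substitution.
        sol = [0.0] * (n - 1)
        sol[-1] = rhs[-1] / diag[-1]
        for k in range(n - 3, -1, -1):
            sol[k] = (rhs[k] - h[k + 1] * sol[k + 1]) / diag[k]
        m[1:n] = sol

    def evaluate(x):
        # Locate the interval containing x, clamping to the first or last one.
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        a = xs[i + 1] - x
        b = x - xs[i]
        return (
            (m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
            + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
            + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b
        )

    return evaluate


# Hypothetical annual values for a predictor, interpolated to monthly points.
years = [2015.0, 2016.0, 2017.0, 2018.0, 2019.0]
annual_values = [50.0, 52.5, 53.0, 55.5, 58.0]
spline = natural_cubic_spline(years, annual_values)
monthly = [spline(2015.0 + k / 12.0) for k in range(4 * 12 + 1)]
print(len(monthly), monthly[:3])
```

The interpolated curve passes through every annual observation, so the monthly series agrees with the published figures at the points where they exist.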

We first fit a simple linear regression model. Residual diagnostics showed that although the model did not suffer from endogeneity, its residuals were autocorrelated. We therefore moved to a dynamic regression model, in which the errors were assumed to follow an ARIMA process selected using auto-ARIMA. This removed the autocorrelation from the residuals. We also tried a dynamic regression model with ARIMA errors and lagged predictors (we hypothesized that the effect of a change in the COE quota is only felt after a few months), but the model without lagged predictors performed better.
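The idea behind regression with autocorrelated errors can be sketched with the simplest special case: one predictor and AR(1) errors, estimated by Cochrane-Orcutt iteration. This is a simplified stand-in for illustration only, not the method used in the project (which relied on R's auto-ARIMA to pick the error model), and the simulated data and function names are hypothetical.

```python
import random


def ols(x, y):
    """Intercept and slope of a one-predictor least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def cochrane_orcutt(x, y, iterations=10):
    """Estimate y = intercept + slope * x + n_t, where n_t follows an AR(1)."""
    intercept, slope = ols(x, y)
    phi = 0.0
    for _ in range(iterations):
        resid = [b - intercept - slope * a for a, b in zip(x, y)]
        # AR(1) coefficient of the regression errors.
        phi = sum(resid[t] * resid[t - 1] for t in range(1, len(resid))) / sum(
            e * e for e in resid[:-1]
        )
        # Quasi-difference both series so the transformed errors are uncorrelated.
        x_star = [x[t] - phi * x[t - 1] for t in range(1, len(x))]
        y_star = [y[t] - phi * y[t - 1] for t in range(1, len(y))]
        a_star, slope = ols(x_star, y_star)
        intercept = a_star / (1.0 - phi)
    return intercept, slope, phi


# Simulated data (hypothetical): true slope 2, AR(1) errors with phi = 0.7.
rng = random.Random(1)
n = 300
x = [rng.uniform(0, 10) for _ in range(n)]
noise = [0.0]
for _ in range(n - 1):
    noise.append(0.7 * noise[-1] + rng.gauss(0, 1))
y = [5 + 2 * a + e for a, e in zip(x, noise)]

intercept, slope, phi = cochrane_orcutt(x, y)
print(intercept, slope, phi)
```

A full dynamic regression generalises this by letting the errors follow any ARIMA process and by handling several predictors at once.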

The three best models out of the ones we tested were auto-ARIMA, the Holt-Winters multiplicative method, and the STL-ETS method. Although the dynamic regression model performed well ex-post, it did not hold its ground when predicting ex-ante, as the errors in forecasting the predictors themselves propagated to the final results. We created an ensemble model from the three best models, with custom weights to either discount or strengthen a particular model’s predictions based on how strongly the COVID-19 outliers affected that model.
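A weighted ensemble of this kind amounts to a weighted average of the individual forecasts at each horizon. The Python sketch below shows the arithmetic; the forecast values and the weights are hypothetical placeholders, not the ones used in the project.

```python
def ensemble_forecast(forecasts, weights):
    """Weighted average of several models' forecasts, horizon by horizon."""
    total = sum(weights[name] for name in forecasts)
    horizon = len(next(iter(forecasts.values())))
    return [
        sum(weights[name] * forecasts[name][h] for name in forecasts) / total
        for h in range(horizon)
    ]


# Hypothetical point forecasts from the three selected models.
forecasts = {
    "auto_arima": [95.0, 97.0, 99.0],
    "holt_winters_multiplicative": [92.0, 95.0, 98.0],
    "stl_ets": [94.0, 96.0, 97.0],
}
# Hypothetical weights: a larger weight strengthens a model's contribution.
weights = {
    "auto_arima": 0.5,
    "holt_winters_multiplicative": 0.2,
    "stl_ets": 0.3,
}

combined = ensemble_forecast(forecasts, weights)
print(combined)
```

Dividing by the total weight means the weights do not need to sum to one.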

As the predictions from the ensemble model lacked prediction intervals, we had to innovate. We used bootstrapping to generate time series similar to our dataset, and ran the ensemble model on each generated series. Percentiles of the resulting forecasts were then used as the bounds of the prediction intervals.
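The general recipe can be sketched in Python as follows: resample the series to create many plausible alternatives, forecast each one, and read the interval bounds off the percentiles. Several pieces here are assumptions made for illustration: the moving block bootstrap is applied to first differences, a naive forecaster stands in for the ensemble, and the short example series is made up. The project's actual bootstrapping procedure in R may differ.

```python
import random


def moving_block_bootstrap(values, block_size, rng):
    """Resample contiguous blocks so short-run dependence is preserved."""
    n = len(values)
    resampled = []
    while len(resampled) < n:
        start = rng.randrange(n - block_size + 1)
        resampled.extend(values[start:start + block_size])
    return resampled[:n]


def percentile(sorted_values, q):
    """Percentile with linear interpolation; q is between 0 and 100."""
    position = (len(sorted_values) - 1) * q / 100
    low = int(position)
    high = min(low + 1, len(sorted_values) - 1)
    fraction = position - low
    return sorted_values[low] * (1 - fraction) + sorted_values[high] * fraction


def naive_forecast(series, horizon):
    """Stand-in forecaster (hypothetical): repeat the last observed value."""
    return [series[-1]] * horizon


def bootstrap_interval(series, forecaster, horizon, n_boot=500,
                       block_size=4, lower=2.5, upper=97.5, seed=0):
    rng = random.Random(seed)
    # Simplifying assumption: resample the period-to-period changes.
    changes = [b - a for a, b in zip(series, series[1:])]
    paths = []
    for _ in range(n_boot):
        new_changes = moving_block_bootstrap(changes, block_size, rng)
        simulated = [series[0]]
        for change in new_changes:
            simulated.append(simulated[-1] + change)
        paths.append(forecaster(simulated, horizon))
    bounds = []
    for h in range(horizon):
        column = sorted(path[h] for path in paths)
        bounds.append((percentile(column, lower), percentile(column, upper)))
    return bounds


# Hypothetical index values.
series = [100, 102, 101, 105, 107, 104, 108, 110, 109, 113, 115, 112]
intervals = bootstrap_interval(series, naive_forecast, horizon=3)
print(intervals)
```

Any function that maps a series and a horizon to a list of forecasts can be passed in place of `naive_forecast`, including a weighted ensemble.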

Putting all of this together, we used our ensemble model to forecast RSI values for the next five years starting 2020. The results seemed promising, showing a slow but steady recovery from the effects of the COVID-19 pandemic.
