ARIMA-based forecasting of the dynamics of confirmed Covid-19 cases for selected European countries

Research background: On 11 March 2020, the World Health Organization (WHO) characterized the Covid-19 epidemic as a global pandemic. The rapid growth of the epidemic led to the introduction of non-pharmaceutical countermeasures. Forecasting Covid-19 prevalence is an essential element of the actions undertaken by authorities. Purpose of the article: The article aims to assess the usefulness of the Autoregressive Integrated Moving Average (ARIMA) model for predicting the dynamics of Covid-19 incidence at different stages of the epidemic: from the first phase of growth, through the maximum daily incidence, to the phase of the epidemic's extinction. Methods: ARIMA(p,d,q) models are used to predict the dynamics of virus spread in many diseases. Model estimates, forecasts, and the accuracy of the forecasts are presented in this paper. Findings & Value added: Using the ARIMA(1,2,0) model to forecast the dynamics of Covid-19 cases at each stage of the epidemic provides a way of evaluating the effect of the implemented non-pharmaceutical countermeasures on the dynamics of the epidemic.


Introduction
Covid-19 has infected over 7 million people since its appearance, covering 114 countries (as of 8 June 2020). The epidemic began in December 2019 in China, and the first lockdown was introduced on 23 January 2020 in Hubei province. Efficient short-term forecasting models are needed to predict the number of future cases. In this context, it is essential to develop strategic planning methods in the public health system to avoid deaths and to introduce non-pharmaceutical countermeasures that reduce infection, such as ordered school closures, case-based measures, bans on public events, the encouragement of social distancing, and lockdowns. In Europe, many countries introduced their first non-pharmaceutical countermeasures, including ordered lockdowns, between 11 and 24 March 2020. These countermeasures aimed to reduce the number of people infected with Covid-19 and to slow the dynamics of infection, allowing health care services to operate effectively. Projections of disease rates support recommendations on when to introduce government interventions and when to withdraw them. This issue has been widely discussed in previous papers (Flaxman et al., 2020, 2020a; Guzzetta et al., 2020; Rogers, 2020; Patwardham, 2020; Mena, 2020; Marsland III & Mehta, 2020; Pai, 2020; Azad & Poonia, 2020; Iacus et al., 2020; Kumar, P. et al., 2020; de Wolff et al., 2020; Radiom & Berret, 2020; Ainslie et al., 2020). Among European countries, restrictions were introduced very early in some, such as Switzerland (5 March), and much later in others, such as Russia; some countries, such as Belarus, introduced no restrictions at all. Previous studies (Grassly et al., 2020; de Wolff et al., 2020) pointed out the critical role of testing strategies, since different countries have adopted different testing models. This fact should also be taken into account when considering disease dynamics.
The aim of this article is to evaluate the usefulness of the ARIMA(1,2,0) model for predicting the dynamics of Covid-19 cases at each stage of the epidemic, i.e., at the first stage of development, at the stage of reaching the maximum number of daily cases, and at the stage of the epidemic's extinction. The choice of this specification follows from the properties of the cumulative confirmed-case series and was also confirmed by the model's diagnostic measures. The remainder of this paper is organized as follows. In the first section, the literature review presents examples of the use of ARIMA models and their modifications for forecasting epidemics. The research methodology section describes the procedure for selecting the parameters of the ARIMA(p,d,q) model using the ADF test and the AIC information criterion, discusses forecasting errors, and characterizes the data. The Results section evaluates the usefulness of the ARIMA(1,2,0) model for forecasting disease dynamics for 32 European countries, using six sample end dates and a 7-day forecast horizon. The last section concludes the study.
A comparative analysis of forecast accuracy indicated the advantage of ARIMA models over the wavelet neural network and the support vector machine (Zhang et al., 2019). Singh et al. (2020) indicated the advantages of a hybrid model combining discrete wavelet decomposition and ARIMA. Similarly, Chakraborty and Ghosh (2020) used a modified ARIMA-WBF model. Fong et al. (2020) compared 11 machine learning (deep learning) methods with ARIMA for forecasting epidemics and found no clear advantage for any of the machine learning methods. Similar machine learning studies can be found in previous papers (Tuli et al., 2020; Magri & Doan, 2020). In contrast, Chimmula and Zhang (2020) point to the greater usefulness of the LSTM approach compared to ARIMA models, while noting that ARIMA has been used for many years, whereas only the first attempts have been made to use the LSTM approach. LSTMs were also used by Yan et al. (2020) and Yudistira (2020). Wu et al. (2020) presented a dynamic model of the spread of the epidemic from Wuhan (China) to other Chinese cities and beyond China. Spatial dynamics forecasting was presented by Wang et al. (2020) for the USA, Azad and Poonia (2020) for India, and Kevrekidis et al. (2020) for Greece and Andalusia (Spain). Bandt (2020) presented simple statistical indicators to assess the turning point of an epidemic. Radiom and Berret (2020) identified three types of model, one for each phase of the epidemic. A Bayesian approach is presented by Calvetti et al. (2020) for selected US counties.
Real-time systems that re-estimate models and forecasts daily for all countries of the world are available on many websites; one example is described by Tarassow (2020). The publications mentioned above highlight the importance of two elements: the accuracy of forecasts and the simplicity of the models used.

Research methodology
The ARIMA(p,d,q) model is a classic time series model determined by three parameters: p and q are the lag orders of the AR(p) and MA(q) components, respectively, and d is the order of differencing (Box et al., 2015). The ARIMA(p,d,q) model has the form:

$$\Big(1-\sum_{i=1}^{p}\varphi_i L^i\Big)(1-L)^d Y_t = \alpha + \Big(1+\sum_{j=1}^{q}\theta_j L^j\Big)\varepsilon_t, \qquad (1)$$

where $L$ is the lag operator ($L Y_t = Y_{t-1}$), $\alpha$ is the constant term, $\varphi_i$ and $\theta_j$ are the AR and MA coefficients, and $\varepsilon_t$ is white noise. The differencing order of the process $Y_t$ was set at $d = 2$, for two reasons. First, the analyzed series is the cumulative number of confirmed Covid-19 cases, whose first differences $\Delta Y_t = Y_t - Y_{t-1}$ give the daily number of infections. Second, the ADF test results (Dickey & Fuller, 1981) indicated that $\Delta Y_t$ is nonstationary, which calls for a second difference. Moreover, for the $\Delta^2 Y_t$ process, a correlogram (ACF and PACF) was estimated to make an initial assessment of the lag orders of the AR(p) and MA(q) polynomials. The final choice of parameters for $d = 2$ and $p, q \in \{0,1,2,3\}$ was made using the Akaike Information Criterion (AIC). All calculations were done in the open-source software gretl (Cottrell & Lucchetti, 2020; Baiocchi & Distaso, 2003).
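As an illustration of the differencing step and the correlogram, the following Python sketch builds $\Delta Y_t$ and $\Delta^2 Y_t$ from a cumulative series and computes sample autocorrelations and partial autocorrelations. It is not the gretl procedure used in the paper; the pure-Python estimators (the usual sample ACF, the PACF via the Durbin-Levinson recursion, and the approximate ±1.96/√n band) and all function names are assumptions made for illustration.

```python
import math


def diff(series):
    """First difference: element t is series[t+1] - series[t]."""
    return [b - a for a, b in zip(series, series[1:])]


def acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag (lag-0 sum of squares as denominator)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def pacf(r):
    """Partial autocorrelations from r = [r_1, ..., r_K] via Durbin-Levinson."""
    out, phi = [], []
    for k in range(1, len(r) + 1):
        if k == 1:
            phi = [r[0]]
        else:
            num = r[k - 1] - sum(phi[j] * r[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(phi[j] * r[j] for j in range(k - 1))
            a = num / den
            # phi_{k,j} = phi_{k-1,j} - a * phi_{k-1,k-j}, then append phi_{k,k} = a.
            phi = [phi[j] - a * phi[k - 2 - j] for j in range(k - 1)] + [a]
        out.append(phi[-1])
    return out


def significance_band(n):
    """Approximate 95% band (+/- 1.96 / sqrt(n)) for white-noise autocorrelations."""
    return 1.96 / math.sqrt(n)


# Usage: y is a cumulative case series (Y_t).
# daily = diff(y)            # Delta Y_t, daily new cases
# d2 = diff(daily)           # Delta^2 Y_t
# r = acf(d2, 10); partial = pacf(r); band = significance_band(len(d2))
```

Lags whose ACF or PACF values fall outside the band only suggest candidate orders; in the procedure described above, the final choice is left to the AIC.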

Data source
A broad overview of Covid-19 databases is presented by Alamo et al. (2020). The recommended database is maintained by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). It contains information on confirmed cases, deaths, and recovered cases for more than 250 countries/provinces since 22 January 2020, updated daily. The JHU CSSE database is made available through DBnomics (db.nomics.world), which enables automatic data downloading in gretl. Figure 1 presents the cumulative number of confirmed cases ($Y_t$) and Figure 2 presents the daily cases ($\Delta Y_t$) of Covid-19 for selected European countries for the period 1 February 2020 to 24 May 2020.

Results
The 32 European countries with the highest infection levels were selected for analysis (the full list is presented in Table 1). The first confirmed cases of Covid-19 were reported in France (24 Jan), Finland (29 Jan), and Italy, the UK, Sweden, and Russia (31 Jan), and the last in Bulgaria (8 Mar) and Turkey (11 Mar). For each country, six ARIMA models were estimated, differing in their sample periods. Each sample started on the date of the first confirmed case and ended on 15 Mar, 29 Mar, 12 Apr, 26 Apr, 10 May, or 24 May, i.e., each successive sample was extended by 14 days. The number of observations (T) in the first sample, from the beginning of the epidemic in a given country until 15 Mar, ranged from T = 5 days (Turkey) to T = 52 days (France). Figure 1 presents the cumulative number of confirmed cases of Covid-19 for selected European countries in the period from 1 Feb 2020 to 24 May 2020, while Figure 2 shows the first differences.
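The expanding-window design can be sketched in a few lines of Python. The function and constant names are hypothetical; counting T inclusively (both the first-case date and the end date) is an assumption, chosen because it reproduces the values T = 52 for France and T = 5 for Turkey reported above. Differencing twice means that the model for $\Delta^2 Y_t$ effectively uses two fewer observations than T.

```python
from datetime import date, timedelta

# End dates of the six samples given in the text: 15 Mar 2020, then every
# 14 days up to 24 May 2020.
SAMPLE_ENDS = [date(2020, 3, 15) + timedelta(days=14 * k) for k in range(6)]


def expanding_samples(first_case):
    """Return (start, end, T) for every sample; T counts both start and end days."""
    return [(first_case, end, (end - first_case).days + 1)
            for end in SAMPLE_ENDS if end >= first_case]


# Usage (first-case dates taken from the text):
# expanding_samples(date(2020, 1, 24))[0]   # France, first sample, T = 52
# expanding_samples(date(2020, 3, 11))[0]   # Turkey, first sample, T = 5
```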
For the vast majority (over 80%) of the estimated ARIMA(p,2,q) models with p, q ∈ {0,1,2,3}, the minimum value of the AIC pointed to the ARIMA(1,2,0) model. Other parameters (p ≠ 1, q ≠ 0) were selected in models with small samples or outliers, or in situations where the forecast cumulative number of cases decreased, which contradicts epidemic theory (this usually occurred when the epidemic was dying out). This also applies to the model for France for the period 24 Jan-12 Apr, where the number of confirmed cases was corrected downward by over 25,000 on 5 Apr. The above automatic selection of parameters is consistent with the results of previous studies (Benvenuto et al., 2020; Ceylan, 2020; Chintalapudi et al., 2020; Tarassow, 2020) and is the basis for further analyses and forecasts. For each model, a 7-day forecast was calculated. The forecasts and their errors are presented in Figure 3. Table 1 presents the estimated models with the sample (Start, End), the number of observations (T), the coefficient estimates (const (α), phi_1 (φ1)), the standard error (sigma), R-squared (R2), and the following forecast errors: the mean absolute percentage forecast error for horizons of one day and 7 days (MAPE(1) and MAPE(7)) and the mean forecast error over 7 days (ME(7)).
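The three accuracy measures reported in Table 1 can be written as a short Python function. The sketch assumes that the forecast error is defined as forecast minus actual, which matches the sign convention used in the discussion of results (underestimated forecasts give ME < 0); the exact formulas behind Table 1 are not stated here, so this is an illustration rather than the authors' code, and the function name is hypothetical.

```python
def forecast_errors(actual, forecast):
    """Return (MAPE(1), MAPE(7), ME(7)) for a 7-day forecast of cumulative cases.

    MAPE values are in percent. Assumption: the forecast error is
    forecast - actual, so underestimated forecasts give ME(7) < 0.
    """
    if len(actual) != 7 or len(forecast) != 7:
        raise ValueError("expected 7 actual and 7 forecast values")
    # Absolute percentage error for each horizon h = 1..7.
    ape = [abs(f - a) * 100 / a for a, f in zip(actual, forecast)]
    mape1 = ape[0]                      # one-day horizon
    mape7 = sum(ape) / len(ape)         # average over the 7-day horizon
    me7 = sum(f - a for a, f in zip(actual, forecast)) / len(actual)
    return mape1, mape7, me7
```

Because the series are cumulative counts, the actual values in the denominator are strictly positive once the epidemic has started, so the percentage errors are well defined.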
An essential feature of the ARIMA(1,2,0) model is that it describes the $\Delta^2 Y_t$ process with a first-order autoregressive model. In all estimated equations, φ1 satisfies the stationarity condition of the AR process ($|\varphi_1| < 1$). For most models, φ1 is negative, which means that the forecasts converge to the expected value of the process through damped oscillations that alternate in sign from one day to the next. The constant term (const) has a meaningful interpretation: its value indicates the daily increase in new cases. Comparing this parameter across successive samples reveals the direction of the dynamics in the number of new cases. For example, the constant terms for the following countries are as follows:
- For Poland (POL), Ukraine (UKR), and Belarus (BLR), the value of the const parameter indicates that there has not yet been a decrease in the number of cases, i.e., the epidemic will be prolonged;
- For the United Kingdom (GBR) and Turkey (TUR), the value of the const parameter is high but lower than in previous sub-periods;
- For Russia (RUS), the value of the const parameter is the highest among European countries, and the increase in the number of cases is significant.
The evaluation of forecast accuracy is useful in two respects: it helps governments determine the effectiveness of non-pharmaceutical countermeasures, and it indicates the current position on the disease curve. During the initial period of the epidemic, when a rapidly growing number of cases is observed, the forecasts are underestimated (ME < 0) and the forecast errors are high (over 5-10%). In the extinction stage of the epidemic, the forecast errors are lower (less than 2%), while ME > 0 (overestimated forecasts).
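The mechanics described above can be illustrated with a minimal Python sketch of ARIMA(1,2,0): $\Delta^2 Y_t = \alpha + \varphi_1 \Delta^2 Y_{t-1} + \varepsilon_t$ is fitted by conditional least squares (an ordinary regression of $\Delta^2 Y_t$ on $\Delta^2 Y_{t-1}$), and the forecasts are integrated twice to recover daily and cumulative cases. This is a swapped-in estimator: gretl's default exact maximum likelihood estimates will generally differ somewhat, and software packages differ in whether the reported constant is the intercept α or the process mean α/(1−φ1). The function names are hypothetical.

```python
def fit_ar1_cls(x):
    """Conditional least squares for x_t = alpha + phi1 * x_{t-1} + e_t.

    x is the Delta^2 Y_t series; returns (alpha, phi1).
    """
    if len(x) < 3:
        raise ValueError("need at least 3 observations of Delta^2 Y")
    lag, cur = x[:-1], x[1:]
    n = len(cur)
    mean_lag, mean_cur = sum(lag) / n, sum(cur) / n
    sxx = sum((a - mean_lag) ** 2 for a in lag)
    sxy = sum((a - mean_lag) * (b - mean_cur) for a, b in zip(lag, cur))
    phi1 = sxy / sxx
    alpha = mean_cur - phi1 * mean_lag
    return alpha, phi1


def ar1_mean(alpha, phi1):
    """Long-run mean of Delta^2 Y_t in the intercept form (requires |phi1| < 1)."""
    return alpha / (1.0 - phi1)


def forecast_arima120(y, alpha, phi1, horizon=7):
    """Forecast cumulative cases: iterate the AR(1) for Delta^2 Y, integrate twice."""
    if len(y) < 3:
        raise ValueError("need at least 3 cumulative observations")
    daily = y[-1] - y[-2]               # last observed Delta Y_T (daily cases)
    d2 = daily - (y[-2] - y[-3])        # last observed Delta^2 Y_T
    level = y[-1]
    path = []
    for _ in range(horizon):
        d2 = alpha + phi1 * d2          # forecast of Delta^2 Y
        daily = daily + d2              # forecast of daily cases (Delta Y)
        level = level + daily           # forecast of cumulative cases (Y)
        path.append(level)
    return path


# Usage with a cumulative series y:
# d2 = [y[t] - 2 * y[t - 1] + y[t - 2] for t in range(2, len(y))]
# alpha, phi1 = fit_ar1_cls(d2)
# path = forecast_arima120(y, alpha, phi1)   # 7-day cumulative forecast
```

At each step the deviation of the forecast $\Delta^2 Y$ from its long-run mean is multiplied by φ1, so with φ1 < 0 it alternates in sign while shrinking geometrically; this is the damped oscillation of the forecasts mentioned above.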
For Bulgaria (BGR), Belarus (BLR), the United Kingdom (GBR), Poland (POL), Romania (ROU), Russia (RUS), Sweden (SWE), Turkey (TUR), and Ukraine (UKR), a continuous and significant increase in the number of cases can be observed (as of 24 May 2020); that is, the extinction phase of the epidemic has not yet begun, and the epidemic will last longer.

Discussion
ARIMA(1,2,0) was used to assess the dynamics of the epidemic, although for some sub-periods in some countries the AIC indicated different parameters. An alternative set of parameters usually resulted from single outlier observations, which worsened the accuracy of forecasts (explosive forecasts). For the period of the epidemic's extinction, ARIMA models with p ∈ {2, 3} produced a decrease in the forecast cumulative number of cases, which contradicts theory, since a cumulative count cannot decrease. A similar issue appeared in the work of Perone (2020).
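The admissibility argument above can be turned into a simple screening rule, sketched below in Python. The rule and its function names are hypothetical: it takes candidate specifications with precomputed AIC values and forecast paths (for example, exported from gretl) and returns the lowest-AIC specification whose cumulative forecasts never decrease. The paper itself reports using ARIMA(1,2,0) throughout rather than such an automated rule.

```python
def cumulative_forecast_is_admissible(last_observed, path):
    """True if forecast cumulative cases never fall below the previous value."""
    prev = last_observed
    for value in path:
        if value < prev:
            return False
        prev = value
    return True


def select_specification(candidates, last_observed):
    """Hypothetical rule: lowest AIC among specifications with non-decreasing forecasts.

    candidates maps (p, q) -> (aic, forecast_path); returns (p, q) or None.
    """
    admissible = {spec: aic for spec, (aic, path) in candidates.items()
                  if cumulative_forecast_is_admissible(last_observed, path)}
    if not admissible:
        return None
    return min(admissible, key=admissible.get)
```

With this rule, a specification that wins on AIC but forecasts a falling cumulative count (as reported above for p ∈ {2, 3} in the extinction phase) is discarded in favour of the next-best admissible one.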
For some sub-periods in several countries, a weekly periodicity effect could be observed, but this is only the result of the ARIMA model's approximate fit to the time series. Therefore, the inclusion of a periodic component was rejected at the specification stage.
Some publications indicate that ARIMA models are useful only for short-term forecasts. However, the ARIMA(1,2,0) model is highly useful for assessing cumulative case dynamics and for analyzing the estimated parameters.

Conclusions
This article assessed the usefulness of the ARIMA(1,2,0) model for predicting the dynamics of Covid-19 cases at different stages of the epidemic, i.e., at the first stage of development, at the time when the maximum number of daily cases is reached, and at the stage of the epidemic's extinction. ARIMA(1,2,0) models were estimated for 32 European countries for six samples, and 7-day forecasts were made for each sample. The obtained parameter estimates can be interpreted and compared between countries and, more importantly, between different stages of the epidemic.
Further studies should examine Covid-19 forecasts obtained with the ARIMA(1,2,0) model in terms of the roles of two factors limiting the number of cases: non-pharmaceutical interventions and population testing policies. Moreover, the evaluation should also cover the impact of non-pharmaceutical interventions on the economy. The first adverse effects of the pandemic on the economy are presented in a number of previous papers (Iacus et al., 2020; Karina et al., 2020; Centeno & Marquez, 2020; Narajewski & Ziel, 2020).
By employing Covid-19 databases that are updated daily (e.g., through DBnomics), models and forecasts can be re-estimated every day, as also noted by Benvenuto et al. (2020). For this reason, ARIMA models can be viewed as a fast and straightforward system for monitoring the epidemic at national and regional levels.