FORECASTING INFLATION IN INDONESIA USING THE AUTOREGRESSIVE INTEGRATED MOVING AVERAGE METHOD

Indonesia faces significant economic challenges, particularly inflation, which affects the economic, social, and cultural sectors. High inflation can exacerbate poverty, alter consumption patterns, and contribute to social injustice, whereas low inflation can enhance national income and stimulate economic activity. Because inflation in Indonesia fluctuates, accurate forecasts are needed to inform policy-making and economic decisions. This study forecasts inflation in Indonesia for the next eight months using the Autoregressive Integrated Moving Average (ARIMA) method. Monthly inflation data from January 2020 to April 2024, obtained from Bank Indonesia, were analyzed. The ARIMA model, which is suitable for short-term forecasting, was selected for its ability to handle trends, non-stationarity, and noise. The Augmented Dickey-Fuller (ADF) and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests were used to assess stationarity. The ADF tests indicated a unit root in the original data and in the first-differenced data, while the data became stationary after the second differencing. The KPSS test rejected stationarity for the original data and indicated trend stationarity after the first and second differencing. Ordinary Least Squares (OLS) regression on the original data revealed a significant time trend, indicating a deterministic trend. The optimal model identified was ARIMA(0,2,1) with AIC=51.81, as its residuals met the criteria for normality, independence, and zero mean. The forecasts from this model for May to December 2024 increase steadily, with inflation values of 3.02, 3.05, 3.07, 3.10, 3.12, 3.14, 3.17, and 3.19.


INTRODUCTION
Indonesia is a developing country that cannot be separated from economic problems such as inflation. Inflation refers to the broad and ongoing rise in the prices of goods and services, which results from a decline in the value of currency over a specific period (Bank Indonesia, 2023). The consequences of high inflation can affect economic, social, cultural, and other fields, for example by increasing poverty levels (Mardiatillah et al., 2021), changing people's consumption patterns (Efendi et al., 2020), and giving rise to social injustice (Mulyani, 2020). On the other hand, low inflation can increase national income and interest in working, saving, and investing (Fajri, 2022), thereby improving the economy.
Inflation is a leading indicator of a region's economic stability. Inflation in Indonesia fluctuates from year to year and can be linked to economic growth. As stated by Indonesia's president, Joko Widodo, the country has experienced a swift economic recovery, demonstrated by consistent quarterly GDP growth since the second quarter of 2021 (Indonesian Cabinet Secretariat, 2023). High economic growth can encourage inflation when aggregate demand grows faster than production capacity. High inflation in turn drives up the prices of goods and services. Inflation therefore needs to be predicted so that policy makers and economic actors can take appropriate steps to control it and create a conducive economic environment.
Future inflation figures can be predicted using time series analysis. One time series method that can be used to forecast future inflation is the Autoregressive Integrated Moving Average (ARIMA). The ARIMA method can handle data that show a linear trend through its autoregressive (AR) component (Asrirawan et al., 2022) and non-stationary data through its integrated (I) component (Qadrini et al., 2021). It can also filter random fluctuations (noise) through the moving average (MA) component (Stockhausen & Fogerty, 2007). In addition, ARIMA performs well for short-term forecasting and generally requires a minimum of 50 to 100 observations to fit the model, but it is unsuitable for long-term forecasting (Mahayana et al., 2022).
Numerous studies have explored inflation dynamics and the application of time series models in forecasting inflation. Sekine (2001) used a moving average (MA) model to forecast inflation in Japan; the study estimates an inflation function and forecasts inflation one year ahead. Tchakondo (2022) proposed a simple autoregressive (AR) model for forecasting inflation in Togo, West Africa, finding that an AR(1) model effectively forecasts inflation using annual percentage change data in the consumer price index (CPI) from 1967 to 2019. While extensive research covers the application of AR and MA models in inflation forecasting, a gap remains in integrating these components within Indonesia's unique economic environment. This study bridges this gap by combining the autoregressive, integrated, and moving average processes in the ARIMA method to forecast inflation in Indonesia for the next eight months based on past periods.

MATERIALS AND METHODS
The data used are secondary data, namely a time series of the month-to-month inflation rate in Indonesia from January 2020 to April 2024, obtained from the Bank Indonesia website, www.bi.go.id. The data were analyzed using the Autoregressive Integrated Moving Average (ARIMA) method.

Autoregressive Integrated Moving Average (ARIMA)
Autoregressive Integrated Moving Average (ARIMA) is a time series analysis method used to forecast data that move in a specific pattern. The ARIMA method consists of three main components, namely autoregressive (AR), integrated (I), and moving average (MA). The AR component considers the relationship between the observed value at the current time and the observed values at previous times. The I component transforms the data into a stationary series (constant mean and variance over time) by taking the difference between the current and previous observed values. The MA component models the relationship between the current value and the residuals (prediction errors) at previous times. In the AR model, forecasting is performed using a linear combination of past values, whereas in the MA model, it relies on a linear combination of past residuals.
In general, the ARIMA model is expressed as ARIMA(p, d, q), where p is the autoregressive (AR) order, namely the number of lags of previous data used in the model; d is the integrated order, namely the number of differencing transformations required to make the data stationary; and q is the moving average order, namely the number of previous residuals (prediction errors) used in the model. ARIMA assumes that the data, after differencing, are stationary in mean and variance. The ARIMA model can be expressed in equation form as follows:
$$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q} \tag{1}$$
where $y_t$ is the time series at time t (after d rounds of differencing), $\phi_1, \phi_2, \dots, \phi_p$ are the autoregressive (AR) coefficients, $\theta_1, \theta_2, \dots, \theta_q$ are the moving average (MA) coefficients, and $\varepsilon_t$ is the residual at time t. The values of p, d, and q can be selected by examining the ACF and PACF to determine which lags are significant and should be included in the ARIMA model. The steps in ARIMA modelling are as follows:
1. Identification of the data plot.
2. Stationarity testing using the ADF and KPSS tests. If either test shows that the data are not stationary, continue to step 3; if the data are stationary, continue to step 4.
3. Differencing until the data are stationary.
4. Identification of tentative models using ACF and PACF plots.
5. Estimation of ARIMA model parameters.
6. Residual diagnostic checking.
7. Forecasting.

Stationarity
Stationarity is the most essential requirement in time series analysis. A time series is stationary if its mean does not show a systematic upward or downward trend, its variance does not grow significantly, and its covariance (in the multivariate case) remains stable over time. A time series that is not stationary may have a unit root or a deterministic trend. A unit root refers to a stochastic process in which the impact of a shock (disruption) is permanent and does not disappear over time. Data with a unit root show a random walk pattern and long-term instability, which makes them difficult to forecast. In contrast, data without a unit root are assumed to fluctuate temporarily around a deterministic trend, which does not indicate long-term instability. Two methods that can be used to test stationarity are the Augmented Dickey-Fuller (ADF) test and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test.

Augmented Dickey-Fuller (ADF) Test
The Augmented Dickey-Fuller (ADF) test is a statistical test in time series analysis for whether a variable has a unit root or, equivalently, whether it is a non-stationary series that follows a random walk. The Dickey-Fuller (1979) test involves fitting the model:
$$\Delta y_t = \alpha + \gamma y_{t-1} + \beta t + e_t \tag{2}$$
where $y_t$ is the value of the time series at time t with $y_0 = 0$, $\alpha$ is the constant, $\gamma$ is the coefficient of the lagged level of the time series ($y_{t-1}$), $\beta$ is the coefficient of the time trend, and $e_t \sim \mathrm{NID}(0, \sigma_e^2)$. However, the regression is likely to be disturbed by serial correlation. To control for this, the augmented Dickey-Fuller test instead fits a model of the form (Dickey & Fuller, 1979):
$$\Delta y_t = \alpha + \gamma y_{t-1} + \beta t + \sum_{j=1}^{k} \delta_j \Delta y_{t-j} + e_t \tag{3}$$
where k is the lag order of the autoregressive process and $\delta_j$ are the coefficients of the lagged differences. The ADF test incorporates three types of linear regression models, as in Table 1. They differ in whether the null hypothesis includes a drift term and whether the regression used to obtain the test statistic includes a constant term ($\alpha$) and a time trend ($\beta$). The ADF test statistic is defined as:
$$\mathrm{ADF} = \frac{\hat{\gamma}}{\mathrm{se}(\hat{\gamma})} \tag{4}$$
where $\hat{\gamma}$ is the estimated coefficient of the lagged level in the test regression and $\mathrm{se}(\hat{\gamma})$ is its standard error for each type of linear model. The statistic in equation (4) is compared with the corresponding critical value for the Dickey-Fuller test. If the calculated test statistic is lower (more negative) than the critical value, the null hypothesis of $\gamma = 0$ is rejected, indicating the absence of a unit root.
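As a minimal sketch of the idea behind the test statistic, the snippet below fits the simplest Dickey-Fuller regression (a constant, no time trend, and no lagged differences) by ordinary least squares and returns the ratio of the lagged-level coefficient to its standard error. The function name `dickey_fuller_t` and the sample series are hypothetical and only illustrative; this is not the procedure used to produce the results in this study, and the resulting ratio must be compared with Dickey-Fuller critical values rather than Student t critical values.

```python
import math


def dickey_fuller_t(series):
    """Return (gamma, t_ratio) for the regression
    delta_y_t = alpha + gamma * y_{t-1} + e_t  (no trend, no lagged differences)."""
    values = list(series)
    lagged = values[:-1]                                  # y_{t-1}
    delta = [b - a for a, b in zip(values, values[1:])]   # y_t - y_{t-1}
    n = len(delta)
    mean_x = sum(lagged) / n
    mean_d = sum(delta) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (d - mean_d) for x, d in zip(lagged, delta))
    gamma = sxy / sxx                                     # slope on the lagged level
    alpha = mean_d - gamma * mean_x                       # intercept
    rss = sum((d - alpha - gamma * x) ** 2 for x, d in zip(lagged, delta))
    se_gamma = math.sqrt(rss / (n - 2) / sxx)             # OLS standard error of the slope
    return gamma, gamma / se_gamma


# Illustrative series only (not the Bank Indonesia data).
gamma_hat, t_ratio = dickey_fuller_t([1, 3, 2, 4, 3, 5])
print(gamma_hat, t_ratio)
```

A strongly negative ratio speaks against a unit root. In practice, the augmented version adds lagged differences and, where appropriate, a time trend to the regression.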

Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test
The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test evaluates the null hypothesis that a series is stationary around a deterministic trend (trend-stationary) against the alternative hypothesis that the series possesses a unit root (non-stationary). Let $y_t$, $t = 1, 2, \dots, T$, be the observations whose stationarity is to be tested. Assume the series can be expressed as the sum of a deterministic trend, a random walk, and a stationary error as follows (Kwiatkowski et al., 1992):
$$y_t = \xi t + r_t + \varepsilon_t \tag{5}$$
where $r_t = r_{t-1} + u_t$ is a random walk with $r_0$ fixed and $u_t \sim \mathrm{iid}(0, \sigma_u^2)$. The stationarity hypothesis is that the random walk has zero variance, that is, $\sigma_u^2 = 0$. Since $\varepsilon_t$ is assumed to be stationary, under the null hypothesis $y_t$ is trend-stationary. Setting $\xi = 0$ in equation (5) gives the special case in which, under the null hypothesis, $y_t$ is stationary around a level ($r_0$) rather than around a trend. The KPSS test statistic is the Lagrange Multiplier statistic defined as (Kwiatkowski et al., 1992):
$$LM = \frac{\sum_{t=1}^{T} S_t^2}{\hat{\sigma}_\varepsilon^2} \tag{6}$$
where $S_t = \sum_{i=1}^{t} e_i$, $t = 1, 2, \dots, T$, is the partial sum of the regression residuals $e_i$, and $\hat{\sigma}_\varepsilon^2$ is the estimate of the error variance of the regression (the residual sum of squares divided by T) in equation (5). The KPSS test takes trend stationarity, not the presence of a unit root, as its null hypothesis. The mean can rise or fall over time in both a unit root process and a trend-stationary process. However, after a shock, a trend-stationary series converges back towards its trending mean, which is not affected by the shock. In contrast, in a unit root process the shock has a permanent impact on the mean (the series does not converge over time).
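The level-stationary case of the statistic can be sketched in a few lines: the residuals are the deviations from the sample mean, their partial sums are accumulated, and the sum of squared partial sums is divided by the residual variance (residual sum of squares divided by T). The function name `kpss_level_statistic` and the input values are hypothetical. Note as well that statistical software commonly rescales this quantity by T squared and replaces the simple variance estimate with a long-run variance estimate, so the number below is not directly comparable with tabulated KPSS critical values.

```python
def kpss_level_statistic(series):
    """Sum of squared partial sums of the demeaned series, divided by RSS / T."""
    values = list(series)
    T = len(values)
    mean = sum(values) / T
    residuals = [y - mean for y in values]    # residuals from an intercept-only regression
    partial_sums = []
    running = 0.0
    for e in residuals:
        running += e                          # S_t = e_1 + ... + e_t
        partial_sums.append(running)
    sigma2 = sum(e * e for e in residuals) / T
    return sum(s * s for s in partial_sums) / sigma2


# Illustrative values only (not the Bank Indonesia data).
print(kpss_level_statistic([1, 2, 3]))
```

Large values of the statistic indicate that the partial sums wander far from zero, which is evidence against stationarity.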

Differencing
Differencing is a method for transforming time series data into a stationary series by subtracting the previous value from the current value. The first differencing can be expressed in the form of the following equation:
$$\Delta y_t = y_t - y_{t-1} \tag{7}$$
If the data are still not stationary after the first differencing, the second differencing can be done using the following equation:
$$\Delta^2 y_t = \Delta y_t - \Delta y_{t-1} = y_t - 2y_{t-1} + y_{t-2} \tag{8}$$
Differencing is an essential technique in time series analysis for achieving stationarity by removing trends and seasonal components. The I (integrated) component in ARIMA gives the order of differencing required to eliminate unit roots so that the data become stationary. Applying differencing can help handle complex time series data, build more accurate models, and produce better forecasts.
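The following minimal sketch applies differencing repeatedly, where each pass replaces every value with its change from the previous value. The helper name `difference` and the sample series are hypothetical and chosen so that the effect is easy to check by hand.

```python
def difference(series, d=1):
    """Apply differencing d times; each pass maps y_t to y_t - y_{t-1}."""
    values = list(series)
    for _ in range(d):
        # Each pass shortens the series by one observation.
        values = [curr - prev for prev, curr in zip(values, values[1:])]
    return values


# Illustrative series with a quadratic trend (not the Bank Indonesia data).
quadratic = [t * t for t in range(6)]    # 0, 1, 4, 9, 16, 25
first = difference(quadratic, d=1)       # [1, 3, 5, 7, 9]: still trending
second = difference(quadratic, d=2)      # [2, 2, 2, 2]: trend removed
print(first, second)
```

In this example one round of differencing leaves a linear trend, while the second round yields a constant series, which mirrors the case d=2.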

Autocorrelation Function (ACF)
The Autocorrelation Function (ACF) is used to select AR and MA model parameters and to identify trends or seasonal patterns in the data. The ACF measures the correlation between the observed value of a variable at time t ($y_t$) and the observed value at time (t-1), (t-2), … , (t-k), that is, $y_{t-k}$. The ACF value at lag k is calculated using the formula:
$$\rho_k = \frac{\sum_{t=k+1}^{n} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{n} (y_t - \bar{y})^2} \tag{9}$$
where $\bar{y}$ is the sample mean and n is the number of observations. The ACF graph shows the lag on the x-axis and the correlation coefficient on the y-axis. A positive ACF value indicates that if the observed value at time t is above (or below) the average, then the value at time t-k tends to be on the same side of the average, whereas a negative ACF value indicates that it tends to be on the opposite side. Lag 0 is the correlation of each value with itself, so the ACF at lag 0 is always 1. If there is a strong correlation at lag 1, the observed value at time t strongly correlates with the observed value at time t-1. If there is a strong correlation at lag 2, the observed value at time t strongly correlates with the observed value at time t-2, and so on. As the lag k increases, the number of pairs available for the ACF calculation decreases, so the ACF cannot be relied on for large lags.
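A minimal sketch of the sample ACF is given below: the lag-k sum of cross-products of deviations from the mean is divided by the total sum of squared deviations. The function name `acf` and the input series are hypothetical, and the series is assumed to be non-constant so that the denominator is not zero.

```python
def acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag (assumes a non-constant series)."""
    values = list(series)
    n = len(values)
    mean = sum(values) / n
    dev = [y - mean for y in values]
    denom = sum(d * d for d in dev)
    result = []
    for k in range(max_lag + 1):
        # Only n - k pairs are available at lag k.
        num = sum(dev[t] * dev[t - k] for t in range(k, n))
        result.append(num / denom)
    return result


# Illustrative values only (not the Bank Indonesia data).
print(acf([1, 2, 3, 4, 5], max_lag=2))
```

The shrinking range of the inner sum makes visible why estimates at large lags rest on few pairs of observations.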

Partial Autocorrelation Function (PACF)
The Partial Autocorrelation Function (PACF) is used to select the order of AR models that contain several lags. The difference between the PACF and the ACF is that the PACF measures the direct correlation between two observed values of a variable at a certain distance (lag) after eliminating the correlation with the observed values at the intermediate lags. The PACF value at lag k is calculated using the formula:
$$\phi_{kk} = \frac{\rho_k - \sum_{j=1}^{k-1} \phi_{k-1,j}\, \rho_{k-j}}{1 - \sum_{j=1}^{k-1} \phi_{k-1,j}\, \rho_j} \tag{10}$$
where $\rho_k$ is the autocorrelation at lag k and $\phi_{k-1,j}$ is the coefficient at lag j of the AR(k-1) model. The PACF for lag 1 is $\phi_{11} = \rho_1$. The PACF graph shows the partial correlation coefficient on the y-axis and the lag on the x-axis. Lag 0 in the PACF also indicates the correlation of a value with itself (the partial correlation at lag 0 is 1). The partial correlation at lag k shows the correlation between the value at time t and the value at time t-k after removing the influence of the values at the intermediate times t-1, t-2, … , t-k+1.
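The partial autocorrelations can be obtained recursively from the autocorrelations with the Durbin-Levinson recursion, sketched below. At each lag k the new partial autocorrelation is computed from the AR(k-1) coefficients, which are then updated to the AR(k) coefficients. The function name `pacf_from_acf` and the input autocorrelations are hypothetical; the input mimics an AR(1) process with coefficient 0.5, for which the PACF should vanish beyond lag 1.

```python
def pacf_from_acf(rho):
    """Durbin-Levinson recursion. rho[k] is the autocorrelation at lag k, rho[0] == 1.
    Returns the partial autocorrelations at lags 0..len(rho)-1."""
    pacf = [1.0]
    prev = []  # prev[j-1] holds the lag-j coefficient of the AR(k-1) model
    for k in range(1, len(rho)):
        num = rho[k] - sum(prev[j - 1] * rho[k - j] for j in range(1, k))
        den = 1.0 - sum(prev[j - 1] * rho[j] for j in range(1, k))
        phi_kk = num / den
        # Update the remaining coefficients to those of the AR(k) model.
        curr = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)]
        curr.append(phi_kk)
        pacf.append(phi_kk)
        prev = curr
    return pacf


# Autocorrelations of an AR(1) process with coefficient 0.5 (illustrative).
print(pacf_from_acf([1.0, 0.5, 0.25, 0.125]))
```

A PACF that cuts off after lag p in this way is the usual signal for an AR(p) component.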

Residual Diagnostics
The residual in a time series model is the remainder after fitting the model. It is the difference between the observations and the corresponding fitted values. Residuals are helpful in determining whether the model has adequately captured the information in the data. An effective forecasting method will generate residuals that exhibit the following characteristics: they are normally distributed, independent or uncorrelated, have constant variance, and have zero mean.
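The residuals of the ARIMA model were tested for normality using the Jarque-Bera test as follows (Jarque & Bera, 1980):
$$JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right), \qquad S = \frac{\hat{\mu}_3}{\hat{\mu}_2^{3/2}}, \qquad K = \frac{\hat{\mu}_4}{\hat{\mu}_2^{2}} \tag{11}$$
where n is the sample size, S is the sample skewness, K is the sample kurtosis, $\hat{\mu}_3$ and $\hat{\mu}_4$ are the estimates of the third and fourth central moments, and $\hat{\mu}_2$ is the estimate of the second central moment (variance). The null hypothesis in the Jarque-Bera test is that the residuals are normally distributed.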
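A minimal sketch of the Jarque-Bera statistic follows. It computes the second, third, and fourth central moments of the residuals, forms the skewness and kurtosis, and combines them as n/6 times the sum of the squared skewness and one quarter of the squared excess kurtosis. The function name `jarque_bera` and the residual values are hypothetical. Under the null hypothesis the statistic is asymptotically chi-square distributed with two degrees of freedom.

```python
def jarque_bera(residuals):
    """Jarque-Bera statistic from the sample skewness and kurtosis."""
    values = list(residuals)
    n = len(values)
    mean = sum(values) / n
    m2 = sum((e - mean) ** 2 for e in values) / n   # second central moment
    m3 = sum((e - mean) ** 3 for e in values) / n   # third central moment
    m4 = sum((e - mean) ** 4 for e in values) / n   # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


# Illustrative residuals only (not from the fitted models).
print(jarque_bera([-1, 1, -1, 1]))
```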
The independence and variance homogeneity of the ARIMA residuals were tested using the Ljung-Box test with the following formula (Ljung & Box, 1978):
$$Q = n(n+2) \sum_{k=1}^{h} \frac{\hat{\rho}_k^2}{n-k} \tag{12}$$
where n is the sample size, h is the number of lags being tested, $\hat{\rho}_k$ is the sample autocorrelation at lag k, and k is the lag value. The null hypothesis in the Ljung-Box test for independence of residuals is that the residuals are independently distributed, while the null hypothesis for homogeneity of variances is that the variance of the residuals is homogeneous.
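The Ljung-Box statistic can be sketched as below: the squared sample autocorrelations of the residuals at lags 1 to h are each divided by n-k, summed, and scaled by n(n+2). The function name `ljung_box_q` and the input values are hypothetical. One common way to check variance homogeneity with the same statistic is to apply it to the squared residuals; whether that variant was used in this study is an assumption and is not confirmed by the text.

```python
def ljung_box_q(residuals, h):
    """Ljung-Box Q statistic over lags 1..h (assumes non-constant residuals)."""
    values = list(residuals)
    n = len(values)
    mean = sum(values) / n
    dev = [e - mean for e in values]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, h + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom  # lag-k autocorrelation
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


# Illustrative residuals only (not from the fitted models).
print(ljung_box_q([1, 2, 3, 4, 5], h=1))
```

The statistic is compared with a chi-square critical value; for residuals of a fitted ARIMA model the degrees of freedom are usually reduced by the number of estimated AR and MA parameters.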
The residuals of the ARIMA model were also tested for a mean of zero using the t-statistic as follows:
$$t = \frac{\bar{e}}{s_e / \sqrt{n}} \tag{13}$$
where $\bar{e}$ and $s_e$ are the sample mean and the sample standard deviation of the residuals, and n is the sample size. The null hypothesis is that the mean of the residuals is zero.
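For completeness, the zero-mean check amounts to a one-sample t-statistic, sketched here with the sample standard deviation computed using n-1 in the denominator. The function name `zero_mean_t` and the input values are hypothetical.

```python
import math


def zero_mean_t(residuals):
    """One-sample t-statistic for the null hypothesis that the mean is zero."""
    values = list(residuals)
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation with n - 1 degrees of freedom.
    s = math.sqrt(sum((e - mean) ** 2 for e in values) / (n - 1))
    return mean / (s / math.sqrt(n))


# Illustrative values only (not from the fitted models).
print(zero_mean_t([1, 2, 3]))
```

A statistic close to zero is consistent with residuals that are centred on zero.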

Akaike Information Criterion (AIC)
AIC is a relative measure of the quality of statistical models for a given data set. AIC provides information about the model's fit and allows researchers to compare different models. The main goal of AIC is to choose the model with the smallest prediction error while taking into account the number of parameters in the model. Generally, the smaller the AIC value, the better the model fits the data. AIC combines two essential elements, the model's fit to the data and the model's complexity, by penalizing the number of parameters used. The formula for calculating AIC is as follows:
$$AIC = 2k - 2\ln(L) \tag{14}$$
where k is the number of parameters in the model and L is the likelihood of the fitted model. In the context of the ARIMA model, the likelihood (L) is the value of the likelihood function evaluated at the estimated parameters. The log-likelihood (ln(L)) measures how well the model explains the observed data.
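The trade-off between fit and complexity can be illustrated with a short sketch that computes AIC as twice the number of parameters minus twice the log-likelihood and picks the candidate with the smallest value. The function name `aic`, the model labels, and the log-likelihood values are hypothetical and are not taken from Table 6.

```python
def aic(num_params, log_likelihood):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * num_params - 2 * log_likelihood


# Hypothetical candidates: model_b fits slightly better but uses one more parameter.
candidates = {
    "model_a": aic(num_params=1, log_likelihood=-24.0),   # 50.0
    "model_b": aic(num_params=2, log_likelihood=-23.5),   # 51.0
}
best = min(candidates, key=candidates.get)
print(candidates, best)
```

Here the small gain in log-likelihood of the larger model does not offset the penalty for its extra parameter, so the simpler model is preferred.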

RESULTS AND DISCUSSION
Figure 1 shows a plot of the 52 observations of inflation figures in Indonesia from January 2020 to April 2024. Based on the plot in Figure 1, the data appear to be non-stationary or to contain a trend (type 3 of the ADF linear models). Therefore, the ADF and KPSS tests were carried out to assess stationarity. The following are the results of the Augmented Dickey-Fuller (ADF) tests before and after differencing to determine the presence of unit roots in each data set. As shown in Table 2, the ADF test on the original data before differencing yielded a p-value=0.12 > α=0.05, so the null hypothesis was not rejected, indicating that the data have a unit root. This is consistent with the data plot in Figure 1. After the first differencing, the data still had a unit root, with a p-value=0.53 > α=0.05, so the null hypothesis was not rejected. After the second differencing, a p-value=0.01 < α=0.05 was obtained, so the null hypothesis was rejected, indicating that the data did not have a unit root.
To determine whether the null hypothesis of the KPSS test should be stationarity around a level or around a trend, a linear model was fitted to the original data using Ordinary Least Squares (OLS), and the stationarity of its residuals was tested. The OLS regression results in Table 3 show that the coefficient of the time trend has a p-value=0.0001456 < α=0.05, so the null hypothesis of no time trend is rejected, indicating that a deterministic trend is present. This means the relevant form of stationarity for the data is around a trend rather than a level.
As shown in Table 4, the KPSS test on the original data before differencing yielded a p-value=0.04 < α=0.05, so the null hypothesis was rejected, indicating that the data have a unit root (non-stationary). This is also consistent with the data plot in Figure 1. After the first differencing (d=1), the p-value was equal to α=0.05, so the null hypothesis was not rejected and the data are trend-stationary. After the second differencing (d=2), a p-value=0.10 > α=0.05 was obtained, so the null hypothesis was again not rejected, indicating that the data are trend-stationary.
Next, the stationarity of the OLS model residuals was checked. The KPSS results for the OLS regression residuals in Table 5 show a p-value=0.1 > α=0.05, so the null hypothesis is not rejected, which means that the residuals are stationary around the deterministic trend. In other words, the data are predictable, and fluctuations around the trend are temporary and do not indicate long-term instability.
Figure 2 shows that the ACF falls inside the significance bounds after lag 1, suggesting q=1. The PACF is likewise significant at lag 1, crossing the significance threshold (indicated by the dotted blue line), which suggests p=1. With d=2 from the stationarity tests, the candidate ARIMA models for inflation are ARIMA(1,2,0), ARIMA(0,2,1), and ARIMA(1,2,1).
Based on the estimation and residual diagnostics of the ARIMA models in Table 6, the models in which all parameters are significant are ARIMA(1,2,0) and ARIMA(0,2,1). Although ARIMA(1,2,1) has the smallest AIC (50.93), its AR(1) parameter is not significant. The residuals of all three models are normally distributed. However, the independence test shows that only ARIMA(1,2,1) and ARIMA(0,2,1) have independent residuals. Correlated residuals indicate that information remains in the residuals that should be used in generating forecasts. None of the three models has residuals with constant variance. A model that fails this property cannot necessarily be improved, since usually little can be done to ensure that the residuals have constant variance. All three models have residuals with a mean of zero, so their forecasts are not systematically biased.
Based on the identification, estimation, and diagnostic checking results, the best model for forecasting inflation is ARIMA(0,2,1) with AIC=51.81. Table 7 shows the inflation forecasts for May to December 2024, which follow an increasing pattern and are depicted in Figure 3.
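The steadily increasing pattern is what the structure of an ARIMA(0,2,1) model implies, as the following short derivation suggests. It assumes that the moving average term enters with a positive sign, writes $\theta_1$ for the MA coefficient, T for the last observed month (April 2024), and $\hat{\varepsilon}_T$ for the last in-sample residual; this notation is introduced here for illustration, and future residuals are replaced by their expected value of zero.

```latex
\begin{align*}
\Delta^2 y_t = \varepsilon_t + \theta_1 \varepsilon_{t-1}
  \;&\Longrightarrow\;
  y_t = 2y_{t-1} - y_{t-2} + \varepsilon_t + \theta_1 \varepsilon_{t-1}, \\
\hat{y}_{T+1} &= 2y_T - y_{T-1} + \theta_1 \hat{\varepsilon}_T, \\
\hat{y}_{T+2} &= 2\hat{y}_{T+1} - y_T, \\
\hat{y}_{T+h} &= 2\hat{y}_{T+h-1} - \hat{y}_{T+h-2}, \qquad h \ge 3, \\
\hat{y}_{T+h} - \hat{y}_{T+h-1} &= \hat{y}_{T+1} - y_T \qquad \text{for all } h \ge 2.
\end{align*}
```

The forecasts therefore lie on a straight line whose slope is fixed by the first forecast step, which agrees with the nearly constant month-to-month increments of about 0.02 to 0.03 in the reported forecast values.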

CONCLUSION
The analysis of inflation data from January 2020 to April 2024 in Indonesia reveals insights into its behaviour over time. Because the data initially showed signs of non-stationarity or a discernible trend, the ADF and KPSS tests were conducted to assess stationarity. The ADF tests initially did not reject the unit root hypothesis, which was consistent with the observed trend. After the second differencing, the unit root hypothesis was rejected, suggesting stationarity without a unit root. OLS regression further confirmed the presence of a deterministic trend, indicating that the data are stationary around a trend rather than a level. Parameter significance tests and residual diagnostics identified ARIMA(0,2,1) as the most effective model, with residuals that are normally distributed, uncorrelated, and have zero mean. This model was used to forecast inflation for May to December 2024, highlighting its predictive capabilities in capturing future inflation trends.

Table 1. Types of ADF Linear Model

Table 2. ADF Test Results Before and After Differencing

Table 3. OLS Regression Results. *The p-value is significant at α=5%

Table 5. KPSS Test Results of OLS Regression Residuals. *The p-value is significant at α=5%

Table 6. Estimation and Residual Diagnostics of ARIMA Models. *The p-value is significant at α=5%

Table 7. Forecasting Results

Figure 3. Inflation Forecast Results for May to December 2024