EXPENDITURE PER CAPITA MODEL WITH SPATIAL SMALL AREA ESTIMATION

Indonesia is one of many countries that still suffer from high poverty rates, so poverty information for specific areas is of interest to researchers and policy makers. For development programs to be carried out effectively and efficiently, data must be available down to the micro scale. The technique used to reach this goal is Small Area Estimation (SAE), and the Fay-Herriot (FH) model is one SAE method. Because SAE techniques “borrow strength” across neighboring areas, the Fay-Herriot approach has been extended by integrating spatial information into the model. This extension is known as the Spatial Fay-Herriot (SFH) model or Spatial Empirical Best Linear Unbiased Prediction (SEBLUP). This study compares the MSE of direct estimation, the FH model, and the SFH model to determine which method gives the best estimates of per capita expenditure. The MSE of the SFH estimates is smaller than that of direct estimation and FH, but the difference is not significant. This means that adding spatial information to the small area estimation model does not give better predictions than the simpler small area estimation model, which accounts for each area through an area-specific random effect.


INTRODUCTION
In many places around the world, economic growth has fueled a reduction in poverty. Moreover, ending poverty is the first of the Sustainable Development Goals (SDGs). Nevertheless, many countries still suffer from high poverty rates, and Indonesia is one of them: poverty is a problem faced by every region in Indonesia. One indicator that describes poverty well is per capita expenditure. Poverty information for specific areas is therefore of interest to researchers and policy makers.
Statistics Indonesia conducts the Socio-Economic Survey (SUSENAS), which is held twice a year. One of the indicators produced by this survey is per capita expenditure, which is used to estimate poverty in Indonesia and helps policy makers map the poverty level of each region. However, the survey produces estimates only down to the district level.
For development programs to be carried out more effectively and efficiently, data must be available at the micro scale, that is, at the sub-district level, the village level, or for even smaller areas. If estimates are computed directly for such small areas, they have low precision (high error) because the sample size in each area is insufficient. One solution is to increase the budget and the number of samples so that the survey design can produce data down to the small-area level.
This solution, however, is constrained by limited budgets. Given the need for small-area information and the limited resources available, a statistical method is needed that can provide this information within those resources. This method is known as Small Area Estimation (SAE).
SAE is a widely used technique that is expected to produce more precise estimates for small areas. Estimates in SAE are model-based, so additional information is needed from variables related to the variable of interest; these are called covariates or auxiliary variables. The additional information usually comes from administrative records or previous census data, and all of it must be related to the parameters being estimated (Rao, 2003).
One SAE model is the Fay-Herriot model, whose parameters are estimated by Empirical Best Linear Unbiased Prediction (EBLUP). The Fay-Herriot model does not include spatial effects. Because SAE techniques "borrow strength" across neighboring areas, the Fay-Herriot approach has been extended by integrating spatial information into the model. The extension that accounts for spatial correlation among the area random effects is known as the Spatial Fay-Herriot model or Spatial Empirical Best Linear Unbiased Prediction (SEBLUP). The Spatial Fay-Herriot model improves the variance-covariance structure of small area models when areas are spatially correlated. It is an area-level model because the spatial data are available at the area level.
This study compares direct estimation, the Fay-Herriot model, and the Spatial Fay-Herriot model for estimating per capita expenditure. The paper is organized into several sections. The first section presents the background of the study. The second section describes the data used. The third section explains the methodology. The empirical results of applying the estimation methods are presented in the fourth section, and the final section summarizes the main findings and discusses possible further research.

Data Sources
The data used in this study are secondary data from various sources. The following are details of the data used:
• Village unit cooperatives that are still operating (x8)
To map the estimated per capita expenditure, a shapefile of the Bangka Belitung Islands is needed.

Methods
The analysis in this research is both descriptive and inferential, and several software packages are used. The descriptive analysis uses QGIS 2.18.19 and Microsoft Excel 2016 to produce maps and charts. GeoDa, R Studio 3.4.2, and Microsoft Excel 2016 are used together for the inferential analysis.
Descriptive analysis summarizes the data into clearer information and presents it as tables, pictures, or graphs that are easier to understand. In this study, descriptive analysis gives an overview of the distribution of per capita expenditure in the Bangka Belitung Islands Province in 2017 under direct and indirect estimation. The inferential analysis consists of the small area estimation.
Small Area Estimation (SAE) is an indirect estimation method that combines survey data with supporting data, for example from a previous census. The supporting data are called auxiliary variables or covariates. Auxiliary variables must be related to the variable of interest in the survey data, so that the estimates for smaller areas achieve better precision.
There are two types of SAE models, depending on data availability: area-level and unit-level. This paper uses the area-level model because the available auxiliary variables are only at the area level. The area-level model links the direct estimator for each small area with supporting data from other sources, $\mathbf{x}_i = (x_{1i}, \dots, x_{pi})^{T}$, for areas $i = 1, \dots, m$. The parameter to be estimated is $\theta_i$. The linking model is:
$$\theta_i = \mathbf{x}_i^{T}\boldsymbol{\beta} + v_i, \qquad i = 1, \dots, m \qquad (1)$$
where $\boldsymbol{\beta}$ is the vector of regression coefficients and $v_i$ is the area random effect. The direct estimator of $\theta_i$ is denoted $\hat{\theta}_i$ and can be written as:
$$\hat{\theta}_i = \theta_i + e_i \qquad (2)$$
where $e_i$ is the sampling error, assumed $e_i \sim N(0, \psi_i)$. The area-level SAE model therefore has two components: the indirect (model) component in equation (1) and the direct (sampling) component in equation (2).
Combining equations (1) and (2) gives:
$$\hat{\theta}_i = \mathbf{x}_i^{T}\boldsymbol{\beta} + v_i + e_i \qquad (3)$$
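To make the two components of the area-level model concrete, the following minimal Python/NumPy sketch simulates them: true area means are generated from the linking model, and direct estimates are then obtained by adding sampling error with a known variance $\psi_i$. The function name `simulate_area_level` and all parameter values are hypothetical and do not come from the Bangka Belitung data.

```python
import numpy as np

def simulate_area_level(m=40, beta=(1.0, 2.0), sigma2_v=0.5, seed=1):
    """Simulate the two-stage area-level model (hypothetical parameter values).

    Linking model:   theta_i     = x_i' beta + v_i,  v_i ~ N(0, sigma2_v)
    Sampling model:  theta_hat_i = theta_i + e_i,    e_i ~ N(0, psi_i)
    """
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(m), rng.uniform(0.0, 1.0, m)])  # intercept + one covariate
    psi = rng.uniform(0.2, 1.5, m)                  # known sampling variances psi_i
    v = rng.normal(0.0, np.sqrt(sigma2_v), m)       # area random effects v_i
    e = rng.normal(0.0, np.sqrt(psi))               # sampling errors e_i
    theta = X @ np.asarray(beta) + v                # true area parameters theta_i
    theta_hat = theta + e                           # direct estimates
    return X, psi, theta, theta_hat

X, psi, theta, theta_hat = simulate_area_level()
print(np.mean((theta_hat - theta) ** 2), psi.mean())  # both reflect the sampling error
```

Because $e_i$ has mean zero, the direct estimates are unbiased for $\theta_i$, but their error is driven entirely by $\psi_i$, which is large when the area sample is small.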

Empirical Best Linear Unbiased Predictor (EBLUP)
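EBLUP is a parameter estimation method for Linear Mixed Models (LMM). It is intended for continuous variables and is less suitable for binary or count data. There are two types of EBLUP: the area-level EBLUP, or Fay-Herriot model, and the unit-level EBLUP. The Fay-Herriot model is written as:
$$\hat{\theta}_i = \mathbf{x}_i^{T}\boldsymbol{\beta} + v_i + e_i, \qquad i = 1, \dots, m \qquad (4)$$
where $v_i \sim \text{iid } N(0, \sigma_v^2)$ and $e_i \sim N(0, \psi_i)$, with the sampling variances $\psi_i$ assumed known from the survey data, and $v_i$ and $e_i$ mutually independent. The parameters of the Fay-Herriot model are then estimated with the EBLUP approach. When $\sigma_v^2$ is known, the estimator is the Best Linear Unbiased Predictor (BLUP):
$$\tilde{\theta}_i = \gamma_i \hat{\theta}_i + (1 - \gamma_i)\,\mathbf{x}_i^{T}\tilde{\boldsymbol{\beta}} \qquad (5)$$
where
$$\gamma_i = \frac{\sigma_v^2}{\sigma_v^2 + \psi_i}, \qquad \tilde{\boldsymbol{\beta}} = \left[\sum_{i=1}^{m} \frac{\mathbf{x}_i \mathbf{x}_i^{T}}{\sigma_v^2 + \psi_i}\right]^{-1} \sum_{i=1}^{m} \frac{\mathbf{x}_i \hat{\theta}_i}{\sigma_v^2 + \psi_i} \qquad (6)$$
In practice, the variance of the area random effect, $\sigma_v^2$, is unknown and must be estimated first, for example by Maximum Likelihood (ML) or Residual Maximum Likelihood (REML). Replacing $\sigma_v^2$ by its estimate $\hat{\sigma}_v^2$ gives the EBLUP:
$$\hat{\theta}_i^{EBLUP} = \hat{\gamma}_i \hat{\theta}_i + (1 - \hat{\gamma}_i)\,\mathbf{x}_i^{T}\hat{\boldsymbol{\beta}} \qquad (7)$$
To measure how good the EBLUP estimator is, its MSE is approximated by:
$$\text{mse}(\hat{\theta}_i^{EBLUP}) = g_{1i}(\hat{\sigma}_v^2) + g_{2i}(\hat{\sigma}_v^2) + 2\,g_{3i}(\hat{\sigma}_v^2) \qquad (8)$$
where $g_{1i}(\sigma_v^2) = \gamma_i \psi_i$, $g_{2i}(\sigma_v^2) = (1-\gamma_i)^2\,\mathbf{x}_i^{T}\left[\sum_{j} \mathbf{x}_j \mathbf{x}_j^{T}/(\sigma_v^2+\psi_j)\right]^{-1}\mathbf{x}_i$, and $g_{3i}(\sigma_v^2) = \psi_i^2 (\sigma_v^2+\psi_i)^{-3}\,\bar{V}(\hat{\sigma}_v^2)$, with $\bar{V}(\hat{\sigma}_v^2)$ the asymptotic variance of $\hat{\sigma}_v^2$.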
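The shrinkage form of the Fay-Herriot BLUP and its Prasad-Rao MSE approximation can be sketched in a few lines of NumPy. This is an illustrative sketch, not the R code used in the study: the function names `fh_blup` and `fh_mse` and the toy data are hypothetical, and the $g_3$ term uses the asymptotic variance $2/\sum_j(\sigma_v^2+\psi_j)^{-2}$, which assumes a REML-type estimator of $\sigma_v^2$.

```python
import numpy as np

def fh_blup(theta_hat, X, psi, sigma2_v):
    """Fay-Herriot BLUP for a given sigma2_v; with an estimated sigma2_v it is the EBLUP."""
    w = 1.0 / (sigma2_v + psi)                      # 1 / (sigma_v^2 + psi_i)
    # GLS estimate of beta, weighting each area by its total variance
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * theta_hat))
    gamma = sigma2_v * w                            # shrinkage factors gamma_i
    est = gamma * theta_hat + (1.0 - gamma) * (X @ beta)
    return est, beta, gamma

def fh_mse(X, psi, sigma2_v):
    """Prasad-Rao approximation g1 + g2 + 2*g3, assuming a REML-type sigma2_v."""
    w = 1.0 / (sigma2_v + psi)
    gamma = sigma2_v * w
    g1 = gamma * psi
    A = np.linalg.inv(X.T @ (w[:, None] * X))
    g2 = (1.0 - gamma) ** 2 * np.einsum("ij,jk,ik->i", X, A, X)  # x_i' A x_i
    var_sigma2 = 2.0 / np.sum(w ** 2)               # asymptotic variance of sigma2_v estimate
    g3 = psi ** 2 * w ** 3 * var_sigma2
    return g1 + g2 + 2.0 * g3

# Hypothetical toy data for five areas
theta_hat = np.array([10.0, 12.0, 9.0, 15.0, 11.0])
X = np.column_stack([np.ones(5), [1.0, 2.0, 0.5, 3.0, 1.5]])
psi = np.array([1.0, 2.0, 0.5, 1.5, 1.0])
est, beta, gamma = fh_blup(theta_hat, X, psi, sigma2_v=1.0)
mse = fh_mse(X, psi, sigma2_v=1.0)
```

Each estimate lies between the direct estimate and the regression prediction; areas with a large $\psi_i$ (small samples) get a small $\gamma_i$ and are pulled more strongly toward $\mathbf{x}_i^{T}\hat{\boldsymbol{\beta}}$.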

Spatial Empirical Best Linear Unbiased Predictor (SEBLUP)
The EBLUP model does not include the effect of spatial correlation. To add spatial correlation to the model, Pratesi and Salvati (2007) assumed spatial dependence in the area random effects and found that this gives the best results. The spatial dependence in the area random effects follows a Simultaneous Autoregressive (SAR) process. Following the SAR specification of Anselin (1992), the vector of area random effects $\mathbf{v}$ is:
$$\mathbf{v} = \rho \mathbf{W} \mathbf{v} + \mathbf{u} \qquad (9)$$
where $\rho$ is the spatial autoregressive coefficient, $\mathbf{W}$ is the $m \times m$ spatial weight matrix, $\mathbf{v}$ is the vector of area random effects, and $\mathbf{u}$ is the error vector of $\mathbf{v}$, with $\mathbf{u} \sim N(\mathbf{0}, \sigma_u^2 \mathbf{I})$. The spatial autoregressive coefficient measures the strength of the spatial relationship between the random effects; its value ranges from -1 to 1.
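Equation (9) can be rewritten as:
$$\mathbf{v} = (\mathbf{I} - \rho \mathbf{W})^{-1} \mathbf{u} \qquad (10)$$
where $\mathbf{I}$ is the $m \times m$ identity matrix. From equation (10), the mean of $\mathbf{v}$ is $\mathbf{0}$ and its covariance matrix is:
$$\mathbf{G} = \sigma_u^2 \left[(\mathbf{I} - \rho \mathbf{W})^{T} (\mathbf{I} - \rho \mathbf{W})\right]^{-1} \qquad (11)$$
Substituting equation (10) into the matrix form of equation (4) gives:
$$\hat{\boldsymbol{\theta}} = \mathbf{X}\boldsymbol{\beta} + (\mathbf{I} - \rho \mathbf{W})^{-1}\mathbf{u} + \mathbf{e} \qquad (12)$$
With $\boldsymbol{\Psi} = \text{diag}(\psi_1, \dots, \psi_m)$, the covariance matrix of $\hat{\boldsymbol{\theta}}$ is:
$$\mathbf{V} = \mathbf{G} + \boldsymbol{\Psi} \qquad (13)$$
The spatial BLUP of $\theta_i$, with $\sigma_u^2$, $\rho$, and $\psi_i$ known, is:
$$\tilde{\theta}_i^{S} = \mathbf{x}_i^{T}\tilde{\boldsymbol{\beta}} + \mathbf{b}_i^{T}\mathbf{G}\mathbf{V}^{-1}(\hat{\boldsymbol{\theta}} - \mathbf{X}\tilde{\boldsymbol{\beta}}) \qquad (14)$$
where $\tilde{\boldsymbol{\beta}} = (\mathbf{X}^{T}\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{V}^{-1}\hat{\boldsymbol{\theta}}$ and $\mathbf{b}_i^{T}$ is the $1 \times m$ vector $(0, \dots, 0, 1, 0, \dots, 0)$ with 1 in the $i$-th position. The spatial BLUP is thus obtained by entering the covariance matrix of equation (13) into the BLUP estimator, and it equals the BLUP when $\rho = 0$. Replacing $\sigma_u^2$ and $\rho$ by their estimates gives the SEBLUP, whose MSE can be obtained as in Rao (2015):
$$\text{mse}(\hat{\theta}_i^{S}) = g_{1i}(\hat{\boldsymbol{\delta}}) + g_{2i}(\hat{\boldsymbol{\delta}}) + 2\,g_{3i}(\hat{\boldsymbol{\delta}}), \qquad \boldsymbol{\delta} = (\sigma_u^2, \rho) \qquad (15)$$
where $g_{1i} = \mathbf{b}_i^{T}(\mathbf{G} - \mathbf{G}\mathbf{V}^{-1}\mathbf{G})\mathbf{b}_i$ is the leading term, $g_{2i}$ accounts for estimating $\boldsymbol{\beta}$, and $g_{3i}$ accounts for estimating $\boldsymbol{\delta}$.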
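The spatial BLUP differs from the ordinary BLUP only through the covariance matrix of the random effects. The NumPy sketch below builds the SAR covariance and computes the spatial BLUP for known $\sigma_u^2$ and $\rho$; the function names, the four-area weight matrix, and all numbers are hypothetical, and the study itself used R rather than this code.

```python
import numpy as np

def sar_cov(W, rho, sigma2_u):
    """Covariance of v = (I - rho W)^{-1} u with u ~ N(0, sigma2_u I)."""
    A = np.eye(W.shape[0]) - rho * W
    return sigma2_u * np.linalg.inv(A.T @ A)

def spatial_blup(theta_hat, X, psi, W, rho, sigma2_u):
    """Spatial BLUP with sigma2_u, rho and psi treated as known."""
    G = sar_cov(W, rho, sigma2_u)
    V = G + np.diag(psi)                            # covariance matrix of theta_hat
    Vinv = np.linalg.inv(V)
    beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ theta_hat)  # GLS beta
    # Element i of G V^{-1} (theta_hat - X beta) is b_i' G V^{-1} (theta_hat - X beta)
    est = X @ beta + G @ Vinv @ (theta_hat - X @ beta)
    return est, beta

# Hypothetical four areas on a line, with row-standardized contiguity weights
W = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
W /= W.sum(axis=1, keepdims=True)
theta_hat = np.array([10.0, 12.0, 9.0, 15.0])
X = np.column_stack([np.ones(4), [1.0, 2.0, 0.5, 3.0]])
psi = np.array([1.0, 2.0, 0.5, 1.5])
est_spatial, _ = spatial_blup(theta_hat, X, psi, W, rho=0.5, sigma2_u=2.0)
```

Setting `rho=0.0` makes `G` diagonal and reproduces the ordinary Fay-Herriot BLUP, which is a convenient check of any implementation.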

Direct Estimation of Per Capita Expenditure
Direct estimates of per capita expenditure can only be computed for areas that contain at least one sample unit of the Field Work Practice (PKL) survey. Bangka Belitung has 43 sub-districts and 387 villages, but the sample covers only 42 sub-districts and 135 villages, so direct estimation is carried out only for these areas. The resulting per capita expenditure for sub-districts and villages is shown in Table 1. At the sub-district level, the lowest average per capita expenditure is in Tukak Sadai and the highest is in Rangkui, Pangkalpinang City. At the village level, the lowest average per capita expenditure is in Peradong village and the highest is in Masjid Jamik village. A thematic map of per capita expenditure is shown in Figure 1.
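The paper does not state the direct estimator's formula. As an illustration only, the sketch below assumes a design-weighted (Hájek) mean of household per capita expenditure within an area, with a with-replacement linearization variance that could serve as the sampling variance $\psi_i$; the actual PKL design may call for a different variance formula, and `direct_estimate` is a hypothetical name.

```python
import numpy as np

def direct_estimate(y, w):
    """Design-weighted (Hajek) mean and an approximate sampling variance.

    Assumes with-replacement sampling within the area (an assumption, not the
    documented PKL design). Needs at least two sample units.
    """
    y, w = np.asarray(y, float), np.asarray(w, float)
    n = y.size
    mean = np.sum(w * y) / np.sum(w)
    z = w * (y - mean) / np.sum(w)                  # linearized values; they sum to zero
    var = n / (n - 1) * np.sum(z ** 2)
    return mean, var

# Hypothetical household data for one area (IDR per capita, survey weights)
mean, var = direct_estimate([850_000, 1_200_000, 990_000, 1_430_000], [120, 80, 150, 60])
```

With equal weights the estimator reduces to the sample mean and the variance to $s^2/n$, which makes clear why areas with few sample units have large $\psi_i$.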

Fay-Herriot of Per Capita Expenditure
The next step after direct estimation is to apply the EBLUP. Auxiliary variables are selected by backward elimination. At the sub-district level, backward elimination on the 8 available auxiliary variables leaves 5 variables: x1, x2, x4, x5, and x6. At the village level, five variables remain: d1, d2, d3, d4, and x7. A summary of the estimated coefficients of the auxiliary variables is presented in Table 2. The Maximum Likelihood Estimation (MLE) method is used to estimate the random effect variance ($\sigma_v^2$), computed in R. The estimated random effect variance is 6,085,453,710 at the sub-district level and 36,945,804,795 at the village level.
The estimated random effect variance and regression coefficients are then used to estimate the average per capita expenditure with the EBLUP method; the results are given in Appendices 2 and 3. Descriptive statistics of the EBLUP estimates of per capita expenditure at the sub-district and village levels are presented in Table 3, and maps of their distribution are presented in Figure 2.
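One standard way to obtain the ML estimate of $\sigma_v^2$ in the Fay-Herriot model is Fisher scoring on the profile likelihood, alternating with the GLS estimate of $\boldsymbol{\beta}$. The sketch below implements that iteration in NumPy under these assumptions; it is not the R routine used in the study, the name `fh_ml_sigma2` and the starting value are hypothetical, and the backward-elimination step for choosing covariates is not shown.

```python
import numpy as np

def fh_ml_sigma2(theta_hat, X, psi, sigma2=None, tol=1e-8, max_iter=200):
    """ML estimate of sigma_v^2 in the Fay-Herriot model via Fisher scoring.

    Returns (sigma2_hat, beta_hat). The estimate is truncated at zero.
    """
    if sigma2 is None:
        sigma2 = float(np.var(theta_hat))           # crude starting value (assumption)
    for _ in range(max_iter):
        w = 1.0 / (sigma2 + psi)
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * theta_hat))
        r = theta_hat - X @ beta                    # residuals at the current beta
        score = 0.5 * np.sum(w ** 2 * r ** 2) - 0.5 * np.sum(w)   # d loglik / d sigma2
        info = 0.5 * np.sum(w ** 2)                 # expected (Fisher) information
        new = max(sigma2 + score / info, 0.0)       # scoring step, truncated at zero
        converged = abs(new - sigma2) <= tol * max(1.0, sigma2)   # relative tolerance
        sigma2 = new
        if converged:
            break
    w = 1.0 / (sigma2 + psi)                        # final GLS beta at the estimate
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * theta_hat))
    return sigma2, beta
```

A relative tolerance is used because variances on the scale reported here (around $10^{9}$ to $10^{10}$ IDR squared) are too large for a fixed absolute tolerance.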

Spatial Fay-Herriot of Per Capita Expenditure
A spatial weight matrix is needed to estimate the Spatial Fay-Herriot model. This study uses a row-standardized queen contiguity matrix, chosen because it produces a lower AIC value than the other weight matrices. Queen contiguity treats two regions as neighbors when they share a common boundary or vertex. An illustration of the queen contiguity matrix is shown in Figure 3.
Spatial autocorrelation in the area random effects is tested with Moran's I, and the results are presented in Table 4. The hypotheses of the test are:
H0: I = 0 (there is no spatial autocorrelation)
H1: I ≠ 0 (there is spatial autocorrelation)
According to the Moran's I results in Table 4, there is spatial autocorrelation in the area random effects at the village level when the queen contiguity matrix is used.
The REML procedure is used to fit the Spatial Fay-Herriot model. At the sub-district level, backward elimination leaves only three of the eight auxiliary variables: x1, x5, and x6. At the village level, five variables remain: d1, d2, d3, d4, and x7. A summary of the estimated covariate coefficients is presented in Table 5. The Maximum Likelihood Estimation (MLE) method is used to estimate the random effect variance ($\sigma_u^2$). The estimated random effect variance is 3,918,068,286 at the sub-district level and 34,761,096,201 at the village level. Besides the random effect variance, the spatial autoregressive coefficient ($\rho$) must be estimated: its value is -0.981 at the sub-district level and 0.1961 at the village level. Descriptive statistics of per capita expenditure based on SEBLUP (Spatial Fay-Herriot) are presented in Table 6.
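The row-standardized queen weights and the Moran's I statistic can be illustrated on a regular grid. The study built its weights from the Bangka Belitung shapefile in GeoDa; for real polygons, a library such as PySAL (`libpysal.weights.Queen`) constructs the neighbor structure. The grid, the function names `queen_weights` and `morans_i`, and the test values below are hypothetical, and no significance test (for example, by permutation) is included.

```python
import numpy as np

def queen_weights(nrow, ncol):
    """Row-standardized queen contiguity matrix for a regular nrow x ncol grid."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    # neighbors share an edge or a corner (queen move), excluding self
                    if (dr or dc) and 0 <= rr < nrow and 0 <= cc < ncol:
                        W[r * ncol + c, rr * ncol + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)         # each row sums to 1

def morans_i(x, W):
    """Moran's I: (n / S0) * z'Wz / z'z with z the centered values."""
    z = np.asarray(x, float) - np.mean(x)
    return (len(z) / W.sum()) * (z @ W @ z) / (z @ z)

W = queen_weights(4, 4)
gradient = np.tile(np.arange(4.0), 4)               # values rise smoothly across columns
print(morans_i(gradient, W))                        # positive: similar values cluster
```

In the Spatial Fay-Herriot setting, `x` would be the estimated area random effects (or model residuals), so a clearly positive or negative I supports including $\rho$ in the model.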

Comparison of MSE and RRMSE
This comparison identifies which method gives the best estimates. The MSE of each estimation method is computed and then used to calculate the Relative Root Mean Square Error (RRMSE). The comparison is made at both the sub-district and village levels. Based on this comparison, at both levels the RRMSE of the Spatial Fay-Herriot model is better than that of the direct estimate: the RRMSE line of the Spatial Fay-Herriot model lies below those of EBLUP and direct estimation, although the difference is not significant. This means that adding spatial information to the small area estimation model does not give better predictions than the simpler small area estimation model, which accounts for each area through an area-specific random effect.
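The RRMSE used in this comparison expresses the root MSE as a percentage of the estimate, $\text{RRMSE}_i = 100\sqrt{\text{mse}_i}/\hat{\theta}_i$. A minimal sketch of the calculation and of an average-RRMSE summary per method follows; the function names and the numbers in the test are hypothetical.

```python
import numpy as np

def rrmse(estimate, mse):
    """Relative root mean squared error in percent: 100 * sqrt(MSE) / estimate."""
    return 100.0 * np.sqrt(np.asarray(mse, float)) / np.asarray(estimate, float)

def compare_methods(results):
    """Average RRMSE per method; results maps method name -> (estimates, mse)."""
    return {name: float(np.mean(rrmse(est, mse))) for name, (est, mse) in results.items()}
```

For example, an area estimate of IDR 1,000,000 with an MSE of 2.5e9 has a root MSE of IDR 50,000 and an RRMSE of 5%. Averages summarize the comparison, but inspecting the area-by-area RRMSE curves, as the paper does, shows whether one method is lower consistently or only on average.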

CONCLUSION
The conclusions of this paper are:
1. Direct estimates of per capita expenditure:
   a. At the sub-district level, the highest per capita expenditure was IDR 1,788,768, in Rangkui sub-district, Pangkalpinang City. The lowest average per capita expenditure was in Tukak Sadai sub-district, South Bangka district.
   b. At the village level, the highest per capita expenditure was IDR 1,877,401, in Masjid Jamik village, Rangkui sub-district, Pangkalpinang. The lowest average per capita expenditure was in Peradong village, Simpang Teritip sub-district, West Bangka.
2. The auxiliary variables selected for EBLUP and SEBLUP are:
   a. Sub-district level
      i. EBLUP: agriculture (x1); mining and excavation (x2); wholesale/retail trade and restaurants (x4); transportation, warehousing, and communication (x5); and services (x6).
      ii. SEBLUP: agriculture as the main source of income for most of the population (x1); transportation, warehousing, and communication (x5); and services (x6).
   b. Village level
      i. EBLUP: mining and excavation (d1); wholesale/retail trade and restaurants (d2); transportation, warehousing, and communication (d3); services (d4); and number of active hospitals (x8).
      ii. SEBLUP: the same variables as the EBLUP method.
3. The MSE of the SEBLUP estimates is smaller than that of direct estimation and EBLUP, but the difference is not significant. Adding spatial information to the small area estimation model therefore does not give better predictions than the simpler model, which accounts for each area through an area-specific random effect.