APPLICATION OF NEGATIVE BINOMIAL REGRESSION ANALYSIS TO OVERCOME THE OVERDISPERSION OF POISSON REGRESSION MODEL FOR MALNUTRITION CASES IN INDONESIA

Indonesia is one of the developing countries still struggling to eradicate malnutrition. Malnutrition that persists over a long period can lead to death and lowers the quality of human life. This study aims to model the malnutrition cases that occurred in the provinces of Indonesia in 2015 and to identify the main factors behind them. The variables studied are Malnutrition (Y), Vitamin A consumption (X1), Exclusive breastfeeding (X2), Immunization (X3), Water quality (X4), Healthcare center (X5), and Poverty level (X6). Based on the Kolmogorov-Smirnov test, the 2015 provincial malnutrition data do not follow a Poisson distribution, and the Poisson regression model is overdispersed. Overdispersion in a Poisson regression model makes the resulting inferences unreliable. An alternative model that accommodates overdispersion is the negative binomial regression model. Using this model, the factors found to influence malnutrition cases in the provinces of Indonesia in 2015 are Immunization (X3), Water quality (X4), and Poverty level (X6). The best model obtained from the negative binomial regression analysis is $\hat{\mu}_i = \exp(2.5111 - 0.0338X_3 + 0.0295X_4 + 0.0576X_6)$.


INTRODUCTION
Malnutrition remains a major concern for developing countries. Chronic malnutrition lowers quality of life: stunting, for example, can reduce a person's productivity and increase the risk of non-communicable diseases such as diabetes and heart disease. This is called the double burden of malnutrition (World Bank, 2013). In addition, malnutrition is a major cause of child morbidity and mortality (Tette, 2015).
The number of malnutrition cases among children under five found in Indonesia in 2015 was 26,518, or 0.11% of all children under five in Indonesia (Ministry of Health of the Republic of Indonesia, 2016). In 2014, 32,521 cases were recorded, or 0.14% of all children under five in Indonesia. These figures show that malnutrition cases in the provinces of Indonesia decreased slightly in 2015. Such a small decline indicates that the government's strategies for addressing malnutrition in Indonesia were not yet optimal. Therefore, a deeper analysis of the factors that cause malnutrition in Indonesia is still needed. These factors are socio-economic and health-related. Unemployment and a lack of knowledge about recommended infant and child feeding practices are examples of socio-economic factors, while low birth weight and inadequate vitamin A supplementation are health-related factors affecting malnutrition in young children (Kadima, 2012).
Poisson regression analysis describes the relationship between a response variable (Y) in the form of count data and one or more predictor variables (X). The response variable (Y) is assumed to be Poisson distributed, so its variance must equal its mean, a property commonly called equidispersion (Olsson, 2002). In practice, however, the variance of the response variable (Y) is often much greater than its mean, a condition called overdispersion. When a Poisson model is imposed on overdispersed data, the standard errors of the parameter estimators are underestimated and the significance of the individual regression parameters is overstated, so inference about the regression parameters and the interpretation of the model can be misleading (Ismail, 2007). Therefore, alternative models should be considered when overdispersion occurs. The negative binomial regression model offers a good solution to this problem.

MATERIAL AND METHOD
This study used negative binomial regression analysis to overcome the problem of overdispersion in the Poisson regression model for malnutrition cases (Y) in the provinces of Indonesia. The predictors used are Vitamin A consumption (X1), Exclusive breastfeeding (X2), Immunization (X3), Water quality (X4), Healthcare center (X5), and Poverty level (X6).
There are several types of negative binomial model, such as the canonical negative binomial (NBC), the linear negative binomial (NB1), and the traditional negative binomial (NB2). The NB2 negative binomial regression model can be derived from the Poisson-Gamma mixture distribution (Hilbe, 2011). Like the Poisson regression model, the negative binomial regression model is used to analyze the relationship between a discrete response variable and one or more predictor variables. In addition, the negative binomial regression model has a dispersion parameter that describes the extra variation in the data. When the dispersion parameter approaches zero, the negative binomial model reduces to the Poisson model.

Poisson Regression
Poisson regression is generally used to describe the relationship between a response variable (Y), which is assumed to be Poisson distributed, and predictor variables (X). If Y is a discrete random variable that follows a Poisson distribution with parameter μ > 0, then its probability function is
$$f(y;\mu) = \frac{e^{-\mu}\mu^{y}}{y!}, \quad y = 0, 1, 2, \ldots$$
It can be proved that $E(Y) = \mathrm{Var}(Y) = \mu$. The Poisson regression model is as follows:
$$y_i = \mu_i + \varepsilon_i, \quad \mu_i = \exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}), \quad i = 1, 2, \ldots, n$$
where μi is the expected value of the response variable (Y), β0, β1, ..., βk are the regression coefficients, and εi is the error of the i-th observation.
The coefficients β0, β1, ..., βk in the Poisson regression analysis are estimated by the Maximum Likelihood Estimation (MLE) method, i.e. by maximizing the log-likelihood function with respect to the parameters to be estimated. The likelihood function for Poisson regression is $L(\beta) = \prod_{i=1}^{n} f(y_i;\mu_i)$, so that its log-likelihood function is
$$\ln L(\beta) = \sum_{i=1}^{n} \left[ y_i \ln \mu_i - \mu_i - \ln(y_i!) \right].$$
The maximum likelihood estimator for β0, β1, ..., βk is obtained by solving the equations
$$\frac{\partial \ln L(\beta)}{\partial \beta_j} = \sum_{i=1}^{n} (y_i - \mu_i)\, x_{ij} = 0, \quad j = 0, 1, \ldots, k,$$
with $x_{i0} = 1$. This nonlinear system of equations has no closed-form solution and is solved iteratively with the Newton-Raphson method.
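The Newton-Raphson iteration described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a single predictor so that the 2×2 linear system can be solved by hand; the function name `fit_poisson_nr` and the toy data are hypothetical and are not the data of this study.

```python
import math

def fit_poisson_nr(x, y, tol=1e-10, max_iter=50):
    """Fit log(mu_i) = b0 + b1 * x_i by Newton-Raphson on the Poisson log-likelihood."""
    # Start from the intercept-only solution: b0 = log(mean(y)), b1 = 0.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: sum of (y_i - mu_i) * x_ij, with x_i0 = 1.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix X' diag(mu) X (the negative Hessian).
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H d = g by Cramer's rule.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1

# Hypothetical toy data (not from the study).
x = [0, 1, 2, 3, 4, 5]
y = [2, 3, 6, 7, 8, 9]
b0, b1 = fit_poisson_nr(x, y)
mu_hat = [math.exp(b0 + b1 * xi) for xi in x]
# At the maximum, both score equations should be numerically zero.
score0 = sum(yi - mi for yi, mi in zip(y, mu_hat))
score1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu_hat))
print(b0, b1, score0, score1)
```

With more predictors the same update applies, but the linear system is solved with a matrix routine instead of Cramer's rule.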
Overdispersion is a condition in Poisson regression analysis in which the variance of the response variable is greater than its mean. Overdispersion may lead to misleading conclusions: because the standard errors of the estimated regression parameters become smaller than they should be, parameters that are not actually significant may be judged significant. This leads to incorrect prediction and interpretation of the model. Overdispersion can be apparent or real (Hilbe, 2011). Apparent overdispersion can be caused by omitting important explanatory variables or by outliers. It can usually be overcome by including the appropriate explanatory variables or adjusting for outliers; sometimes adding interaction terms or transforming the response or predictor variables is also needed. When real overdispersion persists, several methods can be used to resolve the problem, such as the generalized Poisson model or the negative binomial model.
Negative binomial regression is an alternative used to overcome the problem of overdispersion. In this regression analysis, the response variable is assumed to follow a negative binomial distribution, which can be derived as a Poisson-Gamma mixture distribution. The variable Y then has the probability function
$$f(y_i;\mu_i,a) = \frac{\Gamma(y_i + 1/a)}{\Gamma(1/a)\, y_i!} \left(\frac{1}{1 + a\mu_i}\right)^{1/a} \left(\frac{a\mu_i}{1 + a\mu_i}\right)^{y_i}, \quad y_i = 0, 1, 2, \ldots$$
with $E(y_i) = \mu_i$ and $\mathrm{Var}(y_i) = \mu_i + a\mu_i^2$, where a is the dispersion parameter. If a → 0 then Var(yi) → μi, so that the negative binomial distribution converges to the Poisson distribution. The negative binomial regression model is as follows:
$$y_i = \mu_i + \varepsilon_i, \quad \mu_i = \exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}), \quad i = 1, 2, \ldots, n$$
where μi is the expected value of the response variable (Y), β0, β1, ..., βk are the regression coefficients, and εi is the error of the i-th observation.
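The role of the dispersion parameter can be checked numerically. The sketch below assumes the NB2 parameterization with mean μ and dispersion a, evaluates the probability function on a long grid of counts, and compares the resulting variance with μ + aμ². The function names and the value μ = 14 are illustrative choices, not results of this study.

```python
import math

def nb2_pmf(y, mu, a):
    """NB2 probability of count y, with mean mu and dispersion a > 0."""
    r = 1.0 / a
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(1.0 / (1.0 + a * mu))
             + y * math.log(a * mu / (1.0 + a * mu)))
    return math.exp(log_p)

def moments(mu, a, y_max=2000):
    """Total probability, mean and variance, summed over y = 0, ..., y_max - 1."""
    probs = [nb2_pmf(y, mu, a) for y in range(y_max)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var

mu = 14.0  # illustrative mean
results = {a: moments(mu, a) for a in (1.0, 0.1, 0.001)}
for a, (total, mean, var) in results.items():
    # The last column is the theoretical variance mu + a * mu^2.
    print(a, round(total, 6), round(mean, 4), round(var, 4), mu + a * mu ** 2)
```

As a shrinks, the computed variance moves toward the mean, which is the Poisson limit stated above.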
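The log-likelihood function for negative binomial regression is
$$\ln L(a,\beta) = \sum_{i=1}^{n} \left\{ \ln\Gamma(y_i + 1/a) - \ln\Gamma(1/a) - \ln(y_i!) + y_i \ln(a\mu_i) - (y_i + 1/a)\ln(1 + a\mu_i) \right\}.$$
Let $\beta = (\beta_0, \beta_1, \ldots, \beta_k)'$ and $\mu_i = \exp(x_i'\beta)$. The above log-likelihood function becomes
$$\ln L(a,\beta) = \sum_{i=1}^{n} \left\{ \ln\Gamma(y_i + 1/a) - \ln\Gamma(1/a) - \ln(y_i!) + y_i \ln a + y_i x_i'\beta - (y_i + 1/a)\ln\left(1 + a\exp(x_i'\beta)\right) \right\}.$$
The maximum likelihood estimates of the coefficients β0, β1, ..., βk in the negative binomial regression model are obtained by setting the partial derivatives of the log-likelihood function with respect to each parameter equal to zero [4] and solving the resulting equations iteratively with the Newton-Raphson method.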
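To make the comparison with the Poisson model concrete, the sketch below evaluates both log-likelihoods on a small set of overdispersed counts. It assumes the NB2 form with dispersion a; the counts are hypothetical, the mean is held at the sample mean, and a is a rough moment-based guess rather than the maximum likelihood estimate.

```python
import math

def poisson_loglik(y, mu):
    """Poisson log-likelihood for counts y with means mu."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))

def nb2_loglik(y, mu, a):
    """NB2 log-likelihood for counts y with means mu and dispersion a > 0."""
    r = 1.0 / a
    return sum(
        math.lgamma(yi + r) - math.lgamma(r) - math.lgamma(yi + 1)
        + yi * math.log(a * mi) - (yi + r) * math.log1p(a * mi)
        for yi, mi in zip(y, mu)
    )

# Hypothetical overdispersed counts (not the study data).
y = [2, 5, 9, 14, 30, 41]
n = len(y)
mean = sum(y) / n
var = sum((v - mean) ** 2 for v in y) / (n - 1)
a_mom = (var - mean) / mean ** 2   # moment-based guess, not the MLE
mu = [mean] * n                    # intercept-only means

ll_pois = poisson_loglik(y, mu)
ll_nb = nb2_loglik(y, mu, a_mom)
print(ll_pois, ll_nb)
```

In a full fit, a and the coefficients would be estimated jointly by maximizing `nb2_loglik`, as described in the paragraph above.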

Methods
The data used in this study are secondary data obtained from the Indonesia Health Profile 2015, published by the Ministry of Health of the Republic of Indonesia. The variables used are Malnutrition (Y), Vitamin A consumption (X1), Exclusive breastfeeding (X2), Immunization (X3), Water quality (X4), Healthcare center (X5), and Poverty level (X6). The steps of the analysis were as follows:
a. Descriptive analysis of the response variable (Y) and the predictor variables (X).
b. Assumption test for the distribution of the response variable (Y). The response variable (Y) is assumed to be Poisson distributed, and this is tested with the Kolmogorov-Smirnov test using the statistic $D_{KS} = \sup_y |S_n(y) - F_0(y)|$, where $S_n(y)$ is the empirical cumulative distribution function and $F_0(y)$ is the hypothesized Poisson cumulative distribution function. If $D_{KS} > D^{*}(\alpha)$ (the value obtained from the Kolmogorov-Smirnov table), or if the value of Asymp. Sig. (2-tailed) < α, then the data are said not to be Poisson distributed.
c. Examination of the mean and variance of the response variable (Y).
d. Overdispersion check using the deviance of the Poisson regression model divided by its degrees of freedom: if this ratio is greater than 1, then the Poisson regression model is overdispersed.
e. Establishment of the negative binomial regression model.

f. Model Conformity Test
The conformity of the negative binomial regression model is tested with the deviance test:
$$D = 2\left[\ln L(y; y) - \ln L(\hat{\mu}; y)\right],$$
where $\ln L(y; y)$ is the log-likelihood of the saturated model and $\ln L(\hat{\mu}; y)$ is the log-likelihood of the fitted model. If $D < \chi^2_{(\alpha,\, df)}$, then the model is said to be feasible to use. Note: α is the significance level, df = n − k, n is the number of observations, and k is the number of parameters.
g. Likelihood Ratio Test
The simultaneous significance test for the negative binomial regression parameters uses the statistic
$$LR = -2\left(\ln L_0 - \ln L_1\right),$$
where $L_0$ is the likelihood of the model without predictor variables and $L_1$ is the likelihood of the model with predictor variables. If $LR > \chi^2_{(\alpha,\, df)}$, then it can be said that there are parameters that are significant to the model.
h. Wald Test
If the null hypothesis of the likelihood ratio test is rejected, then a partial test for each regression parameter is carried out with the Wald test. The Wald statistic is
$$W_j = \left(\frac{\hat{\beta}_j}{SE(\hat{\beta}_j)}\right)^2.$$
If $W_j > \chi^2_{(\alpha,\, 1)}$ or if p-value < α, then it can be said that the j-th predictor variable is significant.
i. Predictor Variable Removal
Predictor variables (X) that are not significant to the model are removed one by one, starting from the least significant variable.
j. Selection of the Best Model
The best model is selected by comparing the AIC values of the models formed from combinations of the predictor variables found significant in step i; the model with the smallest AIC is chosen (Zwilling, 2013).
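Steps b and c above can be sketched as follows. The sketch assumes that the Poisson mean is estimated by the sample mean and computes the largest gap between the empirical and the Poisson cumulative distribution functions over the integers 0, ..., max(y). The counts and function names are hypothetical; the comparison with the tabulated critical value from step b is left out.

```python
import math

def poisson_cdf(k, mu):
    """P(Y <= k) for Y ~ Poisson(mu), by summing the probability function."""
    return sum(math.exp(j * math.log(mu) - mu - math.lgamma(j + 1))
               for j in range(k + 1))

def ks_distance_to_poisson(y):
    """Largest gap between the empirical CDF and the Poisson CDF with mu = mean(y)."""
    n = len(y)
    mu = sum(y) / n
    return max(
        abs(sum(1 for v in y if v <= k) / n - poisson_cdf(k, mu))
        for k in range(max(y) + 1)
    )

# Hypothetical counts (not the study data).
y = [2, 5, 9, 14, 30, 41, 7, 3, 22, 11]
n = len(y)
mean = sum(y) / n
variance = sum((v - mean) ** 2 for v in y) / (n - 1)
d_ks = ks_distance_to_poisson(y)
print(mean, variance, d_ks)
```

A variance far above the mean, together with a large distance, is the pattern that motivates moving from the Poisson to the negative binomial model.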

RESULTS AND DISCUSSION
The average number of children under five affected by malnutrition (Y) per 10,000 children under five in the provinces of Indonesia in 2015 is 14.323 ≈ 14 children. The average percentage of Vitamin A coverage for children under five (X1) is 76.74%, the average percentage of children under five receiving exclusive breastfeeding (X2) is 57.89%, the average percentage of children under five with complete immunization (X3) is 82.41%, the average percentage of households with access to good-quality water (X4) is 68.88%, the average number of health centers per 30,000 population (X5) is 1.863 ≈ 2 health centers, and the average poverty level (X6) is 11.70%. The following table presents the descriptive analysis used to see the characteristics of each variable used in this study.
The first step in the Poisson regression analysis is the Kolmogorov-Smirnov test, used to examine the distribution of the malnutrition data in the provinces of Indonesia. The result is as follows:
Based on the output in Table 2, the value of Asymp. Sig. (2-tailed) is less than 0.05. It can be concluded, at the 95% confidence level, that the malnutrition data are not Poisson distributed. The mean and variance of the malnutrition data are shown in Table 3.
Based on Table 3, the variance of the malnutrition data for the provinces of Indonesia in 2015 is much greater than its mean, which indicates that the Poisson regression analysis has an overdispersion problem. Overdispersion in Poisson regression can be examined by looking at the deviance divided by its degrees of freedom. If this value is greater than one, the data are said to be overdispersed.
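The deviance-based check can be illustrated with a short sketch. It assumes the usual Poisson deviance, 2 Σ [y ln(y/μ) − (y − μ)], and an intercept-only fit so that every fitted mean equals the sample mean; the counts are hypothetical and the function name is illustrative.

```python
import math

def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum(y * ln(y / mu) - (y - mu)), with 0 * ln(0) taken as 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total

# Hypothetical counts (not the study data).
y = [2, 5, 9, 14, 30, 41, 7, 3, 22, 11]
n = len(y)
k = 1                          # one parameter: the intercept
mu_hat = [sum(y) / n] * n      # intercept-only Poisson fit
dispersion = poisson_deviance(y, mu_hat) / (n - k)
print(dispersion)
```

For a fitted regression model, `mu_hat` would hold the fitted means and k the number of estimated coefficients.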
The output in Table 4 shows that the deviance divided by its degrees of freedom is 8.98 > 1, so it can be concluded that overdispersion occurs in the Poisson regression model. Overdispersion may lead to misleading conclusions because the standard errors are underestimated and the significance of the regression parameters is overstated. Therefore, the Poisson model is not applicable. The next step is to construct a negative binomial regression model as the alternative model for data that are overdispersed under Poisson regression. The result is as follows:
The feasibility of the above negative binomial regression model can be checked from the output in Table 6. Based on Table 6, the value D = 34.669 < $\chi^2_{(0.05;\,27)}$ = 40.113. It can be concluded, at the 95% confidence level, that the negative binomial regression model obtained is feasible to use.
The likelihood ratio test is used to test the significance of the negative binomial regression parameters simultaneously. The result is as follows:
Based on the output in Table 7, the value of LR is $LR = -2(-251.240 - (-229.575)) = 43.33 > \chi^2_{(0.05;\,27)} = 40.113$; therefore, at the 95% confidence level, there are parameters that significantly affect the variability of the response variable.
The next step is the Wald test, which identifies the parameters that are significant to the model. The result is shown in Table 8. Based on Table 8, the p-value for the Water quality variable (X4) is 0.0325 and the p-value for the Poverty level variable (X6) is 0.0418. Both p-values are less than 0.05. Therefore, at the 95% confidence level, Water quality (X4) and Poverty level (X6) have a significant effect on the malnutrition cases that occurred in the provinces of Indonesia in 2015.
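The arithmetic behind these decisions can be reproduced directly. The sketch below uses only the values quoted in the text (the two log-likelihoods, the deviance, and the tabulated critical value 40.113); labelling the first log-likelihood as the restricted model is an assumption. It also includes a hypothetical helper, `wald_p_value`, that converts a Wald statistic with one degree of freedom into a p-value.

```python
import math

# Values quoted in the text above.
loglik_restricted = -251.240   # assumed to be the model under the null hypothesis
loglik_full = -229.575         # model with all six predictors
chi2_crit_27 = 40.113          # tabulated chi-square value, alpha = 0.05, df = 27
deviance = 34.669

lr = -2.0 * (loglik_restricted - loglik_full)
model_feasible = deviance < chi2_crit_27
lr_significant = lr > chi2_crit_27

def wald_p_value(w):
    """Upper-tail probability of a chi-square variable with 1 df at w."""
    return math.erfc(math.sqrt(w / 2.0))

print(round(lr, 2), model_feasible, lr_significant)
# 3.841 is the tabulated 5% critical value for 1 df, so this is close to 0.05.
print(wald_p_value(3.841))
```

A predictor is judged significant at the 5% level when its Wald statistic exceeds 3.841, which is the same rule as p-value < 0.05.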
The next step is to remove the predictor variables that are not significant to the model in order to obtain a new model. The variables are removed one at a time, starting from the predictor variable with the highest p-value. This process removes the predictor variables (X1), (X2) and (X5), and the following final results are obtained:
Based on Table 9, it can be concluded that three variables are significant to the model, i.e. (X3), (X4) and (X6). The negative binomial regression model using these three predictor variables is $\hat{\mu}_i = \exp(2.5111 - 0.0338X_3 + 0.0295X_4 + 0.0576X_6)$.
From these three variables, several candidate models can be formed in order to find the best model. The Akaike Information Criterion (AIC) is used to select the best-fitting model. There are 7 negative binomial models that can be formed. The variables included in each model and their AIC values are shown in Table 10.
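The count of seven candidate models follows from taking every non-empty subset of the three significant predictors. The sketch below enumerates them and defines the AIC as 2p − 2 ln L, where p is the number of estimated parameters. The helper names `aic` and `best_model` are hypothetical, and no AIC values from the study are reproduced here.

```python
from itertools import combinations

predictors = ["X3", "X4", "X6"]

# All non-empty subsets of the three predictors: 3 + 3 + 1 = 7 models.
candidate_models = [
    subset
    for size in range(1, len(predictors) + 1)
    for subset in combinations(predictors, size)
]

def aic(loglik, n_params):
    """Akaike Information Criterion: 2 * (number of parameters) - 2 * log-likelihood."""
    return 2 * n_params - 2 * loglik

def best_model(aic_by_model):
    """Return the model (dictionary key) with the smallest AIC."""
    return min(aic_by_model, key=aic_by_model.get)

for model in candidate_models:
    print(model)
```

Given a dictionary that maps each subset to the AIC of its fitted model, `best_model` returns the subset to keep.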