PANEL DATA REGRESSION ANALYSIS FOR MODELING THE HUMAN DEVELOPMENT INDEX IN NORTH SULAWESI PROVINCE

Regression analysis is a technique used in hypothesis testing to determine the impact of one variable on another. Cross-sectional and time series analysis of the Human Development Index (HDI) in North Sulawesi has received little research attention, although it is needed to monitor the socio-economic progress achieved by North Sulawesi Province and to identify development needs, that is, which factors should be prioritized and how large their influence is. This research identifies the magnitude of the influence of the significant variables as a basis for more effective and targeted development planning in North Sulawesi Province. The analytical method often used to determine the magnitude of an effect is regression analysis, one development of which is path analysis. This study uses panel data regression analysis, which combines cross-sectional and time series data, to analyze the impact of Life Expectancy, Income Per Capita, Expected School Years, and Average School Years on the Human Development Index. According to the results of the analysis, the Common Effect Model (CEM), estimated by Ordinary Least Squares (OLS), was the most suitable model. The equation obtained is $y_{it} = 5.29 + 0.47x_{1it} + 0.00x_{2it} + 0.96x_{3it} + 1.09x_{4it}$. Moreover, according to the significance tests, all independent variables were significantly related to the dependent variable.


INTRODUCTION
The term "development" refers to activities undertaken by a country or region to improve the quality of life of its residents. Development is a process driven by several factors that influence one another. Identifying and analyzing these factors makes it possible to trace the sequence of events that arise, thereby increasing the level of social welfare from one stage of development to the next (Rustiadi et al., 2011). The Human Development Index is one of the indicators used to measure regional development.
The Human Development Index (HDI) of a province gives an overview of the progress of human welfare in that region through components such as life expectancy, literacy rates, school participation rates, labor force participation rates, and per capita income. Analysis of these data can provide a holistic picture of human development in a region. HDI data for North Sulawesi from the Central Statistics Agency for 2018 to 2020 show an increase in life expectancy from year to year, a reduction in literacy levels, and an increase in school enrollment rates. However, several challenges remain, namely the labor force participation rate and per capita income, which are still low. The HDI has become a worldwide measure of human development, and the direction of development itself has shifted away from an economic focus on increasing population income. To achieve prosperity and development, human development must be prioritized (Azfirmawan et al., 2023).
The HDI measures the performance of regional development in terms of the life expectancy, education, and standard of living of a region's population, and it covers broad dimensions (Melliana and Zain, 2013). HDI indicators are used in many policy-making processes in Indonesia. For example, the central government uses the HDI in determining the amount of General Allocation Funds (DAU) received by a region. The HDI is also an indicator of the government's development targets when macro assumptions are discussed in the DPR-RI. The HDI components (expected length of schooling, average length of schooling, and per capita expenditure) are indicators used in calculating Regional Incentive Funds (Nasution, 2019). Changes in the HDI are important not only in themselves but also in comparison with previous years. Further analysis can identify the causes behind the changes, such as new education policies, increased access to health services, or specific economic factors. Improving the HDI in North Sulawesi Province faces several obstacles. The most frequently encountered obstacle is limited public awareness and understanding of the importance of human development. Lack of awareness of the importance of access to education, health, and decent work can hinder efforts to increase the HDI. Nasution (2020), using multiple regression analysis, found that the human development index influences the growth of Indonesia's creative economy.
To determine whether the components above have an impact on the HDI, regression analysis can be used. Regression analysis determines whether two or more variables have a significant relationship and indicates the direction of the relationship between the dependent and independent variables (Imam Gozali, 2013). This research uses panel data, which combine cross-sectional data and time series data. Panel data offer several benefits, both statistically and economically. In an econometric model, panel data allow individual-specific variables to be considered explicitly to account for individual heterogeneity. Controlling for individual heterogeneity can also considerably reduce the omitted-variable problem that arises when an unobserved effect is correlated with the explanatory variables. Because panel data are based on repeated cross-section observations, they are also suitable for studying dynamic adjustment processes, such as labor mobility and job entry and exit rates (Ekananda Mahyus, 2016). Panel data regression is a method for analyzing panel data.
Several studies of the human development index have been carried out, as noted in the previous paragraphs. Research using panel data analysis found that a higher human development index increases the Gini ratio and can therefore widen income inequality between regions (Carla, L.M et al., 2023). Hutagalung (2022) found that the human development index in North Sumatra is influenced by life expectancy, expected length of schooling, average length of schooling, per capita consumption, and the percentage of poor people. In the model obtained, the percentage of poor people is the only variable that affects the human development index negatively: a one-unit increase in the percentage of poor people reduces the human development index in North Sumatra by 0.07%.
This research differs from previous work in the variables used; previous research focused more on the influence of the human development index on various sectors. This research uses only the four predictor variables that had a positive influence in the model obtained by Hutagalung (2020). It excludes the percentage of poor people because previous research found that this variable only affects the human development index negatively. In addition, research on the human development index using these variables in North Sulawesi for the period studied has not been carried out before. The topic is therefore of interest as a basis for recommendations to the North Sulawesi regional government on future policy.

MATERIALS AND METHODS
The data used in this research are secondary data published by the North Sulawesi Provincial Central Bureau of Statistics in 2021. The predictor variables are Life Expectancy (X1), Income Per Capita (X2), Expected School Years (X3), and Average School Years (X4), and the response variable is the Human Development Index (Y). This research uses saturated sampling, a sample determination technique in which all members of the population are used as the sample (Sari, 2021). The author chose this technique because the population size was relatively small. Calculations were performed using EViews software.
A panel data regression analysis is used in this study, in which panel data are a combination of time series and cross-sectional data (individual data) (Widarjono, 2009). The panel data regression model is generally expressed as follows:

$$y_{it} = \alpha_{it} + \boldsymbol{\beta}' \mathbf{X}_{it} + \varepsilon_{it}, \quad i = 1, 2, \ldots, N, \quad t = 1, 2, \ldots, T \qquad (1)$$

where
$y_{it}$: value of the response variable of the i-th cross-section unit for the t-th time
$\alpha_{it}$: intercept, which is the effect of the i-th individual or group cross-section unit for the t-th time
$\boldsymbol{\beta}'$: $(\beta_1, \beta_2, \ldots, \beta_k)$, a 1 × k vector of slopes, where k is the number of predictor variables
$\mathbf{X}_{it}$: $(X_{1it}, X_{2it}, \ldots, X_{kit})$, the 1 × k vector of observations on the predictor variables
$\varepsilon_{it}$: regression error of the i-th cross-section unit for the t-th time.
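The indexing in the model above can be made concrete with a small sketch. The following Python snippet is only an illustration under assumed names and made-up numbers (the units, years, and values are hypothetical and are not the study's data): it stores one record per (unit, year) pair, checks that the panel is balanced, and evaluates the systematic part of the model, the intercept plus the slope-weighted predictors, for one observation.

```python
# Hypothetical toy panel: 3 cross-section units observed over 2 years.
# All names and numbers are illustrative, not the study's data.
from itertools import product

units = ["A", "B", "C"]   # i = 1, ..., N
years = [2019, 2020]      # t = 1, ..., T

# Long format: one record per (unit, year), holding y_it and the predictors.
panel = {
    ("A", 2019): {"y": 70.1, "x": [71.0, 10.5]},
    ("A", 2020): {"y": 70.6, "x": [71.2, 10.7]},
    ("B", 2019): {"y": 68.3, "x": [69.8, 9.9]},
    ("B", 2020): {"y": 68.9, "x": [70.0, 10.1]},
    ("C", 2019): {"y": 73.4, "x": [72.5, 11.8]},
    ("C", 2020): {"y": 73.8, "x": [72.6, 12.0]},
}


def fitted_value(alpha, beta, x):
    """Return alpha + beta'x for one observation (error term omitted)."""
    return alpha + sum(b * xk for b, xk in zip(beta, x))


# A balanced panel has an observation for every (unit, year) pair,
# so it contains exactly N * T observations.
is_balanced = all((i, t) in panel for i, t in product(units, years))
n_obs = len(panel)
```

With N = 3 units and T = 2 periods, the balanced panel holds N × T = 6 stacked observations, which is the sample that the estimators discussed next operate on.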
Several assumptions can be made about the intercept, slope coefficients, and error term when using a panel data regression model. These possibilities mean that the more independent variables there are, the more complex the parameter estimation becomes. Parameters are therefore estimated with one of several approaches: the Common Effect Model (CEM), the Fixed Effect Model (FEM), and the Random Effect Model (REM) (Gujarati, 2003). CEM assumes that the intercept and slope coefficients are constant over time and across individuals, so that any differences are captured by the error term. FEM allows the intercept to differ across individuals, usually through dummy variables. REM is useful for solving problems caused by the fixed effect model: for panel data, fixed effect models with dummy variables reduce the degrees of freedom of the model, and the dummy variables can obscure the original model (Hutagalung, 2020). Testing the model specification is the first step in selecting the model that will be used for estimation.

a) Chow Test
The Chow test determines whether the Fixed Effect Model (FEM) or the Common Effect Model (CEM) will be used. Its premise is that the assumption that every cross-sectional unit behaves the same is unrealistic, because each unit may behave differently. The hypotheses tested are:
$H_0$: CEM will be used
$H_1$: FEM will be used
The Chow test statistic is calculated as shown in equation (2) (Greene, 2008).

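b) Hausman Test
The Hausman test determines whether the Fixed Effect Model (FEM) or the Random Effect Model (REM) should be used. It weighs FEM, which involves trade-offs due to the inclusion of dummy variables, against REM, which requires checking that its assumptions are not violated. The hypotheses tested are:
$H_0$: $\mathrm{corr}(X_{it}, \varepsilon_{it}) = 0$ (REM)
$H_1$: $\mathrm{corr}(X_{it}, \varepsilon_{it}) \neq 0$ (FEM)
The Hausman test statistic is calculated as shown in equation (3) (Greene, 2008):

$$\chi^2(K) = (\mathbf{b} - \hat{\boldsymbol{\beta}})' \left[ \mathrm{Var}(\mathbf{b}) - \mathrm{Var}(\hat{\boldsymbol{\beta}}) \right]^{-1} (\mathbf{b} - \hat{\boldsymbol{\beta}}) \qquad (3)$$

where $\mathbf{b}$ is the vector of FEM slope estimates and $\hat{\boldsymbol{\beta}}$ is the vector of REM slope estimates. The null hypothesis is rejected at a significance level of $\alpha$ when $\chi^2 > \chi^2_{(K;\,\alpha)}$ (or p-value < $\alpha$), where K is the number of predictor variables.

c) Lagrange Multiplier Test
The Lagrange Multiplier test determines whether CEM or REM will be used. The hypotheses tested are:
$H_0$: CEM will be used
$H_1$: REM will be used
The Lagrange Multiplier statistic is calculated as shown in equation (4) (Rohmana, 2010):

$$LM = \frac{nT}{2(T-1)} \left[ \frac{\sum_{i=1}^{n} \left( \sum_{t=1}^{T} e_{it} \right)^2}{\sum_{i=1}^{n} \sum_{t=1}^{T} e_{it}^2} - 1 \right]^2 \qquad (4)$$

where
n: number of cross-section units (individual observations)
T: number of time periods
e: residual of the Ordinary Least Squares (OLS) model
The null hypothesis is rejected if the value of LM is greater than the chi-square table value at significance level $\alpha$ (or p-value < $\alpha$).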
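The Chow and Lagrange Multiplier statistics can be sketched in a few lines of Python. This is a minimal illustration, assuming the commonly cited forms of the two statistics: the Chow F ratio built from the sums of squared errors of CEM and FEM with degrees of freedom N − 1 and NT − N − k, and the Breusch-Pagan form of the LM statistic built from pooled OLS residuals. The function and argument names are hypothetical, and the sketch does not reproduce EViews output.

```python
def chow_f(sse_cem, sse_fem, n_units, n_periods, k):
    """Chow F statistic comparing the pooled (CEM) and fixed effect (FEM) fits.

    Assumed form: F = [(SSE_CEM - SSE_FEM) / (N - 1)] / [SSE_FEM / (NT - N - k)].
    """
    df1 = n_units - 1
    df2 = n_units * n_periods - n_units - k
    return ((sse_cem - sse_fem) / df1) / (sse_fem / df2)


def breusch_pagan_lm(residuals):
    """LM statistic from pooled OLS residuals.

    `residuals` is a list with one inner list per cross-section unit,
    each of the same length T (T must be at least 2).
    """
    n = len(residuals)
    t = len(residuals[0])
    # Numerator: square of each unit's residual sum, added over units.
    sum_sq_of_sums = sum(sum(e_i) ** 2 for e_i in residuals)
    # Denominator: sum of all squared residuals.
    sum_of_sq = sum(e ** 2 for e_i in residuals for e in e_i)
    return (n * t) / (2 * (t - 1)) * (sum_sq_of_sums / sum_of_sq - 1) ** 2


# Toy inputs, purely illustrative.
f_example = chow_f(10.0, 6.0, n_units=3, n_periods=4, k=2)
lm_example = breusch_pagan_lm([[1.0, -1.0], [1.0, -1.0]])
```

Each statistic is then compared with the relevant F or chi-square critical value, or equivalently its p-value is compared with α, as described in the test procedures above.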
The quality of a regression equation can be assessed by its coefficient of determination ($R^2$), which, mathematically, is the square of the correlation coefficient (r). The value of $R^2$ is often overestimated, which is why some statistical software also reports the adjusted $R^2$. The $R^2$ value indicates how well the predictor variables explain the response variable. It is calculated as shown in equation (5):

$$R^2 = \frac{SSR}{SST} \qquad (5)$$

where SSR is the regression (explained) sum of squares and SST is the total sum of squares. A small value of $R^2$ indicates that the predictor variables cannot adequately explain the response variable. An $R^2$ value close to one shows that the predictor variables provide almost all the information needed to explain variations in the response variable (Doni Silalahi, 2014). A simultaneous test and a partial test are used to test the significance of the parameters.

a) Simultaneous test
The simultaneous test determines whether the predictor variables jointly influence the response variable, under the following hypotheses:
$H_0$: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$
$H_1$: at least one $\beta_j \neq 0$, with $j = 1, 2, \ldots, k$
The test uses the F statistic, calculated as shown in equation (6):

$$F = \frac{R^2 / (N + k - 1)}{(1 - R^2) / (NT - N - k)} \qquad (6)$$

where
$R^2$: determination value
N: number of observations (cross-section units)
T: number of time periods
k: number of predictor variables
The null hypothesis is rejected at a significance level of $\alpha$ when $F > F_{(N+k-1;\,NT-N-k;\,\alpha)}$.

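b) Partial test
The partial test determines whether each independent variable individually has a significant effect on the dependent variable, under the following hypotheses:
$H_0$: $\beta_j = 0$
$H_1$: $\beta_j \neq 0$, with $j = 1, 2, \ldots, k$
The t statistic for the partial test is calculated as shown in equation (7):

$$t = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)} \qquad (7)$$

where
$\hat{\beta}_j$: regression coefficient of the j-th predictor variable
$se(\hat{\beta}_j)$: standard error of the j-th regression coefficient
The null hypothesis is rejected at a significance level of $\alpha$ when $|t| > t_{(n-k;\,\alpha/2)}$, where $n - k$ denotes the residual degrees of freedom.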
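The two significance tests reduce to simple arithmetic once the regression output is available. The sketch below is a hedged illustration: it assumes the F statistic takes the form $(R^2/df_1) / ((1-R^2)/df_2)$ with $df_1 = N + k - 1$ and $df_2 = NT - N - k$, matching the degrees of freedom of the critical value stated above, and that the t statistic is a coefficient divided by its standard error. The function names and the toy inputs are hypothetical.

```python
def f_from_r2(r2, n_units, n_periods, k):
    """Return the F statistic and its degrees of freedom (df1, df2).

    Assumed form: F = (R^2 / df1) / ((1 - R^2) / df2),
    with df1 = N + k - 1 and df2 = NT - N - k.
    """
    df1 = n_units + k - 1
    df2 = n_units * n_periods - n_units - k
    f_stat = (r2 / df1) / ((1.0 - r2) / df2)
    return f_stat, (df1, df2)


def t_stat(coef, std_err):
    """t statistic for a single coefficient: estimate over standard error."""
    return coef / std_err


def reject_null(statistic, critical_value):
    """Two-sided decision rule: reject when |statistic| exceeds the critical value."""
    return abs(statistic) > critical_value


# Toy inputs, purely illustrative.
f_example, dfs = f_from_r2(0.5, n_units=3, n_periods=4, k=2)
t_example = t_stat(1.0, 0.25)
```

The critical values themselves come from F and t tables or from statistical software; they are passed in here as plain numbers rather than computed.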

Estimation of Panel Data Regression Models
Choosing an appropriate model for estimating the panel data regression model begins with a test of the model specifications.
In the estimates below, $X_1$ is life expectancy (AHH), $X_2$ is income per capita (PPK), $X_3$ is expected school years (HLS), and $X_4$ is average school years (RLS). Each interpretation holds provided that the other variables are constant.

a) CEM
According to Table 1, the panel data regression model with CEM can be estimated as follows:
$y_{it} = 5.29 + 0.47X_{1it} + 0.00X_{2it} + 0.96X_{3it} + 1.09X_{4it}$
Based on these estimates, every 1-unit increase in AHH raises the HDI by 0.47 units, every 1-unit increase in PPK raises the HDI by 0.00 units (the coefficient rounds to zero at two decimal places), every 1-unit increase in HLS raises the HDI by 0.96 units, and every 1-unit increase in RLS raises the HDI by 1.09 units.

b) FEM
According to Table 2, the panel data regression model with FEM can be estimated as follows:
$y_{it} = -15.13 + 0.77X_{1it} + 0.00X_{2it} + 0.90X_{3it} + 0.65X_{4it}$
Based on these estimates, every 1-unit increase in AHH raises the HDI by 0.77 units, every 1-unit increase in PPK raises the HDI by 0.00 units, every 1-unit increase in HLS raises the HDI by 0.90 units, and every 1-unit increase in RLS raises the HDI by 0.65 units.

c) REM
As shown in Table 3, the panel data regression model with REM can be estimated as follows:
$y_{it} = 5.22 + 0.47X_{1it} + 0.00X_{2it} + 0.96X_{3it} + 1.08X_{4it}$
Based on these estimates, every 1-unit increase in AHH raises the HDI by 0.47 units, every 1-unit increase in PPK raises the HDI by 0.00 units, every 1-unit increase in HLS raises the HDI by 0.96 units, and every 1-unit increase in RLS raises the HDI by 1.08 units.
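The "other variables held constant" interpretation can be illustrated with a short Python sketch that applies the reported CEM slopes to hypothetical changes in the predictors. The function name and the example changes are assumptions made for illustration. Note that the PPK slope is reported as 0.00 after rounding, so its contribution cannot be recovered from the published figure, and the example therefore leaves PPK unchanged.

```python
# Reported CEM slope coefficients, rounded to two decimals in the text.
# The PPK slope of 0.00 is a rounded value, so it is not used below.
CEM_SLOPES = {"AHH": 0.47, "PPK": 0.00, "HLS": 0.96, "RLS": 1.09}


def hdi_change(slopes, **deltas):
    """Predicted change in HDI for the given predictor changes,
    with every predictor not listed held constant."""
    unknown = set(deltas) - set(slopes)
    if unknown:
        raise KeyError(f"unknown predictors: {sorted(unknown)}")
    return sum(slopes[name] * delta for name, delta in deltas.items())


# Hypothetical scenario: life expectancy rises by 1 year and
# average school years rise by half a year.
change = hdi_change(CEM_SLOPES, AHH=1.0, RLS=0.5)
```

Under these assumed changes the predicted increase is 0.47 × 1 + 1.09 × 0.5 = 1.015 HDI units. Because the model is linear, the effects of separate predictors simply add.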

Selection of Panel Data Regression Model
The Chow test gave a p-value of 0.0153, which is less than $\alpha$ (0.05); therefore, FEM was preferred over CEM. The Hausman test then gave a p-value of 0.3479, which is greater than $\alpha$ (0.05), so REM was preferred over FEM. Lastly, the Lagrange Multiplier test gave a p-value of 0.5572, which is greater than $\alpha$ (0.05), indicating that CEM should be selected. The panel data regression model with CEM is:
$y_{it} = 5.29 + 0.47X_{1it} + 0.00X_{2it} + 0.96X_{3it} + 1.09X_{4it}$
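The sequence of decisions above can be written as a small decision function. This Python sketch encodes one common reading of the procedure, which is an assumption rather than a rule stated in the text: the Chow test compares CEM with FEM, the Hausman test compares REM with FEM, and the Lagrange Multiplier test compares CEM with REM, with the final choice following the chain of test outcomes. The function name is hypothetical.

```python
def select_model(p_chow, p_hausman, p_lm, alpha=0.05):
    """Pick CEM, FEM, or REM from the three specification-test p-values.

    Assumed decision chain:
      Chow:     p < alpha -> FEM preferred over CEM
      Hausman:  p < alpha -> FEM preferred over REM
      LM:       p < alpha -> REM preferred over CEM
    """
    if p_chow < alpha and p_hausman < alpha:
        # Both tests favour the fixed effect model.
        return "FEM"
    # Otherwise FEM is not supported by both tests,
    # so the LM test decides between REM and CEM.
    return "REM" if p_lm < alpha else "CEM"


# p-values reported in the text.
chosen = select_model(p_chow=0.0153, p_hausman=0.3479, p_lm=0.5572)
```

With the reported p-values the function returns CEM, in line with the selection described above. Other authors order or combine the three tests differently, so the chain should be treated as a convention.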
Based on these findings, the Human Development Index is significantly influenced by the predictor variables, namely Life Expectancy, Income Per Capita, Expected School Years, and Average School Years.
The results of the partial test using the t statistic are explained below.

a) Life Expectancy (X1)
With the null hypothesis $H_0$: $\beta_1 = 0$ (no significant relationship between Life Expectancy and the Human Development Index) and a 5% significance level, the t statistic obtained was 10.412, which is greater than the critical value $t_{(n-k;\,\alpha/2)} = 2.0181$, so $H_0$ is rejected. There is, therefore, a significant relationship between Life Expectancy and the Human Development Index. This is in accordance with the theory that life expectancy (AHH) at birth is an indicator that reflects the level of health of an area in terms of infrastructure, access, and health quality (Asmawani, 2021).

b) Income Per Capita (X2)
With the null hypothesis $H_0$: $\beta_2 = 0$ (no significant relationship between Income Per Capita and the Human Development Index) and a 5% significance level, the t statistic obtained was 19.972, which is greater than the critical value $t_{(n-k;\,\alpha/2)} = 2.0181$, so $H_0$ is rejected. Consequently, Income Per Capita significantly influences the Human Development Index. This is in accordance with the theory that a decent standard of living and a well-established economy are reflected in a region's economic growth and expenditure (Asmawani, 2021).

c) Expected School Years (X3)
With the null hypothesis $H_0$: $\beta_3 = 0$ (no significant relationship between Expected School Years and the Human Development Index) and a 5% significance level, the t statistic obtained was 8.025, which is greater than the critical value $t_{(n-k;\,\alpha/2)} = 2.0181$, so $H_0$ is rejected. Therefore, Expected School Years significantly influence the Human Development Index. Expected school years (HLS) measure a person's educational opportunities starting from the age of seven. In simple terms, HLS can be understood as a single-age school enrollment rate. The indicator reflects the length of education (in years) that a child of a certain age is expected to undergo in the future, and it plays a very important role in measuring the education dimension of the human development index (Ginting, 2023).

d) Average School Years (X4)
With the null hypothesis $H_0$: $\beta_4 = 0$ (no significant relationship between Average School Years and the Human Development Index) and a 5% significance level, the t statistic obtained was 9.588, which is greater than the critical value $t_{(n-k;\,\alpha/2)} = 2.0181$, so $H_0$ is rejected. Accordingly, Average School Years significantly influence the Human Development Index. Higher education offers greater prospects for human development than lower education; when job opportunities are limited for those with lower education, people seek to obtain higher education (Asmawani, 2021).
The $R^2$ value of CEM was 0.9938. Consequently, 99.38% of the variation in the Human Development Index (Y) can be attributed to the predictor variables, namely Life Expectancy (X1), Income Per Capita (X2), Expected School Years (X3), and Average School Years (X4). The remaining 0.62% is accounted for by other variables outside the model that were not explored in this research.

CONCLUSION
According to the results of the analysis, the Common Effect Model (CEM), estimated by Ordinary Least Squares (OLS), was the most suitable model. The equation obtained is $y_{it} = 5.29 + 0.47X_{1it} + 0.00X_{2it} + 0.96X_{3it} + 1.09X_{4it}$. Moreover, according to the significance tests, all independent variables were significantly related to the dependent variable. Holding the other variables constant, a one-unit increase in life expectancy increases the human development index by 0.47 units, a one-unit increase in income per capita increases it by 0.00 units (the coefficient rounds to zero at two decimal places), a one-unit increase in expected school years increases it by 0.96 units, and a one-unit increase in average school years increases it by 1.09 units. The largest influence comes from average school years, so further research could use other education indicators besides average school years to obtain more comprehensive results on the human development index in North Sulawesi. For a deeper analysis of the influence of the education sector on the human development index, future work could use path analysis to determine direct and indirect effects.
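Chow test statistic, equation (2):

$$F = \frac{(SSE_{CEM} - SSE_{FEM}) / (N - 1)}{SSE_{FEM} / (NT - N - k)} \qquad (2)$$

where
$SSE_{CEM}$ = sum square error of CEM
$SSE_{FEM}$ = sum square error of FEM
N = number of cross-section units
T = number of time series units
k = number of estimated parameters
A value of $F > F_{(N-1;\,NT-N-k;\,\alpha)}$ (or p-value < $\alpha$) indicates a rejection of the null hypothesis at a significance level of $\alpha$.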

Table 1. Panel Data Regression with CEM

Table 2. Panel Data Regression with FEM

Table 3. Panel Data Regression with REM