ROBUST S-ESTIMATION REGRESSION ANALYSIS WITH TUKEY BISQUARE WEIGHTING OF THE POVERTY LEVEL ON SULAWESI ISLAND

Poverty is a condition in which a person has difficulty meeting basic needs. Several factors influence poverty, including population, unemployment, gross regional domestic product, the human development index, average years of schooling, and the labor force participation rate. Regression analysis is therefore needed to determine the relationship between the poverty level and these variables. One method for estimating regression parameters is the least squares method. However, some of its classical assumptions are not met here because the data contain outliers. Outliers are observations that do not follow the overall pattern of the data, so a method that is resistant to outliers is used, namely S-estimation robust regression with the Tukey bisquare weighting function. The results show that the best model obtained from S-estimation robust regression with Tukey bisquare weighting is $Y = -0.21023 + 0.46522X_1 + 0.16551X_4 - 0.33444X_5 + 0.15864X_6$. The factors that influence the poverty level on the island of Sulawesi are Population ($X_1$), Human Development Index ($X_4$), Average Years of Schooling ($X_5$), and Labor Force Participation Rate ($X_6$).


INTRODUCTION
National development is an effort to make society just and prosperous. Various development efforts have been made, especially in areas with relatively high poverty levels that continue to rise every year. Other factors that cause poverty were presented by Mirah et al. (2020) and Hasanah et al. (2021), who reported that the labor force participation rate and average years of schooling influence poverty.
Regression analysis is used to determine the relationship between the independent variables and the dependent variable. One of the methods used to estimate regression coefficients is the least squares method (MKT), which estimates the parameters by minimizing the sum of squared residuals. The MKT model requires several assumptions to be met. These assumptions are often violated because of outliers, in which case the least squares method is not appropriate. Robust regression is therefore used to deal with the outliers.
Robust regression is a method for analyzing data affected by outliers, producing a model that is resistant to them. One robust regression method is S-estimation, an estimation technique with a breakdown point of up to 50%, the highest attainable, which means the estimate remains usable even when up to half of the observations are outliers. This method uses the Tukey bisquare weighting function to compute the weights, iterating until the estimator converges. Tukey bisquare weighting also requires fewer iterations than other weighting functions.
Several previous studies have used robust regression analysis, including Susanti et al. (2014), who compared M-estimation, S-estimation, and MM-estimation in robust regression on corn production data in Indonesia and found that S-estimation produced the better model. Based on the description above, this research uses S-estimation robust regression with Tukey bisquare weighting to analyze the factors that influence the poverty level on Sulawesi Island.

DFFITS Test
The DFFITS test measures the influence of the i-th case on the fitted regression equation. The hypotheses of the DFFITS test are as follows:
$H_0$: the outlier has no effect (it is not influential)
$H_1$: the outlier is influential
Outliers are therefore detected from the DFFITS value, which is calculated using the formula:
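In its usual textbook form, $DFFITS_i = t_i\sqrt{h_{ii}/(1-h_{ii})}$, where $t_i$ is the externally studentized residual and $h_{ii}$ is the leverage of the i-th observation. The sketch below computes this with NumPy for a least squares fit. It is only an illustration under that standard definition: the function names `dffits` and `dffits_cutoff` are hypothetical, and the cutoff follows the rule stated in this paper ($|DFFITS| > 1$ for $n \le 30$, otherwise $2\sqrt{p/n}$ with $p = k + 1$).

```python
import numpy as np

def dffits(X, y):
    """DFFITS of every observation for an OLS fit (X given without intercept)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    Xd = np.column_stack([np.ones(n), X])          # design matrix with intercept
    p = Xd.shape[1]                                # p = k + 1 parameters
    H = Xd @ np.linalg.solve(Xd.T @ Xd, Xd.T)      # hat matrix
    h = np.diag(H)                                 # leverages h_ii
    e = y - H @ y                                  # OLS residuals
    sse = e @ e                                    # sum of squared errors
    # externally studentized residuals
    t = e * np.sqrt((n - p - 1) / (sse * (1.0 - h) - e ** 2))
    return t * np.sqrt(h / (1.0 - h))

def dffits_cutoff(n, k):
    """Rejection threshold for |DFFITS| given n samples and k predictors."""
    p = k + 1
    return 1.0 if n <= 30 else float(2.0 * np.sqrt(p / n))
```

Observations would then be flagged with `np.abs(dffits(X, y)) > dffits_cutoff(n, k)`.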

S-Estimation
S-estimation was first introduced by Rousseeuw and Yohai (1984) as a robust estimator that can reach a breakdown point of up to 50%. The breakdown point is the largest proportion of outlying observations that an estimator can tolerate before they distort the model. S-estimation chooses the regression coefficients that minimize a robust estimate of the residual scale.
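An initial scale estimate $\hat{\sigma}$ is first computed from the residuals $e_i = y_i - \sum_{l=0}^{k} x_{il}\beta_l$. The estimate is then obtained by differentiating the objective function with respect to $\beta$ and setting the result to zero:

$$\sum_{i=1}^{n} x_{ij}\,\psi\!\left(\frac{e_i}{\hat{\sigma}}\right) = 0, \qquad j = 0, 1, \dots, k$$

where $\psi$ is called the influence function and is the derivative of $\rho$ ($\psi = \rho'$). Iteratively Reweighted Least Squares (IRLS) is used to solve this equation. Writing $u_i = e_i/\hat{\sigma}$ and $w_i = \psi(u_i)/u_i$, it becomes

$$\sum_{i=1}^{n} x_{ij}\, w_i \left(y_i - \sum_{l=0}^{k} x_{il}\beta_l\right) = 0, \qquad j = 0, 1, \dots, k$$

IRLS assumes that an initial estimate $\hat{\beta}^{0}$ and a scale estimate $\hat{\sigma}$ exist; $j$ indexes the parameters to be estimated. With the parameter estimate $\hat{\beta}^{0}$ and the weights $w_i^{0}$ from the initial iteration, the equation can be written in matrix form as

$$X^{\top} W X \beta = X^{\top} W Y$$

where $W$ is an $n \times n$ diagonal matrix whose diagonal elements are the weights. The resulting estimator is

$$\hat{\beta} = (X^{\top} W X)^{-1} X^{\top} W Y$$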
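Since $\psi(u_i)$ is the first derivative of the Tukey bisquare function $\rho(u_i)$, the following equation is obtained:

$$\psi(u_i) = \begin{cases} u_i\left[1 - \left(\dfrac{u_i}{c}\right)^2\right]^2, & |u_i| \le c \\ 0, & |u_i| > c \end{cases}$$

The Tukey bisquare weighting function is:

$$w_i(u_i) = \begin{cases} \left[1 - \left(\dfrac{u_i}{c}\right)^2\right]^2, & |u_i| \le c \\ 0, & |u_i| > c \end{cases}$$

where $u_i$ is the scaled residual of the i-th observation and $c$ is the tuning constant chosen in advance to set the level of robustness.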
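The Tukey bisquare functions can be written directly in code, which makes their behaviour easy to check: the weight is 1 at a zero residual and falls to 0 once the scaled residual reaches the tuning constant. The sketch below is illustrative only. The function names are hypothetical, and the default `c = 1.547` is an assumption (a value commonly quoted for S-estimation with a 50% breakdown point), since the tuning constant is not stated here.

```python
def tukey_rho(u, c=1.547):
    """Tukey bisquare objective function rho(u); c = 1.547 is an assumed default."""
    if abs(u) <= c:
        return (c ** 2 / 6.0) * (1.0 - (1.0 - (u / c) ** 2) ** 3)
    return c ** 2 / 6.0

def tukey_psi(u, c=1.547):
    """Influence function psi(u), the derivative of rho(u)."""
    if abs(u) <= c:
        return u * (1.0 - (u / c) ** 2) ** 2
    return 0.0

def tukey_weight(u, c=1.547):
    """Weight w(u) = psi(u) / u; observations with |u| > c get zero weight."""
    if abs(u) <= c:
        return (1.0 - (u / c) ** 2) ** 2
    return 0.0

# Scaled residuals near zero keep almost full weight; large ones are discarded.
weights = [tukey_weight(u) for u in (0.0, 0.5, 1.0, 2.0)]
```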

Parameter Testing
F test
The F test is used to determine whether the independent variables, taken together, have a significant simultaneous effect on the dependent variable, with the following hypotheses [4].
$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$ (the independent variables have no significant simultaneous effect on the dependent variable)
$H_1$: at least one $\beta_j \neq 0$ (the independent variables have a significant simultaneous effect on the dependent variable)
The test criterion is: if $F_{\text{count}} > F_{(\alpha,\,k,\,n-k-1)}$ or the significance value is < 0.05, then $H_0$ is rejected, which means that the independent variables have a significant simultaneous effect on the dependent variable.
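For reference, in ordinary regression the test statistic is usually computed as shown below. The symbols $SSR$ (regression sum of squares) and $SSE$ (error sum of squares) are introduced here for illustration and are not defined in the text; a robust fit may use weighted versions of these sums.

$$F_{\text{count}} = \frac{SSR / k}{SSE / (n - k - 1)}$$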

T test
The t test is carried out to see the effect of each independent variable on the dependent variable individually, with the following hypotheses [6].
$H_0: \beta_j = 0$ (the independent variable has no effect on the dependent variable)
$H_1: \beta_j \neq 0$ (the independent variable has an effect on the dependent variable)
The test criterion is: if $|t_{\text{count}}| > t_{(\alpha/2,\,n-k-1)}$ or the significance value is < 0.05, then $H_0$ is rejected, which means the independent variable has an effect on the dependent variable.
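For reference, the statistic for the j-th coefficient is usually formed as the estimate divided by its standard error. The notation $se(\hat{\beta}_j)$ is introduced here for illustration and does not appear in the text.

$$t_{\text{count}} = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)}$$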

Data
The data in this research are secondary data obtained from the Central Statistics Agency. The variables used are the response variable ($y$) and the predictor variables ($x$), which are listed in the following table:

RESEARCH RESULTS
Least Squares Method Parameter Estimation
The aim of the least squares method (MKT) is to minimize the residual sum of squares. The parameter estimation results are given in Table 2, from which the initial regression model using the least squares method (MKT) is obtained. The adjusted R-squared is 0.6349, which means that population ($X_1$), gross regional domestic product ($X_3$), and average years of schooling ($X_5$) explain 63.5% of the variation in the poverty level on the island of Sulawesi, while the remaining 36.5% is explained by other variables.

Classical Assumption Tests
Normality test
One of the tests used to check the normality assumption is the Kolmogorov-Smirnov test, the results of which are shown in Table 3.

Kolmogorov-Smirnov    p-value
0.10195               0.03675

Table 3 above shows that the p-value < α, so $H_0$ is rejected, which means the residuals are not normally distributed.

Multicollinearity Test
Multicollinearity can be tested by examining the VIF values, which are shown in Table 4. From Table 4, all VIF values are < 10, so $H_0$ is not rejected, which means there is no multicollinearity problem.
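For reference, the variance inflation factor of the j-th predictor is conventionally defined as follows, where $R_j^2$ (a symbol introduced here) is the coefficient of determination from regressing $X_j$ on the other predictors. Under this definition, a VIF of 10 corresponds to $R_j^2 = 0.9$.

$$VIF_j = \frac{1}{1 - R_j^2}$$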

Autocorrelation Test
Autocorrelation can be tested with the Durbin-Watson test, the results of which are shown in Table 5.
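As an illustration of what the test measures, the Durbin-Watson statistic compares successive residuals: values near 2 suggest no first-order autocorrelation, values near 0 suggest positive autocorrelation, and values near 4 suggest negative autocorrelation. The short sketch below uses the standard definition; the function name is hypothetical and the residuals are made-up inputs, not the data of this study.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences / sum of squares."""
    e = [float(v) for v in residuals]
    numerator = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    denominator = sum(v * v for v in e)
    return numerator / denominator

# Alternating residuals (negative autocorrelation) push the statistic above 2.
d_alternating = durbin_watson([1, -1, 1, -1])
# Constant residuals (extreme positive autocorrelation) push it to 0.
d_constant = durbin_watson([1, 1, 1, 1])
```

The statistic is then compared with the tabulated lower and upper critical bounds for the given $n$ and $k$.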

Breusch-Pagan    p-value
16.321           0.01213

Table 6 above shows that the p-value < α, so $H_0$ is rejected, which means there are symptoms of heteroscedasticity.

Outlier Identification
To see whether the data contain outliers, the residuals can be plotted against the observation index $i$, as shown in the following figure. In Figure 1, several points lie far from the overall pattern of the data, which indicates the presence of outliers. However, the residual plot cannot show which observations are outliers, so the outliers are identified using the DFFITS test. The DFFITS values are shown in the following table:

S-Estimation Robust Regression with Tukey Bisquare Weighting
The iterative S-estimation calculation begins by determining an initial estimate of the regression coefficients; then, following the S-estimation algorithm, the residuals and the residual scale are calculated. The iteration uses Tukey bisquare weighting and is repeated until the estimates converge. The calculation results for each S-estimation iteration are as follows:
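As a rough illustration of this kind of iteration, the sketch below runs iteratively reweighted least squares with Tukey bisquare weights in NumPy. It is a simplification under stated assumptions and not the exact procedure used in this study: the scale is re-estimated at every step as the median absolute deviation divided by 0.6745 rather than by solving the M-scale equation of a full S-estimator, the tuning constant `c = 1.547` is assumed, and the function names are hypothetical.

```python
import numpy as np

def bisquare_weights(u, c):
    """Tukey bisquare weights for an array of scaled residuals."""
    w = (1.0 - (u / c) ** 2) ** 2
    w[np.abs(u) > c] = 0.0
    return w

def irls_bisquare(X, y, c=1.547, tol=1e-8, max_iter=100):
    """IRLS with bisquare weights; returns coefficients (intercept first)."""
    y = np.asarray(y, dtype=float)
    Xd = np.column_stack([np.ones(y.shape[0]), np.asarray(X, dtype=float)])
    beta = np.linalg.lstsq(Xd, y, rcond=None)[0]        # least squares start
    for _ in range(max_iter):
        e = y - Xd @ beta                                # residuals
        sigma = np.median(np.abs(e - np.median(e))) / 0.6745  # MAD-based scale (assumed)
        if sigma == 0.0:
            break                                        # perfect fit, nothing to reweight
        w = bisquare_weights(e / sigma, c)
        XtW = Xd.T * w                                   # X'W without forming W explicitly
        beta_new = np.linalg.solve(XtW @ Xd, XtW @ y)    # (X'WX)^{-1} X'Wy
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta
```

On data with a few gross outliers, the outlying observations receive zero weight after the first reweighting, so the later fits are driven by the remaining observations.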

Parameter Testing
F test
The F test aims to determine the overall effect of the independent variables on the dependent variable. The results are shown in Table 9. In Table 9, $F_{\text{count}} > F_{\text{table}}$ or the p-value < 0.05, so $H_0$ is rejected, which means that the independent variables have a significant simultaneous effect on the poverty level on the island of Sulawesi.

T test
The t test aims to determine the effect of each independent variable on the dependent variable. The results are shown in Table 10. Based on Table 10, $t_{\text{count}} > t_{\text{table}}$ and the p-value is < 0.05, so $H_0$ is rejected, which means that the independent variables have a partial effect on the poverty level on the island of Sulawesi.

CONCLUSION
Based on the results and discussion above, the best model obtained from S-estimation robust regression with Tukey bisquare weighting is:

$$Y = -0.21023 + 0.46522X_1 + 0.16551X_4 - 0.33444X_5 + 0.15864X_6$$

According to this model, the factors that influence the poverty level on the island of Sulawesi are Population ($X_1$), Human Development Index ($X_4$), Average Years of Schooling ($X_5$), and Labor Force Participation Rate ($X_6$).
of squared errors
$k$ : Number of independent variables
$n$ : Number of samples
$x_i$ : Data of the i-th variable X

The criterion for rejecting $H_0$ is $|DFFITS| > 1$ for small data sets ($n \le 30$) and $|DFFITS| > 2\sqrt{p/n}$ for large data sets ($n > 30$), where $p = k + 1$.

Figure 1. Residual Plot

Table 1. Research data

Repeat steps b to e until a convergent $\hat{\beta}$ value is obtained, that is, until the difference between successive estimates is close to or equal to zero.
7. Carry out simultaneous and partial tests to find out whether the independent variables have an effect on the dependent variable.
8. Determine the goodness of fit of the model (adjusted R-squared).

Table 3. Normality Test Results

Table 4. Multicollinearity Test Results

Table 5. Autocorrelation Test Results
In Table 5, $d < d_L$, so $H_0$ is rejected, which means there are symptoms of autocorrelation.

Table 6. Heteroscedasticity Test Results

Table 7 shows that, of the 81 observations, several have $|DFFITS| > 0.587$, so it can be concluded that the data contain outliers.
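The cutoff can be checked against the large-sample rule $2\sqrt{p/n}$ with $p = k + 1$. Taking $n = 81$ and assuming $k = 6$ (the six candidate factors listed in the abstract, which is an inference rather than a value stated with the table), the arithmetic is:

$$2\sqrt{\frac{p}{n}} = 2\sqrt{\frac{7}{81}} = \frac{2\sqrt{7}}{9} \approx 0.5879$$

This agrees with the reported cutoff of 0.587 up to rounding.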

Table 9. Results of F Test Statistical Values

Table 10. Results of T Test Statistical Values