APPLICATION OF TIME SERIES CLUSTER ANALYSIS IN CLUSTERING THE CENTRAL JAVA PROVINCE BASED ON THE POVERTY DEPTH INDEX

Poverty is a persistent problem, especially in developing countries such as Indonesia. Poverty is one of the issues addressed by the Sustainable Development Goals (SDGs) and is related to hunger and health. Time series data can be clustered according to their characteristics and temporal patterns, and the choice of distance measure and clustering method must suit the dynamic structure of the data. The purpose of this research is to cluster the districts/cities in Central Java Province based on the poverty depth index from 2017 to 2022. The variable used in this research is the Poverty Depth Index of the 35 districts/cities in Central Java Province from 2017 to 2022. This research used time series cluster analysis with Dynamic Time Warping (DTW) as the similarity measure. Based on the cophenetic correlation coefficients obtained with DTW distance and three linkage methods, the average linkage method has the highest value, 0.8017988. Cluster quality was tested with the silhouette coefficient: using DTW distance and average linkage, the 2-cluster solution falls into the good cluster category, with a silhouette coefficient of 0.60. The resulting clusters are cluster 1, consisting of 25 districts/cities, and cluster 2, consisting of 10 districts/cities.


INTRODUCTION
Poverty is an issue that continues to be faced, especially by developing countries such as Indonesia. It is a complex problem with multiple underlying factors and refers to the condition in which an individual is unable to meet their basic needs (Nafi'ah, 2021). Poverty is addressed in the Sustainable Development Goals (SDGs), specifically in relation to hunger and health, which indicates that it affects every aspect of life and needs to be handled seriously. Government programs implemented to reduce poverty include the Family Hope Program (PKH), the Smart Indonesia Card (KIP) for school-age children, and the Healthy Indonesia Card (KIS) (Lestari and Busnetty, 2022).
Badan Pusat Statistik (2023) recorded that the poverty rate reached 14.38% in rural areas and 11.98% in urban areas. Meanwhile, on Java Island, Central Java Province has the second-highest percentage of poor people, at 10.98% (Badan Perencanaan Pembangunan Nasional, 2023). In terms of another poverty indicator, the poverty depth index, Central Java Province also ranks second, after the Special Region of Yogyakarta (DIY), with values of 1.77 in the first semester of 2022 and 1.75 in the second semester of 2022 (Badan Pusat Statistik, 2022). Different demographic characteristics can strongly influence the poverty rate in each region. Regions that share similar characteristics can be identified with a statistical tool called cluster analysis, whose results can help make government policies and intervention programs more focused and better targeted.
Cluster analysis is a multivariate analysis tool used to group n objects into k clusters (k ≤ n) based on their characteristics. Clustering is performed based on the similarity or dissimilarity between objects, so that objects within the same cluster are more similar to each other than to objects in different clusters (Suhaeni et al., 2018). Cluster analysis can also be applied to time series data, which can be clustered based on the characteristics of each series and its temporal pattern; the choice of distance metric and clustering method should be tailored to the dynamic structure of the data (Munthe, 2019). Several previous studies have used time series cluster analysis, such as Buaton et al. (2019), Dani et al. (2020), and Soleha et al. (2022). Soleha et al. (2022) clustered provinces in Indonesia based on non-oil and gas export values, and Buaton et al. (2019) clustered time series using the Manhattan distance. However, previous research has not addressed the clustering of districts/cities in Central Java based on the depth of poverty. Therefore, the objective of this research is to cluster the districts and cities in Central Java Province based on the values of the poverty depth index from 2017 to 2022.

MATERIALS AND METHODS
The data used in this research are secondary data obtained from the official website of Badan Pusat Statistik (BPS) of Central Java Province. The variable used is the Poverty Depth Index of the 35 districts/cities in Central Java Province from 2017 to 2022. Time series cluster analysis, which is used to cluster objects with dynamic data, is applied to address the research objective.

Cluster Analysis
Cluster analysis is an analytical tool used to group objects based on their similarities: objects within a cluster exhibit a high level of similarity, while the similarity between clusters is low (Yusfar et al., 2020). Its advantages are that it can handle relatively large amounts of data and can be applied to data measured on at least an ordinal scale. Its disadvantages are its subjectivity, since researchers interpret the results from dendrograms; the difficulty of determining the number of clusters for heterogeneous data; substantial variation in results between different methods; and an error rate that increases with the number of observations (Anggraini and Arum, 2021).

Cluster Time Series Analysis
Time series cluster analysis is used to cluster objects with dynamic data. In this analysis, algorithms designed for static data are modified so that they can be applied to time series data. According to Yanti and Rahardiantoro (2018), there are three categories of time series cluster analysis: (1) a raw-data-based approach, which calculates distances directly between the original series; (2) an approach that removes outliers and reduces the dimension of the data before calculating distances and clustering; and (3) an approach that clusters the series through models fitted to them beforehand. This research uses the first approach.
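Under the raw-data-based approach, the series are compared directly, without feature extraction or model fitting: every pair of objects receives a distance, and the resulting matrix is what the hierarchical clustering consumes. The sketch below assumes each district/city is stored as one equal-length list of annual values and builds that symmetric matrix for any pairwise distance function. The helper names and the toy values are hypothetical, and lock-step Euclidean distance is used only as a placeholder for the DTW distance described in the next subsection.

```python
import math
from typing import Callable, Sequence

Series = Sequence[float]


def euclidean(x: Series, y: Series) -> float:
    """Lock-step Euclidean distance between two equal-length series."""
    if len(x) != len(y):
        raise ValueError("Euclidean distance needs equal-length series")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pairwise_distance_matrix(
    series: Sequence[Series], dist: Callable[[Series, Series], float]
) -> list[list[float]]:
    """Full symmetric distance matrix computed directly on the raw series."""
    n = len(series)
    d = [[0.0] * n for _ in range(n)]  # diagonal stays 0
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = dist(series[i], series[j])
    return d


# Hypothetical toy data: three regions, six annual index values each.
toy = [
    [1.5, 1.6, 1.4, 1.3, 1.5, 1.7],
    [1.4, 1.5, 1.4, 1.2, 1.4, 1.6],
    [3.0, 3.1, 2.9, 2.8, 3.0, 3.3],
]
D = pairwise_distance_matrix(toy, euclidean)
```

Swapping `euclidean` for a DTW function changes only the `dist` argument; the clustering step downstream is unchanged.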

Similarity Measurement
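Similarity between series is measured with the Dynamic Time Warping (DTW) distance. DTW is an algorithm that compares two series by finding the optimal alignment between them, so that points can be matched even when one series is shifted or stretched in time relative to the other. It generalizes classical algorithms for comparing discrete sequences to sequences of continuous values (Munthe, 2019). This research compares three linkage methods: average linkage, complete linkage, and centroid linkage. In its standard formulation, the DTW distance between the series $X = (x_1, \dots, x_n)$ and $Y = (y_1, \dots, y_m)$ is obtained from the cumulative distance recurrence

$$\gamma(i, j) = d(x_i, y_j) + \min\{\gamma(i-1, j-1),\ \gamma(i-1, j),\ \gamma(i, j-1)\},$$

with $\gamma(0, 0) = 0$, $\gamma(i, 0) = \gamma(0, j) = \infty$ for $i, j \geq 1$, local distance $d(x_i, y_j) = |x_i - y_j|$, and $\mathrm{DTW}(X, Y) = \gamma(n, m)$.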
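A minimal sketch of the cumulative-distance DTW computation follows, assuming the absolute difference as the local cost and a step pattern in which diagonal, vertical, and horizontal moves all have weight 1. Statistical packages may use different defaults (for example, weighted diagonal steps or normalization), so this is not guaranteed to reproduce the values of the software used in this study; `dtw_distance` is a hypothetical name.

```python
import math
from typing import Sequence


def dtw_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """DTW distance with |x_i - y_j| as local cost and unit-weight steps."""
    n, m = len(x), len(y)
    # gamma[i][j]: cost of the best warping path aligning x[:i] with y[:j].
    gamma = [[math.inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            gamma[i][j] = cost + min(
                gamma[i - 1][j - 1],  # match: advance both series
                gamma[i - 1][j],      # advance x only (y_j is reused)
                gamma[i][j - 1],      # advance y only (x_i is reused)
            )
    return gamma[n][m]


# The same bump, shifted by one time step: lock-step Manhattan distance is 4,
# but DTW aligns the shifted points and finds a perfect match.
a = [0, 0, 1, 2, 1, 0]
b = [0, 1, 2, 1, 0, 0]
print(dtw_distance(a, b))  # → 0.0
```

The example shows why DTW suits series whose patterns are similar but not synchronized; it can also compare series of different lengths, which lock-step distances cannot.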

Cluster Validity
The accuracy and quality of the formed clusters are evaluated with the cophenetic correlation coefficient and the silhouette coefficient. The cophenetic correlation coefficient measures the correlation between the original distance matrix (here, the DTW distance matrix) and the cophenetic matrix produced by the dendrogram for a given distance and linkage method. Its value ranges from -1 to 1, and values closer to 1 indicate that the dendrogram preserves the original distances more faithfully. The silhouette coefficient requires 2 ≤ k ≤ n - 1, so k = 1 and k = n are excluded. The criteria for judging the adequacy and quality of a clustering from the silhouette coefficient follow Kaufman and Rousseeuw (1990).
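For reference, the silhouette of object i is s(i) = (b(i) - a(i)) / max{a(i), b(i)}, where a(i) is its mean distance to the other members of its own cluster and b(i) is its smallest mean distance to the members of any other cluster; the average of s(i) over all objects is the silhouette coefficient. The NumPy sketch below computes it from a precomputed distance matrix such as a DTW matrix, using the convention of Kaufman and Rousseeuw that s(i) = 0 for an object in a singleton cluster. `silhouette_scores` is a hypothetical helper; a library routine such as scikit-learn's `silhouette_score` with `metric="precomputed"` could be used instead.

```python
import numpy as np


def silhouette_scores(dist, labels):
    """Per-object silhouette values from a precomputed distance matrix.

    dist: square, symmetric matrix of pairwise distances (e.g. DTW).
    labels: cluster label of each object.
    """
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)
    clusters = np.unique(labels)
    # The silhouette is only defined for 2 <= k <= n - 1 clusters.
    if not 2 <= len(clusters) <= n - 1:
        raise ValueError("silhouette requires 2 <= k <= n - 1 clusters")
    s = np.zeros(n)
    for i in range(n):
        same = labels == labels[i]
        same[i] = False  # exclude object i from its own cluster mean
        if not same.any():
            continue  # singleton cluster: s(i) = 0 by convention
        a = dist[i, same].mean()  # cohesion: mean distance within own cluster
        # Separation: smallest mean distance to the members of another cluster.
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = 0.0 if denom == 0 else (b - a) / denom
    return s


# Hypothetical 1-D example with two well-separated groups.
points = np.array([0.0, 1.0, 10.0, 11.0])
dist = np.abs(points[:, None] - points[None, :])
s = silhouette_scores(dist, [0, 0, 1, 1])
silhouette_coefficient = s.mean()  # average over all objects
```

Values near 1 mean each object sits much closer to its own cluster than to the nearest other cluster; values near 0 or below indicate overlapping or misassigned objects.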

RESULTS AND DISCUSSION

Descriptive Analysis
Before conducting the time series cluster analysis, the first step is to describe the poverty depth index of the 35 districts/cities in Central Java from 2017 to 2022. The poverty depth index is one of the indicators used to assess the level of poverty in a region; it measures the average expenditure gap of each poor person relative to the poverty line (BPS, 2023). Figure 1 shows the trend of the poverty depth index in Central Java Province from 2017 to 2022. The index exhibits a similar trend over time, although the trend in 2022 differs slightly from that of previous years.
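BPS computes its poverty indicators with the Foster-Greer-Thorbecke (FGT) family, and the poverty depth index is the member with α = 1: P1 = (1/n) Σ_{i=1}^{q} (z - y_i)/z, where n is the population size, q the number of people below the poverty line z, and y_i the per-capita expenditure of the i-th poor person. The sketch below evaluates this formula on hypothetical data; it returns the raw ratio and makes no assumption about how published figures are scaled. `poverty_depth_index` is a hypothetical name.

```python
from typing import Sequence


def poverty_depth_index(expenditure: Sequence[float], poverty_line: float) -> float:
    """FGT index with alpha = 1 (poverty gap index).

    expenditure: per-capita expenditure of every person in the sample.
    poverty_line: the poverty line z, in the same units.
    Non-poor people (y >= z) contribute zero to the sum, but they are
    still counted in the denominator n.
    """
    if poverty_line <= 0:
        raise ValueError("poverty line must be positive")
    if len(expenditure) == 0:
        raise ValueError("need at least one observation")
    gaps = [(poverty_line - y) / poverty_line for y in expenditure if y < poverty_line]
    return sum(gaps) / len(expenditure)


# Hypothetical sample: five people, poverty line of 100 (arbitrary units).
p1 = poverty_depth_index([50, 80, 100, 150, 200], 100)
```

Two people fall below the line, with normalized gaps of 0.5 and 0.2, so the index is (0.5 + 0.2) / 5 = 0.14 up to floating-point rounding. A larger value means the poor are, on average, further below the poverty line.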

Clustering Validity
After performing the DTW-based time series cluster analysis and constructing the dendrograms, the next step is to assess clustering validity in order to determine the most suitable linkage method based on the cophenetic correlation. Table 2 shows that each linkage method yields a different cophenetic correlation coefficient with DTW distance. The coefficient ranges from -1 to 1, and a higher value indicates that the dendrogram represents the original distances more faithfully. Among the three linkage methods, average linkage has the highest coefficient, 0.8017988. The next step is to test the quality of the clustering with the silhouette coefficient, to determine whether the number of clusters, 2 in this case, is representative. Based on the silhouette coefficient values in Figure 5, the 2-cluster solution has an average silhouette coefficient of 0.6, which places the clustering using average linkage with DTW distance in the "good cluster" category. The dendrograms of all three linkage methods produce the same clusters: a cluster with a high poverty depth index, containing 10 districts/cities, and a cluster with a low poverty depth index, containing 25 districts/cities.
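The selection procedure described above (compute the cophenetic correlation of each linkage method on the DTW distance matrix, keep the method with the highest value, then cut its dendrogram into two clusters) can be sketched with SciPy. The 5×5 matrix below is hypothetical, not the study's data. Centroid linkage is left out of this sketch because the SciPy documentation states that the centroid method is correctly defined only for Euclidean distances, which DTW distances are not; how the original analysis handled this is not assumed here.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical symmetric DTW distance matrix for five regions (not study data).
dtw_matrix = np.array([
    [0.0, 0.4, 0.5, 3.0, 3.2],
    [0.4, 0.0, 0.3, 2.9, 3.1],
    [0.5, 0.3, 0.0, 2.8, 3.0],
    [3.0, 2.9, 2.8, 0.0, 0.6],
    [3.2, 3.1, 3.0, 0.6, 0.0],
])


def cophenetic_by_method(dist_matrix, methods=("average", "complete")):
    """Return {method: (cophenetic correlation, linkage matrix)}."""
    condensed = squareform(dist_matrix)  # square matrix -> condensed vector
    out = {}
    for method in methods:
        Z = linkage(condensed, method=method)
        r, _ = cophenet(Z, condensed)  # correlation with the input distances
        out[method] = (float(r), Z)
    return out


results = cophenetic_by_method(dtw_matrix)
best = max(results, key=lambda m: results[m][0])  # highest cophenetic correlation
labels = fcluster(results[best][1], t=2, criterion="maxclust")  # cut into 2 clusters
```

The resulting `labels` can then be passed, together with the same distance matrix, to a silhouette computation to judge whether the 2-cluster cut is adequate.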

CONCLUSION
The time series cluster analysis in this study aims to cluster the districts/cities in Central Java Province based on the poverty depth index values from 2017 to 2022. Among the three linkage methods, average linkage has the highest cophenetic correlation coefficient, 0.8017988. Quality testing with the silhouette coefficient, using DTW distance and average linkage, shows that the 2-cluster solution falls into the "good cluster" category with a silhouette coefficient of 0.60. The resulting clusters consist of Cluster 1 with 25 districts/cities and Cluster 2 with 10 districts/cities.