Abstract
Breast cancer is the most prevalent cancer among women in Indonesia and remains a major public health concern, so identifying key risk factors is essential for early detection. This study applies four machine learning classification algorithms (XGBoost, Random Forest, CatBoost, and LightGBM) to classify breast cancer risk using a dataset of 400 samples. The data were preprocessed before analysis and split into 75% training and 25% testing data, with 10-fold cross-validation applied. Model performance was evaluated using accuracy, precision, recall, F1-score, and area under the curve (AUC). The results show that CatBoost outperforms the other models, achieving the highest AUC of 0.72. Feature importance analysis indicates that a high-fat diet, menopause status, and working status are the most influential risk factors, while breastfeeding shows a protective effect. These findings indicate that CatBoost gives the best predictive performance among the four models and identifies key factors associated with breast cancer risk in Indonesia.
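The five evaluation measures named above can be illustrated with a minimal sketch in plain Python. This is not the study's code or data: the function names, the 0.5 decision threshold, and the ten toy labels and scores are all assumptions made for illustration. AUC is computed here as the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as half.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> dict[str, float]:
    """Compute accuracy, precision, recall, F1, and AUC from predicted scores."""
    # Turn scores into hard labels with an assumed threshold (hypothetical choice).
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc": auc(y_true, y_score),
    }


# Toy data (invented for illustration): 4 positive and 6 negative samples.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]
toy_metrics = classification_metrics(toy_labels, toy_scores)

if __name__ == "__main__":
    for name, value in toy_metrics.items():
        print(f"{name}: {value:.3f}")
```

Unlike the other four measures, AUC does not depend on the chosen threshold, which is one reason it is often used to compare classifiers.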
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.