Comparison of Methods for Handling Imbalanced Data in Customer Churn Prediction with Feature Selection Using SHAP and mRMR Frameworks

Luong Thanh Tam; Luong Gia Vi; Nguyen Manh Tuan

doi:10.2478/cait-2025-0023

Cybernetics and Information Technologies

Volume 25 (2025): Issue 3 (September 2025)

Open Access

Abstract

This study compares methods for handling imbalanced data when predicting customer churn in banking and e-commerce, using datasets whose features were selected with SHAP and mRMR. Two approaches were evaluated: data-level (Oversampling, Undersampling, and Hybrid resampling) and algorithm-level. Oversampling performed best on small to medium datasets, while Undersampling improved Recall but reduced Precision, lowering overall performance. Ensemble models outperformed single models; among single models, Decision Trees learned best from imbalanced data. The study recommends ensemble models for churn prediction and proposes the SHAP framework to improve their interpretability through global and local explanations. Two models, ROS-CatBoost and CW-XGBoost, achieved the strongest results, with Accuracy, Precision, Recall, F1-score, ROC AUC, and PR AUC all above 0.9, indicating that both churn and retention were predicted accurately. These findings highlight the effectiveness of ensemble models and interpretability tools in addressing imbalanced-data challenges.
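The two families of methods named in the abstract can be illustrated with a minimal sketch. It assumes that "ROS" stands for random oversampling (a data-level method) and "CW" for class weighting (an algorithm-level method); the abstract does not spell out either abbreviation. The function names `random_oversample` and `balanced_class_weights` and the toy data are hypothetical, and the weighting formula is the common "balanced" heuristic, not necessarily the one the authors used.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Data-level sketch: duplicate randomly chosen rows of each smaller
    class until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def balanced_class_weights(y):
    """Algorithm-level sketch: weight each class inversely to its frequency,
    n_samples / (n_classes * class_count), so rare classes cost more."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


# Toy data (hypothetical): 8 retained customers (0) and 2 churned (1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_ros, y_ros = random_oversample(X, y)
weights = balanced_class_weights(y)

print(Counter(y_ros))
print(weights)
```

In practice, resampling would be applied to the training split only, so that evaluation metrics such as Precision and Recall are measured on data with the original class distribution. The class weights would be passed to a learner's loss function rather than changing the data.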

DOI: https://doi.org/10.2478/cait-2025-0023 | Journal eISSN: 1314-4081 | Journal ISSN: 1311-9702

Language: English

Page range: 68 - 87

Published on: Sep 25, 2025

Published by: Bulgarian Academy of Sciences, Institute of Information and Communication Technologies

In partnership with: Paradigm Publishing Services

Related subjects:

Information technology

© 2025 Luong Thanh Tam, Luong Gia Vi, Nguyen Manh Tuan, published by Bulgarian Academy of Sciences, Institute of Information and Communication Technologies
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
