Robust Hybrid Data-Level Approach for Handling Skewed Fat-Tailed Distributed Datasets and Diverse Features in Financial Credit Risk

Open Access | Jun 2025

References

  1. Ahmad M., Aftab S., Muhammad S. S., Ahmad S., Machine learning techniques for sentiment analysis: A review, International Journal of Multidisciplinary Sciences and Engineering, 2017, 8(3), 27.
  2. Akosa J., Predictive accuracy: A misleading performance measure for highly imbalanced data, in Proceedings of the SAS Global Forum, 2017, 12, 1–4. SAS Institute Inc. Cary, NC, USA.
  3. Arafa A., El-Fishawy N., Badawy M., Radad M., RN-SMOTE: Reduced Noise SMOTE based on DBSCAN for enhancing imbalanced data classification, Journal of King Saud University-Computer and Information Sciences, 2022, 34(8), 5059–5074.
  4. Araf I., Idri A., Chairi I., Cost-sensitive learning for imbalanced medical data: a review, Artificial Intelligence Review, 2024, 57(4), 1–72.
  5. Asselman A., Khaldi M., Aammou S., Enhancing the prediction of student performance based on the machine learning XGBoost algorithm, Interactive Learning Environments, 2023, 31(6), 3360–3379.
  6. Awaji B., Senan E.M., Olayah F., Alshari E.A., Alsulami M., Abosaq H.A., Alqahtani J., Janrao P., Hybrid techniques of facial feature image analysis for early detection of autism spectrum disorder based on combined CNN features, Diagnostics, 2023, 13(18), 2948.
  7. Azhar N.A., Pozi M.S.M., Din A.M., Jatowt A., An investigation of SMOTE based methods for imbalanced datasets with data complexity analysis, IEEE Transactions on Knowledge and Data Engineering, 2022.
  8. Bader-El-Den M., Teitei E., Perry T., Biased random forest for dealing with the class imbalance problem, IEEE Transactions on Neural Networks and Learning Systems, 2018, 30(7), 2163–2172, IEEE.
  9. Bansal A., Jain A., Analysis of focussed under-sampling techniques with machine learning classifiers, 2021 IEEE/ACIS 19th International Conference on Software Engineering Research, Management and Applications (SERA), 2021, 91–96.
  10. Batista G.E.A.P.A., Prati R.C., Monard M.C., A study of the behavior of several methods for balancing machine learning training data, ACM SIGKDD Explorations Newsletter, 2004, 6(1), 20–29.
  11. Belgiu M., Drăguţ L., Random forest in remote sensing: A review of applications and future directions, ISPRS Journal of Photogrammetry and Remote Sensing, 2016, 114, 24–31.
  12. Bi J., Zhang C., An empirical comparison on state-of-the-art multi-class imbalance learning algorithms and a new diversified ensemble learning scheme, Knowledge-Based Systems, 2018, 158, 81–93, Elsevier.
  13. Brown I., Mues C., An experimental comparison of classification algorithms for imbalanced credit scoring data sets, Expert Systems with Applications, 2012, 39(3), 3446–3453.
  14. Breiman L., Random forests, Machine Learning, 2001, 45, 5–32.
  15. Budjač R., Nikmon M., Schreiber P., Zahradníková B., Janáčová D., Automated machine learning overview, Research Papers Faculty of Materials Science and Technology Slovak University of Technology, 2019, 27(45), 107–112.
  16. Carmona P., Climent F., Momparler A., Predicting failure in the US banking sector: An extreme gradient boosting approach, International Review of Economics & Finance, 2019, 61, 304–323, Elsevier.
  17. Caouette J.B., Altman E.I., Narayanan P., Nimmo R., Managing credit risk: The great challenge for global financial markets, John Wiley & Sons, 2011.
  18. Carrington A.M., Fieguth P.W., Qazi H., Holzinger A., Chen H.H., Mayr F., Manuel D.G., A new concordant partial AUC and partial c statistic for imbalanced data in the evaluation of machine learning algorithms, BMC Medical Informatics and Decision Making, 2020, 20, 1–12.
  19. Chandra W., Suprihatin B., Resti Y., Median-KNN Regressor-SMOTE-Tomek links for handling missing and imbalanced data in air quality prediction, Symmetry, 2023, 15(4), 887.
  20. Chawla N.V., Bowyer K.W., Hall L.O., Kegelmeyer W.P., SMOTE: Synthetic minority over-sampling technique, Journal of Artificial Intelligence Research, 2002, 16, 321–357.
  21. Chen T., Guestrin C., XGBoost: A scalable tree boosting system, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, 785–794.
  22. Chen N., Ribeiro B., Chen A., Financial credit risk assessment: a recent review, Artificial Intelligence Review, 2016, 45, 1–23, Springer.
  23. Chen W., Liu C., Yu W., Wen J., Zhou L., Asymmetric Loss-Oriented Cost Sensitive Learning Model for Credit Risk Assessment, (Journal Name Pending), 2024.
  24. Cheng K., Zhang C., Yu H., Yang X., Zou H., Gao S., Grouped SMOTE with noise filtering mechanism for classifying imbalanced data, IEEE Access, 2019, 7, 170668–170681.
  25. Cheng L.-C., Wu Y.-T., Chao C.-T., Wang J.-H., Detecting fake reviewers from the social context with a graph neural network method, Decision Support Systems, 2024, 179, 114150, Elsevier.
  26. Couronné R., Probst P., Boulesteix A.-L., Random forest versus logistic regression: A large-scale benchmark experiment, BMC Bioinformatics, 2018, 19, 1–14.
  27. Davis J., Goadrich M., The relationship between Precision-Recall and ROC curves, in Proceedings of the 23rd International Conference on Machine Learning, 2006, 233–240.
  28. Demšar J., Statistical comparisons of classifiers over multiple data sets, The Journal of Machine Learning Research, 2006, 7, 1–30.
  29. De Corso E., Pasquini E., Trimarchi M., La Mantia I., Pagella F., Ottaviano G., Garzaro M., Pipolo C., Torretta S., Seccia V., et al., Dupilumab in the treatment of severe uncontrolled chronic rhinosinusitis with nasal polyps (CRSwNP): a multicentric observational phase IV real-life study (DUPIREAL), Allergy, 2023, 78(10), 2669–2683.
  30. Ding H., Sun Y., Wang Z., Huang N., Shen Z., Cui X., RGAN-EL: A GAN and ensemble learning-based hybrid approach for imbalanced data classification, Information Processing & Management, 2023, 60(2), 103235.
  31. Dong J., Chen Y., Yao B., Zhang X., Zeng N., A neural network boosting regression model based on XGBoost, Applied Soft Computing, 2022, 125, 109067.
  32. Douzas G., Bacao F., Geometric SMOTE: a geometrically enhanced drop-in replacement for SMOTE, Information Sciences, 2019, 501, 118–135.
  33. Elreedy D., Atiya A.F., A comprehensive analysis of synthetic minority oversampling technique (SMOTE) for handling class imbalance, Information Sciences, 2019, 505, 32–64.
  34. Ferreira A.J., Figueiredo M.A.T., Boosting algorithms: A review of methods, theory, and applications, Ensemble Machine Learning: Methods and Applications, 2012, 35–85.
  35. Fernández A., García S., Herrera F., Chawla N.V., SMOTE for learning from imbalanced data: progress and challenges, marking the 15-year anniversary, Journal of Artificial Intelligence Research, 2018, 61, 863–905.
  36. Fränti P., Mariescu-Istodor R., Soft precision and recall, Pattern Recognition Letters, 2023, 167, 115–121.
  37. Friedman M., A comparison of alternative tests of significance for the problem of m rankings, The Annals of Mathematical Statistics, 1940, 11(1), 86–92.
  38. Fonseca J., Bacao F., Geometric SMOTE for imbalanced datasets with nominal and continuous features, Expert Systems with Applications, 2023, 234, 121053.
  39. Fu G.-H., Yi L.-Z., Pan J., Tuning model parameters in class-imbalanced learning with precision-recall curve, Biometrical Journal, 2019, 61(3), 652–664.
  40. Galar M., Fernández A., Barrenechea E., Bustince H., Herrera F., A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 2011, 42(4), 463–484.
  41. García S., Fernández A., Luengo J., Herrera F., Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power, Information Sciences, 2010, 180(10), 2044–2064.
  42. Ganganwar V., An overview of classification algorithms for imbalanced datasets, International Journal of Emerging Technology and Advanced Engineering, 2012, 2(4), 42–47.
  43. Gnip P., Vokorokos L., Drotár P., Selective oversampling approach for strongly imbalanced data, PeerJ Computer Science, 2021, 7, e604.
  44. Gu J., Random Forest Based Imbalanced Data Cleaning and Classification, 2007, Citeseer.
  45. Gu Q., Cai Z., Zhu L., Huang B., Data mining on imbalanced data sets, in 2008 International Conference on Advanced Computer Theory and Engineering, 2008, 1020–1024, IEEE.
  46. Gür E., Duyan Y.A., Balcı F., Mice make temporal inferences about novel locations based on previously learned spatiotemporal contingencies, Animal Cognition, 2023, 26(3), 771–779.
  47. Guo K., Wan X., Liu L., Gao Z., Yang M., Fault diagnosis of intelligent production line based on digital twin and improved random forest, Applied Sciences, 2021, 11(16), 7733.
  48. Han S., Williamson B.D., Fong Y., Improving random forest predictions in small datasets from two-phase sampling designs, BMC Medical Informatics and Decision Making, 2021, 21, 1–9, Springer.
  49. Han H., Wang W.-Y., Mao B.-H., Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning, International Conference on Intelligent Computing, 2005, 878–887, Springer.
  50. Hand D. J., Till R. J., A simple generalisation of the area under the ROC curve for multiple class classification problems, Machine Learning, 2001, 45, 171–186.
  51. Hancock J.T., Khoshgoftaar T.M., Johnson J.M., Evaluating classifier performance with highly imbalanced big data, Journal of Big Data, 2023, 10(1), 42.
  52. Hanley J.A., McNeil B.J., The meaning and use of the area under a receiver operating characteristic (ROC) curve, Radiology, 1982, 143(1), 29–36.
  53. He H., Garcia E.A., Learning from imbalanced data, IEEE Transactions on Knowledge and Data Engineering, 2009, 21(9), 1263–1284, IEEE.
  54. He H., Ma Y., Imbalanced learning: foundations, algorithms, and applications, 2013, John Wiley & Sons.
  55. Hossin M., Sulaiman M.N., A review on evaluation metrics for data classification evaluations, International Journal of Data Mining & Knowledge Management Process, 2015, 5(2), 1.
  56. Islam A., Belhaouari S.B., Rehman A.U., Bensmail H., KNNOR: An oversampling technique for imbalanced datasets, Applied Soft Computing, 2022, 115, 108288.
  57. Ishwaran H., Lu M., Standard errors and confidence intervals for variable importance in random forest regression, classification, and survival, Statistics in Medicine, 2019, 38(4), 558–582.
  58. Jiang C., Lu W., Wang Z., Ding Y., Benchmarking state-of-the-art imbalanced data learning approaches for credit scoring, Expert Systems with Applications, 2023, 213, 118878, Elsevier.
  59. Johnson J.M., Khoshgoftaar T.M., Survey on deep learning with class imbalance, Journal of Big Data, 2019, 6(1), 1–54.
  60. Kaur H., Pannu H.S., Malhi A.K., A systematic review on imbalanced data challenges in machine learning: Applications and solutions, ACM Computing Surveys (CSUR), 2019, 52(4), 1–36.
  61. Ke G., Meng Q., Finley T., Wang T., Chen W., Ma W., Ye Q., Liu T.-Y., LightGBM: A highly efficient gradient boosting decision tree, Advances in Neural Information Processing Systems, 2017, 30.
  62. Khalilia M., Chakraborty S., Popescu M., Predicting disease risks from highly imbalanced data using random forest, BMC Medical Informatics and Decision Making, 2011, 11, 1–13.
  63. Khoshgoftaar T.M., Golawala M., Van Hulse J., An empirical study of learning from imbalanced data using random forest, 19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2007), 2007, 2, 310–317, IEEE.
  64. Kim T., Lee J.-S., Maximizing AUC to learn weighted naive Bayes for imbalanced data classification, Expert Systems with Applications, 2023, 217, 119564.
  65. Kotsiantis S., Kanellopoulos D., Pintelas P., Handling imbalanced datasets: A review, GESTS International Transactions on Computer Science and Engineering, 2006, 30(1), 25–36.
  66. Kraiem M.S., Sánchez-Hernández F., Moreno-García M.N., Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties. An approach based on association models, Applied Sciences, 2021, 11(18), 8546.
  67. Kumar V., Lalotra G.S., Sasikala P., Rajput D.S., Kaluri R., Lakshmanna K., Shorfuzzaman M., Alsufyani A., Uddin M., Addressing binary classification over class imbalanced clinical datasets using computationally intelligent techniques, Healthcare, 2022, 10(7), 1293.
  68. Kyiu A., Tawiah V., IFRS 9 implementation and bank risk, in Accounting Forum, 2023, 1–25, Taylor & Francis.
  69. Lai S.B.S., Shahri N.H.N.B.M., Mohamad M.B., Rahman H.A.B., Rambli A.B., Comparing the performance of AdaBoost, XGBoost, and logistic regression for imbalanced data, Mathematics and Statistics, 2021, 9(3), 379–385.
  70. Laker J., The Evolution of Risk and Risk Management–A Prudential Regulator’s Perspective, Conference–2007, 2007, Reserve Bank of Australia.
  71. Le T., Vo M.T., Vo B., Lee M.Y., Baik S.W., A hybrid approach using oversampling technique and cost-sensitive learning for bankruptcy prediction, Complexity, 2019, 2019, 8460934.
  72. Li X., Liu Q., A hybrid sampling algorithm for imbalanced and class-overlap data based on natural neighbors and density estimation, Knowledge and Information Systems, 2024, 1–32.
  73. Liang X.W., Jiang A.P., Li T., Xue Y.Y., Wang G.T., LR-SMOTE—An improved unbalanced data set oversampling based on K-means and SVM, Knowledge-Based Systems, 2020, 196, 105845.
  74. Limanto S., Buliali J.L., Saikhu A., GLoW SMOTE-D: Oversampling Technique to Improve Prediction Model Performance of Students Failure in Courses, IEEE Access, 2024.
  75. Ling C.X., Huang J., Zhang H., AUC: a statistically consistent and more discriminating measure than accuracy, IJCAI, 2003, 3, 519–524.
  76. Liu F., Qian Q., Cost-sensitive variational autoencoding classifier for imbalanced data classification, Algorithms, 2022, 15(5), 139.
  77. Liu Y., Yu X., Huang J.X., An Aijun, Combining integrated sampling with SVM ensembles for learning from imbalanced datasets, Information Processing & Management, 2011, 47(4), 617–631.
  78. Liu Y., Wang H., Fei Y., Liu Y., Shen L., Zhuang Z., Zhang X., Research on the prediction of green plum acidity based on improved XGBoost, Sensors, 2021, 21(3), 930.
  79. Lommers K., Harzli O. E., Kim J., Confronting machine learning with financial research, arXiv preprint arXiv:2103.00366, 2021.
  80. Malhotra R., Jain J., Predicting defects in imbalanced data using resampling methods: an empirical investigation, PeerJ Computer Science, 2022, 8, e573.
  81. Ma G., Wang Y., Can the Chinese domestic bond and stock markets facilitate a globalising renminbi, Economic and Political Studies, 2020, 8(3), 291–311, Taylor & Francis.
  82. Mellor A., Boukir S., Haywood A., Jones S., Exploring issues of training data imbalance and mislabelling on random forest performance for large area land cover classification using the ensemble margin, ISPRS Journal of Photogrammetry and Remote Sensing, 2015, 105, 155–168, Elsevier.
  83. More A.S., Rana D.P., Performance enrichment through parameter tuning of random forest classification for imbalanced data applications, Materials Today: Proceedings, 2022, 56, 3585–3593, Elsevier.
  84. Mukherjee M., Khushi M., SMOTE-ENC: A novel SMOTE-based method to generate synthetic data for nominal and continuous features, Applied System Innovation, 2021, 4(1), 18.
  85. Niu A., Cai B., Cai S., Big Data Analytics for Complex Credit Risk Assessment of Network Lending Based on SMOTE Algorithm, Complexity, 2020, 2020(1), 8563030, Wiley Online Library.
  86. Ozdemir B., Evolution of risk management from risk compliance to strategic risk management: From Basel I to Basel II, III and IFRS 9, Journal of Risk Management in Financial Institutions, 2018, 11(1), 76–85, Henry Stewart Publications.
  87. Pes B., Learning from high-dimensional and class-imbalanced datasets using random forests, Information, 2021, 12(8), 286.
  88. Paing M.P., Pintavirooj C., Tungjitkusolmun S., Choomchuay S., Hamamoto K., Comparison of sampling methods for imbalanced data classification in random forest, 2018 11th Biomedical Engineering International Conference (BMEiCON), 2018, 1–5.
  89. Prakash A., Thangaraj J., Roy S., Srivastav S., Mishra J. K., Model-aware XG-Boost method towards optimum performance of flexible distributed Raman amplifier, IEEE Photonics Journal, 2023, 15(4), 1–10.
  90. Qiu W., Credit risk prediction in an imbalanced social lending environment based on XGBoost, 2019 5th International Conference on Big Data and Information Analytics (BigDIA), 2019, 150–156, IEEE.
  91. Qolomany B., Al-Fuqaha A., Gupta A., Benhaddou D., Alwajidi S., Qadir J., Fong A. C., Leveraging machine learning and big data for smart buildings: A comprehensive survey, IEEE Access, 2019, 7, 90316–90356.
  92. Brennan P., A comprehensive survey of methods for overcoming the class imbalance problem in fraud detection, Institute of Technology Blanchardstown, Dublin, Ireland, 2012.
  93. Rao C.R., Karl Pearson chi-square test the dawn of statistical inference, Goodness-of-fit tests and model validity, 2002, 9–24, Springer.
  94. Ratnaningsih T., SMOTE-MRS: A Novel SMOTE-Multiresolution Sampling Technique for Imbalanced Distribution to Improve Prediction of Anemia, (Journal Name Pending), 2024.
  95. Rawat S.S., Mishra A.K., Review of Methods for Handling Class-Imbalanced in Classification Problems, arXiv preprint arXiv:2211.05456, 2022.
  96. Rijsbergen V., Information retrieval; Butterworth, 1978, J. Librariansh., 1979, 11, 237.
  97. Rodriguez-Galiano V.F., Ghimire B., Rogan J., Chica-Olmo M., Rigol-Sanchez J.P., An assessment of the effectiveness of a random forest classifier for land-cover classification, ISPRS Journal of Photogrammetry and Remote Sensing, 2012, 67, 93–104.
  98. Rodríguez-Galiano V.F., Abarca-Hernández F., Ghimire B., Chica-Olmo M., Atkinson P.M., Jeganathan C., Incorporating spatial variability measures in land-cover classification using Random Forest, Procedia Environmental Sciences, 2011, 3, 44–49, Elsevier.
  99. Rout N., Mishra D., Mallick M.K., Handling imbalanced data: a survey, in International Proceedings on Advances in Soft Computing, Intelligent Systems and Applications: ASISA 2016, 2018, 431–443, Springer.
  100. Schapire R.E., Freund Y., Boosting: Foundations and algorithms, Kybernetes, 2013, 42(1), 164–166.
  101. Sarker I. H., Machine learning: Algorithms, real-world applications and research directions, SN Computer Science, 2021, 2(3), 160.
  102. Shi S., Li J., Zhu D., Yang F., Xu Y., A hybrid imbalanced classification model based on data density, Information Sciences, 2023, 624, 50–67.
  103. Singh A., Ranjan R. K., Tiwari A., Credit card fraud detection under extreme imbalanced data: a comparative study of data-level algorithms, Journal of Experimental & Theoretical Artificial Intelligence, 2022, 34(4), 571–598.
  104. Singla P., Domingos P., Discriminative training of Markov logic networks, in AAAI, 2005, 868–873.
  105. Sibindi R., Mwangi R. W., Waititu A. G., A boosting ensemble learning based hybrid light gradient boosting machine and extreme gradient boosting model for predicting house prices, Engineering Reports, 2023, 5(4), e12599.
  106. Soltanzadeh P., Hashemzadeh M., RCSMOTE: Range-Controlled synthetic minority over-sampling technique for handling the class imbalance problem, Information Sciences, 2021, 542, 92–111.
  107. Srivastava J., Sharan A., SMOTEEN Hybrid Sampling Based Improved Phishing Website Detection, Authorea Preprints, 2023.
  108. Sun Z., Song Q., Zhu X., Sun H., Xu B., Zhou Y., A novel ensemble method for classifying imbalanced data, Pattern Recognition, 2015, 48(5), 1623–1637.
  109. Sun Z., Ying W., Zhang W., Gong S., Undersampling method based on minority class density for imbalanced data, Expert Systems with Applications, 2024, 249, 123328.
  110. Szeghalmy S., Fazekas A., A comparative study on noise filtering of imbalanced data sets, Knowledge-Based Systems, 2024, 301, 112236.
  111. Tao X., Guo X., Zheng Y., Zhang X., Chen Z., Self-adaptive oversampling method based on the complexity of minority data in imbalanced datasets classification, Knowledge-Based Systems, 2023, 277, 110795.
  112. Taye M.M., Understanding of machine learning with deep learning: architectures, workflow, applications and future directions, Computers, 2023, 12(5), 91.
  113. Ting K.M., An instance-weighting method to induce cost-sensitive trees, IEEE Transactions on Knowledge and Data Engineering, 2002, 14(3), 659–665, IEEE.
  114. Tomašević N., Mladenić D., Class imbalance and the curse of minority hubs, Knowledge-Based Systems, 2013, 53, 157–172.
  115. Tomek I., Two modifications of CNN, IEEE Transactions on Systems, Man, and Cybernetics, 1976, 6, 769–776.
  116. Uddin Md G., Nash S., Diganta M. T. M., Rahman A., Olbert A. I., Robust machine learning algorithms for predicting coastal water quality index, Journal of Environmental Management, 2022, 321, 115923.
  117. Verikas A., Gelzinis A., Bacauskiene M., Mining data with random forests: A survey and results of new tests, Pattern Recognition, 2011, 44(2), 330–349.
  118. Wang A.X., Chukova S.S., Nguyen B.P., Synthetic minority oversampling using edited displacement-based k-nearest neighbors, Applied Soft Computing, 2023, 148, 110895.
  119. Wang K., Wan J., Li G., Sun H., A hybrid algorithm-level ensemble model for imbalanced credit default prediction in the energy industry, Energies, 2022, 15(14), 5206.
  120. Wang C., Deng C., Wang S., Imbalance-XGBoost: leveraging weighted and focal losses for binary label-imbalanced classification with XGBoost, Pattern Recognition Letters, 2020, 136, 190–197.
  121. Wilson D.L., Asymptotic properties of nearest neighbor rules using edited data, IEEE Transactions on Systems, Man, and Cybernetics, 1972, 3, 408–421.
  122. Yan J., Tang B., He H., Detection of false data attacks in smart grid with supervised learning, 2016 International Joint Conference on Neural Networks (IJCNN), 2016, 1395–1402.
  123. Yanenkova I., Nehoda Y., Drobyazko S., Zavhorodnii A., Berezovska L., Modeling of bank credit risk management using the cost risk model, Journal of Risk and Financial Management, 2021, 14(5), 211.
  124. Yin S., Dey D.K., Valdez E.A., Gan G., Vadiveloo J., Skewed link regression models for imbalanced binary response with applications to life insurance, arXiv preprint arXiv:2007.15172, 2020.
  125. Xu T., Comparative Analysis of Machine Learning Algorithms for Consumer Credit Risk Assessment, Transactions on Computer Science and Intelligent Systems Research, 2024, 4, 60–67.
  126. Zhang C., Tan K.C., Li H., Hong G.S., A cost-sensitive deep belief network for imbalanced classification, IEEE Transactions on Neural Networks and Learning Systems, 2018, 30(1), 109–122.
  127. Zhang C., Wang G., Zhou Y., Yao L., Jiang Z.L., Liao Q., Wang X., Feature selection for high dimensional imbalanced class data based on F-measure optimization, in 2017 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC), 2017, 278–283, IEEE.
  128. Zhang W., He Y., Wang L., Liu S., Meng X., Landslide susceptibility mapping using random forest and extreme gradient boosting: A case study of Fengjie, Chongqing, Geological Journal, 2023, 58(6), 2372–2387.
  129. Zhang W., Wu C., Zhong H., Li Y., Wang L., Prediction of undrained shear strength using extreme gradient boosting and random forest based on Bayesian optimization, Geoscience Frontiers, 2021, 12(1), 469–477.
  130. Zhao Z., Cui T., Ding S., Li J., Bellotti A.G., Resampling Techniques Study on Class Imbalance Problem in Credit Risk Prediction, Mathematics, 2024, 12(5), 701, MDPI.
  131. Zheng M., Wang F., Hu X., Miao Y., Cao H., Tang M., A method for analyzing the performance impact of imbalanced binary data on machine learning models, Axioms, 2022, 11(11), 607.
  132. Zhu L., Qiu D., Ergu D., Ying C., Liu K., A study on predicting loan default based on the random forest algorithm, Procedia Computer Science, 2019, 162, 503–513.
  133. Zhu T., Lin Y., Liu Y., Synthetic minority oversampling technique for multiclass imbalance problems, Pattern Recognition, 2017, 72, 327–340.
  134. Zou Y., Gao C., Gao H., Business failure prediction based on a cost-sensitive extreme gradient boosting machine, IEEE Access, 2022, 10, 42623–42639, IEEE.
DOI: https://doi.org/10.2478/fcds-2025-0009 | Journal eISSN: 2300-3405 | Journal ISSN: 0867-6356
Language: English
Page range: 229 - 270
Submitted on: Aug 31, 2024
Accepted on: Feb 13, 2025
Published on: Jun 10, 2025
Published by: Poznan University of Technology
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2025 Keith R Musara, Edmore Ranganai, Charles Chimedza, Florence Matarise, Sheunesu Munyira, published by Poznan University of Technology
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.