A Gaussian–Based WGAN–GP Oversampling Approach for Solving the Class Imbalance Problem

Zhou, Qian; Sun, Bo

doi:10.61822/amcs-2024-0021

References

Arjovsky, M., Chintala, S. and Bottou, L. (2017). Wasserstein generative adversarial networks, International Conference on Machine Learning, Sydney, Australia, pp. 214–223.
Barua, S., Islam, M.M. and Murase, K. (2013). PROWSYN: Proximity weighted synthetic oversampling technique for imbalanced data set learning, Advances in Knowledge Discovery and Data Mining: 17th Pacific-Asia Conference, PAKDD 2013, Gold Coast, Australia, pp. 317–328.
Bourou, S., El Saer, A., Velivassaki, T.-H., Voulkidis, A. and Zahariadis, T. (2021). A review of tabular data synthesis using GANs on an IDS dataset, Information 12(09): 375.
Breiman, L. (2001). Random forests, Machine Learning 45(1): 5–32.
Breiman, L. (2017). Classification and Regression Trees, Routledge, London.
Budach, L., Feuerpfeil, M., Ihde, N., Nathansen, A., Noack, N., Patzlaff, H., Naumann, F. and Harmouch, H. (2022). The effects of data quality on machine learning performance, arXiv: 2207.14529.
Chaabane, I., Guermazi, R. and Hammami, M. (2020). Enhancing techniques for learning decision trees from imbalanced data, Advances in Data Analysis and Classification 14(3): 1–69.
Chawla, N.V., Bowyer, K.W., Hall, L.O. and Kegelmeyer, W.P. (2002). SMOTE: Synthetic minority over-sampling technique, Journal of Artificial Intelligence Research 16: 321–357.
Chen, J., Huang, H., Cohn, A.G., Zhang, D. and Zhou, M. (2022). Machine learning-based classification of rock discontinuity trace: SMOTE oversampling integrated with GBT ensemble learning, International Journal of Mining Science and Technology 32(2): 309–322.
Chen, J., Yan, Z., Lin, C., Yao, B. and Ge, H. (2023). Aero-engine high speed bearing fault diagnosis for data imbalance: A sample enhanced diagnostic method based on pre-training WGAN-GP, Measurement 213(7): 112709.
Cover, T. and Hart, P. (1967). Nearest neighbor pattern classification, IEEE Transactions on Information Theory 13(1): 21–27.
Cui, J., Zong, L., Xie, J. and Tang, M. (2023). A novel multi-module integrated intrusion detection system for high-dimensional imbalanced data, Applied Intelligence 53(1): 272–288.
Derrac, J., Garcia, S., Sanchez, L. and Herrera, F. (2015). KEEL data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework, Journal of Multiple-Valued Logic and Soft Computing 17(2–3): 255–287.
Douzas, G. and Bacao, F. (2018). Effective data generation for imbalanced learning using conditional generative adversarial networks, Expert Systems with Applications 91(1): 464–471.
Dua, D. and Graff, C. (2019). UCI Machine Learning Repository, http://archive.ics.uci.edu/ml.
Fernández, A., Garcia, S., Herrera, F. and Chawla, N.V. (2018). SMOTE for learning from imbalanced data: Progress and challenges, marking the 15-year anniversary, Journal of Artificial Intelligence Research 61: 863–905.
Freund, Y. and Schapire, R.E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences 55(1): 119–139.
García, S., Luengo, J. and Herrera, F. (2016). Tutorial on practical tips of the most influential data preprocessing algorithms in data mining, Knowledge-Based Systems 98(7): 1–29.
Gazzah, S. and Amara, N.E.B. (2008). New oversampling approaches based on polynomial fitting for imbalanced data sets, 2008 8th IAPR International Workshop on Document Analysis Systems, Nara, Japan, pp. 677–684.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. and Bengio, Y. (2014). Generative adversarial nets, Advances in Neural Information Processing Systems 27: 2672–2680.
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V. and Courville, A.C. (2017). Improved training of Wasserstein GANs, Advances in Neural Information Processing Systems 30: 5767–5777.
Guyon, I. and Elisseeff, A. (2003). An introduction to variable and feature selection, Journal of Machine Learning Research 3(Mar): 1157–1182.
He, H. and Garcia, E.A. (2009). Learning from imbalanced data, IEEE Transactions on Knowledge and Data Engineering 21(9): 1263–1284.
Hernandez, M., Epelde, G., Alberdi, A., Cilla, R. and Rankin, D. (2022). Synthetic data generation for tabular health records: A systematic review, Neurocomputing 493(27): 28–45.
James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013). An Introduction to Statistical Learning: With Applications in R, 2nd Edn, Springer, New York.
Janicka, M., Lango, M. and Stefanowski, J. (2019). Using information on class interrelations to improve classification of multiclass imbalanced data: A new resampling algorithm, International Journal of Applied Mathematics and Computer Science 29(4): 769–781, DOI: 10.2478/amcs-2019-0057.
Japkowicz, N. (2003). Class imbalances: Are we focusing on the right issue, Workshop on Learning from Imbalanced Data Sets II, Washington, USA, p. 63.
Kaggle (2024). Datasets: Lower Back Pain, https://www.kaggle.com/datasets/sammy123/lower-back-pain-symptoms-dataset, and Telecom Churn, https://www.kaggle.com/datasets/mnassrib/telecom-churn-datasets.
Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection, 14th International Joint Conference on Artificial Intelligence (IJCAI), Montreal, Canada, pp. 1137–1145.
Kovács, G. (2019). An empirical comparison and evaluation of minority oversampling techniques on a large number of imbalanced datasets, Applied Soft Computing 83(9): 105662.
Liu, X.-Y., Wu, J. and Zhou, Z.-H. (2008). Exploratory undersampling for class-imbalance learning, IEEE Transactions on Systems, Man, and Cybernetics B: Cybernetics 39(2): 539–550.
López, V., Fernández, A., García, S., Palade, V. and Herrera, F. (2013). An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics, Information Sciences 250(33): 113–141.
Mirza, M. and Osindero, S. (2014). Conditional generative adversarial nets, arXiv: 1411.1784.
Miyato, T., Kataoka, T., Koyama, M. and Yoshida, Y. (2018). Spectral normalization for generative adversarial networks, arXiv: 1802.05957.
Moreo, A., Esuli, A. and Sebastiani, F. (2016). Distributional random oversampling for imbalanced text classification, Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, Pisa, Italy, pp. 805–808.
Napierala, K. and Stefanowski, J. (2016). Types of minority class examples and their influence on learning classifiers from imbalanced data, Journal of Intelligent Information Systems 46: 563–597.
Nik, A.H.Z., Riegler, M.A., Halvorsen, P. and Storås, A.M. (2023). Generation of synthetic tabular healthcare data using generative adversarial networks, International Conference on Multimedia Modeling, Bergen, Norway, pp. 434–446.
Ohsaki, M., Wang, P., Matsuda, K., Katagiri, S., Watanabe, H. and Ralescu, A. (2017). Confusion-matrix-based kernel logistic regression for imbalanced data classification, IEEE Transactions on Knowledge and Data Engineering 29(9): 1806–1819.
Park, N., Mohammadi, M., Gorde, K., Jajodia, S., Park, H. and Kim, Y. (2018). Data synthesis based on generative adversarial networks, Proceedings of the VLDB Endowment 11(10): 1071–1083.
Park, S. and Park, H. (2021). Combined oversampling and undersampling method based on slow-start algorithm for imbalanced network traffic, Computing 103(3): 401–424.
Powers, D.M. (2020). Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation, arXiv: 2010.16061.
Ren, J., Wang, Y., Cheung, Y.-m., Gao, X.-Z. and Guo, X. (2023). Grouping-based oversampling in kernel space for imbalanced data classification, Pattern Recognition 133(1): 108992.
Sáez, J.A., Luengo, J., Stefanowski, J. and Herrera, F. (2015). SMOTE–IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering, Information Sciences 291(2): 184–203.
Sun, B., Zhou, Q., Wang, Z., Lan, P., Song, Y., Mu, S., Li, A., Chen, H. and Liu, P. (2023). Radial-based undersampling approach with adaptive undersampling ratio determination, Neurocomputing 553(39): 126544.
Sun, Y., Wong, A.K. and Kamel, M.S. (2009). Classification of imbalanced data: A review, International Journal of Pattern Recognition and Artificial Intelligence 23(04): 687–719.
Wasserman, L. (2004). All of Statistics: A Concise Course in Statistical Inference, Springer, New York.
Wold, S., Esbensen, K. and Geladi, P. (1987). Principal component analysis, Chemometrics and Intelligent Laboratory Systems 2(1–3): 37–52.
Woods, K.S., Doss, C.C., Bowyer, K.W., Solka, J.L., Priebe, C.E. and Kegelmeyer Jr, W.P. (1993). Comparative evaluation of pattern recognition techniques for detection of microcalcifications in mammography, International Journal of Pattern Recognition and Artificial Intelligence 7(06): 1417–1436.
Xie, Y. and Zhang, T. (2018). Imbalanced learning for fault diagnosis problem of rotating machinery based on generative adversarial networks, 2018 37th Chinese Control Conference (CCC), Wuhan, China, pp. 6017–6022.
Xu, L., Skoularidou, M., Cuesta-Infante, A. and Veeramachaneni, K. (2019). Modeling tabular data using conditional GAN, Advances in Neural Information Processing Systems 32: 7335–7345.
Yang, C., Lu, X., Lin, Z., Shechtman, E., Wang, O. and Li, H. (2017). High-resolution image inpainting using multi-scale neural patch synthesis, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, pp. 6721–6729.
Zhang, M., Wan, X., Gang, L., Lv, X., Wu, Z. and Liu, Z. (2021). An automated driving strategy generating method based on WGAIL–DDPG, International Journal of Applied Mathematics and Computer Science 31(3): 461–470, DOI: 10.34768/amcs-2021-0031.
Zhang, Y., Liu, Y., Wang, Y. and Yang, J. (2023). An ensemble oversampling method for imbalanced classification with prior knowledge via generative adversarial network, Chemometrics and Intelligent Laboratory Systems 235(4): 104775.
Zhao, Y., Li, H., Bissyandé, T.F., Klein, J. and Grundy, J. (2021). On the impact of sample duplication in machine-learning-based Android malware detection, ACM Transactions on Software Engineering and Methodology 30(3): 1–38.
Zhao, Z., Kunar, A., Birke, R. and Chen, L.Y. (2021). CTAB-GAN: Effective table data synthesizing, Asian Conference on Machine Learning, pp. 97–112, (virtual).
Zheng, M., Li, T., Zhu, R., Tang, Y., Tang, M., Lin, L. and Ma, Z. (2020a). Conditional Wasserstein generative adversarial network-gradient penalty-based approach to alleviating imbalanced data classification, Information Sciences 512(7): 1009–1023.
Zheng, W. and Zhao, H. (2020b). Cost-sensitive hierarchical classification for imbalance classes, Applied Intelligence 50(8): 2328–2338.
Zhu, B., Pan, X., vanden Broucke, S. and Xiao, J. (2022). A GAN-based hybrid sampling method for imbalanced customer classification, Information Sciences 609(28): 1397–1411.
