A New Big Data Model Using Distributed Cluster-Based Resampling for Class-Imbalance Problem

Open Access | Feb 2020

References

  1. [1] M. K. Saggi and S. Jain, “A Survey Towards an Integration of Big Data Analytics to Big Insights for Value-Creation,” Information Processing & Management, vol. 54, no. 5, pp. 758–790, Sep. 2018. https://doi.org/10.1016/j.ipm.2018.01.010
  2. [2] A. Oussous, F. Z. Benjelloun, A. A. Lahcen, and S. Belfkih, “Big Data Technologies: A survey,” Journal of King Saud University – Computer and Information Sciences, vol. 30, no. 4, pp. 431–448, Oct. 2018. https://doi.org/10.1016/j.jksuci.2017.06.001
  3. [3] G. Haixiang, L. Yijing, J. Shang, G. Mingyun, H. Yuanyue, and G. Bing, “Learning From Class-Imbalanced Data: Review of Methods and Applications,” Expert Systems with Applications, vol. 73, pp. 220–239, May 2017. https://doi.org/10.1016/j.eswa.2016.12.035
  4. [4] H. He and E. A. Garcia, “Learning From Imbalanced Data,” IEEE Transactions on Knowledge & Data Engineering, vol. 21, no. 9, pp. 1263–1284, Sep. 2009. https://doi.org/10.1109/TKDE.2008.239
  5. [5] S. Das, S. Datta, and B. B. Chaudhuri, “Handling Data Irregularities in Classification: Foundations, Trends, and Future Challenges,” Pattern Recognition, vol. 81, pp. 674–693, Sep. 2018. https://doi.org/10.1016/j.patcog.2018.03.008
  6. [6] J. Stefanowski, “Dealing With Data Difficulty Factors While Learning From Imbalanced Data,” in Challenges in Computational Statistics and Data Mining, pp. 333–363, 2016. https://doi.org/10.1007/978-3-319-18781-5_17
  7. [7] A. Fernández, S. del Río, N. V. Chawla, and F. Herrera, “An Insight Into Imbalanced Big Data Classification: Outcomes and Challenges,” Complex & Intelligent Systems, vol. 3, no. 2, pp. 105–120, Jun. 2017. https://doi.org/10.1007/s40747-017-0037-9
  8. [8] S. del Río, V. López, J. M. Benítez, and F. Herrera, “On the Use of MapReduce for Imbalanced Big Data Using Random Forest,” Information Sciences, vol. 285, pp. 112–137, 2014. https://doi.org/10.1016/j.ins.2014.03.043
  9. [9] S. S. Patil and S. P. Sonavane, “Enriched Over_Sampling Techniques for Improving Classification of Imbalanced Big Data,” in 2017 IEEE Third International Conference on Big Data Computing Service and Applications (BigDataService), USA, 2017, pp. 1–10. https://doi.org/10.1109/BigDataService.2017.19
  10. [10] M. Ghanavati, R. K. Wong, F. Chen, Y. Wang, and C. S. Perng, “An Effective Integrated Method for Learning Big Imbalanced Data,” in 2014 IEEE International Congress on Big Data, USA, 2014, pp. 691–698. https://doi.org/10.1109/BigData.Congress.2014.102
  11. [11] D. Galpert, S. del Río, F. Herrera, E. Ancede-Gallardo, A. Antunes, and G. Agüero-Chapin, “An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species,” BioMed Research International, vol. 2015, Article ID 748681, 2015. https://doi.org/10.1155/2015/748681
  12. [12] S. del Río, J. M. Benítez, and F. Herrera, “Analysis of Data Preprocessing Increasing the Oversampling Ratio for Extremely Imbalanced Big Data Classification,” in 2015 IEEE Trustcom/BigDataSE/ISPA, pp. 180–185, Finland, 2015. https://doi.org/10.1109/Trustcom.2015.579
  13. [13] I. Triguero, S. del Río, V. López, J. Bacardit, J. M. Benítez, and F. Herrera, “ROSEFW-RF: The Winner Algorithm for the ECBDL’14 Big Data Competition: An Extremely Imbalanced Big Data Bioinformatics Problem,” Knowledge-Based Systems, vol. 87, pp. 69–79, Oct. 2015. https://doi.org/10.1016/j.knosys.2015.05.027
  14. [14] I. Triguero, M. Galar, S. Vluymans, C. Cornelis, H. Bustince, F. Herrera, and Y. Saeys, “Evolutionary Undersampling for Imbalanced Big Data Classification,” in 2015 IEEE Congress on Evolutionary Computation (CEC), Japan, 2015, pp. 715–722. https://doi.org/10.1109/CEC.2015.7256961
  15. [15] I. Triguero, M. Galar, D. Merino, J. Maillo, H. Bustince, and F. Herrera, “Evolutionary Undersampling for Extremely Imbalanced Big Data Classification Under Apache Spark,” in 2016 IEEE Congress on Evolutionary Computation (CEC), Canada, 2016, pp. 640–647. https://doi.org/10.1109/CEC.2016.7743853
  16. [16] S. Kamal, S. H. Ripon, N. Dey, A. S. Ashour, and V. Santhi, “A MapReduce approach to diminish imbalance parameters for big deoxyribonucleic acid dataset,” Computer Methods and Programs in Biomedicine, vol. 131, pp. 191–206, Jul. 2016. https://doi.org/10.1016/j.cmpb.2016.04.005
  17. [17] F. Hu, H. Li, H. Lou, and J. Dai, “A parallel oversampling algorithm based on NRSBoundary-SMOTE,” Journal of Information & Computational Science, vol. 11, no. 13, pp. 4655–4665, Sep. 2014. https://doi.org/10.12733/jics20104484
  18. [18] R. C. Bhagat and S. S. Patil, “Enhanced SMOTE Algorithm for Classification of Imbalanced Big-Data Using Random Forest,” in 2015 IEEE International Advance Computing Conference (IACC), India, 2015, pp. 403–408. https://doi.org/10.1109/IADCC.2015.7154739
  19. [19] C. K. Maurya, D. Toshniwal, and G. V. Venkoparao, “Online Sparse Class Imbalance Learning on Big Data,” Neurocomputing, vol. 216, pp. 250–260, Dec. 2016. https://doi.org/10.1016/j.neucom.2016.07.040
  20. [20] M. Tang, C. Yang, K. Zhang, and Q. Xie, “Cost-Sensitive Support Vector Machine Using Randomized Dual Coordinate Descent Method for Big Class-Imbalanced Data Classification,” Abstract and Applied Analysis, vol. 2014, Article ID 416591, Jul. 2014. https://doi.org/10.1155/2014/416591
  21. [21] X. Wang, X. Liu, and S. Matwin, “A distributed instance-weighted SVM algorithm on large-scale imbalanced datasets,” in 2014 IEEE International Conference on Big Data, USA, 2014, pp. 45–51. https://doi.org/10.1109/BigData.2014.7004467
  22. [22] V. López, S. del Río, J. M. Benítez, and F. Herrera, “Cost-Sensitive Linguistic Fuzzy Rule Based Classification Systems Under the MapReduce Framework for Imbalanced Big Data,” Fuzzy Sets and Systems, vol. 258, pp. 5–38, Jan. 2015. https://doi.org/10.1016/j.fss.2014.01.015
  23. [23] S. del Río, V. López, J. M. Benítez, and F. Herrera, “A MapReduce Approach to Address Big Data Classification Problems Based on the Fusion of Linguistic Fuzzy Rules,” International Journal of Computational Intelligence Systems, vol. 8, no. 3, pp. 422–437, May 2015. https://doi.org/10.1080/18756891.2015.1017377
  24. [24] J. Zhai, S. Zhang, M. Zhang, and X. Liu, “Fuzzy Integral-Based ELM Ensemble for Imbalanced Big Data Classification,” Soft Computing, vol. 22, no. 11, pp. 3519–3531, Jun. 2018. https://doi.org/10.1007/s00500-018-3085-1
  25. [25] Z. Wang, J. Xin, H. Yang, S. Tian, G. Yu, C. Xu, and Y. Yao, “Distributed and Weighted Extreme Learning Machine for Imbalanced Big Data Learning,” Tsinghua Science and Technology, vol. 22, no. 2, pp. 160–173, Apr. 2017. https://doi.org/10.23919/TST.2017.7889638
  26. [26] N. B. Abdel-Hamid, S. ElGhamrawy, A. El Desouky, and H. Arafat, “A Dynamic Spark-Based Classification Framework for Imbalanced Big Data,” Journal of Grid Computing, vol. 16, no. 4, pp. 607–626, Dec. 2018. https://doi.org/10.1007/s10723-018-9465-z
  27. [27] J. L. Leevy, T. M. Khoshgoftaar, R. A. Bauder, and N. Seliya, “A Survey on Addressing High-Class Imbalance in Big Data,” Journal of Big Data, vol. 5, no. 42, Dec. 2018. https://doi.org/10.1186/s40537-018-0151-6
  28. [28] J. W. Huang, C. W. Chiang, and J. W. Chang, “Email Security Level Classification of Imbalanced Data Using Artificial Neural Network: The Real Case in a World-Leading Enterprise,” Engineering Applications of Artificial Intelligence, vol. 75, pp. 11–21, Oct. 2018. https://doi.org/10.1016/j.engappai.2018.07.010
  29. [29] T. Jo and N. Japkowicz, “Class Imbalances Versus Small Disjuncts,” ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 40–49, Jun. 2004. https://doi.org/10.1145/1007730.1007737
  30. [30] A. Agrawal, H. L. Viktor, and E. Paquet, “SCUT: Multi-Class Imbalanced Data Classification Using SMOTE and Cluster-Based Undersampling,” in 2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K), 2015, vol. 1, pp. 226–234. https://doi.org/10.5220/0005595502260234
  31. [31] W. C. Lin, C. F. Tsai, Y. H. Hu, and J. S. Jhang, “Clustering-Based Undersampling in Class-Imbalanced Data,” Information Sciences, vol. 409, pp. 17–26, Oct. 2017. https://doi.org/10.1016/j.ins.2017.05.008
  32. [32] I. Nekooeimehr and S. K. Lai-Yuen, “Adaptive Semi-Unsupervised Weighted Oversampling (A-SUWO) for Imbalanced Datasets,” Expert Systems with Applications, vol. 46, pp. 405–416, Mar. 2016. https://doi.org/10.1016/j.eswa.2015.10.031
  33. [33] A. Estabrooks, T. Jo, and N. Japkowicz, “A Multiple Resampling Method for Learning from Imbalanced Data Sets,” Computational Intelligence, vol. 20, no. 1, pp. 18–36, Feb. 2004. https://doi.org/10.1111/j.0824-7935.2004.t01-1-00228.x
  34. [34] H. Guo, J. Zhou, and C. A. Wu, “Imbalanced Learning Based on Data-Partition and SMOTE,” Information, vol. 9, no. 238, Sep. 2018. https://doi.org/10.3390/info9090238
  35. [35] GAZİ-BIDISEC. Gazi University Big Data and Information Security Center. [Online]. Available: http://bigdatacenter.gazi.edu.tr/ [Accessed: Sep. 2019].
  36. [36] T. Hasanin and T. Khoshgoftaar, “The Effects of Random Undersampling with Simulated Class Imbalance for Big Data,” in 2018 IEEE International Conference on Information Reuse and Integration (IRI), USA, 2018, pp. 70–79. https://doi.org/10.1109/IRI.2018.00018
DOI: https://doi.org/10.2478/acss-2019-0013 | Journal eISSN: 2255-8691 | Journal ISSN: 2255-8683
Language: English
Page range: 104–110
Published on: Feb 20, 2020
Published by: Riga Technical University
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2020 Duygu Sinanc Terzi, Seref Sagiroglu, published by Riga Technical University
This work is licensed under the Creative Commons Attribution 4.0 License.