The Influence of Unbalanced Economic Data on Feature Selection and Quality of Classifiers

Open Access | Aug 2020

DOI: https://doi.org/10.2478/foli-2020-0014 | Journal eISSN: 1898-0198 | Journal ISSN: 1730-4237
Language: English
Page range: 232 - 247
Submitted on: Nov 19, 2019
Accepted on: Mar 31, 2020
Published on: Aug 20, 2020
Published by: University of Szczecin
In partnership with: Paradigm Publishing Services
Publication frequency: 2 times per year

© 2020 Mariusz Kubus, published by University of Szczecin
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.