
Cross-Project Defect Prediction with Metrics Selection and Balancing Approach

Open Access | Jan 2023

References

[1] T. Menzies, B. Turhan, A. Bener, G. Gay, B. Cukic, and Y. Jiang, “Implications of ceiling effects in defect predictors,” in Proceedings of the 4th International Workshop on Predictor Models in Software Engineering – PROMISE ’08, 2008, pp. 47–54. https://doi.org/10.1145/1370788.1370801
[2] M. Nevendra and P. Singh, “Software defect prediction using deep learning,” Acta Polytech. Hungarica, vol. 18, no. 10, pp. 173–189, 2021. https://doi.org/10.12700/aph.18.10.2021.10.9
[3] M. Shepperd, D. Bowes, and T. Hall, “Researcher bias: The use of machine learning in software defect prediction,” IEEE Trans. Softw. Eng., vol. 40, no. 6, pp. 603–616, Jun. 2014. https://doi.org/10.1109/TSE.2014.2322358
[4] T. Wang, W. Li, H. Shi, and Z. Liu, “Software defect prediction based on classifiers ensemble,” J. Inf. Comput. Sci., vol. 8, no. 16, pp. 4241–4254, 2011.
[5] J. Nam, S. J. Pan, and S. Kim, “Transfer defect learning,” in 2013 35th International Conference on Software Engineering (ICSE), San Francisco, CA, USA, May 2013, pp. 382–391. https://doi.org/10.1109/ICSE.2013.6606584
[6] X. Xia, D. Lo, S. J. Pan, N. Nagappan, and X. Wang, “HYDRA: Massively compositional model for cross-project defect prediction,” IEEE Trans. Softw. Eng., vol. 42, no. 10, pp. 977–998, Oct. 2016. https://doi.org/10.1109/TSE.2016.2543218
[7] Z. He, F. Shu, Y. Yang, M. Li, and Q. Wang, “An investigation on the feasibility of cross-project defect prediction,” Autom. Softw. Eng., vol. 19, no. 2, pp. 167–199, Jul. 2012. https://doi.org/10.1007/s10515-011-0090-3
[8] H. Chen, X. Jing, Z. Li, D. Wu, Y. Peng, and Z. Huang, “An empirical study on heterogeneous defect prediction approaches,” IEEE Trans. Softw. Eng., vol. 47, no. 12, pp. 2803–2822, Jan. 2020. https://doi.org/10.1109/TSE.2020.2968520
[9] H. Turabieh, M. Mafarja, and X. Li, “Iterated feature selection algorithms with layered recurrent neural network for software fault prediction,” Expert Syst. Appl., vol. 122, pp. 27–42, May 2019. https://doi.org/10.1016/j.eswa.2018.12.033
[10] T. Hall, S. Beecham, D. Bowes, D. Gray, and S. Counsell, “A systematic literature review on fault prediction performance in software engineering,” IEEE Trans. Softw. Eng., vol. 38, no. 6, pp. 1276–1304, Oct. 2012. https://doi.org/10.1109/TSE.2011.103
[11] S. Hosseini, B. Turhan, and D. Gunarathna, “A systematic literature review and meta-analysis on cross project defect prediction,” IEEE Trans. Softw. Eng., vol. 45, no. 2, pp. 111–147, Nov. 2019. https://doi.org/10.1109/TSE.2017.2770124
[12] F. J. Provost, “Machine learning from imbalanced data sets 101,” in Proc. AAAI’2000 Workshop on Imbalanced Data Sets, 2000, pp. 1–3.
[13] D. Ryu, J. I. Jang, and J. Baik, “A transfer cost-sensitive boosting approach for cross-project defect prediction,” Softw. Qual. J., vol. 25, no. 1, pp. 235–272, Sep. 2017. https://doi.org/10.1007/s11219-015-9287-1
[14] S. Wang and X. Yao, “Using class imbalance learning for software defect prediction,” IEEE Trans. Reliab., vol. 62, no. 2, pp. 434–443, Apr. 2013. https://doi.org/10.1109/TR.2013.2259203
[15] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” J. Artif. Intell. Res., vol. 16, pp. 321–357, Jun. 2002. https://doi.org/10.1613/jair.953
[16] H. He, Y. Bai, E. A. Garcia, and S. Li, “ADASYN: Adaptive synthetic sampling approach for imbalanced learning,” in Proc. Int. Jt. Conf. Neural Networks, Hong Kong, Jun. 2008, pp. 1322–1328. https://doi.org/10.1109/IJCNN.2008.4633969
[17] A. Agrawal and T. Menzies, “Is ‘better data’ better than ‘better data miners’?: On the benefits of tuning SMOTE for defect prediction,” in 2018 IEEE/ACM 40th Int. Conf. Softw. Eng., May 2018, pp. 1050–1061. https://doi.org/10.1145/3180155.3180197
[18] C. Tantithamthavorn and A. E. Hassan, “The impact of class rebalancing techniques on the performance and interpretation of defect prediction models,” IEEE Trans. Softw. Eng., 2018. [Online]. Available: https://chakkrit.com/assets/papers/tantithamthavorn2018imbalance.pdf
[19] S. S. Rathore and S. Kumar, “Linear and non-linear heterogeneous ensemble methods to predict the number of faults in software systems,” Knowledge-Based Syst., vol. 119, pp. 232–256, Mar. 2017. https://doi.org/10.1016/j.knosys.2016.12.017
[20] S. S. Rathore and S. Kumar, “An approach for the prediction of number of software faults based on the dynamic selection of learning techniques,” IEEE Trans. Reliab., vol. 68, no. 1, pp. 216–236, Aug. 2018. https://doi.org/10.1109/TR.2018.2864206
[21] T. G. Dietterich, “Ensemble methods in machine learning,” in Multiple Classifier Systems, MCS 2000. Lecture Notes in Computer Science, vol. 1857. Springer, Berlin, Heidelberg, Jan. 2000, pp. 1–15. https://doi.org/10.1007/3-540-45014-9_1
[22] Q. Song, Z. Jia, M. Shepperd, S. Ying, and J. Liu, “A general software defect-proneness prediction framework,” IEEE Trans. Softw. Eng., vol. 37, no. 3, pp. 356–370, 2011. https://doi.org/10.1109/TSE.2010.90
[23] X. Rong, F. Li, and Z. Cui, “A model for software defect prediction using support vector machine based on CBA,” Int. J. Intell. Syst. Technol. Appl., vol. 15, no. 1, pp. 19–34, Apr. 2016. https://doi.org/10.1504/IJISTA.2016.076102
[24] K. Dejaeger, T. Verbraken, and B. Baesens, “Toward comprehensible software fault prediction models using Bayesian network classifiers,” IEEE Trans. Softw. Eng., vol. 39, no. 2, pp. 237–257, Apr. 2013. https://doi.org/10.1109/TSE.2012.20
[25] T. Zimmermann, N. Nagappan, H. Gall, E. Giger, and B. Murphy, “Cross-project defect prediction: A large scale experiment on data vs. domain vs. process,” in Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, Aug. 2009, pp. 91–100. https://doi.org/10.1145/1595696.1595713
[26] B. Turhan, T. Menzies, A. B. Bener, and J. Di Stefano, “On the relative value of cross-company and within-company data for defect prediction,” Empir. Softw. Eng., vol. 14, pp. 540–578, Jan. 2009. https://doi.org/10.1007/s10664-008-9103-7
[27] B. Turhan, A. T. Misirli, and A. Bener, “Empirical evaluation of the effects of mixed project data on learning defect predictors,” Inf. Softw. Technol., vol. 55, no. 6, pp. 1101–1118, Jun. 2013. https://doi.org/10.1016/j.infsof.2012.10.003
[28] P. He, Y. Ma, and B. Li, “TDSelector: A training data selection method for cross-project defect prediction,” 2016. [Online]. Available: https://arxiv.org/ftp/arxiv/papers/1612/1612.09065.pdf
[29] S. Herbold, A. Trautsch, and J. Grabowski, “A comparative study to benchmark cross-project defect prediction approaches,” IEEE Trans. Softw. Eng., vol. 44, no. 9, pp. 811–833, Sep. 2018. https://doi.org/10.1109/TSE.2017.2724538
[30] J. Nam, W. Fu, S. Kim, T. Menzies, and L. Tan, “Heterogeneous defect prediction,” IEEE Trans. Softw. Eng., vol. 44, no. 9, pp. 874–896, Sep. 2018. https://doi.org/10.1109/TSE.2017.2720603
[31] P. He, B. Li, X. Liu, J. Chen, and Y. Ma, “An empirical study on software defect prediction with a simplified metric set,” Inf. Softw. Technol., vol. 59, pp. 170–190, Mar. 2015. https://doi.org/10.1016/j.infsof.2014.11.006
[32] Y. Zhang, D. Lo, X. Xia, and J. Sun, “An empirical study of classifier combination for cross-project defect prediction,” in 2015 IEEE 39th Annual Computer Software and Applications Conference, vol. 2, Taichung, Taiwan, Jul. 2015, pp. 264–269. https://doi.org/10.1109/COMPSAC.2015.58
[33] C. Ni, W. S. Liu, X. Chen, Q. Gu, D. X. Chen, and Q. G. Huang, “A cluster based feature selection method for cross-project software defect prediction,” J. Comput. Sci. Technol., vol. 32, no. 6, pp. 1090–1107, Dec. 2017. https://doi.org/10.1007/s11390-017-1785-0
[34] B. Turhan, “On the dataset shift problem in software engineering prediction models,” Empir. Softw. Eng., vol. 17, no. 1–2, pp. 62–74, Oct. 2012. https://doi.org/10.1007/s10664-011-9182-8
[35] Q. Song, Y. Guo, and M. Shepperd, “A comprehensive investigation of the role of imbalanced learning for software defect prediction,” IEEE Trans. Softw. Eng., vol. 45, no. 12, Dec. 2018. https://doi.org/10.1109/TSE.2018.2836442
[36] T. Fukushima, Y. Kamei, S. McIntosh, K. Yamashita, and N. Ubayashi, “An empirical study of just-in-time defect prediction using cross-project models,” in Proceedings of the 11th Working Conference on Mining Software Repositories – MSR 2014, May 2014, pp. 172–181. https://doi.org/10.1145/2597073.2597075
[37] X.-Y. Jing, F. Wu, X. Dong, and B. Xu, “An improved SDA based defect prediction framework for both within-project and cross-project class-imbalance problems,” IEEE Trans. Softw. Eng., vol. 43, no. 4, pp. 321–339, Apr. 2017. https://doi.org/10.1109/TSE.2016.2597849
[38] A. K. Shukla, P. Singh, and M. Vardhan, “A hybrid gene selection method for microarray recognition,” Biocybern. Biomed. Eng., vol. 38, no. 4, pp. 975–991, 2018. https://doi.org/10.1016/j.bbe.2018.08.004
[39] C. Ding and H. Peng, “Minimum redundancy feature selection from microarray gene expression data,” in Proceedings of the 2003 IEEE Bioinformatics Conference (CSB 2003), Stanford, CA, USA, Aug. 2003, pp. 523–528. https://doi.org/10.1109/CSB.2003.1227396
[40] N. Settouti, M. E. A. Bechar, and M. A. Chikh, “Statistical comparisons of the top 10 algorithms in data mining for classification task,” Int. J. Interact. Multimed. Artif. Intell., vol. 4, no. 1, pp. 46–51, Sep. 2016. https://doi.org/10.9781/ijimai.2016.419
[41] R. Malhotra, “A systematic review of machine learning techniques for software fault prediction,” Appl. Soft Comput. J., vol. 27, pp. 504–518, Feb. 2015. https://doi.org/10.1016/j.asoc.2014.11.023
[42] Z. Zhang and X. Xie, “Research on AdaBoost.M1 with Random Forest,” in ICCET 2010 – 2010 Int. Conf. Comput. Eng. Technol. Proc., vol. 1, Chengdu, Apr. 2010, pp. V1-647–V1-652. https://doi.org/10.1109/ICCET.2010.5485910
[43] C. Tantithamthavorn, S. McIntosh, A. E. Hassan, and K. Matsumoto, “The impact of automated parameter optimization on defect prediction models,” IEEE Trans. Softw. Eng., vol. 45, no. 7, pp. 683–711, Jul. 2019. https://doi.org/10.1109/TSE.2018.2794977
[44] M. Jureczko and L. Madeyski, “Towards identifying software project clusters with regard to defect prediction,” in Proc. 6th Int. Conf. Predict. Model. Softw. Eng. – PROMISE ’10, Art. no. 9, Sep. 2010, pp. 1–10. https://doi.org/10.1145/1868328.1868342
[45] G. Boetticher, T. Menzies, and T. Ostrand, “Tera-PROMISE: Welcome to one of the largest repositories of SE research data.” [Online]. Available: http://openscience.us/repo/. Accessed on: Nov. 30, 2017.
[46] Z. He, F. Peters, T. Menzies, and Y. Yang, “Learning from open-source projects: An empirical study on defect prediction,” in Int. Symp. Empir. Softw. Eng. Meas., Baltimore, MD, USA, Oct. 2013, pp. 45–54. https://doi.org/10.1109/ESEM.2013.20
[47] Y. Liu, T. M. Khoshgoftaar, and N. Seliya, “Evolutionary optimization of software quality modeling with multiple repositories,” IEEE Trans. Softw. Eng., vol. 36, no. 6, pp. 852–864, May 2010. https://doi.org/10.1109/TSE.2010.51
[48] G. Canfora, A. De Lucia, M. Di Penta, R. Oliveto, A. Panichella, and S. Panichella, “Multi-objective cross-project defect prediction,” in Proceedings – IEEE 6th International Conference on Software Testing, Verification and Validation, ICST 2013, Luxembourg, Mar. 2013, pp. 252–261. https://doi.org/10.1109/ICST.2013.38
[49] A. Panichella, R. Oliveto, and A. De Lucia, “Cross-project defect prediction models: L’Union fait la force,” in 2014 Software Evolution Week – IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE), Antwerpen, Belgium, Feb. 2014, pp. 164–173. https://doi.org/10.1109/CSMR-WCRE.2014.6747166
[50] J. Demšar, “Statistical comparisons of classifiers over multiple data sets,” J. Mach. Learn. Res., vol. 7, pp. 1–30, 2006.
[51] E. G. Jelihovschi, J. C. Faria, and I. B. Allaman, “The ScottKnott clustering algorithm,” Univ. Estadual St. Cruz – UESC, Ilheus, Bahia, Bras., 2014.
[52] Q. Yu, J. Qian, S. Jiang, Z. Wu, and G. Zhang, “An empirical study on the effectiveness of feature selection for cross-project defect prediction,” IEEE Access, vol. 7, pp. 35710–35718, Jan. 2019. https://doi.org/10.1109/ACCESS.2019.2895614
[53] M. Friedman, “The use of ranks to avoid the assumption of normality implicit in the analysis of variance,” J. Am. Stat. Assoc., vol. 32, no. 200, pp. 675–701, 1937. https://doi.org/10.1080/01621459.1937.10503522
DOI: https://doi.org/10.2478/acss-2022-0015 | Journal eISSN: 2255-8691 | Journal ISSN: 2255-8683
Language: English
Page range: 137–148
Published on: Jan 24, 2023

© 2023 Meetesh Nevendra, Pradeep Singh, published by Riga Technical University
This work is licensed under the Creative Commons Attribution 4.0 License.