
Cross-Project Defect Prediction with Metrics Selection and Balancing Approach

Open Access | Jan 2023

References

[1] T. Menzies, B. Turhan, A. Bener, G. Gay, B. Cukic, and Y. Jiang, “Implications of ceiling effects in defect predictors,” in Proceedings of the 4th International Workshop on Predictor Models in Software Engineering – PROMISE ’08, 2008, pp. 47–54. https://doi.org/10.1145/1370788.1370801
[2] M. Nevendra and P. Singh, “Software defect prediction using deep learning,” Acta Polytech. Hungarica, vol. 18, no. 10, pp. 173–189, 2021. https://doi.org/10.12700/aph.18.10.2021.10.9
[3] M. Shepperd, D. Bowes, and T. Hall, “Researcher bias: The use of machine learning in software defect prediction,” IEEE Trans. Softw. Eng., vol. 40, no. 6, pp. 603–616, Jun. 2014. https://doi.org/10.1109/TSE.2014.2322358
[4] T. Wang, W. Li, H. Shi, and Z. Liu, “Software defect prediction based on classifiers ensemble,” J. Inf. Comput. Sci., vol. 8, no. 16, pp. 4241–4254, 2011.
[5] J. Nam, S. J. Pan, and S. Kim, “Transfer defect learning,” in 2013 35th International Conference on Software Engineering (ICSE), San Francisco, CA, USA, May 2013, pp. 382–391. https://doi.org/10.1109/ICSE.2013.6606584
[6] X. Xia, D. Lo, S. J. Pan, N. Nagappan, and X. Wang, “HYDRA: Massively compositional model for cross-project defect prediction,” IEEE Trans. Softw. Eng., vol. 42, no. 10, pp. 977–998, Oct. 2016. https://doi.org/10.1109/TSE.2016.2543218
[7] Z. He, F. Shu, Y. Yang, M. Li, and Q. Wang, “An investigation on the feasibility of cross-project defect prediction,” Autom. Softw. Eng., vol. 19, no. 2, pp. 167–199, Jul. 2012. https://doi.org/10.1007/s10515-011-0090-3
[8] H. Chen, X. Jing, Z. Li, D. Wu, Y. Peng, and Z. Huang, “An empirical study on heterogeneous defect prediction approaches,” IEEE Trans. Softw. Eng., vol. 47, no. 12, pp. 2803–2822, Jan. 2020. https://doi.org/10.1109/TSE.2020.2968520
[9] H. Turabieh, M. Mafarja, and X. Li, “Iterated feature selection algorithms with layered recurrent neural network for software fault prediction,” Expert Syst. Appl., vol. 122, pp. 27–42, May 2019. https://doi.org/10.1016/j.eswa.2018.12.033
[10] T. Hall, S. Beecham, D. Bowes, D. Gray, and S. Counsell, “A systematic literature review on fault prediction performance in software engineering,” IEEE Trans. Softw. Eng., vol. 38, no. 6, pp. 1276–1304, Oct. 2012. https://doi.org/10.1109/TSE.2011.103
[11] S. Hosseini, B. Turhan, and D. Gunarathna, “A systematic literature review and meta-analysis on cross project defect prediction,” IEEE Trans. Softw. Eng., vol. 45, no. 2, pp. 111–147, Nov. 2019. https://doi.org/10.1109/TSE.2017.2770124
[12] F. J. Provost, “Machine learning from imbalanced data sets 101,” in Proc. AAAI’2000 Workshop on Imbalanced Data Sets, 2000, pp. 1–3.
[13] D. Ryu, J. I. Jang, and J. Baik, “A transfer cost-sensitive boosting approach for cross-project defect prediction,” Softw. Qual. J., vol. 25, no. 1, pp. 235–272, Sep. 2017. https://doi.org/10.1007/s11219-015-9287-1
[14] S. Wang and X. Yao, “Using class imbalance learning for software defect prediction,” IEEE Trans. Reliab., vol. 62, no. 2, pp. 434–443, Apr. 2013. https://doi.org/10.1109/TR.2013.2259203
[15] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” J. Artif. Intell. Res., vol. 16, pp. 321–357, Jun. 2002. https://doi.org/10.1613/jair.953
[16] H. He, Y. Bai, E. A. Garcia, and S. Li, “ADASYN: Adaptive synthetic sampling approach for imbalanced learning,” in Proc. Int. Jt. Conf. Neural Networks, Hong Kong, Jun. 2008, pp. 1322–1328. https://doi.org/10.1109/IJCNN.2008.4633969
[17] A. Agrawal and T. Menzies, “Is ‘better data’ better than ‘better data miners’?: On the benefits of tuning SMOTE for defect prediction,” in 2018 IEEE/ACM 40th Int. Conf. Softw. Eng., May 2018, pp. 1050–1061. https://doi.org/10.1145/3180155.3180197
[18] C. Tantithamthavorn and A. E. Hassan, “The impact of class rebalancing techniques on the performance and interpretation of defect prediction models,” IEEE Trans. Softw. Eng., 2018. [Online]. Available: https://chakkrit.com/assets/papers/tantithamthavorn2018imbalance.pdf
[19] S. S. Rathore and S. Kumar, “Linear and non-linear heterogeneous ensemble methods to predict the number of faults in software systems,” Knowledge-Based Syst., vol. 119, pp. 232–256, Mar. 2017. https://doi.org/10.1016/j.knosys.2016.12.017
[20] S. S. Rathore and S. Kumar, “An approach for the prediction of number of software faults based on the dynamic selection of learning techniques,” IEEE Trans. Reliab., vol. 68, no. 1, pp. 216–236, Aug. 2018. https://doi.org/10.1109/TR.2018.2864206
[21] T. G. Dietterich, “Ensemble methods in machine learning,” in Multiple Classifier Systems, MCS 2000. Lecture Notes in Computer Science, vol. 1857. Springer, Berlin, Heidelberg, Jan. 2000, pp. 1–15. https://doi.org/10.1007/3-540-45014-9_1
[22] Q. Song, Z. Jia, M. Shepperd, S. Ying, and J. Liu, “A general software defect-proneness prediction framework,” IEEE Trans. Softw. Eng., vol. 37, no. 3, pp. 356–370, 2011. https://doi.org/10.1109/TSE.2010.90
[23] X. Rong, F. Li, and Z. Cui, “A model for software defect prediction using support vector machine based on CBA,” Int. J. Intell. Syst. Technol. Appl., vol. 15, no. 1, pp. 19–34, Apr. 2016. https://doi.org/10.1504/IJISTA.2016.076102
[24] K. Dejaeger, T. Verbraken, and B. Baesens, “Toward comprehensible software fault prediction models using Bayesian network classifiers,” IEEE Trans. Softw. Eng., vol. 39, no. 2, pp. 237–257, Apr. 2013. https://doi.org/10.1109/TSE.2012.20
[25] T. Zimmermann, N. Nagappan, H. Gall, E. Giger, and B. Murphy, “Cross-project defect prediction: A large scale experiment on data vs. domain vs. process,” in Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, Aug. 2009, pp. 91–100. https://doi.org/10.1145/1595696.1595713
[26] B. Turhan, T. Menzies, A. B. Bener, and J. Di Stefano, “On the relative value of cross-company and within-company data for defect prediction,” Empir. Softw. Eng., vol. 14, pp. 540–578, Jan. 2009. https://doi.org/10.1007/s10664-008-9103-7
[27] B. Turhan, A. T. Misirli, and A. Bener, “Empirical evaluation of the effects of mixed project data on learning defect predictors,” Inf. Softw. Technol., vol. 55, no. 6, pp. 1101–1118, Jun. 2013. https://doi.org/10.1016/j.infsof.2012.10.003
[28] P. He, Y. Ma, and B. Li, “TDSelector: A training data selection method for cross-project defect prediction,” 2016. [Online]. Available: https://arxiv.org/ftp/arxiv/papers/1612/1612.09065.pdf
[29] S. Herbold, A. Trautsch, and J. Grabowski, “A comparative study to benchmark cross-project defect prediction approaches,” IEEE Trans. Softw. Eng., vol. 44, no. 9, pp. 811–833, Sep. 2018. https://doi.org/10.1109/TSE.2017.2724538
[30] J. Nam, W. Fu, S. Kim, T. Menzies, and L. Tan, “Heterogeneous defect prediction,” IEEE Trans. Softw. Eng., vol. 44, no. 9, pp. 874–896, Sep. 2018. https://doi.org/10.1109/TSE.2017.2720603
[31] P. He, B. Li, X. Liu, J. Chen, and Y. Ma, “An empirical study on software defect prediction with a simplified metric set,” Inf. Softw. Technol., vol. 59, pp. 170–190, Mar. 2015. https://doi.org/10.1016/j.infsof.2014.11.006
[32] Y. Zhang, D. Lo, X. Xia, and J. Sun, “An empirical study of classifier combination for cross-project defect prediction,” in 2015 IEEE 39th Annual Computer Software and Applications Conference, vol. 2, Taichung, Taiwan, Jul. 2015, pp. 264–269. https://doi.org/10.1109/COMPSAC.2015.58
[33] C. Ni, W. S. Liu, X. Chen, Q. Gu, D. X. Chen, and Q. G. Huang, “A cluster based feature selection method for cross-project software defect prediction,” J. Comput. Sci. Technol., vol. 32, no. 6, pp. 1090–1107, Dec. 2017. https://doi.org/10.1007/s11390-017-1785-0
[34] B. Turhan, “On the dataset shift problem in software engineering prediction models,” Empir. Softw. Eng., vol. 17, no. 1–2, pp. 62–74, Oct. 2012. https://doi.org/10.1007/s10664-011-9182-8
[35] Q. Song, Y. Guo, and M. Shepperd, “A comprehensive investigation of the role of imbalanced learning for software defect prediction,” IEEE Trans. Softw. Eng., vol. 45, no. 12, Dec. 2018. https://doi.org/10.1109/TSE.2018.2836442
[36] T. Fukushima, Y. Kamei, S. McIntosh, K. Yamashita, and N. Ubayashi, “An empirical study of just-in-time defect prediction using cross-project models,” in Proceedings of the 11th Working Conference on Mining Software Repositories – MSR 2014, May 2014, pp. 172–181. https://doi.org/10.1145/2597073.2597075
[37] X.-Y. Jing, F. Wu, X. Dong, and B. Xu, “An improved SDA based defect prediction framework for both within-project and cross-project class-imbalance problems,” IEEE Trans. Softw. Eng., vol. 43, no. 4, pp. 321–339, Apr. 2017. https://doi.org/10.1109/TSE.2016.2597849
[38] A. K. Shukla, P. Singh, and M. Vardhan, “A hybrid gene selection method for microarray recognition,” Biocybern. Biomed. Eng., vol. 38, no. 4, pp. 975–991, 2018. https://doi.org/10.1016/j.bbe.2018.08.004
[39] C. Ding and H. Peng, “Minimum redundancy feature selection from microarray gene expression data,” in Proceedings of the 2003 IEEE Bioinformatics Conference (CSB 2003), Stanford, CA, USA, Aug. 2003, pp. 523–528. https://doi.org/10.1109/CSB.2003.1227396
[40] N. Settouti, M. E. A. Bechar, and M. A. Chikh, “Statistical comparisons of the top 10 algorithms in data mining for classification task,” Int. J. Interact. Multimed. Artif. Intell., vol. 4, no. 1, pp. 46–51, Sep. 2016. https://doi.org/10.9781/ijimai.2016.419
[41] R. Malhotra, “A systematic review of machine learning techniques for software fault prediction,” Appl. Soft Comput. J., vol. 27, pp. 504–518, Feb. 2015. https://doi.org/10.1016/j.asoc.2014.11.023
[42] Z. Zhang and X. Xie, “Research on AdaBoost.M1 with Random Forest,” in ICCET 2010 – 2010 Int. Conf. Comput. Eng. Technol. Proc., vol. 1, Chengdu, Apr. 2010, pp. V1-647–V1-652. https://doi.org/10.1109/ICCET.2010.5485910
[43] C. Tantithamthavorn, S. McIntosh, A. E. Hassan, and K. Matsumoto, “The impact of automated parameter optimization on defect prediction models,” IEEE Trans. Softw. Eng., vol. 45, no. 7, pp. 683–711, Jul. 2019. https://doi.org/10.1109/TSE.2018.2794977
[44] M. Jureczko and L. Madeyski, “Towards identifying software project clusters with regard to defect prediction,” in Proc. 6th Int. Conf. Predict. Model. Softw. Eng. – PROMISE ’10, Art. no. 9, Sep. 2010, pp. 1–10. https://doi.org/10.1145/1868328.1868342
[45] G. Boetticher, T. Menzies, and T. Ostrand, “Tera-PROMISE: Welcome to one of the largest repositories of SE research data.” [Online]. Available: http://openscience.us/repo/. Accessed on: Nov. 30, 2017.
[46] Z. He, F. Peters, T. Menzies, and Y. Yang, “Learning from open-source projects: An empirical study on defect prediction,” in Int. Symp. Empir. Softw. Eng. Meas., Baltimore, MD, USA, Oct. 2013, pp. 45–54. https://doi.org/10.1109/ESEM.2013.20
[47] Y. Liu, T. M. Khoshgoftaar, and N. Seliya, “Evolutionary optimization of software quality modeling with multiple repositories,” IEEE Trans. Softw. Eng., vol. 36, no. 6, pp. 852–864, May 2010. https://doi.org/10.1109/TSE.2010.51
[48] G. Canfora, A. De Lucia, M. Di Penta, R. Oliveto, A. Panichella, and S. Panichella, “Multi-objective cross-project defect prediction,” in Proceedings – IEEE 6th International Conference on Software Testing, Verification and Validation, ICST 2013, Luxembourg, Mar. 2013, pp. 252–261. https://doi.org/10.1109/ICST.2013.38
[49] A. Panichella, R. Oliveto, and A. De Lucia, “Cross-project defect prediction models: L’Union fait la force,” in 2014 Software Evolution Week – IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE), Antwerpen, Belgium, Feb. 2014, pp. 164–173. https://doi.org/10.1109/CSMR-WCRE.2014.6747166
[50] J. Demšar, “Statistical comparisons of classifiers over multiple data sets,” J. Mach. Learn. Res., vol. 7, pp. 1–30, 2006.
[51] E. G. Jelihovschi, J. C. Faria, and I. B. Allaman, “The ScottKnott clustering algorithm,” Univ. Estadual St. Cruz – UESC, Ilheus, Bahia, Bras., 2014.
[52] Q. Yu, J. Qian, S. Jiang, Z. Wu, and G. Zhang, “An empirical study on the effectiveness of feature selection for cross-project defect prediction,” IEEE Access, vol. 7, pp. 35710–35718, Jan. 2019. https://doi.org/10.1109/ACCESS.2019.2895614
[53] M. Friedman, “The use of ranks to avoid the assumption of normality implicit in the analysis of variance,” J. Am. Stat. Assoc., vol. 32, no. 200, pp. 675–701, 1937. https://doi.org/10.1080/01621459.1937.10503522
DOI: https://doi.org/10.2478/acss-2022-0015 | Journal eISSN: 2255-8691 | Journal ISSN: 2255-8683
Language: English
Page range: 137–148
Published on: Jan 24, 2023

© 2023 Meetesh Nevendra, Pradeep Singh, published by Riga Technical University
This work is licensed under the Creative Commons Attribution 4.0 License.