Automatic detection of technical debt in large-scale Java codebases: a multi-model deep learning methodology for enhanced software quality

Bagane, Pooja; Sengar, Chahak; Dongre, Sumedh; Prabhakar, Siddharth; Jebessa, Obsa Amenu

doi:10.2478/ijssis-2025-0012

Abstract

Managing technical debt (TD) is crucial for sustaining code quality in long-term software projects. We propose a deep learning-based approach to automatically detect and analyze self-admitted TD in large-scale Java codebases. Using a dataset of over 55 million Java source files, we designed several machine learning models, including random forest, gradient boosting, long short-term memory, and gated recurrent unit, to predict the presence and severity of TD. The proposed approach automates the identification of risky components, so that TD can be managed proactively, reducing its cost and improving overall project outcomes. Our results confirm that these models substantially improve TD detection accuracy, contributing to software engineering practice.
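To make the notion of self-admitted TD concrete, the sketch below pulls comments out of a Java source string and flags those containing common debt markers. It is only an illustrative keyword baseline, not the learned models described in the abstract; the marker list, the function names `extract_comments` and `find_satd`, and the sample class are all assumptions introduced here.

```python
import re
from typing import List

# Hypothetical marker words; the abstract does not specify the features
# or labels used by the actual models.
SATD_MARKERS = {"todo", "fixme", "hack", "xxx", "workaround", "temporary"}

# Matches // line comments and /* ... */ block comments.
# Simplification: comment-like text inside string literals is not excluded.
COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def extract_comments(java_source: str) -> List[str]:
    """Return the raw comments found in a Java source string."""
    return COMMENT_RE.findall(java_source)


def find_satd(java_source: str) -> List[str]:
    """Return comments that contain at least one marker word."""
    flagged = []
    for comment in extract_comments(java_source):
        words = set(re.findall(r"[a-z]+", comment.lower()))
        if words & SATD_MARKERS:
            flagged.append(comment.strip())
    return flagged


# Hypothetical Java snippet used only for demonstration.
SAMPLE = """
public class Cache {
    // TODO: replace this hack with a proper eviction policy
    private int size = 0;

    /* Returns the current size. */
    public int size() { return size; }

    // FIXME not thread-safe
    public void put(String k) { size++; }
}
"""

if __name__ == "__main__":
    for comment in find_satd(SAMPLE):
        print(comment)
```

A keyword filter like this can serve as a weak-labeling step or a baseline against which trained classifiers are compared; it says nothing about severity, which the abstract treats as a separate prediction target.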

