Abstract
Software defects remain a leading cause of critical system failures and financial losses, and traditional debugging methods scale poorly as software complexity grows. Deep learning and large language models (LLMs) have recently shown promise for automating the detection and repair of software faults. By learning syntactic and semantic patterns from code, these systems outperform traditional methods at localizing defects and generating patches. This study presents a systematic review of benchmark datasets, detection algorithms, and repair frameworks published between 2018 and 2025. It compares graph-based and token-based models with transformer architectures and LLM-driven methodologies, examining their strengths, limitations, and real-world applications. The paper further discusses unresolved challenges in explainability, accuracy guarantees, and cross-project generalization, as well as scalability, patch validation, and evaluation metrics. By integrating recent breakthroughs into a current summary of automated debugging research based on deep learning and LLM techniques, it identifies research gaps and outlines prospective directions for building more reliable and robust software systems.
