Codeguard: Utilizing Advanced Pattern Recognition in Language Models for Software Vulnerability Analysis

By: Rebet Jones, Marwan Omar
Open Access | Feb 2024

Abstract

Effective identification of vulnerabilities in source code is essential to improving software quality and security. This paper presents a novel approach that combines pattern recognition training with cloze-style questioning in a semi-supervised learning framework. We train a language model on the SARD and Devign datasets, which contain numerous examples of vulnerable code. During training, selected sections of code are deliberately masked, and the model must predict the hidden tokens. Our empirical evaluation demonstrates that the approach identifies code vulnerabilities accurately. The results indicate that combining pattern recognition training with cloze-style questioning improves accuracy in detecting vulnerabilities in source code.
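The abstract describes hiding parts of the code during training and asking the model to recover them. The following is a minimal Python sketch of how such cloze examples could be built. It is not the authors' implementation: the regex lexer, the `<mask>` token, the 15% default masking rate, the question template, and the `VERBALIZER` mapping are all illustrative assumptions.

```python
import random
import re

MASK = "<mask>"  # hypothetical mask token; real model tokenizers define their own

# Coarse lexer (assumption): identifiers/keywords, integer literals, single symbols.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code: str) -> list[str]:
    """Split C-like source into coarse tokens for illustration only."""
    return TOKEN_RE.findall(code)


def make_cloze_example(code: str, mask_rate: float = 0.15, seed: int = 0):
    """Hide a random subset of tokens and record the tokens the model must predict."""
    tokens = tokenize(code)
    if not tokens:
        return [], {}
    rng = random.Random(seed)  # fixed seed keeps the masking reproducible
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_rate)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    targets = {i: tokens[i] for i in positions}  # position -> hidden token
    masked = [MASK if i in targets else tok for i, tok in enumerate(tokens)]
    return masked, targets


def vulnerability_cloze(code: str) -> str:
    """Wrap a snippet in a hypothetical cloze question used for classification."""
    return f"{code}\nQuestion: Is this code vulnerable? Answer: {MASK}"


# Hypothetical verbalizer: maps the word predicted at the mask to a class label.
VERBALIZER = {"yes": "vulnerable", "no": "safe"}


if __name__ == "__main__":
    snippet = "char buf[8]; strcpy(buf, input);"
    masked, targets = make_cloze_example(snippet, mask_rate=0.2, seed=42)
    print(" ".join(masked))
    print(targets)
    print(vulnerability_cloze(snippet))
```

In practice, a pipeline like this would mask subword tokens from the language model's own tokenizer rather than a regex lexer. The verbalizer shows one way a cloze question could turn the model's word prediction into a vulnerable/safe label.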

DOI: https://doi.org/10.2478/raft-2024-0011 | Journal eISSN: 3100-5071 | Journal ISSN: 3100-5063
Language: English
Page range: 108 - 118
Published on: Feb 28, 2024
Published by: Nicolae Balcescu Land Forces Academy
In partnership with: Paradigm Publishing Services
Publication frequency: 4 times per year

© 2024 Rebet Jones, Marwan Omar, published by Nicolae Balcescu Land Forces Academy
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.