Analysing Software Quality of AI-Translated Code: A Comparative Study of Large Language Models Using Static Analysis

Open Access | Oct 2025

Abstract

Context: Source code translation enables cross-platform compatibility, code reusability, legacy system migration, and developer collaboration. Numerous state-of-the-art techniques have emerged to address the demand for efficient and accurate translation methodologies.

Objective: This study compares the code translation capabilities of two Large Language Models (LLMs), DeepSeek R1 and ChatGPT 4.1, evaluating their proficiency in translating code between programming languages. We systematically assess model outputs through quantitative and qualitative measures, focusing on translation accuracy, execution efficiency, and coding standard conformity. By examining each model’s strengths and limitations, this work provides insights into their applicability for various translation scenarios and contributes to the discourse on the potential of LLMs in software engineering.

Method: We evaluated the quality of translations produced by ChatGPT 4.1 and DeepSeek R1 using the SonarQube Analyzer, identifying strengths and weaknesses through software metrics covering translation accuracy, code quality, and clean code attributes. SonarQube’s framework enables objective quantification of maintainability, reliability, technical debt, and code smells, which are critical factors in software quality measurement. The protocol involved randomly sampling 500 code instances from a set of 1695 Java programming problems. Both models translated the Java samples to Python, and the outputs were then analysed quantitatively using SonarQube metrics to evaluate adherence to software engineering best practices.
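
The sampling step of the protocol can be illustrated with a minimal Python sketch. Only the pool size (1695) and the sample size (500) come from the abstract; the function name `sample_problem_ids`, the integer indexing of problems, and the fixed seed are assumptions made here for illustration, and the abstract does not state how the authors actually drew their sample.

```python
import random

TOTAL_PROBLEMS = 1695  # size of the Java problem pool (from the abstract)
SAMPLE_SIZE = 500      # number of code instances sampled (from the abstract)


def sample_problem_ids(total=TOTAL_PROBLEMS, k=SAMPLE_SIZE, seed=0):
    """Return k distinct problem indices drawn uniformly without replacement.

    The seed is a hypothetical choice that only makes the draw repeatable.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(total), k))


# Indices of the problems that would be passed on to both models for translation.
selected = sample_problem_ids()
```

Drawing without replacement guarantees that no problem is translated twice, and fixing the seed means both models receive exactly the same 500 inputs.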

Results: This comparative analysis reveals the capabilities and limitations of state-of-the-art LLM-based translation systems, giving developers, researchers, and practitioners actionable guidance for model selection. The gaps identified highlight future research directions in automated code translation. The results demonstrate that DeepSeek R1 consistently generates code of higher software quality than ChatGPT 4.1 across SonarQube metrics.

DOI: https://doi.org/10.2478/acss-2025-0013 | Journal eISSN: 2255-8691 | Journal ISSN: 2255-8683
Language: English
Page range: 105 - 121
Submitted on: Jul 19, 2025
Accepted on: Oct 3, 2025
Published on: Oct 23, 2025
Published by: Riga Technical University
In partnership with: Paradigm Publishing Services
Publication frequency: 1 time per year

© 2025 Vikram Bhutani, Farshad Ghassemi Toosi, Jim Buckley, published by Riga Technical University
This work is licensed under the Creative Commons Attribution 4.0 License.