Analysing Software Quality of AI-Translated Code: A Comparative Study of Large Language Models Using Static Analysis

Bhutani, Vikram; Toosi, Farshad Ghassemi; Buckley, Jim


Applied Computer Systems

Volume 30 (2025): Issue 1 (January 2025)

By: Vikram Bhutani, Farshad Ghassemi Toosi and Jim Buckley

Open Access

Oct 2025

Abstract

Context: Source code translation enables cross-platform compatibility, code reusability, legacy system migration, and developer collaboration. Numerous state-of-the-art techniques have emerged to meet the demand for efficient and accurate translation methodologies.

Objective: This study compares the code translation capabilities of two Large Language Models (LLMs), DeepSeek R1 and ChatGPT 4.1, evaluating their proficiency in translating code between programming languages. We systematically assess model outputs through quantitative and qualitative measures, focusing on translation accuracy, execution efficiency, and conformity to coding standards. By examining each model’s strengths and limitations, this work provides insights into their applicability to various translation scenarios and contributes to the discourse on the potential of LLMs in software engineering.

Method: We evaluated the translation quality of ChatGPT 4.1 and DeepSeek R1 using SonarQube Analyzer, identifying their strengths and weaknesses through software metrics covering translation accuracy, code quality, and clean code attributes. SonarQube’s framework enables objective quantification of maintainability, reliability, technical debt, and code smells, which are critical factors in measuring software quality. The protocol involved randomly sampling 500 code instances from a pool of 1,695 Java programming problems. Both models translated these Java samples into Python, and the resulting code was analysed quantitatively with SonarQube metrics to evaluate its adherence to software engineering best practices.
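To make two steps of this protocol concrete, the following is a minimal sketch rather than the authors' implementation: it draws a reproducible random sample of 500 problems out of 1,695, and shows how SonarQube, under its default settings, turns a technical debt estimate into a maintainability rating. It assumes SonarQube's default development cost of 30 minutes per line of code and its default A to E rating grid (debt ratio thresholds 0.05, 0.10, 0.20, 0.50). The function names, the seed, the problem IDs, and the remediation figures are hypothetical; the abstract does not state how sampling was seeded or how metrics were aggregated per model.

```python
import random

# Assumed SonarQube defaults: development cost per line of code (minutes)
# and the maintainability rating grid (upper bounds on the debt ratio).
DEV_COST_PER_LINE_MIN = 30
RATING_GRID = [(0.05, "A"), (0.10, "B"), (0.20, "C"), (0.50, "D")]


def sample_problems(problem_ids, k=500, seed=42):
    """Draw k distinct problems uniformly at random, without replacement.

    The fixed seed (a hypothetical choice) makes the sample reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(problem_ids, k)


def debt_ratio(remediation_minutes, lines_of_code):
    """Technical debt ratio = remediation cost / development cost."""
    development_cost = lines_of_code * DEV_COST_PER_LINE_MIN
    return remediation_minutes / development_cost


def maintainability_rating(ratio):
    """Map a debt ratio onto the default A-E maintainability grid."""
    for upper_bound, rating in RATING_GRID:
        if ratio <= upper_bound:
            return rating
    return "E"  # ratio above 0.50


if __name__ == "__main__":
    # Hypothetical IDs standing in for the 1,695 Java problems.
    problems = [f"problem_{i:04d}" for i in range(1695)]
    sample = sample_problems(problems)
    print(len(sample), len(set(sample)))  # 500 500

    # Hypothetical translated file: 300 minutes of remediation, 120 lines.
    r = debt_ratio(300, 120)
    print(f"{r:.4f}", maintainability_rating(r))  # 0.0833 B
```

Sampling without replacement guarantees that no problem is translated twice, and fixing the seed lets another team re-draw exactly the same 500 problems; the debt ratio normalises remediation effort by code size, so files of different lengths produced by the two models can be compared on one scale.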

Results: This comparative analysis reveals the capabilities and limitations of state-of-the-art LLM-based translation systems, providing developers, researchers, and practitioners with actionable guidance for model selection. The identified gaps highlight directions for future research in automated code translation. The results demonstrate that DeepSeek R1 consistently produces code of higher software quality than ChatGPT 4.1 across SonarQube metrics.

DOI: https://doi.org/10.2478/acss-2025-0013 | Journal eISSN: 2255-8691 | Journal ISSN: 2255-8683


Language: English

Page range: 105 - 121

Submitted on: Jul 19, 2025

Accepted on: Oct 3, 2025

Published on: Oct 23, 2025

Published by: Riga Technical University

In partnership with: Paradigm Publishing Services

Publication frequency: once per year

Keywords: Artificial intelligence, C++, Java, NMT, programming languages translation, Python, source code translation, SMT

Related subjects: Computer sciences, Artificial intelligence, Information technology, Project management, Software development

© 2025 Vikram Bhutani, Farshad Ghassemi Toosi, Jim Buckley, published by Riga Technical University
This work is licensed under the Creative Commons Attribution 4.0 License.
