
An Empirical Study of Automated Machine Learning Python Libraries Using Source Code Analysis

Open Access
Jun 2026

Abstract

The growth of Automated Machine Learning (AutoML) has widened access to machine learning workflows by automating tasks and lowering the technical barrier to entry. However, the reliability and maintainability of these libraries depend on the quality of their underlying source code. This study presents a novel, systematic analysis of 16 Python AutoML libraries using SonarQube, an industry-standard source code analysis (SCA) platform, together with six Python analysis tools: Bandit, Coverage.py, Prospector, Pylint, Radon, and Ruff. The AutoML libraries are evaluated on software quality metrics that collectively reflect code complexity, maintainability, security, and adherence to Python coding standards.

Strong agreement was observed between SonarQube-based rankings and rankings derived from Python-based tools. Based on median SCA rankings, the libraries were ordered (highest to lowest estimated code quality) as follows: Hyperopt-sklearn, AutoKeras, GAMA, MLBox, FEDOT, TPOT, MLJAR, LightAutoML, Auto-sklearn, PyCaret, FLAML, Auto-PyTorch, Ludwig, EvalML, AutoTS, and AutoGluon.
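The ordering above is based on median rankings across SCA metrics. As an illustration of that aggregation step, the sketch below ranks libraries by their median per-metric rank. It assumes each metric has already been converted to a rank (1 = best) and that every library is ranked on every metric; the metric names, library names (`LibA`, `LibB`, `LibC`), and rank values are hypothetical and not taken from the study. It is a minimal sketch, not the authors' implementation.

```python
from statistics import median


def order_by_median_rank(rankings: dict[str, dict[str, int]]) -> list[tuple[str, float]]:
    """Order libraries by their median rank across metrics (1 = best).

    `rankings` maps a metric name to {library name: rank}. Every library
    must appear under every metric, otherwise a KeyError is raised.
    Ties on the median are broken alphabetically for a deterministic order.
    """
    libraries: set[str] = set()
    for per_metric in rankings.values():
        libraries.update(per_metric)

    # Median of each library's ranks across all metrics.
    medians = {
        lib: median(per_metric[lib] for per_metric in rankings.values())
        for lib in libraries
    }
    # Lowest median rank first (highest estimated code quality).
    return sorted(medians.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical per-metric ranks, for illustration only.
    example = {
        "violations": {"LibA": 1, "LibB": 3, "LibC": 2},
        "complexity": {"LibA": 2, "LibB": 1, "LibC": 3},
        "duplication": {"LibA": 1, "LibB": 2, "LibC": 3},
    }
    print(order_by_median_rank(example))
```

Using the median rather than the mean keeps a single outlying metric from dominating a library's overall position.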

An additional exploratory Spearman rank correlation analysis examined the relationship between SCA metrics and performance measures (forecasting error and execution time) from a prior electricity price prediction benchmark (n = 7). Several SCA metrics exhibit strong monotonic relationships with these measures. For example, SonarQube Violations and Code Smells correlate positively with mean absolute error (ρ = 0.86), while Class Cyclomatic Complexity (ρ = −0.89) and Duplicated Files (ρ = −0.86) correlate negatively with library execution time. Given the limited sample size, these non-parametric findings are descriptive rather than inferential. The results suggest that code quality scores may relate to lower-bound predictive performance and computational efficiency, warranting further validation.
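Spearman's ρ is the Pearson correlation computed on ranks, which is why it captures monotonic rather than strictly linear relationships. The stdlib sketch below shows that computation, assigning tied values the average of their positions. The seven input values are hypothetical and are not the benchmark data; in practice `scipy.stats.spearmanr` computes the same statistic and also reports a p-value, which this sketch omits.

```python
from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors.

    Raises ValueError for mismatched or too-short inputs; a constant input
    has zero rank variance and raises ZeroDivisionError.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical metric and error values for seven libraries.
    metric = [10, 20, 30, 40, 50, 60, 70]
    error = [1.0, 2.0, 3.0, 4.0, 5.0, 7.0, 6.0]
    print(round(spearman_rho(metric, error), 4))  # 27/28 ≈ 0.9643
```

With n = 7 and no ties, a single swapped pair already moves ρ from 1 to about 0.96, which illustrates why correlations from so few libraries are best read descriptively.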

DOI: https://doi.org/10.2478/acss-2026-0009 | Journal eISSN: 2255-8691 | Journal ISSN: 2255-8683
Language: English
Page range: 95–115
Submitted on: Feb 19, 2026
Accepted on: May 14, 2026
Published on: Jun 2, 2026
In partnership with: Paradigm Publishing Services
Publication frequency: Volume open

© 2026 Christian O’Leary, Conor Lynch, Farshad Ghassemi Toosi, published by Riga Technical University
This work is licensed under the Creative Commons Attribution 4.0 License.