An Empirical Study of Automated Machine Learning Python Libraries Using Source Code Analysis

Open Access | Jun 2026

References

  1. C. McHugh, S. Coleman, and D. Kerr, “Hourly electricity price forecasting with NARMAX,” Machine Learning with Applications, vol. 9, Sep. 2022, Art. no. 100383. https://doi.org/10.1016/j.mlwa.2022.100383
  2. A. Bauer, M. Züfle, S. Eismann, J. Grohmann, N. Herbst, and S. Kounev, “Libra: A benchmark for time series forecasting methods,” in Proceedings of the ACM/SPEC International Conference on Performance Engineering, USA, Apr. 2021, pp. 189–200. https://doi.org/10.1145/3427921.3450241
  3. X. Zhang et al., “Robust log-based anomaly detection on unstable log data,” in Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Estonia, Aug. 2019, pp. 807–817. https://doi.org/10.1145/3338906.3338931
  4. A. Warzynski, L. Falas, and P. Schauer, “Excess-mass and mass-volume anomaly detection algorithms applicability in unsupervised intrusion detection systems,” in 30th International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises, Bayonne, France, Oct. 2021, pp. 131–136. https://doi.org/10.1109/WETICE53228.2021.00035
  5. H. Ismail Fawaz, G. Forestier, J. Weber, L. Idoumghar, and P.-A. Muller, “Deep learning for time series classification: a review,” Data Mining and Knowledge Discovery, vol. 33, pp. 917–963, Mar. 2019. https://doi.org/10.1007/s10618-019-00619-1
  6. N. Mohammadi Foumani, L. Miller, C. W. Tan, G. I. Webb, G. Forestier, and M. Salehi, “Deep learning for time series classification and extrinsic regression: A current survey,” ACM Comput. Surv., vol. 56, no. 9, Apr. 2024, Art. no. 217. https://doi.org/10.1145/3649448
  7. D. Nam, A. Macvean, V. Hellendoorn, B. Vasilescu, and B. Myers, “Using an LLM to help with code understanding,” in Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, USA, Apr. 2024, pp. 1–13. https://doi.org/10.1145/3597503.3639187
  8. L. Pepino, P. Riera, L. Ferrer, and A. Gravano, “Fusion approaches for emotion recognition from speech using acoustic and text-based features,” in ICASSP 2020, Barcelona, Spain, Apr. 2020, pp. 6484–6488. https://doi.org/10.1109/ICASSP40776.2020.9054709
  9. M. Schubert, T. Riedlinger, K. Kahl, D. Kröll, S. Schoenen, S. Šegvić, and M. Rottmann, “Identifying label errors in object detection datasets by loss inspection,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, USA, Jan. 2024, pp. 4570–4579. https://doi.org/10.1109/WACV57701.2024.00452
  10. K. Chachula, J. Lyskawa, B. Olber, P. Fratczak, A. Popowicz, and K. Radlak, “Combating noisy labels in object detection datasets,” arXiv:2211.13993, Dec. 2023. https://doi.org/10.48550/arXiv.2211.13993
  11. A. Thessen, “Adoption of machine learning techniques in ecology and Earth science,” One Ecosystem, vol. 1, Jun. 2016, Art. no. e8621. https://doi.org/10.3897/oneeco.1.e8621
  12. A. Alsharef, K. Aggarwal, Sonia, M. Kumar, and A. Mishra, “Review of ML and AutoML solutions to forecast time-series data,” Archives of Computational Methods in Engineering, vol. 29, pp. 5297–5311, Nov. 2022. https://doi.org/10.1007/s11831-022-09765-0
  13. Stack Exchange Inc, “Stack Overflow Developer Survey 2023,” 2023. [Online]. Available: https://survey.stackoverflow.co/2023/
  14. Stack Exchange Inc, “Stack Overflow Developer Survey 2024,” 2024. [Online]. Available: https://survey.stackoverflow.co/2024/
  15. H. A. M. Salih and Q. I. Sarhan, “A study of large language models in detecting Python code violations,” ARO – The Scientific Journal of Koya University, vol. 13, no. 2, pp. 215–225, Oct. 2025. https://doi.org/10.14500/aro.12395
  16. N. Erickson, J. Mueller, A. Shirkov, H. Zhang, P. Larroy, M. Li, and A. Smola, “AutoGluon-Tabular: Robust and accurate AutoML for structured data,” arXiv:2003.06505, Mar. 2020. https://doi.org/10.48550/arXiv.2003.06505
  17. H. Jin, F. Chollet, Q. Song, and X. Hu, “AutoKeras: An AutoML library for deep learning,” Journal of Machine Learning Research, vol. 24, no. 6, pp. 1–6, 2023. [Online]. Available: http://jmlr.org/papers/v24/20-1355.html
  18. C. Catlin, “winedarksea/AutoTS,” 2025, original-date: 2019-11-26. [Online]. Available: https://github.com/winedarksea/AutoTS
  19. L. Zimmer, M. Lindauer, and F. Hutter, “Auto-Pytorch: Multi-fidelity MetaLearning for efficient and robust AutoDL,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 9, pp. 3079–3090, Sep. 2021. https://doi.org/10.1109/TPAMI.2021.3067763
  20. M. Feurer, K. Eggensperger, S. Falkner, M. Lindauer, and F. Hutter, “Auto-Sklearn 2.0: Hands-free AutoML via Meta-Learning,” arXiv:2007.04074, Oct. 2022. https://doi.org/10.48550/arXiv.2007.04074
  21. Alteryx, “alteryx/evalml,” Aug. 2022, original-date: 2019-07-17. [Online]. Available: https://github.com/alteryx/evalml
  22. N. O. Nikitin et al., “Automated evolutionary approach for the design of composite machine learning pipelines,” Future Generation Computer Systems, vol. 127, pp. 109–125, Feb. 2022. https://doi.org/10.1016/j.future.2021.08.022
  23. C. Wang, Q. Wu, M. Weimer, and E. E. Zhu, “FLAML: A fast and lightweight AutoML library,” in Proceedings of the Fourth Conference on Machine Learning and Systems, MLSys 2021, Apr. 2021. [Online]. Available: https://www.microsoft.com/en-us/research/publication/flaml-a-fast-and-lightweight-automl-library/
  24. P. Gijsbers and J. Vanschoren, “GAMA: Genetic automated machine learning assistant,” Journal of Open Source Software, vol. 4, no. 33, Jan. 2019, Art. no. 1132. https://doi.org/10.21105/joss.01132
  25. B. Komer, J. Bergstra, and C. Eliasmith, “Hyperopt-Sklearn,” in Automated Machine Learning: Methods, Systems, Challenges, ser. The Springer Series on Challenges in Machine Learning, F. Hutter, L. Kotthoff, and J. Vanschoren, Eds. Cham: Springer International Publishing, 2019, pp. 97–111. https://doi.org/10.1007/978-3-030-05318-5_5
  26. A. Vakhrushev, A. Ryzhkov, M. Savchenko, D. Simakov, R. Damdinov, and A. Tuzhilin, “LightAutoML: AutoML solution for a large financial services ecosystem,” arXiv:2109.01528, Apr. 2022. https://doi.org/10.48550/arXiv.2109.01528
  27. P. Molino, Y. Dudin, and S. S. Miryala, “Ludwig: a type-based declarative deep learning toolbox,” arXiv:1909.07930, Sep. 2019. https://doi.org/10.48550/arXiv.1909.07930
  28. A. De Romblay, “MLBox,” 2025. [Online]. Available: https://github.com/AxeldeRomblay/MLBox
  29. A. Płońska and P. Płoński, “MLJAR: State-of-the-art automated machine learning framework for tabular data,” 2021. [Online]. Available: https://github.com/mljar/mljar-supervised
  30. M. Ali, “PyCaret: An open source, low-code machine learning library in Python,” 2020. [Online]. Available: https://github.com/pycaret/pycaret
  31. R. S. Olson, N. Bartley, R. J. Urbanowicz, and H. Moore, “Evaluation of a tree-based pipeline optimization tool for automating data science,” in Proceedings of the Genetic and Evolutionary Computation Conference, USA, July 2016, pp. 485–492. https://doi.org/10.1145/2908812.2908918
  32. D. Binkley, H. Feild, D. Lawrie, and M. Pighin, “Increasing diversity: Natural language measures for software fault prediction,” Journal of Systems and Software, vol. 82, no. 11, pp. 1793–1803, Nov. 2009. https://doi.org/10.1016/j.jss.2009.06.036
  33. S. Afshan, P. McMinn, and M. Stevenson, “Evolving readable string test inputs using a natural language model to reduce human oracle cost,” in 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation, Luxembourg, Mar. 2013, pp. 352–361. https://doi.org/10.1109/ICST.2013.11
  34. N. Medeiros, N. Ivaki, P. Costa, and M. Vieira, “An empirical study on software metrics and machine learning to identify untrustworthy code,” in 2021 17th European Dependable Computing Conference (EDCC), Munich, Germany, Sep. 2021, pp. 87–94. https://doi.org/10.1109/EDCC53658.2021.00020
  35. J. Pantiuchina, M. Lanza, and G. Bavota, “Improving code: The (mis) perception of quality metrics,” in 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME), Madrid, Spain, Sep. 2018, pp. 80–91. https://doi.org/10.1109/ICSME.2018.00017
  36. A. Wingkvist, M. Ericsson, R. Lincke, and W. Löwe, “A metrics-based approach to technical documentation quality,” in QUATIC’10: Proceedings of the 2010 Seventh International Conference on the Quality of Information and Communications Technology, ser. QUATIC’10, Porto, Portugal, Sep.–Oct. 2010, pp. 476–481. https://doi.org/10.1109/QUATIC.2010.88
  37. M. H. Halstead, Elements of Software Science (Operating and programming systems series), 3rd ed. USA: Elsevier Science Inc., 1977.
  38. R. P. Buse and W. R. Weimer, “Learning a metric for code readability,” IEEE Transactions on Software Engineering, vol. 36, no. 4, pp. 546–558, Nov. 2010. https://doi.org/10.1109/TSE.2009.70
  39. SonarSource, “SonarQube,” 2025. [Online]. Available: https://www.sonarqube.org/
  40. Python Code Quality Authority, “Bandit,” 2025. [Online]. Available: https://bandit.readthedocs.io/
  41. N. Batchelder and Contributors to Coverage.py, “Coverage.py: The code coverage tool for Python,” 2025, original-date: 2018-06-23T17:44:53Z. [Online]. Available: https://github.com/nedbat/coveragepy
  42. S. Brunner and C. Crowder, “landscapeio/prospector,” 2025. [Online]. Available: https://github.com/prospector-dev/prospector
  43. M. Murphy, M. O’Mahony, L. Shalloo, P. French, and J. Upton, “Comparison of modelling techniques for milk-production forecasting,” Journal of Dairy Science, vol. 97, no. 6, pp. 3352–3363, Jun. 2014. https://doi.org/10.3168/jds.2013-7451
  44. M. Lacchia, “Radon 4.1.0 documentation,” 2025. [Online]. Available: https://radon.readthedocs.io/
  45. C. Marsh, “Ruff,” 2025. [Online]. Available: https://docs.astral.sh/ruff/
  46. F. G. Toosi, “Source code features and their dependencies: An aggregative statistical analysis on open-source Java software systems,” Applied Computer Systems, vol. 28, no. 2, pp. 221–231, Jan. 2024. https://doi.org/10.2478/acss-2023-0022
  47. V. Bhutani, F. G. Toosi, and J. Buckley, “Analysing the analysers: An investigation of source code analysis tools,” Applied Computer Systems, vol. 29, no. 1, pp. 98–111, Jun. 2024. https://doi.org/10.2478/acss-2024-0013
  48. D. Lawrie, H. Feild, and D. Binkley, “Leveraged quality assessment using information retrieval techniques,” in 14th IEEE International Conference on Program Comprehension, Athens, Greece, 2006. https://ieeexplore.ieee.org/abstract/document/1631117
  49. S. Scalabrino, “Automatically assessing and improving code readability and understandability,” PhD dissertation, Università degli Studi del Molise, Campobasso, Italy, 2019. [Online]. Available: https://iris.unimol.it/retrieve/handle/11695/90885/92359/Tesi_S_Scalabrino.pdf
  50. T. McCabe, “A complexity measure,” IEEE Transactions on Software Engineering, vol. SE-2, no. 4, pp. 308–320, Dec. 1976. https://doi.org/10.1109/TSE.1976.233837
  51. S. Chidamber and C. Kemerer, “A metrics suite for object oriented design,” IEEE Transactions on Software Engineering, vol. 20, no. 6, pp. 476–493, June 1994. https://doi.org/10.1109/32.295895
  52. B. Henderson-Sellers, L. L. Constantine, and I. M. Graham, “Coupling and cohesion (towards a valid metrics suite for object-oriented analysis and design),” Object Oriented Systems, vol. 3, no. 3, pp. 143–158, 1996.
  53. A. Marcus, D. Poshyvanyk, and R. Ferenc, “Using the conceptual cohesion of classes for fault prediction in object-oriented systems,” IEEE Transactions on Software Engineering, vol. 34, no. 2, pp. 287–300, Apr. 2008. https://doi.org/10.1109/TSE.2007.70768
  54. F. Deissenbock and M. Pizka, “Concise and consistent naming,” in 13th International Workshop on Program Comprehension (IWPC’05), St. Louis, USA, May 2005, pp. 97–106. https://doi.org/10.1109/WPC.2005.14
  55. T. Brown et al., “Language models are few-shot learners,” in Advances in Neural Information Processing Systems, vol. 33, 2020. https://papers.nips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html
  56. E. Daka, J. Campos, G. Fraser, J. Dorn, and W. Weimer, “Modeling readability to improve unit tests,” in Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, Bergamo, Italy, Aug. 2015, pp. 107–118. https://doi.org/10.1145/2786805.2786838
  57. D. Posnett, A. Hindle, and P. Devanbu, “A simpler model of software readability,” in Proceedings International Conference on Software Engineering, USA, May 2011, pp. 73–82. https://doi.org/10.1145/1985441.1985454
  58. S. Scalabrino, M. Linares-Vásquez, D. Poshyvanyk, and R. Oliveto, “Improving code readability models with textual features,” in 2016 IEEE 24th International Conference on Program Comprehension (ICPC), Austin, USA, May 2016, pp. 1–10. https://doi.org/10.1109/ICPC.2016.7503707
  59. G. A. Miller, “WordNet: a lexical database for English,” Communications of the ACM, vol. 38, no. 11, pp. 39–41, Nov. 1995. https://doi.org/10.1145/219717.219748
  60. R. Flesch, “A new readability yardstick,” Journal of Applied Psychology, vol. 32, no. 3, pp. 221–233, 1948. https://doi.org/10.1037/h0057532
  61. H. Alves, B. Fonseca, and N. Antunes, “Software metrics and security vulnerabilities: Dataset and exploratory study,” in 2016 12th European Dependable Computing Conference (EDCC), Gothenburg, Sweden, Sep. 2016, pp. 37–44. https://doi.org/10.1109/EDCC.2016.34
  62. M. M. Mohajer, R. Aleithan, N. S. Harzevili, M. Wei, A. B. Belle, H. V. Pham, and S. Wang, “SkipAnalyzer: A tool for static code analysis with large language models,” arXiv:2310.18532, Dec. 2023. https://doi.org/10.48550/arXiv.2310.18532
  63. “ChatGPT,” 2025. [Online]. Available: https://chatgpt.com
  64. Pylint contributors, “Pylint,” 2025. [Online]. Available: https://github.com/pylint-dev/pylint
  65. PyCQA, “flake8,” Jan. 2026, original-date: 2014-09-13T17:06:24Z. [Online]. Available: https://github.com/PyCQA/flake8
  66. “h2o-3,” 2025. [Online]. Available: https://github.com/h2oai/h2o-3
  67. C. O’Leary, C. Lynch, and F. G. Toosi, “A comparative analysis of automated machine learning libraries for electricity price forecasting,” Applied Computer Systems, vol. 29, no. 2, pp. 43–52, Dec. 2024. https://doi.org/10.2478/acss-2024-0020
  68. F. Chollet, “Keras: the Python deep learning API,” 2025. [Online]. Available: https://keras.io/
  69. M. Abadi et al., “TensorFlow: Large-scale machine learning on heterogeneous distributed systems,” 2015. [Online]. Available: https://research.google/pubs/pub45166/
  70. F. Pedregosa et al., “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, Oct. 2011, Art. no. 6. https://www.researchgate.net/publication/51969319_Scikit-learn_Machine_Learning_in_Python
  71. T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 2016, pp. 785–794. https://doi.org/10.1145/2939672.2939785
  72. J. Bergstra, D. Yamins, and D. Cox, “Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures,” in Proceedings of the 30th International Conference on Machine Learning, Feb. 2013, pp. 115–123. https://proceedings.mlr.press/v28/bergstra13.html
  73. J. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl, “Algorithms for hyperparameter optimization,” in Advances in neural information processing systems, J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K. Weinberger, Eds., vol. 24. Curran Associates, Inc., 2011, pp. 2546–2554. https://proceedings.neurips.cc/paper_files/paper/2011/file/86e8f7ab32cfd12577bc2619bc635690-Paper.pdf
  74. T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama, “Optuna: a next-generation hyperparameter optimization framework,” in KDD’19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, USA, July 2019, pp. 2623–2631. https://doi.org/10.1145/3292500.3330701
  75. G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y. Liu, “LightGBM: a highly efficient gradient boosting decision tree,” in Proceedings of the 31st International Conference on Neural Information Processing Systems, ser. NIPS’17, USA, 2017.
  76. P. Gijsbers, M. L. P. Bueno, S. Coors, E. LeDell, S. Poirier, J. Thomas, B. Bischl, and J. Vanschoren, “AMLB: an AutoML benchmark,” arXiv:2207.12560, Nov. 2023. https://doi.org/10.48550/arXiv.2207.12560
  77. P. Oman and J. Hagemeister, “Metrics for assessing a software system’s maintainability,” in Proceedings Conference on Software Maintenance 1992, Orlando, FL, USA, Nov. 1992, pp. 337–344. https://doi.org/10.1109/ICSM.1992.242525
  78. G. A. Campbell, “Cognitive complexity – An overview and evaluation,” in Proceedings of the 2018 International Conference on Technical Debt, ser. TechDebt’18, New York, NY, USA, May 2018, pp. 57–58. https://doi.org/10.1145/3194164.3194186
  79. J.-L. Letouzey, “The SQALE method for evaluating Technical Debt,” in 2012 Third International Workshop on Managing Technical Debt (MTD), Zurich, Switzerland, June 2012, pp. 31–36. https://doi.org/10.1109/MTD.2012.6225997
  80. D. Grimes, G. Ifrim, B. O’Sullivan, and H. Simonis, “Analyzing the impact of electricity price forecasting on energy cost-aware scheduling,” Sustainable Computing: Informatics and Systems, vol. 4, no. 4, pp. 276–291, Dec. 2014. https://doi.org/10.1016/j.suscom.2014.08.009
  81. P. Schober, C. Boer, and L. A. Schwarte, “Correlation coefficients: Appropriate use and interpretation,” Anesthesia & Analgesia, vol. 126, no. 5, pp. 1763–1768, May 2018. https://doi.org/10.1213/ANE.0000000000002864
  82. S. Ajel, F. Ribeiro, R. Ejbali, and J. Saraiva, “Energy efficiency of Python machine learning frameworks,” in Intelligent Systems Design and Applications, A. Abraham, S. Pllana, G. Casalino, K. Ma, and A. Bajaj, Eds. Cham: Springer Nature Switzerland, 2023, pp. 586–595. https://doi.org/10.1007/978-3-031-35507-3_57
  83. K. Lottick, S. Susai, S. Friedler, and J. Wilson, “Energy usage reports: Environmental awareness as part of algorithmic accountability,” in Climate Change AI. Climate Change AI, Dec. 2019. [Online]. Available: https://www.climatechange.ai/papers/neurips2019/8
  84. F. G. Toosi, “Green software engineering: A dual-perspective overview of stakeholder and societal interpretations,” in 2025 Computing, Communications and IoT Applications (ComComAp), Madrid, Spain, Dec. 2025, pp. 385–394. https://doi.org/10.1109/ComComAp68359.2025.11353190
  85. J. Dorn, “A general software readability model,” M.S. thesis, University of Virginia, Charlottesville, Virginia, USA, 2012. https://web.eecs.umich.edu/~weimerw/students/dorn-mcs-paper.pdf
DOI: https://doi.org/10.2478/acss-2026-0009 | Journal eISSN: 2255-8691 | Journal ISSN: 2255-8683
Language: English
Page range: 95–115
Submitted on: Feb 19, 2026
Accepted on: May 14, 2026
Published on: Jun 2, 2026
In partnership with: Paradigm Publishing Services
Publication frequency: Volume open

© 2026 Christian O’Leary, Conor Lynch, Farshad Ghassemi Toosi, published by Riga Technical University
This work is licensed under the Creative Commons Attribution 4.0 License.