Application of machine learning in diabetes diagnosis

Open Access | Dec 2025

Abstract

Diabetes is a serious global health problem that affects millions of people and leads to many complications if not diagnosed early. Early and accurate diagnosis is essential for improving patient outcomes and reducing healthcare costs. Machine learning can help analyze medical data and predict diabetes more effectively. This study compares three machine learning models (logistic regression, random forests, and XGBoost) for predicting diabetes from medical data. The models were tested in their basic forms and in combination with different techniques for balancing the dataset: undersampling, oversampling, SMOTE, and an asymmetric approach. Additionally, variable reduction and probability averaging, a form of ensemble learning, were applied. The experiments are based on a dataset available on the Kaggle platform, which contains 100,000 observations. The problem is interesting because diagnostic criteria based on glycated hemoglobin and blood glucose levels do not enable automatic and unambiguous diagnosis in this dataset. Nevertheless, both measurements serve as important independent variables in the classification models considered. The evaluation results show the potential of machine learning to support specialists in diabetes diagnosis and highlight the importance of proper data preprocessing for achieving better model performance.
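
Two of the techniques named in the abstract, random undersampling and probability averaging, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy data, and the 0.5 decision threshold are all assumptions made for illustration, and the per-model probabilities are invented placeholders standing in for the outputs of fitted logistic regression, random forest, and XGBoost models.

```python
import random


def random_undersample(rows, labels, seed=0):
    """Randomly keep, for every class, as many rows as the rarest class has.

    Hypothetical helper: a plain random undersampler, shown only to
    illustrate the idea of balancing a dataset.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def average_probabilities(prob_lists):
    """Average the positive-class probabilities of several models, per observation."""
    n_models = len(prob_lists)
    return [sum(ps) / n_models for ps in zip(*prob_lists)]


def classify(probs, threshold=0.5):
    """Turn averaged probabilities into 0/1 labels (threshold is an assumption)."""
    return [1 if p >= threshold else 0 for p in probs]


# Toy imbalanced data: 2 positive cases and 8 negative cases (illustrative only).
rows = list(range(10))
labels = [1, 1] + [0] * 8
balanced_rows, balanced_labels = random_undersample(rows, labels)

# Placeholder probabilities from three hypothetical models for two patients.
model_probs = [
    [0.2, 0.9],
    [0.4, 0.7],
    [0.6, 0.8],
]
averaged = average_probabilities(model_probs)
predictions = classify(averaged)
```

Averaging probabilities rather than taking a majority vote of hard labels lets a confident model outweigh two uncertain ones, which is one common motivation for this form of ensembling.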

DOI: https://doi.org/10.2478/bile-2025-0008 | Journal eISSN: 2199-577X | Journal ISSN: 1896-3811
Language: English
Page range: 113 - 140
Published on: Dec 31, 2025
In partnership with: Paradigm Publishing Services
Publication frequency: 2 issues per year

© 2025 Aleksandra Rosińska, Łukasz Smaga, published by Polish Biometric Society
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.