Application of data science methods, including machine learning, in the classification of focal lesions in the thyroid gland

Open Access | Dec 2025

Abstract

Aim

The aim of the study was to train, evaluate, and optimize machine learning models for classifying focal lesions in the thyroid gland as benign or malignant based on their features.

Material and methods

A dataset of 841 focal thyroid lesions described by 17 features (ultrasonographic and patient characteristics) was considered. Statistical and then exploratory data analyses were conducted using Python and its libraries, including the generation of graphs and heat maps of correlations between the considered features. Binary classification models were selected to assign each focal lesion, on the basis of its characteristics, to one of two classes (benign or malignant). Five classifiers were used: logistic regression, support vector machine, k-nearest neighbors, random forest, and decision tree. We then applied feature-selection formulas to identify the focal lesion features that contributed most to the models’ classification decisions. The final dataset consisted of 841 focal thyroid lesions described by seven ultrasonographic features and histopathological assessment of malignancy (benign or malignant). Classifiers were validated using 10-fold cross-validation. Model performance was evaluated using sensitivity, specificity, accuracy, F1 score, precision, area under the ROC curve, PPV, and NPV.
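The 10-fold cross-validation step can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: the data are synthetic, the classifier is a small hand-written k-nearest neighbors vote chosen only for brevity, and the names `stratified_folds`, `knn_predict`, and `cross_validate` are hypothetical. Stratifying the folds by class is also an assumption, since the abstract does not say how the folds were formed.

```python
import random
from collections import Counter


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the indices of this class round-robin across the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def knn_predict(train_X, train_y, x, n_neighbors=5):
    """Majority vote among the n_neighbors closest training points."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:n_neighbors])
    return votes.most_common(1)[0][0]


def cross_validate(X, y, k=10):
    """Return the accuracy obtained on each of the k held-out folds."""
    scores = []
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(y)) if i not in held_out]
        correct = sum(
            knn_predict(train_X, train_y, X[i]) == y[i] for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return scores


# Synthetic stand-in data (NOT the study dataset): 200 samples with
# 7 numeric features; label 1 plays the role of "malignant".
rng = random.Random(42)
X, y = [], []
for _ in range(200):
    label = 1 if rng.random() < 0.3 else 0
    X.append([rng.gauss(1.5 * label, 1.0) for _ in range(7)])
    y.append(label)

scores = cross_validate(X, y, k=10)
print(f"mean accuracy over {len(scores)} folds: {sum(scores) / len(scores):.3f}")
```

Each sample is held out exactly once, so every prediction is made by a model that never saw that sample during training. In practice a library implementation of the splitting and of the classifiers would normally be used instead of hand-written ones.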

Results

The best-performing model (in terms of sensitivity) was the support vector machine classifier: sensitivity = 71.17%, accuracy = 83.24%, area under the ROC curve = 84.86%, F1 score = 69.13%, precision = 68.85%, PPV = 68.49%, NPV = 89.06%.
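For readers less familiar with these measures, the sketch below shows how they follow from the four cells of a binary confusion matrix, with "malignant" as the positive class. The function name `binary_metrics` and the counts are hypothetical and are not taken from the study. On a single confusion matrix, precision and PPV share the same formula, so any difference between reported values would have to come from how results are aggregated, which the abstract does not specify.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of malignant lesions that were detected
    specificity = tn / (tn + fp)   # share of benign lesions correctly cleared
    ppv = tp / (tp + fp)           # positive predictive value (same formula as precision)
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for illustration only (NOT results from the study).
metrics = binary_metrics(tp=40, fp=10, tn=130, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

A model chosen for sensitivity, as here, is one that minimizes missed malignant lesions (false negatives), possibly at the cost of more false positives.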

Conclusions

The study demonstrates the usefulness of data science methods in predicting the malignancy of focal lesions in the thyroid gland. It shows that the classification decisions made by the studied models are based on specific ultrasonographic features associated with an increased or reduced risk of malignancy.

DOI: https://doi.org/10.15557/jou.2025.0036 | Journal eISSN: 2451-070X | Journal ISSN: 2084-8404
Language: English
Submitted on: Jul 8, 2025 | Accepted on: Dec 16, 2025 | Published on: Dec 31, 2025
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2025 Paweł Mariusz Gadzicki, Małgorzata Krzywicka, Katarzyna Dobruch-Sobczak, Bartosz Migda, Ewelina Szczepanek-Parulska, Agnieszka Wosiak, Zbigniew Adamczewski, published by MEDICAL COMMUNICATIONS Sp. z o.o.
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.