Application of data science methods, including machine learning, in the classification of focal lesions in the thyroid gland

Open Access | Dec 2025

Figures & Tables

Fig. 1.

Data science process

Fig. 2.

Feature correlation heat map for one of the datasets (14 nodule features). The lighter the color, the stronger the positive correlation; the darker the color, the stronger the negative correlation. The exception is the white color, which indicates that it is impossible to determine the correlation due to insufficient data or a large number of gaps in the set for a given feature
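
The "white cell" case in this caption can be illustrated with a small sketch. It assumes Pearson correlation computed only over pairs where both features are present; the source does not say which correlation coefficient or missing-data rule was used. The function name `pairwise_pearson` and the `min_pairs` threshold are hypothetical.

```python
from math import sqrt

def pairwise_pearson(xs, ys, min_pairs=3):
    """Pearson correlation over pairs where both values are present.

    Returns None when the correlation cannot be determined (too few
    complete pairs, or one feature is constant). None would correspond
    to a white cell in the heat map.
    """
    # Keep only observations where neither feature is missing (None).
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < min_pairs:  # hypothetical threshold, not taken from the source
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:  # a constant feature has no defined correlation
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / sqrt(sxx * syy)
```

A feature with many gaps quickly falls below `min_pairs` against most other features, which is one way an entire row or column of the heat map can end up undetermined.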

Fig. 3.

Diagram of supervised machine learning

Fig. 4.

K-fold cross-validation
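
As a rough illustration of the splitting scheme named in this caption, the sketch below partitions sample indices into k contiguous folds and uses each fold once as the test set. The helper name `k_fold_indices` is hypothetical, and the value of k and any shuffling or stratification used in the study are not stated here.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds.

    Folds are contiguous and unshuffled; the first n_samples % k folds
    receive one extra sample so that every sample is tested exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

In practice a model would be fitted on each `train` split and scored on the matching `test` split, with the k scores averaged to give the reported metric.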

Fig. 5.

Mean ROC curves for the studied binary classifiers: Support vector machine-based classifier (SVC), Logistic Regression-based (LogReg), Random Forest classifier (RF), and Decision Tree classifier (DT), k-nearest neighbor classifier (KNN)
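
Each ROC curve is usually summarized by its area (AUC). A minimal sketch of that quantity, using the standard identity that AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one (ties count as one half), is given below. The function name `roc_auc` is hypothetical, and this is not necessarily how the authors computed or averaged their curves.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0 (negative) or 1 (positive)
    scores: classifier scores, higher meaning "more likely positive"
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of samples, which is acceptable for a sketch; rank-based implementations give the same value more efficiently.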

Fig. 6.

Thyroid focal lesion: solid composition, hypoechogenicity, irregular margins, irregular shape, and extrathyroidal expansion

Fig. 7.

Maximum-margin hyperplane and margins for an SVM trained on samples from two classes (red and green circles)
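
For reference, the geometry in this figure corresponds to the textbook hard-margin formulation for linearly separable data, written out below. The symbols (weight vector w, offset b, labels y_i in {-1, +1}) are the conventional ones and are assumed here; the kernel and any soft-margin settings of the SVC used in the study are not stated.

```latex
\begin{aligned}
&\text{hyperplane: } \mathbf{w}^\top \mathbf{x} + b = 0, \qquad
 \text{margins: } \mathbf{w}^\top \mathbf{x} + b = \pm 1, \qquad
 \text{margin width: } \frac{2}{\lVert \mathbf{w} \rVert}, \\
&\min_{\mathbf{w},\, b} \ \tfrac{1}{2} \lVert \mathbf{w} \rVert^{2}
 \quad \text{subject to} \quad
 y_i \left( \mathbf{w}^\top \mathbf{x}_i + b \right) \ge 1,
 \quad i = 1, \dots, n .
\end{aligned}
```

Minimizing the norm of w under these constraints is equivalent to maximizing the distance between the two margin lines.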

Performance metrics of the studied binary classifiers: Support vector machine-based classifier (SVC), Logistic regression-based (LogReg), Random Forest classifier (RF) and Decision Tree classifier (DT), k-nearest neighbors (KNN)

Metric        SVC      LogReg   RF       DT       KNN
Sensitivity   71.17%   68.99%   69.86%   69.86%   23.54%
AUC           84.86%   87.11%   84.57%   84.18%   64.91%
Accuracy      83.24%   83.96%   84.08%   84.19%   74.80%
F-measure     69.13%   69.34%   70.17%   69.93%   29.11%
Precision     68.85%   71.49%   70.9%    71.34%   40.40%
PPV           68.49%   71.17%   70.18%   69.7%    44.36%
NPV           89.05%   88.53%   88.74%   88.85%   80.7%
Specificity   87.74%   89.55%   88.89%   89.55%   75.82%
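
Most of the metrics in this table can be derived from a single confusion matrix. The sketch below uses the standard definitions. The helper name `binary_metrics` is hypothetical, the counts in the usage line are invented for illustration and are not data from the study, and all denominators are assumed to be nonzero. Under these definitions precision and PPV are the same quantity; the table lists them on separate rows, and how each was aggregated is not stated here.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp, fp, tn, fn: true positives, false positives, true negatives and
    false negatives. Assumes every denominator below is nonzero.
    """
    sensitivity = tp / (tp + fn)   # recall, true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }

# Illustrative counts only (not from the study).
example = binary_metrics(tp=70, fp=30, tn=270, fn=30)
```

With imbalanced classes, accuracy and NPV can stay high even when sensitivity is low, which is the pattern the KNN column shows.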
DOI: https://doi.org/10.15557/jou.2025.0036 | Journal eISSN: 2451-070X | Journal ISSN: 2084-8404
Language: English
Submitted on: Jul 8, 2025 | Accepted on: Dec 16, 2025 | Published on: Dec 31, 2025
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2025 Paweł Mariusz Gadzicki, Małgorzata Krzywicka, Katarzyna Dobruch-Sobczak, Bartosz Migda, Ewelina Szczepanek-Parulska, Agnieszka Wosiak, Zbigniew Adamczewski, published by MEDICAL COMMUNICATIONS Sp. z o.o.
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.