A PRISMA-based Comparative Analysis of Machine Learning Techniques in Diagnosing Electrocardiogram, Diabetes, Chronic Kidney Disease, and Breast Cancer

Baljit Kaur; Renu Popli; Hiran Mani Bala

doi:10.2478/ijssis-2026-0013

Introduction

In today’s healthcare environment, concerns about data security and diagnostic accuracy make quick and accurate disease detection crucial. Machine learning (ML) algorithms offer solutions across a range of medical fields. This review uses the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology to analyze ML applications in electrocardiogram (ECG) abnormalities, diabetes, chronic kidney disease (CKD), and breast cancer. It emphasizes ML’s transformational impact on disease diagnosis and management while addressing data quality, implementation complexity, and model interpretability. These advances show how ML can strengthen healthcare through personalized treatment plans, early disease detection, and better diagnostic accuracy. Focusing on these four diseases makes it possible to draw on large datasets, improves algorithm robustness, and directly addresses some of the most critical health problems.

The effectiveness of an ML algorithm varies widely across diseases because the characteristics of the data and the significance of the features differ. For example, K-nearest neighbors (KNN) and random forest (RF) are well suited to heart disease prediction because these models excel at identifying complex patterns and non-linear relationships. Support vector machine (SVM) is very accurate in breast cancer because it handles both high-dimensional data and binary classification efficiently. In type 2 diabetes mellitus, logistic regression (LR) is more accurate than other methods, probably because the features are close to linearly separable and the model is interpretable [1,2,3]. Studies of CKD also show that ML, especially RF and SVM, can achieve good results in early diagnosis because these models cope with imbalanced datasets and complex relations between risk factors.

Prioritizing high-impact areas improves patient outcomes and healthcare efficiency, which justifies applying ML in these fields. Each area is outlined below:

An ECG includes details about the heart rate, rhythm interval, and any possible abnormalities. ML models enhance the accuracy of diagnosis and support early detection of cardiovascular diseases [4]. ML also has several limitations: SVM struggles with noisy data, and RF fails to determine the significance of variables [5]. CNN classifiers, on the other hand, can suffer from low accuracy. To address these limitations and build an effective classifier, the authors use the MPA algorithm, which proved effective in ECG signal classification. The CNN method achieved satisfactory results in image classification [6]. The patient-specific ECG arrhythmia system is divided into four parts: data preprocessing, heartbeat segmentation, feature extraction, and ML classification, as shown in Figure 1. An ensemble of SVMs trained on the extracted features achieved an overall accuracy of 94%. ML classifiers applied to imbalanced datasets (MIT-BIH and Physikalisch-Technische Bundesanstalt (PTB) diagnostic) reached 98% accuracy [7].

Diabetes mellitus is a common illness characterized by persistently elevated blood sugar levels and is a leading cause of death. It occurs when the sugar level in the blood or urine rises above the expected range. Diabetes is a severe metabolic disorder with several forms (prediabetes, Type 1, Type 2, and gestational), each with different causes and risks. Knowing the type is essential for early detection, treatment, and long-term management. Lifestyle changes, adequate treatment, and awareness can greatly reduce complications and improve outcomes [8]. Figure 2 represents the four types of diabetes: Types 1 and 2, gestational, and moderate. Each type reflects a different disease mechanism and a different level of severity [9].

CKD is one of the fastest-growing non-communicable illnesses, so methods for its early prediction are important. RF and decision tree classifiers have been reported to reach 100% accuracy in CKD prediction [10]. Several factors are responsible for CKD. Figure 3 shows the common symptoms of CKD.

In breast cancer research, ML has offered innovative approaches to detection, diagnosis, and prognosis. A comprehensive review of deep learning methods for breast cancer detection and diagnosis highlights their potential to improve accuracy and efficiency [11]. Other work on ML approaches emphasizes their role in personalized medicine [12]. Additionally, a review of ML techniques for breast cancer detection from mammographic images describes how artificial intelligence (AI) is applied to detection and diagnosis [13]. These studies collectively underscore the significant contributions of ML to breast cancer research and clinical practice.

ML in healthcare

ML is a subset of AI concerned with developing algorithms and statistical models that enable computers to analyze and learn from data without being explicitly programmed [14]. For classification, prediction, and clustering tasks, ML has become more popular than biostatistical techniques for integrating and analyzing huge amounts of healthcare data. Prediction models need a large amount of data to support accurate decisions [15]. ML can transform patient care by enabling accurate diagnosis, customized treatments, and predictive analysis. Using ML algorithms, doctors can improve patient outcomes by optimizing resource allocation, making better decisions, and analyzing large volumes of medical data.

For example, ML algorithms can analyze patient data to predict specific health outcomes, such as the onset of chronic diseases. This may help doctors recognize high-risk individuals and intervene early to prevent adverse outcomes. ML algorithms can also examine medical imaging data, such as X-rays and CT scans, to help determine the most suitable treatment for a patient [16]. With ML, clinicians can predict which medicines are likely to work best for a particular patient based on characteristics such as genetics and previous medical records. ML algorithms help healthcare professionals make better patient care decisions [17]. ML can also analyze large datasets to find trends and patterns that inform the design of public health programs. Figure 4 illustrates ML applications that enhance healthcare research. ML has the potential to transform patient care and improve health outcomes globally.

In summary, ML techniques offer promise for disease prediction and management, but challenges remain, including dependence on data quality, limited interpretability, and the complexity of model deployment. Future research should address these challenges to realize the full potential of ML in healthcare. Table 1 summarizes the reported accuracy of classification and clustering algorithms in healthcare applications. The supervised and unsupervised results together indicate the potential impact of ML on resource utilization.

Table 1:

Summary of the ML classification and clustering algorithms in terms of accuracy.

Supervised ML
Classification algorithms	References	Year	Task	Accuracy (%)
Support vector regression	[18]	2022	Predicting and illustrating the COVID-19 pandemic	94
Decision trees	[19]	2022	Prediction of diabetes	96
Naïve Bayes	[20]	2020	Skin disease detection	94.3
Naïve Bayes	[21]	2020	Heart disease detection	88.16
Ensemble techniques	[22]	2020	Predict the normal weekly cost that patients will spend on specific medicines	98
Decision trees	[23]	2020	Heart disease prediction	88
SVM	[24]	2019	Speech recognition, facial recognition	91.3

Unsupervised ML
Clustering algorithms	References	Year	Task	Accuracy (%)
Gaussian mixture model	[25]	2021	Anomaly detection	95
K-Medoids	[26]	2021	Detecting problems in smart healthcare	75.89
K-Means	[27]	2021	Heart disease prediction	88
Fuzzy c-means	[28]	2019	Review of patient satisfaction	76
Hierarchical clustering	[29]	2018	Mental health prediction	90

COVID-19, Coronavirus disease 2019; ML, machine learning; SVM, support vector machine.

Motivations and contributions

The application of ML in healthcare represents a fundamental shift that can transform disease detection and patient treatment. The current healthcare landscape presents many challenges, including the need for timely and precise disease diagnosis, which is frequently complicated. ML algorithms use large datasets and advanced computational approaches to improve diagnostic accuracy and prediction. This paper provides a comprehensive overview of the advances and challenges in using ML for these diseases by systematically analyzing the existing literature with the PRISMA methodology. The goal is to highlight the potential of ML for more efficient and effective healthcare: improving patient outcomes, optimizing treatment plans, and addressing some of the most urgent public health challenges.

This review paper makes several key contributions to understanding the application of ML in healthcare, with a specific focus on four critical diseases: ECG abnormalities, diabetes, CKD, and breast cancer. The following points highlight the primary contributions of this study:

➢ A comprehensive review of ML algorithms for detecting and diagnosing ECG abnormalities, diabetes, CKD, and breast cancer, based on research from 2014 to 2025.
➢ A systematic review and meta-analysis of the literature on the four diseases, conducted with the PRISMA technique.
➢ Insights into future research directions for overcoming challenges in ML deployment in healthcare.

Paper organization

Figure 5 illustrates the structure of this study. Section II describes related work on the four diseases. Section III outlines the steps of the literature review methodology and discusses the inclusion and exclusion criteria. Section IV presents the results and discussion.

II. Related Work

This section discusses the effectiveness of modern ML techniques used in four diseases. Table 2 presents a literature review from 2020 to 2024. The objective of this study is to present a thorough overview of ML technology in the healthcare system.

Table 2:

Literature review from 2020 to 2024.

Reference	Year	Algorithm implemented	Contribution
[42]	2024	12 different classifiers that belong to six learning strategies were evaluated using two datasets.	Diagnosing diabetes
[43]	2024	Ten ML classifiers were used	Prediction of diabetes disease through a mobile app
[35]	2024	Used sensor technology	Work on gestational diabetes
[7]	2023	Gradient boosting tree, RF, KNN, and SVM	The authors study an automatic arrhythmia classification method for the healthcare system, the MHO algorithm, and the ML classifier.
[6]	2023	Convolutional neural network	The CNN algorithm is used for feature extraction on the ECG image dataset
[44]	2023	NB, LR, SVM, KNN, DT, AdaBoost, XGBoost	Compared the performance level by using AUC
[40]	2022	SVM, LR, DT, XGBoost, RF, AdaBoost	AdaBoost provides 100% sensitivity for the detection of CKD
[45]	2022	SVM, RF, gradient boosting, AdaBoost	Predicting breast cancer based on different medical symptoms
[41]	2021	RF, XGBoost, neural network	Evaluated the risk of CKD
[10]	2021	8 ML classifiers were used	Performance analysis of CKD
[46]	2021	SVM, RF, LR, DT, KNN	Breast cancer prediction and diagnosis
[47]	2020	Six supervised ML algorithms	Prediction of breast cancer using various ML algorithms
[9]	2020	K-nearest neighbor, LR, RF, SVM, and decision tree	Diabetes prediction
[38]	2020	Eleven ML classifiers were used	Prediction of CKD
[39]	2020	Seven ML techniques are utilized	Classifying the kidney patient dataset as CKD or NOTCKD
[30]	2020	SVM	An SVM classifier is proposed to classify the heartbeat. The result of SVM was compared with other classifiers
[29]	2020	SVM, KNN, ANN	Construct a biometric recognition system

ANN, artificial neural network; AUC, area under the curve; CKD, chronic kidney disease; CNN, convolutional neural network; DT, decision tree; KNN, k-nearest neighbors; LR, logistic regression; MHO, metaheuristic optimisation; ML, machine learning; NB, Naïve Bayes; NOTCKD, not chronic kidney disease; RF, random forest; SVM, support vector machine.

Hassaballah et al. [5] introduced an automatic arrhythmia classification approach that integrated the MHO algorithm with ML classifiers for an assisted smart healthcare system. Four supervised ML classifiers (SVM, KNN, gradient boosting decision tree (GBDT), and RF) were used for classification and were optimized using the MHO algorithm.

Abubaker and Babayigit [6] proposed a lightweight CNN-based model to classify four major cardiac classes (abnormal heartbeat, myocardial infarction, history of myocardial infarction, and normal) using the public ECG images dataset of cardiac patients. The CNN was also used as a feature extraction tool for traditional ML classifiers and gave remarkable results.

Pandey et al. [30] focused on four ECG classes: one normal beat class and three abnormal beat classes. The authors proposed an ensemble SVM classifier for heartbeat classification and compared its results with those of other classification techniques.

Kuila et al. [31] proposed an ECG-based biometric recognition system that processes the raw ECG signal, using different filters for noise elimination. Leading algorithms such as KNN, artificial neural network (ANN), and SVM were used to classify different ECG features and were tested on the Massachusetts Institute of Technology – Beth Israel Hospital Electrocardiogram Identification Database (MIT-BIH ECG-ID).

Ahamed et al. [7] proposed a technique that can efficiently identify patients and help to determine the best treatment. The authors used ML approaches, modifying and comparing them to other state-of-the-art related technologies. They examined highly imbalanced datasets and penalized the loss value of the ANN by assigning class weights.
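The class-weighting idea described above can be sketched as follows. The source does not state which weighting scheme was used, so this sketch assumes the common inverse-frequency ("balanced") rule, w_c = N / (K · n_c), and applies it to a weighted cross-entropy loss. The function names and the label counts are hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights w_c = N / (K * n_c) (an assumed scheme)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


def weighted_log_loss(labels, probabilities, weights):
    """Mean cross-entropy with each sample scaled by its class weight.

    probabilities[i] is the predicted probability of sample i's true class.
    """
    losses = [-weights[y] * math.log(p) for y, p in zip(labels, probabilities)]
    return sum(losses) / len(losses)


# Hypothetical, heavily imbalanced heartbeat labels: 90 normal, 10 abnormal.
labels = ["normal"] * 90 + ["abnormal"] * 10
weights = balanced_class_weights(labels)

# An equally uncertain prediction costs more on the rare class.
loss = weighted_log_loss(["normal", "abnormal"], [0.5, 0.5], weights)
```

With these counts the rare class receives a weight nine times larger than the common class, so errors on abnormal beats dominate the loss.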

Alzyoud et al. [32] identified the most relevant features for detecting diabetes mellitus and the classifier that diagnoses it most effectively from those features. The authors evaluated various ML classifiers and reported that the multi-classifier handled diabetes datasets best.

Vizhi and Dash [9] predicted the patient’s diabetes status with high accuracy using the decision tree method. This approach demonstrated excellent precision in diabetes prediction.

El-Sofany et al. [33] developed a mobile application that predicts diabetes from features entered by the user.

Bhat et al. [34] tested six ML algorithms for the early prediction of type 2 diabetes mellitus and considered the influence of three types of feature selection on model performance. In their experiments, the decision tree was the most accurate model, with 96.10% accuracy.

Lu et al. [35] examined digital health and ML technologies used for gestational diabetes, a type of diabetes that develops during pregnancy. They emphasize making these innovations inexpensive and sustainable for all women.

Agrawal and Jain [36] showed that ML algorithms can predict a person’s probability of developing breast cancer. They compared different ML methods to identify a common approach that works effectively.

Gupta and Garg [37] used a rectified linear unit (ReLU) activation function, which allows the model to perform better and learn faster.

Kaur et al. [38] achieved an accuracy of 0.995 using a multilayer perceptron with neural network preprocessing. In comparison, a 2017 study reported an accuracy of 0.991 with a multiclass decision forest, highlighting the impact of missing-data handling and feature selection on model performance. However, challenges remain in ensuring the reliability and generalizability of models, particularly in managing missing values properly and incorporating domain knowledge.

Khan et al. [39] noted the significant global burden of CKD, which affects approximately 10% of the population and contributes to high mortality rates. Despite the prevalence of ML applications in healthcare, few studies have comprehensively compared different ML algorithms for CKD prediction. Their study compares seven ML techniques and emphasizes the superior performance of Composite Hypercubes on Iterated Random Projections (CHIRP) in reducing error rates and enhancing predictive accuracy for CKD.

Ebiaredoh-Mienye et al. [40] noted that CKD is a major global health challenge, particularly in developing countries, where high diagnostic costs and limited healthcare infrastructure restrict early detection. ML has shown promise in facilitating early CKD diagnosis, offering cost-effective and efficient solutions. Previous research has explored various ML approaches, such as integrating k-nearest neighbor imputation with multiple classifiers, which raised accuracy to as much as 99.83%. Advanced techniques such as improved sparse autoencoder networks and ensemble methods have also improved the accuracy of CKD detection. Despite these advances, effective feature selection is still needed to optimize ML models for CKD diagnosis, because redundant attributes increase computational complexity without contributing to model performance.

Wang et al. [41] observed that CKD is a global health concern, affecting millions and often progressing silently because early symptoms are absent. Existing methods for CKD prediction rely primarily on the glomerular filtration rate (GFR), which is difficult to measure directly, so serum creatinine is a common proxy. Recent studies have explored ML techniques to predict CKD using the University of California, Irvine (UCI) dataset, but these efforts were limited by dataset biases and the reliance on creatinine values. Their study proposes a novel two-stage ML approach that predicts creatinine from commonly available health parameters and then assesses CKD risk, potentially improving early detection and public health outcomes.

Emon et al. [10] observed that CKD presents significant challenges in healthcare due to its asymptomatic nature in its early stages, making timely diagnosis crucial. Traditional diagnostic methods often rely on costly and less frequently performed tests like serum creatinine measurements. ML approaches have emerged as promising alternatives for CKD prediction, utilizing algorithms such as LR, naive Bayes, and RF. Studies have demonstrated that RF achieves the highest accuracy, up to 99%. These advanced methods enhance the early detection and management of CKD, particularly in resource-limited settings.

III. Literature Review Methodology

The methodology follows a systematic approach to examining the academic literature, as shown in Figure 6. The procedure begins with a description of the research objectives and scope, followed by the collection of related bibliographic data from the Scopus database. Several repositories, including Scopus, Web of Science, and IEEE Xplore, were checked for material on the topic. The search queries were formed as follows:

Query for ECG: ((TITLE-ABS-KEY ({ecg monitoring}) OR TITLE-ABS-KEY ({electrocardiogram ecg}) OR TITLE-ABS-KEY ({ecg signal}) AND TITLE-ABS-KEY ({machine learning}) OR TITLE-ABS-KEY ({deep learning}) OR TITLE-ABS-KEY ({federated learning})) AND PUBYEAR > 2013 AND PUBYEAR < 2025 AND (LIMIT-TO (SUBJAREA, "COMP")))
Query for Diabetes: ((TITLE-ABS-KEY ({diabetes}) OR TITLE-ABS-KEY ({diabetes mellitus}) OR TITLE-ABS-KEY ({diabetes mellitus type 2}) OR TITLE-ABS-KEY ({diabetes insipidus}) AND TITLE-ABS-KEY ({deep learning}) OR TITLE-ABS-KEY ({machine learning}) OR TITLE-ABS-KEY ({federated learning})) AND PUBYEAR > 2013 AND PUBYEAR < 2025 AND (LIMIT-TO (SUBJAREA, "COMP")))
Query for Chronic Kidney Disease: ((TITLE-ABS-KEY ({kidney disease}) OR TITLE-ABS-KEY ({Chronic Kidney Disease}) AND TITLE-ABS-KEY ({deep learning}) OR TITLE-ABS-KEY ({machine learning}) OR TITLE-ABS-KEY ({federated learning})) AND PUBYEAR > 2013 AND PUBYEAR < 2025 AND (LIMIT-TO (SUBJAREA, "COMP")))
Query for Breast Cancer: ((TITLE-ABS-KEY ({Breast Cancer}) AND TITLE-ABS-KEY ({deep learning}) OR TITLE-ABS-KEY ({machine learning}) OR TITLE-ABS-KEY ({federated learning})) AND PUBYEAR > 2013 AND PUBYEAR < 2025 AND (LIMIT-TO (SUBJAREA, "COMP")))
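The four queries share one pattern: any disease term, combined with any learning-method term, restricted by year and subject area. A minimal sketch of a builder for such strings is given below, assuming that the intended logic is (any topic term) AND (any method term). It adds explicit parentheses around each OR group so the result does not depend on how the search engine orders AND and OR. The helper names are hypothetical and are not part of any Scopus tooling.

```python
def title_abs_key(term):
    """Wrap a phrase in the TITLE-ABS-KEY field code with exact-phrase braces."""
    return f"TITLE-ABS-KEY ({{{term}}})"


def any_of(terms):
    """Join field-coded terms with OR inside one explicit group."""
    return "(" + " OR ".join(title_abs_key(t) for t in terms) + ")"


def build_query(topic_terms, method_terms, first_year, last_year, subject_area="COMP"):
    """Build (topic group AND method group) plus year and subject-area limits."""
    return (
        f"({any_of(topic_terms)} AND {any_of(method_terms)}) "
        f"AND PUBYEAR > {first_year - 1} AND PUBYEAR < {last_year + 1} "
        f'AND (LIMIT-TO (SUBJAREA, "{subject_area}"))'
    )


methods = ["deep learning", "machine learning", "federated learning"]
ckd_query = build_query(["kidney disease", "Chronic Kidney Disease"], methods, 2014, 2024)
print(ckd_query)
```

Generating all four queries from one function keeps the year range and subject filter identical across diseases, which matters when the resulting publication counts are compared.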

The R 4.3.2 software tool is used to analyze the data, enabling the analysis of author productivity, scientific production, and average citation per year.

Bibliometrics uses the bibliographic data collected from databases to represent the organization of a scientific discipline in visual form. Web accessibility, text mining, sustainable business, and healthcare are just a few of the fields that have employed bibliometrics to examine trends and patterns in earlier research [48]. This paper provides a bibliometric analysis of research from the last 10 years, organized around a set of pivotal research questions that form the core of the analysis. The analysis will help researchers identify emerging trends in the four fields. Table 3 outlines the key research questions and objectives for understanding trends and dynamics in scientific research across these fields. The study should benefit anyone working to advance healthcare.

Table 3:

Research questions.

Research question No.	Research question	Objective
1.	How can the integration of AI-driven methodologies in breast cancer research improve early detection and personalized treatment plans?	Review recent advancements in AI methodologies as reported in journals like AI in Medicine. Identify key AI techniques that have been successfully applied to breast cancer detection and treatment personalization.
2.	How does the volume of research publications in ECG, CKD, diabetes, and breast cancer correlate with healthcare challenges and research priorities in different countries?	Interpret how these research patterns might indicate the healthcare challenges faced by these countries.
3.	How does author productivity vary among ECG, diabetes, CKD, and breast cancer research fields according to Lotka’s law?	Determine which field has the highest and lowest percentage of highly productive authors.
4.	What insights can Biblioshiny keyword co-occurrence analysis provide for a specific field of research?	Keyword co-occurrence analysis identifies the most frequent terms in a field, revealing key research themes, emerging trends, and areas of academic focus or future opportunities.

AI, artificial intelligence; CKD, chronic kidney disease.

Table 4 details the criteria used to select relevant studies for evaluating ML applications in the medical field. It ensures that the included research offers insights into the performance and effectiveness of ML techniques. By excluding irrelevant studies, the criteria help maintain the relevance and precision of the findings.

Table 4:

Inclusion and exclusion criteria.

Inclusion criteria

Studies focusing on applying ML algorithms in medical diagnosis and prognosis.
Research articles discussing the use of ML in disease prediction, including but not limited to diabetes, CKD, breast cancer, and heartbeat categorization on an ECG.
Papers that present ML classifiers’ performance metrics, such as sensitivity, specificity, accuracy, and AUC.
Investigations into the integration of ML techniques for disease detection and management, emphasizing efficiency and accuracy.
Studies published within the last 4 years to ensure relevance and currency.

Exclusion criteria

Articles that do not focus on ML applications in healthcare or medical diagnosis.
Studies unrelated to disease prediction or classification using ML methodologies.
Papers lacking performance metrics or evaluations of ML classifiers’ effectiveness.
Research published before 2020 or after 2023 to maintain the focus on recent advancements in the field.
Investigations not aligned with the specific diseases mentioned, such as those addressing unrelated medical conditions or applications.

AUC, area under the curve; CKD, chronic kidney disease; ECG, electrocardiogram; ML, machine learning.
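The performance metrics named in the inclusion criteria can be computed from a binary confusion matrix. The sketch below uses the standard definitions of sensitivity, specificity, and accuracy; the counts are hypothetical and are not taken from any reviewed study. AUC is omitted because it requires ranked prediction scores rather than four counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        # Share of diseased cases that the classifier flags as diseased.
        "sensitivity": tp / (tp + fn),
        # Share of healthy cases that the classifier flags as healthy.
        "specificity": tn / (tn + fp),
        # Share of all cases classified correctly.
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical screening result: 50 diseased and 50 healthy patients.
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters for imbalanced medical datasets, where a classifier can reach high accuracy while missing most positive cases.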

IV. Results and Discussion

The research was performed using data from the Scopus database, covering studies published over the last 10 years (2014–2024). R software (version 4.3.2) was used for data processing and visualization. The results are discussed in turn below, organized around the research questions formulated in Section III.

Keyword analysis

Figure 7 shows the frequency of words in the bibliometric dataset. With this output, researchers can analyze the most frequently occurring terms in scientific publications to find trends, key ideas, and recurring topics in a particular area. The figure displays many terms, with the size of each word reflecting its frequency, and compares key terms across four medical categories: ECG, diabetes, CKD, and breast cancer. Breast cancer stands out with the highest frequency of terms, especially “CNN” and “Deep Learning,” indicating a strong emphasis on these advanced techniques. Diabetes also shows a notable focus on “Machine Learning” and “Deep Learning,” though it uses “CNN” less than breast cancer does. ECG shows moderate usage of “Main Title” and “CNN” but lower frequencies of other terms, suggesting it is less explored in terms of ML applications. CKD has the lowest frequencies for all terms, which indicates that ML integration is less established in this field. Compared with CKD, breast cancer and diabetes lead in the application of ML and deep learning techniques.
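The counts behind such a word-frequency figure can be sketched in a few lines. The example below assumes that each record carries a list of author keywords and that a keyword is counted once per paper after trimming whitespace and lowercasing. The records are hypothetical, and the preprocessing used for Figure 7 is not stated in the source.

```python
from collections import Counter


def keyword_frequencies(records):
    """Count in how many papers each normalized keyword appears."""
    counts = Counter()
    for keywords in records:
        # A set avoids double-counting a keyword repeated within one paper.
        counts.update({k.strip().lower() for k in keywords})
    return counts


# Hypothetical keyword lists from three papers.
records = [
    ["CNN", "Deep Learning"],
    ["cnn", "Machine Learning"],
    ["Deep learning", "CNN "],
]
frequencies = keyword_frequencies(records)
print(frequencies.most_common(2))
```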

Annual scientific production

The Annual Scientific Production figures represent the number of papers published each year between 2014 and 2024 in four medical research areas: ECG, diabetes, CKD, and breast cancer. The yearly production for each area is shown in Figure 8. Research output has grown substantially in all four fields over the past decade. ECG output increased from six publications in 2014 to 277 in 2023. Diabetes output began rising in 2016 and peaked between 2022 and 2023 before declining in 2024. CKD research followed a similar pattern, with a rapid increase starting in 2017, peaking just below diabetes, and declining until 2023. Breast cancer research showed the highest volume of scientific production, with a steep increase starting in 2016 and a peak around 2023–2024. Despite a notable decline in 2024, breast cancer research still had a relatively large number of publications compared with the other diseases.

Average citation per year

Figure 9 illustrates the average citations per year for research on ECG, diabetes, CKD, and breast cancer from 2012 to 2026. ECG and CKD experienced sharp peaks in citation rates around 2017–2019, with ECG reaching approximately 10 citations per year and CKD peaking even higher, indicating heightened research interest. In contrast, diabetes and breast cancer maintained a more stable but lower citation rate, with fewer dramatic fluctuations. However, all four research domains saw a noticeable decline in citations after 2020, suggesting either a shift in research focus or maturation of these fields, leading to fewer groundbreaking studies. This decline may reflect a natural progression as certain areas reach research saturation or as attention moves to emerging technologies and fields. Overall, the figure highlights the varying impact and research dynamics across these medical fields over time.

Author productivity through Lotka’s law

Lotka’s Law is one of the three classical bibliometric laws (alongside Bradford’s Law and Zipf’s Law) and is particularly important for studying author productivity in scientific literature. It states that the number of authors who publish a set number of articles is inversely proportional to the square of the number of papers published. In bibliometric studies, Lotka’s Law is often applied to:

Analyze author productivity patterns in specific research fields.
Identify the most productive authors and understand their role in advancing a field.
Evaluate scientific output distribution in databases of research publications.

Lotka’s law is mathematically expressed as $y = \frac{C}{x^{n}}$, where $y$ is the percentage of authors contributing $x$ publications, $C$ is a constant, and $n$ is the exponent.
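To make the formula concrete, the sketch below computes the distribution that Lotka’s law predicts and estimates the exponent $n$ from observed counts. It rests on two assumptions that the source does not specify: $C$ is chosen so that the shares sum to 1 over the observed range, and $n$ is estimated by ordinary least squares on the log-log form $\log y = \log C - n \log x$. The function names and the author counts are hypothetical, and shares are returned as fractions rather than percentages.

```python
import math


def lotka_expected(max_papers, n=2.0):
    """Expected share of authors with x papers, for x = 1..max_papers."""
    c = 1.0 / sum(x ** -n for x in range(1, max_papers + 1))
    return {x: c / x ** n for x in range(1, max_papers + 1)}


def fit_lotka_exponent(author_counts):
    """Estimate n as the negated slope of log(authors) against log(papers)."""
    xs = [math.log(x) for x in author_counts]
    ys = [math.log(y) for y in author_counts.values()]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return -slope


# Hypothetical counts: number of authors who wrote 1, 2, 3, and 4 papers.
observed = {1: 3600, 2: 900, 3: 400, 4: 225}
expected = lotka_expected(max_papers=4)
n_hat = fit_lotka_exponent(observed)
```

An estimated exponent close to 2 indicates agreement with the classical inverse-square form, and comparing the observed shares with `expected` shows where a field departs from it.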

Figure 10 shows author productivity in terms of the number of documents written, with Lotka’s Law applied to the ECG, diabetes, CKD, and breast cancer datasets. The law implies that a small number of authors contribute a large proportion of publications. The empirical data show that breast cancer and CKD align more closely with this expectation, with a higher percentage of authors producing multiple documents. In contrast, ECG and diabetes have fewer prolific authors and deviate from the theoretical distribution. This analysis helps identify the fields with strong contributions from leading researchers and highlights areas where increasing author productivity could enhance research output.

Core sources by Bradford’s law

Bradford’s Law is one of the three fundamental laws of bibliometrics, alongside Lotka’s Law and Zipf’s Law. It focuses on the distribution of articles across journals and helps identify the core sources in any given field of research. Bradford’s Law is important because it describes how scientific literature is concentrated in a few journals and dispersed across many others.

Bradford’s Law can be expressed mathematically, but it is most commonly described in terms of three zones:

Zone 1: A few core journals produce a large number of articles on a given topic.
Zone 2: A moderate number of journals produce a smaller, but still significant, number of articles.
Zone 3: A large number of journals produce very few relevant articles each.
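The zone description above can be sketched as a simple partition. The example assumes the usual convention that journals are ranked by article count and split so that each zone holds roughly one-third of all articles. The function name and the journal counts are hypothetical, and the exact zoning rule used by the analysis software is not stated in the source.

```python
def bradford_zones(journal_counts):
    """Assign journals to three zones holding about a third of the articles each."""
    ranked = sorted(journal_counts.items(), key=lambda item: item[1], reverse=True)
    total = sum(journal_counts.values())
    zones = {1: [], 2: [], 3: []}
    if total == 0:
        return zones
    cumulative = 0
    for journal, count in ranked:
        # The zone depends on the share of articles already covered
        # by higher-ranked journals.
        zone = min(3, int(3 * cumulative / total) + 1)
        zones[zone].append(journal)
        cumulative += count
    return zones


# Hypothetical article counts per journal (90 articles in total).
counts = {"A": 30, "B": 15, "C": 15, "D": 5, "E": 5, "F": 5, "G": 5, "H": 5, "I": 5}
zones = bradford_zones(counts)
print({zone: len(journals) for zone, journals in zones.items()})
```

In this toy input, one journal, two journals, and six journals each account for 30 articles, which shows the widening spread of sources from the core zone outward.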

Based on the journal rankings, future research in ECG, diabetes, CKD, and breast cancer could focus on different key areas. As depicted in Figure 11, ECG research could explore the advanced signal processing techniques highlighted in journals such as IEEE Journal of Biomedical and Health Informatics, aiming to enhance diagnostic accuracy and real-time monitoring. In diabetes research, results published in venues such as Sensors could drive development in simple glucose monitoring and personalized treatment strategies. For CKD, work published in Biomedical Signal Processing and Control suggests opportunities in predictive modelling for early detection and management of kidney complications. Finally, in breast cancer research, the AI-driven methodologies reported in journals such as Artificial Intelligence in Medicine open opportunities for improving early detection and personalized treatment plans, potentially transforming outcomes through precision medicine. Future studies could further integrate these technological advances into clinical practice to improve patient outcomes and the efficiency of healthcare delivery in these critical fields.

Country scientific production

Country Scientific Production is a bibliometric measure that evaluates the research output of different countries by analyzing the number of scientific publications or other forms of research contributions (e.g., patents, conference papers) produced by researchers affiliated with institutions in those countries. Understanding country-level scientific production is crucial for several reasons, especially in the context of global research competitiveness, policy development, and international collaborations.
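
As a rough illustration of how such a measure can be computed, the sketch below counts publications per country from a list of affiliation countries. It assumes one possible counting convention, in which each country is counted once per paper; the tool used in this study may count author appearances differently. The function name and the sample records are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def country_production(records: Iterable[List[str]]) -> Counter:
    """Count papers per country, counting each country once per paper.

    Each record lists the affiliation countries of one paper's authors
    (hypothetical data, not taken from the study).
    """
    counts: Counter = Counter()
    for countries in records:
        # A set removes repeated countries within the same paper.
        counts.update(set(countries))
    return counts


# Hypothetical records: three papers with their authors' countries.
sample_records = [
    ["India", "USA"],
    ["India"],
    ["China", "India", "India"],
]
production = country_production(sample_records)
```

Counting a country once per paper avoids inflating the totals of papers with many co-authors from the same country; counting every author appearance instead would weight large teams more heavily.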

Figure 12 compares research publications across countries in the four fields. India has the highest number of publications in all four: ECG (991), diabetes (6,683), CKD (1,219), and breast cancer (5,914). China follows with significant contributions in breast cancer (4,078) and diabetes (2,130), while the USA shows a balanced research output across all four areas, with a notable 2,309 publications in breast cancer and 1,556 in diabetes. South Korea, though lower in numbers, shows focused research efforts, particularly in diabetes (339). Bangladesh and Indonesia have moderate outputs, with Bangladesh having 407 publications on breast cancer and Indonesia showing the least overall research activity. Saudi Arabia and Turkey contribute modestly, with Saudi Arabia having 340 publications on diabetes. European countries like Spain and Poland show diverse research interests, though Poland has the fewest publications in CKD (19) and breast cancer (165). Australia and Malaysia exhibit similar research activities, particularly in diabetes and breast cancer. Italy shows a strong focus on breast cancer (638) and CKD (77), while the UK has substantial publications in diabetes (665) and breast cancer (533). This comparison highlights the varied research priorities and outputs of different countries, reflecting their unique healthcare challenges and research capabilities.

Most cited countries

Figure 13 compares the number of citations for papers on electrocardiograms, diabetes, CKD, and breast cancer in different countries. The following is an overview of the comparisons:

China has the most citations across all four research disciplines combined. The country has the highest citation counts for diabetes and breast cancer, with over 8,000 and nearly 12,000 citations, respectively.
India also plays an important role, particularly in diabetes and breast cancer. Its citations in ECG are noticeably fewer than in breast cancer, although ECG still accounts for a larger share than CKD.

The USA is at the leading edge of research on diabetes and breast cancer, with strong citations in ECG and CKD. This indicates a balanced involvement with all areas of medical research. The United Kingdom also contributes significantly to breast cancer and diabetes research, but at a lower level compared to the USA, China, and India.

Trend topic

The trend topics plots in Figures 14–17 track the appearance and frequency of multiple research subjects from 2014 to 2024. Every keyword is associated with a line that displays the years in which it was popular, and the size of the dots indicates how frequently the term appears in research papers. The larger circles highlight a marked increase in the terms “machine learning” and “deep learning,” particularly between 2020 and 2021. While terms like “insulin” and “females” peaked earlier and are now less common, terms like “convolutional neural networks” and “random forests” continue to garner interest. These plots highlight both recent and long-lasting trends in this field.

Three fields plot

The Three Fields Plot is a useful instrument for determining how key bibliometric components relate to one another. It can help with research decisions, cooperation, and publication strategies. The three-field map, created using the Biblioshiny tool and RStudio, relates high-impact authors, keywords, and journals. Figures 18–21 illustrate the three-field plots for the four diseases. This type of visual representation typically links three bibliometric fields:

Authors: Shows which authors contribute to a particular research area.
Keywords: Highlights the concepts or key terms the authors are researching.
Journals: Displays the academic journals where these studies are being published.

This paper conducts a bibliometric analysis to examine the evolution and structure of the research and to explain the major themes related to breast cancer. The data used for the bibliometric analysis cover 2014 to 2024. The yearly production of articles is shown in Figure 8. Citations play an important role in research, and Figure 9 shows the average citations per year. From 2018 to 2022, citation trends rose for ECG and CKD, and citations for all four diseases have since declined up to the present. The analysis shows that India achieved the highest country scientific production; Figure 12 depicts the scientific production by country. Figures 10 and 11 represent Lotka’s Law and Bradford’s Law, respectively. Figure 21 illustrates a three-field plot with 15 nodes. It shows the top 15 keywords that the top authors are working on and the top 15 sources in which they publish. This helps future researchers identify the most used keywords, the leading publishers in the field, and emerging keywords. A keyword analysis is also helpful in drawing some important conclusions. This paper therefore provides directions for future researchers, who can use the outcomes reported here to identify keywords, understand the pattern of author citations, and identify the most published sources.

Conclusion and Future Scope

In conclusion, this review illustrates the transformative impact of ML in diagnosing and managing diseases such as ECG abnormalities, diabetes, CKD, and breast cancer. ML techniques have advanced disease detection and diagnosis significantly, though challenges like data quality, model interpretability, and deployment complexities persist. Breast cancer research has experienced the most significant growth and citation rates, underscoring its critical importance. Diabetes and CKD research also shows promising advancements, but faces recent declines in output. ECG research, while vital, requires further innovation to match the progress seen in other areas. Future research should focus on overcoming these challenges by enhancing ML model robustness, improving data quality, and developing interpretable models that can be seamlessly integrated into clinical workflows. Expanding research to incorporate emerging technologies such as federated learning and AI-driven personalized medicine could further advance disease management. In addition, the incorporation of novel research like real-time analytics can enhance the precision of medical diagnosis and ensure the safety of healthcare delivery systems.
