
ARMDiaRD: A robust multi-class diabetic retinopathy detection using hybrid Swin Transformers with hierarchical fusion

Open Access
| Feb 2026

Abstract

Diabetic retinopathy (DR) is an eye disease caused by prolonged diabetes and remains a major cause of vision loss among middle-aged adults. Early detection of DR through advanced imaging and artificial intelligence (AI) techniques can significantly reduce the risk and severity of vision loss. The proposed ARMDiaRD combines state-of-the-art techniques from EfficientNet, Swin Transformers, and multi-scale feature fusion (MSFF) to enhance multi-class classification of DR severity levels across diverse datasets. In the proposed architecture, a global context (GC) block is integrated into EfficientNet to capture long-range dependencies and contextual relationships. This is followed by Swin Transformer layers equipped with an MSFF block that hierarchically aggregates features from multiple levels, enabling the model to learn richer and more discriminative representations. Fundus images from four publicly available DR datasets, the Asia Pacific Tele-Ophthalmology Society (APTOS) dataset, the Indian Diabetic Retinopathy Image Dataset (IDRiD), MESSIDOR-V1, and EyePACS, are combined through a comprehensive aggregation process to train and test the generalised model. This design demonstrates consistent improvements across multiple evaluation metrics, underlining its potential to reduce misclassification in medical diagnosis. To evaluate the proposed model, performance metrics such as accuracy, precision, recall, specificity, and F1-score are calculated, along with the quadratic weighted kappa (QWK), Spearman rank correlation coefficient, and mean absolute error (MAE). Simulation results show that the proposed model achieves an accuracy of 87.59%, a precision of 87.6%, a recall of 87.9%, a QWK of 91.47%, and a Spearman rank correlation coefficient of 92.53%. Importantly, the MAE, a critical metric for evaluating false predictions in medical diagnosis, is 0.1736 for the proposed model.
The results clearly demonstrate the superiority of the proposed model over competing models in handling multiple large datasets, and its generalisability in predicting severity grades on large, complex datasets. ARMDiaRD confirms that combining local, global, and hierarchical features is highly effective in preventing the overfitting issues seen in existing architectures, such as CNNs, when tested on new data for reliable clinical image analysis.
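Because DR grading is an ordinal task, the abstract reports QWK, the Spearman rank correlation coefficient, and MAE alongside the usual classification metrics. The sketch below shows one common way these three ordinal metrics are computed for 5-grade DR labels; the label and prediction arrays are hypothetical illustrations, not data from the paper.

```python
import numpy as np
from scipy.stats import spearmanr


def quadratic_weighted_kappa(y_true, y_pred, n_classes=5):
    """QWK: chance-corrected agreement that penalises larger
    grade disagreements quadratically."""
    # Observed confusion matrix
    O = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1
    # Quadratic disagreement weights: w[i, j] = (i - j)^2 / (N - 1)^2
    w = np.array([[(i - j) ** 2 for j in range(n_classes)]
                  for i in range(n_classes)]) / (n_classes - 1) ** 2
    # Expected confusion matrix under independence of the marginals
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    return 1.0 - (w * O).sum() / (w * E).sum()


# Hypothetical 5-grade DR labels (0 = no DR ... 4 = proliferative DR)
y_true = np.array([0, 1, 2, 3, 4, 2, 1, 0])
y_pred = np.array([0, 1, 2, 2, 4, 3, 1, 1])

qwk = quadratic_weighted_kappa(y_true, y_pred)
mae = np.abs(y_true - y_pred).mean()       # mean absolute grade error
rho, _ = spearmanr(y_true, y_pred)         # rank correlation
print(f"QWK={qwk:.3f}  MAE={mae:.3f}  Spearman={rho:.3f}")
```

Unlike plain accuracy, QWK and MAE distinguish a near-miss (predicting grade 2 for grade 3) from a gross error (predicting grade 0 for grade 4), which is why the abstract highlights the low MAE of 0.1736 as clinically meaningful.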

Language: English
Submitted on: Aug 22, 2025 | Published on: Feb 20, 2026
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 J. Dhiviya Rose, Ved Prakash Bhardwaj, published by Professor Subhas Chandra Mukhopadhyay
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.