Summary of major fundus image datasets relevant to the ICDR grading system
| Dataset | Year | Images | Grade | Resolution | Country | Strengths | Limitations |
|---|---|---|---|---|---|---|---|
| EyePACS [19] | 2015 | 88,702 | 5 | Varying dimensions | USA | Large-scale dataset | Class imbalance |
| IDRiD [21] | 2018 | 516 | 5 | 4,288 × 2,848 | India | Classification and segmentation | Small size, limited masks |
| APTOS [20] | 2019 | 3,662 | 5 | 512 × 512 | Asia-Pacific | Clean, uniform images | Small dataset, class imbalance |
| FGADR [22] | 2020 | 1,842 | 5 | 2,048 × 3,072 | China | Balanced dataset | Moderate size, population-specific |
| MESSIDOR [23] | 2008 | 1,200 | 4 | Varying dimensions | France | Good for benchmarking | Uses only 4 grades |
| E-Ophtha [24] | 2014 | 611 | Not graded | 2,544 × 1,696 | France | Early DR detection | No severity grading |
| DDR | 2020 | 13,673 | 5 | High-resolution | China | Large-scale, diverse images | Variable image quality |
Literature review of various deep learning techniques for DR detection
| Year | Model used | Dataset | Accuracy (%) | Key contributions | Study finding |
|---|---|---|---|---|---|
| 2023 [31] | ResNet50 | APTOS | 83.90 | Compared the model with VGG16, Xception, and AlexNet; ResNet50 outperformed them | Performance can be improved |
| 2021 [32] | Multi-scale attention network (MSA-Net) | EyePACS and APTOS | 84.40 | MSA-Net retrieves features at multiple scales | Performance can be improved |
| 2022 [33] | Inception-V3 | 89,947 images from various datasets | 99 | Robust binary classification for fundus image quality | ICDR grading can be implemented |
| 2023 [34] | ResNet50 | APTOS | 83.90 | Compared the model with VGG16, Xception, and AlexNet; ResNet50 outperformed them | Performance can be improved |
| 2023 [35] | Inception-V3 | APTOS | 98.7 | Image enhancement using CLAHE | Better performance with preprocessing |
| 2023 [36] | DenseNet121 | APTOS | 97.30 | Combines VGG16, XGBoost, and DenseNet121; highlights overfitting risk | Overfitting due to class 0 dominance |
| 2023 [37] | SqueezeNet, Darknet-53, EfficientNet-B0 | ODIR | 95, 99.4, 90 | Multi-classification: normal, glaucoma, cataract | ICDR DR grading can be extended |
| 2024 [38] | DeepDR Plus (ResNet-50 + self-attention) | 83,500 images from various datasets | ∼84.6 | Predicts DR progression time; supports personalized screening | Time prediction solved; classification remains open |
| 2025 [39] | Inception ResNet V2, MobileNet, Residual Net | 645 clinical images | 93 | Binary classification of eye diseases | DR grading can be implemented |
| 2025 [40] | Inception-ResNet-v2 + GRU | APTOS | 98.00 | FFO fine-tunes GRU | Optimization improves accuracy |
Hyperparameters obtained using the IMPO optimizer for training the proposed ARMDiaRD
| Sl. No. | Hyperparameter | Specification |
|---|---|---|
| 01 | Learning rate | 0.00039 |
| 02 | Number of epochs | 40 |
| 03 | Batch size | 32 |
| 04 | Dropout | 0.428 |
| 05 | Embed dim | 80 |
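The tuned values above translate directly into a training configuration. A minimal sketch follows; the dictionary keys and the `steps_per_epoch` helper are illustrative (the paper does not specify the training framework), and the 28,000-image training set size is taken from the agglomerated-dataset summary:

```python
# IMPO-tuned hyperparameters from the table above.
IMPO_HYPERPARAMS = {
    "learning_rate": 0.00039,
    "num_epochs": 40,
    "batch_size": 32,
    "dropout": 0.428,
    "embed_dim": 80,
}

def steps_per_epoch(num_train_images: int, batch_size: int) -> int:
    """Optimizer steps per epoch, assuming incomplete batches are dropped."""
    return num_train_images // batch_size

# With the 28,000-image balanced training set:
steps = steps_per_epoch(28_000, IMPO_HYPERPARAMS["batch_size"])
total_steps = steps * IMPO_HYPERPARAMS["num_epochs"]
```

With a batch size of 32 this gives 875 steps per epoch, or 35,000 optimizer steps over the 40-epoch run.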
Class-wise image summary after the dataset agglomeration process used for ARMDiaDR system experimentation
| Dataset | No DR | Mild NPDR | Moderate NPDR | Severe NPDR | PDR | Total Images | Dimensions |
|---|---|---|---|---|---|---|---|
| APTOS [20] | 1,805 | 370 | 999 | 193 | 295 | 3,662 | Mixed |
| IDRiD [21] | 168 | 25 | 168 | 93 | 62 | 516 | 4,288 × 2,848 |
| EyePACS [19] | 25,802 | 2,438 | 5,288 | 872 | 708 | 35,108 | Mixed |
| MESSIDOR-V1 [23] | 546 | 153 | 247 | 254 | – | 1,200 | 2,240 × 1,488 |
| MESSIDOR-V2 [46] | 1,017 | 270 | 347 | 75 | 35 | 1,744 | 2,240 × 1,488 |
| Training Dataset | 5,600 | 5,600 | 5,600 | 5,600 | 5,600 | 28,000 | 224 × 224 |
| Testing Dataset | 1,400 | 1,400 | 1,400 | 1,400 | 1,400 | 7,000 | 224 × 224 |
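The agglomeration step resamples the pooled images into balanced pools of 5,600 training and 1,400 testing images per ICDR grade. A minimal sketch of such a per-class split is shown below; the function name is illustrative, and oversampling minority grades with replacement is an assumption (the paper does not state its exact augmentation strategy):

```python
import random

def balanced_split(images_by_class, n_train=5600, n_test=1400, seed=42):
    """Resample each ICDR grade into balanced train/test pools.

    `images_by_class` maps each grade (0-4) to a list of image IDs drawn
    from the agglomerated datasets. Grades with fewer than n_train + n_test
    images are oversampled with replacement (an assumption for this sketch).
    """
    rng = random.Random(seed)
    train, test = {}, {}
    for grade, pool in images_by_class.items():
        need = n_train + n_test
        if len(pool) >= need:
            sample = rng.sample(pool, need)          # without replacement
        else:
            sample = [rng.choice(pool) for _ in range(need)]  # with replacement
        train[grade] = sample[:n_train]
        test[grade] = sample[n_train:]
    return train, test
```

Applied to the five grades this yields the 28,000/7,000 train/test totals reported in the table.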
Evaluation performance metrics used
| Performance Metric | Mathematical Expression | Description |
|---|---|---|
| Accuracy | (TP + TN) / (TP + TN + FP + FN) | Provides a high-level overview of model performance. |
| Recall | TP / (TP + FN) | Measures how many actual positive instances the model correctly identified. |
| Precision | TP / (TP + FP) | Measures the accuracy of positive predictions. |
| F1-Score | 2 × (Precision × Recall) / (Precision + Recall) | Harmonic mean of precision and recall; balances false positives and false negatives. |
| QWK | κ = 1 − (Σᵢⱼ wᵢⱼ Oᵢⱼ) / (Σᵢⱼ wᵢⱼ Eᵢⱼ), wᵢⱼ = (i − j)² / (N − 1)² | Measures agreement between two raters, penalizing bigger disagreements quadratically. |
| MAE | (1/n) Σᵢ \|yᵢ − ŷᵢ\| | Measures the average magnitude of the errors between predicted and actual values. |
| MSE | (1/n) Σᵢ (yᵢ − ŷᵢ)² | Similar to MAE but penalizes larger errors more by squaring them. |
| Spearman Correlation | ρ = 1 − 6 Σᵢ dᵢ² / (n(n² − 1)) | Measures strength and direction of a monotonic relationship in ranked or ordinal data. |
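Because ICDR grades are ordinal, the agreement and error metrics (QWK, MAE, MSE) matter most for grading models. A minimal sketch of these three, assuming integer grades 0–4 (function names are illustrative):

```python
import numpy as np

def qwk(y_true, y_pred, n_classes=5):
    """Quadratic Weighted Kappa for ordinal DR grades 0..n_classes-1."""
    O = np.zeros((n_classes, n_classes))          # observed confusion matrix
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1
    # Quadratic disagreement weights, normalized to [0, 1].
    w = np.array([[(i - j) ** 2 for j in range(n_classes)]
                  for i in range(n_classes)], dtype=float)
    w /= (n_classes - 1) ** 2
    # Expected matrix under rater independence (outer product of marginals).
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    return 1.0 - (w * O).sum() / (w * E).sum()

def mae(y_true, y_pred):
    """Mean absolute error between predicted and actual grades."""
    return float(np.mean(np.abs(np.array(y_true) - np.array(y_pred))))

def mse(y_true, y_pred):
    """Mean squared error; penalizes large grade disagreements more."""
    return float(np.mean((np.array(y_true) - np.array(y_pred)) ** 2))
```

Perfect agreement gives QWK = 1, chance-level agreement gives QWK ≈ 0, and the quadratic weights make a No-DR/PDR confusion cost far more than an adjacent-grade slip.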