
Figure 1
Spatial data distribution: homogeneous within subregions, non-IID across subregions.

Figure 2
Data distribution patterns across five subregions: (a) IID, (b) Dirichlet (non-IID), and (c) hard (highly non-IID). Each color represents a different subregion.
Table 1
Summary of the characteristics of the datasets included in the benchmark. The first five datasets are designed for classification tasks, with target values corresponding to discrete classes. In contrast, the last dataset is used for a regression task, where the target values span a continuous range.
| DATASET | TRAINING SIZE | TEST SIZE | FEATURES | TARGETS |
|---|---|---|---|---|
| MNIST | 60,000 | 10,000 | 784 | 10 |
| Fashion MNIST | 60,000 | 10,000 | 784 | 10 |
| EMNIST | 124,800 | 20,800 | 784 | 27 |
| CIFAR-10 | 50,000 | 10,000 | 3,072 | 10 |
| CIFAR-100 | 50,000 | 10,000 | 3,072 | 100 |
| UTKFace | 20,150 | 3,557 | 120,000 | [1;116] |

Listing 1
An example of how ProFed is used to partition the EMNIST dataset among devices.
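
As a library-agnostic illustration of the three partitioning schemes named in Figure 2, the following is a minimal NumPy sketch. It does not use or reproduce ProFed's API: the function name `partition_labels`, its parameters, and the default concentration `beta=0.5` are hypothetical and chosen only for this example. It assumes that "hard" means each class is assigned to exactly one subregion, and that "Dirichlet" means per-class proportions are drawn from a symmetric Dirichlet distribution.

```python
import numpy as np


def partition_labels(labels, n_regions=5, mode="dirichlet", beta=0.5, seed=0):
    """Split sample indices among subregions (hypothetical helper, not ProFed's API).

    Returns a list with one sorted index array per subregion.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes = np.unique(labels)

    if mode == "iid":
        # Shuffle all indices and split them evenly, ignoring the labels.
        perm = rng.permutation(len(labels))
        return [np.sort(part) for part in np.array_split(perm, n_regions)]

    regions = [[] for _ in range(n_regions)]
    if mode == "hard":
        # Assumption: every class belongs to exactly one subregion (round-robin).
        for k, c in enumerate(classes):
            regions[k % n_regions].extend(np.flatnonzero(labels == c))
    elif mode == "dirichlet":
        # For each class, draw subregion proportions from Dir(beta, ..., beta).
        # A smaller beta gives a more skewed (more non-IID) split.
        for c in classes:
            idx = rng.permutation(np.flatnonzero(labels == c))
            props = rng.dirichlet(beta * np.ones(n_regions))
            cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
            for r, chunk in enumerate(np.split(idx, cuts)):
                regions[r].extend(chunk)
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    return [np.sort(np.array(r, dtype=int)) for r in regions]


# Toy labels standing in for a real dataset: 10 classes, 100 samples each.
toy_labels = np.repeat(np.arange(10), 100)
for mode in ("iid", "dirichlet", "hard"):
    parts = partition_labels(toy_labels, n_regions=5, mode=mode)
    print(mode, [len(p) for p in parts])
```

In this sketch every sample is assigned to exactly one subregion in all three modes. Distributing each subregion's indices among the devices located in it, as described for Figure 1, would be a separate step that is not shown here.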

Figure 3
Validation accuracy on the MNIST, Fashion MNIST, and EMNIST datasets under Dirichlet and hard partitioning.
