ProFed: A Benchmark for Proximity-Based Non-IID Federated Learning

Davide Domini; Christian Otte Ingemann; Gianluca Aguzzi; Lukas Esterle; Mirko Viroli

doi:10.5334/jors.624

Figures & Tables

Figure 1

Spatial data distribution: data are homogeneous within each subregion and non-IID across subregions.
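The caption describes devices grouped by location into subregions. The following is a minimal Python sketch of one hypothetical way to do that grouping. The strip-based geometry and the function name `assign_subregions` are illustrative assumptions, not ProFed's API.

```python
import random


def assign_subregions(positions, width, num_subregions):
    """Map each (x, y) device position to a subregion index.

    Hypothetical geometry: the area [0, width) is cut into equal-width
    vertical strips, one strip per subregion.
    """
    strip = width / num_subregions
    assignment = []
    for x, _y in positions:
        # Clamp so a device lying exactly on the right edge stays in range.
        assignment.append(min(int(x // strip), num_subregions - 1))
    return assignment


if __name__ == "__main__":
    rng = random.Random(0)
    devices = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(20)]
    print(assign_subregions(devices, width=100.0, num_subregions=5))
```

Every device that falls in the same strip then draws from the same subregion-level data pool, which is what makes the data homogeneous inside a subregion.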

Figure 2

Data distribution patterns across five subregions: **(a)** IID, **(b)** Dirichlet (non-IID), and **(c)** hard (highly non-IID). Each color represents a different subregion.
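The three patterns in this caption can be approximated with a short label-based partitioner. This is a hedged sketch, not ProFed's implementation: `partition` is a hypothetical helper; the Dirichlet branch uses the common per-class scheme (each class's samples are split according to proportions drawn from Dir(α), with a hypothetical default α = 0.5); and the hard branch assumes each subregion receives a disjoint subset of classes.

```python
import random
from collections import defaultdict


def _split(items, k):
    """Split a list into k contiguous chunks whose sizes differ by at most one."""
    q, r = divmod(len(items), k)
    chunks, start = [], 0
    for i in range(k):
        end = start + q + (1 if i < r else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def partition(labels, k, method="iid", alpha=0.5, seed=0):
    """Return k lists of sample indices, one list per subregion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    if method == "iid":
        idx = list(range(len(labels)))
        rng.shuffle(idx)
        return _split(idx, k)

    parts = [[] for _ in range(k)]
    if method == "dirichlet":
        for idx in by_class.values():
            rng.shuffle(idx)
            # Dirichlet(alpha, ..., alpha) sample via normalized Gamma draws.
            g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
            total = sum(g)
            cuts, acc = [], 0.0
            for w in g[:-1]:
                acc += w / total
                cuts.append(int(acc * len(idx)))
            bounds = [0] + cuts + [len(idx)]
            for s in range(k):
                parts[s].extend(idx[bounds[s]:bounds[s + 1]])
        return parts

    if method == "hard":
        classes = sorted(by_class)
        rng.shuffle(classes)
        # Each subregion owns a disjoint group of classes.
        for s, group in enumerate(_split(classes, k)):
            for y in group:
                parts[s].extend(by_class[y])
        return parts

    raise ValueError(f"unknown method: {method}")
```

Smaller values of `alpha` concentrate each class in fewer subregions, moving the Dirichlet case closer to the hard case.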

Table 1

Summary of the datasets included in the benchmark. The first five datasets are used for classification, with targets corresponding to discrete classes. The last dataset, UTKFace, is used for regression, with targets spanning a continuous range.

DATASET	TRAINING SIZE	TEST SIZE	FEATURES	TARGETS (CLASSES OR RANGE)
MNIST	60,000	10,000	784	10
Fashion MNIST	60,000	10,000	784	10
EMNIST	124,800	20,800	784	27
CIFAR-10	50,000	10,000	3,072	10
CIFAR-100	50,000	10,000	3,072	100
UTKFace	20,150	3,557	120,000	[1;116]
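The FEATURES column in Table 1 is consistent with counting the flattened pixel values of one image. This assumes the standard image sizes: 28×28 grayscale for the MNIST family, 32×32 RGB for CIFAR, and 200×200 RGB for the aligned and cropped UTKFace images.

```latex
\begin{aligned}
\text{MNIST, Fashion MNIST, EMNIST:} &\quad 28 \times 28 \times 1 = 784 \\
\text{CIFAR-10, CIFAR-100:} &\quad 32 \times 32 \times 3 = 3{,}072 \\
\text{UTKFace:} &\quad 200 \times 200 \times 3 = 120{,}000
\end{aligned}
```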

Listing 1

An example of how ProFed is used to partition the EMNIST dataset among devices.
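The last step of such a partition can be illustrated with a hedged sketch. This is not ProFed's API. The hypothetical helper `distribute_to_devices` takes subregion-level index lists and a device-to-subregion map, then gives every device an IID share of its subregion's samples. This keeps the data homogeneous within a subregion.

```python
import random
from collections import defaultdict


def distribute_to_devices(subregion_parts, device_subregion, seed=0):
    """Give each device an IID share of its subregion's sample indices.

    subregion_parts[s] lists the sample indices owned by subregion s;
    device_subregion[d] is the subregion of device d.
    Returns a dict mapping device id -> list of sample indices.
    """
    rng = random.Random(seed)
    members = defaultdict(list)
    for device, s in enumerate(device_subregion):
        members[s].append(device)
    shards = {}
    for s, devices in members.items():
        idx = list(subregion_parts[s])
        rng.shuffle(idx)  # shuffle so every device draws from the same mix
        for j, device in enumerate(devices):
            # Round-robin slicing: device j takes every len(devices)-th sample.
            shards[device] = idx[j::len(devices)]
    return shards
```

In this sketch, samples belonging to a subregion that contains no devices are simply left unassigned.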

Figure 3

Validation accuracy on MNIST, Fashion MNIST, and EMNIST under Dirichlet and hard partitioning.

Table 2

Test-set results of each algorithm under each partitioning method (✗: no result reported).

Algorithm	IID	Dirichlet	Hard
FedAvg	0.95 ± 0.001	0.9 ± 0.04	0.81 ± 0.01
FedProx	✗	0.886 ± 0.04	0.86 ± 0.01
Scaffold	✗	0.889 ± 0.06	0.81 ± 0.01
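The three algorithms in Table 2 differ mainly in how clients train locally. The standard formulations are summarized here for reference; they are not taken from ProFed's code. The notation is as follows: $K$ clients, $n_k$ samples on client $k$, $n = \sum_k n_k$, global model $w^t$ at round $t$, local loss $F_k$, proximal coefficient $\mu$, local learning rate $\eta_l$, stochastic gradient $g_k$, and client and server control variates $c_k$ and $c$.

```latex
\begin{aligned}
\text{FedAvg aggregation:}\quad & w^{t+1} = \sum_{k=1}^{K} \frac{n_k}{n}\, w_k^{t+1} \\
\text{FedProx local objective:}\quad & \min_{w}\; F_k(w) + \frac{\mu}{2}\,\lVert w - w^{t} \rVert^2 \\
\text{Scaffold local step:}\quad & y_k \leftarrow y_k - \eta_l \big( g_k(y_k) - c_k + c \big)
\end{aligned}
```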
