Abstract
Federated Learning (FL) has emerged as a key paradigm in machine learning, but its performance often deteriorates when client data are not independent and identically distributed (non-IID). Such heterogeneity frequently reflects geographic factors, such as regional linguistic variations or localized traffic patterns, so that data are IID within a region but non-IID across regions. However, existing FL algorithms are typically evaluated by randomly splitting non-IID data across devices, disregarding how those devices are distributed across regions.
To address this gap, we introduce PROFED, a benchmark that simulates data splits with varying degrees of skewness across regions. We incorporate several skewness methods from the literature and apply them to well-known datasets, including MNIST, FashionMNIST, Extended MNIST, CIFAR-10, CIFAR-100, and UTKFace. Our goal is to provide researchers with a standardized framework for evaluating FL algorithms more effectively and consistently against established baselines.
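To make the idea of region-level skew concrete, the following is a minimal sketch of one possible split. It assumes a Dirichlet label-skew method, one common skewness method in the FL literature, and is not necessarily the exact procedure PROFED uses. Samples of each class are divided across regions with Dirichlet(alpha) proportions, which makes regions non-IID. Each region's samples are then shuffled and dealt uniformly to that region's clients, which makes clients within a region IID. The function name `region_split` and its parameters are hypothetical.

```python
import numpy as np


def region_split(labels, num_regions, clients_per_region, alpha, seed=0):
    """Hypothetical helper: label skew across regions, IID split within each region.

    Returns a list of (region_id, sample_indices) pairs, one per client.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    region_indices = [[] for _ in range(num_regions)]

    # Across regions (non-IID): split each class's samples using
    # Dirichlet(alpha) proportions; smaller alpha gives stronger skew.
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        props = rng.dirichlet(alpha * np.ones(num_regions))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for r, part in enumerate(np.split(idx, cuts)):
            region_indices[r].extend(part.tolist())

    # Within each region (IID): shuffle, then deal samples evenly to clients.
    clients = []
    for r in range(num_regions):
        idx = rng.permutation(np.array(region_indices[r], dtype=int))
        for part in np.array_split(idx, clients_per_region):
            clients.append((r, part))
    return clients


# Usage: 1,000 samples over 10 classes, 3 regions with 4 clients each.
labels = np.repeat(np.arange(10), 100)
clients = region_split(labels, num_regions=3, clients_per_region=4, alpha=0.5)
```

Varying `alpha` is one simple way to control the degree of cross-region skewness. Large values approach a uniform split, and small values concentrate each class in a few regions. Clients in the same region share one label distribution, so only between-region heterogeneity remains.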
