Benchmarking Tabular Data Synthesis: Evaluating Tools, Metrics, and Datasets on Prosumer Hardware for End-Users

Open Access
Dec 2025

Abstract

Synthetic data is a useful solution when real data is scarce or private. It supports reproducible experimentation, privacy-preserving data sharing, data re-purposing, and robust evaluation of data systems. This study presents a benchmark for tabular data synthesis (TDS) tools that evaluates their performance across six dimensions: handling dataset imbalance, dataset augmentation, handling missing values, privacy, machine learning (ML) utility, and computational performance. We assessed 13 tools on 15 datasets drawn from different use cases, focusing on prosumer hardware configurations available to end-users. The results highlight the trade-offs among the various TDS models. Sampling-based tools such as SMOTE excelled at handling imbalance and at efficiency but lacked privacy protection and sample variability. Hybrid and Transformer models performed strongly across most dimensions but required substantial computational resources. Diffusion models achieved high scores but were complex to configure, while Bayesian Networks offered efficiency and privacy at the cost of limited utility. The study also emphasizes non-functional considerations such as runtime, resource efficiency, and configuration challenges. Our findings provide practical insights to guide tool selection for specific use cases and constraints. The source code and data are available in the GitHub repository.
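The abstract reports that SMOTE handles imbalance well but lacks privacy and variability. A minimal sketch of the classic SMOTE interpolation step helps explain why. It assumes purely numeric features and Euclidean distance. The function name `smote` and its parameters are illustrative only. They are not the API of imbalanced-learn or of any tool benchmarked in the study.

Every synthetic row is a random point on the segment between a real minority row and one of its nearest minority neighbours. As a result, outputs can sit very close to real records, which weakens privacy. They also never leave the region the real minority data already spans, which limits variability.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Minimal SMOTE-style oversampling sketch (illustrative, not a library API).

    Assumes every row in `minority` is a list of numeric features.
    Each synthetic row is base + gap * (neighbour - base), with gap in [0, 1),
    so it lies on the segment between two real minority rows.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)          # seeded for reproducible output
    k = min(k, len(minority) - 1)      # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to `base`.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])   # one of the k nearest neighbours
        gap = rng.random()                   # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
    for row in smote(minority, n_synthetic=3, k=2):
        print(row)
```

In practice a maintained implementation such as imbalanced-learn's `SMOTE` would normally be used. The sketch only shows the interpolation idea that limits both privacy and variability.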

Language: English
Submitted on: Jul 7, 2025
Accepted on: Nov 19, 2025
Published on: Dec 9, 2025
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Maria Fernanda Davila Restrepo, Benjamin Wollmer, Fabian Panse, Wolfram Wingerath, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.