Evaluating the quality of synthetic data remains a key challenge for ensuring privacy and utility in data-driven research. In this work, we present an evaluation framework that quantifies how well synthetic data replicates the distributional properties of the original data while preserving privacy. The proposed approach employs a holdout-based benchmarking strategy that facilitates quantitative assessment through low- and high-dimensional distribution comparisons, embedding-based similarity measures, and nearest-neighbor distance metrics. The framework supports various data types and structures, including sequential and contextual information, and enables interpretable quality diagnostics through a set of standardized metrics. These contributions aim to support reproducibility and methodological consistency in the benchmarking of synthetic data generation techniques. The code of the framework is available at https://github.com/mostly-ai/mostlyai-qa.
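As a concrete illustration of the holdout-based nearest-neighbor idea, the sketch below (not the framework's actual implementation; the `dcr_share` helper and the scikit-learn approach are illustrative assumptions) compares each synthetic record's distance to its closest training record against its distance to its closest holdout record. If the generator generalizes rather than memorizes, synthetic records should fall closer to the training set roughly as often as to the holdout set, i.e. a share near 0.5.

```python
# Minimal sketch of a holdout-based nearest-neighbor privacy check.
# All names here are illustrative, not the framework's API.
import numpy as np
from sklearn.neighbors import NearestNeighbors


def dcr_share(synthetic: np.ndarray, training: np.ndarray, holdout: np.ndarray) -> float:
    """Fraction of synthetic records whose nearest neighbor lies in the
    training set rather than the holdout set. With an equally sized
    holdout, values close to 0.5 suggest no memorization of training records."""
    # Distance from each synthetic record to its closest training record
    d_trn, _ = NearestNeighbors(n_neighbors=1).fit(training).kneighbors(synthetic)
    # Distance from each synthetic record to its closest holdout record
    d_hol, _ = NearestNeighbors(n_neighbors=1).fit(holdout).kneighbors(synthetic)
    closer_to_train = d_trn[:, 0] < d_hol[:, 0]
    ties = d_trn[:, 0] == d_hol[:, 0]
    # Split exact ties evenly between the two sets
    return (closer_to_train.sum() + 0.5 * ties.sum()) / len(synthetic)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    original = rng.normal(size=(2000, 5))
    train, holdout = original[:1000], original[1000:]  # 50/50 holdout split
    synthetic = rng.normal(size=(1000, 5))             # stand-in for generated data
    print(f"DCR share: {dcr_share(synthetic, train, holdout):.3f}")  # ~0.5 expected
```

A share markedly above 0.5 would indicate synthetic records sitting systematically closer to training data than plausible unseen data, a signal of potential privacy leakage under this benchmarking strategy.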