Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Milad Hoseinpour

Domain-Constrained Diffusion Models to Synthesize Tabular Data: A Case Study in Power Systems

Jun 12, 2025

Milad Hoseinpour, Vladimir Dvorkin

Abstract:Growing concerns over privacy, security, and legal barriers are driving the rising demand for synthetic data across domains such as healthcare, finance, and energy. While generative models offer a promising solution to overcome these barriers, their utility depends on the incorporation of domain-specific knowledge. We propose to synthesize data using a guided diffusion model that integrates domain constraints directly into the generative process. We develop the model in the context of power systems, with potential applicability to other domains that involve tabular data. Specifically, we synthesize statistically representative and high-fidelity power flow datasets. To satisfy domain constraints, e.g., Kirchhoff laws, we introduce a gradient-based guidance to steer the sampling trajectory in a feasible direction. Numerical results demonstrate the effectiveness of our approach.

* 9 pages, 6 figures, conference

Via

Access Paper or Ask Questions

Accuracy Amplification in Differentially Private Logistic Regression: A Pre-Training Approach

Jul 25, 2023

Mohammad Hoseinpour, Milad Hoseinpour, Ali Aghagolzadeh

Abstract:Machine learning (ML) models can memorize training datasets. As a result, training ML models over private datasets can violate the privacy of individuals. Differential privacy (DP) is a rigorous privacy notion to preserve the privacy of underlying training datasets in ML models. Yet, training ML models in a DP framework usually degrades the accuracy of ML models. This paper aims to boost the accuracy of a DP-ML model, specifically a logistic regression model, via a pre-training module. In more detail, we initially pre-train our model on a public training dataset that there is no privacy concern about it. Then, we fine-tune our model via the DP logistic regression with the private dataset. In the numerical results, we show that adding a pre-training module significantly improves the accuracy of the DP logistic regression.

Via

Access Paper or Ask Questions