Wilhelm Ågren

Hierarchical Conditional Tabular GAN for Multi-Tabular Synthetic Data Generation

Nov 11, 2024

Wilhelm Ågren, Victorio Úbeda Sosa

Abstract: Synthetic data generation is a state-of-the-art approach when access to real data is limited or when privacy regulations restrict the use of sensitive data. A fair amount of research has addressed synthetic data generation for single-tabular datasets, but only a limited amount has addressed multi-tabular datasets with complex table relationships. In this paper we propose the HCTGAN algorithm to synthesize multi-tabular data from complex multi-tabular datasets, and we compare our results to the probabilistic model HMA1. Our findings show that the proposed algorithm can sample large amounts of synthetic data more efficiently for deep and complex multi-tabular datasets, while achieving adequate data quality and always guaranteeing referential integrity. We conclude that HCTGAN is suitable for efficiently generating large amounts of synthetic data for deep multi-tabular datasets with complex relationships. We additionally suggest that the HMA1 model be used on smaller datasets when the emphasis is on data quality.

The NT-Xent loss upper bound

May 06, 2022

Wilhelm Ågren

Abstract: Self-supervised learning is a growing paradigm in deep representation learning, showing great generalization capabilities and competitive performance in low-labeled data regimes. The SimCLR framework proposes the NT-Xent loss for contrastive representation learning. The objective of the loss function is to maximize agreement (similarity) between sampled positive pairs. This short paper derives and proposes an upper bound for the loss and the average similarity. An analysis of the implications is not provided, however, and we strongly encourage anyone in the field to conduct one.
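
For readers unfamiliar with the loss named in the abstract, the following is a minimal NumPy sketch of NT-Xent as it is commonly formulated for SimCLR: cosine similarities scaled by a temperature, with each sample's positive partner scored against all other samples in the batch. It does not reproduce the upper bound derived in the paper. The function name `nt_xent_loss` and the default temperature of 0.5 are assumptions made for illustration only.

```python
import numpy as np


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """Sketch of NT-Xent for N positive pairs (z_a[i], z_b[i]).

    z_a, z_b: arrays of shape (N, d) holding the two views' embeddings.
    The name and the default temperature are illustrative assumptions.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)            # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit norm: dot product = cosine
    sim = (z @ z.T) / temperature                     # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                    # a sample is never its own negative

    # Row i's positive is its partner in the other view.
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-sum-exp over each row (the softmax denominator).
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))

    log_prob_pos = sim[np.arange(2 * n), pos_idx] - log_denom
    return -log_prob_pos.mean()


# Usage: four mutually orthogonal embeddings whose two views are identical.
views = np.eye(4)
loss_aligned = nt_xent_loss(views, views)
# Pairing each embedding with a different one should give a higher loss.
loss_mismatched = nt_xent_loss(views, np.roll(views, 1, axis=0))
```

Because the positive pair's similarity appears in both the numerator and the denominator of the softmax, the loss decreases as positive pairs align and as negatives become dissimilar, which is the "maximize agreement" objective the abstract describes.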
