GOGGLE: Generative modelling for tabular data by learning relational structure

T Liu, Z Qian, J Berrevoets… - … Conference on Learning …, 2023 - openreview.net
The Eleventh International Conference on Learning Representations, 2023
Deep generative models learn highly complex and non-linear representations to generate realistic synthetic data. While they have achieved notable success in computer vision and natural language processing, similar advances have been less demonstrable in the tabular domain. This is partially because generative modelling of tabular data entails a particular set of challenges, including heterogeneous relationships, a limited number of samples, and difficulties in incorporating prior knowledge. Additionally, unlike their counterparts in the image and sequence domains, deep generative models for tabular data almost exclusively employ fully-connected layers, which encode weak inductive biases about relationships between inputs. Real-world data generating processes can often be represented using relational structures, which encode sparse, heterogeneous relationships between variables. In this work, we learn and exploit the relational structure underlying tabular data to better model variable dependence, and as a natural means to introduce regularization on relationships and include prior knowledge. Specifically, we introduce GOGGLE, an end-to-end message passing scheme that jointly learns the relational structure and corresponding functional relationships as the basis of generating synthetic samples. Using real-world datasets, we provide empirical evidence that the proposed method is effective in generating realistic synthetic data and exploiting domain knowledge for downstream tasks.
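
To make the idea of message passing over a learned relational structure more concrete, the following is a minimal illustrative sketch in plain Python. It is not the authors' implementation: the function names (soft_adjacency, message_pass, apply_prior, sparsity_penalty), the scalar per-variable state, the tanh update, and the L1 penalty are all assumptions chosen for illustration. Training, in which the edge logits and weights would be updated by gradient descent, is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def soft_adjacency(logits):
    """Map unconstrained logits to edge weights in (0, 1); no self-loops."""
    d = len(logits)
    return [[0.0 if i == j else sigmoid(logits[i][j]) for j in range(d)]
            for i in range(d)]


def apply_prior(adj, forbidden):
    """Encode prior knowledge as a mask: zero out edges known to be absent."""
    d = len(adj)
    return [[0.0 if (i, j) in forbidden else adj[i][j] for j in range(d)]
            for i in range(d)]


def message_pass(h, adj, w_self, w_neigh):
    """One round: each variable mixes its own state with the
    edge-weighted sum of the other variables' states."""
    d = len(h)
    out = []
    for i in range(d):
        neigh = sum(adj[i][j] * h[j] for j in range(d))
        out.append(math.tanh(w_self * h[i] + w_neigh * neigh))
    return out


def sparsity_penalty(adj):
    """L1 penalty on edge weights (hypothetical regularizer for a sparse graph)."""
    return sum(sum(row) for row in adj)


random.seed(0)
d = 4  # number of tabular variables (one graph node per column)

# In a real model these logits would be learnable parameters.
logits = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]
adj = apply_prior(soft_adjacency(logits), forbidden={(0, 3), (3, 0)})

h0 = [random.gauss(0.0, 1.0) for _ in range(d)]  # one scalar state per variable
h1 = message_pass(h0, adj, w_self=0.5, w_neigh=0.5)
penalty = sparsity_penalty(adj)
```

In this sketch the graph is held as soft edge weights rather than a hard 0/1 matrix, so that in a trained version both the structure (logits) and the update weights could be optimized jointly; the mask shows one simple way prior knowledge about absent relationships might be injected.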