Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Christoph Miksovic

Bootstrapping Learned Cost Models with Synthetic SQL Queries

Aug 27, 2025

Michael Nidd, Christoph Miksovic, Thomas Gschwind, Francesco Fusco, Andrea Giovannini, Ioana Giurgiu

Abstract:Having access to realistic workloads for a given database instance is extremely important to enable stress and vulnerability testing, as well as to optimize for cost and performance. Recent advances in learned cost models have shown that when enough diverse SQL queries are available, one can effectively and efficiently predict the cost of running a given query against a specific database engine. In this paper, we describe our experience in exploiting modern synthetic data generation techniques, inspired by the generative AI and LLM community, to create high-quality datasets enabling the effective training of such learned cost models. Initial results show that we can improve a learned cost model's predictive accuracy by training it with 45% fewer queries than when using competitive generation approaches.

Via

Access Paper or Ask Questions

Business Entity Matching with Siamese Graph Convolutional Networks

May 08, 2021

Evgeny Krivosheev, Mattia Atzeni, Katsiaryna Mirylenka, Paolo Scotton, Christoph Miksovic, Anton Zorin

Figure 1 for Business Entity Matching with Siamese Graph Convolutional Networks

Figure 2 for Business Entity Matching with Siamese Graph Convolutional Networks

Abstract:Data integration has been studied extensively for decades and approached from different angles. However, this domain still remains largely rule-driven and lacks universal automation. Recent developments in machine learning and in particular deep learning have opened the way to more general and efficient solutions to data-integration tasks. In this paper, we demonstrate an approach that allows modeling and integrating entities by leveraging their relations and contextual information. This is achieved by combining siamese and graph neural networks to effectively propagate information between connected entities and support high scalability. We evaluated our approach on the task of integrating data about business entities, demonstrating that it outperforms both traditional rule-based systems and other deep learning approaches.

Via

Access Paper or Ask Questions

Knowledge Graph Embedding using Graph Convolutional Networks with Relation-Aware Attention

Feb 14, 2021

Nasrullah Sheikh, Xiao Qin, Berthold Reinwald, Christoph Miksovic, Thomas Gschwind, Paolo Scotton

Figure 1 for Knowledge Graph Embedding using Graph Convolutional Networks with Relation-Aware Attention

Figure 2 for Knowledge Graph Embedding using Graph Convolutional Networks with Relation-Aware Attention

Figure 3 for Knowledge Graph Embedding using Graph Convolutional Networks with Relation-Aware Attention

Figure 4 for Knowledge Graph Embedding using Graph Convolutional Networks with Relation-Aware Attention

Abstract:Knowledge graph embedding methods learn embeddings of entities and relations in a low dimensional space which can be used for various downstream machine learning tasks such as link prediction and entity matching. Various graph convolutional network methods have been proposed which use different types of information to learn the features of entities and relations. However, these methods assign the same weight (importance) to the neighbors when aggregating the information, ignoring the role of different relations with the neighboring entities. To this end, we propose a relation-aware graph attention model that leverages relation information to compute different weights to the neighboring nodes for learning embeddings of entities and relations. We evaluate our proposed approach on link prediction and entity matching tasks. Our experimental results on link prediction on three datasets (one proprietary and two public) and results on unsupervised entity matching on one proprietary dataset demonstrate the effectiveness of the relation-aware attention.

Via

Access Paper or Ask Questions