Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Liudmila Prokhorenkova

Relevance-Based Embeddings: Lightweight Candidate Retrieval via Heavy-Ranker Calls

Jul 03, 2026

Kirill Shevkunov, Andrey Ploskonosov, Liudmila Prokhorenkova

Abstract:In many machine learning applications, the most relevant items for a query should be efficiently retrieved. The relevance function is usually an expensive similarity model, making the exhaustive search infeasible. A typical solution is to train another model that separately embeds queries and items to a vector space, where similarity is defined via the dot product or cosine similarity. This allows one to search the relevant items through fast approximate nearest neighbor search at the cost of some reduction in quality. To compensate for this reduction, the found items (candidates) are re-ranked by the expensive ranking model. In this paper, we investigate an alternative approach to candidate selection that utilizes the scores of the expensive model to improve the representations of queries and items. The idea is to describe each query (item) by its relevance to a set of support items (queries) and use these new representations to obtain query (item) embeddings. We theoretically prove that such embeddings are powerful enough to approximate any complex similarity model (under mild conditions). We also investigate the choice of support items, which is a crucial ingredient of the proposed approach. The experiments on diverse academic and production datasets illustrate the power of our method.

Via

Access Paper or Ask Questions

A Fair Evaluation of Graph Foundation Models for Node Property Prediction

Jun 23, 2026

Oleg Platonov, Gleb Bazhenov, Dmitry Eremeev, Liudmila Prokhorenkova

Abstract:Due to the wide use of graph-structured data in different fields of industry and science, the development of Graph Foundation Models (GFMs) has recently attracted a lot of attention. While many different types of models are called GFMs, particular interest has been paid to GFMs designed for node property prediction tasks, which is one of the most popular settings in Graph ML with lots of real-world applications from fraud detection in financial and social networks to recommendation systems for e-commerce and user-generated content platforms. While a number of GFMs for this task have been recently proposed, the field has not converged to a unified evaluation setting, and different works evaluate their models in widely different ways, preventing reliable comparison of GFMs with each other and with other types of models. In this work, we conduct a fair and rigorous reevaluation of 9 recent GFMs for node property prediction, comparing them to strong Graph Neural Network (GNN) baselines. We find that, among these GFMs, only the most recent ones based on the Prior-data Fitted Networks paradigm outperform well-tuned GNNs in predictive performance, although at a higher inference cost.

* Accepted at The Workshop on Graph Foundation Models at ICML 2026

Via

Access Paper or Ask Questions

Cluster Attention for Graph Machine Learning

Apr 08, 2026

Oleg Platonov, Liudmila Prokhorenkova

Abstract:Message Passing Neural Networks have recently become the most popular approach to graph machine learning tasks; however, their receptive field is limited by the number of message passing layers. To increase the receptive field, Graph Transformers with global attention have been proposed; however, global attention does not take into account the graph topology and thus lacks graph-structure-based inductive biases, which are typically very important for graph machine learning tasks. In this work, we propose an alternative approach: cluster attention (CLATT). We divide graph nodes into clusters with off-the-shelf graph community detection algorithms and let each node attend to all other nodes in each cluster. CLATT provides large receptive fields while still having strong graph-structure-based inductive biases. We show that augmenting Message Passing Neural Networks or Graph Transformers with CLATT significantly improves their performance on a wide range of graph datasets including datasets from the recently introduced GraphLand benchmark representing real-world applications of graph machine learning.

Via

Access Paper or Ask Questions

Fine-Grained Urban Traffic Forecasting on Metropolis-Scale Road Networks

Oct 02, 2025

Fedor Velikonivtsev, Oleg Platonov, Gleb Bazhenov, Liudmila Prokhorenkova

Abstract:Traffic forecasting on road networks is a complex task of significant practical importance that has recently attracted considerable attention from the machine learning community, with spatiotemporal graph neural networks (GNNs) becoming the most popular approach. The proper evaluation of traffic forecasting methods requires realistic datasets, but current publicly available benchmarks have significant drawbacks, including the absence of information about road connectivity for road graph construction, limited information about road properties, and a relatively small number of road segments that falls short of real-world applications. Further, current datasets mostly contain information about intercity highways with sparsely located sensors, while city road networks arguably present a more challenging forecasting task due to much denser roads and more complex urban traffic patterns. In this work, we provide a more complete, realistic, and challenging benchmark for traffic forecasting by releasing datasets representing the road networks of two major cities, with the largest containing almost 100,000 road segments (more than a 10-fold increase relative to existing datasets). Our datasets contain rich road features and provide fine-grained data about both traffic volume and traffic speed, allowing for building more holistic traffic forecasting systems. We show that most current implementations of neural spatiotemporal models for traffic forecasting have problems scaling to datasets of our size. To overcome this issue, we propose an alternative approach to neural traffic forecasting that uses a GNN without a dedicated module for temporal sequence processing, thus achieving much better scalability, while also demonstrating stronger forecasting performance. We hope our datasets and modeling insights will serve as a valuable resource for research in traffic forecasting.

Via

Access Paper or Ask Questions

Turning Tabular Foundation Models into Graph Foundation Models

Aug 28, 2025

Dmitry Eremeev, Gleb Bazhenov, Oleg Platonov, Artem Babenko, Liudmila Prokhorenkova

Abstract:While foundation models have revolutionized such fields as natural language processing and computer vision, their application and potential within graph machine learning remain largely unexplored. One of the key challenges in designing graph foundation models (GFMs) is handling diverse node features that can vary across different graph datasets. Although many works on GFMs have been focused exclusively on text-attributed graphs, the problem of handling arbitrary features of other types in GFMs has not been fully addressed. However, this problem is not unique to the graph domain, as it also arises in the field of machine learning for tabular data. In this work, motivated by the recent success of tabular foundation models like TabPFNv2, we propose G2T-FM, a simple graph foundation model that employs TabPFNv2 as a backbone. Specifically, G2T-FM augments the original node features with neighborhood feature aggregation, adds structural embeddings, and then applies TabPFNv2 to the constructed node representations. Even in a fully in-context regime, our model achieves strong results, significantly outperforming publicly available GFMs and performing on par with well-tuned GNNs trained from scratch. Moreover, after finetuning, G2T-FM surpasses well-tuned GNN baselines, highlighting the potential of the proposed approach. More broadly, our paper reveals a previously overlooked direction of utilizing tabular foundation models for graph machine learning tasks.

Via

Access Paper or Ask Questions

Revisiting Graph Homophily Measures

Dec 12, 2024

Mikhail Mironov, Liudmila Prokhorenkova

Figure 1 for Revisiting Graph Homophily Measures

Figure 2 for Revisiting Graph Homophily Measures

Figure 3 for Revisiting Graph Homophily Measures

Figure 4 for Revisiting Graph Homophily Measures

Abstract:Homophily is a graph property describing the tendency of edges to connect similar nodes. There are several measures used for assessing homophily but all are known to have certain drawbacks: in particular, they cannot be reliably used for comparing datasets with varying numbers of classes and class size balance. To show this, previous works on graph homophily suggested several properties desirable for a good homophily measure, also noting that no existing homophily measure has all these properties. Our paper addresses this issue by introducing a new homophily measure - unbiased homophily - that has all the desirable properties and thus can be reliably used across datasets with different label distributions. The proposed measure is suitable for undirected (and possibly weighted) graphs. We show both theoretically and via empirical examples that the existing homophily measures have serious drawbacks while unbiased homophily has a desirable behavior for the considered scenarios. Finally, when it comes to directed graphs, we prove that some desirable properties contradict each other and thus a measure satisfying all of them cannot exist.

* 22 pages, 3 figures, Learning on Graphs Conference 2024

Via

Access Paper or Ask Questions

Measuring Diversity: Axioms and Challenges

Oct 18, 2024

Mikhail Mironov, Liudmila Prokhorenkova

Abstract:The concept of diversity is widely used in various applications: from image or molecule generation to recommender systems. Thus, being able to properly measure diversity is important. This paper addresses the problem of quantifying diversity for a set of objects. First, we make a systematic review of existing diversity measures and explore their undesirable behavior in some cases. Based on this review, we formulate three desirable properties (axioms) of a reliable diversity measure: monotonicity, uniqueness, and continuity. We show that none of the existing measures has all three properties and thus these measures are not suitable for quantifying diversity. Then, we construct two examples of measures that have all the desirable properties, thus proving that the list of axioms is not self-contradicting. Unfortunately, the constructed examples are too computationally complex for practical use, thus we pose an open problem of constructing a diversity measure that has all the listed properties and can be computed in practice.

* 17 pages, 7 figures

Via

Access Paper or Ask Questions

Challenges of Generating Structurally Diverse Graphs

Sep 27, 2024

Fedor Velikonivtsev, Mikhail Mironov, Liudmila Prokhorenkova

Figure 1 for Challenges of Generating Structurally Diverse Graphs

Figure 2 for Challenges of Generating Structurally Diverse Graphs

Figure 3 for Challenges of Generating Structurally Diverse Graphs

Figure 4 for Challenges of Generating Structurally Diverse Graphs

Abstract:For many graph-related problems, it can be essential to have a set of structurally diverse graphs. For instance, such graphs can be used for testing graph algorithms or their neural approximations. However, to the best of our knowledge, the problem of generating structurally diverse graphs has not been explored in the literature. In this paper, we fill this gap. First, we discuss how to define diversity for a set of graphs, why this task is non-trivial, and how one can choose a proper diversity measure. Then, for a given diversity measure, we propose and compare several algorithms optimizing it: we consider approaches based on standard random graph models, local graph optimization, genetic algorithms, and neural generative models. We show that it is possible to significantly improve diversity over basic random graph generators. Additionally, our analysis of generated graphs allows us to better understand the properties of graph distances: depending on which diversity measure is used for optimization, the obtained graphs may possess very different structural properties which gives insights about the sensitivity of the graph distance underlying the diversity measure.

Via

Access Paper or Ask Questions

TabGraphs: A Benchmark and Strong Baselines for Learning on Graphs with Tabular Node Features

Sep 26, 2024

Gleb Bazhenov, Oleg Platonov, Liudmila Prokhorenkova

Figure 1 for TabGraphs: A Benchmark and Strong Baselines for Learning on Graphs with Tabular Node Features

Figure 2 for TabGraphs: A Benchmark and Strong Baselines for Learning on Graphs with Tabular Node Features

Figure 3 for TabGraphs: A Benchmark and Strong Baselines for Learning on Graphs with Tabular Node Features

Figure 4 for TabGraphs: A Benchmark and Strong Baselines for Learning on Graphs with Tabular Node Features

Abstract:Tabular machine learning is an important field for industry and science. In this field, table rows are usually treated as independent data samples, but additional information about relations between them is sometimes available and can be used to improve predictive performance. Such information can be naturally modeled with a graph, thus tabular machine learning may benefit from graph machine learning methods. However, graph machine learning models are typically evaluated on datasets with homogeneous node features, which have little in common with heterogeneous mixtures of numerical and categorical features present in tabular datasets. Thus, there is a critical difference between the data used in tabular and graph machine learning studies, which does not allow one to understand how successfully graph models can be transferred to tabular data. To bridge this gap, we propose a new benchmark of diverse graphs with heterogeneous tabular node features and realistic prediction tasks. We use this benchmark to evaluate a vast set of models, including simple methods previously overlooked in the literature. Our experiments show that graph neural networks (GNNs) can indeed often bring gains in predictive performance for tabular data, but standard tabular models also can be adapted to work with graph data by using simple feature preprocessing, which sometimes enables them to compete with and even outperform GNNs. Based on our empirical study, we provide insights for researchers and practitioners in both tabular and graph machine learning fields.

Via

Access Paper or Ask Questions

Discrete Neural Algorithmic Reasoning

Feb 18, 2024

Gleb Rodionov, Liudmila Prokhorenkova

Figure 1 for Discrete Neural Algorithmic Reasoning

Figure 2 for Discrete Neural Algorithmic Reasoning

Abstract:Neural algorithmic reasoning aims to capture computations with neural networks via learning the models to imitate the execution of classical algorithms. While common architectures are expressive enough to contain the correct model in the weights space, current neural reasoners are struggling to generalize well on out-of-distribution data. On the other hand, classical computations are not affected by distribution shifts as they can be described as transitions between discrete computational states. In this work, we propose to force neural reasoners to maintain the execution trajectory as a combination of finite predefined states. Trained with supervision on the algorithm's state transitions, such models are able to perfectly align with the original algorithm. To show this, we evaluate our approach on the SALSA-CLRS benchmark, where we get perfect test scores for all tasks. Moreover, the proposed architectural choice allows us to prove the correctness of the learned algorithms for any test data.

Via

Access Paper or Ask Questions