Maksim Zhdanov

BSA: Ball Sparse Attention for Large-scale Geometries

Jun 14, 2025

Catalin E. Brita, Hieu Nguyen, Lohithsai Yadala Chanchu, Domonkos Nagy, Maksim Zhdanov

Abstract:Self-attention scales quadratically with input size, limiting its use for large-scale physical systems. Although sparse attention mechanisms provide a viable alternative, they are primarily designed for regular structures such as text or images, making them inapplicable for irregular geometries. In this work, we present Ball Sparse Attention (BSA), which adapts Native Sparse Attention (NSA) (Yuan et al., 2025) to unordered point sets by imposing regularity using the Ball Tree structure from the Erwin Transformer (Zhdanov et al., 2025). We modify NSA's components to work with ball-based neighborhoods, yielding a global receptive field at sub-quadratic cost. On an airflow pressure prediction task, we achieve accuracy comparable to Full Attention while significantly reducing the theoretical computational complexity. Our implementation is available at https://github.com/britacatalin/bsa.

* Long Context Foundation Models Workshop @ ICML 2025

Electrostatics from Laplacian Eigenbasis for Neural Network Interatomic Potentials

May 20, 2025

Maksim Zhdanov, Vladislav Kurenkov

Abstract:Recent advances in neural network interatomic potentials have emerged as a promising research direction. However, popular deep learning models often lack auxiliary constraints grounded in physical laws, which could accelerate training and improve fidelity through physics-based regularization. In this work, we introduce $\Phi$-Module, a universal plugin module that enforces Poisson's equation within the message-passing framework to learn electrostatic interactions in a self-supervised manner. Specifically, each atom-wise representation is encouraged to satisfy a discretized Poisson's equation, making it possible to acquire a potential $\boldsymbol{\phi}$ and a corresponding charge density $\boldsymbol{\rho}$ linked to the learnable Laplacian eigenbasis coefficients of a given molecular graph. We then derive an electrostatic energy term, crucial for improved total energy predictions. This approach integrates seamlessly into any existing neural potential with insignificant computational overhead. Experiments on the OE62 and MD22 benchmarks confirm that models combined with $\Phi$-Module achieve robust improvements over baseline counterparts. For OE62, the error reduction ranges from 4.5% to 17.8%, and for MD22, a baseline equipped with $\Phi$-Module achieves the best results in 5 out of 14 cases. Our results underscore how embedding a first-principles constraint in neural interatomic potentials can significantly improve performance while remaining hyperparameter-friendly, memory-efficient and lightweight in training. Code will be available at https://github.com/dunnolab/phi-module.
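The link between the potential, the charge density and the eigenbasis coefficients can be written out in a few lines. The derivation below is only a sketch under stated assumptions, not the paper's formulation: it assumes the symmetric graph Laplacian $L = D - A$, units in which physical constants are absorbed, and a coefficient vector $\boldsymbol{a}$ that is a hypothetical name for the learnable coefficients.

```latex
% Assumptions (not from the abstract): L = D - A is the symmetric graph Laplacian,
% constants are absorbed into the units, and a is the vector of learnable coefficients.
\begin{align}
  L &= U \Lambda U^{\top}, \qquad U^{\top} U = I, \qquad \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n) \\
  \boldsymbol{\phi} &= U \boldsymbol{a} \\
  \boldsymbol{\rho} &= L \boldsymbol{\phi} = U \Lambda \boldsymbol{a} \\
  E_{\mathrm{elec}} &= \tfrac{1}{2}\, \boldsymbol{\rho}^{\top} \boldsymbol{\phi}
     = \tfrac{1}{2}\, \boldsymbol{a}^{\top} \Lambda \boldsymbol{a}
     = \tfrac{1}{2} \sum_{k=1}^{n} \lambda_k a_k^{2}
\end{align}
```

Under these assumptions the discretized Poisson equation $L \boldsymbol{\phi} = \boldsymbol{\rho}$ holds by construction, and the energy term depends only on the coefficients and the Laplacian spectrum.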

AdS-GNN -- a Conformally Equivariant Graph Neural Network

May 19, 2025

Maksim Zhdanov, Nabil Iqbal, Erik Bekkers, Patrick Forré

Abstract:Conformal symmetries, i.e., coordinate transformations that preserve angles, play a key role in many fields, including physics, mathematics, computer vision and (geometric) machine learning. Here we build a neural network that is equivariant under general conformal transformations. To achieve this, we lift data from flat Euclidean space to Anti de Sitter (AdS) space. This allows us to exploit a known correspondence between conformal transformations of flat space and isometric transformations on the AdS space. We then build upon the fact that such isometric transformations have been extensively studied on general geometries in the geometric deep learning literature. We employ message-passing layers conditioned on the proper distance, yielding a computationally efficient framework. We validate our model on tasks from computer vision and statistical physics, demonstrating strong performance, improved generalization capacities, and the ability to extract conformal data such as scaling dimensions from the trained network.

Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical Systems

Feb 24, 2025

Maksim Zhdanov, Max Welling, Jan-Willem van de Meent

Abstract:Large-scale physical systems defined on irregular grids pose significant scalability challenges for deep learning methods, especially in the presence of long-range interactions and multi-scale coupling. Traditional approaches that compute all pairwise interactions, such as attention, become computationally prohibitive as they scale quadratically with the number of nodes. We present Erwin, a hierarchical transformer inspired by methods from computational many-body physics, which combines the efficiency of tree-based algorithms with the expressivity of attention mechanisms. Erwin employs ball tree partitioning to organize computation, which enables linear-time attention by processing nodes in parallel within local neighborhoods of fixed size. Through progressive coarsening and refinement of the ball tree structure, complemented by a novel cross-ball interaction mechanism, it captures both fine-grained local details and global features. We demonstrate Erwin's effectiveness across multiple domains, including cosmology, molecular dynamics, and particle fluid dynamics, where it consistently outperforms baseline methods both in accuracy and computational efficiency.
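To make the fixed-size neighborhood idea concrete, here is a minimal Python sketch. It is not the Erwin implementation: it assumes a simplified partition that splits points at the median of the widest axis, and the names `ball_partition` and `attention_pair_counts` are hypothetical. It shows why attention restricted to groups of fixed size grows linearly with the number of points while full attention grows quadratically.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def ball_partition(points: List[Point], indices: List[int], ball_size: int) -> List[List[int]]:
    """Recursively split `indices` until every group holds at most `ball_size` points.

    Simplifying assumption: each split is made at the median of the axis with
    the largest spread, which keeps nearby points in the same group.
    """
    if len(indices) <= ball_size:
        return [indices]

    def spread(axis: int) -> float:
        values = [points[i][axis] for i in indices]
        return max(values) - min(values)

    # Choose the axis along which the current group is most spread out.
    axis = max(range(len(points[0])), key=spread)
    ordered = sorted(indices, key=lambda i: points[i][axis])
    mid = len(ordered) // 2
    return (ball_partition(points, ordered[:mid], ball_size)
            + ball_partition(points, ordered[mid:], ball_size))


def attention_pair_counts(n_points: int, ball_size: int) -> Tuple[int, int]:
    """Return (full, local) counts of query-key pairs.

    Assumes n_points is divisible by ball_size, so every point attends to
    exactly ball_size keys in the local case.
    """
    return n_points * n_points, n_points * ball_size


if __name__ == "__main__":
    pts = [(float(i), 0.0) for i in range(8)]
    print(ball_partition(pts, list(range(8)), ball_size=2))
    print(attention_pair_counts(8, 2))
```

With the group size held fixed, doubling the number of points doubles the local pair count but quadruples the full one.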

Clifford-Steerable Convolutional Neural Networks

Feb 22, 2024

Maksim Zhdanov, David Ruhe, Maurice Weiler, Ana Lucic, Johannes Brandstetter, Patrick Forré

Abstract:We present Clifford-Steerable Convolutional Neural Networks (CS-CNNs), a novel class of $\mathrm{E}(p, q)$-equivariant CNNs. CS-CNNs process multivector fields on pseudo-Euclidean spaces $\mathbb{R}^{p,q}$. They cover, for instance, $\mathrm{E}(3)$-equivariance on $\mathbb{R}^3$ and Poincaré-equivariance on Minkowski spacetime $\mathbb{R}^{1,3}$. Our approach is based on an implicit parametrization of $\mathrm{O}(p,q)$-steerable kernels via Clifford group equivariant neural networks. We significantly and consistently outperform baseline methods on fluid dynamics as well as relativistic electrodynamics forecasting tasks.

Unveiling Empirical Pathologies of Laplace Approximation for Uncertainty Estimation

Dec 16, 2023

Maksim Zhdanov, Stanislav Dereka, Sergey Kolesnikov

Abstract:In this paper, we critically evaluate Bayesian methods for uncertainty estimation in deep learning, focusing on the widely applied Laplace approximation and its variants. Our findings reveal that the conventional method of fitting the Hessian matrix negatively impacts out-of-distribution (OOD) detection efficiency. We propose a different point of view, asserting that focusing solely on optimizing prior precision can yield more accurate uncertainty estimates in OOD detection while preserving adequate calibration metrics. Moreover, we demonstrate that this property is not connected to the training stage of a model but rather to its intrinsic properties. Through extensive experimental evaluation, we establish the superiority of our simplified approach over traditional methods in the out-of-distribution domain.

Catching Image Retrieval Generalization

Jun 23, 2023

Maksim Zhdanov, Ivan Karpukhin

Abstract:The concepts of overfitting and generalization are vital for evaluating machine learning models. In this work, we show that the popular Recall@K metric depends on the number of classes in the dataset, which limits its ability to estimate generalization. To fix this issue, we propose a new metric, which measures retrieval performance, and, unlike Recall@K, estimates generalization. We apply the proposed metric to popular image retrieval methods and provide new insights about deep metric learning generalization.

* 4 pages, 3 figures, 2 tables
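For reference, the standard Recall@K metric that the abstract critiques can be sketched in plain Python. The abstract does not define the proposed replacement metric, so only Recall@K is shown; the function name `recall_at_k`, the leave-one-out protocol and the squared Euclidean distance are assumptions made for illustration.

```python
from typing import Hashable, List, Sequence


def recall_at_k(embeddings: List[Sequence[float]], labels: List[Hashable], k: int) -> float:
    """Fraction of queries whose k nearest neighbors contain a same-label item.

    Assumptions: every item serves as a query against all other items
    (the query itself is excluded), and distance is squared Euclidean.
    """
    hits = 0
    for q, (q_emb, q_label) in enumerate(zip(embeddings, labels)):
        distances = []
        for j, emb in enumerate(embeddings):
            if j == q:
                continue  # never retrieve the query itself
            dist = sum((a - b) ** 2 for a, b in zip(q_emb, emb))
            distances.append((dist, j))
        distances.sort()
        retrieved = [labels[j] for _, j in distances[:k]]
        hits += q_label in retrieved
    return hits / len(embeddings)


if __name__ == "__main__":
    emb = [(0.0,), (0.1,), (1.0,), (1.1,)]
    lab = ["a", "b", "b", "a"]
    for k in (1, 2, 3):
        print(k, recall_at_k(emb, lab, k))
```

A query counts as a hit as soon as one same-class item appears in the top K, so the score is sensitive to how many items share each label, which is one way to see why it can vary with the number of classes.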

Implicit Neural Convolutional Kernels for Steerable CNNs

Dec 12, 2022

Maksim Zhdanov, Nico Hoffmann, Gabriele Cesa

Abstract:Steerable convolutional neural networks (CNNs) provide a general framework for building neural networks equivariant to translations and other transformations belonging to an origin-preserving group $G$, such as reflections and rotations. They rely on standard convolutions with $G$-steerable kernels obtained by analytically solving the group-specific equivariance constraint imposed onto the kernel space. As the solution is tailored to a particular group $G$, the implementation of a kernel basis does not generalize to other symmetry transformations, which complicates the development of group equivariant models. We propose using implicit neural representation via multi-layer perceptrons (MLPs) to parameterize $G$-steerable kernels. The resulting framework offers a simple and flexible way to implement Steerable CNNs and generalizes to any group $G$ for which a $G$-equivariant MLP can be built. We apply our method to point cloud (ModelNet-40) and molecular data (QM9) and demonstrate a significant improvement in performance compared to standard Steerable CNNs.
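The equivariance constraint mentioned in the abstract is usually stated as follows in the steerable CNN literature. The notation is an assumption, since the abstract does not spell it out: $\rho_{\mathrm{in}}$ and $\rho_{\mathrm{out}}$ denote the input and output representations of $G$, $G$ is taken to be a group of orthogonal transformations, and $k_\theta$ is a hypothetical name for the MLP-parameterized kernel.

```latex
% Assumed notation: k maps R^n to c_out x c_in matrices; rho_in, rho_out are
% representations of an orthogonal group G; vec stacks matrix columns.
\begin{align}
  k(g\,x) &= \rho_{\mathrm{out}}(g)\, k(x)\, \rho_{\mathrm{in}}(g)^{-1}
    \qquad \forall\, g \in G,\ x \in \mathbb{R}^{n} \\
  \operatorname{vec}\bigl(k(g\,x)\bigr) &=
    \bigl(\rho_{\mathrm{in}}(g)^{-\top} \otimes \rho_{\mathrm{out}}(g)\bigr)\,
    \operatorname{vec}\bigl(k(x)\bigr)
\end{align}
```

The second line shows that the constraint is itself an equivariance condition on the map $x \mapsto \operatorname{vec}(k(x))$, so a $G$-equivariant MLP $k_\theta$ with these input and output representations satisfies it by construction rather than through an analytically derived kernel basis.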

Amortized Bayesian Inference of GISAXS Data with Normalizing Flows

Oct 04, 2022

Maksim Zhdanov, Lisa Randolph, Thomas Kluge, Motoaki Nakatsutsumi, Christian Gutt, Marina Ganeva, Nico Hoffmann

Abstract:Grazing-Incidence Small-Angle X-ray Scattering (GISAXS) is a modern imaging technique used in material research to study nanoscale materials. Reconstructing the parameters of an imaged object is an ill-posed inverse problem that is further complicated when only an in-plane GISAXS signal is available. Traditionally used inference algorithms such as Approximate Bayesian Computation (ABC) rely on computationally expensive scattering simulation software, rendering analysis highly time-consuming. We propose a simulation-based framework that combines variational auto-encoders and normalizing flows to estimate the posterior distribution of object parameters given its GISAXS data. We apply the inference pipeline to experimental data and demonstrate that our method reduces the inference cost by orders of magnitude while producing results consistent with ABC.

Learning Generative Factors of Neuroimaging Data with Variational auto-encoders

Jun 04, 2022

Maksim Zhdanov, Saskia Steinmann, Nico Hoffmann

Abstract:Neuroimaging techniques produce high-dimensional, stochastic data from which it can be challenging to extract high-level knowledge about the phenomena of interest. We address this challenge by applying the framework of generative modelling to 1) classify multiple pathologies, 2) recover neurological mechanisms of those pathologies in a data-driven manner and 3) learn robust representations of neuroimaging data. We illustrate the applicability of the proposed approach to identifying schizophrenia, whether or not it is followed by auditory verbal hallucinations. We further demonstrate the ability of the framework to learn disease-related mechanisms that are consistent with current domain knowledge. We also compare the proposed framework with several benchmark approaches and indicate its advantages.
