Abstract:Solubility prediction is a standard benchmark in computational chemistry, yet multi-solvent models that reportedly approach the experimental-noise ceiling (i.e. the aleatoric limit) are not yet reliable enough to be deployed. We argue that this gap is partly artefactual: published benchmarks differ in curation policies, evaluate on count-weighted RMSE that hides failure on tail-heavy solvent distributions, and treat the widely cited 0.6-0.8 log S inter-laboratory figure as the aleatoric ceiling even though it reflects worst-case, not expected, disagreement. We introduce SC3, a multi-solvent solubility benchmark built on BigSolDB v2.1 with three contributions: (i) a reproducible curation pipeline yielding 101,535 measurements over 1,327 solutes and 206 solvents, with a recalibrated aleatoric floor of 0.106 log S, roughly 6 times tighter than the conventional figure; (ii) nested Gold/Silver/Bronze consensus tiers with per-point standard deviation, three leakage-checked splits, and a multi-solvent metric suite (PS-RMSE, Z-RMSE); and (iii) a 31-model benchmark across six families, whose best Bronze PS-RMSE sits at 5 times the aleatoric limit, a gap that none of the deep alternatives tested closes. Three follow-on analyses (data scaling, transfer from quantum-chemistry solvation energies, and feature-level attribution) demonstrate that calibrated per-point uncertainty is reusable infrastructure for diagnosis beyond point prediction.
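The claim that count-weighted RMSE can hide failures on rare solvents can be illustrated with a minimal sketch. The abstract does not define PS-RMSE, so `macro_rmse` below is only an assumed stand-in (an unweighted mean of per-solvent RMSEs), and the error values are made up for illustration rather than taken from SC3.

```python
import numpy as np

def pooled_rmse(errors_by_solvent):
    """Count-weighted RMSE: every measurement contributes equally."""
    all_errors = np.concatenate(
        [np.asarray(e, dtype=float) for e in errors_by_solvent.values()]
    )
    return float(np.sqrt(np.mean(all_errors ** 2)))

def macro_rmse(errors_by_solvent):
    """Assumed per-solvent aggregate: RMSE per solvent, then an unweighted mean."""
    per_solvent = [
        np.sqrt(np.mean(np.asarray(e, dtype=float) ** 2))
        for e in errors_by_solvent.values()
    ]
    return float(np.mean(per_solvent))

# Hypothetical residuals (log S): one data-rich solvent, one rare solvent.
errors = {"common_solvent": [0.1] * 99, "rare_solvent": [2.0]}

print("pooled RMSE:", pooled_rmse(errors))
print("macro RMSE: ", macro_rmse(errors))
```

In this toy setting the pooled figure is dominated by the 99 well-predicted points, while the per-solvent average exposes the large error on the rare solvent.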
Abstract:Graph Neural Networks (GNNs) have revolutionized Neural Force Fields for atomistic simulations, achieving near-quantum accuracy at reduced cost, yet adapting these models to new chemical systems requires expensive retraining of foundation models. Inspired by model merging in vision and language processing, we introduce GFFMERGE, the first principled framework for closed-form model merging in GNNs. We exploit the linear structure of message-passing layers and formulate merging as a convex embedding-alignment problem with an analytical solution. Through the first systematic benchmarking of model merging for GNNs, we show that existing methods designed for vision and language catastrophically fail on force field regression, while GFFMERGE recovers performance approaching gold standard joint training. Across molecular (MD17, MD22), solid-state (LiPS20), and large-scale graph benchmarks, GFFMERGE and GNNMERGE (its generic GNN counterpart) achieve 5-27$\times$ speedups while enabling modular composition of specialized models. Remarkably, our closed-form solution alone outperforms all baseline methods before fine-tuning and provides superior initialization for faster, data-efficient convergence.
Abstract:Vision-Language Models (VLMs) are increasingly used in safety-critical applications because of their broad reasoning capabilities and ability to generalize with minimal task-specific engineering. Despite these advantages, they can exhibit catastrophic failures in specific real-world situations, constituting failure modes. We introduce REVELIO, a framework for systematically uncovering interpretable failure modes in VLMs. We define a failure mode as a composition of interpretable, domain-relevant concepts, such as pedestrian proximity or adverse weather conditions, under which a target VLM consistently behaves incorrectly. Identifying such failures requires searching over an exponentially large discrete combinatorial space. To address this challenge, REVELIO combines two search procedures: a diversity-aware beam search that efficiently maps the failure landscape, and a Gaussian-process Thompson Sampling strategy that enables broader exploration of complex failure modes. We apply REVELIO to autonomous driving and indoor robotics domains, uncovering previously unreported vulnerabilities in state-of-the-art VLMs. In driving environments, the models often demonstrate weak spatial grounding and fail to account for major obstructions, leading to recommendations that would result in simulated crashes. In indoor robotics tasks, VLMs either miss safety hazards or behave excessively conservatively, producing false alarms and reducing operational efficiency. By identifying structured and interpretable failure modes, REVELIO offers actionable insights that can support targeted VLM safety improvements.
Abstract:Approximate Nearest-Neighbor Search (ANNS) efficiently finds data items whose embeddings are close to that of a given query in a high-dimensional space, aiming to balance accuracy with speed. Used in recommendation systems, image and video retrieval, natural language processing, and retrieval-augmented generation (RAG), ANNS algorithms such as IVFPQ, HNSW graphs, Annoy, and MRPT utilize graph, tree, clustering, and quantization techniques to navigate large vector spaces. Despite this progress, ANNS systems spend up to 99\% of query time computing distances in their final refinement phase. In this paper, we present PANORAMA, a machine learning-driven approach that tackles the ANNS verification bottleneck through data-adaptive learned orthogonal transforms that facilitate the accretive refinement of distance bounds. Such transforms compact over 90\% of signal energy into the first half of dimensions, enabling early candidate pruning with partial distance computations. We integrate PANORAMA into state-of-the-art ANNS methods, namely IVFPQ/Flat, HNSW, MRPT, and Annoy, without index modification, using level-major memory layouts, SIMD-vectorized partial distance computations, and cache-aware access patterns. Experiments across diverse datasets -- from image-based CIFAR-10 and GIST to modern embedding spaces including OpenAI's Ada 2 and Large 3 -- demonstrate that PANORAMA affords a 2--30$\times$ end-to-end speedup with no recall loss.
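The idea of pruning candidates with partial distance computations after an orthogonal transform can be sketched as follows. This is not the PANORAMA implementation: the learned transform is replaced here by a plain PCA rotation as a stand-in, the function names and the `block` parameter are hypothetical, and the sketch assumes at least as many data points as dimensions so that the rotation is square. Because the rotation is orthogonal, squared Euclidean distances are preserved, and a running sum over leading dimensions is a lower bound on the full distance.

```python
import numpy as np

def pca_rotation(data):
    """Orthogonal matrix whose columns are principal directions, highest variance first.

    Stand-in for a learned transform; assumes data has at least as many rows as columns.
    """
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt.T

def nearest_with_pruning(query, candidates, rotation, block=4):
    """Exact nearest neighbour by squared distance, abandoning candidates early.

    Returns (best index, best squared distance, number of coordinates visited).
    """
    q = query @ rotation
    c = candidates @ rotation
    best_idx, best_dist = -1, np.inf
    dims_used = 0
    for i, x in enumerate(c):
        partial = 0.0
        for start in range(0, q.shape[0], block):
            diff = q[start:start + block] - x[start:start + block]
            partial += float(diff @ diff)
            dims_used += diff.shape[0]
            # The partial sum only grows, so it lower-bounds the full distance.
            if partial >= best_dist:
                break  # this candidate cannot beat the current best
        else:
            # All blocks were processed without pruning: new best candidate.
            best_idx, best_dist = i, partial
    return best_idx, best_dist, dims_used

rng = np.random.default_rng(0)
scales = np.linspace(5.0, 0.1, 16)           # synthetic data with decaying variance
data = rng.normal(size=(200, 16)) * scales
query = rng.normal(size=16) * scales
rotation = pca_rotation(data)
idx, dist, used = nearest_with_pruning(query, data, rotation)
print(idx, dist, f"{used} of {data.size} coordinates visited")
```

The more energy the transform concentrates in the leading dimensions, the sooner the partial sum exceeds the current best and the fewer coordinates are visited, without changing the returned neighbour.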
Abstract:Graph Edit Distance (GED) is a widely used metric for measuring similarity between two graphs. Computing the optimal GED is NP-hard, leading to the development of various neural and non-neural heuristics. While neural methods have achieved improved approximation quality compared to non-neural approaches, they face significant challenges: (1) They require large amounts of ground truth data, which is itself NP-hard to compute. (2) They operate as black boxes, offering limited interpretability. (3) They lack cross-domain generalization, necessitating expensive retraining for each new dataset. We address these limitations with GRAIL, introducing a paradigm shift in this domain. Instead of training a neural model to predict GED, GRAIL employs a novel combination of large language models (LLMs) and automated prompt tuning to generate a program that is used to compute GED. This shift from predicting GED to generating programs imparts various advantages, including end-to-end interpretability and an autonomous self-evolutionary learning mechanism without ground-truth supervision. Extensive experiments on seven datasets confirm that GRAIL not only surpasses state-of-the-art GED approximation methods in prediction quality but also achieves robust cross-domain generalization across diverse graph distributions.
Abstract:Model merging has gained prominence in machine learning as a method to integrate multiple trained models into a single model without accessing the original training data. While existing approaches have demonstrated success in domains such as computer vision and NLP, their application to Graph Neural Networks (GNNs) remains unexplored. These methods often rely on the assumption of shared initialization, which is seldom applicable to GNNs. In this work, we undertake the first benchmarking study of model merging algorithms for GNNs, revealing their limited effectiveness in this context. To address these challenges, we propose GNNMerge, which utilizes a task-agnostic node embedding alignment strategy to merge GNNs. Furthermore, we establish that under a mild relaxation, the proposed optimization objective admits direct analytical solutions for widely used GNN architectures, significantly enhancing its computational efficiency. Empirical evaluations across diverse datasets, tasks, and architectures establish GNNMerge to be up to 24% more accurate than existing methods while delivering over 2 orders of magnitude speed-up compared to training from scratch.




Abstract:Graph distillation has emerged as a promising avenue to enable scalable training of GNNs by compressing the training dataset while preserving essential graph characteristics. Our study uncovers significant shortcomings in current graph distillation techniques. First, the majority of the algorithms paradoxically require training on the full dataset to perform distillation. Second, due to their gradient-emulating approach, these methods require fresh distillation for any change in hyperparameters or GNN architecture, limiting their flexibility and reusability. Finally, they fail to achieve substantial size reduction because they synthesize fully-connected, edge-weighted graphs. To address these challenges, we present Bonsai, a novel graph distillation method empowered by the observation that \textit{computation trees} form the fundamental processing units of message-passing GNNs. Bonsai distills datasets by encoding a careful selection of \textit{exemplar} trees that maximize the representation of all computation trees in the training set. This approach makes Bonsai the first linear-time, model-agnostic graph distillation algorithm for node classification that outperforms existing baselines across $6$ real-world datasets on accuracy, while being $22$ times faster on average. Bonsai is grounded in rigorous mathematical guarantees on the adopted approximation strategies, making it robust to GNN architectures, datasets, and parameters.




Abstract:3D generative models have shown significant promise in structure-based drug design (SBDD), particularly in discovering ligands tailored to specific target binding sites. Existing algorithms often focus primarily on ligand-target binding, characterized by binding affinity. Moreover, models trained solely on target-ligand distribution may fall short in addressing the broader objectives of drug discovery, such as the development of novel ligands with desired properties like drug-likeness and synthesizability, underscoring the multifaceted nature of the drug design process. To overcome these challenges, we decouple the problem into molecular generation and property prediction. The latter guides the diffusion sampling process, resulting in the creation of meaningful molecules with the desired properties. We call this guided molecular generation process TAGMol. Through experiments on benchmark datasets, TAGMol demonstrates superior performance compared to state-of-the-art baselines, achieving a 22% improvement in average Vina Score and yielding favorable outcomes in essential auxiliary properties. This establishes TAGMol as a comprehensive framework for drug generation.




Abstract:We address the growing apprehension that GNNs, in the absence of fairness constraints, might produce biased decisions that disproportionately affect underprivileged groups or individuals. Departing from previous work, we introduce for the first time a method for incorporating the Gini coefficient as a measure of fairness to be used within the GNN framework. Our proposal, GRAPHGINI, works with the two different goals of individual and group fairness in a single system, while maintaining high prediction accuracy. GRAPHGINI enforces individual fairness through learnable attention scores that help in aggregating more information through similar nodes. A heuristic-based maximum Nash social welfare constraint ensures the maximum possible group fairness. Both the individual fairness constraint and the group fairness constraint are stated in terms of a differentiable approximation of the Gini coefficient. This approximation is a contribution that is likely to be of interest even beyond the scope of the problem studied in this paper. Unlike other state-of-the-art methods, GRAPHGINI automatically balances all three optimization objectives (utility, individual fairness, and group fairness) of the GNN and is free from any manual tuning of weight parameters. Extensive experimentation on real-world datasets showcases the efficacy of GRAPHGINI in making significant improvements in individual fairness compared to all currently available state-of-the-art methods while maintaining utility and group equality.
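For readers unfamiliar with the Gini coefficient, the standard mean-absolute-difference form can be sketched in a few lines. The abstract does not specify the differentiable approximation used by GRAPHGINI, so `smooth_gini` below is only one assumed possibility (replacing the absolute value with a smooth square-root surrogate), and both function names are hypothetical.

```python
import numpy as np

def gini(values):
    """Gini coefficient as the mean absolute pairwise difference over twice the mean.

    Assumes non-negative values with a positive mean.
    """
    x = np.asarray(values, dtype=float)
    pairwise = np.abs(x[:, None] - x[None, :])
    return float(pairwise.sum() / (2 * x.size ** 2 * x.mean()))

def smooth_gini(values, eps=1e-12):
    """Assumed smooth surrogate: |d| is replaced by sqrt(d^2 + eps).

    The surrogate is differentiable at d = 0, at the cost of a small
    positive bias of order sqrt(eps).
    """
    x = np.asarray(values, dtype=float)
    pairwise = np.sqrt((x[:, None] - x[None, :]) ** 2 + eps)
    return float(pairwise.sum() / (2 * x.size ** 2 * x.mean()))

print(gini([1, 1, 1, 1]))   # perfectly equal allocation
print(gini([0, 0, 0, 1]))   # one element holds everything
print(smooth_gini([0, 0, 0, 1]))
```

A value of 0 corresponds to perfect equality, and for n items the maximum is (n - 1) / n, reached when a single item holds the entire total.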
Abstract:The need to identify graphs having small structural distance from a query arises in biology, chemistry, recommender systems, and social network analysis. Among several methods to measure inter-graph distance, Graph Edit Distance (GED) is preferred for its comprehensibility, yet hindered by the NP-hardness of its computation. State-of-the-art GED approximations predominantly employ neural methods, which, however, (i) lack an explanatory edit path corresponding to the approximated GED; (ii) require the NP-hard generation of ground-truth GEDs for training; and (iii) necessitate separate training on each dataset. In this paper, we propose an efficient algebraic unsupervised method, EUGENE, that approximates GED and yields edit paths corresponding to the approximated cost, while eliminating the need for ground-truth generation and data-specific training. Extensive experimental evaluation demonstrates that the aforementioned benefits of EUGENE do not come at the cost of efficacy. Specifically, EUGENE consistently ranks among the most accurate methods across all of the benchmark datasets and outperforms the majority of the neural approaches.