Abstract:Vision-Language-Action (VLA) models empower robots to understand and execute tasks described by natural language instructions. However, a key challenge lies in their ability to generalize beyond the specific environments and conditions they were trained on, which is presently difficult and expensive to evaluate in the real world. To address this gap, we present REALM, a new simulation environment and benchmark designed to evaluate the generalization capabilities of VLA models, with a specific emphasis on establishing a strong correlation between simulated and real-world performance through high-fidelity visuals and aligned robot control. Our environment offers a suite of 15 perturbation factors, 7 manipulation skills, and more than 3,500 objects. Finally, we establish two task sets that form our benchmark and evaluate the $\pi_0$, $\pi_0$-FAST, and GR00T N1.5 VLA models, showing that generalization and robustness remain an open challenge. More broadly, we also show that simulation gives us a valuable proxy for the real world and allows us to systematically probe for and quantify the weaknesses and failure modes of VLAs. Project page: https://martin-sedlacek.com/realm
Abstract:In the context of Visual Question Answering (VQA) and Agentic AI, calibration refers to how closely an AI system's confidence in its answers reflects their actual correctness. This aspect becomes especially important when such systems operate autonomously and must make decisions under visual uncertainty. While modern VQA systems, powered by advanced vision-language models (VLMs), are increasingly used in high-stakes domains like medical diagnostics and autonomous navigation due to their improved accuracy, the reliability of their confidence estimates remains under-examined. In particular, these systems often produce overconfident responses. To address this, we introduce AlignVQA, a debate-based multi-agent framework in which diverse specialized VLM agents -- each following a distinct prompting strategy -- generate candidate answers and then engage in a two-stage interaction: generalist agents critique, refine, and aggregate these proposals. This debate process yields confidence estimates that more accurately reflect the model's true predictive performance. We find that better-calibrated specialized agents produce better-aligned confidences. Furthermore, we introduce a novel differentiable calibration-aware loss function, called $\texttt{aligncal}$, designed to fine-tune the specialized agents by minimizing an upper bound on the calibration error. This objective explicitly improves the fidelity of each agent's confidence estimates. Empirical results across multiple benchmark VQA datasets substantiate the efficacy of our approach, demonstrating substantial reductions in calibration discrepancies.
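
To make the calibration-aware fine-tuning idea concrete, here is a minimal Python sketch of a differentiable objective that penalizes the gap between a model's average confidence and its accuracy, alongside the usual cross-entropy term. The function name and the specific penalty are illustrative assumptions; this is not the $\texttt{aligncal}$ upper bound defined in the paper.

    import torch
    import torch.nn.functional as F

    def calibration_aware_loss(logits, labels, lam=1.0):
        # Sketch of a calibration-aware objective: cross-entropy plus a
        # differentiable penalty on the confidence/accuracy gap.
        # Illustrative only; not the paper's aligncal bound.
        probs = F.softmax(logits, dim=-1)
        conf, preds = probs.max(dim=-1)        # per-sample confidence and prediction
        acc = (preds == labels).float()        # 1 if correct, 0 otherwise (no gradient needed)
        ce = F.cross_entropy(logits, labels)   # standard task loss
        gap = (conf - acc).abs().mean()        # penalise over/under-confidence
        return ce + lam * gap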




Abstract:Following up on our earlier study in [J. Bardhan et al., Machine learning-enhanced search for a vectorlike singlet B quark decaying to a singlet scalar or pseudoscalar, Phys. Rev. D 107 (2023) 115001; arXiv:2212.02442], we investigate the LHC prospects of pair-produced vectorlike $B$ quarks decaying exotically to a new gauge-singlet (pseudo)scalar field $\Phi$ and a $b$ quark. After electroweak symmetry breaking, the $\Phi$ decays predominantly to $gg/bb$ final states, leading to a fully hadronic $2b+4j$ or $6b$ signature. Because of the large Standard Model background and the lack of leptonic handles, this is a difficult channel to probe. To overcome the challenge, we employ a hybrid deep learning model containing a graph neural network followed by a deep neural network. We estimate that such a state-of-the-art deep learning analysis pipeline can lead to a performance comparable to that in the semi-leptonic mode, taking the discovery (exclusion) reach up to about $M_B=1.8\:(2.4)$~TeV at the HL-LHC when $B$ decays fully exotically, i.e., BR$(B \to b\Phi) = 100\%$.
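
For illustration, a minimal PyTorch sketch of the kind of hybrid model described above: a couple of message-passing layers over an event graph, pooled into a fixed-size representation and passed to a dense classifier head. The graph construction, layer sizes, and readout are placeholder assumptions, not the architecture used in the analysis.

    import torch
    import torch.nn as nn

    class GraphThenDense(nn.Module):
        # Hybrid sketch: message passing over jet/event constituents, followed by
        # a dense network acting on the pooled graph representation.
        def __init__(self, node_dim=8, hidden=64):
            super().__init__()
            self.msg1 = nn.Linear(node_dim, hidden)
            self.msg2 = nn.Linear(hidden, hidden)
            self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                      nn.Linear(hidden, 1))

        def forward(self, x, adj):
            # x:   (batch, nodes, node_dim) node features
            # adj: (batch, nodes, nodes) row-normalised adjacency matrix
            h = torch.relu(adj @ self.msg1(x))   # aggregate neighbour messages
            h = torch.relu(adj @ self.msg2(h))
            pooled = h.mean(dim=1)               # graph-level readout
            return self.head(pooled)             # signal-vs-background logit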




Abstract:We present a transformer architecture-based foundation model for tasks at high-energy particle colliders such as the Large Hadron Collider. We train the model to classify jets using a self-supervised strategy inspired by the Joint Embedding Predictive Architecture. We use the JetClass dataset containing 100M jets of various known particles to pre-train the model with a data-centric approach -- the model uses a fraction of the jet constituents as the context to predict the embeddings of the unseen target constituents. Our pre-trained model performs well on other datasets for standard classification benchmark tasks. We test our model on two additional downstream tasks: top tagging and differentiating light-quark jets from gluon jets. We also evaluate our model with task-specific metrics and baselines and compare it with state-of-the-art models in high-energy physics. Project site: https://hep-jepa.github.io/
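
A rough PyTorch sketch of the context/target prediction objective described above: visible constituents are encoded as context and a predictor regresses the embeddings of the hidden target constituents. Module choices, dimensions, and the masking scheme are simplified assumptions and do not reproduce the HEP-JEPA architecture (for instance, the target encoder would typically be an exponential-moving-average copy of the context encoder).

    import torch
    import torch.nn as nn

    class JetJEPASketch(nn.Module):
        # Toy JEPA-style objective: predict target-constituent embeddings from
        # the visible context constituents.
        def __init__(self, feat_dim=4, emb_dim=64, nhead=4, nlayers=2):
            super().__init__()
            self.embed = nn.Linear(feat_dim, emb_dim)
            layer = nn.TransformerEncoderLayer(emb_dim, nhead, batch_first=True)
            self.context_encoder = nn.TransformerEncoder(layer, nlayers)
            self.target_encoder = nn.TransformerEncoder(layer, nlayers)
            self.predictor = nn.Linear(emb_dim, emb_dim)

        def forward(self, constituents, context_mask):
            # constituents: (batch, n_constituents, feat_dim)
            # context_mask: (batch, n_constituents) bool, True = visible context
            tokens = self.embed(constituents)
            ctx = self.context_encoder(tokens * context_mask.unsqueeze(-1))
            with torch.no_grad():                  # targets receive no gradient
                tgt = self.target_encoder(tokens)
            pred = self.predictor(ctx)
            target_mask = (~context_mask).unsqueeze(-1).float()
            # L2 loss only on the hidden (target) constituents
            return ((pred - tgt) ** 2 * target_mask).mean()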




Abstract:Machine learning methods have seen a meteoric rise in their applications in the scientific community. However, little effort has been put into understanding these "black box" models. We show how one can apply integrated gradients (IGs) to understand these models by designing different baselines, taking an example case study from particle physics. We find that the zero-vector baseline does not provide good feature attributions and that an averaged baseline sampled from the background events provides consistently more reasonable attributions.
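
As a concrete reference for the comparison above, a minimal NumPy sketch of integrated gradients with the two baselines. The gradient callback, feature vector, and background sample are hypothetical placeholders.

    import numpy as np

    def integrated_gradients(grad_fn, x, baseline, steps=50):
        # grad_fn(z) must return dF/dz for the scalar model output F, with z shaped like x.
        alphas = np.linspace(0.0, 1.0, steps + 1)[1:]        # skip alpha = 0
        path = baseline + alphas[:, None] * (x - baseline)   # points along the straight path
        grads = np.stack([grad_fn(p) for p in path])         # gradients at each point
        return (x - baseline) * grads.mean(axis=0)           # attribution per feature

    # Two baseline choices compared in the study:
    # zero_baseline = np.zeros_like(x)
    # avg_baseline  = background_events.mean(axis=0)   # average of sampled background events
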
Abstract:We construct a surrogate loss to directly optimise the significance metric used in particle physics. We evaluate our loss function for a simple event classification task using a linear model and show that it produces decision boundaries that change according to the cross sections of the processes involved. We find that models trained with the new loss achieve higher signal efficiency at similar values of estimated signal significance than those trained with a cross-entropy loss, showing promise for improving the sensitivity of particle physics searches at colliders.
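
One simple way to build such a surrogate, sketched below in PyTorch: replace the hard selection counts in a common counting significance such as $s/\sqrt{s+b}$ with sigmoid-weighted, cross-section-weighted sums so the objective stays differentiable. The exact surrogate used in the paper may differ; the weighting and functional form here are illustrative assumptions.

    import torch

    def significance_surrogate_loss(scores, labels, weights, eps=1e-6):
        # scores : raw model outputs (logits) for a batch of events
        # labels : 1 for signal events, 0 for background events
        # weights: per-event weights (e.g. cross section x luminosity / n_generated)
        p = torch.sigmoid(scores)                   # soft "event selected" probability
        s = torch.sum(weights * labels * p)         # expected selected signal yield
        b = torch.sum(weights * (1 - labels) * p)   # expected selected background yield
        significance = s / torch.sqrt(s + b + eps)
        return -significance                        # minimising this maximises s / sqrt(s + b)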




Abstract:We introduce $\texttt{ReMOVE}$, a novel reference-free metric for assessing object erasure efficacy in diffusion-based image editing models post-generation. Unlike existing measures such as LPIPS and CLIPScore, $\texttt{ReMOVE}$ addresses the challenge of evaluating inpainting without a reference image, which is common in practical scenarios. It effectively distinguishes between object removal and replacement, a key issue in diffusion models due to the stochastic nature of image generation. Traditional metrics fail to align with the intuitive definition of inpainting, which aims for (1) seamless object removal within masked regions while (2) preserving background continuity. $\texttt{ReMOVE}$ not only correlates with state-of-the-art metrics and aligns with human perception but also captures the nuanced aspects of the inpainting process, providing a finer-grained evaluation of the generated outputs.