Arjun Bhardwaj

VR-DAgger: Immersive VR for Dexterous Data Collection and Uncertainty-Guided On-Policy Correction

May 26, 2026

René Zurbrügg, Tifanny Portela, Arjun Bhardwaj, Aravind Elanjimattathil Vijayan, Maximum Wilder-Smith, Marco Hutter

Abstract: Learning from demonstrations is effective for robotic manipulation, but collecting sufficient task-specific data remains a major bottleneck. Under distribution shift, small errors compound, performance degrades, and expert time is often spent on redundant, low-value corrections instead of the few critical failure cases.

ViserDex: Visual Sim-to-Real for Robust Dexterous In-hand Reorientation

Apr 13, 2026

Arjun Bhardwaj, Maximum Wilder-Smith, Mayank Mittal, Vaishakh Patil, Marco Hutter

Abstract: In-hand object reorientation requires precise estimation of the object pose to handle complex task dynamics. While RGB sensing offers rich semantic cues for pose tracking, existing solutions rely on multi-camera setups or costly ray tracing. We present a sim-to-real framework for monocular RGB in-hand reorientation that integrates 3D Gaussian Splatting (3DGS) to bridge the visual sim-to-real gap. Our key insight is performing domain randomization in the Gaussian representation space: by applying physically consistent, pre-rendering augmentations to 3D Gaussians, we generate photorealistic, randomized visual data for object pose estimation. The manipulation policy is trained using curriculum-based reinforcement learning with teacher-student distillation, enabling efficient learning of complex behaviors. Importantly, both perception and control models can be trained independently on consumer-grade hardware, eliminating the need for large compute clusters. Experiments show that the pose estimator trained with 3DGS data outperforms those trained using conventional rendering data in challenging visual environments. We validate the system on a physical multi-fingered hand equipped with an RGB camera, demonstrating robust reorientation of five diverse objects even under challenging lighting conditions. Our results highlight Gaussian splatting as a practical path for RGB-only dexterous manipulation. For videos of the hardware deployments and additional supplementary materials, please refer to the project website: https://rffr.leggedrobotics.com/works/viserdex/

Data-Efficient Task Generalization via Probabilistic Model-based Meta Reinforcement Learning

Nov 13, 2023

Arjun Bhardwaj, Jonas Rothfuss, Bhavya Sukhija, Yarden As, Marco Hutter, Stelian Coros, Andreas Krause

Abstract: We introduce PACOH-RL, a novel model-based Meta-Reinforcement Learning (Meta-RL) algorithm designed to efficiently adapt control policies to changing dynamics. PACOH-RL meta-learns priors for the dynamics model, allowing swift adaptation to new dynamics with minimal interaction data. Existing Meta-RL methods require abundant meta-learning data, limiting their applicability in settings such as robotics, where data is costly to obtain. To address this, PACOH-RL incorporates regularization and epistemic uncertainty quantification in both the meta-learning and task adaptation stages. When facing new dynamics, we use these uncertainty estimates to effectively guide exploration and data collection. Overall, this enables positive transfer, even when access to data from prior tasks or dynamic settings is severely limited. Our experimental results demonstrate that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamic conditions. Finally, on a real robotic car, we showcase the potential for efficient RL policy adaptation in diverse, data-scarce conditions.

User Intent Classification using Memory Networks: A Comparative Analysis for a Limited Data Scenario

Jun 19, 2017

Arjun Bhardwaj, Alexander Rudnicky

Abstract: In this report, we provide a comparative analysis of different techniques for user intent classification for the task of app recommendation. We analyse the performance of different models and architectures for multi-label classification over a dataset with a relatively large number of classes and only a handful of examples of each class. We focus, in particular, on memory network architectures, and compare how well the different versions perform under the task constraints. Since the classifier is meant to serve as a module in a practical dialog system, it needs to be able to work with limited training data and incorporate new data on the fly. We devise a 1-shot learning task to test the models under the above constraint. We conclude that relatively simple versions of memory networks perform better than other approaches. However, for tasks with very limited data, simple non-parametric methods perform comparably without needing the extra training data.

GC-SROIQ(C) : Expressive Constraint Modelling and Grounded Circumscription for SROIQ

Apr 03, 2017

Arjun Bhardwaj, Sangeetha

Abstract: Developments in semantic web technologies have promoted ontological encoding of knowledge from diverse domains. However, modelling many practical domains requires more expressive representation schemes than the standard description logics (DLs) support. We extend the DL SROIQ with constraint networks and grounded circumscription. Applications of constraint modelling include embedding ontologies with temporal or spatial information, while grounded circumscription allows defeasible inference and closed world reasoning. This paper overcomes restrictions on existing constraint modelling approaches by introducing expressive constructs. Grounded circumscription allows concept and role minimization and is decidable for DL. We provide a general and intuitive algorithm for the framework of grounded circumscription that can be applied to a whole range of logics. We present the resulting logic, GC-SROIQ(C), and describe a tableau decision procedure for it.

* For an improved formulation of the problem, which addresses critical shortcomings of this paper, please refer to the following: Extending SROIQ with Constraint Networks and Grounded Circumscription, arXiv:1508.00116

Extending SROIQ with Constraint Networks and Grounded Circumscription

Aug 01, 2015

Arjun Bhardwaj

Abstract: Developments in semantic web technologies have promoted ontological encoding of knowledge from diverse domains. However, modelling many practical domains requires more expressiveness than the standard description logics (most prominently SROIQ) support. In this paper, we extend the expressive DL SROIQ with constraint networks (resulting in the logic SROIQc) and grounded circumscription (resulting in the logic GC-SROIQ). Applications of constraint modelling include embedding ontologies with temporal or spatial information, while those of grounded circumscription include defeasible inference and closed world reasoning. We describe the syntax and semantics of the logic formed by including constraint modelling constructs in SROIQ, and provide a sound, complete and terminating tableau algorithm for it. We further provide an intuitive algorithm for grounded circumscription in SROIQc, which adheres to the general framework of grounded circumscription, and which can be applied to a whole range of expressive logics for which no such specific algorithm presently exists.
