Aditya Prakash

Multi-Contact Force Estimation for Continuum Robots via Gaussian-Parameterized Factor Graphs

Jun 28, 2026

Aditya Prakash, Panagiotis Tsiotras

Abstract:Continuum robots offer key advantages in navigating unstructured environments, but their safe operation requires accurate estimation of the external contact forces acting anywhere along the robot body. Estimating these forces at unknown locations is an ill-conditioned problem, particularly for multiple contacts. We propose a unified shape and force estimation framework formulated on a factor graph. By incorporating a Gaussian mixture force parameterization into a discretized probabilistic Cosserat rod model, we reduce the dimensionality of the unknown external forces and mitigate the ill-conditioning of node-wise force estimation. The framework fuses strain, tendon tension, and pose measurements to simultaneously estimate the robot's shape and external forces while accounting for modeling and sensor uncertainties. Numerical simulations demonstrate that the proposed method outperforms existing methods in terms of force location and magnitude estimation for both single and multi-contact scenarios. We further present a progressive variant that introduces basis functions on demand to estimate contact forces sequentially during a simulated confined-navigation task.
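The dimensionality reduction described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a scalar force density along the arc length (real contact forces are 3D vectors), and the function name `gaussian_mixture_force`, the node count, and the component values are all hypothetical. The point is only that a few Gaussian components (amplitude, center, width) replace one unknown per node.

```python
import math

def gaussian_mixture_force(nodes, components):
    """Evaluate a scalar force density at each arc-length node.

    components is a list of (amplitude, center, width) tuples.
    This is an illustrative assumption, not the paper's exact parameterization.
    """
    profile = []
    for s in nodes:
        f = sum(a * math.exp(-0.5 * ((s - mu) / sigma) ** 2)
                for a, mu, sigma in components)
        profile.append(f)
    return profile

# Hypothetical rod of unit length discretized into 50 nodes.
num_nodes = 50
length = 1.0
nodes = [i * length / (num_nodes - 1) for i in range(num_nodes)]

# Two assumed contacts: a sharp one near s = 0.3 and a broader one near s = 0.8.
components = [(2.0, 0.3, 0.03), (1.0, 0.8, 0.05)]
profile = gaussian_mixture_force(nodes, components)

# Node-wise estimation has one unknown per node; the mixture has three per contact.
num_unknowns_nodewise = num_nodes
num_unknowns_mixture = 3 * len(components)

# Index of the node carrying the largest force density.
peak_index = max(range(num_nodes), key=profile.__getitem__)
```

In this toy setting the estimator would search over 6 parameters instead of 50 node values, which is the kind of reduction the abstract credits with mitigating ill-conditioning.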

What Matters When Cotraining Robot Manipulation Policies on Everyday Human Videos?

Jun 04, 2026

Richard Li, Aditya Prakash, Andrew Wen, Saurabh Gupta, Yilun Du, Pulkit Agrawal

Abstract:Human video datasets used for cotraining robot manipulation policies largely consist of curated demonstrations where motions are orchestrated to resemble robot behavior and 3D hand poses are captured with specialized hardware. A more plentiful source of data is everyday Internet video, but it is an open question what factors enable transfer from such videos to robots. We investigate this using a new dataset of 532 human videos with 28 hours of high-quality triangulated hand labels and natural motions. We find that hand pose quality affects transfer, but even with accurate hands, the inherent motion gap hinders transfer unless the vision and policy networks specialize to each embodiment. Our cotraining recipe yields consistent improvements, with an absolute success rate gain of 29.7% in the low-robot-data regime across six manipulation tasks.

* The project website is here: https://richardrl.github.io/what-matters-cotraining-human-videos/index.html

QShield: Securing Neural Networks Against Adversarial Attacks using Quantum Circuits

Apr 13, 2026

Navid Azimi, Aditya Prakash, Yao Wang, Li Xiong

Abstract:Deep neural networks remain highly vulnerable to adversarial perturbations, limiting their reliability in security- and safety-critical applications. To address this challenge, we introduce QShield, a modular hybrid quantum-classical neural network (HQCNN) architecture designed to enhance the adversarial robustness of classical deep learning models. QShield integrates a conventional convolutional neural network (CNN) backbone for feature extraction with a quantum processing module that encodes the extracted features into quantum states, applies structured entanglement operations under realistic noise models, and outputs a hybrid prediction through a dynamically weighted fusion mechanism implemented via a lightweight multilayer perceptron (MLP). We systematically evaluate both classical and hybrid quantum-classical models on the MNIST, OrganAMNIST, and CIFAR-10 datasets, using a comprehensive set of robustness, efficiency, and computational performance metrics. Our results demonstrate that classical models are highly vulnerable to adversarial attacks, whereas the proposed hybrid models with entanglement patterns maintain high predictive accuracy while substantially reducing attack success rates across a wide range of adversarial attacks. Furthermore, the proposed hybrid architecture significantly increased the computational cost required to generate adversarial examples, thereby introducing an additional layer of defense. These findings indicate that the proposed modular hybrid architecture achieves a practical balance between predictive accuracy and adversarial robustness, positioning it as a promising approach for secure and reliable machine learning in sensitive and safety-critical applications.

How Do I Do That? Synthesizing 3D Hand Motion and Contacts for Everyday Interactions

Apr 16, 2025

Aditya Prakash, Benjamin Lundell, Dmitry Andreychuk, David Forsyth, Saurabh Gupta, Harpreet Sawhney

Abstract:We tackle the novel problem of predicting 3D hand motion and contact maps (or Interaction Trajectories) given a single RGB view, action text, and a 3D contact point on the object as input. Our approach consists of (1) Interaction Codebook: a VQVAE model to learn a latent codebook of hand poses and contact points, effectively tokenizing interaction trajectories, (2) Interaction Predictor: a transformer-decoder module to predict the interaction trajectory from test time inputs by using an indexer module to retrieve a latent affordance from the learned codebook. To train our model, we develop a data engine that extracts 3D hand poses and contact trajectories from the diverse HoloAssist dataset. We evaluate our model on a benchmark that is 2.5-10X larger than existing works, in terms of diversity of objects and interactions observed, and test for generalization of the model across object categories, action categories, tasks, and scenes. Experimental results show the effectiveness of our approach over transformer & diffusion baselines across all settings.

* CVPR 2025, Project page: https://ap229997.github.io/projects/latentact

Weakly Supervised Learning on Large Graphs

Jan 02, 2025

Aditya Prakash

Abstract:Graph classification plays a pivotal role in various domains, including pathology, where images can be represented as graphs: nodes might represent individual nuclei, and edges capture the spatial or functional relationships between them. Often, the overall label of the graph, such as a cancer type or disease state, is determined by patterns within smaller, localized regions of the image. This work introduces a weakly-supervised graph classification framework leveraging two subgraph extraction techniques: (1) a sliding-window approach and (2) a BFS-based approach. Subgraphs are processed using a Graph Attention Network (GAT), which employs attention mechanisms to identify the most informative subgraphs for classification. Weak supervision is achieved by propagating graph-level labels to subgraphs, eliminating the need for detailed subgraph annotations.
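The two extraction strategies and the label propagation step named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's code: the helper names (`bfs_subgraph`, `sliding_window_subgraphs`), the toy 3x2 grid graph, the window size, and the label string are all hypothetical, and the GAT stage is omitted.

```python
from collections import deque

def bfs_subgraph(adjacency, seed, max_nodes):
    """Collect up to max_nodes nodes in breadth-first order from seed."""
    visited = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue and len(visited) < max_nodes:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                visited.append(neighbor)
                queue.append(neighbor)
                if len(visited) == max_nodes:
                    break
    return visited

def sliding_window_subgraphs(positions, window, stride):
    """Group nodes by square spatial windows slid over their 2D positions."""
    xs = [p[0] for p in positions.values()]
    ys = [p[1] for p in positions.values()]
    subgraphs = []
    x = min(xs)
    while x <= max(xs):
        y = min(ys)
        while y <= max(ys):
            nodes = sorted(n for n, (px, py) in positions.items()
                           if x <= px < x + window and y <= py < y + window)
            if nodes:  # skip empty windows
                subgraphs.append(nodes)
            y += stride
        x += stride
    return subgraphs

# Toy graph: six nuclei on a 3x2 grid, edges between grid neighbors.
positions = {0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (0, 1), 4: (1, 1), 5: (2, 1)}
adjacency = {0: [1, 3], 1: [0, 2, 4], 2: [1, 5], 3: [0, 4], 4: [1, 3, 5], 5: [2, 4]}

bfs_nodes = bfs_subgraph(adjacency, seed=0, max_nodes=4)
windows = sliding_window_subgraphs(positions, window=2, stride=1)

# Weak supervision: every subgraph inherits the graph-level label.
graph_label = "tumor"  # hypothetical label
labeled_subgraphs = [(nodes, graph_label) for nodes in windows + [bfs_nodes]]
```

Each `(nodes, label)` pair would then serve as a training example for the subgraph classifier, with no per-subgraph annotation required.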

Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects

Mar 25, 2024

Zicong Fan, Takehiko Ohkawa, Linlin Yang, Nie Lin, Zhishan Zhou, Shihao Zhou, Jiajun Liang, Zhong Gao, Xuanyang Zhang, Xue Zhang (+14 more)

Abstract:We interact with the world with our hands and see it through our own (egocentric) perspective. A holistic 3D understanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition and motion generation. Accurately reconstructing such interactions in 3D is challenging due to heavy occlusion, viewpoint bias, camera distortion, and motion blur from the head movement. To this end, we designed the HANDS23 challenge based on the AssemblyHands and ARCTIC datasets with carefully designed training and testing splits. Based on the results of the top submitted methods and more recent baselines on the leaderboards, we perform a thorough analysis on 3D hand(-object) reconstruction tasks. Our analysis demonstrates the effectiveness of addressing distortion specific to egocentric cameras, adopting high-capacity transformers to learn complex hand-object interactions, and fusing predictions from different views. Our study further reveals challenging scenarios intractable with state-of-the-art methods, such as fast hand motion, object reconstruction from narrow egocentric views, and close contact between two hands and objects. Our efforts will enrich the community's knowledge foundation and facilitate future hand studies on egocentric hand-object interactions.

3D Hand Pose Estimation in Egocentric Images in the Wild

Dec 11, 2023

Aditya Prakash, Ruisen Tu, Matthew Chang, Saurabh Gupta

Abstract:We present WildHands, a method for 3D hand pose estimation in egocentric images in the wild. This is challenging due to (a) lack of 3D hand pose annotations for images in the wild, and (b) a form of perspective distortion-induced shape ambiguity that arises in the analysis of crops around hands. For the former, we use auxiliary supervision on in-the-wild data in the form of segmentation masks & grasp labels in addition to 3D supervision available in lab datasets. For the latter, we provide spatial cues about the location of the hand crop in the camera's field of view. Our approach achieves the best 3D hand pose on the ARCTIC leaderboard and outperforms FrankMocap, a popular and robust approach for estimating hand pose in the wild, by 45.3% when evaluated on 2D hand pose on our EPIC-HandKps dataset.

* Project page: https://ap229997.github.io/projects/hands/

Mitigating Perspective Distortion-induced Shape Ambiguity in Image Crops

Dec 11, 2023

Aditya Prakash, Arjun Gupta, Saurabh Gupta

Abstract:Objects undergo varying amounts of perspective distortion as they move across a camera's field of view. Models for predicting 3D from a single image often work with crops around the object of interest and ignore the location of the object in the camera's field of view. We note that ignoring this location information further exaggerates the inherent ambiguity in making 3D inferences from 2D images and can prevent models from even fitting to the training data. To mitigate this ambiguity, we propose Intrinsics-Aware Positional Encoding (KPE), which incorporates information about the location of crops in the image and camera intrinsics. Experiments on three popular 3D-from-a-single-image benchmarks: depth prediction on NYU, 3D object detection on KITTI & nuScenes, and predicting 3D shapes of articulated objects on ARCTIC, show the benefits of KPE.
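To illustrate why crop location matters, the sketch below converts the corners of an image crop into viewing angles using a standard pinhole camera model. This is not the paper's KPE formulation, which the abstract does not specify; the function name `crop_ray_angles`, the intrinsics, and the crop boxes are hypothetical values chosen for illustration.

```python
import math

def crop_ray_angles(box, fx, fy, cx, cy):
    """Return (horizontal, vertical) viewing angles in radians for each crop corner.

    box is (u_min, v_min, u_max, v_max) in pixels; fx, fy, cx, cy are pinhole
    intrinsics. An assumed encoding, not the paper's KPE.
    """
    u0, v0, u1, v1 = box
    corners = [(u0, v0), (u1, v0), (u0, v1), (u1, v1)]
    return [(math.atan((u - cx) / fx), math.atan((v - cy) / fy))
            for u, v in corners]

# Hypothetical 640x480 camera.
fx = fy = 500.0
cx, cy = 320.0, 240.0

# Two crops of identical pixel size: one centered, one at the right edge.
center_crop = (270.0, 190.0, 370.0, 290.0)
edge_crop = (540.0, 190.0, 640.0, 290.0)

center_angles = crop_ray_angles(center_crop, fx, fy, cx, cy)
edge_angles = crop_ray_angles(edge_crop, fx, fy, cx, cy)
```

The two crops have the same pixel size, yet their corner rays differ: the centered crop is viewed symmetrically about the optical axis, while the edge crop is viewed obliquely. A model that sees only the cropped pixels cannot tell these cases apart, which is the ambiguity the abstract describes.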

* Project Page: https://ap229997.github.io/projects/ambiguity/

Look Ma, No Hands! Agent-Environment Factorization of Egocentric Videos

May 25, 2023

Matthew Chang, Aditya Prakash, Saurabh Gupta

Abstract:The analysis and use of egocentric videos for robotic tasks is made challenging by occlusion due to the hand and the visual mismatch between the human hand and a robot end-effector. In this sense, the human hand presents a nuisance. However, often hands also provide a valuable signal, e.g. the hand pose may suggest what kind of object is being held. In this work, we propose to extract a factored representation of the scene that separates the agent (human hand) and the environment. This alleviates both occlusion and mismatch while preserving the signal, thereby easing the design of models for downstream robotics tasks. At the heart of this factorization is our proposed Video Inpainting via Diffusion Model (VIDM) that leverages both a prior on real-world images (through a large-scale pre-trained diffusion model) and the appearance of the object in earlier frames of the video (through attention). Our experiments demonstrate the effectiveness of VIDM at improving inpainting quality on egocentric videos and the power of our factored representation for numerous tasks: object detection, 3D reconstruction of manipulated objects, and learning of reward functions, policies, and affordances from videos.

* Project website with video: https://matthewchang.github.io/vidm/

Learning Hand-Held Object Reconstruction from In-The-Wild Videos

May 04, 2023

Aditya Prakash, Matthew Chang, Matthew Jin, Saurabh Gupta

Abstract:Prior works for reconstructing hand-held objects from a single image rely on direct 3D shape supervision, which is challenging to gather in the real world at scale. Consequently, these approaches do not generalize well when presented with novel objects in in-the-wild settings. While 3D supervision is a major bottleneck, there is an abundance of in-the-wild raw video data showing hand-object interactions. In this paper, we automatically extract 3D supervision (via multiview 2D supervision) from such raw video data to scale up the learning of models for hand-held object reconstruction. This requires tackling two key challenges: unknown camera pose and occlusion. For the former, we use hand pose (predicted from existing techniques, e.g. FrankMocap) as a proxy for object pose. For the latter, we learn data-driven 3D shape priors using synthetic objects from the ObMan dataset. We use these indirect 3D cues to train occupancy networks that predict the 3D shape of objects from a single RGB image. Our experiments on the MOW and HO3D datasets show the effectiveness of these supervisory signals at predicting the 3D shape for real-world hand-held objects without any direct real-world 3D supervision.

* Project Webpage: https://ap229997.github.io/projects/horse/
