Saurabh Gupta

Mitigating Perspective Distortion-induced Shape Ambiguity in Image Crops

Dec 11, 2023

Aditya Prakash, Arjun Gupta, Saurabh Gupta

Abstract:Objects undergo varying amounts of perspective distortion as they move across a camera's field of view. Models for predicting 3D from a single image often work with crops around the object of interest and ignore the location of the object in the camera's field of view. We note that ignoring this location information further exaggerates the inherent ambiguity in making 3D inferences from 2D images and can prevent models from even fitting to the training data. To mitigate this ambiguity, we propose Intrinsics-Aware Positional Encoding (KPE), which incorporates information about the location of crops in the image and camera intrinsics. Experiments on three popular 3D-from-a-single-image benchmarks: depth prediction on NYU, 3D object detection on KITTI & nuScenes, and predicting 3D shapes of articulated objects on ARCTIC, show the benefits of KPE.

* Project Page: https://ap229997.github.io/projects/ambiguity/
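A minimal sketch of the idea, not the authors' implementation: the abstract says only that KPE combines crop location with camera intrinsics. The sketch assumes one plausible encoding, per-pixel viewing angles computed from a pinhole model. The function name `crop_ray_angles`, the `box` layout, and the intrinsics values used below are hypothetical.

```python
import numpy as np

def crop_ray_angles(box, fx, fy, cx, cy, out_hw=(4, 4)):
    """Per-pixel viewing angles (radians) for a crop resampled to out_hw.

    box = (x0, y0, x1, y1) in full-image pixel coordinates (assumed layout).
    fx, fy, cx, cy are pinhole intrinsics: focal lengths and principal point.
    """
    x0, y0, x1, y1 = box
    h, w = out_hw
    # Pixel coordinates in the ORIGINAL image that the crop grid maps to.
    us = np.linspace(x0, x1, w)
    vs = np.linspace(y0, y1, h)
    uu, vv = np.meshgrid(us, vs)  # each of shape (h, w)
    # Angle between the optical axis and the ray through each pixel.
    theta_x = np.arctan((uu - cx) / fx)
    theta_y = np.arctan((vv - cy) / fy)
    return np.stack([theta_x, theta_y], axis=0)  # shape (2, h, w)

# Two crops of identical size but different image locations.
center_enc = crop_ray_angles((270, 190, 370, 290), 500.0, 500.0, 320.0, 240.0)
corner_enc = crop_ray_angles((0, 0, 100, 100), 500.0, 500.0, 320.0, 240.0)
```

The two crops have the same pixel size but receive different encodings, which is the location signal a crop-only model would otherwise never see.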

Bootstrapping Autonomous Radars with Self-Supervised Learning

Dec 09, 2023

Yiduo Hao, Sohrab Madani, Junfeng Guan, Mohammed Alloulah, Saurabh Gupta, Haitham Hassanieh

Abstract:The perception of autonomous vehicles using radars has attracted increased research interest due to its ability to operate in fog and bad weather. However, training radar models is hindered by the cost and difficulty of annotating large-scale radar data. To overcome this bottleneck, we propose a self-supervised learning framework to leverage the large amount of unlabeled radar data to pre-train radar-only embeddings for self-driving perception tasks. The proposed method combines radar-to-radar and radar-to-vision contrastive losses to learn a general representation from unlabeled radar heatmaps paired with their corresponding camera images. When used for downstream object detection, we demonstrate that the proposed self-supervision framework can improve the accuracy of state-of-the-art supervised baselines by 5.8% in mAP.
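As a rough illustration of combining two contrastive terms, the sketch below assumes an InfoNCE-style loss and an equally weighted sum. The abstract does not specify the loss form or weighting, so `info_nce`, `combined_loss`, the temperature, and the weights `w_rr` and `w_rv` are all hypothetical.

```python
import numpy as np

def info_nce(a, b, temperature=0.1):
    """InfoNCE over a batch: row i of `a` is the positive for row i of `b`."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                       # (batch, batch) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))            # matched pairs sit on the diagonal

def combined_loss(radar, radar_aug, vision, w_rr=1.0, w_rv=1.0):
    """Assumed weighted sum of radar-to-radar and radar-to-vision terms."""
    return w_rr * info_nce(radar, radar_aug) + w_rv * info_nce(radar, vision)
```

Here `radar` and `radar_aug` would be embeddings of two views of the same radar heatmap, and `vision` the embedding of the paired camera image.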

GOAT: GO to Any Thing

Nov 10, 2023

Matthew Chang, Theophile Gervet, Mukul Khanna, Sriram Yenamandra, Dhruv Shah, So Yeon Min, Kavit Shah, Chris Paxton, Saurabh Gupta, Dhruv Batra(+3 more)

Abstract:In deployment scenarios such as homes and warehouses, mobile robots are expected to autonomously navigate for extended periods, seamlessly executing tasks articulated in terms that are intuitively understandable by human operators. We present GO To Any Thing (GOAT), a universal navigation system capable of tackling these requirements with three key features: a) Multimodal: it can tackle goals specified via category labels, target images, and language descriptions, b) Lifelong: it benefits from its past experience in the same environment, and c) Platform Agnostic: it can be quickly deployed on robots with different embodiments. GOAT is made possible through a modular system design and a continually augmented instance-aware semantic memory that keeps track of the appearance of objects from different viewpoints in addition to category-level semantics. This enables GOAT to distinguish between different instances of the same category to enable navigation to targets specified by images and language descriptions. In experimental comparisons spanning over 90 hours in 9 different homes consisting of 675 goals selected across 200+ different object instances, we find GOAT achieves an overall success rate of 83%, surpassing previous methods and ablations by 32% (absolute improvement). GOAT improves with experience in the environment, from a 60% success rate at the first goal to a 90% success after exploration. In addition, we demonstrate that GOAT can readily be applied to downstream tasks such as pick and place and social navigation.

ContactGen: Generative Contact Modeling for Grasp Generation

Oct 05, 2023

Shaowei Liu, Yang Zhou, Jimei Yang, Saurabh Gupta, Shenlong Wang

Abstract:This paper presents a novel object-centric contact representation ContactGen for hand-object interaction. The ContactGen comprises three components: a contact map indicates the contact location, a part map represents the contact hand part, and a direction map tells the contact direction within each part. Given an input object, we propose a conditional generative model to predict ContactGen and adopt model-based optimization to predict diverse and geometrically feasible grasps. Experimental results demonstrate our method can generate high-fidelity and diverse human grasps for various objects. Project page: https://stevenlsw.github.io/contactgen/

* Accepted to ICCV 2023. Website: https://stevenlsw.github.io/contactgen/

Learning Inertial Parameter Identification of Unknown Object with Humanoid Robot using Sim-to-Real Adaptation

Sep 18, 2023

Donghoon Baek, Bo Peng, Saurabh Gupta, Joao Ramos

Abstract:Understanding the dynamics of unknown objects is crucial for collaborative robots, including humanoids, to interact with humans more safely and accurately. Most of the relevant literature relies on a force/torque sensor, prior knowledge of the object, a vision system, and a long-horizon trajectory, which are often impractical. Moreover, these methods often entail solving a non-linear optimization problem, sometimes yielding physically inconsistent results. In this work, we propose fast learning-based inertial parameter estimation as a more practical approach. We acquire a reliable dataset in a high-fidelity simulation and train a time-series data-driven regression model (e.g., LSTM) to estimate the inertial parameters of unknown objects. We also introduce a novel sim-to-real adaptation method combining Robot System Identification and Gaussian Processes to directly transfer the trained model to real-world applications. We demonstrate our method with a 4-DOF single manipulator of a physical wheeled humanoid robot, SATYRR. Results show that our method can identify the inertial parameters of various unknown objects faster and more accurately than conventional methods.

* This paper is submitted to ICRA 2024

Push Past Green: Learning to Look Behind Plant Foliage by Moving It

Jul 06, 2023

Xiaoyu Zhang, Saurabh Gupta

Abstract:Autonomous agriculture applications (e.g., inspection, phenotyping, plucking fruits) require manipulating the plant foliage to look behind the leaves and the branches. Partial visibility, extreme clutter, thin structures, and unknown geometry and dynamics for plants make such manipulation challenging. We tackle these challenges through data-driven methods. We use self-supervision to train SRPNet, a neural network that predicts what space is revealed on execution of a candidate action on a given plant. We use SRPNet with the cross-entropy method to predict actions that are effective at revealing space beneath plant foliage. Furthermore, as SRPNet does not just predict how much space is revealed but also where it is revealed, we can execute a sequence of actions that incrementally reveal more and more space beneath the plant foliage. We experiment with a synthetic (vines) and a real plant (Dracaena) on a physical test-bed across 5 settings including 2 settings that test generalization to novel plant configurations. Our experiments reveal the effectiveness of our overall method, PPG, over a competitive hand-crafted exploration method, and the effectiveness of SRPNet over a hand-crafted dynamics model and relevant ablations.

* for project website with video, see https://sites.google.com/view/pushpastgreen/
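The cross-entropy method mentioned in the abstract can be sketched generically as below. This is a textbook CEM loop under stated assumptions, not the paper's code: `score_fn` is a hypothetical stand-in for a learned predictor such as SRPNet, and the action dimension, population size, and elite count are illustrative.

```python
import numpy as np

def cem_maximize(score_fn, dim, iters=20, pop=64, n_elite=8, seed=0):
    """Cross-entropy method: iteratively refit a Gaussian to the top-scoring samples."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    for _ in range(iters):
        samples = rng.normal(mean, std, size=(pop, dim))  # candidate actions
        scores = np.array([score_fn(s) for s in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]    # keep the highest scores
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6                    # avoid collapse to zero width
    return mean

# Toy score standing in for "predicted revealed space": peaks at a known action.
target = np.array([0.5, -0.3])
best_action = cem_maximize(lambda a: -np.sum((a - target) ** 2), dim=2)
```

In the setting the abstract describes, the score for a candidate action would come from the network's prediction of how much space that action reveals.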

Building Rearticulable Models for Arbitrary 3D Objects from 4D Point Clouds

Jun 01, 2023

Shaowei Liu, Saurabh Gupta, Shenlong Wang

Abstract:We build rearticulable models for arbitrary everyday man-made objects containing an arbitrary number of parts that are connected together in arbitrary ways via 1 degree-of-freedom joints. Given point cloud videos of such everyday objects, our method identifies the distinct object parts, what parts are connected to what other parts, and the properties of the joints connecting each part pair. We do this by jointly optimizing the part segmentation, transformation, and kinematics using a novel energy minimization framework. Our inferred animatable models enable retargeting to novel poses with guidance from sparse point correspondences. We test our method on a new articulating robot dataset, and the Sapiens dataset with common daily objects, as well as real-world scans. Experiments show that our method outperforms two leading prior works on various metrics.

* Accepted to CVPR 2023. Project page: https://stevenlsw.github.io/reart

Look Ma, No Hands! Agent-Environment Factorization of Egocentric Videos

May 25, 2023

Matthew Chang, Aditya Prakash, Saurabh Gupta

Abstract:The analysis and use of egocentric videos for robotic tasks is made challenging by occlusion due to the hand and the visual mismatch between the human hand and a robot end-effector. In this sense, the human hand presents a nuisance. However, often hands also provide a valuable signal, e.g. the hand pose may suggest what kind of object is being held. In this work, we propose to extract a factored representation of the scene that separates the agent (human hand) and the environment. This alleviates both occlusion and mismatch while preserving the signal, thereby easing the design of models for downstream robotics tasks. At the heart of this factorization is our proposed Video Inpainting via Diffusion Model (VIDM) that leverages both a prior on real-world images (through a large-scale pre-trained diffusion model) and the appearance of the object in earlier frames of the video (through attention). Our experiments demonstrate the effectiveness of VIDM at improving inpainting quality on egocentric videos and the power of our factored representation for numerous tasks: object detection, 3D reconstruction of manipulated objects, and learning of reward functions, policies, and affordances from videos.

* for project website with video, see https://matthewchang.github.io/vidm/

Learning Hand-Held Object Reconstruction from In-The-Wild Videos

May 04, 2023

Aditya Prakash, Matthew Chang, Matthew Jin, Saurabh Gupta

Abstract:Prior works for reconstructing hand-held objects from a single image rely on direct 3D shape supervision, which is challenging to gather in the real world at scale. Consequently, these approaches do not generalize well when presented with novel objects in in-the-wild settings. While 3D supervision is a major bottleneck, there is an abundance of in-the-wild raw video data showing hand-object interactions. In this paper, we automatically extract 3D supervision (via multiview 2D supervision) from such raw video data to scale up the learning of models for hand-held object reconstruction. This requires tackling two key challenges: unknown camera pose and occlusion. For the former, we use hand pose (predicted from existing techniques, e.g. FrankMocap) as a proxy for object pose. For the latter, we learn data-driven 3D shape priors using synthetic objects from the ObMan dataset. We use these indirect 3D cues to train occupancy networks that predict the 3D shape of objects from a single RGB image. Our experiments on the MOW and HO3D datasets show the effectiveness of these supervisory signals at predicting the 3D shape for real-world hand-held objects without any direct real-world 3D supervision.

* Project Webpage: https://ap229997.github.io/projects/horse/

Predicting Motion Plans for Articulating Everyday Objects

Mar 02, 2023

Arjun Gupta, Max E. Shepherd, Saurabh Gupta

Abstract:Mobile manipulation tasks such as opening a door, pulling open a drawer, or lifting a toilet lid require constrained motion of the end-effector under environmental and task constraints. This, coupled with partial information in novel environments, makes it challenging to employ classical motion planning approaches at test time. Our key insight is to cast it as a learning problem to leverage past experience of solving similar planning problems to directly predict motion plans for mobile manipulation tasks in novel situations at test time. To enable this, we develop a simulator, ArtObjSim, that simulates articulated objects placed in real scenes. We then introduce SeqIK+$\theta_0$, a fast and flexible representation for motion plans. Finally, we learn models that use SeqIK+$\theta_0$ to quickly predict motion plans for articulating novel objects at test time. Experimental evaluation shows improved speed and accuracy at generating motion plans compared with pure search-based methods and pure learning methods.

* To Appear in ICRA 2023. Project webpage: https://arjung128.github.io/mpao/
