Adrien Gaidon

Xerox Research Center Europe, France

Discovering Objects that Can Move

Mar 18, 2022

Zhipeng Bao, Pavel Tokmakov, Allan Jabri, Yu-Xiong Wang, Adrien Gaidon, Martial Hebert

Abstract: This paper studies the problem of object discovery -- separating objects from the background without manual labels. Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions. However, by relying on appearance alone, these methods fail to separate objects from the background in cluttered scenes. This is a fundamental limitation since the definition of an object is inherently ambiguous and context-dependent. To resolve this ambiguity, we choose to focus on dynamic objects -- entities that can move independently in the world. We then scale the recent auto-encoder based frameworks for unsupervised object discovery from toy synthetic images to complex real-world scenes. To this end, we simplify their architecture, and augment the resulting model with a weak learning signal from general motion segmentation algorithms. Our experiments demonstrate that, despite only capturing a small subset of the objects that move, this signal is enough to generalize to segment both moving and static instances of dynamic objects. We show that our model scales to a newly collected, photo-realistic synthetic dataset with street driving scenarios. Additionally, we leverage ground truth segmentation and flow annotations in this dataset for thorough ablation and evaluation. Finally, our experiments on the real-world KITTI benchmark demonstrate that the proposed approach outperforms both heuristic- and learning-based methods by capitalizing on motion cues.

* Accepted to CVPR 2022

Dynamics-Aware Comparison of Learned Reward Functions

Jan 25, 2022

Blake Wulfe, Ashwin Balakrishna, Logan Ellis, Jean Mercat, Rowan McAllister, Adrien Gaidon

Abstract: The ability to learn reward functions plays an important role in enabling the deployment of intelligent agents in the real world. However, comparing reward functions, for example as a means of evaluating reward learning methods, presents a challenge. Reward functions are typically compared by considering the behavior of optimized policies, but this approach conflates deficiencies in the reward function with those of the policy search algorithm used to optimize it. To address this challenge, Gleave et al. (2020) propose the Equivalent-Policy Invariant Comparison (EPIC) distance. EPIC avoids policy optimization, but in doing so requires computing reward values at transitions that may be impossible under the system dynamics. This is problematic for learned reward functions because it entails evaluating them outside of their training distribution, resulting in inaccurate reward values that we show can render EPIC ineffective at comparing rewards. To address this problem, we propose the Dynamics-Aware Reward Distance (DARD), a new reward pseudometric. DARD uses an approximate transition model of the environment to transform reward functions into a form that allows for comparisons that are invariant to reward shaping while only evaluating reward functions on transitions close to their training distribution. Experiments in simulated physical domains demonstrate that DARD enables reliable reward comparisons without policy optimization and is significantly more predictive than baseline methods of downstream policy performance when dealing with learned reward functions.
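
The abstract above asks for comparisons that are "invariant to reward shaping". As a minimal sketch of what shaping does, assuming the standard potential-based form r'(s, s') = r(s, s') + γΦ(s') − Φ(s) (the abstract does not spell this out), the snippet below checks that shaping changes a trajectory's discounted return only by a term that depends on its endpoints. The reward `base_reward`, the potential `phi`, and the toy trajectory are hypothetical, and this is neither EPIC nor DARD.

```python
import math

GAMMA = 0.9  # discount factor (arbitrary value chosen for this illustration)

def base_reward(s, s_next):
    # Hypothetical reward over integer states: prefer ending up near state 3.
    return -abs(3 - s_next)

def phi(s):
    # Hypothetical potential function used for shaping.
    return 2.0 * s

def shaped_reward(s, s_next):
    # Potential-based shaping: r' = r + gamma * phi(s') - phi(s).
    return base_reward(s, s_next) + GAMMA * phi(s_next) - phi(s)

def discounted_return(reward_fn, states):
    # Sum of gamma^t * r(s_t, s_{t+1}) over consecutive state pairs.
    return sum(
        (GAMMA ** t) * reward_fn(s, s_next)
        for t, (s, s_next) in enumerate(zip(states, states[1:]))
    )

def shaping_offset(states):
    # The shaping terms telescope to gamma^T * phi(s_T) - phi(s_0).
    horizon = len(states) - 1
    return (GAMMA ** horizon) * phi(states[-1]) - phi(states[0])

trajectory = [0, 1, 2, 3, 3]
plain = discounted_return(base_reward, trajectory)
shaped = discounted_return(shaped_reward, trajectory)
print(math.isclose(shaped - plain, shaping_offset(trajectory), abs_tol=1e-9))
```

Because the offset depends only on the start and end states, two rewards that differ by shaping rank trajectories with shared endpoints identically, which is why a reward distance is usually expected to treat them as equivalent.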

Self-Supervised Camera Self-Calibration from Video

Dec 06, 2021

Jiading Fang, Igor Vasiljevic, Vitor Guizilini, Rares Ambrus, Greg Shakhnarovich, Adrien Gaidon, Matthew R. Walter

Abstract: Camera calibration is integral to robotics and computer vision algorithms that seek to infer geometric properties of the scene from visual input streams. In practice, calibration is a laborious procedure requiring specialized data collection and careful tuning. This process must be repeated whenever the parameters of the camera change, which can be a frequent occurrence for mobile robots and autonomous vehicles. In contrast, self-supervised depth and ego-motion estimation approaches can bypass explicit calibration by inferring per-frame projection models that optimize a view synthesis objective. In this paper, we extend this approach to explicitly calibrate a wide range of cameras from raw videos in the wild. We propose a learning algorithm to regress per-sequence calibration parameters using an efficient family of general camera models. Our procedure achieves self-calibration results with sub-pixel reprojection error, outperforming other learning-based methods. We validate our approach on a wide variety of camera geometries, including perspective, fisheye, and catadioptric. Finally, we show that our approach leads to improvements in the downstream task of depth estimation, achieving state-of-the-art results on the EuRoC dataset with greater computational efficiency than contemporary methods.
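
To make "sub-pixel reprojection error" concrete, here is a minimal sketch that assumes the simplest perspective (pinhole) model with focal lengths `fx`, `fy` and principal point `cx`, `cy`. The paper uses a more general family of camera models, so this is only an illustration of the metric, not the authors' method; the function names and sample values are hypothetical.

```python
import math

def project_pinhole(point, fx, fy, cx, cy):
    # Project a 3D point in camera coordinates onto the image plane.
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)

def mean_reprojection_error(points_3d, pixels, fx, fy, cx, cy):
    # Average Euclidean distance, in pixels, between projected and observed points.
    errors = []
    for point, (u_obs, v_obs) in zip(points_3d, pixels):
        u, v = project_pinhole(point, fx, fy, cx, cy)
        errors.append(math.hypot(u - u_obs, v - v_obs))
    return sum(errors) / len(errors)

# Hypothetical data: two 3D points and their observed pixel locations.
points_3d = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
observed = [(50.0, 40.5), (100.3, 40.4)]
error = mean_reprojection_error(points_3d, observed, fx=100.0, fy=100.0, cx=50.0, cy=40.0)
print(f"mean reprojection error: {error:.3f} px")
```

An error below one pixel means the estimated parameters place points, on average, within a single pixel of where they were actually observed.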

Self-supervised Learning is More Robust to Dataset Imbalance

Oct 11, 2021

Hong Liu, Jeff Z. HaoChen, Adrien Gaidon, Tengyu Ma

Abstract: Self-supervised learning (SSL) is a scalable way to learn general visual representations since it learns without labels. However, large-scale unlabeled datasets in the wild often have long-tailed label distributions, where we know little about the behavior of SSL. In this work, we systematically investigate self-supervised learning under dataset imbalance. First, we find via extensive experiments that off-the-shelf self-supervised representations are already more robust to class imbalance than supervised representations. The performance gap between balanced and imbalanced pre-training with SSL is significantly smaller than the gap with supervised learning, across sample sizes, for both in-domain and, especially, out-of-domain evaluation. Second, towards understanding the robustness of SSL, we hypothesize that SSL learns richer features from frequent data: it may learn label-irrelevant-but-transferable features that help classify the rare classes and downstream tasks. In contrast, supervised learning has no incentive to learn features irrelevant to the labels from frequent examples. We validate this hypothesis with semi-synthetic experiments and theoretical analyses on a simplified setting. Third, inspired by the theoretical insights, we devise a re-weighted regularization technique that consistently improves the SSL representation quality on imbalanced datasets with several evaluation criteria, closing the small gap between balanced and imbalanced datasets with the same number of examples.

Warp-Refine Propagation: Semi-Supervised Auto-labeling via Cycle-consistency

Sep 28, 2021

Aditya Ganeshan, Alexis Vallet, Yasunori Kudo, Shin-ichi Maeda, Tommi Kerola, Rares Ambrus, Dennis Park, Adrien Gaidon

Abstract: Deep learning models for semantic segmentation rely on expensive, large-scale, manually annotated datasets. Labeling is a tedious process that can take hours per image. Automatically annotating video sequences by propagating sparsely labeled frames through time is a more scalable alternative. In this work, we propose a novel label propagation method, termed Warp-Refine Propagation, that combines semantic cues with geometric cues to efficiently auto-label videos. Our method learns to refine geometrically-warped labels and infuse them with learned semantic priors in a semi-supervised setting by leveraging cycle consistency across time. We quantitatively show that our method improves label propagation by a noteworthy margin of 13.1 mIoU on the ApolloScape dataset. Furthermore, by training with the auto-labeled frames, we achieve competitive results on three semantic-segmentation benchmarks, improving the state-of-the-art by a large margin of 1.8 and 3.61 mIoU on NYU-V2 and KITTI, respectively, while matching the current best results on Cityscapes.

* 16 pages, 12 figures, including supplementary material. To be published in ICCV 2021
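
The gains above are reported in mIoU (mean intersection over union). As a minimal sketch of that metric, assuming flat lists of per-pixel class ids for a single image, the hypothetical helper `mean_iou` below averages per-class IoU over the classes that appear in the prediction or the ground truth. Benchmarks typically accumulate counts over a whole dataset before averaging, so treat this as an illustration of the definition rather than any benchmark's official scoring code.

```python
def mean_iou(pred, target, num_classes):
    # pred and target are equal-length sequences of integer class ids.
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0

# Hypothetical six-pixel image with three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is about 0.5.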

Is Pseudo-Lidar needed for Monocular 3D Object detection?

Aug 13, 2021

Dennis Park, Rares Ambrus, Vitor Guizilini, Jie Li, Adrien Gaidon

Abstract: Recent progress in 3D object detection from single images leverages monocular depth estimation as a way to produce 3D pointclouds, turning cameras into pseudo-lidar sensors. These two-stage detectors improve with the accuracy of the intermediate depth estimation network, which can itself be improved without manual labels via large-scale self-supervised learning. However, they tend to suffer from overfitting more than end-to-end methods, are more complex, and the gap with similar lidar-based detectors remains significant. In this work, we propose an end-to-end, single stage, monocular 3D object detector, DD3D, that can benefit from depth pre-training like pseudo-lidar methods, but without their limitations. Our architecture is designed for effective information transfer between depth estimation and 3D detection, allowing us to scale with the amount of unlabeled pre-training data. Our method achieves state-of-the-art results on two challenging benchmarks, with 16.34% and 9.28% AP for Cars and Pedestrians (respectively) on the KITTI-3D benchmark, and 41.5% mAP on NuScenes.

* In Proceedings of the ICCV 2021

Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss

Jun 17, 2021

Jeff Z. HaoChen, Colin Wei, Adrien Gaidon, Tengyu Ma

Abstract: Recent works in self-supervised learning have advanced the state-of-the-art by relying on the contrastive learning paradigm, which learns representations by pushing positive pairs, or similar examples from the same class, closer together while keeping negative pairs far apart. Despite the empirical successes, theoretical foundations are limited -- prior analyses assume conditional independence of the positive pairs given the same class label, but recent empirical applications use heavily correlated positive pairs (i.e., data augmentations of the same image). Our work analyzes contrastive learning without assuming conditional independence of positive pairs using a novel concept of the augmentation graph on data. Edges in this graph connect augmentations of the same data, and ground-truth classes naturally form connected sub-graphs. We propose a loss that performs spectral decomposition on the population augmentation graph and can be succinctly written as a contrastive learning objective on neural net representations. Minimizing this objective leads to features with provable accuracy guarantees under linear probe evaluation. By standard generalization bounds, these accuracy guarantees also hold when minimizing the training contrastive loss. Empirically, the features learned by our objective can match or outperform several strong baselines on benchmark vision datasets. In all, this work provides the first provable analysis for contrastive learning where guarantees for linear probe evaluation can apply to realistic empirical settings.
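
The abstract does not write the loss out. A minimal sketch, assuming the commonly cited form of the spectral contrastive loss, L(f) = −2·E[f(x)ᵀf(x⁺)] + E[(f(x)ᵀf(x'))²], estimated on one batch: `z1[i]` and `z2[i]` are embeddings of two augmentations of example i, and cross-example pairs serve as negatives. The batch estimator, function names, and toy vectors are assumptions made for illustration, not the authors' implementation.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def spectral_contrastive_loss(z1, z2):
    # z1[i], z2[i]: embeddings of two augmented views of the same example i.
    n = len(z1)
    if n < 2 or n != len(z2):
        raise ValueError("need two equally sized batches with at least 2 examples")
    # Positive term: mean inner product between the two views of each example.
    positive = sum(dot(z1[i], z2[i]) for i in range(n)) / n
    # Negative term: mean squared inner product between views of different examples.
    negative = sum(
        dot(z1[i], z2[j]) ** 2
        for i in range(n)
        for j in range(n)
        if i != j
    ) / (n * (n - 1))
    return -2.0 * positive + negative

# Aligned views with orthogonal examples give a low loss ...
aligned = spectral_contrastive_loss([(1.0, 0.0), (0.0, 1.0)], [(1.0, 0.0), (0.0, 1.0)])
# ... while views that match the wrong example give a higher one.
swapped = spectral_contrastive_loss([(1.0, 0.0), (0.0, 1.0)], [(0.0, 1.0), (1.0, 0.0)])
print(aligned, swapped)
```

Unlike softmax-based contrastive objectives, this form has no normalization over negatives; it only rewards agreement within positive pairs and penalizes squared similarity across different examples.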

Hierarchical Lovász Embeddings for Proposal-free Panoptic Segmentation

Jun 08, 2021

Tommi Kerola, Jie Li, Atsushi Kanehira, Yasunori Kudo, Alexis Vallet, Adrien Gaidon

Abstract: Panoptic segmentation brings together two separate tasks: instance and semantic segmentation. Although they are related, unifying them faces an apparent paradox: how to simultaneously learn instance-specific and category-specific (i.e. instance-agnostic) representations. Hence, state-of-the-art panoptic segmentation methods use complex models with a distinct stream for each task. In contrast, we propose Hierarchical Lovász Embeddings, per-pixel feature vectors that simultaneously encode instance- and category-level discriminative information. We use a hierarchical Lovász hinge loss to learn a low-dimensional embedding space structured into a unified semantic and instance hierarchy without requiring separate network branches or object proposals. Besides modeling instances precisely in a proposal-free manner, our Hierarchical Lovász Embeddings generalize to categories by using a simple Nearest-Class-Mean classifier, including for non-instance "stuff" classes where instance segmentation methods are not applicable. Our simple model achieves state-of-the-art results compared to existing proposal-free panoptic segmentation methods on Cityscapes, COCO, and Mapillary Vistas. Furthermore, our model demonstrates temporal stability between video frames.

* 13 pages, 9 figures, including supplementary material. To be published in CVPR 2021
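
The abstract mentions a "simple Nearest-Class-Mean classifier" applied to embeddings. As a minimal sketch of that generic technique (not the paper's implementation), the hypothetical helpers below average the embedding vectors of each class and assign a new vector to the class with the closest mean under Euclidean distance. The 2D embeddings and class names are made up for illustration.

```python
import math

def class_means(embeddings, labels):
    # Average the embedding vectors that share a label.
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }

def predict_ncm(vec, means):
    # Return the label whose mean embedding is nearest to vec.
    return min(means, key=lambda label: math.dist(vec, means[label]))

# Hypothetical 2D pixel embeddings with known categories.
embeddings = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (12.0, 10.0)]
labels = ["road", "road", "car", "car"]
means = class_means(embeddings, labels)
print(predict_ncm((1.0, 1.0), means))  # → road
print(predict_ncm((9.0, 9.0), means))  # → car
```

The classifier has no trainable parameters beyond the class means, so all discriminative structure has to come from the embedding space itself.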

CoCon: Cooperative-Contrastive Learning

Apr 30, 2021

Nishant Rai, Ehsan Adeli, Kuan-Hui Lee, Adrien Gaidon, Juan Carlos Niebles

Abstract: Labeling videos at scale is impractical. Consequently, self-supervised visual representation learning is key for efficient video analysis. Recent success in learning image representations suggests contrastive learning is a promising framework to tackle this challenge. However, when applied to real-world videos, contrastive learning may unknowingly lead to the separation of instances that contain semantically similar events. In our work, we introduce a cooperative variant of contrastive learning to utilize complementary information across views and address this issue. We use data-driven sampling to leverage implicit relationships between multiple input video views, whether observed (e.g. RGB) or inferred (e.g. flow, segmentation masks, poses). We are among the first to explore using inter-instance relationships to drive learning. We experimentally evaluate our representations on the downstream task of action recognition. Our method achieves competitive performance on standard benchmarks (UCF101, HMDB51, Kinetics400). Furthermore, qualitative experiments illustrate that our models can capture higher-order class relationships.

Heterogeneous-Agent Trajectory Forecasting Incorporating Class Uncertainty

Apr 26, 2021

Boris Ivanovic, Kuan-Hui Lee, Pavel Tokmakov, Blake Wulfe, Rowan McAllister, Adrien Gaidon, Marco Pavone

Abstract: Reasoning about the future behavior of other agents is critical to safe robot navigation. The multiplicity of plausible futures is further amplified by the uncertainty inherent to agent state estimation from data, including positions, velocities, and semantic class. Forecasting methods, however, typically neglect class uncertainty, conditioning instead only on the agent's most likely class, even though perception models often return full class distributions. To exploit this information, we present HAICU, a method for heterogeneous-agent trajectory forecasting that explicitly incorporates agents' class probabilities. We additionally present PUP, a new challenging real-world autonomous driving dataset, to investigate the impact of Perceptual Uncertainty in Prediction. It contains challenging crowded scenes with unfiltered agent class probabilities that reflect the long-tail of current state-of-the-art perception systems. We demonstrate that incorporating class probabilities in trajectory forecasting significantly improves performance in the face of uncertainty, and enables new forecasting capabilities such as counterfactual predictions.

* 17 pages, 12 figures, 5 tables
