Vladlen Koltun

Stanford University

Stabilizing Equilibrium Models by Jacobian Regularization

Jun 28, 2021

Shaojie Bai, Vladlen Koltun, J. Zico Kolter

Abstract: Deep equilibrium networks (DEQs) are a new class of models that eschews traditional depth in favor of finding the fixed point of a single nonlinear layer. These models have been shown to achieve performance competitive with the state-of-the-art deep networks while using significantly less memory. Yet they are also slower, brittle to architectural choices, and introduce potential instability to the model. In this paper, we propose a regularization scheme for DEQ models that explicitly regularizes the Jacobian of the fixed-point update equations to stabilize the learning of equilibrium models. We show that this regularization adds only minimal computational cost, significantly stabilizes the fixed-point convergence in both forward and backward passes, and scales well to high-dimensional, realistic domains (e.g., WikiText-103 language modeling and ImageNet classification). Using this method, we demonstrate, for the first time, an implicit-depth model that runs with approximately the same speed and level of performance as popular conventional deep networks such as ResNet-101, while still maintaining the constant memory footprint and architectural simplicity of DEQs. Code is available at https://github.com/locuslab/deq.

* ICML 2021 Short Oral
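
The idea in the abstract above (find the fixed point of one layer, then penalize the Jacobian of the update at that point) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the layer `tanh(W z + x)`, the naive forward iteration, and the choice of a Hutchinson-style random-probe estimate of the squared Frobenius norm. The abstract does not specify the solver or the estimator, and this is not the code from the linked repository.

```python
import math
import random


def layer(W, z, x):
    """One application of the toy layer f(z, x) = tanh(W z + x)."""
    n = len(z)
    return [math.tanh(sum(W[i][j] * z[j] for j in range(n)) + x[i])
            for i in range(n)]


def solve_fixed_point(W, x, tol=1e-10, max_iter=500):
    """Naive forward iteration z <- f(z, x) until the update falls below tol."""
    z = [0.0] * len(x)
    for _ in range(max_iter):
        z_next = layer(W, z, x)
        if max(abs(a - b) for a, b in zip(z_next, z)) < tol:
            return z_next
        z = z_next
    return z


def jacobian(W, z, x):
    """Exact Jacobian of f with respect to z: diag(1 - tanh^2) @ W."""
    out = layer(W, z, x)
    n = len(z)
    return [[(1.0 - out[i] ** 2) * W[i][j] for j in range(n)]
            for i in range(n)]


def frobenius_sq(J):
    """Exact squared Frobenius norm, used here only as a reference value."""
    return sum(v * v for row in J for v in row)


def hutchinson_frobenius_sq(J, num_probes, rng):
    """Estimate ||J||_F^2 as the mean of ||J^T eps||^2 over Gaussian probes."""
    n = len(J)
    total = 0.0
    for _ in range(num_probes):
        eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
        jt_eps = [sum(J[i][j] * eps[i] for i in range(n)) for j in range(n)]
        total += sum(v * v for v in jt_eps)
    return total / num_probes


rng = random.Random(0)
W = [[0.3, -0.2], [0.1, 0.4]]   # hypothetical weights; small enough to contract
x = [0.5, -0.1]                 # hypothetical input injection

z_star = solve_fixed_point(W, x)
J = jacobian(W, z_star, x)
exact = frobenius_sq(J)
estimate = hutchinson_frobenius_sq(J, 2000, rng)
print("fixed point:", z_star)
print("||J||_F^2 exact:", exact, "estimated:", estimate)
```

In a real model the Jacobian is never formed explicitly; the product `J^T eps` would come from a vector-Jacobian product in an autodiff framework, and the estimate would be added to the training loss with some weight.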

Training Graph Neural Networks with 1000 Layers

Jun 17, 2021

Guohao Li, Matthias Müller, Bernard Ghanem, Vladlen Koltun

Abstract: Deep graph neural networks (GNNs) have achieved excellent results on various tasks on increasingly large graph datasets with millions of nodes and edges. However, memory complexity has become a major obstacle when training deep GNNs for practical applications due to the immense number of nodes, edges, and intermediate activations. To improve the scalability of GNNs, prior works propose smart graph sampling or partitioning strategies to train GNNs with a smaller set of nodes or sub-graphs. In this work, we study reversible connections, group convolutions, weight tying, and equilibrium models to advance the memory and parameter efficiency of GNNs. We find that reversible connections in combination with deep network architectures enable the training of overparameterized GNNs that significantly outperform existing methods on multiple datasets. Our models RevGNN-Deep (1001 layers with 80 channels each) and RevGNN-Wide (448 layers with 224 channels each) were both trained on a single commodity GPU and achieve an ROC-AUC of $87.74 \pm 0.13$ and $88.24 \pm 0.15$, respectively, on the ogbn-proteins dataset. To the best of our knowledge, RevGNN-Deep is the deepest GNN in the literature by one order of magnitude. Please visit our project website https://www.deepgcns.org/arch/gnn1000 for more information.

* Accepted at ICML 2021. Code available at https://www.deepgcns.org/arch/gnn1000. Work done during Guohao Li's internship at Intel Intelligent Systems Lab.
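
Why reversible connections save memory can be shown with a small sketch: if a layer's input can be recomputed exactly from its output, intermediate activations need not be stored for the backward pass. The code below assumes the generic two-group additive coupling used in reversible residual networks; whether RevGNN uses exactly this form is an assumption, and `branch_f` and `branch_g` are hypothetical stand-ins for learned graph convolutions.

```python
import math


def branch_f(v):
    # Hypothetical stand-in for a learned sub-layer (e.g. a graph convolution).
    return [math.tanh(0.5 * a + 0.1) for a in v]


def branch_g(v):
    # Second hypothetical sub-layer.
    return [math.tanh(-0.3 * a + 0.2) for a in v]


def rev_forward(x1, x2):
    """Additive coupling: y1 = x1 + F(x2), y2 = x2 + G(y1)."""
    y1 = [a + b for a, b in zip(x1, branch_f(x2))]
    y2 = [a + b for a, b in zip(x2, branch_g(y1))]
    return y1, y2


def rev_inverse(y1, y2):
    """Recover the inputs from the outputs by undoing the two additions."""
    x2 = [a - b for a, b in zip(y2, branch_g(y1))]
    x1 = [a - b for a, b in zip(y1, branch_f(x2))]
    return x1, x2


num_layers = 20
x1, x2 = [0.3, -1.2, 0.8], [1.5, 0.0, -0.4]

# Forward pass: only the most recent pair of activations is kept.
h1, h2 = x1, x2
for _ in range(num_layers):
    h1, h2 = rev_forward(h1, h2)

# Walk back through the stack, reconstructing each layer's input on the fly.
r1, r2 = h1, h2
for _ in range(num_layers):
    r1, r2 = rev_inverse(r1, r2)

max_error = max(abs(a - b) for a, b in zip(r1 + r2, x1 + x2))
print("max reconstruction error:", max_error)
```

The reconstruction is exact up to floating-point rounding, so activation memory stays constant in depth at the cost of recomputing each layer once during the backward pass.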

A Measure of Research Taste

May 17, 2021

Vladlen Koltun, David Hafner

Abstract: Researchers are often evaluated by citation-based metrics. Such metrics can inform hiring, promotion, and funding decisions. Concerns have been expressed that popular citation-based metrics incentivize researchers to maximize the production of publications. Such incentives may not be optimal for scientific progress. Here we present a citation-based measure that rewards both productivity and taste: the researcher's ability to focus on impactful contributions. The presented measure, CAP, balances the impact of publications and their quantity, thus incentivizing researchers to consider whether a publication is a useful addition to the literature. CAP is simple, interpretable, and parameter-free. We analyze the characteristics of CAP for highly-cited researchers in biology, computer science, economics, and physics, using a corpus of millions of publications and hundreds of millions of citations with yearly temporal granularity. CAP produces qualitatively plausible outcomes and has a number of advantages over prior metrics. Results can be explored at https://cap-measure.org/

Enhancing Photorealism Enhancement

May 10, 2021

Stephan R. Richter, Hassan Abu AlHaija, Vladlen Koltun

Abstract: We present an approach to enhancing the realism of synthetic images. The images are enhanced by a convolutional network that leverages intermediate representations produced by conventional rendering pipelines. The network is trained via a novel adversarial objective, which provides strong supervision at multiple perceptual levels. We analyze scene layout distributions in commonly used datasets and find that they differ in important ways. We hypothesize that this is one of the causes of strong artifacts that can be observed in the results of many prior methods. To address this we propose a new strategy for sampling image patches during training. We also introduce multiple architectural improvements in the deep network modules used for photorealism enhancement. We confirm the benefits of our contributions in controlled experiments and report substantial gains in stability and realism in comparison to recent image-to-image translation methods and a variety of other baselines.

* Code and data available at https://github.com/intel-isl/PhotorealismEnhancement. Video available at https://youtu.be/P1IcaBn3ej0

Learning to drive from a world on rails

May 03, 2021

Dian Chen, Vladlen Koltun, Philipp Krähenbühl

Abstract: We learn an interactive vision-based driving policy from pre-recorded driving logs via a model-based approach. A forward model of the world supervises a driving policy that predicts the outcome of any potential driving trajectory. To support learning from pre-recorded logs, we assume that the world is on rails, meaning neither the agent nor its actions influence the environment. This assumption greatly simplifies the learning problem, factorizing the dynamics into a nonreactive world model and a low-dimensional and compact forward model of the ego-vehicle. Our approach computes action-values for each training trajectory using a tabular dynamic-programming evaluation of the Bellman equations; these action-values in turn supervise the final vision-based driving policy. Despite the world-on-rails assumption, the final driving policy acts well in a dynamic and reactive world. Our method ranks first on the CARLA leaderboard, attaining a 25% higher driving score while using 40 times less data. Our method is also an order of magnitude more sample-efficient than state-of-the-art model-free reinforcement learning techniques on navigational tasks in the ProcGen benchmark.

* Code and data available at: https://dotchen.github.io/world_on_rails/
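
The tabular dynamic-programming step described in the abstract can be sketched on a toy problem. Under the on-rails assumption the recorded world replays unchanged regardless of what the ego-vehicle does, so action-values for every discretized ego state can be computed by backward induction over one logged trajectory. The 1-D lane, the three actions, the reward values, and the discount factor below are all hypothetical; the abstract does not give the paper's actual state discretization or reward.

```python
ACTIONS = (-1, 0, 1)  # hypothetical ego actions: move left, stay, move right


def ego_step(pos, action, num_cells):
    """Compact forward model of the ego-vehicle on a 1-D lane of cells."""
    return min(max(pos + action, 0), num_cells - 1)


def reward(next_pos, obstacle_pos, goal):
    """Hypothetical reward: penalize collisions, reward being at the goal."""
    if next_pos == obstacle_pos:
        return -10.0
    return 1.0 if next_pos == goal else 0.0


def action_values(obstacle_log, num_cells, goal, gamma=0.9):
    """Backward Bellman evaluation over one recorded (non-reactive) log.

    obstacle_log[t] is the recorded obstacle position at time t; it does not
    depend on the ego state, which is what makes the tabular sweep possible.
    Returns q_tables[t][pos][action_index].
    """
    horizon = len(obstacle_log) - 1
    value = [0.0] * num_cells           # terminal values V_T
    q_tables = [None] * horizon
    for t in range(horizon - 1, -1, -1):
        q_t = []
        for pos in range(num_cells):
            row = []
            for a in ACTIONS:
                nxt = ego_step(pos, a, num_cells)
                row.append(reward(nxt, obstacle_log[t + 1], goal)
                           + gamma * value[nxt])
            q_t.append(row)
        q_tables[t] = q_t
        value = [max(row) for row in q_t]   # V_t(s) = max_a Q_t(s, a)
    return q_tables


obstacle_log = [2, 2, 3, 3, 0]   # hypothetical recorded obstacle positions
q_tables = action_values(obstacle_log, num_cells=5, goal=4)
print("Q at t=0, pos=1:", q_tables[0][1])
```

In the setting the abstract describes, tables like `q_tables` would serve as supervision targets for the vision-based policy rather than being used for control directly.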

Vision Transformers for Dense Prediction

Mar 24, 2021

René Ranftl, Alexey Bochkovskiy, Vladlen Koltun

Abstract: We introduce dense vision transformers, an architecture that leverages vision transformers in place of convolutional networks as a backbone for dense prediction tasks. We assemble tokens from various stages of the vision transformer into image-like representations at various resolutions and progressively combine them into full-resolution predictions using a convolutional decoder. The transformer backbone processes representations at a constant and relatively high resolution and has a global receptive field at every stage. These properties allow the dense vision transformer to provide finer-grained and more globally coherent predictions when compared to fully-convolutional networks. Our experiments show that this architecture yields substantial improvements on dense prediction tasks, especially when a large amount of training data is available. For monocular depth estimation, we observe an improvement of up to 28% in relative performance when compared to a state-of-the-art fully-convolutional network. When applied to semantic segmentation, dense vision transformers set a new state of the art on ADE20K with 49.02% mIoU. We further show that the architecture can be fine-tuned on smaller datasets such as NYUv2, KITTI, and Pascal Context where it also sets the new state of the art. Our models are available at https://github.com/intel-isl/DPT.

* 15 pages
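
The step of assembling tokens into an image-like representation can be illustrated with a minimal sketch. It covers only the spatial rearrangement of a token sequence into a 2-D grid. The function name, the assumption that the first token is a non-spatial readout token that is simply dropped, and the row-major patch order are illustrative choices, not details taken from the abstract.

```python
def tokens_to_grid(tokens, image_height, image_width, patch_size,
                   has_readout_token=True):
    """Rearrange a flat sequence of patch tokens into a rows x cols grid.

    tokens: list of feature vectors, one per patch, in row-major order,
            optionally preceded by one readout token (dropped here).
    """
    rows = image_height // patch_size
    cols = image_width // patch_size
    patch_tokens = tokens[1:] if has_readout_token else tokens
    if len(patch_tokens) != rows * cols:
        raise ValueError("token count does not match the patch grid")
    return [patch_tokens[r * cols:(r + 1) * cols] for r in range(rows)]


# A 32 x 48 image with 16 x 16 patches gives a 2 x 3 grid of tokens.
tokens = [[-1.0]] + [[float(i)] for i in range(6)]
grid = tokens_to_grid(tokens, image_height=32, image_width=48, patch_size=16)
print(grid)
```

A decoder would then resample such grids to different resolutions and fuse them with convolutions; that part is not sketched here.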

Probabilistic two-stage detection

Mar 12, 2021

Xingyi Zhou, Vladlen Koltun, Philipp Krähenbühl

Abstract: We develop a probabilistic interpretation of two-stage object detection. We show that this probabilistic interpretation motivates a number of common empirical training practices. It also suggests changes to two-stage detection pipelines. Specifically, the first stage should infer proper object-vs-background likelihoods, which should then inform the overall score of the detector. A standard region proposal network (RPN) cannot infer this likelihood sufficiently well, but many one-stage detectors can. We show how to build a probabilistic two-stage detector from any state-of-the-art one-stage detector. The resulting detectors are faster and more accurate than both their one- and two-stage precursors. Our detector achieves 56.4 mAP on COCO test-dev with single-scale testing, outperforming all published results. Using a lightweight backbone, our detector achieves 49.2 mAP on COCO at 33 fps on a Titan Xp, outperforming the popular YOLOv4 model.

* Code is available at https://github.com/xingyizhou/CenterNet2
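
One way to read "the first stage likelihood should inform the overall score" is as a marginalization over whether a proposal contains an object. The sketch below assumes exactly that: each foreground class score is the first-stage objectness multiplied by the second-stage class probability, and a proposal that is not an object is background with probability 1. The function name and the example numbers are hypothetical, and the abstract does not confirm that this is the precise scoring rule used in the paper.

```python
def detection_scores(objectness, class_given_object):
    """Combine first-stage objectness with second-stage class probabilities.

    objectness: P(object) for a proposal, from the first stage.
    class_given_object: dict mapping label -> P(label | object), including a
                        "background" entry, from the second stage.
    Returns a dict of marginal class probabilities for the proposal.
    """
    scores = {label: objectness * p
              for label, p in class_given_object.items()
              if label != "background"}
    # Background is reached either because the first stage rejects the
    # proposal or because the second stage classifies an object as background.
    scores["background"] = ((1.0 - objectness)
                            + objectness * class_given_object.get("background", 0.0))
    return scores


scores = detection_scores(
    objectness=0.8,
    class_given_object={"car": 0.7, "person": 0.2, "background": 0.1},
)
print(scores)
```

With a weak first stage, `objectness` is close to uninformative and the product adds little; the abstract's point is that a strong one-stage detector makes this term meaningful.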

Large Batch Simulation for Deep Reinforcement Learning

Mar 12, 2021

Brennan Shacklett, Erik Wijmans, Aleksei Petrenko, Manolis Savva, Dhruv Batra, Vladlen Koltun, Kayvon Fatahalian

Abstract: We accelerate deep reinforcement learning-based training in visually complex 3D environments by two orders of magnitude over prior work, realizing end-to-end training speeds of over 19,000 frames of experience per second on a single GPU and up to 72,000 frames per second on a single eight-GPU machine. The key idea of our approach is to design a 3D renderer and embodied navigation simulator around the principle of "batch simulation": accepting and executing large batches of requests simultaneously. Beyond exposing large amounts of work at once, batch simulation allows implementations to amortize in-memory storage of scene assets, rendering work, data loading, and synchronization costs across many simulation requests, dramatically improving the number of simulated agents per GPU and overall simulation throughput. To balance DNN inference and training costs with faster simulation, we also build a computationally efficient policy DNN that maintains high task performance, and modify training algorithms to maintain sample efficiency when training with large mini-batches. By combining batch simulation and DNN performance optimizations, we demonstrate that PointGoal navigation agents can be trained in complex 3D environments on a single GPU in 1.5 days to 97% of the accuracy of agents trained on a prior state-of-the-art system using a 64-GPU cluster over three days. We provide open-source reference implementations of our batch 3D renderer and simulator to facilitate incorporation of these ideas into RL systems.

* Published as a conference paper at ICLR 2021

Self-supervised Geometric Perception

Mar 04, 2021

Heng Yang, Wei Dong, Luca Carlone, Vladlen Koltun

Abstract: We present self-supervised geometric perception (SGP), the first general framework to learn a feature descriptor for correspondence matching without any ground-truth geometric model labels (e.g., camera poses, rigid transformations). Our first contribution is to formulate geometric perception as an optimization problem that jointly optimizes the feature descriptor and the geometric models given a large corpus of visual measurements (e.g., images, point clouds). Under this optimization formulation, we show that two important streams of research in vision, namely robust model fitting and deep feature learning, correspond to optimizing one block of the unknown variables while fixing the other block. This analysis naturally leads to our second contribution: the SGP algorithm, which performs alternating minimization to solve the joint optimization. SGP iteratively executes two meta-algorithms: a teacher that performs robust model fitting given learned features to generate geometric pseudo-labels, and a student that performs deep feature learning under noisy supervision of the pseudo-labels. As a third contribution, we apply SGP to two perception problems on large-scale real datasets, namely relative camera pose estimation on MegaDepth and point cloud registration on 3DMatch. We demonstrate that SGP achieves state-of-the-art performance that is on par with or superior to the supervised oracles trained using ground-truth labels.

* CVPR 2021, Oral presentation. 8 pages main results, 19 pages in total, including references and supplementary

Simple multi-dataset detection

Feb 25, 2021

Xingyi Zhou, Vladlen Koltun, Philipp Krähenbühl

Abstract: How do we build a general and broad object detection system? We use all labels of all concepts ever annotated. These labels span diverse datasets with potentially inconsistent taxonomies. In this paper, we present a simple method for training a unified detector on multiple large-scale datasets. We use dataset-specific training protocols and losses, but share a common detection architecture with dataset-specific outputs. We show how to automatically integrate these dataset-specific outputs into a common semantic taxonomy. In contrast to prior work, our approach does not require manual taxonomy reconciliation. Our multi-dataset detector performs as well as dataset-specific models on each training domain, but generalizes much better to new unseen domains. Entries based on the presented methodology ranked first in the object detection and instance segmentation tracks of the ECCV 2020 Robust Vision Challenge.

* Code is available at https://github.com/xingyizhou/UniDet
