
Ross Goroshin

Estimating Residential Solar Potential Using Aerial Data

Jun 23, 2023
Ross Goroshin, Alex Wilson, Andrew Lamb, Betty Peng, Brandon Ewonus, Cornelius Ratsch, Jordan Raisher, Marisa Leung, Max Burq, Thomas Colthurst, William Rucklidge, Carl Elkin

Project Sunroof estimates the solar potential of residential buildings using high-quality aerial data. That is, it estimates the potential solar energy (and associated financial savings) that could be captured by buildings if solar panels were installed on their roofs. Unfortunately, its coverage is limited by the lack of high-resolution digital surface map (DSM) data. We present a deep learning approach that bridges this gap by enhancing widely available low-resolution data, thereby dramatically increasing the coverage of Sunroof. We also present ongoing efforts to potentially improve accuracy even further by replacing certain algorithmic components of the Sunroof processing pipeline with deep learning.
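To make the role of the DSM concrete, here is a toy numpy sketch of the geometric core of solar-potential estimation: derive per-pixel surface normals from a height map and take a clamped Lambertian dot product with the sun direction. This is an illustrative simplification, not the Sunroof pipeline (which also models shadows, weather, and panel placement); the function names and the single-sun-direction setup are assumptions for the example.

```python
import numpy as np

def surface_normals(dsm: np.ndarray, cell_size: float = 1.0) -> np.ndarray:
    """Per-pixel unit surface normals from a digital surface map (heights in meters)."""
    dzdy, dzdx = np.gradient(dsm, cell_size)
    normals = np.stack([-dzdx, -dzdy, np.ones_like(dsm)], axis=-1)
    return normals / np.linalg.norm(normals, axis=-1, keepdims=True)

def direct_irradiance(dsm: np.ndarray, sun_dir, cell_size: float = 1.0) -> np.ndarray:
    """Clamped Lambertian term per pixel (self-shadowing and sky diffusion ignored)."""
    sun = np.asarray(sun_dir, dtype=float)
    sun = sun / np.linalg.norm(sun)
    cos_incidence = surface_normals(dsm, cell_size) @ sun
    return np.clip(cos_incidence, 0.0, None)

# A flat roof under an overhead sun receives the full irradiance term (1.0).
flat = np.zeros((8, 8))
print(direct_irradiance(flat, [0.0, 0.0, 1.0]).mean())
```

Because the normals come from `np.gradient` of the DSM, any blur or noise in a low-resolution height map directly corrupts the estimated roof pitch, which is why enhancing the DSM matters.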

* ICLR 2023 - Tackling Climate Change with Machine Learning Workshop  

Block-State Transformer

Jun 15, 2023
Mahan Fathi, Jonathan Pilault, Pierre-Luc Bacon, Christopher Pal, Orhan Firat, Ross Goroshin

State space models (SSMs) have shown impressive results on tasks that require modeling long-range dependencies, and scale efficiently to long sequences owing to their subquadratic runtime complexity. Originally designed for continuous signals, SSMs have shown superior performance on a plethora of tasks in vision and audio; however, SSMs still lag Transformer performance on language modeling tasks. In this work, we propose a hybrid layer named the Block-State Transformer (BST) that internally combines an SSM sublayer for long-range contextualization, and a Block Transformer sublayer for short-term representation of sequences. We study three different, and completely parallelizable, variants that integrate SSMs and block-wise attention. We show that our model outperforms similar Transformer-based architectures on language modeling perplexity and generalizes to longer sequences. In addition, the Block-State Transformer demonstrates a more than tenfold increase in speed at the layer level compared to the Block-Recurrent Transformer when model parallelization is employed.
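The hybrid structure can be sketched in a few lines: an SSM pass supplies long-range context states, and block-local attention consumes them so that blocks are independent of one another (and hence parallelizable). This toy single-head numpy version is an assumption-laden illustration, not the BST implementation; real variants use multi-head attention and FFT-based SSM convolutions, and the scalar parameters `a` and `b` here stand in for learned diagonal SSM parameters.

```python
import numpy as np

def ssm_scan(x, a, b):
    """Per-channel diagonal linear SSM: h_t = a * h_{t-1} + b * x_t (sequential for clarity)."""
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        out[t] = h
    return out

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def block_state_layer(x, a=0.9, b=0.5, block=4):
    """Toy BST-style layer: block-local attention over SSM context states."""
    T, d = x.shape
    ctx = ssm_scan(x, a, b)           # long-range contextualization
    y = np.empty_like(x)
    for s in range(0, T, block):      # blocks are independent given ctx
        q = x[s:s + block]            # queries from the raw block tokens
        kv = ctx[s:s + block]         # keys/values from SSM states
        y[s:s + block] = softmax(q @ kv.T / np.sqrt(d)) @ kv
    return y
```

The key design point visible here is that only the SSM carries information across blocks, so the attention cost stays block-local while the receptive field stays global.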

Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks

Apr 25, 2023
Jesse Farebrother, Joshua Greaves, Rishabh Agarwal, Charline Le Lan, Ross Goroshin, Pablo Samuel Castro, Marc G. Bellemare

Auxiliary tasks improve the representations learned by deep reinforcement learning agents. Analytically, their effect is reasonably well understood; in practice, however, their primary use remains in support of a main learning objective, rather than as a method for learning representations. This is perhaps surprising given that many auxiliary tasks are defined procedurally, and hence can be treated as an essentially infinite source of information about the environment. Based on this observation, we study the effectiveness of auxiliary tasks for learning rich representations, focusing on the setting where the number of tasks and the size of the agent's network are simultaneously increased. For this purpose, we derive a new family of auxiliary tasks based on the successor measure. These tasks are easy to implement and have appealing theoretical properties. Combined with a suitable off-policy learning rule, the result is a representation learning algorithm that can be understood as extending Mahadevan & Maggioni (2007)'s proto-value functions to deep reinforcement learning -- accordingly, we call the resulting object proto-value networks. Through a series of experiments on the Arcade Learning Environment, we demonstrate that proto-value networks produce rich features that may be used to obtain performance comparable to established algorithms, using only linear approximation and a small number (~4M) of interactions with the environment's reward function.
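A minimal tabular sketch of the idea, under stated assumptions: each auxiliary task is the discounted visitation of a random set of states (an indicator cumulant), learned by TD(0) on a random walk, and a state's representation is its vector of auxiliary values. This is a toy stand-in for the deep, off-policy version in the paper; the environment (a ring), the set density, and all hyperparameters here are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_tasks, gamma, lr = 10, 16, 0.9, 0.1
# Each task's cumulant is the indicator of a random subset of states.
cumulants = (rng.random((n_tasks, n_states)) < 0.3).astype(float)
V = np.zeros((n_tasks, n_states))  # one value estimate per (task, state)

s = 0
for _ in range(20000):
    s_next = (s + rng.choice([-1, 1])) % n_states   # random walk on a ring
    td_error = cumulants[:, s_next] + gamma * V[:, s_next] - V[:, s]
    V[:, s] += lr * td_error                         # TD(0) update for all tasks at once
    s = s_next

# A state's learned representation: its vector of auxiliary values across tasks.
features = V.T
```

Because the cumulants are generated procedurally, the number of such tasks can be scaled essentially without limit, which is the property the paper exploits when growing the network alongside the task set.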

* ICLR 2023. Code and models available at https://github.com/google-research/google-research/tree/master/pvn. 22 pages, 8 figures 

Learned Image Compression for Machine Perception

Nov 03, 2021
Felipe Codevilla, Jean Gabriel Simard, Ross Goroshin, Chris Pal

Recent work has shown that learned image compression strategies can outperform standard hand-crafted compression algorithms that have been developed over decades of intensive research on the rate-distortion trade-off. With growing applications of computer vision, high quality image reconstruction from a compressible representation is often a secondary objective. Compression that ensures high accuracy on computer vision tasks such as image segmentation, classification, and detection therefore has the potential for significant impact across a wide variety of settings. In this work, we develop a framework that produces a compression format suitable for both human perception and machine perception. We show that representations can be learned that simultaneously optimize for compression and performance on core vision tasks. Our approach allows models to be trained directly from compressed representations, and this approach yields increased performance on new tasks and in low-shot learning settings. We present results that improve upon segmentation and detection performance compared to standard high-quality JPEGs, but with representations that are four to ten times smaller in terms of bits per pixel. Further, unlike naive compression methods, at a level ten times smaller than standard JPEGs, segmentation and detection models trained from our format suffer only minor degradation in performance.
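The standard learned-compression training recipe that such a framework builds on can be sketched as follows: quantization is replaced by additive uniform noise at train time, the rate is estimated as a negative log-density under a prior, and the overall objective adds a downstream-task term. This is a generic, hedged sketch of that family of objectives, not the paper's implementation; the factorized Gaussian prior and the weighting names are assumptions.

```python
import numpy as np

def quantize_with_noise(z, training, rng):
    """Train-time proxy for rounding: add uniform noise; at test time, round."""
    return z + rng.uniform(-0.5, 0.5, z.shape) if training else np.round(z)

def rate_bits(z_hat, scale=1.0):
    """Crude rate estimate: negative log2-density under a factorized Gaussian prior."""
    nll_nats = 0.5 * (z_hat / scale) ** 2 + 0.5 * np.log(2 * np.pi * scale ** 2)
    return nll_nats.sum() / np.log(2)

def joint_loss(rate, distortion, task_loss, lam_dist=0.01, lam_task=1.0):
    """Single objective trading off bits, reconstruction quality, and task accuracy."""
    return rate + lam_dist * distortion + lam_task * task_loss
```

Setting `lam_dist` low and `lam_task` high biases the learned format toward machine perception; the reverse recovers an ordinary rate-distortion codec.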

* 13 pages, 6 figures 

Impact of Aliasing on Generalization in Deep Convolutional Networks

Aug 07, 2021
Cristina Vasconcelos, Hugo Larochelle, Vincent Dumoulin, Rob Romijnders, Nicolas Le Roux, Ross Goroshin

We investigate the impact of aliasing on generalization in Deep Convolutional Networks and show that data augmentation schemes alone are unable to prevent it due to structural limitations in widely used architectures. Drawing insights from frequency analysis theory, we take a closer look at ResNet and EfficientNet architectures and review the trade-off between aliasing and information loss in each of their major components. We show how to mitigate aliasing by inserting non-trainable low-pass filters at key locations, particularly where networks lack the capacity to learn them. These simple architectural changes lead to substantial improvements in generalization under i.i.d. conditions, and even more so under out-of-distribution conditions, such as image classification under natural corruptions on ImageNet-C [11] and few-shot learning on Meta-Dataset [26]. State-of-the-art results are achieved on both datasets without introducing additional trainable parameters and using the default hyper-parameters of open source codebases.
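The core fix, low-pass filtering before subsampling, is easy to demonstrate in isolation. Below is a minimal numpy sketch (not the paper's code) of a non-trainable anti-aliasing downsampler using a fixed 3x3 binomial kernel; a checkerboard, the classic aliasing victim, collapses to a constant under naive striding but is correctly averaged after blurring.

```python
import numpy as np

def blur_downsample(x: np.ndarray, stride: int = 2) -> np.ndarray:
    """Apply a fixed 3x3 binomial low-pass filter, then subsample.

    The filter is non-trainable: the separable kernel [1,2,1]/4 in each
    direction, a standard anti-aliasing choice.
    """
    k1 = np.array([1.0, 2.0, 1.0]) / 4.0
    pad = np.pad(x, 1, mode="edge")
    # Separable convolution: filter along columns, then along rows.
    rows = sum(k1[i] * pad[:, i:i + x.shape[1]] for i in range(3))
    blurred = sum(k1[i] * rows[i:i + x.shape[0], :] for i in range(3))
    return blurred[::stride, ::stride]

checkerboard = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)
naive = checkerboard[::2, ::2]          # aliased: every sample hits the same phase
smooth = blur_downsample(checkerboard)  # anti-aliased: interior averages to 0.5
```

Because the kernel is fixed, this change adds no trainable parameters, matching the paper's claim that the improvements come for free in that respect.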

* Accepted to ICCV 2021 

Comparing Transfer and Meta Learning Approaches on a Unified Few-Shot Classification Benchmark

Apr 06, 2021
Vincent Dumoulin, Neil Houlsby, Utku Evci, Xiaohua Zhai, Ross Goroshin, Sylvain Gelly, Hugo Larochelle

Meta and transfer learning are two successful families of approaches to few-shot learning. Despite highly related goals, state-of-the-art advances in each family are measured largely in isolation of each other. As a result of diverging evaluation norms, a direct or thorough comparison of different approaches is challenging. To bridge this gap, we perform a cross-family study of the best transfer and meta learners on both a large-scale meta-learning benchmark (Meta-Dataset, MD), and a transfer learning benchmark (Visual Task Adaptation Benchmark, VTAB). We find that, on average, large-scale transfer methods (Big Transfer, BiT) outperform competing approaches on MD, even when trained only on ImageNet. In contrast, meta-learning approaches struggle to compete on VTAB when trained and validated on MD. However, BiT is not without limitations, and pushing for scale does not improve performance on highly out-of-distribution MD tasks. In performing this study, we reveal a number of discrepancies in evaluation norms and study some of these in light of the performance gap. We hope that this work facilitates sharing of insights from each community, and accelerates progress on few-shot learning.

An Effective Anti-Aliasing Approach for Residual Networks

Nov 20, 2020
Cristina Vasconcelos, Hugo Larochelle, Vincent Dumoulin, Nicolas Le Roux, Ross Goroshin

Image pre-processing in the frequency domain has traditionally played a vital role in computer vision and was even part of the standard pipeline in the early days of deep learning. However, with the advent of large datasets, many practitioners concluded that this was unnecessary due to the belief that these priors can be learned from the data itself. Frequency aliasing is a phenomenon that may occur when sub-sampling any signal, such as an image or feature map, causing distortion in the sub-sampled output. We show that we can mitigate this effect by placing non-trainable blur filters and using smooth activation functions at key locations, particularly where networks lack the capacity to learn them. These simple architectural changes lead to substantial improvements in out-of-distribution generalization on both image classification under natural corruptions on ImageNet-C [10] and few-shot learning on Meta-Dataset [17], without introducing additional trainable parameters and using the default hyper-parameters of open source codebases.

An Analysis of Object Representations in Deep Visual Trackers

Jan 08, 2020
Ross Goroshin, Jonathan Tompson, Debidatta Dwibedi

Fully convolutional deep correlation networks are integral components of state-of-the-art approaches to single-object visual tracking. It is commonly assumed that these networks perform tracking-by-detection by matching features of the object instance with features of the entire frame. Strong architectural priors and conditioning on the object representation are thought to encourage this tracking strategy. Despite these strong priors, we show that deep trackers often default to tracking by saliency detection, without relying on the object instance representation. Our analysis shows that, despite being a useful prior, saliency detection can prevent the emergence of more robust tracking strategies in deep networks. This leads us to introduce an auxiliary detection task that encourages more discriminative object representations, improving tracking performance.
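For readers unfamiliar with the correlation-network setup being analyzed, the matching step reduces to sliding a template feature map over the search-region feature map and reading off the response peak. The numpy sketch below shows that core operation on raw arrays; it is a generic illustration of cross-correlation tracking, not the specific trackers studied in the paper.

```python
import numpy as np

def correlation_response(template: np.ndarray, search: np.ndarray) -> np.ndarray:
    """Slide the template over the search region; the response peak is the
    predicted object location (tracking-by-detection via matching)."""
    th, tw = template.shape
    out = np.empty((search.shape[0] - th + 1, search.shape[1] - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(template * search[i:i + th, j:j + tw])
    return out
```

The paper's finding is that the learned features feeding this operation can degenerate: if they encode "salient object here" rather than "this particular object", the peak tracks whatever stands out, which is exactly the failure mode the auxiliary detection task is designed to counter.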

Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples

Mar 07, 2019
Eleni Triantafillou, Tyler Zhu, Vincent Dumoulin, Pascal Lamblin, Kelvin Xu, Ross Goroshin, Carles Gelada, Kevin Swersky, Pierre-Antoine Manzagol, Hugo Larochelle

Few-shot classification refers to learning a classifier for new classes given only a few examples. While a plethora of models have emerged to tackle this problem recently, we find the procedures and datasets currently used to systematically assess progress in this setting lacking. To address this, we propose Meta-Dataset: a new benchmark for training and evaluating few-shot classifiers that is large-scale, consists of multiple datasets, and presents more natural and realistic tasks. The aim is to measure the ability of state-of-the-art models to leverage diverse sources of data to achieve higher generalization, and to evaluate that generalization ability in a more challenging setting. We additionally measure the robustness of current methods to variations in the number of available examples and the number of classes. Finally, our extensive empirical evaluation leads us to identify weaknesses in Prototypical Networks and MAML, two popular few-shot classification methods, and to propose a new method, Proto-MAML, which achieves improved performance on our benchmark.
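As background on one of the baselines evaluated, a Prototypical Networks episode is simple enough to sketch directly: each class prototype is the mean of its support embeddings, and each query is assigned the nearest prototype. The numpy version below operates on pre-computed embeddings; the function name and 2-D toy data are illustrative, not from the benchmark code.

```python
import numpy as np

def proto_classify(support_x, support_y, query_x, n_classes):
    """Prototypical-network episode on pre-embedded examples.

    Prototypes are per-class means of support embeddings; queries are
    labeled by the nearest prototype in squared Euclidean distance.
    """
    protos = np.stack([support_x[support_y == c].mean(axis=0)
                       for c in range(n_classes)])
    d2 = ((query_x[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)
```

Varying the number of support examples and classes per episode, as Meta-Dataset does, directly stresses how stable these prototype estimates are under small, imbalanced supports.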

Learning to Navigate in Complex Environments

Jan 13, 2017
Piotr Mirowski, Razvan Pascanu, Fabio Viola, Hubert Soyer, Andrew J. Ballard, Andrea Banino, Misha Denil, Ross Goroshin, Laurent Sifre, Koray Kavukcuoglu, Dharshan Kumaran, Raia Hadsell

Learning to navigate in complex environments with dynamic elements is an important milestone in developing AI agents. In this work we formulate the navigation question as a reinforcement learning problem and show that data efficiency and task performance can be dramatically improved by relying on additional auxiliary tasks leveraging multimodal sensory inputs. In particular we consider jointly learning the goal-driven reinforcement learning problem with auxiliary depth prediction and loop closure classification tasks. This approach can learn to navigate from raw sensory input in complicated 3D mazes, approaching human-level performance even under conditions where the goal location changes frequently. We provide detailed analysis of the agent behaviour, its ability to localise, and its network activity dynamics, showing that the agent implicitly learns key navigation abilities.
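The joint-learning setup amounts to optimizing a single weighted objective combining the RL loss with the two auxiliary losses. The sketch below is a simplified illustration of that combination, not the paper's implementation: depth prediction is written as regression here for brevity (the paper also considers a classification formulation), and the weight names `beta_depth` and `beta_loop` are assumptions.

```python
import numpy as np

def navigation_loss(rl_loss, depth_pred, depth_true, loop_logit, loop_label,
                    beta_depth=1.0, beta_loop=1.0):
    """Joint objective: RL loss plus auxiliary depth and loop-closure terms."""
    # Auxiliary task 1: depth prediction from the agent's visual features.
    depth_loss = np.mean((depth_pred - depth_true) ** 2)
    # Auxiliary task 2: loop-closure classification (has this place been visited?).
    p = 1.0 / (1.0 + np.exp(-loop_logit))
    loop_loss = -(loop_label * np.log(p) + (1 - loop_label) * np.log(1 - p))
    return rl_loss + beta_depth * depth_loss + beta_loop * loop_loss
```

The auxiliary gradients shape the shared representation even when extrinsic reward is sparse, which is the mechanism behind the reported gains in data efficiency.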

* 11 pages, 5 appendix pages, 11 figures, 3 tables, under review as a conference paper at ICLR 2017 