Yoshua Bengio

DIRO

Improving Gradient-guided Nested Sampling for Posterior Inference

Dec 06, 2023

Pablo Lemos, Nikolay Malkin, Will Handley, Yoshua Bengio, Yashar Hezaveh, Laurence Perreault-Levasseur

Abstract:We present a performant, general-purpose gradient-guided nested sampling algorithm, ${\tt GGNS}$, combining the state of the art in differentiable programming, Hamiltonian slice sampling, clustering, mode separation, dynamic nested sampling, and parallelization. This unique combination allows ${\tt GGNS}$ to scale well with dimensionality and perform competitively on a variety of synthetic and real-world problems. We also show the potential of combining nested sampling with generative flow networks to obtain large amounts of high-quality samples from the posterior distribution. This combination leads to faster mode discovery and more accurate estimates of the partition function.

* 10 pages, 5 figures. Code available at https://github.com/Pablo-Lemos/GGNS
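For readers new to nested sampling, the loop that GGNS builds on can be sketched in a few lines. This is a minimal sketch of plain nested sampling (deterministic prior-volume shrinkage, evidence accumulated shell by shell) on a hypothetical 1-D toy problem: a Uniform(0, 1) prior with a Gaussian likelihood. Because the constrained region is an interval in 1-D, new live points are drawn exactly; GGNS replaces that step with gradient-guided Hamiltonian slice sampling, which this sketch does not implement. All names and parameters here are illustrative assumptions, not the API of the linked repository.

```python
import numpy as np

MU, SIGMA = 0.5, 0.1  # hypothetical toy likelihood; the prior is Uniform(0, 1)


def log_likelihood(x):
    return -0.5 * ((x - MU) / SIGMA) ** 2 - np.log(SIGMA * np.sqrt(2 * np.pi))


def sample_constrained(logl_min, rng):
    """Draw from the prior restricted to {x : log L(x) > logl_min}.

    In 1-D this region is an interval, so it can be sampled exactly.
    (GGNS uses gradient-guided Hamiltonian slice sampling for this step.)
    """
    c = -np.log(SIGMA * np.sqrt(2 * np.pi)) - logl_min  # equals 0.5 * z**2 on the contour
    half_width = SIGMA * np.sqrt(2 * max(c, 0.0))
    lo, hi = max(MU - half_width, 0.0), min(MU + half_width, 1.0)
    return rng.uniform(lo, hi)


def nested_sampling(n_live=100, tol=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    live = rng.uniform(0.0, 1.0, n_live)
    live_logl = log_likelihood(live)
    log_z = -np.inf
    log_x = 0.0  # log of the prior volume inside the current likelihood contour
    # Continue while the live points could still change Z by more than tol
    while live_logl.max() + log_x >= log_z + np.log(tol):
        worst = np.argmin(live_logl)
        # Prior volume shrinks by about exp(-1/n_live) per iteration
        log_w = log_x + np.log1p(-np.exp(-1.0 / n_live))  # width of the discarded shell
        log_z = np.logaddexp(log_z, live_logl[worst] + log_w)
        log_x -= 1.0 / n_live
        # Replace the worst point with a prior draw above its likelihood
        live[worst] = sample_constrained(live_logl[worst], rng)
        live_logl[worst] = log_likelihood(live[worst])
    # Add the contribution of the remaining live points
    return np.logaddexp(log_z, log_x + np.log(np.mean(np.exp(live_logl))))


print(nested_sampling())  # estimate of log Z; the exact value is about 0 here
```

The statistical error of the estimate scales roughly as the square root of (information / number of live points), which is one reason efficient constrained sampling matters as dimensionality grows.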

Unlearning via Sparse Representations

Nov 26, 2023

Vedant Shah, Frederik Träuble, Ashish Malik, Hugo Larochelle, Michael Mozer, Sanjeev Arora, Yoshua Bengio, Anirudh Goyal

Abstract:Machine unlearning, which involves erasing knowledge about a forget set from a trained model, can be costly or even infeasible with existing techniques. We propose a nearly compute-free, zero-shot unlearning technique based on a discrete representational bottleneck. We show that the proposed technique efficiently unlearns the forget set and incurs negligible damage to the model's performance on the rest of the dataset. We evaluate the proposed technique on the problem of class unlearning using three datasets: CIFAR-10, CIFAR-100, and LACUNA-100. We compare the proposed technique to SCRUB, a state-of-the-art approach that uses knowledge distillation for unlearning. Across all three datasets, the proposed technique performs as well as, if not better than, SCRUB while incurring almost no computational cost.
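To see why a discrete bottleneck can make unlearning nearly free, consider a toy model in which every input is routed to its nearest key in a codebook and the prediction is read from that key's stored value. The sketch below assumes, for illustration only, that unlearning means deactivating the keys the forget set routes through; the class name, method names, and data are hypothetical and not taken from the paper.

```python
import numpy as np


class KeyValueBottleneck:
    """Toy discrete bottleneck: each input is routed to its nearest key and
    the prediction is that key's stored value (here, simply a class label)."""

    def __init__(self, keys, values):
        self.keys = np.asarray(keys, dtype=float)
        self.values = np.asarray(values)
        self.active = np.ones(len(self.keys), dtype=bool)

    def nearest_key(self, x):
        x = np.asarray(x, dtype=float)
        dist = np.linalg.norm(x[:, None, :] - self.keys[None, :, :], axis=-1)
        dist[:, ~self.active] = np.inf  # deactivated keys can never be selected
        return np.argmin(dist, axis=1)

    def predict(self, x):
        return self.values[self.nearest_key(x)]

    def unlearn(self, forget_x):
        # No gradient steps: just mask the keys the forget set routes through
        self.active[np.unique(self.nearest_key(forget_x))] = False


model = KeyValueBottleneck(keys=[[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]],
                           values=[0, 0, 1, 1, 2])
forget = [[5, 5.1], [5, 5.9]]   # examples of class 1
retain = [[0, 0.1], [10, 0.2]]  # examples of classes 0 and 2
model.unlearn(forget)
print(model.predict(retain))    # [0 2]: retained classes are unaffected
print(model.predict(forget))    # class 1 can no longer be predicted
```

Because retained examples never route through the masked keys in this toy, their predictions are untouched, which mirrors the "negligible damage" property described above.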

Shortcut Bias Mitigation via Ensemble Diversity Using Diffusion Probabilistic Models

Nov 23, 2023

Luca Scimeca, Alexander Rubinstein, Damien Teney, Seong Joon Oh, Armand Mihai Nicolicioiu, Yoshua Bengio

Abstract:Spurious correlations in the data, where multiple cues are predictive of the target labels, often lead to a phenomenon known as simplicity bias, where a model relies on erroneous, easy-to-learn cues while ignoring reliable ones. In this work, we propose an ensemble diversification framework exploiting Diffusion Probabilistic Models (DPMs) for shortcut bias mitigation. We show that at particular training intervals, DPMs can generate images with novel feature combinations, even when trained on images displaying correlated input features. We leverage this crucial property to generate synthetic counterfactuals to increase model diversity via ensemble disagreement. We show that DPM-guided diversification is sufficient to remove dependence on primary shortcut cues, without a need for additional supervised signals. We further empirically quantify its efficacy on several diversification objectives, and finally show improved generalization and diversification performance on par with prior work that relies on auxiliary data collection.

* arXiv admin note: substantial text overlap with arXiv:2310.02230
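The abstract does not specify the diversification objective, so the following is only one simple hypothetical choice: the mean pairwise agreement between ensemble members' predictive distributions on synthetic counterfactual images. Minimizing it on counterfactuals encourages members to rely on different cues. The function name and the inner-product agreement measure are assumptions for illustration.

```python
import numpy as np


def pairwise_agreement(probs):
    """Mean agreement between ensemble members on a batch of inputs.

    probs: array of shape (members, samples, classes) whose last axis sums to 1.
    The agreement of two members on one input is the inner product of their
    predictive distributions: 1 for identical one-hot predictions, 0 for
    disjoint ones. A diversification loss could minimize this quantity on
    synthetic counterfactuals.
    """
    probs = np.asarray(probs, dtype=float)
    m = probs.shape[0]
    if m < 2:
        raise ValueError("need at least two ensemble members")
    total, pairs = 0.0, 0
    for i in range(m):
        for j in range(i + 1, m):
            total += np.mean(np.sum(probs[i] * probs[j], axis=-1))
            pairs += 1
    return total / pairs


same = [[[1, 0], [0, 1]], [[1, 0], [0, 1]]]     # two members that always agree
diverse = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]  # two members that always disagree
print(pairwise_agreement(same), pairwise_agreement(diverse))  # 1.0 0.0
```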

SatBird: Bird Species Distribution Modeling with Remote Sensing and Citizen Science Data

Nov 02, 2023

Mélisande Teng, Amna Elmustafa, Benjamin Akera, Yoshua Bengio, Hager Radi Abdelwahed, Hugo Larochelle, David Rolnick

Abstract:Biodiversity is declining at an unprecedented rate, impacting ecosystem services necessary to ensure food, water, and human health and well-being. Understanding the distribution of species and their habitats is crucial for conservation policy planning. However, traditional ecological approaches to species distribution models (SDMs) generally focus either on narrow sets of species or on narrow geographical areas, and significant knowledge gaps remain about the distribution of species. A major reason for this is the limited availability of the data traditionally used, due to the prohibitive effort and expertise required for field monitoring. The wide availability of remote sensing data and the growing adoption of citizen science tools to collect species observation data at low cost offer an opportunity for improving biodiversity monitoring and enabling the modelling of complex ecosystems. We introduce a novel task for mapping bird species to their habitats by predicting species encounter rates from satellite images, and present SatBird, a satellite dataset of locations in the USA with labels derived from presence-absence observation data from the citizen science database eBird, considering summer (breeding) and winter seasons. We also provide a dataset in Kenya representing low-data regimes. We additionally provide environmental data and species range maps for each location. We benchmark a set of baselines on our dataset, including SOTA models for remote sensing tasks. SatBird opens up possibilities for scalably modelling properties of ecosystems worldwide.

* 37th Conference on Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmarks
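As a rough illustration of how encounter-rate labels can be derived from presence-absence checklists, the sketch below assumes the encounter rate of a species at a location is the fraction of complete checklists there that report it. The exact label definition and any filtering used for SatBird may differ; the function name and the example checklists are hypothetical.

```python
from collections import Counter


def encounter_rates(checklists, species):
    """Per-species encounter rate at one location.

    checklists: list of sets, each holding the species reported on one
    complete checklist (presence-absence: an unlisted species counts as absent).
    Returns the fraction of checklists on which each species was reported.
    """
    if not checklists:
        raise ValueError("need at least one checklist")
    counts = Counter(s for checklist in checklists for s in set(checklist))
    return {s: counts[s] / len(checklists) for s in species}


hotspot = [{"Blue Jay", "American Robin"}, {"American Robin"},
           {"American Robin", "Northern Cardinal"}, set()]
rates = encounter_rates(hotspot, ["American Robin", "Blue Jay", "Northern Cardinal"])
print(rates)  # {'American Robin': 0.75, 'Blue Jay': 0.25, 'Northern Cardinal': 0.25}
```

Targets of this form lie in [0, 1] for every species, so a model that maps a satellite image to one value per species can be trained against them as a multi-output regression or multi-label problem.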

Object-centric architectures enable efficient causal representation learning

Oct 29, 2023

Amin Mansouri, Jason Hartford, Yan Zhang, Yoshua Bengio

Abstract:Causal representation learning has shown a variety of settings in which we can disentangle latent variables with identifiability guarantees (up to some reasonable equivalence class). Common to all of these approaches is the assumption that (1) the latent variables are represented as $d$-dimensional vectors, and (2) the observations are the output of some injective generative function of these latent variables. While these assumptions appear benign, we show that when the observations are of multiple objects, the generative function is no longer injective and disentanglement fails in practice. We can address this failure by combining recent developments in object-centric learning and causal representation learning. By modifying the Slot Attention architecture arXiv:2006.15055, we develop an object-centric architecture that leverages weak supervision from sparse perturbations to disentangle each object's properties. This approach is more data-efficient in the sense that it requires significantly fewer perturbations than a comparable approach that encodes to a Euclidean space, and we show that it successfully disentangles the properties of a set of objects in a series of simple image-based disentanglement experiments.
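The loss of injectivity with multiple objects can be seen in a tiny example. Suppose the latent vector lists each object's position in a fixed order and the renderer simply draws every object onto a canvas. Swapping the order in which two objects are listed gives a different latent vector but exactly the same image, so no encoder can recover the original vector from the image alone. The renderer below is a hypothetical toy, not the paper's data generator.

```python
import numpy as np


def render(latent, size=8):
    """Render a scene of objects onto a size x size canvas.

    latent: flat vector [x1, y1, x2, y2, ...] giving each object's pixel
    position. Each object is drawn as a single lit pixel, so the image
    depends only on the *set* of objects, not on the order they are listed in.
    """
    image = np.zeros((size, size))
    for x, y in np.asarray(latent, dtype=int).reshape(-1, 2):
        image[y, x] += 1.0
    return image


z = np.array([1, 2, 5, 6])          # object A at (1, 2), object B at (5, 6)
z_swapped = np.array([5, 6, 1, 2])  # same objects, listed in the other order

print(np.array_equal(z, z_swapped))                  # False: different latents
print(np.array_equal(render(z), render(z_swapped)))  # True: identical images
```

Slot-based encoders sidestep this by representing a scene as an unordered set of per-object vectors rather than one ordered vector.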

Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions

Oct 28, 2023

Stefano Massaroli, Michael Poli, Daniel Y. Fu, Hermann Kumbong, Rom N. Parnichkun, Aman Timalsina, David W. Romero, Quinn McIntyre, Beidi Chen, Atri Rudra (+4 more)

Abstract:Recent advances in attention-free sequence models rely on convolutions as alternatives to the attention operator at the core of Transformers. In particular, long convolution sequence models have achieved state-of-the-art performance in many domains, but incur a significant cost during auto-regressive inference workloads, naively requiring a full pass (or caching of activations) over the input sequence for each generated token, similarly to attention-based models. In this paper, we seek to enable $\mathcal O(1)$ compute and memory cost per token in any pre-trained long convolution architecture to reduce memory footprint and increase throughput during generation. Concretely, our methods consist of extracting low-dimensional linear state-space models from each convolution layer, building on rational interpolation and model-order reduction techniques. We further introduce architectural improvements to convolution-based layers such as Hyena: by weight-tying the filters across channels into heads, we achieve higher pre-training quality and reduce the number of filters to be distilled. The resulting model achieves 10x higher throughput than Transformers and 1.5x higher than Hyena at 1.3B parameters, without any loss in quality after distillation.
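The reason a state-space model gives constant cost per token is that a convolution filter of the form $h_k = C A^k B$ can be applied either as an explicit long convolution (cost growing with $t$ per output) or as a recurrence that carries only a $d$-dimensional state. The sketch below checks that the two views agree for a hypothetical random diagonal system. It only illustrates this equivalence; the paper's actual contribution, fitting $(A, B, C)$ to a pre-trained filter via rational interpolation and model-order reduction, is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
d, L = 4, 64                       # state size, sequence length (hypothetical)
A = rng.uniform(-0.9, 0.9, d)      # diagonal (modal) state matrix, stable
B = rng.normal(size=d)
C = rng.normal(size=d)
u = rng.normal(size=L)             # input sequence for a single channel

# Long-convolution view: explicit filter h_k = C A^k B, O(t) work per output
h = np.array([np.sum(C * A**k * B) for k in range(L)])
y_conv = np.array([np.dot(h[: t + 1], u[t::-1]) for t in range(L)])

# Recurrent view: x_t = A x_{t-1} + B u_t, y_t = C x_t, O(d) work per token
x = np.zeros(d)
y_rec = np.empty(L)
for t in range(L):
    x = A * x + B * u[t]
    y_rec[t] = np.dot(C, x)

print(np.allclose(y_conv, y_rec))  # True
```

During generation only the state `x` must be kept, instead of the full history of inputs or cached activations.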

OC-NMN: Object-centric Compositional Neural Module Network for Generative Visual Analogical Reasoning

Oct 28, 2023

Rim Assouel, Pau Rodriguez, Perouz Taslakian, David Vazquez, Yoshua Bengio

Abstract:A key aspect of human intelligence is the ability to imagine (composing learned concepts in novel ways) to make sense of new scenarios. Machine learning systems have not yet attained such a capacity. In this work, in the context of visual reasoning, we show how modularity can be leveraged to derive a compositional data augmentation framework inspired by imagination. Our method, denoted Object-centric Compositional Neural Module Network (OC-NMN), decomposes visual generative reasoning tasks into a series of primitives applied to objects without using a domain-specific language. We show that our modular architectural choices can be used to generate new training tasks that lead to better out-of-distribution generalization. We compare our model to existing and new baselines on a proposed visual reasoning benchmark that consists of applying arithmetic operations to MNIST digits.
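The idea of generating new training tasks by recombining primitives can be sketched symbolically. Below, objects are reduced to a digit and a colour, primitives are hypothetical functions on one object, and new tasks are all length-2 compositions not seen during training. OC-NMN learns its modules from pixels; the primitive names, the modulo-10 arithmetic, and the "seen" set here are illustrative assumptions only.

```python
from itertools import product

# Hypothetical primitives acting on an object's attributes (digit, colour)
PRIMITIVES = {
    "add_1": lambda obj: {**obj, "digit": (obj["digit"] + 1) % 10},
    "sub_2": lambda obj: {**obj, "digit": (obj["digit"] - 2) % 10},
    "recolor_red": lambda obj: {**obj, "color": "red"},
}


def compose(names):
    """Chain primitives into one program, applied left to right."""
    def program(obj):
        for name in names:
            obj = PRIMITIVES[name](obj)
        return obj
    return program


seen = {("add_1",), ("sub_2", "recolor_red")}  # programs present in training
# "Imagined" tasks: every length-2 program that was not seen during training
new_tasks = [p for p in product(PRIMITIVES, repeat=2) if p not in seen]

digit_objects = [{"digit": d, "color": "blue"} for d in range(10)]
augmented = [(obj, compose(p)(obj)) for p in new_tasks for obj in digit_objects]
print(len(new_tasks), len(augmented))  # 8 80
```

Each `(input, output)` pair in `augmented` is a synthetic example of a task the model never saw, which is the kind of compositional augmentation the abstract describes.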

Managing AI Risks in an Era of Rapid Progress

Oct 26, 2023

Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Yuval Noah Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, Gillian Hadfield (+14 more)

Abstract:In this short consensus paper, we outline risks from upcoming, advanced AI systems. We examine large-scale social harms and malicious uses, as well as an irreversible loss of human control over autonomous AI systems. In light of rapid and continuing AI progress, we propose priorities for AI R&D and governance.

Causal machine learning for single-cell genomics

Oct 23, 2023

Alejandro Tejada-Lapuerta, Paul Bertin, Stefan Bauer, Hananeh Aliee, Yoshua Bengio, Fabian J. Theis

Abstract:Advances in single-cell omics allow for unprecedented insights into the transcription profiles of individual cells. When combined with large-scale perturbation screens, through which specific biological mechanisms can be targeted, these technologies allow for measuring the effect of targeted perturbations on the whole transcriptome. These advances provide an opportunity to better understand the causative role of genes in complex biological processes such as gene regulation, disease progression or cellular development. However, the high-dimensional nature of the data, coupled with the intricate complexity of biological systems, renders this task nontrivial. Within the machine learning community, there has been a recent increase in interest in causality, with a focus on adapting established causal techniques and algorithms to handle high-dimensional data. In this perspective, we delineate the application of these methodologies within the realm of single-cell genomics and their challenges. We first present the model that underlies most current causal approaches to single-cell biology and discuss and challenge the assumptions it entails from the biological point of view. We then identify open problems in the application of causal approaches to single-cell data: generalising to unseen environments, learning interpretable models, and learning causal models of dynamics. For each problem, we discuss how various research directions, including the development of computational approaches and the adaptation of experimental protocols, may offer ways forward, or on the contrary pose some difficulties. With the advent of single-cell atlases and increasing perturbation data, we expect causal models to become a crucial tool for informed experimental design.

* 35 pages, 7 figures, 3 tables, 1 box

Towards equilibrium molecular conformation generation with GFlowNets

Oct 20, 2023

Alexandra Volokhova, Michał Koziarski, Alex Hernández-García, Cheng-Hao Liu, Santiago Miret, Pablo Lemos, Luca Thiede, Zichao Yan, Alán Aspuru-Guzik, Yoshua Bengio

Abstract:Sampling diverse, thermodynamically feasible molecular conformations plays a crucial role in predicting properties of a molecule. In this paper, we propose to use GFlowNets for sampling conformations of small molecules from the Boltzmann distribution, as determined by the molecule's energy. The proposed approach can be used in combination with energy estimation methods of different fidelity and discovers a diverse set of low-energy conformations for highly flexible drug-like molecules. We demonstrate that GFlowNets can reproduce molecular potential energy surfaces by sampling proportionally to the Boltzmann distribution.
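The sampling target can be made concrete: a conformation $x$ with energy $E(x)$ has Boltzmann probability proportional to $\exp(-E(x)/k_B T)$, which is the distribution a GFlowNet with reward $R(x) = \exp(-E(x)/k_B T)$ is trained to sample from. The sketch below computes that target exactly for a hypothetical ethane-like threefold torsional potential on a discretized angle grid; it does not train a GFlowNet, and the potential, grid, and temperature are assumptions for illustration.

```python
import numpy as np

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def torsion_energy(theta_deg):
    """Hypothetical threefold torsional potential in kcal/mol (ethane-like).
    Minima (staggered) at 60, 180 and 300 degrees; 3 kcal/mol barrier at 0."""
    theta = np.radians(theta_deg)
    return 1.5 * (1.0 + np.cos(3.0 * theta))


def boltzmann_probs(energies, temperature=298.15):
    """p_i proportional to exp(-E_i / kT), normalized over the given states."""
    log_w = -np.asarray(energies, dtype=float) / (K_B * temperature)
    log_w -= log_w.max()  # subtract the maximum for numerical stability
    w = np.exp(log_w)
    return w / w.sum()


angles = np.arange(0, 360, 30)  # 12 discretized conformations
p = boltzmann_probs(torsion_energy(angles))
for a, prob in zip(angles, p):
    print(f"{a:3d} deg  p = {prob:.4f}")
```

At room temperature the eclipsed conformations are suppressed by a factor of roughly exp(3 / 0.59), so almost all probability mass sits on the three staggered minima; a sampler that matches this target reproduces the energy surface in the sense described above.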
