Kartik Ahuja

Context is Environment

Sep 20, 2023
Sharut Gupta, Stefanie Jegelka, David Lopez-Paz, Kartik Ahuja

Two lines of work are taking center stage in AI research. On the one hand, the community is making increasing efforts to build models that discard spurious correlations and generalize better in novel test environments. Unfortunately, the bitter lesson so far is that no proposal convincingly outperforms a simple empirical risk minimization baseline. On the other hand, large language models (LLMs) have erupted as algorithms able to learn in context, generalizing on the fly to eclectic contextual circumstances that users enforce by means of prompting. In this paper, we argue that context is environment, and posit that in-context learning holds the key to better domain generalization. Via extensive theory and experiments, we show that paying attention to context, that is, unlabeled examples as they arrive, allows our proposed In-Context Risk Minimization (ICRM) algorithm to zoom in on the test environment risk minimizer, leading to significant out-of-distribution performance improvements. From all of this, two messages are worth taking home. Researchers in domain generalization should consider environment as context, and harness the adaptive power of in-context learning. Researchers in LLMs should consider context as environment, to better structure data towards generalization.
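
The core recipe admits a compact sketch: train a sequence model whose input includes the unlabeled examples seen so far from the current environment, so that its predictions adapt as context accumulates at test time. The following is a minimal illustration of that idea, not the authors' implementation; the architecture, sizes, and the name ICRMSketch are assumptions.

```python
# A minimal sketch of the idea behind In-Context Risk Minimization (ICRM),
# not the authors' implementation: a sequence model predicts the label for a
# query input from the query together with the unlabeled inputs seen so far
# in the same environment. All names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ICRMSketch(nn.Module):
    def __init__(self, x_dim, hidden=64):
        super().__init__()
        # The GRU summarizes the running context of the current environment.
        self.encoder = nn.GRU(x_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden + x_dim, 1)

    def forward(self, context_x, query_x):
        # context_x: (batch, t, x_dim), unlabeled examples seen so far
        # query_x:   (batch, x_dim),    current input to label
        _, h = self.encoder(context_x)                        # environment summary
        return self.head(torch.cat([h[-1], query_x], dim=-1))

model = ICRMSketch(x_dim=5)
ctx = torch.randn(8, 10, 5)              # 10 in-context inputs per example
y_hat = model(ctx, torch.randn(8, 5))    # (8, 1); trained with ERM over sequences
```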

* 41 pages, 4 figures 

Identifiability of Discretized Latent Coordinate Systems via Density Landmarks Detection

Jun 28, 2023
Vitória Barin-Pacela, Kartik Ahuja, Simon Lacoste-Julien, Pascal Vincent

Disentanglement aims to recover meaningful latent ground-truth factors from only the observed distribution. Identifiability provides the theoretical grounding for disentanglement to be well-founded. Unfortunately, unsupervised identifiability of independent latent factors is a theoretically proven impossibility in the i.i.d. setting under a general nonlinear smooth map from factors to observations. In this work, we show that, remarkably, it is possible to recover discretized latent coordinates under a highly generic nonlinear smooth mapping (a diffeomorphism) without any additional inductive bias on the mapping. This holds assuming that the latent density has axis-aligned discontinuity landmarks, but without the unrealistic assumption of statistical independence of the factors. We introduce this novel form of identifiability, termed quantized coordinate identifiability, and provide a comprehensive proof of the recovery of discretized coordinates.
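
To make the landmark intuition concrete, here is a toy sketch (not the paper's procedure): when the latent density has an axis-aligned discontinuity, a large jump in a one-dimensional histogram along that coordinate marks the landmark. The bin count and the jump threshold are illustrative assumptions.

```python
# Toy landmark detection: find bins where the estimated 1-D density jumps.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
# Latent with a density discontinuity at z = 0: mass 0.2 on [-1, 0) and
# mass 0.8 on [0, 1].
z = np.where(rng.random(n) < 0.2,
             rng.uniform(-1, 0, n),
             rng.uniform(0, 1, n))

hist, edges = np.histogram(z, bins=100, density=True)
jumps = np.abs(np.diff(hist))
landmarks = edges[1:-1][jumps > 5 * jumps.mean()]  # crude jump detector
print(landmarks)  # clusters near the true discontinuity at 0.0
```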

A Closer Look at In-Context Learning under Distribution Shifts

May 26, 2023
Kartik Ahuja, David Lopez-Paz

In-context learning, a capability that enables a model to learn from input examples on the fly without requiring weight updates, is a defining characteristic of large language models. In this work, we follow the setting proposed by Garg et al. (2022) to better understand the generality and limitations of in-context learning through the lens of the simple yet fundamental task of linear regression. The key question we aim to address is: are transformers more adept than some natural and simpler architectures at performing in-context learning under varying distribution shifts? As a point of comparison with transformers, we use a simple architecture based on set-based Multi-Layer Perceptrons (MLPs). We find that both transformers and set-based MLPs exhibit in-context learning under in-distribution evaluations, but transformers more closely emulate the performance of ordinary least squares (OLS). Transformers also display better resilience to mild distribution shifts, where set-based MLPs falter. However, under severe distribution shifts, both models' in-context learning abilities diminish.
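
For concreteness, the sketch below reproduces the basic structure of this linear-regression setting together with the OLS reference predictor. The dimensions, context length, and noiseless labels are illustrative assumptions, not the paper's exact configuration.

```python
# One in-context linear-regression "prompt": k labeled pairs plus a query.
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 20                       # input dimension, context length
w = rng.normal(size=d)             # fresh task vector for this prompt
X = rng.normal(size=(k, d))        # in-context inputs
y = X @ w                          # noiseless in-context labels
x_q = rng.normal(size=d)           # query input

# OLS on the in-context pairs is the reference in-context learner.
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(abs(x_q @ w_ols - x_q @ w))  # ~0: OLS recovers w when k >= d
```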

Reusable Slotwise Mechanisms

Feb 21, 2023
Trang Nguyen, Amin Mansouri, Kanika Madan, Khuong Nguyen, Kartik Ahuja, Dianbo Liu, Yoshua Bengio

Agents that can understand and reason over the dynamics of objects are better equipped to act robustly and generalize to novel scenarios. Such an ability, however, requires a suitable representation of the scene as well as an understanding of the mechanisms that govern the interactions of different subsets of objects. To address this problem, we propose RSM, or Reusable Slotwise Mechanisms, which jointly learns a slotwise representation of the scene and a modular architecture that dynamically chooses one mechanism among a set of reusable mechanisms to predict the next state of each slot. RSM crucially takes advantage of Central Contextual Information (CCI), which lets each selected reusable mechanism access the rest of the slots through a bottleneck, effectively allowing for the modeling of higher-order and complex interactions that might involve only a sparse subset of objects. We show how this model outperforms state-of-the-art methods in a variety of next-step prediction tasks ranging from grid-world environments to Atari 2600 games. In particular, we challenge methods that model the dynamics with Graph Neural Networks (GNNs) on top of slotwise representations, as well as modular architectures that restrict interactions to be only pairwise. Finally, we show that RSM is able to generalize to scenes with objects varying in number and shape, highlighting its out-of-distribution generalization capabilities. Our implementation is available online at github.com/trangnnp/RSM.
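
The mechanism-selection step can be sketched compactly: a gating network scores the reusable mechanisms for each slot, and the chosen mechanism reads its slot together with a bottlenecked summary of the scene standing in for the CCI. The PyTorch sketch below is illustrative only, not the released implementation; the sizes, the soft (rather than hard) selection, and the mean-pooled bottleneck are assumptions.

```python
# Illustrative slotwise next-state predictor with reusable mechanisms.
import torch
import torch.nn as nn

class RSMSketch(nn.Module):
    def __init__(self, n_slots=4, slot_dim=16, n_mech=3, cci_dim=8):
        super().__init__()
        self.cci = nn.Linear(slot_dim, cci_dim)   # bottleneck to the CCI
        self.gate = nn.Linear(slot_dim, n_mech)   # scores the mechanisms
        self.mechs = nn.ModuleList(
            nn.Sequential(nn.Linear(slot_dim + cci_dim, 64),
                          nn.ReLU(), nn.Linear(64, slot_dim))
            for _ in range(n_mech))

    def forward(self, slots):                                # slots: (B, S, D)
        summary = self.cci(slots).mean(dim=1, keepdim=True)  # pooled context
        inp = torch.cat([slots, summary.expand(-1, slots.size(1), -1)], dim=-1)
        weights = torch.softmax(self.gate(slots), dim=-1)    # soft selection
        outs = torch.stack([m(inp) for m in self.mechs], dim=-1)
        return (outs * weights.unsqueeze(-2)).sum(dim=-1)    # next slot states

next_slots = RSMSketch()(torch.randn(2, 4, 16))  # (2, 4, 16)
```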

Recycling diverse models for out-of-distribution generalization

Dec 20, 2022
Alexandre Ramé, Kartik Ahuja, Jianyu Zhang, Matthieu Cord, Léon Bottou, David Lopez-Paz

Foundation models are redefining how AI systems are built. Practitioners now follow a standard procedure to build their machine learning solutions: download a copy of a foundation model, and fine-tune it using some in-house data about the target task of interest. Consequently, the Internet is awash with fine-tunings of a handful of foundation models, covering many diverse tasks. Yet, these individual fine-tunings often lack strong generalization and exist in isolation without benefiting from each other. In our opinion, this is a missed opportunity, as these specialized models contain diverse features. Based on this insight, we propose model recycling, a simple strategy that leverages multiple fine-tunings of the same foundation model on diverse auxiliary tasks, and repurposes them as rich and diverse initializations for the target task. Specifically, model recycling fine-tunes each specialized model on the target task in parallel, and then averages the weights of all target fine-tunings into a final model. Empirically, we show that model recycling maximizes model diversity by benefiting from diverse auxiliary tasks, and achieves a new state of the art on the reference DomainBed benchmark for out-of-distribution generalization. Looking forward, model recycling is a contribution to the emerging paradigm of updatable machine learning where, akin to open-source software development, the community collaborates to incrementally and reliably update machine learning models.
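
The averaging step is simple to sketch. Below is a minimal, hedged version of uniform weight averaging over architecturally identical fine-tunings; fine_tune_on_target and auxiliary_models are hypothetical stand-ins for an actual training loop and model pool.

```python
# Uniformly average the weights of architecturally identical models.
import copy
import torch

def average_weights(models):
    """Average the state dicts of models sharing one architecture.
    Assumes all entries are floating-point tensors."""
    avg = copy.deepcopy(models[0].state_dict())
    for key in avg:
        avg[key] = torch.stack(
            [m.state_dict()[key].float() for m in models]).mean(dim=0)
    return avg

# Hypothetical usage, following the recipe in the abstract:
# recycled = [fine_tune_on_target(m) for m in auxiliary_models]
# final = copy.deepcopy(recycled[0])
# final.load_state_dict(average_weights(recycled))
```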

* 24 pages, 7 tables, 13 figures 

Empirical Study on Optimizer Selection for Out-of-Distribution Generalization

Nov 18, 2022
Hiroki Naganuma, Kartik Ahuja, Shiro Takagi, Tetsuya Motokawa, Rio Yokota, Kohta Ishikawa, Ikuro Sato, Ioannis Mitliagkas

Modern deep learning systems are fragile and do not generalize well under distribution shifts. While much promising work has been accomplished to address these concerns, a systematic study of the role of optimizers and their out-of-distribution generalization performance has not been undertaken. In this study, we examine the performance of popular first-order optimizers for different classes of distribution shift under empirical risk minimization and invariant risk minimization. We address image and text classification settings, using DomainBed, WILDS, and the Backgrounds Challenge as out-of-distribution benchmarks for this exhaustive study. We search over a wide range of hyperparameters and examine the classification accuracy (in-distribution and out-of-distribution) of over 20,000 models. We arrive at the following findings: i) contrary to conventional wisdom, adaptive optimizers (e.g., Adam) perform worse than non-adaptive optimizers (e.g., SGD, momentum-based SGD); ii) in-distribution and out-of-distribution performance exhibit three types of behavior depending on the dataset: linear returns, increasing returns, and diminishing returns. We believe these findings can help practitioners choose the right optimizer and know what behavior to expect.
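
The protocol is essentially a large sweep. The sketch below illustrates its shape only, not the authors' pipeline: iterate over optimizers and learning rates, and record in-distribution (ID) and out-of-distribution (OOD) accuracy per configuration; train_and_eval here is a hypothetical stand-in for a full DomainBed- or WILDS-style run.

```python
# Skeleton of an optimizer sweep over learning rates.
import itertools
import torch

def train_and_eval(make_opt, lr):
    model = torch.nn.Linear(10, 2)            # placeholder model
    opt = make_opt(model.parameters(), lr)    # a real run would train here
    return {"id_acc": None, "ood_acc": None}  # filled in by a real run

optimizers = {
    "sgd": lambda p, lr: torch.optim.SGD(p, lr=lr),
    "momentum_sgd": lambda p, lr: torch.optim.SGD(p, lr=lr, momentum=0.9),
    "adam": lambda p, lr: torch.optim.Adam(p, lr=lr),
}
results = {(name, lr): train_and_eval(make_opt, lr)
           for (name, make_opt), lr in itertools.product(
               optimizers.items(), [1e-4, 1e-3, 1e-2])}
```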

* NeurIPS 2022 Workshop on Distribution Shifts (DistShift) 

FL Games: A Federated Learning Framework for Distribution Shifts

Oct 31, 2022
Sharut Gupta, Kartik Ahuja, Mohammad Havaei, Niladri Chatterjee, Yoshua Bengio

Federated learning aims to train predictive models for data that is distributed across clients, under the orchestration of a server. However, participating clients typically each hold data from a different distribution, which can lead to catastrophic generalization failures on data from a different client, which represents a new domain. In this work, we argue that in order to generalize better across non-i.i.d. clients, it is imperative to learn only correlations that are stable and invariant across domains. We propose FL GAMES, a game-theoretic framework for federated learning that learns causal features that are invariant across clients. While training to reach the Nash equilibrium, the traditional best-response strategy suffers from high-frequency oscillations. We demonstrate that FL GAMES effectively resolves this challenge and exhibits smooth performance curves. Further, FL GAMES scales well with the number of clients, requires significantly fewer communication rounds, and is agnostic to device heterogeneity. Through empirical evaluation, we demonstrate that FL GAMES achieves high out-of-distribution performance on various benchmarks.
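
The best-response dynamic at the heart of this view is easy to sketch with linear models. The toy below is a stylized illustration, not the FL GAMES algorithm: each client alternately refits its predictor against the residual left by the others, the kind of alternation that can oscillate in richer models; all data and sizes are illustrative assumptions.

```python
# Toy best-response play between two clients with linear predictors.
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([1.0, 0.0, 0.0])                 # shared invariant signal
Xs = [rng.normal(size=(50, 3)) for _ in range(2)]
# Each client adds its own spurious component on top of the shared signal.
ys = [X @ (w_true + rng.normal(scale=0.5, size=3)) for X in Xs]

w = [np.zeros(3) for _ in Xs]
for _ in range(10):                                # best-response rounds
    for i, (X, y) in enumerate(zip(Xs, ys)):
        others = sum(w[j] for j in range(len(w)) if j != i)
        w[i], *_ = np.linalg.lstsq(X, y - X @ others, rcond=None)
print(sum(w))                                      # joint predictor after play
```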

* Accepted as ORAL at NeurIPS Workshop on Federated Learning: Recent Advances and New Challenges. arXiv admin note: text overlap with arXiv:2205.11101 

Interventional Causal Representation Learning

Oct 09, 2022
Kartik Ahuja, Yixin Wang, Divyat Mahajan, Yoshua Bengio

The theory of identifiable representation learning aims to build general-purpose methods that extract high-level latent (causal) factors from low-level sensory data. Most existing works focus on identifiable representation learning with observational data, relying on distributional assumptions on latent (causal) factors. However, in practice, we often also have access to interventional data for representation learning. How can we leverage interventional data to help identify high-level latents? To this end, we explore the role of interventional data in identifiable representation learning. We study the identifiability of latent causal factors with and without interventional data, under minimal distributional assumptions on the latents. We prove that, if the true latent variables map to the observed high-dimensional data via a polynomial function, then representation learning via minimizing the standard reconstruction loss of autoencoders identifies the true latents up to affine transformation. If we further have access to interventional data generated by hard $do$ interventions on some of the latents, then we can identify these intervened latents up to permutation, shift, and scaling.
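
The affine-identification guarantee suggests a standard empirical check, sketched below: regress the true latents on the learned ones (plus an intercept) and verify a near-perfect linear fit. This is a common diagnostic, not the paper's training procedure; here the "learned" latents are simulated as a known affine map of the truth.

```python
# Check identification up to affine transformation via a linear fit.
import numpy as np

rng = np.random.default_rng(0)
z_true = rng.normal(size=(1000, 3))
A = rng.normal(size=(3, 3))
z_learned = z_true @ A + 1.0        # simulate an affine transform of the truth

Z = np.hstack([z_learned, np.ones((1000, 1))])    # add an intercept column
coef, *_ = np.linalg.lstsq(Z, z_true, rcond=None)
z_hat = Z @ coef
r2 = 1 - ((z_true - z_hat) ** 2).sum() / ((z_true - z_true.mean(0)) ** 2).sum()
print(f"R^2 = {r2:.6f}")            # ~1.0 when affine identification holds
```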

Weakly Supervised Representation Learning with Sparse Perturbations

Jun 02, 2022
Kartik Ahuja, Jason Hartford, Yoshua Bengio

The theory of representation learning aims to build methods that provably invert the data generating process with minimal domain knowledge or any source of supervision. Most prior approaches require strong distributional assumptions on the latent variables and weak supervision (auxiliary information such as timestamps) to provide provable identification guarantees. In this work, we show that if one has weak supervision from observations generated by sparse perturbations of the latent variables (e.g., images in a reinforcement learning environment where actions move individual sprites), identification is achievable under unknown continuous latent distributions. We show that if the perturbations are applied only on mutually exclusive blocks of latents, we identify the latents up to those blocks. We also show that if these perturbation blocks overlap, we identify latents up to the smallest blocks shared across perturbations. Consequently, if there are blocks that intersect in one latent variable only, then such latents are identified up to permutation and scaling. We propose a natural estimation procedure based on this theory and illustrate it on low-dimensional synthetic and image-based experiments.
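
One natural reading of the estimation procedure is an autoencoder trained on perturbation pairs with a sparsity penalty on the latent difference. The PyTorch sketch below is a minimal illustration under that reading, not the authors' exact objective; the sizes, penalty weight, and synthetic pairs are assumptions.

```python
# Autoencoder on (x, x') pairs with an L1 penalty on the latent change.
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 4))
dec = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

def step(x, x_pert):
    z, z_pert = enc(x), enc(x_pert)
    recon = ((dec(z) - x) ** 2).mean() + ((dec(z_pert) - x_pert) ** 2).mean()
    sparsity = (z - z_pert).abs().mean()   # encourage a sparse latent change
    loss = recon + 0.1 * sparsity
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

x = torch.randn(32, 10)
x_pert = x + 0.1 * torch.randn(32, 10)     # stand-in for real perturbation pairs
print(step(x, x_pert))
```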
