Zeynep Akata

Meta-in-context learning in large language models

May 22, 2023

Julian Coda-Forno, Marcel Binz, Zeynep Akata, Matthew Botvinick, Jane X. Wang, Eric Schulz

Abstract: Large language models have shown tremendous performance in a variety of tasks. In-context learning -- the ability to improve at a task after being provided with a number of demonstrations -- is seen as one of the main contributors to their success. In the present paper, we demonstrate that the in-context learning abilities of large language models can be recursively improved via in-context learning itself. We coin this phenomenon meta-in-context learning. Looking at two idealized domains, a one-dimensional regression task and a two-armed bandit task, we show that meta-in-context learning adaptively reshapes a large language model's priors over expected tasks. Furthermore, we find that meta-in-context learning modifies the in-context learning strategies of such models. Finally, we extend our approach to a benchmark of real-world regression problems where we observe performance competitive with traditional learning algorithms. Taken together, our work improves our understanding of in-context learning and paves the way toward adapting large language models to the environment they are applied in purely through meta-in-context learning rather than traditional finetuning.
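
The idea of presenting several tasks in sequence within one context can be illustrated with a small prompt builder for the one-dimensional regression setting. This is only a sketch: the function name `build_meta_prompt`, the "Task N:" headers, and the `x=..., y=...` line format are assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Tuple

Example = Tuple[float, float]  # one (x, y) observation

def build_meta_prompt(
    previous_tasks: List[List[Example]],
    current_examples: List[Example],
    query_x: float,
) -> str:
    """Concatenate earlier tasks, then the current task, then an open query.

    The prompt layout is hypothetical; it only shows how several tasks
    can share a single context window.
    """
    lines: List[str] = []
    # Earlier tasks come first so the model can pick up on what a
    # typical task in this environment looks like.
    for idx, task in enumerate(previous_tasks, start=1):
        lines.append(f"Task {idx}:")
        lines.extend(f"x={x:.2f}, y={y:.2f}" for x, y in task)
        lines.append("")
    # The current task: a few demonstrations followed by the query.
    lines.append(f"Task {len(previous_tasks) + 1}:")
    lines.extend(f"x={x:.2f}, y={y:.2f}" for x, y in current_examples)
    lines.append(f"x={query_x:.2f}, y=")
    return "\n".join(lines)
```

The returned string would be passed to a language model, whose completion is read as the prediction for `query_x`.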

If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection

May 22, 2023

Shyamgopal Karthik, Karsten Roth, Massimiliano Mancini, Zeynep Akata

Abstract: Despite their impressive capabilities, diffusion-based text-to-image (T2I) models can lack faithfulness to the text prompt, where generated images may not contain all the mentioned objects, attributes or relations. To alleviate these issues, recent works proposed post-hoc methods to improve model faithfulness without costly retraining, by modifying how the model utilizes the input prompt. In this work, we take a step back and show that large T2I diffusion models are more faithful than usually assumed, and can generate images faithful to even complex prompts without the need to manipulate the generative process. Based on that, we show how faithfulness can be simply treated as a candidate selection problem instead, and introduce a straightforward pipeline that generates candidate images for a text prompt and picks the best one according to an automatic scoring system that can leverage already existing T2I evaluation metrics. Quantitative comparisons alongside user studies on diverse benchmarks show consistently improved faithfulness over post-hoc enhancement methods, with comparable or lower computational cost. Code is available at https://github.com/ExplainableML/ImageSelect.
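
The candidate selection idea described above can be sketched in a few lines. This is a minimal illustration, not the ImageSelect code: `select_best`, `toy_generate`, and `toy_score` are hypothetical names, and the toy generator and scorer work on strings so the example runs without a diffusion model. In practice `generate` would call a T2I model with a given seed and `score` would be an automatic faithfulness metric.

```python
import random
from typing import Callable, Tuple, TypeVar

C = TypeVar("C")  # candidate type (an image in practice, a string here)

def select_best(
    prompt: str,
    generate: Callable[[str, int], C],
    score: Callable[[str, C], float],
    num_candidates: int = 4,
) -> Tuple[C, float]:
    """Generate one candidate per seed and return the highest-scoring one."""
    best_candidate, best_score = None, float("-inf")
    for seed in range(num_candidates):
        candidate = generate(prompt, seed)
        candidate_score = score(prompt, candidate)
        # Strict comparison keeps the earliest candidate on ties.
        if candidate_score > best_score:
            best_candidate, best_score = candidate, candidate_score
    return best_candidate, best_score

def toy_generate(prompt: str, seed: int) -> str:
    """Stand-in generator: randomly drops words from the prompt."""
    rng = random.Random(seed)
    return " ".join(w for w in prompt.split() if rng.random() < 0.7)

def toy_score(prompt: str, candidate: str) -> float:
    """Stand-in metric: fraction of prompt words present in the candidate."""
    words = prompt.split()
    present = set(candidate.split())
    return sum(w in present for w in words) / len(words)

if __name__ == "__main__":
    best, best_score = select_best(
        "a red cube on a blue sphere", toy_generate, toy_score, num_candidates=8
    )
    print(best, best_score)
```

The cost of this scheme grows linearly with `num_candidates`, since each candidate requires one generation and one scoring call.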

Inducing anxiety in large language models increases exploration and bias

Apr 21, 2023

Julian Coda-Forno, Kristin Witte, Akshay K. Jagadish, Marcel Binz, Zeynep Akata, Eric Schulz

Abstract: Large language models are transforming research on machine learning while galvanizing public debates. Understanding not only when these models work well and succeed but also why they fail and misbehave is of great societal relevance. We propose to turn the lens of computational psychiatry, a framework used to computationally describe and modify aberrant behavior, to the outputs produced by these models. We focus on the Generative Pre-Trained Transformer 3.5 and subject it to tasks commonly studied in psychiatry. Our results show that GPT-3.5 responds robustly to a common anxiety questionnaire, producing higher anxiety scores than human subjects. Moreover, GPT-3.5's responses can be predictably changed by using emotion-inducing prompts. Emotion-induction not only influences GPT-3.5's behavior in a cognitive task measuring exploratory decision-making but also influences its behavior in a previously established task measuring biases such as racism and ableism. Crucially, GPT-3.5 shows a strong increase in biases when prompted with anxiety-inducing text. Thus, it is likely that how prompts are communicated to large language models has a strong influence on their behavior in applied settings. These results advance our understanding of prompt engineering and demonstrate the usefulness of methods taken from computational psychiatry for studying the capable algorithms to which we increasingly delegate authority and autonomy.

Exposing and Mitigating Spurious Correlations for Cross-Modal Retrieval

Apr 06, 2023

Jae Myung Kim, A. Sophia Koepke, Cordelia Schmid, Zeynep Akata

Abstract: Cross-modal retrieval methods are the preferred tool to search databases for the text that best matches a query image and vice versa. However, image-text retrieval models commonly learn to memorize spurious correlations in the training data, such as frequent object co-occurrence, instead of looking at the actual underlying reasons for the prediction in the image. For image-text retrieval, this manifests in retrieved sentences that mention objects that are not present in the query image. In this work, we introduce ODmAP@k, an object decorrelation metric that measures a model's robustness to spurious correlations in the training data. We use automatic image and text manipulations to control the presence of such object correlations in designated test data. Additionally, our data synthesis technique is used to tackle model biases due to spurious correlations of semantically unrelated objects in the training data. We apply our proposed pipeline, which involves the finetuning of image-text retrieval frameworks on carefully designed synthetic data, to three state-of-the-art models for image-text retrieval. This results in significant improvements for all three models, both in terms of the standard retrieval performance and in terms of our object decorrelation metric. The code is available at https://github.com/ExplainableML/Spurious_CM_Retrieval.

* CVPR'23 MULA Workshop

Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification

Apr 04, 2023

Youngwook Kim, Jae Myung Kim, Jieun Jeong, Cordelia Schmid, Zeynep Akata, Jungwoo Lee

Abstract: Due to the high cost of collecting labels in multi-label classification datasets, partially annotated multi-label classification has become an emerging field in computer vision. One baseline approach to this task is to treat unobserved labels as negative, but this assumption induces label noise in the form of false negatives. To understand the negative impact caused by false negative labels, we study how these labels affect the model's explanation. We observe that the explanations of two models, one trained with full labels and one with partial labels, highlight similar regions but at different scales, with the latter tending to have lower attribution scores. Based on these findings, we propose to boost the attribution scores of the model trained with partial labels to make its explanation resemble that of the model trained with full labels. Even with this conceptually simple approach, the multi-label classification performance improves by a large margin on three different datasets in a single-positive-label setting and on one in a large-scale partial-label setting. Code is available at https://github.com/youngwk/BridgeGapExplanationPAMC.

* CVPR2023 Camera-ready

Posterior Annealing: Fast Calibrated Uncertainty for Regression

Feb 21, 2023

Uddeshya Upadhyay, Jae Myung Kim, Cordelia Schmidt, Bernhard Schölkopf, Zeynep Akata

Abstract: Bayesian deep learning approaches that allow uncertainty estimation for regression problems often converge slowly and yield poorly calibrated uncertainty estimates that cannot be effectively used for quantification. Recently proposed post hoc calibration techniques are seldom applicable to regression problems and often add overhead to an already slow model training phase. This work presents a fast calibrated uncertainty estimation method for regression tasks, called posterior annealing, that consistently improves the convergence of deep regression models and yields calibrated uncertainty without any post hoc calibration phase. Unlike previous methods for calibrated uncertainty in regression that focus only on low-dimensional regression problems, our method works well on a wide spectrum of regression problems. Our empirical analysis shows that our approach is generalizable to various network architectures, including multilayer perceptrons, 1D/2D convolutional networks, and graph neural networks, on five vastly diverse tasks, i.e., chaotic particle trajectory denoising, physical property prediction of molecules using 3D atomistic representation, natural image super-resolution, and medical image translation using MRI images.

* 11 pages, 6 figures, 2 tables

Urban Scene Semantic Segmentation with Low-Cost Coarse Annotation

Dec 15, 2022

Anurag Das, Yongqin Xian, Yang He, Zeynep Akata, Bernt Schiele

Abstract: For best performance, today's semantic segmentation methods use large and carefully labeled datasets, requiring expensive annotation budgets. In this work, we show that coarse annotation is a low-cost but highly effective alternative for training semantic segmentation models. Considering the urban scene segmentation scenario, we leverage cheap coarse annotations for real-world captured data, as well as synthetic data, to train our model and show competitive performance compared with finely annotated real-world data. Specifically, we propose a coarse-to-fine self-training framework that generates pseudo labels for unlabeled regions of the coarsely annotated data, using synthetic data to improve predictions around the boundaries between semantic classes, and using cross-domain data augmentation to increase diversity. Our extensive experimental results on the Cityscapes and BDD100k datasets demonstrate that our method achieves a significantly better tradeoff between performance and annotation cost, yielding performance comparable to fully annotated data with only a small fraction of the annotation budget. Also, when used as pretraining, our framework performs better than the standard fully supervised setting.

* Accepted at WACV 2023

Distilling Knowledge from Self-Supervised Teacher by Embedding Graph Alignment

Nov 23, 2022

Yuchen Ma, Yanbei Chen, Zeynep Akata

Abstract: Recent advances have indicated the strengths of self-supervised pre-training for improving representation learning on downstream tasks. Existing works often utilize self-supervised pre-trained models by fine-tuning on downstream tasks. However, fine-tuning does not generalize to the case when one needs to build a customized model architecture different from the self-supervised model. In this work, we formulate a new knowledge distillation framework to transfer the knowledge from self-supervised pre-trained models to any other student network by a novel approach named Embedding Graph Alignment. Specifically, inspired by the spirit of instance discrimination in self-supervised learning, we model the instance-instance relations by a graph formulation in the feature embedding space and distill the self-supervised teacher knowledge to a student network by aligning the teacher graph and the student graph. Our distillation scheme can be flexibly applied to transfer the self-supervised knowledge to enhance representation learning on various student networks. We demonstrate that our model outperforms multiple representative knowledge distillation methods on three benchmark datasets, including CIFAR100, STL10, and TinyImageNet. Code is available at https://github.com/yccm/EGA.
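
One way to picture aligning a teacher graph with a student graph is to build a pairwise similarity matrix over a batch for each network and penalize their difference. The sketch below assumes cosine similarity as the edge weight and a mean squared difference as the alignment loss; both choices, and the names `cosine_graph` and `graph_alignment_loss`, are illustrative assumptions rather than the formulation used in the paper.

```python
import math
from typing import List

Vector = List[float]

def cosine_graph(embeddings: List[Vector]) -> List[List[float]]:
    """Return the batch-by-batch matrix of pairwise cosine similarities."""
    normed = []
    for v in embeddings:
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
        normed.append([x / norm for x in v])
    return [[sum(a * b for a, b in zip(u, w)) for w in normed] for u in normed]

def graph_alignment_loss(teacher_emb: List[Vector], student_emb: List[Vector]) -> float:
    """Mean squared difference between the teacher and student graphs.

    Both graphs have shape batch x batch, so teacher and student
    embeddings may have different dimensionalities.
    """
    teacher_graph = cosine_graph(teacher_emb)
    student_graph = cosine_graph(student_emb)
    n = len(teacher_graph)
    total = sum(
        (teacher_graph[i][j] - student_graph[i][j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    return total / (n * n)
```

Because only instance-to-instance relations are compared, this kind of loss places no constraint on the student architecture or its embedding size.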

* British Machine Vision Conference (BMVC 2022)

Momentum-based Weight Interpolation of Strong Zero-Shot Models for Continual Learning

Nov 06, 2022

Zafir Stojanovski, Karsten Roth, Zeynep Akata

Abstract: Large pre-trained, zero-shot capable models have shown considerable success both for standard transfer and adaptation tasks, with particular robustness towards distribution shifts. In addition, subsequent fine-tuning can considerably improve performance on a selected downstream task. However, through naive fine-tuning, these zero-shot models lose their generalizability and robustness towards distribution shifts. This is a particular problem for tasks such as Continual Learning (CL), where continuous adaptation has to be performed as new task distributions are introduced sequentially. In this work, we showcase that where fine-tuning falls short in adapting such zero-shot capable models, simple momentum-based weight interpolation can provide consistent improvements for CL tasks in both memory-free and memory-based settings. In particular, we find improvements of over +4% on standard CL benchmarks, while in some cases more than halving the error gap to the upper limit set by jointly training on all tasks at once, allowing the continual learner to inch closer to the joint training limits.
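
Momentum-based weight interpolation can be sketched as keeping a slowly moving copy of the weights next to the fine-tuned ones. The update below uses an exponential-moving-average form, slow = m * slow + (1 - m) * fast; this form, the function name `momentum_interpolate`, and the default momentum value are assumptions for illustration, since the abstract does not state the exact update rule or schedule.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # parameter name -> flat list of values

def momentum_interpolate(slow: Weights, fast: Weights, momentum: float = 0.99) -> Weights:
    """Move the slow weights a small step toward the fine-tuned (fast) weights.

    Assumed update: slow <- momentum * slow + (1 - momentum) * fast.
    A momentum close to 1 keeps the slow weights near their starting
    point, for example the zero-shot initialization.
    """
    if not 0.0 <= momentum <= 1.0:
        raise ValueError("momentum must lie in [0, 1]")
    return {
        name: [momentum * s + (1.0 - momentum) * f for s, f in zip(slow[name], fast[name])]
        for name in slow
    }

if __name__ == "__main__":
    slow = {"w": [0.0, 1.0]}  # e.g. weights of the pre-trained model
    fast = {"w": [1.0, 1.0]}  # e.g. weights after a fine-tuning step
    for _ in range(3):
        slow = momentum_interpolate(slow, fast, momentum=0.9)
    print(slow)
```

In a training loop, the update would typically be applied after fine-tuning steps, with the slow weights used for evaluation.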

* First Workshop on Interpolation Regularizers and Beyond, NeurIPS 2022 (Spotlight) and Workshop on Distribution Shifts, NeurIPS 2022

PlanT: Explainable Planning Transformers via Object-Level Representations

Oct 25, 2022

Katrin Renz, Kashyap Chitta, Otniel-Bogdan Mercea, A. Sophia Koepke, Zeynep Akata, Andreas Geiger

Abstract: Planning an optimal route in a complex environment requires efficient reasoning about the surrounding scene. While human drivers prioritize important objects and ignore details not relevant to the decision, learning-based planners typically extract features from dense, high-dimensional grid representations containing all vehicle and road context information. In this paper, we propose PlanT, a novel approach for planning in the context of self-driving that uses a standard transformer architecture. PlanT is based on imitation learning with a compact object-level input representation. On the Longest6 benchmark for CARLA, PlanT outperforms all prior methods (matching the driving score of the expert) while being 5.3x faster than equivalent pixel-based planning baselines during inference. Combining PlanT with an off-the-shelf perception module provides a sensor-based driving system that is more than 10 points better in terms of driving score than the existing state of the art. Furthermore, we propose an evaluation protocol to quantify the ability of planners to identify relevant objects, providing insights regarding their decision-making. Our results indicate that PlanT can focus on the most relevant object in the scene, even when this object is geometrically distant.

* CoRL 2022. Project Page: https://www.katrinrenz.de/plant/
