Katerina Margatina

Active Learning Principles for In-Context Learning with Large Language Models

May 23, 2023
Katerina Margatina, Timo Schick, Nikolaos Aletras, Jane Dwivedi-Yu

The remarkable advancements in large language models (LLMs) have significantly enhanced performance in few-shot learning settings. By using only a small number of labeled examples, referred to as demonstrations, LLMs can effectively grasp the task at hand through in-context learning. However, the process of selecting appropriate demonstrations has received limited attention in prior work. This paper addresses the issue of identifying the most informative demonstrations for few-shot learning by approaching it as a pool-based Active Learning (AL) problem over a single iteration. Our objective is to investigate how AL algorithms can serve as effective demonstration selection methods for in-context learning. We compare standard AL algorithms based on uncertainty, diversity, and similarity, and consistently observe that similarity-based selection outperforms all other methods, including random sampling. Notably, uncertainty sampling, despite its success in conventional supervised learning scenarios, performs poorly in this context. Our extensive experiments with a diverse range of GPT and OPT models across 24 classification and multi-choice tasks, coupled with thorough analysis, unambiguously demonstrate that in-context example selection through AL prioritizes high-quality examples that exhibit low uncertainty and bear similarity to the test examples.
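
The similarity-based selection that the abstract finds strongest can be sketched as nearest-neighbour retrieval over example embeddings. This is a minimal illustration, not the paper's implementation: the embeddings are assumed to be precomputed by some sentence encoder, and the function name is hypothetical.

```python
import numpy as np

def select_demonstrations(test_emb, pool_embs, k=4):
    """Return indices of the k labeled pool examples whose embeddings
    are most cosine-similar to the test example's embedding."""
    test_n = test_emb / np.linalg.norm(test_emb)
    pool_n = pool_embs / np.linalg.norm(pool_embs, axis=1, keepdims=True)
    sims = pool_n @ test_n                  # cosine similarity per pool item
    return np.argsort(-sims)[:k].tolist()   # most similar first
```

The selected indices would then be formatted as in-context demonstrations and prepended to the test input in the prompt.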

On the Limitations of Simulating Active Learning

May 21, 2023
Katerina Margatina, Nikolaos Aletras

Active learning (AL) is a human-and-model-in-the-loop paradigm that iteratively selects informative unlabeled data for human annotation, aiming to improve over random sampling. However, performing AL experiments with human annotations on-the-fly is a laborious and expensive process, and thus unrealistic for academic research. An easy fix to this impediment is to simulate AL, by treating an already labeled and publicly available dataset as the pool of unlabeled data. In this position paper, we first survey recent literature and highlight the challenges across all the different steps within the AL loop. We further unveil neglected caveats in the experimental setup that can significantly affect the quality of AL research. We continue with an exploration of how the simulation setting can govern empirical findings, arguing that it might be one of the answers to the ever-posed question "why do active learning algorithms sometimes fail to outperform random sampling?". We argue that evaluating AL algorithms on available labeled datasets might provide a lower bound on their effectiveness with real data. We believe it is essential to collectively shape the best practices for AL research, particularly as engineering advancements in LLMs push the research focus towards data-driven approaches (e.g., data efficiency, alignment, fairness). In light of this, we have developed guidelines for future work. Our aim is to draw attention to these limitations within the community, in the hope of finding ways to address them.

* To appear at Findings of ACL 2023 

Dynamic Benchmarking of Masked Language Models on Temporal Concept Drift with Multiple Views

Feb 23, 2023
Katerina Margatina, Shuai Wang, Yogarshi Vyas, Neha Anna John, Yassine Benajiba, Miguel Ballesteros

Temporal concept drift refers to the problem of data changing over time. In NLP, this entails that both language (e.g. new expressions, meaning shifts) and factual knowledge (e.g. new concepts, updated facts) evolve over time. Focusing on the latter, we benchmark 11 pretrained masked language models (MLMs) on a series of tests designed to evaluate the effect of temporal concept drift, as it is crucial that widely used language models remain up-to-date with the ever-evolving factual updates of the real world. Specifically, we provide a holistic framework that (1) dynamically creates temporal test sets at any time granularity (e.g. month, quarter, year) from factual data in Wikidata, (2) constructs fine-grained splits of tests (e.g. updated, new, unchanged facts) to ensure comprehensive analysis, and (3) evaluates MLMs in three distinct ways (single-token probing, multi-token generation, MLM scoring). In contrast to prior work, our framework aims to unveil how robust an MLM is over time, and thus to provide a signal in case it has become outdated, by leveraging multiple views of evaluation.

* To appear at EACL 2023. Our code will be available at https://github.com/amazon-science/temporal-robustness 

Investigating Multi-source Active Learning for Natural Language Inference

Feb 14, 2023
Ard Snijders, Douwe Kiela, Katerina Margatina

In recent years, active learning has been successfully applied to an array of NLP tasks. However, prior work often assumes that training and test data are drawn from the same distribution. This is problematic, as in real-life settings data may stem from several sources of varying relevance and quality. We show that four popular active learning schemes fail to outperform random selection when applied to unlabelled pools composed of multiple data sources on the task of natural language inference. We reveal that uncertainty-based strategies perform poorly due to the acquisition of collective outliers, i.e., hard-to-learn instances that hamper learning and generalization. When outliers are removed, strategies are found to recover and outperform random baselines. In further analysis, we find that collective outliers vary in form between sources, and show that hard-to-learn data is not always categorically harmful. Lastly, we leverage dataset cartography to introduce difficulty-stratified testing and find that different strategies are affected differently by example learnability and difficulty.

* 23 pages. Accepted for publication at the European Chapter of the Association for Computational Linguistics (EACL) 2023 

Challenges and Strategies in Cross-Cultural NLP

Mar 18, 2022
Daniel Hershcovich, Stella Frank, Heather Lent, Miryam de Lhoneux, Mostafa Abdou, Stephanie Brandl, Emanuele Bugliarello, Laura Cabello Piqueras, Ilias Chalkidis, Ruixiang Cui, Constanza Fierro, Katerina Margatina, Phillip Rust, Anders Søgaard

Various efforts in the Natural Language Processing (NLP) community have been made to accommodate linguistic diversity and serve speakers of many different languages. However, it is important to acknowledge that speakers, and the content they produce and require, vary not just by language, but also by culture. Although language and culture are tightly linked, there are important differences. Analogous to cross-lingual and multilingual NLP, cross-cultural and multicultural NLP considers these differences in order to better serve users of NLP systems. We propose a principled framework to frame these efforts, and survey existing and potential strategies.

* ACL 2022 - Theme track 

Active Learning by Acquiring Contrastive Examples

Sep 08, 2021
Katerina Margatina, Giorgos Vernikos, Loïc Barrault, Nikolaos Aletras

Common acquisition functions for active learning use either uncertainty or diversity sampling, aiming to select difficult and diverse data points from the pool of unlabeled data, respectively. In this work, leveraging the best of both worlds, we propose an acquisition function that selects contrastive examples, i.e. data points that are similar in the model's feature space but for which the model outputs maximally different predictive likelihoods. We compare our approach, CAL (Contrastive Active Learning), with a diverse set of acquisition functions on four natural language understanding tasks and seven datasets. Our experiments show that CAL consistently performs better than or on par with the best performing baseline across all tasks, on both in-domain and out-of-domain data. We also conduct an extensive ablation study of our method, and we further analyze all actively acquired datasets, showing that CAL achieves a better trade-off between uncertainty and diversity compared to other strategies.
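
The acquisition idea can be sketched in a few lines of numpy. This is an illustrative reading of the abstract, not the authors' code: each candidate is scored by the average KL divergence between its predictive distribution and those of its nearest neighbours in feature space, so high scores flag points that look alike but are predicted differently.

```python
import numpy as np

def kl_div(p, q, eps=1e-12):
    """KL(p || q) for discrete probability vectors."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

def cal_scores(cand_feats, cand_probs, lab_feats, lab_probs, k=3):
    """Score each candidate by the mean KL divergence between its
    predictive distribution and those of its k nearest labeled
    neighbours in feature space; a high score marks a contrastive
    example worth acquiring."""
    scores = []
    for f, p in zip(cand_feats, cand_probs):
        dists = np.linalg.norm(lab_feats - f, axis=1)   # Euclidean distance
        nn = np.argsort(dists)[:k]                      # k nearest neighbours
        scores.append(np.mean([kl_div(lab_probs[j], p) for j in nn]))
    return np.array(scores)
```

Acquisition then amounts to taking the top-scoring candidates per iteration.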

* Accepted at EMNLP 2021 

Frustratingly Simple Pretraining Alternatives to Masked Language Modeling

Sep 04, 2021
Atsuki Yamaguchi, George Chrysostomou, Katerina Margatina, Nikolaos Aletras

Masked language modeling (MLM), a self-supervised pretraining objective, is widely used in natural language processing for learning text representations. MLM trains a model to predict a random sample of input tokens that have been replaced by a [MASK] placeholder, in a multi-class setting over the entire vocabulary. When pretraining, it is common to use other auxiliary objectives alongside MLM, at the token or sequence level, to improve downstream performance (e.g. next sentence prediction). However, no previous work has examined whether other, simpler objectives, linguistically intuitive or not, can be used standalone as main pretraining objectives. In this paper, we explore five simple pretraining objectives based on token-level classification tasks as replacements for MLM. Empirical results on GLUE and SQuAD show that our proposed methods achieve performance comparable to or better than MLM using a BERT-BASE architecture. We further validate our methods using smaller models, showing that pretraining a model with 41% of BERT-BASE's parameters (BERT-MEDIUM) results in only a 1% drop in GLUE scores with our best objective.
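
One plausible instance of such a token-level classification objective (a hedged sketch, not necessarily the paper's exact formulation) is shuffled-token detection: permute a small fraction of the input tokens and train the model to label each position as moved or not, a binary decision per token instead of a vocabulary-sized softmax.

```python
import random

def make_shuffle_example(tokens, ratio=0.15, seed=0):
    """Permute a random subset of token positions and emit a binary
    label per position: 1 if the token at that position changed,
    else 0. The (corrupted, labels) pair is a training example for a
    token-level binary classifier."""
    rng = random.Random(seed)
    n = len(tokens)
    m = max(2, int(n * ratio))
    idx = rng.sample(range(n), m)   # positions to permute
    src = idx[:]
    rng.shuffle(src)                # permutation of the chosen positions
    corrupted = tokens[:]
    for i, j in zip(idx, src):
        corrupted[i] = tokens[j]
    labels = [int(corrupted[i] != tokens[i]) for i in range(n)]
    return corrupted, labels
```

Because the change is confined to the data construction and the output head, the rest of the pretraining pipeline can stay identical to MLM's.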

* Accepted at EMNLP 2021 

Bayesian Active Learning with Pretrained Language Models

Apr 16, 2021
Katerina Margatina, Loic Barrault, Nikolaos Aletras

Active Learning (AL) is a method to iteratively select data for annotation from a pool of unlabeled data, aiming to achieve better model performance than random selection. Previous AL approaches in Natural Language Processing (NLP) have been limited to either task-specific models that are trained from scratch at each iteration using only the labeled data at hand, or to off-the-shelf pretrained language models (LMs) that are not adapted effectively to the downstream task. In this paper, we address these limitations by introducing BALM: Bayesian Active Learning with pretrained language Models. We first propose to adapt the pretrained LM to the downstream task by continuing training with all the available unlabeled data, and then use it for AL. We also suggest a simple yet effective fine-tuning method to ensure that the adapted LM is properly trained in both low- and high-resource scenarios during AL. We finally apply Monte Carlo dropout to the downstream model to obtain well-calibrated confidence scores for data selection with uncertainty sampling. Our experiments on five standard natural language understanding tasks demonstrate that BALM provides substantial data efficiency improvements compared to various combinations of acquisition functions, models, and fine-tuning methods proposed in recent AL literature.
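
The Monte Carlo dropout step can be sketched as follows: run T stochastic (dropout-on) forward passes, average the class probabilities, and rank pool examples by the entropy of the averaged distribution. This is a sketch with synthetic probability tensors, not the paper's code.

```python
import numpy as np

def predictive_entropy(mc_probs):
    """mc_probs: array of shape (T, N, C) holding class probabilities
    from T stochastic (dropout-on) forward passes over N pool examples.
    Returns the entropy of the MC-averaged distribution per example."""
    mean = mc_probs.mean(axis=0)                          # (N, C)
    return -np.sum(mean * np.log(mean + 1e-12), axis=1)   # (N,)

def acquire(mc_probs, budget):
    """Indices of the `budget` most uncertain pool examples."""
    return np.argsort(-predictive_entropy(mc_probs))[:budget].tolist()
```

Examples whose predictions disagree across passes end up with a flat averaged distribution and hence high entropy, which is what uncertainty sampling acquires.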

Domain Adversarial Fine-Tuning as an Effective Regularizer

Oct 05, 2020
Giorgos Vernikos, Katerina Margatina, Alexandra Chronopoulou, Ion Androutsopoulos

In Natural Language Processing (NLP), pretrained language models (LMs) transferred to downstream tasks have recently been shown to achieve state-of-the-art results. However, standard fine-tuning can degrade the general-domain representations captured during pretraining. To address this issue, we introduce a new regularization technique, AFTER: domain Adversarial Fine-Tuning as an Effective Regularizer. Specifically, we complement the task-specific loss used during fine-tuning with an adversarial objective. This additional loss term corresponds to an adversarial classifier that aims to discriminate between in-domain and out-of-domain text representations. In-domain refers to the labeled dataset of the task at hand, while out-of-domain refers to unlabeled data from a different domain. Intuitively, the adversarial classifier acts as a regularizer which prevents the model from overfitting to the task-specific domain. Empirical results on various natural language understanding tasks show that AFTER leads to improved performance compared to standard fine-tuning.
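
The combined objective can be sketched as the task cross-entropy plus a weighted domain-classification term. In practice the domain gradient would be reversed before reaching the encoder (a gradient reversal layer) so that the encoder learns domain-invariant features; this numpy sketch only shows the scalar objective, with all tensors synthetic and the weight `lam` a hypothetical hyperparameter.

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of the gold labels."""
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + eps)))

def after_objective(task_probs, task_labels, dom_probs, dom_labels, lam=0.1):
    """Task loss plus weighted domain-adversarial loss. With gradient
    reversal on the domain branch, minimizing this trains the domain
    classifier while pushing the encoder's representations to be
    indistinguishable across domains."""
    return (cross_entropy(task_probs, task_labels)
            + lam * cross_entropy(dom_probs, dom_labels))
```

Setting `lam` to zero recovers standard fine-tuning, which makes the regularization strength easy to tune.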

* EMNLP 2020, Findings of EMNLP 