Michael A. Hedderich

Task-Adaptive Pre-Training for Boosting Learning With Noisy Labels: A Study on Text Classification for African Languages

Jun 03, 2022
Dawei Zhu, Michael A. Hedderich, Fangzhou Zhai, David Ifeoluwa Adelani, Dietrich Klakow

For high-resource languages like English, text classification is a well-studied task. Modern NLP models easily achieve accuracies above 90% on many standard English text classification datasets (Xie et al., 2019; Yang et al., 2019; Zaheer et al., 2020). However, text classification in low-resource languages remains challenging due to the lack of annotated data. Although methods like weak supervision and crowdsourcing can help ease the annotation bottleneck, the annotations they produce contain label noise, and models trained on noisy labels may not generalize well. To this end, a variety of noise-handling techniques have been proposed to alleviate the negative impact of annotation errors (for extensive surveys see Hedderich et al., 2021; Algan & Ulusoy, 2021). In this work, we experiment with a group of standard noise-handling methods on text classification tasks with noisy labels. We study both simulated noise and realistic noise induced by weak supervision. Moreover, we find that task-adaptive pre-training techniques (Gururangan et al., 2020) are beneficial for learning with noisy labels.

* AfricaNLP Workshop @ ICLR2022 
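To make the task-adaptive pre-training step concrete, here is a minimal sketch (not the paper's code) of continuing masked-language-model training on the unlabeled task corpus before fine-tuning on the noisy labels; the model name, toy corpus and hyperparameters are placeholder assumptions.

# Task-adaptive pre-training sketch in the spirit of Gururangan et al. (2020).
# Model name, toy corpus and hyperparameters are placeholders, not the paper's setup.
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "bert-base-multilingual-cased"      # any masked LM could be used here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Unlabeled texts from the target task (placeholder data).
texts = ["Example sentence from the task corpus.",
         "Another unlabeled sentence from the same domain."]
encodings = tokenizer(texts, truncation=True, max_length=128)
train_set = [{"input_ids": ids} for ids in encodings["input_ids"]]

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="tapt-checkpoint", num_train_epochs=3,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=train_set,
        data_collator=collator).train()
# The adapted checkpoint is then loaded with a classification head and
# fine-tuned on the noisy labels as usual.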

Meta Self-Refinement for Robust Learning with Weak Supervision

May 15, 2022
Dawei Zhu, Xiaoyu Shen, Michael A. Hedderich, Dietrich Klakow

Training deep neural networks (DNNs) with weak supervision has been a hot topic, as it can significantly reduce the annotation cost. However, labels from weak supervision can be rather noisy, and the high capacity of DNNs makes them prone to overfitting these noisy labels. Recent methods leverage self-training techniques to train noise-robust models, where a teacher trained on noisy labels is used to teach a student. However, the teacher in such setups might fit a substantial amount of noise and produce wrong pseudo-labels with high confidence, leading to error propagation. In this work, we propose Meta Self-Refinement (MSR), a noise-resistant learning framework, to effectively combat noisy labels from weak supervision sources. Instead of relying purely on a fixed teacher trained on noisy labels, we keep updating the teacher to refine its pseudo-labels. At each training step, the teacher performs a meta gradient-descent step on the current mini-batch to maximize the student's performance on a clean validation set. Extensive experiments on eight NLP benchmarks demonstrate that MSR is robust against noise in all settings and outperforms state-of-the-art methods by up to 11.4% in accuracy and 9.26% in F1 score.
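The meta-update at the heart of this idea can be illustrated with a deliberately simplified sketch (not the authors' implementation): tiny linear teacher and student models, random placeholder data, and a single differentiable inner step, so the teacher can be updated with the gradient of the student's clean validation loss.

# Simplified meta self-refinement loop with linear models and placeholder data.
import torch
import torch.nn.functional as F

dim, n_classes = 32, 4
lr_student, lr_teacher = 0.1, 0.01
W_teacher = torch.randn(dim, n_classes, requires_grad=True)
W_student = torch.randn(dim, n_classes, requires_grad=True)

x_noisy = torch.randn(16, dim)            # weakly labelled mini-batch (features only)
x_val = torch.randn(8, dim)               # small clean validation set
y_val = torch.randint(0, n_classes, (8,))

for step in range(100):
    # Teacher produces (soft) pseudo-labels for the current mini-batch.
    pseudo = F.softmax(x_noisy @ W_teacher, dim=-1)

    # One differentiable SGD step of the student on the pseudo-labels.
    student_loss = F.cross_entropy(x_noisy @ W_student, pseudo)
    grad_s = torch.autograd.grad(student_loss, W_student, create_graph=True)[0]
    W_student_updated = W_student - lr_student * grad_s

    # Meta step: move the teacher so the updated student improves on clean data.
    val_loss = F.cross_entropy(x_val @ W_student_updated, y_val)
    grad_t = torch.autograd.grad(val_loss, W_teacher)[0]

    with torch.no_grad():                 # commit both updates
        W_teacher -= lr_teacher * grad_t
        W_student -= lr_student * grad_s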

MCSE: Multimodal Contrastive Learning of Sentence Embeddings

Apr 22, 2022
Miaoran Zhang, Marius Mosbach, David Ifeoluwa Adelani, Michael A. Hedderich, Dietrich Klakow

Learning semantically meaningful sentence embeddings is an open problem in natural language processing. In this work, we propose a sentence embedding learning approach that exploits both visual and textual information via a multimodal contrastive objective. Through experiments on a variety of semantic textual similarity tasks, we demonstrate that our approach consistently improves performance across various datasets and pre-trained encoders. In particular, by combining a small amount of multimodal data with a large text-only corpus, we improve the state-of-the-art average Spearman's correlation by 1.7%. By analyzing the properties of the textual embedding space, we show that our model excels at aligning semantically similar sentences, which explains its improved performance.

* Accepted by NAACL 2022 main conference (short paper), 11 pages 
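A minimal sketch of the kind of multimodal contrastive objective described above (not the released MCSE code): a SimCSE-style sentence-sentence InfoNCE term is combined with a sentence-image InfoNCE term in a shared space; the embeddings, temperature and weighting below are placeholder assumptions.

# Toy multimodal contrastive objective with placeholder embeddings.
import torch
import torch.nn.functional as F

def info_nce(anchors, positives, temperature=0.05):
    """Cosine-similarity InfoNCE: the i-th positive is the target of the i-th anchor."""
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.T / temperature
    return F.cross_entropy(logits, torch.arange(a.size(0)))

batch, dim = 32, 256
sent_view1 = torch.randn(batch, dim, requires_grad=True)  # sentence embedding, dropout view 1
sent_view2 = torch.randn(batch, dim, requires_grad=True)  # same sentences, dropout view 2
image_emb = torch.randn(batch, dim, requires_grad=True)   # projected embeddings of paired images

lambda_mm = 1.0   # weight of the multimodal term (assumed hyperparameter)
loss = info_nce(sent_view1, sent_view2) + lambda_mm * info_nce(sent_view1, image_emb)
loss.backward()   # in practice the gradients flow back into the text and image encoders
print(float(loss))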

Is BERT Robust to Label Noise? A Study on Learning with Noisy Labels in Text Classification

Apr 20, 2022
Dawei Zhu, Michael A. Hedderich, Fangzhou Zhai, David Ifeoluwa Adelani, Dietrich Klakow

Incorrect labels in training data occur when human annotators make mistakes or when the data is generated via weak or distant supervision. It has been shown that complex noise-handling techniques - by modeling, cleaning or filtering the noisy instances - are required to prevent models from fitting this label noise. However, in this work we show that, for text classification tasks with modern NLP models like BERT and over a variety of noise types, existing noise-handling methods do not always improve performance and may even make it worse, suggesting the need for further investigation. We back our observations with a comprehensive analysis.

* Accepted at Workshop on Insights from Negative Results in NLP 2022 @ACL 2022 
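For readers unfamiliar with the simulated-noise setup such studies use, the following toy sketch injects uniform label noise at a chosen rate into a set of gold labels; the exact noise protocol of the paper may differ.

# Inject synthetic, class-uniform label noise into a labeled dataset.
import random

def inject_uniform_noise(labels, num_classes, noise_rate, seed=0):
    """Flip each label with probability `noise_rate` to a different random class."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            noisy.append(rng.choice([c for c in range(num_classes) if c != y]))
        else:
            noisy.append(y)
    return noisy

clean = [0, 1, 2, 1, 0, 2, 2, 1]          # placeholder gold labels
noisy = inject_uniform_noise(clean, num_classes=3, noise_rate=0.3)
print(sum(a != b for a, b in zip(clean, noisy)) / len(clean), "observed noise level")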

Proceedings of the First Workshop on Weakly Supervised Learning (WeaSuL)

Jul 08, 2021
Michael A. Hedderich, Benjamin Roth, Katharina Kann, Barbara Plank, Alex Ratner, Dietrich Klakow

Welcome to WeaSuL 2021, the First Workshop on Weakly Supervised Learning, co-located with ICLR 2021. In this workshop, we want to advance theory, methods and tools that allow experts to express prior coded knowledge as automatic data annotations, which can then be used to train arbitrary deep neural networks for prediction. The ICLR 2021 Workshop on Weak Supervision aims at advancing methods that help modern machine-learning models generalize from knowledge provided by experts, in interaction with observable (unlabeled) data. In total, 15 papers were accepted. All accepted contributions are listed in these proceedings.

ANEA: Distant Supervision for Low-Resource Named Entity Recognition

Feb 25, 2021
Michael A. Hedderich, Lukas Lange, Dietrich Klakow

Distant supervision allows obtaining labeled training corpora for low-resource settings where only limited hand-annotated data exists. However, to be used effectively, the distant supervision must be easy to obtain. In this work, we present ANEA, a tool to automatically annotate named entities in text based on entity lists. It spans the whole pipeline from obtaining the lists to analyzing the errors of the distant supervision. A tuning step allows the user to improve the automatic annotation with their linguistic insights without having to manually label or check all tokens. In six low-resource scenarios, we show that distantly supervised data obtained with ANEA increases the F1-score by an average of 18 points.
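The general idea of entity-list-based distant supervision can be illustrated with a toy sketch; this is not ANEA's actual interface, and the lists and the single-token, case-insensitive matching rule are simplifying assumptions.

# Toy distant supervision for NER via entity-list lookup.
def annotate_with_lists(tokens, entity_lists):
    """Assign an entity label to every token found in one of the entity lists, 'O' otherwise."""
    lookup = {}
    for label, names in entity_lists.items():
        for name in names:
            lookup[name.lower()] = label
    return [(tok, lookup.get(tok.lower(), "O")) for tok in tokens]

entity_lists = {
    "PER": ["Amina", "Yusuf"],   # placeholder person names
    "LOC": ["Lagos", "Kano"],    # placeholder locations
}
print(annotate_with_lists("Amina traveled from Kano to Lagos .".split(), entity_lists))
# [('Amina', 'PER'), ('traveled', 'O'), ('from', 'O'), ('Kano', 'LOC'), ('to', 'O'), ('Lagos', 'LOC'), ('.', 'O')]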

Analysing the Noise Model Error for Realistic Noisy Label Data

Jan 24, 2021
Michael A. Hedderich, Dawei Zhu, Dietrich Klakow

Distant and weak supervision allow obtaining large amounts of labeled training data quickly and cheaply, but these automatic annotations tend to contain many errors. A popular technique for overcoming the negative effects of these noisy labels is noise modelling, in which the underlying noise process is modelled explicitly. In this work, we study the quality of these estimated noise models from the theoretical side by deriving the expected error of the noise model. Apart from evaluating the theoretical results on commonly used synthetic noise, we also publish NoisyNER, a new noisy label dataset from the NLP domain that was obtained through a realistic distant supervision technique. It provides seven sets of labels with differing noise patterns to evaluate different noise levels on the same instances. Parallel, clean labels are available, making it possible to study scenarios where a small amount of gold-standard data can be leveraged. Our theoretical results and the corresponding experiments give insights into the factors that influence the noise model estimation, such as the noise distribution and the sampling technique.

* Accepted at AAAI 2021, additional material at https://github.com/uds-lsv/noise-estimation 
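As a concrete example of such a noise model, the sketch below estimates a transition matrix P(noisy label | true label) from instances for which both clean and noisy labels are available; the data is a toy placeholder and the estimator is a straightforward row-normalized count matrix, not necessarily the exact formulation analysed in the paper.

# Estimate a label-noise transition matrix from paired clean/noisy labels.
import numpy as np

def estimate_noise_matrix(clean_labels, noisy_labels, num_classes):
    counts = np.zeros((num_classes, num_classes))
    for c, n in zip(clean_labels, noisy_labels):
        counts[c, n] += 1
    # Row-normalize; rows with no observations fall back to a uniform distribution.
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.where(row_sums > 0, counts / np.maximum(row_sums, 1), 1.0 / num_classes)

clean = [0, 0, 1, 1, 1, 2, 2, 2, 2]   # placeholder gold labels
noisy = [0, 1, 1, 1, 0, 2, 2, 0, 2]   # placeholder distantly supervised labels
print(estimate_noise_matrix(clean, noisy, num_classes=3))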

A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios

Oct 23, 2020
Michael A. Hedderich, Lukas Lange, Heike Adel, Jannik Strötgen, Dietrich Klakow

Current developments in natural language processing offer challenges and opportunities for low-resource languages and domains. Deep neural networks are known for requiring large amounts of training data, which might not be available in resource-lean scenarios. However, there is also a growing body of work on improving performance in low-resource settings. Motivated by fundamental changes towards neural models and the currently popular pre-train and fine-tune paradigm, we give an overview of promising approaches for low-resource natural language processing. After discussing the definition of low-resource scenarios and the different dimensions of data availability, we examine methods that enable learning when training data is sparse. This includes mechanisms to create additional labeled data, like data augmentation and distant supervision, as well as transfer learning settings that reduce the need for target supervision. The survey closes with a brief look into methods suggested in non-NLP machine learning communities, which might be beneficial for NLP in low-resource scenarios.
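One of the "create additional labeled data" mechanisms mentioned above, label-preserving data augmentation, can be illustrated with a tiny EDA-style word-swap heuristic; the snippet only illustrates the general idea and is not a method proposed in the survey itself.

# Toy label-preserving augmentation by random word swaps.
import random

def random_swap(sentence, n_swaps=1, seed=0):
    """Return a new sentence with `n_swaps` random pairs of words exchanged."""
    rng = random.Random(seed)
    tokens = sentence.split()
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return " ".join(tokens)

print(random_swap("low resource languages need more labeled data", n_swaps=2))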

Transfer Learning and Distant Supervision for Multilingual Transformer Models: A Study on African Languages

Oct 07, 2020
Michael A. Hedderich, David Adelani, Dawei Zhu, Jesujoba Alabi, Udia Markus, Dietrich Klakow

Multilingual transformer models like mBERT and XLM-RoBERTa have obtained great improvements for many NLP tasks on a variety of languages. However, recent work has also shown that results from high-resource languages cannot easily be transferred to realistic, low-resource scenarios. In this work, we study trends in performance for different amounts of available resources for the three African languages Hausa, isiXhosa and Yorùbá on both NER and topic classification. We show that, in combination with transfer learning or distant supervision, these models can match the performance of baselines trained on much more supervised data with as few as 10 or 100 labeled sentences. However, we also find settings where this does not hold. Our discussions and additional experiments on assumptions such as time and hardware restrictions highlight challenges and opportunities in low-resource learning.

* Accepted at EMNLP'20 
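A minimal sketch of the low-resource fine-tuning setup studied here (not the paper's code): a multilingual transformer fine-tuned on only a handful of labeled target-language sentences; the model name, toy data and hyperparameters are placeholders.

# Few-shot fine-tuning of a multilingual transformer for topic classification.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

# e.g. 10 labeled target-language sentences (placeholder text and labels).
texts = ["placeholder target-language sentence about politics",
         "placeholder target-language sentence about sports"] * 5
labels = [0, 1] * 5
enc = tokenizer(texts, truncation=True, padding=True, max_length=128)
train_set = [{"input_ids": enc["input_ids"][i],
              "attention_mask": enc["attention_mask"][i],
              "labels": labels[i]} for i in range(len(texts))]

args = TrainingArguments(output_dir="few-shot-topic", num_train_epochs=10,
                         per_device_train_batch_size=4)
Trainer(model=model, args=args, train_dataset=train_set).train()
# In the transfer-learning variant, `model_name` would instead point to a
# checkpoint already fine-tuned on the same task in a high-resource language.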

On the Interplay Between Fine-tuning and Sentence-level Probing for Linguistic Knowledge in Pre-trained Transformers

Oct 06, 2020
Marius Mosbach, Anna Khokhlova, Michael A. Hedderich, Dietrich Klakow

Fine-tuning pre-trained contextualized embedding models has become an integral part of the NLP pipeline. At the same time, probing has emerged as a way to investigate the linguistic knowledge captured by pre-trained models. Very little is understood, however, about how fine-tuning affects the representations of pre-trained models and thereby the linguistic knowledge they encode. This paper contributes towards closing this gap. We study three different pre-trained models, BERT, RoBERTa, and ALBERT, and investigate through sentence-level probing how fine-tuning affects their representations. We find that for some probing tasks fine-tuning leads to substantial changes in accuracy, possibly suggesting that fine-tuning can introduce or even remove linguistic knowledge from a pre-trained model. These changes, however, vary greatly across models, fine-tuning tasks and probing tasks. Our analysis reveals that while fine-tuning indeed changes the representations of a pre-trained model, and these changes are typically larger for higher layers, only in very few cases does fine-tuning have a positive effect on probing accuracy that is larger than just using the pre-trained model with a strong pooling method. Based on our findings, we argue that both positive and negative effects of fine-tuning on probing require a careful interpretation.

* Accepted at Findings of EMNLP 2020 and BlackboxNLP 2020 
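Sentence-level probing with a pooling strategy, as discussed above, can be sketched as follows (not the paper's code): sentence representations are taken from a frozen pre-trained model via mean pooling and a simple classifier is trained on top; the model choice, probing labels and data are placeholders.

# Sentence-level probing on frozen, mean-pooled representations.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased").eval()

sentences = ["The cat sat on the mat.", "Colorless green ideas sleep furiously."] * 4
labels = [0, 1] * 4   # placeholder probing-task labels

with torch.no_grad():
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state            # (batch, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1)           # ignore padding tokens
    features = (hidden * mask).sum(1) / mask.sum(1)        # mean pooling

probe = LogisticRegression(max_iter=1000).fit(features.numpy(), labels)
print("probe accuracy on the toy training data:", probe.score(features.numpy(), labels))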