Nikolaos Aletras

On the Limitations of Simulating Active Learning

May 21, 2023

Katerina Margatina, Nikolaos Aletras

Abstract: Active learning (AL) is a human-and-model-in-the-loop paradigm that iteratively selects informative unlabeled data for human annotation, aiming to improve over random sampling. However, performing AL experiments with human annotations on the fly is a laborious and expensive process, and thus unrealistic for academic research. An easy fix to this impediment is to simulate AL by treating an already labeled and publicly available dataset as the pool of unlabeled data. In this position paper, we first survey recent literature and highlight the challenges across all steps of the AL loop. We further unveil neglected caveats in the experimental setup that can significantly affect the quality of AL research. We continue with an exploration of how the simulation setting can govern empirical findings, arguing that it might be one of the answers to the frequently posed question "why do active learning algorithms sometimes fail to outperform random sampling?". We argue that evaluating AL algorithms on available labeled datasets might provide a lower bound on their effectiveness on real data. We believe it is essential to collectively shape best practices for AL research, particularly as engineering advances in LLMs push the research focus towards data-driven approaches (e.g., data efficiency, alignment, fairness). In light of this, we have developed guidelines for future work. Our aim is to draw attention to these limitations within the community, in the hope of finding ways to address them.

* To appear at Findings of ACL 2023
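A minimal sketch of the simulation setting described above, assuming a toy setup: an already labeled dataset is split into a small seed set and a pool whose gold labels are hidden, and a simulated oracle reveals those stored labels instead of asking a human annotator. The 1-D threshold classifier and the helpers `fit_threshold`, `uncertainty_acquire`, `random_acquire` and `simulate_al` are hypothetical and for illustration only; they are not the paper's experimental setup.

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[float, int]  # (feature value, gold label)
Acquire = Callable[[List[float], float, int, random.Random], List[int]]


def fit_threshold(labeled: List[Example]) -> float:
    """Toy classifier: decision threshold halfway between the two class means."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    if not pos or not neg:
        return 0.5  # fallback when the labeled set contains a single class
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def uncertainty_acquire(pool: List[float], threshold: float, k: int,
                        rng: random.Random) -> List[int]:
    """Uncertainty sampling: indices of the k pool points closest to the boundary."""
    return sorted(range(len(pool)), key=lambda i: abs(pool[i] - threshold))[:k]


def random_acquire(pool: List[float], threshold: float, k: int,
                   rng: random.Random) -> List[int]:
    """Random-sampling baseline that AL methods are expected to beat."""
    return rng.sample(range(len(pool)), k)


def simulate_al(dataset: List[Example], acquire: Acquire, seed_size: int = 4,
                batch: int = 2, rounds: int = 3,
                seed: int = 0) -> Tuple[List[Example], float]:
    rng = random.Random(seed)
    data = list(dataset)
    rng.shuffle(data)
    labeled = data[:seed_size]
    # The "unlabeled" pool is already labeled; its gold labels are merely hidden.
    pool_x = [x for x, _ in data[seed_size:]]
    hidden_y = [y for _, y in data[seed_size:]]
    for _ in range(rounds):
        threshold = fit_threshold(labeled)
        picked = set(acquire(pool_x, threshold, batch, rng))
        # Simulated oracle: look up the stored label instead of asking an annotator.
        labeled += [(pool_x[i], hidden_y[i]) for i in sorted(picked)]
        pool_x = [x for i, x in enumerate(pool_x) if i not in picked]
        hidden_y = [y for i, y in enumerate(hidden_y) if i not in picked]
    return labeled, fit_threshold(labeled)
```

Swapping `uncertainty_acquire` for `random_acquire` gives the random-sampling baseline under an otherwise identical loop, which is the comparison the abstract refers to.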

Trading Syntax Trees for Wordpieces: Target-oriented Opinion Words Extraction with Wordpieces and Aspect Enhancement

May 18, 2023

Samuel Mensah, Kai Sun, Nikolaos Aletras

Abstract: State-of-the-art target-oriented opinion word extraction (TOWE) models typically use BERT-based text encoders that operate on the word level, along with graph convolutional networks (GCNs) that incorporate syntactic information extracted from syntax trees. These methods achieve limited gains with GCNs and have difficulty using BERT wordpieces. Meanwhile, BERT wordpieces are known to be effective at representing rare words or words with insufficient context information. To address this issue, this work trades syntax trees for BERT wordpieces by entirely removing the GCN component from the methods' architectures. To enhance TOWE performance, we tackle the issue of aspect representation loss during encoding. Instead of solely utilizing a sentence as the input, we use a sentence-aspect pair. Our relatively simple approach achieves state-of-the-art results on benchmark datasets and should serve as a strong baseline for further research.

* Accepted at ACL 2023
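The sentence-aspect pair input can be pictured with a hypothetical helper that lays out a BERT-style sentence-pair sequence. Whitespace tokens stand in for wordpieces here, and the helper name is an illustrative assumption; the `[CLS]`/`[SEP]` layout with segment ids follows the standard BERT sentence-pair convention rather than any detail stated in the abstract.

```python
from typing import List, Tuple


def sentence_aspect_pair(sentence: str, aspect: str) -> Tuple[List[str], List[int]]:
    """Build [CLS] sentence [SEP] aspect [SEP] with segment ids 0 / 1.

    Whitespace splitting is a stand-in for a real wordpiece tokenizer.
    """
    sent = sentence.split()
    asp = aspect.split()
    tokens = ["[CLS]"] + sent + ["[SEP]"] + asp + ["[SEP]"]
    # Segment 0 covers [CLS], the sentence and the first [SEP]; segment 1 the aspect.
    segments = [0] * (len(sent) + 2) + [1] * (len(asp) + 1)
    return tokens, segments


tokens, segments = sentence_aspect_pair("the pizza was great", "pizza")
```

With a Hugging Face BERT tokenizer, passing the sentence and the aspect as two texts produces the same layout over wordpieces, with `token_type_ids` playing the role of `segments`.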

Incorporating Attribution Importance for Improving Faithfulness Metrics

May 17, 2023

Zhixue Zhao, Nikolaos Aletras

Abstract: Feature attribution methods (FAs) are popular approaches for providing insights into a model's reasoning process when making predictions. The more faithful an FA is, the more accurately it reflects which parts of the input are more important for the prediction. Widely used faithfulness metrics, such as sufficiency and comprehensiveness, use a hard erasure criterion, i.e., entirely removing or retaining the top most important tokens ranked by a given FA and observing the changes in predictive likelihood. However, this hard criterion ignores the importance of each individual token, treating them all equally when computing sufficiency and comprehensiveness. In this paper, we propose a simple yet effective soft erasure criterion. Instead of entirely removing or retaining tokens from the input, we randomly mask parts of the token vector representations in proportion to their FA importance. Extensive experiments across various natural language processing tasks and different FAs show that our soft-sufficiency and soft-comprehensiveness metrics consistently prefer more faithful explanations compared to hard sufficiency and comprehensiveness. Our code: https://github.com/casszhao/SoftFaith

* Accepted at ACL 2023
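Below is a minimal sketch of the soft erasure idea, assuming non-negative importance scores rescaled to [0, 1] by their maximum: for soft comprehensiveness each embedding dimension of a token is zeroed with probability equal to its importance, and for soft sufficiency it is kept with that probability. The toy sigmoid model and the helpers `normalize`, `soft_mask`, `toy_prob` and `soft_scores` are hypothetical; the scores are simple drops in predicted probability, and the paper's exact normalization may differ (see the linked repository).

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def normalize(scores: List[float]) -> List[float]:
    """Assumption: non-negative importance scores, rescaled into [0, 1] by their max."""
    top = max(scores)
    return [s / top if top > 0 else 0.0 for s in scores]


def soft_mask(embeddings: List[Vector], importance: List[float],
              keep_important: bool, rng: random.Random) -> List[Vector]:
    """Randomly zero embedding dimensions in proportion to token importance.

    keep_important=False (soft comprehensiveness): each dimension of token i is
    zeroed with probability w_i. keep_important=True (soft sufficiency): each
    dimension is kept with probability w_i, i.e. zeroed with probability 1 - w_i.
    """
    masked = []
    for vec, w in zip(embeddings, normalize(importance)):
        drop_prob = (1.0 - w) if keep_important else w
        masked.append([0.0 if rng.random() < drop_prob else v for v in vec])
    return masked


def toy_prob(embeddings: List[Vector]) -> float:
    """Stand-in classifier: sigmoid of the sum of all embedding values."""
    z = sum(sum(vec) for vec in embeddings)
    return 1.0 / (1.0 + math.exp(-z))


def soft_scores(embeddings: List[Vector], importance: List[float],
                seed: int = 0) -> Tuple[float, float]:
    """Return (soft comprehensiveness, soft sufficiency) as probability drops."""
    rng = random.Random(seed)
    full = toy_prob(embeddings)
    comp = full - toy_prob(soft_mask(embeddings, importance, False, rng))
    suff = full - toy_prob(soft_mask(embeddings, importance, True, rng))
    return comp, suff
```

Unlike hard erasure, a token with importance 0.5 is neither fully removed nor fully kept: on average about half of its dimensions are zeroed, so the metric reflects the graded importance scores rather than only the top-k ranking.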

Self-training through Classifier Disagreement for Cross-Domain Opinion Target Extraction

Feb 28, 2023

Kai Sun, Richong Zhang, Samuel Mensah, Nikolaos Aletras, Yongyi Mao, Xudong Liu

Abstract: Opinion target extraction (OTE) or aspect extraction (AE) is a fundamental task in opinion mining that aims to extract the targets (or aspects) on which opinions have been expressed. Recent work focuses on cross-domain OTE, which is typically encountered in real-world scenarios where the training and testing distributions differ. Most methods use domain adversarial neural networks that aim to reduce the domain gap between the labelled source and unlabelled target domains to improve target domain performance. However, this approach only aligns feature distributions and does not account for class-wise feature alignment, leading to suboptimal results. Semi-supervised learning (SSL) has been explored as a solution, but it is limited by the quality of the pseudo-labels generated by the model. Inspired by the theoretical foundations in domain adaptation [2], we propose a new SSL approach that selects unlabelled target samples on which the outputs of a domain-specific teacher network and a student network disagree, in an effort to boost target domain performance. Extensive experiments on benchmark cross-domain OTE datasets show that this approach is effective and performs consistently well in settings with large domain shifts.

* Accepted at TheWebConf 2023
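A hypothetical sketch of the selection step, assuming opinion target extraction is framed as BIO sequence tagging: an unlabelled target sentence is selected when the teacher's and the student's predicted tag sequences differ. The helper name is illustrative, and how the selected samples are subsequently labelled and used for training is not shown, since the abstract does not specify it.

```python
from typing import List, Sequence


def disagreement_indices(teacher_tags: Sequence[Sequence[str]],
                         student_tags: Sequence[Sequence[str]]) -> List[int]:
    """Indices of unlabelled target sentences on which the teacher's and the
    student's predicted BIO tag sequences differ (hypothetical helper)."""
    if len(teacher_tags) != len(student_tags):
        raise ValueError("teacher and student must tag the same sentences")
    return [i for i, (t, s) in enumerate(zip(teacher_tags, student_tags))
            if list(t) != list(s)]


# Example: three unlabelled target sentences tagged by both networks.
teacher = [["B", "O", "O"], ["O", "O"], ["B", "I", "O"]]
student = [["B", "O", "O"], ["B", "O"], ["B", "O", "O"]]
selected = disagreement_indices(teacher, student)  # sentences 1 and 2
```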

It's about Time: Rethinking Evaluation on Rumor Detection Benchmarks using Chronological Splits

Feb 06, 2023

Yida Mu, Kalina Bontcheva, Nikolaos Aletras

Abstract: New events emerge over time, influencing the topics of rumors on social media. Current rumor detection benchmarks use random splits for the training, development and test sets, which typically results in topical overlap. Consequently, models trained on random splits may not perform well on rumor classification for previously unseen topics due to temporal concept drift. In this paper, we re-evaluate classification models on four popular rumor detection benchmarks using chronological instead of random splits. Our experimental results show that the use of random splits can significantly overestimate predictive performance across all datasets and models. Therefore, we suggest that rumor detection models should always be evaluated using chronological splits to minimize topical overlap.

* Accepted at EACL 2023 Findings
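A chronological split can be sketched as sorting posts by timestamp and cutting the ordered list, so that no development or test post is older than any training post. The 70/10/20 proportions and the `chronological_split` helper are illustrative assumptions, not the paper's settings.

```python
from datetime import datetime
from typing import List, Tuple

Post = Tuple[datetime, str]  # (timestamp, text)


def chronological_split(posts: List[Post], train: float = 0.7, dev: float = 0.1
                        ) -> Tuple[List[Post], List[Post], List[Post]]:
    """Oldest posts go to training, the next slice to development, the newest to test."""
    # Sort on the timestamp only, so texts are never compared.
    ordered = sorted(posts, key=lambda post: post[0])
    n_train = round(len(ordered) * train)
    n_dev = round(len(ordered) * dev)
    return (ordered[:n_train],
            ordered[n_train:n_train + n_dev],
            ordered[n_train + n_dev:])
```

A random split would instead shuffle `posts` before cutting, letting posts about the same event land in both training and test sets, which is the topical overlap the abstract warns about.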

On the Impact of Temporal Concept Drift on Model Explanations

Oct 17, 2022

Zhixue Zhao, George Chrysostomou, Kalina Bontcheva, Nikolaos Aletras

Abstract: Explanation faithfulness of model predictions in natural language processing is typically evaluated on held-out data from the same temporal distribution as the training data (i.e., synchronous settings). While model performance often deteriorates due to temporal variation (i.e., temporal concept drift), it is currently unknown how explanation faithfulness is impacted when the time span of the target data differs from that of the data used to train the model (i.e., asynchronous settings). For this purpose, we examine the impact of temporal variation on model explanations extracted by eight feature attribution methods and three select-then-predict models across six text classification tasks. Our experiments show that (i) faithfulness is not consistent under temporal variations across feature attribution methods (e.g., it decreases or increases depending on the method), with an attention-based method demonstrating the most robust faithfulness scores across datasets; and (ii) select-then-predict models are mostly robust in asynchronous settings, with only small degradation in predictive performance. Finally, feature attribution methods show conflicting behavior when used in FRESH (i.e., a select-then-predict model) and for measuring sufficiency/comprehensiveness (i.e., as post-hoc methods), suggesting that we need more robust metrics to evaluate post-hoc explanation faithfulness.

* Accepted at EMNLP Findings 2022

HashFormers: Towards Vocabulary-independent Pre-trained Transformers

Oct 14, 2022

Huiyin Xue, Nikolaos Aletras

Abstract: Transformer-based pre-trained language models are vocabulary-dependent, mapping by default each token to its corresponding embedding. This one-to-one mapping results in embedding matrices that occupy a lot of memory (i.e., millions of parameters) and grow linearly with the size of the vocabulary. Previous work on on-device transformers dynamically generates token embeddings on the fly without embedding matrices, using locality-sensitive hashing over morphological information. These embeddings are subsequently fed into transformer layers for text classification. However, these methods are not pre-trained. Inspired by this line of work, we propose HashFormers, a new family of vocabulary-independent pre-trained transformers that support an unlimited vocabulary (i.e., all possible tokens in a corpus) given a substantially smaller fixed-size embedding matrix. We achieve this by first introducing computationally cheap hashing functions that bucket individual tokens together into shared embeddings. We also propose three variants that do not require an embedding matrix at all, further reducing the memory requirements. We empirically demonstrate that HashFormers are more memory efficient than standard pre-trained transformers while achieving comparable predictive performance when fine-tuned on multiple text classification tasks. For example, our most efficient HashFormer variant has a negligible performance degradation (0.4% on GLUE) using only 99.1K parameters for representing the embeddings, compared to 12.3-38M parameters of state-of-the-art models.

* Accepted at EMNLP 2022
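To make the memory argument concrete, here is a generic hashing-trick sketch: every token, seen or unseen, is mapped by a stable hash to one of a fixed number of rows in a shared embedding table. The `HashedEmbedding` class, the use of MD5 and the bucket count are illustrative assumptions; HashFormers' own hashing functions may differ.

```python
import hashlib
import random
from typing import List


class HashedEmbedding:
    """Hashing-trick embedding: unlimited vocabulary, fixed number of rows."""

    def __init__(self, num_buckets: int, dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.num_buckets = num_buckets
        self.dim = dim
        # One randomly initialised row per bucket, shared by all tokens hashed to it.
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                      for _ in range(num_buckets)]

    def bucket(self, token: str) -> int:
        # Stable across runs, unlike Python's built-in hash(), which is salted per process.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_buckets

    def __call__(self, tokens: List[str]) -> List[List[float]]:
        return [self.table[self.bucket(t)] for t in tokens]

    def num_parameters(self) -> int:
        return self.num_buckets * self.dim
```

For scale, a BERT-base-style embedding matrix with a 30,522-token vocabulary and 768 dimensions holds 30,522 × 768 = 23,440,896 parameters, whereas 1,024 buckets of the same width hold 786,432 regardless of vocabulary size. The price is collisions: unrelated tokens that hash to the same bucket share an embedding.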

Improving Graph-Based Text Representations with Character and Word Level N-grams

Oct 12, 2022

Wenzhe Li, Nikolaos Aletras

Abstract: Graph-based text representation focuses on how text documents are represented as graphs for exploiting dependency information between tokens and documents within a corpus. Despite the increasing interest in graph representation learning, there is limited research exploring new ways of constructing graph-based text representations, which are important in downstream natural language processing tasks. In this paper, we first propose a new heterogeneous word-character text graph that combines word and character n-gram nodes together with document nodes, allowing us to better learn dependencies among these entities. Additionally, we propose two new graph-based neural models, WCTextGCN and WCTextGAT, for modeling our proposed text graph. Extensive experiments on text classification and automatic text summarization benchmarks demonstrate that our proposed models consistently outperform competitive baselines and state-of-the-art graph-based models.
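A simplified sketch of a heterogeneous word-character graph: documents link to the words they contain, and words link to their character n-grams, so words such as "cat" and "cats" become connected through shared n-grams. Edges are unweighted here and the helpers `char_ngrams` and `build_text_graph` are hypothetical; the graph used by WCTextGCN/WCTextGAT may differ in node types, edge weights and n-gram sizes.

```python
from typing import List, Set, Tuple


def char_ngrams(word: str, n: int = 3) -> Set[str]:
    """Character n-grams of a word, with < and > marking word boundaries."""
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_text_graph(docs: List[str], n: int = 3) -> Tuple[List[str], Set[Tuple[str, str]]]:
    """Nodes and undirected edges of a document-word-character n-gram graph."""
    edges: Set[Tuple[str, str]] = set()
    for d, text in enumerate(docs):
        for word in text.lower().split():
            edges.add((f"doc:{d}", f"word:{word}"))         # document contains word
            for gram in char_ngrams(word, n):
                edges.add((f"word:{word}", f"char:{gram}"))  # word contains n-gram
    nodes = sorted({node for edge in edges for node in edge})
    return nodes, edges
```

The resulting node list and edge set could then be turned into an adjacency matrix for a GCN or GAT layer; that step is framework-specific and omitted here.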

Domain Classification-based Source-specific Term Penalization for Domain Adaptation in Hate-speech Detection

Sep 18, 2022

Tulika Bose, Nikolaos Aletras, Irina Illina, Dominique Fohr

Abstract: State-of-the-art approaches for hate-speech detection usually exhibit poor performance in out-of-domain settings. This typically occurs because classifiers overemphasize source-specific information, which negatively impacts their domain invariance. Prior work has attempted to penalize terms related to hate speech from manually curated lists using feature attribution methods, which quantify the importance assigned to input terms by the classifier when making a prediction. Instead, we propose a domain adaptation approach that automatically extracts and penalizes source-specific terms using a domain classifier, which learns to differentiate between domains, and feature-attribution scores for hate-speech classes, yielding consistent improvements in cross-domain evaluation.

* COLING 2022 pre-print
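One hypothetical reading of the approach, sketched in plain Python: terms with the highest mean attribution toward the source-domain label of a domain classifier are treated as source-specific, and a regularizer penalizes the attribution the hate-speech classifier places on them. The helpers, the top-k selection rule and the squared-attribution penalty are assumptions for illustration, not the paper's formulation.

```python
from collections import defaultdict
from typing import Dict, Sequence, Set, Tuple

Attribution = Tuple[str, float]  # (token, attribution score)


def source_specific_terms(domain_attributions: Sequence[Sequence[Attribution]],
                          k: int) -> Set[str]:
    """Top-k tokens by mean attribution toward the source-domain label of a
    domain classifier (hypothetical selection rule)."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for example in domain_attributions:
        for token, score in example:
            totals[token] += score
            counts[token] += 1
    ranked = sorted(totals, key=lambda tok: totals[tok] / counts[tok], reverse=True)
    return set(ranked[:k])


def attribution_penalty(hate_attributions: Sequence[Attribution], terms: Set[str],
                        lam: float = 0.1) -> float:
    """Regularizer added to the task loss: lam times the sum of squared
    hate-speech-class attributions on the penalized terms."""
    return lam * sum(score ** 2 for token, score in hate_attributions if token in terms)
```

In an actual training loop the penalty would have to be computed from differentiable attributions inside the model's framework; plain floats are used here only to show the bookkeeping.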

Combining Humor and Sarcasm for Improving Political Parody Detection

May 06, 2022

Xiao Ao, Danae Sánchez Villegas, Daniel Preoţiuc-Pietro, Nikolaos Aletras

Abstract: Parody is a figurative device used for mimicking entities for comedic or critical purposes. Parody is intentionally humorous and often involves sarcasm. This paper explores jointly modelling these figurative tropes with the goal of improving performance of political parody detection in tweets. To this end, we present a multi-encoder model that combines three parallel encoders to enrich parody-specific representations with humor and sarcasm information. Experiments on a publicly available dataset of political parody tweets demonstrate that our approach outperforms previous state-of-the-art methods.

* Accepted at NAACL 2022
