Nikolaos Aletras

Identifying and Characterizing Active Citizens who Refute Misinformation in Social Media

Apr 21, 2022

Yida Mu, Pu Niu, Nikolaos Aletras

Abstract: The spread of misinformation in social media has given rise to a new form of active citizens who focus on tackling the problem by refuting posts that might contain misinformation. Automatically identifying and characterizing the behavior of such active citizens in social media is an important task in computational social science, complementing studies in misinformation analysis. In this paper, we study this task for the first time across different social media platforms (i.e., Twitter and Weibo) and languages (i.e., English and Chinese). To this end, (1) we develop and make publicly available a new dataset of Weibo users mapped into one of two categories (i.e., misinformation posters or active citizens); (2) we evaluate a battery of supervised models on our new Weibo dataset and on an existing Twitter dataset which we repurpose for the task; and (3) we present an extensive analysis of the differences in language use between the two user categories.

* Accepted at ACM WebSci 2022

A Hierarchical N-Gram Framework for Zero-Shot Link Prediction

Apr 16, 2022

Mingchen Li, Junfan Chen, Samuel Mensah, Nikolaos Aletras, Xiulong Yang, Yang Ye

Abstract: Due to the incompleteness of knowledge graphs (KGs), zero-shot link prediction (ZSLP), which aims to predict unobserved relations in KGs, has attracted recent interest from researchers. A common solution is to use textual features of relations (e.g., surface name or textual descriptions) as auxiliary information to bridge the gap between seen and unseen relations. Current approaches learn an embedding for each word token in the text. These methods lack robustness because they suffer from the out-of-vocabulary (OOV) problem. Meanwhile, models built on character n-grams can generate expressive representations for OOV words. Thus, in this paper, we propose a Hierarchical N-Gram framework for Zero-Shot Link Prediction (HNZSLP), which considers the dependencies among character n-grams of the relation surface name for ZSLP. Our approach works by first constructing a hierarchical n-gram graph on the surface name to model the organizational structure of the n-grams that lead to the surface name. A GramTransformer, based on the Transformer, is then presented to model the hierarchical n-gram graph and construct the relation embedding for ZSLP. Experimental results show that the proposed HNZSLP achieves state-of-the-art performance on two ZSLP datasets.

* under review
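One plausible way to picture the hierarchical n-gram graph described in the abstract is the Python sketch below. It is an illustration under stated assumptions, not the paper's construction: level n holds the character n-gram starting at each position, each n-gram is linked to the two overlapping (n-1)-grams it is composed of, and the top level is the whole surface name. The function name `build_ngram_graph` is hypothetical, and the GramTransformer that encodes the graph is not shown.

```python
from typing import Dict, List, Optional, Tuple


def build_ngram_graph(name: str, max_n: Optional[int] = None):
    """Hypothetical hierarchical character n-gram graph for a relation surface name.

    levels[n] lists the n-grams ordered by start position; edges maps the node
    (n, i) to the two overlapping (n-1)-grams it is composed of.
    """
    if max_n is None:
        max_n = len(name)  # the top level is then the full surface name
    levels: Dict[int, List[str]] = {
        n: [name[i:i + n] for i in range(len(name) - n + 1)]
        for n in range(1, max_n + 1)
    }
    edges: Dict[Tuple[int, int], List[Tuple[int, int]]] = {
        (n, i): [(n - 1, i), (n - 1, i + 1)]
        for n in range(2, max_n + 1)
        for i in range(len(levels[n]))
    }
    return levels, edges


levels, edges = build_ngram_graph("born_in")
print(levels[3])       # ['bor', 'orn', 'rn_', 'n_i', '_in']
print(levels[6])       # ['born_i', 'orn_in']
print(edges[(7, 0)])   # [(6, 0), (6, 1)]
```

Under these assumptions the graph has one level per n-gram length and a single root, so every character reaches the final node through a chain of increasingly long n-grams; an encoder would then aggregate node representations along these edges. Because the nodes are character n-grams, an unseen word still decomposes into known pieces, which is the OOV argument the abstract makes.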

Dynamically Refined Regularization for Improving Cross-corpora Hate Speech Detection

Mar 23, 2022

Tulika Bose, Nikolaos Aletras, Irina Illina, Dominique Fohr

Abstract: Hate speech classifiers exhibit substantial performance degradation when evaluated on datasets different from the source. This is because they learn spurious correlations between hate speech labels in the training corpus and words that are not necessarily relevant to hateful language. Previous work has attempted to mitigate this problem by regularizing specific terms from pre-defined static dictionaries. While this has been shown to improve the generalizability of classifiers, the coverage of such methods is limited, and the dictionaries require regular manual updates from human experts. In this paper, we propose to automatically identify and reduce spurious correlations using attribution methods, with dynamic refinement of the list of terms that need to be regularized during training. Our approach is flexible and improves cross-corpora performance over previous work, both on its own and in combination with pre-defined dictionaries.

* Findings of ACL 2022 preprint
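The dynamic refinement step described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: attribution scores are taken as given (in practice they would come from an attribution method applied to the classifier), the term list is refreshed by keeping the `k` tokens with the highest mean attribution toward the hate label, and the penalty is the squared attribution of listed tokens. The helpers `refine_terms` and `attribution_penalty` and the placeholder tokens are hypothetical, not the authors' code.

```python
from collections import defaultdict


def refine_terms(attributions, k, static_terms=()):
    """Hypothetical refinement step: keep the k tokens with the highest mean
    attribution toward the hate label, plus any static dictionary terms."""
    totals, counts = defaultdict(float), defaultdict(int)
    for token, score in attributions:
        totals[token] += score
        counts[token] += 1
    mean = {tok: totals[tok] / counts[tok] for tok in totals}
    # highest mean attribution first; ties broken alphabetically for determinism
    top_k = sorted(mean, key=lambda tok: (-mean[tok], tok))[:k]
    return set(top_k) | set(static_terms)


def attribution_penalty(token_attributions, terms, strength=1.0):
    """Hypothetical penalty: squared attribution of tokens on the current list."""
    return strength * sum(score ** 2 for tok, score in token_attributions if tok in terms)


# toy attribution scores from one batch: (token, attribution toward the hate label)
batch = [("group_a", 0.9), ("group_a", 0.7), ("city", 0.6), ("and", 0.1), ("the", 0.05)]
terms = refine_terms(batch, k=2, static_terms={"group_b"})
print(sorted(terms))  # ['city', 'group_a', 'group_b']
print(attribution_penalty([("group_a", 0.5), ("the", 0.1), ("city", 0.2)], terms))  # about 0.29
```

In an actual training loop the penalty would have to be computed from differentiable attributions and added to the classification loss; this sketch works on plain floats only to show how a static dictionary and a periodically refreshed term list can be combined.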

How does the pre-training objective affect what large language models learn about linguistic properties?

Mar 20, 2022

Ahmed Alajrami, Nikolaos Aletras

Abstract: Several pre-training objectives, such as masked language modeling (MLM), have been proposed to pre-train language models (e.g., BERT) with the aim of learning better language representations. However, to the best of our knowledge, no previous work has investigated how different pre-training objectives affect what BERT learns about linguistic properties. We hypothesize that linguistically motivated objectives such as MLM should help BERT acquire better linguistic knowledge than non-linguistically motivated objectives, i.e., objectives for which the association between the input and the label to be predicted is not intuitive or is hard for humans to guess. To this end, we pre-train BERT with two linguistically motivated objectives and three non-linguistically motivated ones. We then probe for linguistic characteristics encoded in the representations of the resulting models. We find strong evidence that there are only small differences in probing performance between the representations learned by the two types of objectives. These surprising results question the dominant narrative of linguistically informed pre-training.

* Accepted at ACL 2022

Automatic Identification and Classification of Bragging in Social Media

Mar 11, 2022

Mali Jin, Daniel Preoţiuc-Pietro, A. Seza Doğruöz, Nikolaos Aletras

Abstract: Bragging is a speech act employed with the goal of constructing a favorable self-image through positive statements about oneself. It is widespread in daily communication and especially popular in social media, where users aim to build a positive image of their persona directly or indirectly. In this paper, we present the first large-scale study of bragging in computational linguistics, building on previous research in linguistics and pragmatics. To facilitate this, we introduce a new publicly available dataset of tweets annotated for bragging and its types. We empirically evaluate different transformer-based models injected with linguistic information in (a) binary bragging classification, i.e., whether tweets contain bragging statements or not; and (b) multi-class bragging type prediction, including not bragging. Our results show that our models can predict bragging with macro F1 of up to 72.42 and 35.95 in the binary and multi-class classification tasks, respectively. Finally, we present an extensive linguistic and error analysis of bragging prediction to guide future research on this topic.

* Accepted at ACL 2022
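The abstract reports macro F1 for both tasks. As a reminder of what that metric averages, here is a small Python sketch: F1 is computed per class and the per-class scores are averaged without weighting by class frequency, so rare bragging types count as much as frequent ones. The function `macro_f1` and the toy labels are illustrative; scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same quantity, and scores reported as 72.42 correspond to this value multiplied by 100.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # undefined F1 (no predictions and no gold instances match) counts as 0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


y_true = ["brag", "not", "not", "not"]
y_pred = ["brag", "brag", "not", "not"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7333
```

Here the "brag" class has F1 = 2/3 and "not" has F1 = 4/5; their plain average is 11/15, even though "not" is three times as frequent.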

An Empirical Study on Explanations in Out-of-Domain Settings

Feb 28, 2022

George Chrysostomou, Nikolaos Aletras

Abstract: Recent work in Natural Language Processing has focused on developing approaches that extract faithful explanations, either by identifying the most important tokens in the input (i.e., post-hoc explanations) or by designing inherently faithful models that first select the most important tokens and then use them to predict the correct label (i.e., select-then-predict models). Currently, these approaches are largely evaluated in in-domain settings. Yet, little is known about how post-hoc explanations and inherently faithful models perform in out-of-domain settings. In this paper, we conduct an extensive empirical study that examines: (1) the out-of-domain faithfulness of post-hoc explanations generated by five feature attribution methods; and (2) the out-of-domain performance of two inherently faithful models over six datasets. Contrary to our expectations, results show that in many cases out-of-domain post-hoc explanation faithfulness, measured by sufficiency and comprehensiveness, is higher than in-domain. We find this misleading and suggest using a random baseline as a yardstick for evaluating post-hoc explanation faithfulness. Our findings also show that select-then-predict models achieve predictive performance in out-of-domain settings comparable to that of full-text trained models.

* ACL 2022 preprint
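The two faithfulness metrics named in the abstract can be sketched as follows, using the ERASER-style definitions: comprehensiveness is the drop in the predicted-class probability when the rationale tokens are removed (higher means more faithful), and sufficiency is the drop when only the rationale is kept (lower means more faithful). The paper may use normalized variants of these scores. The bag-of-words model `toy_prob`, its weights, and the `random_rationale` helper, which implements the random baseline the abstract recommends by picking the same number of tokens at random, are hypothetical.

```python
import math
import random
from functools import partial


def toy_prob(tokens, weights, bias=0.0):
    """Positive-class probability under a hypothetical bag-of-words logistic model."""
    z = bias + sum(weights.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-z))


def comprehensiveness(prob_fn, tokens, rationale_idx):
    """p(y | full input) - p(y | input with the rationale removed)."""
    reduced = [t for i, t in enumerate(tokens) if i not in rationale_idx]
    return prob_fn(tokens) - prob_fn(reduced)


def sufficiency(prob_fn, tokens, rationale_idx):
    """p(y | full input) - p(y | rationale only)."""
    kept = [t for i, t in enumerate(tokens) if i in rationale_idx]
    return prob_fn(tokens) - prob_fn(kept)


def random_rationale(tokens, k, seed=0):
    """Random baseline: k token positions chosen uniformly at random."""
    return set(random.Random(seed).sample(range(len(tokens)), k))


prob = partial(toy_prob, weights={"great": 2.0, "boring": -2.0})
tokens = ["a", "great", "film"]
print(comprehensiveness(prob, tokens, {1}))  # about 0.381
print(sufficiency(prob, tokens, {1}))        # 0.0

# average comprehensiveness of random rationales of the same size, as a yardstick
baseline = sum(comprehensiveness(prob, tokens, random_rationale(tokens, 1, seed=s))
               for s in range(100)) / 100
print(baseline)
```

Comparing an explanation's score against the random baseline, rather than reading the raw number, is what guards against the misleading in-domain versus out-of-domain comparison the abstract describes.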

LexGLUE: A Benchmark Dataset for Legal Language Understanding in English

Oct 13, 2021

Ilias Chalkidis, Abhik Jana, Dirk Hartung, Michael Bommarito, Ion Androutsopoulos, Daniel Martin Katz, Nikolaos Aletras

Abstract: Law, interpretations of law, legal arguments, agreements, etc., are typically expressed in writing, leading to the production of vast corpora of legal text. Their analysis, which is at the center of legal practice, becomes increasingly elaborate as these collections grow in size. Natural language understanding (NLU) technologies can be a valuable tool to support legal practitioners in these endeavors. Their usefulness, however, largely depends on whether current state-of-the-art models can generalize across various tasks in the legal domain. To answer this currently open question, we introduce the Legal General Language Understanding Evaluation (LexGLUE) benchmark, a collection of datasets for evaluating model performance across a diverse set of legal NLU tasks in a standardized way. We also provide an evaluation and analysis of several generic and legal-oriented models demonstrating that the latter consistently offer performance improvements across multiple tasks.

* 9 pages, LexGLUE benchmark is available at: https://huggingface.co/datasets/lex_glue. Code is available at: https://github.com/coastalcph/lex-glue

Active Learning by Acquiring Contrastive Examples

Sep 08, 2021

Katerina Margatina, Giorgos Vernikos, Loïc Barrault, Nikolaos Aletras

Abstract: Common acquisition functions for active learning use either uncertainty or diversity sampling, aiming to select difficult and diverse data points from the pool of unlabeled data, respectively. In this work, leveraging the best of both worlds, we propose an acquisition function that opts for selecting contrastive examples, i.e., data points that are similar in the model feature space and yet for which the model outputs maximally different predictive likelihoods. We compare our approach, CAL (Contrastive Active Learning), with a diverse set of acquisition functions in four natural language understanding tasks and seven datasets. Our experiments show that CAL consistently performs better than or on par with the best-performing baseline across all tasks, on both in-domain and out-of-domain data. We also conduct an extensive ablation study of our method, and we further analyze all actively acquired datasets, showing that CAL achieves a better trade-off between uncertainty and diversity compared to other strategies.

* Accepted at EMNLP 2021
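A minimal NumPy sketch of a contrastive acquisition score in the spirit of CAL is shown below. It assumes that each unlabeled candidate is compared with its `k` nearest labeled neighbours in the model's feature space, that its score is the mean KL divergence between the neighbours' predicted class distributions and the candidate's, and that the highest-scoring candidates are acquired. Feature vectors and probabilities are toy values standing in for encoder outputs, and the function names are hypothetical.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) along the last axis; inputs are (broadcastable) probability arrays."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)


def contrastive_scores(pool_feats, pool_probs, lab_feats, lab_probs, k=1):
    """Mean KL(p(neighbour) || p(candidate)) over the k nearest labeled neighbours."""
    # pairwise Euclidean distances between pool and labeled examples: (P, L)
    dists = np.linalg.norm(pool_feats[:, None, :] - lab_feats[None, :, :], axis=-1)
    nn_idx = np.argsort(dists, axis=1)[:, :k]                  # (P, k)
    neigh_probs = lab_probs[nn_idx]                            # (P, k, C)
    kl = kl_divergence(neigh_probs, pool_probs[:, None, :])    # (P, k)
    return kl.mean(axis=1)                                     # (P,)


def acquire(scores, budget):
    """Indices of the `budget` highest-scoring candidates."""
    return np.argsort(-scores)[:budget]


lab_feats = np.array([[0.0, 0.0], [10.0, 10.0]])
lab_probs = np.array([[0.9, 0.1], [0.1, 0.9]])
pool_feats = np.array([[0.1, 0.0], [0.0, 0.1], [10.0, 9.9]])
pool_probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]])

scores = contrastive_scores(pool_feats, pool_probs, lab_feats, lab_probs, k=1)
print(acquire(scores, budget=1))  # [1]
```

Candidate 1 sits next to a confidently labeled neighbour yet receives a very different prediction, so it scores highest, while candidates whose predictions agree with their neighbours score zero. Exact all-pairs distances are fine for a sketch; a large unlabeled pool would call for an approximate nearest-neighbour index.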

Frustratingly Simple Pretraining Alternatives to Masked Language Modeling

Sep 04, 2021

Atsuki Yamaguchi, George Chrysostomou, Katerina Margatina, Nikolaos Aletras

Abstract: Masked language modeling (MLM), a self-supervised pretraining objective, is widely used in natural language processing for learning text representations. MLM trains a model to predict a random sample of input tokens that have been replaced by a [MASK] placeholder in a multi-class setting over the entire vocabulary. When pretraining, it is common to use other auxiliary objectives on the token or sequence level alongside MLM to improve downstream performance (e.g., next sentence prediction). However, no previous work so far has examined whether other simpler objectives, linguistically intuitive or not, can be used standalone as main pretraining objectives. In this paper, we explore five simple pretraining objectives based on token-level classification tasks as replacements for MLM. Empirical results on GLUE and SQuAD show that our proposed methods achieve comparable or better performance than MLM using a BERT-BASE architecture. We further validate our methods using smaller models, showing that pretraining BERT-MEDIUM, a model with 41% of BERT-BASE's parameters, with our best objective results in only a 1% drop in GLUE scores.

* Accepted at EMNLP 2021
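The MLM input corruption described in the abstract, together with one token-level classification alternative, can be sketched in Python. The masking recipe (select 15% of tokens; replace 80% of them with [MASK], 10% with a random token, keep 10%) is the standard BERT setup and is assumed here rather than taken from this paper. `shuffle_detection_example` illustrates a shuffled-token detection objective, a per-token binary classification in the spirit of the token-level objectives the abstract mentions; treat it as an illustration, not as the authors' exact definition.

```python
import random

MASK = "[MASK]"


def mlm_example(tokens, vocab, rng, mask_prob=0.15):
    """BERT-style corruption; targets hold the original token or None (no loss)."""
    inputs, targets = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok  # predicted with a softmax over the whole vocabulary
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK
            elif r < 0.9:
                inputs[i] = rng.choice(vocab)
            # otherwise the original token is kept unchanged
    return inputs, targets


def shuffle_detection_example(tokens, rng, shuffle_prob=0.15):
    """Permute a random subset of positions; label 1 where the token changed."""
    n = len(tokens)
    k = min(n, max(2, round(shuffle_prob * n)))  # at least two positions to permute
    positions = rng.sample(range(n), k)
    permuted = positions[:]
    rng.shuffle(permuted)
    inputs = list(tokens)
    for src, dst in zip(positions, permuted):
        inputs[dst] = tokens[src]
    # a position swapped with an identical token is labeled 0 (nothing changed)
    labels = [int(a != b) for a, b in zip(inputs, tokens)]
    return inputs, labels


rng = random.Random(0)
tokens = "the cat sat on the mat".split()
print(mlm_example(tokens, vocab=["dog", "ran", "big"], rng=rng))
print(shuffle_detection_example(tokens, rng=rng))
```

The contrast lies in the output layer: MLM needs a classifier over the entire vocabulary at each selected position, whereas a detection objective only needs a binary label for every token, which is a much smaller prediction problem.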

An Empirical Study on Leveraging Position Embeddings for Target-oriented Opinion Words Extraction

Sep 02, 2021

Samuel Mensah, Kai Sun, Nikolaos Aletras

Abstract: Target-oriented opinion words extraction (TOWE) (Fan et al., 2019b) is a new subtask of target-oriented sentiment analysis that aims to extract opinion words for a given aspect in text. Current state-of-the-art methods leverage position embeddings to capture the relative position of a word to the target. However, the performance of these methods depends on their ability to incorporate this information into word representations. In this paper, we explore a variety of text encoders based on pretrained word embeddings or language models that leverage part-of-speech and position embeddings, aiming to examine the actual contribution of each component in TOWE. We also adapt a graph convolutional network (GCN) to enhance word representations by incorporating syntactic information. Our experimental results demonstrate that BiLSTM-based models can effectively encode position information into word representations, while using a GCN achieves only marginal gains. Interestingly, our simple methods outperform several state-of-the-art complex neural structures.

* Accepted at EMNLP 2021
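The relative position information referred to in the abstract can be illustrated with a short NumPy sketch. It assumes a signed distance from each word to the target span (0 inside the target), clipped to a maximum and used to index a position-embedding table whose vectors are concatenated with word embeddings before the encoder (e.g., a BiLSTM). The clipping range, random tables, and function names are hypothetical; the paper's exact encoding may differ.

```python
import numpy as np


def relative_positions(n_tokens, target_start, target_end, max_dist=10):
    """Signed distance of each token to the target span [target_start, target_end),
    0 inside the span, clipped to [-max_dist, max_dist]."""
    positions = []
    for i in range(n_tokens):
        if i < target_start:
            d = i - target_start        # negative: left of the target
        elif i >= target_end:
            d = i - (target_end - 1)    # positive: right of the target
        else:
            d = 0                       # inside the target span
        positions.append(max(-max_dist, min(max_dist, d)))
    return positions


def add_position_features(word_vecs, positions, table, max_dist=10):
    """Look up position embeddings (row 0 = distance -max_dist) and concatenate."""
    pos_vecs = table[np.asarray(positions) + max_dist]
    return np.concatenate([word_vecs, pos_vecs], axis=-1)


tokens = "the battery life is great but the screen is dim".split()
pos = relative_positions(len(tokens), target_start=1, target_end=3)  # target: "battery life"
print(pos)  # [-1, 0, 0, 1, 2, 3, 4, 5, 6, 7]

rng = np.random.default_rng(0)
word_vecs = rng.normal(size=(len(tokens), 50))  # stand-in for pretrained word embeddings
pos_table = rng.normal(size=(2 * 10 + 1, 8))    # one row per clipped distance
features = add_position_features(word_vecs, pos, pos_table)
print(features.shape)  # (10, 58)
```

With the target "battery life", the opinion word "great" gets distance 2 while the second clause's "dim" gets 7, which is the kind of signal the encoder can use to prefer opinion words near the given aspect.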
