George Chrysostomou

On the Impact of Temporal Concept Drift on Model Explanations

Oct 17, 2022
Zhixue Zhao, George Chrysostomou, Kalina Bontcheva, Nikolaos Aletras

Explanation faithfulness of model predictions in natural language processing is typically evaluated on held-out data from the same temporal distribution as the training data (i.e. synchronous settings). While model performance often deteriorates due to temporal variation (i.e. temporal concept drift), it is currently unknown how explanation faithfulness is impacted when the time span of the target data is different from the data used to train the model (i.e. asynchronous settings). For this purpose, we examine the impact of temporal variation on model explanations extracted by eight feature attribution methods and three select-then-predict models across six text classification tasks. Our experiments show that (i) faithfulness is not consistent under temporal variations across feature attribution methods (e.g. it decreases or increases depending on the method), with an attention-based method demonstrating the most robust faithfulness scores across datasets; and (ii) select-then-predict models are mostly robust in asynchronous settings, with only small degradation in predictive performance. Finally, feature attribution methods show conflicting behavior when used in FRESH (i.e. a select-then-predict model) and for measuring sufficiency/comprehensiveness (i.e. as post-hoc methods), suggesting that we need more robust metrics to evaluate post-hoc explanation faithfulness.
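
For reference, the sufficiency and comprehensiveness metrics mentioned above are typically computed roughly as follows. This is a minimal sketch, assuming a predict_proba callable that maps a token list to class probabilities; it is illustrative and not the paper's exact implementation:

    def sufficiency(predict_proba, tokens, rationale_mask):
        # Sufficiency: drop in predicted-class probability when the model
        # sees only the rationale tokens (lower is better).
        full = predict_proba(tokens)
        pred_class = max(range(len(full)), key=full.__getitem__)
        rationale_only = [t for t, keep in zip(tokens, rationale_mask) if keep]
        return full[pred_class] - predict_proba(rationale_only)[pred_class]

    def comprehensiveness(predict_proba, tokens, rationale_mask):
        # Comprehensiveness: drop in predicted-class probability when the
        # rationale tokens are erased from the input (higher is better).
        full = predict_proba(tokens)
        pred_class = max(range(len(full)), key=full.__getitem__)
        remainder = [t for t, keep in zip(tokens, rationale_mask) if not keep]
        return full[pred_class] - predict_proba(remainder)[pred_class]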

* Accepted at EMNLP Findings 2022 

An Empirical Study on Explanations in Out-of-Domain Settings

Feb 28, 2022
George Chrysostomou, Nikolaos Aletras

Recent work in Natural Language Processing has focused on developing approaches that extract faithful explanations, either via identifying the most important tokens in the input (i.e. post-hoc explanations) or by designing inherently faithful models that first select the most important tokens and then use them to predict the correct label (i.e. select-then-predict models). Currently, these approaches are largely evaluated on in-domain settings. Yet, little is known about how post-hoc explanations and inherently faithful models perform in out-of-domain settings. In this paper, we conduct an extensive empirical study that examines: (1) the out-of-domain faithfulness of post-hoc explanations, generated by five feature attribution methods; and (2) the out-of-domain performance of two inherently faithful models over six datasets. Contrary to our expectations, results show that in many cases out-of-domain post-hoc explanation faithfulness measured by sufficiency and comprehensiveness is higher than in-domain. We find this misleading and suggest using a random baseline as a yardstick for evaluating post-hoc explanation faithfulness. Our findings also show that select-then-predict models demonstrate comparable predictive performance in out-of-domain settings to full-text trained models.
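
The random-baseline yardstick suggested above can be read as a simple comparison: an attribution method's faithfulness score is only meaningful if it clearly beats the score obtained when token importance is assigned at random. A minimal sketch, where the helper names are ours rather than the paper's, and score_fn is any faithfulness score with signature score_fn(predict_proba, tokens, mask):

    import random

    def random_rationale(num_tokens, rationale_length):
        # Randomly pick which tokens to treat as "important".
        keep = set(random.sample(range(num_tokens), rationale_length))
        return [i in keep for i in range(num_tokens)]

    def compare_to_random(score_fn, predict_proba, tokens, method_mask, trials=20):
        # Contrast the attribution method's score with the mean score of
        # equally long, randomly chosen rationales.
        method_score = score_fn(predict_proba, tokens, method_mask)
        k = sum(method_mask)
        random_scores = [
            score_fn(predict_proba, tokens, random_rationale(len(tokens), k))
            for _ in range(trials)
        ]
        return method_score, sum(random_scores) / trials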

* ACL 2022 Pre-print

Frustratingly Simple Pretraining Alternatives to Masked Language Modeling

Sep 04, 2021
Atsuki Yamaguchi, George Chrysostomou, Katerina Margatina, Nikolaos Aletras

Masked language modeling (MLM), a self-supervised pretraining objective, is widely used in natural language processing for learning text representations. MLM trains a model to predict a random sample of input tokens that have been replaced by a [MASK] placeholder in a multi-class setting over the entire vocabulary. When pretraining, it is common to use other auxiliary objectives at the token or sequence level alongside MLM to improve downstream performance (e.g. next sentence prediction). However, no previous work has examined whether simpler objectives, linguistically intuitive or not, can be used standalone as the main pretraining objective. In this paper, we explore five simple pretraining objectives based on token-level classification tasks as replacements for MLM. Empirical results on GLUE and SQuAD show that our proposed methods achieve performance comparable to or better than MLM using a BERT-BASE architecture. We further validate our methods with smaller models, showing that pretraining BERT-MEDIUM, a model with 41% of BERT-BASE's parameters, results in only a 1% drop in GLUE scores with our best objective.
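
As a concrete illustration of what a token-level classification objective can look like in place of MLM, here is a minimal sketch of shuffled-token detection, in the spirit of (but not identical to) the objectives studied in the paper; the corruption scheme, head, and shapes below are illustrative assumptions:

    import random
    import torch
    import torch.nn as nn

    def shuffle_tokens(input_ids, shuffle_prob=0.15):
        # Move a random subset of tokens to each other's positions and label
        # every position whose token changed (1) or stayed the same (0).
        ids = input_ids.clone()
        positions = [i for i in range(len(ids)) if random.random() < shuffle_prob]
        targets = positions[:]
        random.shuffle(targets)
        for src, dst in zip(positions, targets):
            ids[dst] = input_ids[src]
        labels = (ids != input_ids).long()
        return ids, labels

    # Toy usage: a token-level binary classification head replaces the MLM head.
    seq_len, hidden_size = 16, 768
    input_ids = torch.randint(1000, 30000, (seq_len,))
    corrupted_ids, labels = shuffle_tokens(input_ids)

    # Hidden states would come from a BERT-style encoder over corrupted_ids;
    # a random tensor stands in here to keep the sketch self-contained.
    hidden_states = torch.randn(1, seq_len, hidden_size)
    head = nn.Linear(hidden_size, 2)
    logits = head(hidden_states)                                  # (1, seq_len, 2)
    loss = nn.CrossEntropyLoss()(logits.view(-1, 2), labels.view(-1))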

* Accepted at EMNLP 2021 

Enjoy the Salience: Towards Better Transformer-based Faithful Explanations with Word Salience

Aug 31, 2021
George Chrysostomou, Nikolaos Aletras

Pretrained transformer-based models such as BERT have demonstrated state-of-the-art predictive performance when adapted to a range of natural language processing tasks. An open problem is how to improve the faithfulness of explanations (rationales) for the predictions of these models. In this paper, we hypothesize that salient information extracted a priori from the training data can complement the task-specific information learned by the model during fine-tuning on a downstream task. In this way, we aim to help BERT not forget to assign importance to informative input tokens when making predictions by proposing SaLoss, an auxiliary loss function that guides the multi-head attention mechanism during training to stay close to salient information extracted a priori using TextRank. Experiments on explanation faithfulness across five datasets show that models trained with SaLoss consistently provide more faithful explanations across four different feature attribution methods compared to vanilla BERT. Using the rationales extracted from vanilla BERT and SaLoss models to train inherently faithful classifiers, we further show that the latter result in higher predictive performance in downstream tasks.
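
A rough sketch of the idea behind SaLoss as described above: salience scores computed a priori (TextRank in the paper) are normalised into a distribution, and an auxiliary term penalises the distance between that distribution and the model's attention over the input. The KL divergence and the weighting factor below are illustrative choices, not necessarily the paper's exact formulation:

    import torch
    import torch.nn.functional as F

    def salience_loss(attention, salience_scores, eps=1e-12):
        # Auxiliary loss pushing an attention distribution towards a priori
        # salience scores, normalised into a target distribution.
        target = salience_scores / (salience_scores.sum() + eps)
        return F.kl_div((attention + eps).log(), target, reduction="sum")

    # Toy usage: total loss = task loss + lambda * salience guidance.
    attention = torch.softmax(torch.randn(16), dim=-1)   # attention over 16 tokens
    salience = torch.rand(16)                            # TextRank scores in practice
    task_loss = torch.tensor(0.7)                        # stand-in for the classification loss
    total_loss = task_loss + 0.5 * salience_loss(attention, salience)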

* EMNLP 2021 Pre-print 

Improving the Faithfulness of Attention-based Explanations with Task-specific Information for Text Classification

May 07, 2021
George Chrysostomou, Nikolaos Aletras

Neural network architectures in natural language processing often use attention mechanisms to produce probability distributions over input token representations. Attention has empirically been demonstrated to improve performance in various tasks, while its weights have been extensively used as explanations for model predictions. Recent studies (Jain and Wallace, 2019; Serrano and Smith, 2019; Wiegreffe and Pinter, 2019) have shown that attention cannot generally be considered a faithful explanation (Jacovi and Goldberg, 2020) across encoders and tasks. In this paper, we seek to improve the faithfulness of attention-based explanations for text classification. We achieve this by proposing a new family of Task-Scaling (TaSc) mechanisms that learn task-specific non-contextualised information to scale the original attention weights. Evaluation tests for explanation faithfulness show that the three proposed variants of TaSc improve attention-based explanations across two attention mechanisms, five encoders and five text classification datasets without sacrificing predictive performance. Finally, we demonstrate that TaSc consistently provides more faithful attention-based explanations compared to three widely-used interpretability techniques.
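
A minimal sketch of the scaling idea described above: a non-contextualised, task-specific importance score is learned per vocabulary item and used to rescale the original attention weights. The embedding lookup, the sigmoid, and the renormalisation below are illustrative simplifications, not the exact TaSc variants from the paper:

    import torch
    import torch.nn as nn

    class TaskScaledAttention(nn.Module):
        # Rescale attention weights with learned, non-contextualised,
        # task-specific importance scores (one per vocabulary item).
        def __init__(self, vocab_size):
            super().__init__()
            self.token_importance = nn.Embedding(vocab_size, 1)

        def forward(self, attention, input_ids):
            # attention: (batch, seq_len) weights from the base attention mechanism
            # input_ids: (batch, seq_len) token ids of the input
            scale = torch.sigmoid(self.token_importance(input_ids)).squeeze(-1)
            scaled = attention * scale
            return scaled / scaled.sum(dim=-1, keepdim=True)  # renormalise

    # Toy usage with random attention weights and token ids.
    tasc = TaskScaledAttention(vocab_size=30522)
    att = torch.softmax(torch.randn(2, 8), dim=-1)
    ids = torch.randint(0, 30522, (2, 8))
    scaled_att = tasc(att, ids)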

* NLP Interpretability; Accepted at ACL 2021

Variable Instance-Level Explainability for Text Classification

Apr 16, 2021
George Chrysostomou, Nikolaos Aletras

Despite the high accuracy of pretrained transformer networks in text classification, a persisting issue is their significant complexity, which makes them hard to interpret. Recent research has focused on developing feature scoring methods for identifying which parts of the input are most important for the model to make a particular prediction, and using them as an explanation (i.e. rationale). A limitation of these approaches is that they assume a particular feature scoring method should be used across all instances in a dataset, with a predefined fixed rationale length, which might not be optimal across all instances. To address this, we propose a method for extracting variable-length explanations using a set of different feature scoring methods at the instance level. Our method is inspired by word erasure approaches, which assume that the most faithful rationale for a prediction is the one with the highest divergence between the model's output distribution on the full text and on the text after removing the rationale for a particular instance. Evaluation on four standard text classification datasets shows that our method consistently provides more faithful explanations compared to previous fixed-length and fixed-feature scoring methods for rationale extraction.
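
A minimal sketch of the per-instance selection criterion described above: rationales produced by different feature scoring methods (and lengths) are compared by how much the model's output distribution shifts once each rationale is erased, and the one causing the largest shift is kept. The predict_proba callable and candidate masks are assumptions for illustration, and the Jensen-Shannon divergence is one possible choice of divergence:

    import math

    def kl(p, q, eps=1e-12):
        return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

    def js_divergence(p, q):
        m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)

    def select_rationale(predict_proba, tokens, candidate_masks):
        # Keep the candidate rationale whose erasure shifts the model's
        # output distribution the most for this particular instance.
        full = predict_proba(tokens)
        best_mask, best_div = None, -1.0
        for mask in candidate_masks:
            remaining = [t for t, keep in zip(tokens, mask) if not keep]
            div = js_divergence(full, predict_proba(remaining))
            if div > best_div:
                best_mask, best_div = mask, div
        return best_mask, best_div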

* NLP Interpretability 