
Sebastian Möller

InterroLang: Exploring NLP Models and Datasets through Dialogue-based Explanations

Oct 23, 2023
Nils Feldhus, Qianli Wang, Tatiana Anikina, Sahil Chopra, Cennet Oguz, Sebastian Möller

While recently developed NLP explainability methods let us open the black box in various ways (Madsen et al., 2022), a missing ingredient in this endeavor is an interactive tool offering a conversational interface. Such a dialogue system can help users explore datasets and models with explanations in a contextualized manner, e.g. via clarification or follow-up questions, and through a natural language interface. We adapt the conversational explanation framework TalkToModel (Slack et al., 2022) to the NLP domain, add new NLP-specific operations such as free-text rationalization, and illustrate its generalizability on three NLP tasks (dialogue act classification, question answering, hate speech detection). To recognize user queries for explanations, we evaluate fine-tuned and few-shot prompting models and implement a novel Adapter-based approach. We then conduct two user studies on (1) the perceived correctness and helpfulness of the dialogues, and (2) the simulatability, i.e. how objectively helpful dialogical explanations are for humans in figuring out the model's predicted label when it is not shown. We found that rationalization and feature attribution were helpful in explaining model behavior. Moreover, users could more reliably predict the model outcome based on an explanation dialogue rather than on one-off explanations.
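
The query-understanding step can be pictured with a much simpler stand-in than the fine-tuned, few-shot, and Adapter-based parsers the paper evaluates. The sketch below only illustrates the idea of mapping a free-form user question to one of the system's explanation operations; the operation inventory is a hypothetical example and the zero-shot classifier is an assumption, not the authors' implementation.

```python
# Minimal sketch: map a user query to an explanation operation using an
# off-the-shelf zero-shot classifier (a stand-in for the parsers in the paper).
from transformers import pipeline

# Hypothetical operation inventory; the actual InterroLang operations differ.
OPERATIONS = [
    "feature attribution",
    "free-text rationalization",
    "counterfactual generation",
    "similar examples",
    "prediction probability",
]

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def recognize_operation(user_query: str) -> str:
    """Return the most likely explanation operation for a user query."""
    result = classifier(user_query, candidate_labels=OPERATIONS)
    return result["labels"][0]  # labels are sorted by score, highest first

print(recognize_operation("Which words made the model label this as hate speech?"))
```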

* EMNLP 2023 Findings. Camera-ready version 

MultiTACRED: A Multilingual Version of the TAC Relation Extraction Dataset

May 15, 2023
Leonhard Hennig, Philippe Thomas, Sebastian Möller

Relation extraction (RE) is a fundamental task in information extraction, whose extension to multilingual settings has been hindered by the lack of supervised resources comparable in size to large English datasets such as TACRED (Zhang et al., 2017). To address this gap, we introduce the MultiTACRED dataset, covering 12 typologically diverse languages from 9 language families, which is created by machine-translating TACRED instances and automatically projecting their entity annotations. We analyze translation and annotation projection quality, identify error categories, and experimentally evaluate fine-tuned pretrained mono- and multilingual language models in common transfer learning scenarios. Our analyses show that machine translation is a viable strategy to transfer RE instances, with native speakers judging more than 83% of the translated instances to be linguistically and semantically acceptable. We find monolingual RE model performance to be comparable to the English original for many of the target languages, and that multilingual models trained on a combination of English and target language data can outperform their monolingual counterparts. However, we also observe a variety of translation and annotation projection errors, both due to the MT systems and linguistic features of the target languages, such as pronoun-dropping, compounding and inflection, that degrade dataset quality and RE model performance.
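
One way to picture the annotation projection step is the common marker-insertion trick: wrap the head and tail entities in special tokens before translation and read their translations back off the MT output. The sketch below is only an illustration under that assumption, with a placeholder translate_fn and hypothetical marker tokens; it does not reproduce the exact pipeline or MT system used to build MultiTACRED.

```python
import re
from typing import Callable, List, Tuple

def project_instance(
    tokens: List[str],
    head: Tuple[int, int],          # [start, end) token span of the head entity
    tail: Tuple[int, int],          # [start, end) token span of the tail entity
    translate_fn: Callable[[str], str],
) -> Tuple[str, str, str]:
    """Wrap entity spans in markers, translate, and recover the translated spans."""
    marked = list(tokens)
    # Insert markers from the right so earlier indices stay valid.
    for (s, e), (open_m, close_m) in sorted(
        [(head, ("<H>", "</H>")), (tail, ("<T>", "</T>"))], reverse=True
    ):
        marked[e:e] = [close_m]
        marked[s:s] = [open_m]
    translated = translate_fn(" ".join(marked))
    head_match = re.search(r"<H>\s*(.*?)\s*</H>", translated)
    tail_match = re.search(r"<T>\s*(.*?)\s*</T>", translated)
    if not head_match or not tail_match:
        raise ValueError("Marker lost during translation; instance would be discarded.")
    clean = re.sub(r"\s*</?[HT]>\s*", " ", translated).strip()
    return clean, head_match.group(1), tail_match.group(1)

# Toy usage with an identity "translation":
sentence = "Douglas Flint will become chairman of HSBC".split()
print(project_instance(sentence, (0, 2), (6, 7), translate_fn=lambda s: s))
```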

* Accepted at ACL 2023 

Constructing Natural Language Explanations via Saliency Map Verbalization

Oct 13, 2022
Nils Feldhus, Leonhard Hennig, Maximilian Dustin Nasert, Christopher Ebert, Robert Schwarzenberg, Sebastian Möller

Saliency maps can explain a neural model's prediction by identifying important input features. While they excel in being faithful to the explained model, saliency maps in their entirety are difficult for humans to interpret, especially for instances with many input features. In contrast, natural language explanations (NLEs) are flexible and can be tuned to a recipient's expectations, but are costly to generate: rationalization models are usually trained on specific tasks and require high-quality and diverse datasets of human annotations. We combine the advantages of both explainability methods by verbalizing saliency maps. We formalize this underexplored task and propose a novel methodology that addresses two key challenges of this approach -- what and how to verbalize. Our approach combines efficient search methods that are task- and model-agnostic and do not require another black-box model with hand-crafted templates that preserve faithfulness. We conduct a human evaluation of explanation representations across two natural language processing (NLP) tasks: news topic classification and sentiment analysis. Our results suggest that saliency map verbalization makes explanations more understandable and less cognitively challenging for humans than conventional heatmap visualization.
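
The core idea of template-based verbalization can be sketched in a few lines: rank tokens by attribution score and fill a hand-crafted template with the most salient ones. The threshold, top-k choice, and template wording below are illustrative assumptions, not the search procedure or templates from the paper.

```python
def verbalize_saliency(tokens, saliency, predicted_label, top_k=3):
    """Turn a token-level saliency map into a short natural language explanation."""
    ranked = sorted(zip(tokens, saliency), key=lambda x: x[1], reverse=True)
    top_tokens = [tok for tok, _ in ranked[:top_k]]
    if len(top_tokens) > 1:
        listed = ", ".join(f'"{t}"' for t in top_tokens[:-1]) + f' and "{top_tokens[-1]}"'
    else:
        listed = f'"{top_tokens[0]}"'
    return (
        f"The model predicted '{predicted_label}' mainly because of the words "
        f"{listed}, which received the highest attribution scores."
    )

# Toy example for sentiment analysis.
tokens = ["the", "plot", "was", "utterly", "boring", "and", "predictable"]
saliency = [0.01, 0.10, 0.02, 0.35, 0.40, 0.03, 0.22]
print(verbalize_saliency(tokens, saliency, "negative"))
```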

Cross-lingual Approaches for the Detection of Adverse Drug Reactions in German from a Patient's Perspective

Aug 03, 2022
Lisa Raithel, Philippe Thomas, Roland Roller, Oliver Sapina, Sebastian Möller, Pierre Zweigenbaum

In this work, we present the first corpus for German Adverse Drug Reaction (ADR) detection in patient-generated content. The data consists of 4,169 binary annotated documents from a German patient forum, where users talk about health issues and get advice from medical doctors. As is common for social media data in this domain, the class labels of the corpus are highly imbalanced. Together with a high topic imbalance, this makes it a very challenging dataset, since the same symptom can often have several causes and is not always related to medication intake. We aim to encourage further multilingual efforts in the domain of ADR detection and provide preliminary experiments for binary classification using different methods of zero- and few-shot learning based on a multilingual model. When fine-tuning XLM-RoBERTa first on English patient forum data and then on the new German data, we achieve an F1-score of 37.52 for the positive class. We make the dataset and models publicly available to the community.
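
The sequential transfer setup mentioned above (English first, then German) can be sketched with standard Hugging Face tooling. The corpora, hyperparameters, and output directories below are placeholders for illustration, not the paper's exact configuration.

```python
# Sketch: fine-tune XLM-RoBERTa on English patient-forum data, then continue
# training the same model on the German ADR corpus.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("xlm-roberta-base", num_labels=2)

def prepare(texts, labels):
    ds = Dataset.from_dict({"text": texts, "label": labels})
    return ds.map(lambda b: tokenizer(b["text"], truncation=True), batched=True)

# Placeholder corpora; in practice these would be the English forum data
# and the German ADR documents described above.
english_ds = prepare(["I got a rash after taking the pills."], [1])
german_ds = prepare(["Seit der Einnahme habe ich starke Kopfschmerzen."], [1])

def finetune(train_ds, output_dir):
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=3,
                             per_device_train_batch_size=16)
    Trainer(model=model, args=args, train_dataset=train_ds, tokenizer=tokenizer).train()

finetune(english_ds, "ckpt_en")   # step 1: English ADR data
finetune(german_ds, "ckpt_de")    # step 2: continue on German ADR data (same model)
```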

* Accepted at LREC 2022 

A Transfer Learning Based Model for Text Readability Assessment in German

Jul 13, 2022
Salar Mohtaj, Babak Naderi, Sebastian Möller, Faraz Maschhur, Chuyang Wu, Max Reinhard

Text readability assessment has a wide range of applications for different target groups, from language learners to people with disabilities. The fast pace of textual content production on the web makes it impossible to measure text complexity without machine learning and natural language processing techniques. Although various studies have addressed the readability assessment of English text in recent years, there is still room to improve models for other languages. In this paper, we propose a new model for text complexity assessment of German text based on transfer learning. Our results show that the model outperforms more classical solutions based on linguistic feature extraction from the input text. The best model, based on the pre-trained BERT language model, achieved a Root Mean Square Error (RMSE) of 0.483.
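
The regression setup implied above can be sketched briefly: a BERT encoder with a single-output head (num_labels=1 makes transformers use an MSE regression loss) and RMSE as the evaluation metric. The German checkpoint name is an assumption for illustration; the paper's exact checkpoint is not specified here.

```python
import numpy as np
from transformers import AutoModelForSequenceClassification

# Single-output head -> regression (MSE loss) in transformers.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-german-cased", num_labels=1
)

def rmse(predictions: np.ndarray, references: np.ndarray) -> float:
    """Root Mean Square Error between predicted and gold complexity scores."""
    return float(np.sqrt(np.mean((predictions - references) ** 2)))

print(rmse(np.array([2.1, 3.8, 1.0]), np.array([2.0, 4.0, 1.5])))  # toy example
```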

A Medical Information Extraction Workbench to Process German Clinical Text

Jul 08, 2022
Roland Roller, Laura Seiffe, Ammer Ayach, Sebastian Möller, Oliver Marten, Michael Mikhailov, Christoph Alt, Danilo Schmidt, Fabian Halleck, Marcel Naik, Wiebke Duettmann, Klemens Budde

Background: In the information extraction and natural language processing domain, accessible datasets are crucial to reproduce and compare results. Publicly available implementations and tools can serve as benchmarks and facilitate the development of more complex applications. However, in the context of clinical text processing, accessible datasets are scarce -- and so are existing tools. One of the main reasons is the sensitivity of the data. This problem is even more evident for non-English languages. Approach: To address this situation, we introduce a workbench: a collection of German clinical text processing models. The models are trained on a de-identified corpus of German nephrology reports. Result: The presented models provide promising results on in-domain data. Moreover, we show that our models can also be successfully applied to other biomedical texts in German. Our workbench is made publicly available so it can be used out of the box, as a benchmark, or transferred to related problems.

* Paper under review since 2021 

Mediators: Conversational Agents Explaining NLP Model Behavior

Jun 13, 2022
Nils Feldhus, Ajay Madhavan Ravichandran, Sebastian Möller

The human-centric explainable artificial intelligence (HCXAI) community has raised the need for framing the explanation process as a conversation between human and machine. In this position paper, we establish desiderata for Mediators, text-based conversational agents which are capable of explaining the behavior of neural models interactively using natural language. From the perspective of natural language processing (NLP) research, we engineer a blueprint of such a Mediator for the task of sentiment analysis and assess how far along current research is on the path towards dialogue-based explanations.

* Accepted to IJCAI-ECAI 2022 Workshop on Explainable Artificial Intelligence (XAI) 

When Performance is not Enough -- A Multidisciplinary View on Clinical Decision Support

Apr 27, 2022
Roland Roller, Klemens Budde, Aljoscha Burchardt, Peter Dabrock, Sebastian Möller, Bilgin Osmanodja, Simon Ronicke, David Samhammer, Sven Schmeier

Scientific publications about machine learning in healthcare are often about implementing novel methods and boosting performance -- at least from a computer science perspective. However, beyond such often short-lived improvements, much more needs to be taken into consideration if we want to arrive at sustainable progress in healthcare. What does it take to actually implement such a system, make it usable for the domain expert, and possibly bring it into practical use? Targeted at computer scientists, this work presents a multidisciplinary view on machine learning in medical decision support systems and covers information technology, medical, as well as ethical aspects. Along with an implemented risk prediction system in nephrology, challenges and lessons learned in a pilot project are presented.

* Paper currently under review 

On incorporating social speaker characteristics in synthetic speech

Apr 03, 2022
Sai Sirisha Rallabandi, Sebastian Möller

In our previous work, we derived the acoustic features that contribute to the perception of warmth and competence in synthetic speech. As an extension, in our current work, we investigate the impact of the derived vocal features on the generation of the desired characteristics. The acoustic features spectral flux, F1 mean, and F2 mean, as well as their convex combinations, were explored for the generation of higher warmth in female speech. The voiced slope, spectral flux, and their convex combinations were investigated for the generation of higher competence in female speech. We employ a feature quantization approach in a traditional end-to-end Tacotron-based speech synthesis model. Listening tests show that convex combinations of acoustic features yield higher Mean Opinion Scores for warmth and competence than individual features.
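
The two ingredients named above, a convex combination of acoustic features and feature quantization, can be sketched as follows. Weights, bin count, and normalization are illustrative assumptions; the integration into a Tacotron-style synthesis model is not shown.

```python
import numpy as np

def convex_combination(features: np.ndarray, weights) -> np.ndarray:
    """Combine per-utterance features (n_utts, n_feats) with convex weights."""
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights >= 0) and np.isclose(weights.sum(), 1.0)
    # z-normalize each feature before mixing so scales are comparable
    normed = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)
    return normed @ weights

def quantize(values: np.ndarray, n_bins: int = 10) -> np.ndarray:
    """Map continuous feature values to discrete bin indices."""
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(values, edges)

# Toy usage: spectral flux and voiced slope for four utterances, mixed 70/30.
feats = np.array([[0.12, -1.3], [0.30, -0.9], [0.25, -1.1], [0.08, -1.6]])
combined = convex_combination(feats, [0.7, 0.3])
print(quantize(combined, n_bins=4))
```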

* Submitted to Interspeech 2022 