Georg Groh

This is not correct! Negation-aware Evaluation of Language Generation Systems

Jul 26, 2023
Miriam Anschütz, Diego Miguel Lozano, Georg Groh

Large language models underestimate how strongly negations change the meaning of a sentence. As a consequence, learned evaluation metrics based on these models are insensitive to negations. In this paper, we propose NegBLEURT, a negation-aware version of the BLEURT evaluation metric. For that, we designed a rule-based sentence negation tool and used it to create the CANNOT negation evaluation dataset. Based on this dataset, we fine-tuned a sentence transformer and an evaluation metric to improve their negation sensitivity. Evaluating these models on existing benchmarks shows that our fine-tuned models far outperform existing metrics on negated sentences while preserving their base models' performance on other perturbations.

* Accepted to INLG 2023 
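As a rough illustration of the rule-based negation idea behind the CANNOT dataset (a toy approximation, not the authors' released tool), a minimal sketch assuming spaCy with the en_core_web_sm model could flip a sentence's polarity by inserting "not" after the main auxiliary:

```python
import spacy

# Toy rule-based negator: insert "not" after the first auxiliary, or prepend
# "does not" to the root verb if no auxiliary exists. Illustrative only; the
# paper's CANNOT pipeline uses more elaborate rules.
nlp = spacy.load("en_core_web_sm")

def negate(sentence: str) -> str:
    doc = nlp(sentence)
    tokens = [t.text for t in doc]
    aux = next((t for t in doc if t.pos_ == "AUX"), None)
    if aux is not None:
        tokens.insert(aux.i + 1, "not")
    else:
        root = next((t for t in doc if t.dep_ == "ROOT"), doc[0])
        tokens.insert(root.i, "does not")  # crude; ignores tense/agreement
    return " ".join(tokens)

print(negate("The metric is sensitive to negations."))
# -> "The metric is not sensitive to negations ."
```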
Viaarxiv icon

Language Models for German Text Simplification: Overcoming Parallel Data Scarcity through Style-specific Pre-training

May 22, 2023
Miriam Anschütz, Joshua Oehms, Thomas Wimmer, Bartłomiej Jezierski, Georg Groh

Automatic text simplification systems help to reduce textual information barriers on the internet. However, for languages other than English, little parallel data exists to train these systems. We propose a two-step approach to overcome this data scarcity issue. First, we fine-tuned language models on a corpus of German Easy Language, a specific style of German. Then, we used these models as decoders in a sequence-to-sequence simplification task. We show that the language models adapt to the style characteristics of Easy Language and output more accessible texts. Moreover, with the style-specific pre-training, we reduced the number of trainable parameters in text simplification models, so less parallel data suffices for training. Our results indicate that pre-training on unaligned data can reduce the required parallel data while improving the performance on downstream tasks.

* Accepted to ACL Findings 2023 
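A minimal sketch of the second step, assuming the Hugging Face transformers EncoderDecoderModel API; the checkpoints below are public stand-ins (the paper instead plugs in a German LM that was first fine-tuned on Easy Language text), and the freezing scheme is one plausible way to cut trainable parameters, not necessarily the authors' exact setup:

```python
from transformers import EncoderDecoderModel

ENCODER = "bert-base-german-cased"   # standard-German encoder
DECODER = "dbmdz/german-gpt2"        # stand-in for the Easy-Language-adapted decoder

model = EncoderDecoderModel.from_encoder_decoder_pretrained(ENCODER, DECODER)

# Freeze the style-adapted decoder except the newly added cross-attention,
# so only a fraction of the parameters must be learned from parallel data.
for name, param in model.decoder.named_parameters():
    if "crossattention" not in name:
        param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")
```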

AdamR at SemEval-2023 Task 10: Solving the Class Imbalance Problem in Sexism Detection with Ensemble Learning

May 15, 2023
Adam Rydelek, Daryna Dementieva, Georg Groh

The Explainable Detection of Online Sexism task poses the problem of explainable sexism detection through fine-grained categorisation of sexist cases across three subtasks. Our team experimented with different ways to combat class imbalance throughout the tasks using data augmentation and loss alteration techniques. We tackled the challenge with ensembles of Transformer models trained on different datasets, which we tested to find the balance between performance and interpretability. This solution ranked us in the top 40% of teams for each of the tracks.

* One of the top solutions at the SemEval-2023 task "The Explainable Detection of Online Sexism" 
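One common "loss alteration" for class imbalance is class-weighted cross-entropy; a minimal PyTorch sketch (illustrative of the technique in general, not the team's exact configuration) looks like this:

```python
import torch
from torch import nn

# Class-weighted cross-entropy with inverse-frequency weights computed
# from the (toy, highly imbalanced) training labels.
labels = torch.tensor([0, 0, 0, 0, 0, 0, 0, 1, 1, 2])
counts = torch.bincount(labels).float()
weights = counts.sum() / (len(counts) * counts)      # rarer class -> larger weight

criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(len(labels), len(counts))       # stand-in for model outputs
loss = criterion(logits, labels)
print(weights, loss.item())
```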

Adam-Smith at SemEval-2023 Task 4: Discovering Human Values in Arguments with Ensembles of Transformer-based Models

May 15, 2023
Daniel Schroter, Daryna Dementieva, Georg Groh

This paper presents the best-performing approach, submitted under the alias "Adam Smith", for SemEval-2023 Task 4: "Identification of Human Values behind Arguments". The goal of the task was to create systems that automatically identify the values within textual arguments. We train transformer-based models until they reach their loss minimum or F1-score maximum. Ensembling the models by selecting one global decision threshold that maximizes the F1-score leads to the best-performing system in the competition. Ensembling based on stacking with logistic regressions shows the best performance on an additional dataset provided to evaluate robustness ("Nahj al-Balagha"). Apart from outlining the submitted system, we demonstrate that the use of the large ensemble model is not necessary and that the system size can be significantly reduced.

* The winner of SemEval-2023 Task 4: "Identification of Human Values behind Arguments" 
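The global-threshold ensembling step can be sketched as follows: average the members' per-label probabilities, sweep a single threshold shared by all labels, and keep the value that maximizes F1 on validation data. This is a hedged illustration of the idea (macro-F1 and the grid are assumptions, and the helper name is invented here):

```python
import numpy as np
from sklearn.metrics import f1_score

def best_global_threshold(member_probs, y_true, grid=np.linspace(0.05, 0.95, 91)):
    """member_probs: list of (n_samples, n_labels) probability arrays, one per model.
    y_true: binary (n_samples, n_labels) gold labels."""
    mean_probs = np.mean(member_probs, axis=0)
    scores = [f1_score(y_true, (mean_probs >= t).astype(int),
                       average="macro", zero_division=0) for t in grid]
    best = int(np.argmax(scores))
    return grid[best], scores[best]

# toy usage: two "models", three labels
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=(100, 3))
probs = [np.clip(y + rng.normal(0, 0.4, y.shape), 0, 1) for _ in range(2)]
t, f1 = best_global_threshold(probs, y)
print(f"threshold={t:.2f}, macro-F1={f1:.3f}")
```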

IFAN: An Explainability-Focused Interaction Framework for Humans and NLP Models

Mar 06, 2023
Edoardo Mosca, Daryna Dementieva, Tohid Ebrahim Ajdari, Maximilian Kummeth, Kirill Gringauz, Georg Groh

Interpretability and human oversight are fundamental pillars of deploying complex NLP models into real-world applications. However, applying explainability and human-in-the-loop methods requires technical proficiency. Despite existing toolkits for model understanding and analysis, options to integrate human feedback are still limited. We propose IFAN, a framework for real-time explanation-based interaction with NLP models. Through IFAN's interface, users can provide feedback to selected model explanations, which is then integrated through adapter layers to align the model with human rationale. We show the system to be effective in debiasing a hate speech classifier with minimal performance loss. IFAN also offers a visual admin system and API to manage models (and datasets) as well as control access rights. A demo is live at https://ifan.ml/

* ACL Demo 2023 Submission 
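Feedback in such frameworks is typically integrated through small bottleneck adapter layers trained on top of a frozen backbone; the sketch below shows that general pattern (a minimal PyTorch module, not IFAN's actual implementation):

```python
import torch
from torch import nn

class Adapter(nn.Module):
    """Bottleneck adapter: a small trainable module inserted into a frozen model.

    Only these parameters are updated when aligning the model with human
    feedback, leaving the base transformer weights untouched.
    """
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # residual connection around the down-project / up-project bottleneck
        return hidden_states + self.up(self.act(self.down(hidden_states)))

x = torch.randn(2, 16, 768)          # toy batch of hidden states
print(Adapter(768)(x).shape)         # torch.Size([2, 16, 768])
```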

From Judgement's Premises Towards Key Points

Dec 23, 2022
Oren Sultan, Rayen Dhahri, Yauheni Mardan, Tobias Eder, Georg Groh

Key Point Analysis (KPA) is a relatively new NLP task that combines summarization and classification by extracting argumentative key points (KPs) for a topic from a collection of texts and categorizing their closeness to the different arguments. In our work, we focus on the legal domain and develop methods that identify and extract KPs from premises derived from texts of judgments. The first method is an adaptation of an existing state-of-the-art method, and the other two are new methods that we developed from scratch. We present our methods and examples of their outputs, as well as a comparison between them. The full evaluation of our results is done in the matching task -- matching the generated KPs to arguments (premises).
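The matching step can be illustrated with sentence embeddings and cosine similarity; the model choice and the toy legal sentences below are assumptions for the sketch, not the authors' setup:

```python
from sentence_transformers import SentenceTransformer, util

# Assign each premise to its closest key point by embedding similarity.
model = SentenceTransformer("all-MiniLM-L6-v2")

key_points = ["The sentence is disproportionate to the offence.",
              "The defendant's personal circumstances were ignored."]
premises = ["The punishment far exceeds what comparable cases received.",
            "No weight was given to the appellant's age and health."]

kp_emb = model.encode(key_points, convert_to_tensor=True)
pr_emb = model.encode(premises, convert_to_tensor=True)

sims = util.cos_sim(pr_emb, kp_emb)            # (n_premises, n_key_points)
for premise, row in zip(premises, sims):
    print(premise, "->", key_points[int(row.argmax())])
```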

Structuring User-Generated Content on Social Media with Multimodal Aspect-Based Sentiment Analysis

Oct 27, 2022
Miriam Anschütz, Tobias Eder, Georg Groh

People post their opinions and experiences on social media, yielding rich databases of end users' sentiments. This paper shows to what extent machine learning can analyze and structure these databases. An automated data analysis pipeline is deployed to provide insights into user-generated content for researchers in other domains. First, the domain expert can select an image and a term of interest. Then, the pipeline uses image retrieval to find all images showing similar contents and applies aspect-based sentiment analysis to outline users' opinions about the selected term. As part of an interdisciplinary project between architecture and computer science researchers, an empirical study of Hamburg's Elbphilharmonie was conducted on 300 thousand posts from the platform Flickr with the hashtag 'hamburg'. Image retrieval methods generated a subset of slightly more than 1.5 thousand images displaying the Elbphilharmonie. We found that these posts mainly convey a neutral or positive sentiment towards it. With this pipeline, we suggest a new big data analysis method that offers new insights into end users' opinions, e.g., for architecture domain experts.

* 9 pages, 5 figures, short paper version to be published at 9th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT2022) 
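The retrieval stage of such a pipeline could, for example, rank crawled images by CLIP similarity to the expert's query; the checkpoint, the text query, and the image paths below are placeholders for illustration, not the paper's exact retrieval model:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

paths = ["img_001.jpg", "img_002.jpg"]        # placeholders for crawled posts
images = [Image.open(p) for p in paths]

with torch.no_grad():
    img_emb = model.get_image_features(**processor(images=images, return_tensors="pt"))
    txt_emb = model.get_text_features(**processor(text=["Elbphilharmonie concert hall"],
                                                  return_tensors="pt", padding=True))

# cosine similarity between each image and the query term
img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
scores = (img_emb @ txt_emb.T).squeeze(-1)
ranked = [paths[int(i)] for i in scores.argsort(descending=True)]
print(ranked)
```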

"That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks

Apr 10, 2022
Edoardo Mosca, Shreyash Agarwal, Javier Rando-Ramirez, Georg Groh

Adversarial attacks are a major challenge faced by current machine learning research. These purposely crafted inputs fool even the most advanced models, precluding their deployment in safety-critical applications. Extensive research in computer vision has been carried out to develop reliable defense strategies. However, the same issue remains less explored in natural language processing. Our work presents a model-agnostic detector of adversarial text examples. The approach identifies patterns in the logits of the target classifier when perturbing the input text. The proposed detector improves the current state-of-the-art performance in recognizing adversarial inputs and exhibits strong generalization capabilities across different NLP models, datasets, and word-level attacks.

* ACL 2022 
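The core idea of watching how the target classifier's logits react to perturbations can be sketched as follows; word deletion, the feature summary, and the helper name are simplifying assumptions, not the paper's released code:

```python
import numpy as np

def logit_variation_features(text, predict_logits, top_k=10):
    """Summarize how the target model's logits react to word deletions.

    predict_logits: callable mapping a string to a 1-D array of class logits.
    The feature vector can feed a downstream detector (e.g., logistic
    regression) that flags adversarial inputs.
    """
    words = text.split()
    base = predict_logits(text)
    pred = int(np.argmax(base))                       # class the model predicts

    drops = np.array([
        base[pred] - predict_logits(" ".join(words[:i] + words[i + 1:]))[pred]
        for i in range(len(words))                    # delete one word at a time
    ])
    top = np.sort(drops)[::-1][:top_k]                # strongest reactions first
    top = np.pad(top, (0, top_k - len(top)))          # fixed-length feature vector
    return np.concatenate([[drops.mean(), drops.std()], top])

# toy target model: two-class logits driven by occurrences of "good"
toy = lambda s: np.array([1.0, float(s.lower().count("good"))])
print(logit_variation_features("this movie is good good good", toy).round(2))
```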

How to Build Robust FAQ Chatbot with Controllable Question Generator?

Nov 18, 2021
Yan Pan, Mingyang Ma, Bernhard Pflugfelder, Georg Groh

Many unanswerable adversarial questions fool question-answering (QA) systems with plausible-looking answers. Building a robust frequently-asked-questions (FAQ) chatbot requires a large number of diverse adversarial examples, yet recent question generation methods are ineffective at generating many high-quality and diverse adversarial question-answer pairs from unstructured text. We propose the diversity-controllable semantically valid adversarial attacker (DCSA), a high-quality, diverse, controllable method to generate standard and adversarial samples with a semantic graph. The fluent, semantically valid generated QA pairs successfully fool our passage retrieval model. We then study the robustness and generalization of the QA model trained with generated QA pairs across different domains. We find that the generated dataset improves the generalizability of the QA model to the new target domain and its robustness in detecting unanswerable adversarial questions.
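A robustness check against unanswerable adversarial questions can be run with a SQuAD-2.0-style QA pipeline that is allowed to abstain; the checkpoint, context, and questions below are illustrative choices, not the paper's models or data:

```python
from transformers import pipeline

# A SQuAD-2.0-style model may return an empty answer (abstain) when the
# question cannot be answered from the context.
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

context = ("The FAQ bot is trained on product manuals and covers installation, "
           "billing, and warranty questions.")
questions = [
    "What topics does the FAQ bot cover?",        # answerable
    "Which Nobel prize did the FAQ bot win?",     # adversarial / unanswerable
]

for q in questions:
    out = qa(question=q, context=context, handle_impossible_answer=True)
    answered = bool(out["answer"].strip())
    print(f"{q!r} -> {'answer: ' + out['answer'] if answered else 'abstained'}")
```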

A Case Study and Qualitative Analysis of Simple Cross-Lingual Opinion Mining

Nov 04, 2021
Gerhard Hagerer, Wing Sheung Leung, Qiaoxi Liu, Hannah Danner, Georg Groh

User-generated content from social media is produced in many languages, making it technically challenging to compare the discussed themes from one domain across different cultures and regions. This is relevant for domains in a globalized world, such as market research, where people from two nations and markets might have different requirements for a product. We propose a simple, modern, and effective method for building a single topic model with sentiment analysis capable of covering multiple languages simultaneously, based on a pre-trained state-of-the-art deep neural network for natural language understanding. To demonstrate its feasibility, we apply the model to newspaper articles and user comments in a specific domain, i.e., organic food products and related consumption behavior. The themes match across languages. Additionally, we obtain a high proportion of stable and domain-relevant topics, a meaningful relation between topics and their respective textual contents, and an interpretable representation for social media documents. Marketing can potentially benefit from our method, since it provides an easy-to-use means of addressing specific customer interests from different market regions around the globe. For reproducibility, we provide the code, data, and results of our study.

* Proceedings of the 13th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KDIR 2021  
* 10 pages, 2 tables, 5 figures, full paper, peer-reviewed, published at KDIR/IC3k 2021 conference 
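One way to realize such a single multilingual topic model is to embed documents from several languages with a shared multilingual sentence encoder and cluster them into topics; the encoder, the clustering algorithm, and the toy documents below are assumptions for the sketch, not necessarily the paper's components:

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

# Documents in different languages land in the same clusters ("topics")
# because the multilingual encoder maps them into one embedding space.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

docs = [
    "Organic vegetables taste better and avoid pesticides.",      # EN
    "Bio-Gemüse schmeckt besser und kommt ohne Pestizide aus.",   # DE
    "Organic food is too expensive for everyday shopping.",       # EN
    "Bio-Lebensmittel sind für den Alltag zu teuer.",             # DE
]

embeddings = model.encode(docs)
topics = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
for doc, topic in zip(docs, topics):
    print(topic, doc)
```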