Kiril Gashteovski

Robust Text Classification: Analyzing Prototype-Based Networks

Nov 11, 2023
Zhivar Sourati, Darshan Deshpande, Filip Ilievski, Kiril Gashteovski, Sascha Saralajew

Downstream applications often require text classification models to be accurate, robust, and interpretable. While the accuracy of state-of-the-art language models approximates human performance, they are not designed to be interpretable and often exhibit a drop in performance on noisy data. The family of Prototype-Based Networks (PBNs), which classify examples based on their similarity to prototypical examples of a class (prototypes), is natively interpretable and has been shown to be robust to noise, which has enabled its wide use in computer vision. In this paper, we study whether the robustness properties of PBNs transfer to text classification tasks. We design a modular and comprehensive framework for studying PBNs, which includes different backbone architectures, backbone sizes, and objective functions. Our evaluation protocol assesses the robustness of models against character-, word-, and sentence-level perturbations. Our experiments on three benchmarks show that the robustness of PBNs transfers to NLP classification tasks facing realistic perturbations. Moreover, the robustness of PBNs is supported mostly by the objective function that keeps prototypes interpretable, while the robustness advantage of PBNs over vanilla models becomes more salient as datasets grow more complex.
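
As a rough illustration of the mechanism, the sketch below shows a minimal prototype-based classification head in PyTorch: an encoded input is compared to learned prototype vectors, and the per-class similarity scores become logits. The backbone, the number of prototypes per class, and the squared-distance similarity are assumptions made for this example, not details taken from the paper.

```python
import torch
import torch.nn as nn

class PrototypeLayer(nn.Module):
    """Minimal prototype-based classification head (illustrative sketch only)."""

    def __init__(self, hidden_dim: int, num_classes: int, protos_per_class: int = 4):
        super().__init__()
        self.num_classes = num_classes
        self.protos_per_class = protos_per_class
        # One bank of learned prototypes, grouped class-by-class.
        self.prototypes = nn.Parameter(
            torch.randn(num_classes * protos_per_class, hidden_dim)
        )

    def forward(self, encoded: torch.Tensor) -> torch.Tensor:
        # encoded: (batch, hidden_dim), e.g. the [CLS] vector of a text encoder.
        dists = torch.cdist(encoded, self.prototypes) ** 2          # (batch, P)
        sims = -dists.view(-1, self.num_classes, self.protos_per_class)
        logits, _ = sims.max(dim=-1)                                # best prototype per class
        return logits
```

In the paper's framing, the objective function that keeps prototypes interpretable would add extra loss terms on top of such a head (e.g., pulling prototypes toward real training examples); those terms are omitted here for brevity.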

Linking Surface Facts to Large-Scale Knowledge Graphs

Oct 23, 2023
Gorjan Radevski, Kiril Gashteovski, Chia-Chien Hung, Carolin Lawrence, Goran Glavaš

Open Information Extraction (OIE) methods extract facts from natural language text in the form of ("subject"; "relation"; "object") triples. These facts are, however, merely surface forms, whose ambiguity impedes their downstream usage; e.g., the surface phrase "Michael Jordan" may refer to either the former basketball player or the university professor. Knowledge Graphs (KGs), on the other hand, contain facts in a canonical (i.e., unambiguous) form, but their coverage is limited by a static schema (i.e., a fixed set of entities and predicates). To bridge this gap, we need the best of both worlds: (i) the high coverage of free-text OIEs, and (ii) the semantic precision (i.e., monosemy) of KGs. To achieve this goal, we propose a new benchmark with novel evaluation protocols that can, for example, measure fact linking performance at the level of individual triple slots, while also measuring whether a system can recognize that a surface form has no match in the existing KG. Our extensive evaluation of several baselines shows that detecting out-of-KG entities and predicates is more difficult than accurately linking to existing ones, calling for more research effort on this difficult task. We publicly release all resources (data, benchmark, and code) at https://github.com/nec-research/fact-linking.
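
A hedged sketch of what triple-slot linking with out-of-KG detection might look like: each surface slot is scored against KG candidates, and a slot is declared out-of-KG when no candidate clears a threshold. The embedding function, the candidate index, and the threshold are illustrative assumptions, not part of the released benchmark.

```python
from typing import Optional

def link_slot(surface: str,
              kg_candidates: dict[str, list[float]],
              embed,                       # assumed callable: str -> list[float]
              threshold: float = 0.75) -> Optional[str]:
    """Link one triple slot to a KG id, or return None if it is out-of-KG."""
    query = embed(surface)
    best_id, best_score = None, float("-inf")
    for kg_id, kg_vec in kg_candidates.items():
        score = sum(q * k for q, k in zip(query, kg_vec))   # dot-product similarity
        if score > best_score:
            best_id, best_score = kg_id, score
    # Out-of-KG detection: no candidate is similar enough.
    return best_id if best_score >= threshold else None

def link_triple(triple: tuple[str, str, str], entity_index, predicate_index, embed):
    subj, rel, obj = triple
    return (link_slot(subj, entity_index, embed),
            link_slot(rel, predicate_index, embed),
            link_slot(obj, entity_index, embed))
```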

Large Language Models Enable Few-Shot Clustering

Jul 02, 2023
Vijay Viswanathan, Kiril Gashteovski, Carolin Lawrence, Tongshuang Wu, Graham Neubig

Unlike traditional unsupervised clustering, semi-supervised clustering allows users to provide meaningful structure to the data, which helps the clustering algorithm match the user's intent. Existing approaches to semi-supervised clustering require a significant amount of feedback from an expert to improve the clusters. In this paper, we ask whether a large language model (LLM) can amplify an expert's guidance to enable query-efficient, few-shot semi-supervised text clustering. We show that LLMs are surprisingly effective at improving clustering. We explore three stages at which LLMs can be incorporated into clustering: before clustering (improving input features), during clustering (providing constraints to the clusterer), and after clustering (post-correcting cluster assignments). We find that incorporating LLMs in the first two stages routinely provides significant improvements in cluster quality, and that LLMs enable a user to trade off cost against accuracy to produce the desired clusters. We release our code and LLM prompts for public use.
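
The "before clustering" stage can be sketched as follows: an LLM expands each document with keyphrases before embedding and running k-means. The `llm` and `embed` callables and the prompt wording are assumptions for illustration; the released prompts and code describe the actual setup.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_with_llm_keyphrases(texts, embed, llm, n_clusters=10):
    """Illustrative 'before clustering' stage: enrich each document with
    LLM-generated keyphrases, embed the enriched text, then run k-means.

    `embed` (str -> np.ndarray) and `llm` (prompt -> str) are assumed callables.
    """
    enriched = []
    for text in texts:
        keyphrases = llm(f"List a few keyphrases capturing the topic of:\n{text}")
        enriched.append(text + "\n" + keyphrases)    # expanded input features
    features = np.stack([embed(doc) for doc in enriched])
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
```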

Leveraging Open Information Extraction for Improving Few-Shot Trigger Detection Domain Transfer

May 23, 2023
David Dukić, Kiril Gashteovski, Goran Glavaš, Jan Šnajder

Event detection is a crucial information extraction task in many domains, such as Wikipedia or news. The task typically relies on trigger detection (TD) -- identifying token spans in the text that evoke specific events. While the notion of triggers should ideally be universal across domains, domain transfer for TD from high- to low-resource domains results in significant performance drops. We address the problem of negative transfer for TD by coupling triggers between domains using subject-object relations obtained from a rule-based open information extraction (OIE) system. We demonstrate that relations injected through multi-task training can act as mediators between triggers in different domains, enhancing zero- and few-shot TD domain transfer and reducing negative transfer, in particular when transferring from the high-resource Wikipedia source domain to the low-resource news target domain. Additionally, we combine the extracted relations with masked language modeling on the target domain and obtain further TD performance gains. Finally, we demonstrate that the results are robust to the choice of OIE system.
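
A minimal sketch of the multi-task setup, assuming a shared transformer encoder with one token-classification head for triggers and one for OIE subject-object relation tags; the checkpoint name and label sizes are placeholders, not the paper's configuration.

```python
import torch.nn as nn
from transformers import AutoModel

class MultiTaskTagger(nn.Module):
    """Sketch of multi-task token classification: a shared encoder with one head
    for trigger detection and one for OIE subject-object relation tags."""

    def __init__(self, model_name="roberta-base", n_trigger_labels=2, n_relation_labels=4):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.trigger_head = nn.Linear(hidden, n_trigger_labels)
        self.relation_head = nn.Linear(hidden, n_relation_labels)

    def forward(self, input_ids, attention_mask):
        states = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        # Both tasks share the encoder; the OIE relation tags act as the auxiliary signal.
        return self.trigger_head(states), self.relation_head(states)
```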

KGxBoard: Explainable and Interactive Leaderboard for Evaluation of Knowledge Graph Completion Models

Aug 23, 2022
Haris Widjaja, Kiril Gashteovski, Wiem Ben Rim, Pengfei Liu, Christopher Malon, Daniel Ruffinelli, Carolin Lawrence, Graham Neubig

Knowledge Graphs (KGs) store information in the form of (head, predicate, tail) triples. To augment KGs with new knowledge, researchers have proposed models for KG Completion (KGC) tasks such as link prediction, i.e., answering (h; p; ?) or (?; p; t) queries. Such models are usually evaluated with averaged metrics on a held-out test set. While useful for tracking progress, averaged single-score metrics cannot reveal what exactly a model has learned -- or failed to learn. To address this issue, we propose KGxBoard: an interactive framework for performing fine-grained evaluation on meaningful subsets of the data, each of which tests individual and interpretable capabilities of a KGC model. In our experiments, we highlight findings discovered with KGxBoard that would have been impossible to detect with standard averaged single-score metrics.
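
The kind of fine-grained evaluation KGxBoard enables can be sketched as bucketed scoring: instead of one averaged metric, link-prediction results are grouped into interpretable buckets and scored per bucket. The data shapes and the bucketing function below are assumptions for illustration, not KGxBoard's API.

```python
from collections import defaultdict

def bucketed_mrr(results, bucket_fn):
    """Fine-grained evaluation sketch: group (query, rank) results into
    interpretable buckets and report mean reciprocal rank per bucket.

    `results` is an assumed list of (query, rank_of_correct_answer) pairs and
    `bucket_fn` maps a query to a bucket name (e.g. its relation).
    """
    buckets = defaultdict(list)
    for query, rank in results:
        buckets[bucket_fn(query)].append(1.0 / rank)    # reciprocal rank
    return {name: sum(rr) / len(rr) for name, rr in buckets.items()}

# Example: per-relation MRR for (h; p; ?) queries, where query = (h, p).
# per_relation = bucketed_mrr(results, bucket_fn=lambda q: q[1])
```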

Human-Centric Research for NLP: Towards a Definition and Guiding Questions

Jul 10, 2022
Bhushan Kotnis, Kiril Gashteovski, Julia Gastinger, Giuseppe Serra, Francesco Alesiani, Timo Sztyler, Ammar Shaker, Na Gong, Carolin Lawrence, Zhao Xu

With Human-Centric Research (HCR), we can steer research activities so that the research outcome is beneficial for human stakeholders, such as end users. But what exactly makes research human-centric? We address this question by providing a working definition and by describing how a research pipeline can be split into stages at which human-centric components can be added. Additionally, we discuss existing NLP work with HCR components and define a series of guiding questions that can serve as starting points for researchers interested in exploring human-centric research approaches. We hope that this work will inspire researchers to refine the proposed definition and to pose other questions that might be meaningful for achieving HCR.

A Human-Centric Assessment Framework for AI

May 25, 2022
Sascha Saralajew, Ammar Shaker, Zhao Xu, Kiril Gashteovski, Bhushan Kotnis, Wiem Ben-Rim, Jürgen Quittek, Carolin Lawrence

With the rise of AI systems in real-world applications comes the need for reliable and trustworthy AI. An important aspect of this is explainable AI systems. However, there is no agreed standard on how explainable AI systems should be assessed. Inspired by the Turing test, we introduce a human-centric assessment framework in which a leading domain expert accepts or rejects the solutions of an AI system and of another domain expert. By comparing the acceptance rates of the provided solutions, we can assess how the AI system performs in comparison to the domain expert, and in turn whether or not the AI system's explanations (if provided) are human-understandable. This setup -- comparable to the Turing test -- can serve as a framework for a wide range of human-centric AI system assessments. We demonstrate this by presenting two instantiations: (1) an assessment that measures the classification accuracy of a system, with the option to incorporate label uncertainties; (2) an assessment in which the usefulness of provided explanations is determined in a human-centric manner.
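
A minimal sketch of the comparison at the heart of the framework, assuming the lead expert's accept/reject judgements are recorded as (source, accepted) pairs; this data shape is invented for the example.

```python
def acceptance_rates(judgements):
    """Compare how often the lead expert accepts AI solutions vs. solutions
    from another domain expert. `judgements` is an assumed list of
    (source, accepted) pairs with source in {"ai", "expert"}.
    """
    totals = {"ai": 0, "expert": 0}
    accepted = {"ai": 0, "expert": 0}
    for source, ok in judgements:
        totals[source] += 1
        accepted[source] += int(ok)
    return {s: accepted[s] / totals[s] for s in totals if totals[s]}

# If the "ai" rate is close to the "expert" rate, the AI system's solutions
# (and any explanations) are as acceptable to the lead expert as human ones.
```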

Integrating diverse extraction pathways using iterative predictions for Multilingual Open Information Extraction

Oct 15, 2021
Bhushan Kotnis, Kiril Gashteovski, Carolin Lawrence, Daniel Oñoro Rubio, Vanesa Rodriguez-Tembras, Makoto Takamoto, Mathias Niepert

In this paper, we investigate a simple hypothesis for the Open Information Extraction (OpenIE) task: some elements of a triple may be easier to extract if the extraction is conditioned on elements that were extracted earlier. We exploit this observation and propose MiLIE, a neural multilingual OpenIE system that iteratively extracts triples by conditioning each extraction on different elements of the triple, leading to a rich set of extractions. The iterative nature of MiLIE also allows rule-based extraction systems to be seamlessly integrated with a neural end-to-end system, leading to improved performance. MiLIE outperforms state-of-the-art systems on multiple languages, ranging from Chinese to Galician, thanks to its ability to combine multiple extraction pathways. Our analysis confirms that certain elements of an extraction are indeed easier to extract than others. Finally, we introduce OpenIE evaluation datasets for two low-resource languages: Japanese and Galician.
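
A sketch of the iterative idea, assuming a generic `predict` function that fills one triple slot given the sentence and the slots extracted so far; the slot orderings shown are illustrative pathways, not MiLIE's actual configuration.

```python
def iterative_extraction(sentence, predict,
                         orderings=(("relation", "subject", "object"),
                                    ("subject", "relation", "object"))):
    """Sketch of iterative, conditioned OpenIE: each triple slot is predicted
    given the sentence plus the slots extracted so far, and different slot
    orderings ('pathways') yield extractions that are pooled together.

    `predict(sentence, slot, partial)` is an assumed callable returning the
    text span for `slot` given the partial triple `partial`.
    """
    extractions = set()
    for order in orderings:
        partial = {}
        for slot in order:
            partial[slot] = predict(sentence, slot, dict(partial))
        extractions.add((partial["subject"], partial["relation"], partial["object"]))
    return extractions
```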

AnnIE: An Annotation Platform for Constructing Complete Open Information Extraction Benchmark

Sep 15, 2021
Niklas Friedrich, Kiril Gashteovski, Mingying Yu, Bhushan Kotnis, Carolin Lawrence, Mathias Niepert, Goran Glavaš

Open Information Extraction (OIE) is the task of extracting facts from sentences in the form of relations and their corresponding arguments in a schema-free manner. The intrinsic performance of OIE systems is difficult to measure due to the incompleteness of existing OIE benchmarks: the ground-truth extractions do not group all acceptable surface realizations of the same fact that can be extracted from a sentence. To measure the performance of OIE systems more realistically, it is necessary to manually annotate complete facts (i.e., clusters of all acceptable surface realizations of the same fact) from input sentences. We propose AnnIE: an interactive annotation platform that facilitates such challenging annotation tasks and supports the creation of complete, fact-oriented OIE evaluation benchmarks. AnnIE is modular and flexible in order to support different use-case scenarios (i.e., benchmarks covering different types of facts). We use AnnIE to build two complete OIE benchmarks: one with verb-mediated facts and another with facts encompassing named entities. Finally, we evaluate several OIE systems on our complete benchmarks created with AnnIE. Our results suggest that existing incomplete benchmarks are overly lenient, and that OIE systems are not as robust as previously reported. We publicly release AnnIE under a non-restrictive license.
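
The notion of a complete fact can be made concrete with a small data sketch; the sentence and triples below are invented for illustration and are not taken from the released benchmarks.

```python
# Illustrative shape of a "complete" annotation: for one sentence, each fact
# synset lists every acceptable (subject, relation, object) surface realization.
annotation = {
    "sentence": "Michael Jordan, a professor at Berkeley, studies machine learning.",
    "fact_synsets": [
        {   # one fact, several acceptable surface forms
            ("Michael Jordan", "is a professor at", "Berkeley"),
            ("Michael Jordan", "is", "a professor at Berkeley"),
        },
        {
            ("Michael Jordan", "studies", "machine learning"),
        },
    ],
}
```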

BenchIE: Open Information Extraction Evaluation Based on Facts, Not Tokens

Sep 14, 2021
Kiril Gashteovski, Mingying Yu, Bhushan Kotnis, Carolin Lawrence, Goran Glavaš, Mathias Niepert

Intrinsic evaluations of OIE systems are carried out either manually -- with human evaluators judging the correctness of extractions -- or automatically, on standardized benchmarks. The latter, while much more cost-effective, is less reliable, primarily because of the incompleteness of the existing OIE benchmarks: the ground truth extractions do not include all acceptable variants of the same fact, leading to unreliable assessment of models' performance. Moreover, the existing OIE benchmarks are available for English only. In this work, we introduce BenchIE: a benchmark and evaluation framework for comprehensive evaluation of OIE systems for English, Chinese and German. In contrast to existing OIE benchmarks, BenchIE takes into account informational equivalence of extractions: our gold standard consists of fact synsets, clusters in which we exhaustively list all surface forms of the same fact. We benchmark several state-of-the-art OIE systems using BenchIE and demonstrate that these systems are significantly less effective than indicated by existing OIE benchmarks. We make BenchIE (data and evaluation code) publicly available.
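
Fact-level scoring against such synsets might look like the sketch below: an extracted triple counts as correct only if it exactly matches one gold surface form, and recall counts how many gold facts are hit at all. The data structures and the exact-match criterion are assumptions for illustration, not BenchIE's released evaluation code.

```python
def fact_level_scores(system_triples, fact_synsets):
    """Sketch of fact-level precision/recall: `system_triples` is a list of
    (subject, relation, object) tuples, `fact_synsets` a list of sets of
    acceptable surface forms of the same fact (illustrative shapes).
    """
    correct = sum(any(t in synset for synset in fact_synsets) for t in system_triples)
    hit_facts = sum(any(t in synset for t in system_triples) for synset in fact_synsets)
    precision = correct / len(system_triples) if system_triples else 0.0
    recall = hit_facts / len(fact_synsets) if fact_synsets else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```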
