Christos Christodoulopoulos

WebIE: Faithful and Robust Information Extraction on the Web

May 23, 2023
Chenxi Whitehouse, Clara Vania, Alham Fikri Aji, Christos Christodoulopoulos, Andrea Pierleoni

Extracting structured and grounded fact triples from raw text is a fundamental task in Information Extraction (IE). Existing IE datasets are typically collected from Wikipedia articles, using hyperlinks to link entities to the Wikidata knowledge base. However, models trained only on Wikipedia have limitations when applied to web domains, which often contain noisy text or text without any factual information. We present WebIE, the first large-scale, entity-linked closed IE dataset consisting of 1.6M sentences automatically collected from the English Common Crawl corpus. WebIE also includes negative examples, i.e. sentences without fact triples, to better reflect the data found on the web. We annotate ~25K triples from WebIE through crowdsourcing and introduce mWebIE, a translation of the annotated set into four other languages: French, Spanish, Portuguese, and Hindi. We evaluate the in-domain, out-of-domain, and zero-shot cross-lingual performance of generative IE models and find that models trained on WebIE show better generalisability. We also propose three training strategies that use entity linking as an auxiliary task. Our experiments show that adding entity-linking objectives improves the faithfulness of our generative IE models.
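
To make the setup concrete, here is a minimal sketch of the kind of record the abstract describes: a web sentence paired with entity-linked fact triples, plus a negative example with nothing to extract. The field names and identifiers are illustrative placeholders, not the released WebIE schema.

```python
# Illustrative records only (field names and identifiers are placeholders, not
# the released WebIE schema): a sentence with entity-linked fact triples and a
# negative web sentence with no factual content.
positive_example = {
    "sentence": "Amazon was founded by Jeff Bezos in Bellevue, Washington.",
    "triples": [
        {
            "subject": {"mention": "Amazon", "wikidata_id": "Q3884"},        # illustrative IDs
            "relation": "founded by",
            "object": {"mention": "Jeff Bezos", "wikidata_id": "Q312556"},
        }
    ],
}

negative_example = {
    "sentence": "Click here to subscribe to our newsletter!",
    "triples": [],  # negative example: no fact triples in this sentence
}
```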

* ACL 2023 Main Conference 

State-of-the-art generalisation research in NLP: a taxonomy and review

Oct 10, 2022
Dieuwke Hupkes, Mario Giulianelli, Verna Dankers, Mikel Artetxe, Yanai Elazar, Tiago Pimentel, Christos Christodoulopoulos, Karim Lasri, Naomi Saphra, Arabella Sinclair, Dennis Ulmer, Florian Schottmann, Khuyagbaatar Batsuren, Kaiser Sun, Koustuv Sinha, Leila Khalatbari, Maria Ryskina, Rita Frieske, Ryan Cotterell, Zhijing Jin

The ability to generalise well is one of the primary desiderata of natural language processing (NLP). Yet, what `good generalisation' entails and how it should be evaluated is not well understood, nor are there any common standards to evaluate it. In this paper, we aim to lay the groundwork to address both of these issues. We present a taxonomy for characterising and understanding generalisation research in NLP, we use that taxonomy to present a comprehensive map of published generalisation studies, and we make recommendations for which areas might deserve attention in the future. Our taxonomy is based on an extensive literature review of generalisation research, and contains five axes along which studies can differ: their main motivation, the type of generalisation they aim to solve, the type of data shift they consider, the source by which this data shift is obtained, and the locus of the shift within the modelling pipeline. We use our taxonomy to classify over 400 previous papers that test generalisation, for a total of more than 600 individual experiments. Considering the results of this review, we present an in-depth analysis of the current state of generalisation research in NLP, and make recommendations for the future. Along with this paper, we release a webpage where the results of our review can be dynamically explored, and which we intend to update as new NLP generalisation studies are published. With this work, we aim to take steps towards making state-of-the-art generalisation testing the new status quo in NLP.
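
As a rough illustration of how such a taxonomy can be applied, the sketch below annotates a single hypothetical study along the five axes named in the abstract; the example values are placeholders, not an authoritative vocabulary from the paper.

```python
# A sketch of annotating one (hypothetical) study along the taxonomy's five
# axes; the axis names come from the abstract above, the values are placeholders.
study_annotation = {
    "paper": "Example et al. (2021)",
    "motivation": "practical",               # main motivation for testing generalisation
    "generalisation_type": "cross-domain",   # type of generalisation targeted
    "shift_type": "covariate shift",         # type of data shift considered
    "shift_source": "naturally occurring",   # how the data shift is obtained
    "shift_locus": "train-test",             # where in the modelling pipeline it occurs
}
```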

* 35 pages of content + 53 pages of references 

ReFinED: An Efficient Zero-shot-capable Approach to End-to-End Entity Linking

Jul 08, 2022
Tom Ayoola, Shubhi Tyagi, Joseph Fisher, Christos Christodoulopoulos, Andrea Pierleoni

We introduce ReFinED, an efficient end-to-end entity linking model which uses fine-grained entity types and entity descriptions to perform linking. The model performs mention detection, fine-grained entity typing, and entity disambiguation for all mentions within a document in a single forward pass, making it more than 60 times faster than competitive existing approaches. ReFinED also surpasses state-of-the-art performance on standard entity linking datasets by an average of 3.7 F1. The model is capable of generalising to large-scale knowledge bases such as Wikidata (which has 15 times more entities than Wikipedia) and of zero-shot entity linking. The combination of speed, accuracy and scale makes ReFinED an effective and cost-efficient system for extracting entities from web-scale datasets, for which the model has been successfully deployed. Our code and pre-trained models are available at https://github.com/alexa/ReFinED
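
For readers who want to try the released model, the sketch below is based on the public repository's documented interface; the module path, checkpoint name, and argument names are assumptions recalled from the README and may differ between versions, so check them against the repository.

```python
# Sketch of running a released ReFinED model; the module path, checkpoint name
# and argument names below are assumptions from the repository README and may
# differ between versions.
from refined.inference.processor import Refined

refined = Refined.from_pretrained(
    model_name="wikipedia_model_with_numbers",  # assumed checkpoint name
    entity_set="wikipedia",                     # or "wikidata" for the larger KB
)

spans = refined.process_text("England won the FIFA World Cup in 1966.")
for span in spans:
    print(span.text, span.predicted_entity)  # detected mention plus predicted KB entity
```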

* Accepted at NAACL Industry Track 2022 

Robust Information Retrieval for False Claims with Distracting Entities In Fact Extraction and Verification

Dec 10, 2021
Mingwen Dong, Christos Christodoulopoulos, Sheng-Min Shih, Xiaofei Ma

Accurate evidence retrieval is essential for automated fact checking. Little previous research has focused on the differences between true and false claims and how they affect evidence retrieval. This paper shows that, compared with true claims, false claims more frequently contain irrelevant entities, which can distract the evidence retrieval model. A BERT-based retrieval model made more mistakes in retrieving refuting evidence for false claims than supporting evidence for true claims. When tested with adversarial false claims (synthetically generated) containing irrelevant entities, the recall of the retrieval model is significantly lower than that for the original claims. These results suggest that the vanilla BERT-based retrieval model is not robust to irrelevant entities in false claims. Augmenting the training data with synthetic false claims containing irrelevant entities led to higher evidence recall, including for false claims with irrelevant entities. In addition, using separate models to retrieve refuting and supporting evidence and then aggregating them can also increase evidence recall, including for false claims with irrelevant entities. These results suggest that we can increase the BERT-based retrieval model's robustness to false claims with irrelevant entities via data augmentation and model ensembling.
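
The ensemble idea can be illustrated with a minimal aggregation function: retrieve with a supporting-evidence model and a refuting-evidence model separately, then pool the results. The union-and-rescore rule below is an assumption for illustration, not necessarily the exact aggregation scheme used in the paper.

```python
# Minimal illustration of ensembling two specialised retrievers; the pooling
# rule (union of top-k, keeping the best score per item) is an assumption for
# illustration, not necessarily the paper's exact scheme.
from typing import Callable, List, Tuple

Retriever = Callable[[str, int], List[Tuple[str, float]]]  # (claim, k) -> [(evidence_id, score)]

def ensemble_retrieve(claim: str, support_model: Retriever, refute_model: Retriever,
                      k: int = 5) -> List[str]:
    """Merge evidence from the supporting and refuting retrievers, best score first."""
    pooled = {}
    for retriever in (support_model, refute_model):
        for evidence_id, score in retriever(claim, k):
            pooled[evidence_id] = max(score, pooled.get(evidence_id, float("-inf")))
    return sorted(pooled, key=pooled.get, reverse=True)[:k]
```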

FEVEROUS: Fact Extraction and VERification Over Unstructured and Structured information

Jun 10, 2021
Rami Aly, Zhijiang Guo, Michael Schlichtkrull, James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Oana Cocarascu, Arpit Mittal

Fact verification has attracted a lot of attention in the machine learning and natural language processing communities, as it is one of the key methods for detecting misinformation. Existing large-scale benchmarks for this task have focused mostly on textual sources, i.e. unstructured information, and thus ignored the wealth of information available in structured formats, such as tables. In this paper we introduce a novel dataset and benchmark, Fact Extraction and VERification Over Unstructured and Structured information (FEVEROUS), which consists of 87,026 verified claims. Each claim is annotated with evidence in the form of sentences and/or cells from tables in Wikipedia, as well as a label indicating whether this evidence supports, refutes, or does not provide enough information to reach a verdict. Furthermore, we detail our efforts to track and minimize the biases that are present in the dataset and could be exploited by models, e.g. being able to predict the label without using evidence. Finally, we develop a baseline for verifying claims against text and tables which predicts both the correct evidence and verdict for 18% of the claims.
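
The sketch below shows what a FEVEROUS-style claim with mixed sentence and table-cell evidence might look like; the field names and evidence-ID conventions are illustrative placeholders rather than the released data format.

```python
# Illustrative only: a claim record combining sentence and table-cell evidence.
# Field names and evidence-ID conventions are placeholders, not necessarily the
# structure of the released FEVEROUS files.
claim_record = {
    "claim": "The 1996 Summer Olympics were held in Atlanta, Georgia.",
    "label": "SUPPORTS",
    "evidence": [
        {"type": "sentence", "element_id": "1996 Summer Olympics_sentence_0"},
        {"type": "table_cell", "element_id": "1996 Summer Olympics_cell_0_1_2"},
    ],
}
```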

Hidden Biases in Unreliable News Detection Datasets

Apr 20, 2021
Xiang Zhou, Heba Elfardy, Christos Christodoulopoulos, Thomas Butler, Mohit Bansal

Automatic unreliable news detection is a research problem with great potential impact. Recently, several papers have shown promising results on large-scale news datasets with models that only use the article itself, without resorting to any fact-checking mechanism or retrieving any supporting evidence. In this work, we take a closer look at these datasets. While they all provide valuable resources for future research, we observe a number of problems that may lead to results that do not generalize in more realistic settings. Specifically, we show that selection bias during data collection leads to undesired artifacts in the datasets. In addition, while most systems train and predict at the level of individual articles, overlapping article sources in the training and evaluation data can provide a strong confounding factor that models can exploit. In the presence of this confounding factor, models can achieve good performance by directly memorizing the site-label mapping instead of modeling the real task of unreliable news detection. We observe a significant drop (>10%) in accuracy for all tested models on a clean split with no train/test source overlap. Based on these observations and experimental results, we provide practical suggestions on how to create more reliable datasets for the unreliable news detection task. We suggest that future dataset creation include a simple model as a difficulty/bias probe and that future model development use a clean, non-overlapping site and date split.
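
The recommended clean split can be implemented with a grouped split so that no source site appears in both training and test data (a date cutoff can be applied analogously). A minimal sketch using scikit-learn, with placeholder data:

```python
# Sketch of a "clean split": group articles by source site so no site appears
# in both train and test. The article/label/site lists are placeholders.
from sklearn.model_selection import GroupShuffleSplit

articles = ["article text 1", "article text 2", "article text 3", "article text 4"]
labels   = [1, 0, 1, 0]                      # 1 = unreliable, 0 = reliable
sites    = ["siteA.com", "siteA.com", "siteB.com", "siteC.com"]

splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(articles, labels, groups=sites))

# No site in the test set also appears in the training set.
assert not {sites[i] for i in train_idx} & {sites[i] for i in test_idx}
```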

* EACL 2021 (11 pages, 3 figures, 8 tables) 

Generating Token-Level Explanations for Natural Language Inference

Apr 24, 2019
James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Arpit Mittal

The task of Natural Language Inference (NLI) is widely modeled as supervised sentence pair classification. While there has been a lot of recent work on generating explanations of the predictions of classifiers on a single piece of text, there have been no attempts to generate explanations for classifiers operating on pairs of sentences. In this paper, we show that it is possible to generate token-level explanations for NLI without the need for training data explicitly annotated for this purpose. We use a simple LSTM architecture and evaluate both LIME and Anchor explanations for this task. We compare these to a Multiple Instance Learning (MIL) method that uses thresholded attention to make token-level predictions. The approach we present in this paper is a novel extension of zero-shot single-sentence tagging to sentence pairs for NLI. We conduct our experiments on the well-studied SNLI dataset, which was recently augmented with manual annotation of the tokens that explain the entailment relation. We find that our white-box MIL-based method, while orders of magnitude faster, does not reach the same accuracy as the black-box methods.
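
The thresholded-attention idea can be shown in miniature: tokens whose attention weight exceeds a threshold are marked as explanation tokens. This is an illustration of the general mechanism only, not the paper's exact MIL architecture, and the weights below are hypothetical.

```python
# The thresholded-attention idea in miniature (illustration only, with
# hypothetical attention weights): tokens above the threshold form the explanation.
import numpy as np

tokens = ["A", "man", "is", "sleeping", "."]
attention = np.array([0.05, 0.10, 0.05, 0.70, 0.10])  # hypothetical weights

threshold = 0.2
explanation_tokens = [t for t, w in zip(tokens, attention) if w > threshold]
print(explanation_tokens)  # ['sleeping']
```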

* Accepted at NAACL2019 

The Fact Extraction and VERification (FEVER) Shared Task

Nov 30, 2018
James Thorne, Andreas Vlachos, Oana Cocarascu, Christos Christodoulopoulos, Arpit Mittal

We present the results of the first Fact Extraction and VERification (FEVER) Shared Task. The task challenged participants to classify whether human-written factoid claims could be Supported or Refuted using evidence retrieved from Wikipedia. We received entries from 23 competing teams, 19 of which scored higher than the previously published baseline. The best performing system achieved a FEVER score of 64.21%. In this paper, we present the results of the shared task and a summary of the systems, highlighting commonalities and innovations among participating systems.
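
For reference, the FEVER score counts a claim as correct only when the predicted label is right and, unless the claim is NotEnoughInfo, at least one complete gold evidence set is covered by the predicted evidence. A simplified sketch follows; the official scorer additionally caps how much predicted evidence is considered.

```python
# Simplified sketch of the FEVER score: correct label plus, for Supported and
# Refuted claims, full coverage of at least one gold evidence set. (The official
# scorer also limits how much predicted evidence counts.)
from typing import List, Set

def fever_correct(pred_label: str, gold_label: str,
                  pred_evidence: Set[str], gold_evidence_sets: List[Set[str]]) -> bool:
    if pred_label != gold_label:
        return False
    if gold_label == "NotEnoughInfo":
        return True
    return any(gold_set <= pred_evidence for gold_set in gold_evidence_sets)

def fever_score(predictions: List[dict]) -> float:
    """Each dict holds pred_label, gold_label, pred_evidence, gold_evidence_sets."""
    return sum(fever_correct(**p) for p in predictions) / len(predictions)
```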

* Revised from published version in the proceedings of the FEVER workshop at EMNLP 2018 

FEVER: a large-scale dataset for Fact Extraction and VERification

Apr 16, 2018
James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Arpit Mittal

In this paper we introduce a new publicly available dataset for verification against textual sources, FEVER: Fact Extraction and VERification. It consists of 185,445 claims generated by altering sentences extracted from Wikipedia and subsequently verified without knowledge of the sentence they were derived from. The claims are classified as Supported, Refuted or NotEnoughInfo by annotators achieving 0.6841 in Fleiss $\kappa$. For the first two classes, the annotators also recorded the sentence(s) forming the necessary evidence for their judgment. To characterize the challenge of the dataset presented, we develop a pipeline approach and compare it to suitably designed oracles. The best accuracy we achieve on labeling a claim accompanied by the correct evidence is 31.87%, while if we ignore the evidence we achieve 50.91%. Thus we believe that FEVER is a challenging testbed that will help stimulate progress on claim verification against textual sources.
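
A verification pipeline for a dataset like this typically has three stages: retrieve candidate documents, select evidence sentences, then classify the claim against that evidence. The stubs below sketch only that structure; the component implementations are placeholders, not the paper's actual models.

```python
# Structural sketch of a three-stage verification pipeline (document retrieval,
# evidence selection, claim classification). The stage bodies are placeholders,
# not the paper's actual components.
from typing import List, Tuple

def retrieve_documents(claim: str, k: int = 5) -> List[str]:
    return []  # placeholder: e.g. keyword / TF-IDF retrieval over Wikipedia pages

def select_sentences(claim: str, documents: List[str], k: int = 5) -> List[str]:
    return []  # placeholder: rank sentences within the retrieved pages

def classify_claim(claim: str, evidence: List[str]) -> str:
    return "NotEnoughInfo"  # placeholder: an entailment model over claim + evidence

def verify(claim: str) -> Tuple[str, List[str]]:
    documents = retrieve_documents(claim)
    evidence = select_sentences(claim, documents)
    return classify_claim(claim, evidence), evidence
```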

* Camera ready version to appear at NAACL2018. Data is released on http://fever.ai 

Simple Large-scale Relation Extraction from Unstructured Text

Mar 24, 2018
Christos Christodoulopoulos, Arpit Mittal

Knowledge-based question answering relies on the availability of facts, the majority of which cannot be found in structured sources (e.g. Wikipedia info-boxes, Wikidata). One of the major components of extracting facts from unstructured text is Relation Extraction (RE). In this paper we propose a novel method for creating distant (weak) supervision labels for training a large-scale RE system. We also provide new evidence about the effectiveness of neural network approaches by decoupling the model architecture from the feature design of a state-of-the-art neural network system. Surprisingly, a much simpler classifier trained on similar features performs on par with the highly complex neural network system (with a 75x reduction in training time), suggesting that the features are a bigger contributor to the final performance.
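
For context, the standard distant-supervision heuristic that such weak labels build on is: if a knowledge-base triple's subject and object both appear in a sentence, label that sentence with the triple's relation. The sketch below shows only that textbook rule; the paper proposes its own method for creating these labels.

```python
# The textbook distant-supervision alignment rule, shown for illustration only
# (the paper's own labelling method is not reproduced here): a sentence that
# mentions both the subject and the object of a KB triple is labelled with
# that triple's relation.
from typing import List, Tuple

kb_triples = [("Barack Obama", "born_in", "Honolulu")]
sentences = [
    "Barack Obama was born in Honolulu, Hawaii.",
    "Barack Obama served two terms as US president.",
]

def distant_labels(sentences: List[str],
                   triples: List[Tuple[str, str, str]]) -> List[Tuple[str, str]]:
    labels = []
    for sentence in sentences:
        for subj, relation, obj in triples:
            if subj in sentence and obj in sentence:
                labels.append((sentence, relation))
    return labels

print(distant_labels(sentences, kb_triples))
# [('Barack Obama was born in Honolulu, Hawaii.', 'born_in')]
```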

* To be published in LREC 2018 