"Information Extraction": models, code, and papers

Information Extraction Tool Text2ALM: From Narratives to Action Language System Descriptions

Sep 18, 2019
Craig Olson, Yuliya Lierler

In this work we design a narrative understanding tool, Text2ALM. This tool uses the action language ALM to perform inferences on complex interactions of events described in narratives. The methodology used to implement the Text2ALM system was originally outlined by Lierler, Inclezan, and Gelfond (2017) via a manual process of converting a narrative to an ALM model. It relies on a conglomeration of resources and techniques from two distinct fields of artificial intelligence, namely natural language processing and knowledge representation and reasoning. The effectiveness of the Text2ALM system is measured by its ability to correctly answer questions from the bAbI tasks published by Facebook Research in 2015. The tool matched or exceeded the performance of state-of-the-art machine learning methods in six of the seven tested tasks. We also illustrate that the Text2ALM approach generalizes to a broader spectrum of narratives.

* EPTCS 306, 2019, pp. 87-100 
* In Proceedings ICLP 2019, arXiv:1909.07646 
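
As a toy illustration of the kind of narrative inference the bAbI benchmark tests (and not of Text2ALM's NLP-plus-KRR pipeline itself), a bAbI task-1 style story can be answered by tracking the effects of movement events:

```python
# Minimal illustration of bAbI-style narrative inference (task 1:
# single supporting fact). This is NOT Text2ALM; it only shows the
# kind of question such a system must answer by tracking events.
import re

STORY = [
    "Mary moved to the bathroom.",
    "John went to the hallway.",
    "Mary travelled to the office.",
]

def answer_where(story, person):
    """Track each actor's last reported location."""
    location = {}
    for sentence in story:
        m = re.match(r"(\w+) (?:moved|went|travelled) to the (\w+)\.", sentence)
        if m:
            location[m.group(1)] = m.group(2)
    return location.get(person, "unknown")

print(answer_where(STORY, "Mary"))  # -> office
```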

Toward Socially-Infused Information Extraction: Embedding Authors, Mentions, and Entities

Sep 26, 2016
Yi Yang, Ming-Wei Chang, Jacob Eisenstein

Entity linking is the task of identifying mentions of entities in text, and linking them to entries in a knowledge base. This task is especially difficult in microblogs, as there is little additional text to provide disambiguating context; rather, authors rely on an implicit common ground of shared knowledge with their readers. In this paper, we attempt to capture some of this implicit context by exploiting the social network structure in microblogs. We build on the theory of homophily, which implies that socially linked individuals share interests, and are therefore likely to mention the same sorts of entities. We implement this idea by encoding authors, mentions, and entities in a continuous vector space, which is constructed so that socially-connected authors have similar vector representations. These vectors are incorporated into a neural structured prediction model, which captures structural constraints that are inherent in the entity linking task. Together, these design decisions yield F1 improvements of 1%-5% on benchmark datasets, as compared to the previous state-of-the-art.

* Accepted to EMNLP 2016 
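
A minimal sketch of the homophily intuition, assuming a toy PyTorch setup rather than the paper's structured prediction model: socially linked authors are pulled toward similar vectors, and a gold entity is scored against its author's embedding. All sizes and names below are illustrative.

```python
# Sketch of homophily-regularized embeddings (illustrative, not the
# paper's exact model): socially linked authors get similar vectors,
# and a gold entity is pushed toward its author's embedding.
import torch
import torch.nn.functional as F

n_authors, n_entities, dim = 100, 500, 32
authors = torch.nn.Embedding(n_authors, dim)
entities = torch.nn.Embedding(n_entities, dim)
opt = torch.optim.Adam(
    list(authors.parameters()) + list(entities.parameters()), lr=0.01
)

social_edges = torch.tensor([[0, 1], [1, 2], [0, 2]])  # toy follower graph
gold_author, gold_entity = torch.tensor([0]), torch.tensor([42])

for _ in range(100):
    a = authors(social_edges[:, 0])
    b = authors(social_edges[:, 1])
    social_loss = (1 - F.cosine_similarity(a, b)).mean()  # homophily term
    score = (authors(gold_author) * entities(gold_entity)).sum()
    link_loss = -F.logsigmoid(score)  # raise the gold entity's score
    opt.zero_grad()
    (social_loss + link_loss).backward()
    opt.step()
```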

Robust Information Retrieval for False Claims with Distracting Entities in Fact Extraction and Verification

Dec 10, 2021
Mingwen Dong, Christos Christodoulopoulos, Sheng-Min Shih, Xiaofei Ma

Accurate evidence retrieval is essential for automated fact checking. Little previous research has focused on the differences between true and false claims and how they affect evidence retrieval. This paper shows that, compared with true claims, false claims more frequently contain irrelevant entities, which can distract the evidence retrieval model. A BERT-based retrieval model made more mistakes in retrieving refuting evidence for false claims than supporting evidence for true claims. When tested with synthetically generated adversarial false claims containing irrelevant entities, the recall of the retrieval model is significantly lower than on the original claims. These results suggest that the vanilla BERT-based retrieval model is not robust to irrelevant entities in false claims. By augmenting the training data with synthetic false claims containing irrelevant entities, the trained model achieved higher evidence recall, including on false claims with irrelevant entities. In addition, using separate models to retrieve refuting and supporting evidence and then aggregating their results also increases evidence recall on such claims. These results suggest that we can increase the BERT-based retrieval model's robustness to false claims with irrelevant entities via data augmentation and model ensembling.
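
The separate-retrievers idea can be sketched as follows; `retrieve_supporting` and `retrieve_refuting` are hypothetical stand-ins for two separately trained retrievers, and max-score merging is just one plausible aggregation rule.

```python
# Sketch of the two-retriever aggregation idea (hypothetical retriever
# names; each retriever returns (evidence_id, score) pairs).
def aggregate(claim, retrieve_supporting, retrieve_refuting, k=5):
    scores = {}
    for retrieve in (retrieve_supporting, retrieve_refuting):
        for doc_id, score in retrieve(claim):
            # keep the best score a document gets from either retriever
            scores[doc_id] = max(score, scores.get(doc_id, float("-inf")))
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# toy usage with stub retrievers
sup = lambda c: [("e1", 0.9), ("e2", 0.4)]
ref = lambda c: [("e3", 0.8), ("e2", 0.7)]
print(aggregate("claim text", sup, ref))  # -> ['e1', 'e3', 'e2']
```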


XCI-Sketch: Extraction of Color Information from Images for Generation of Colored Outlines and Sketches

Aug 26, 2021
Harsh Rathod, Manisimha Varma, Parna Chowdhury, Sameer Saxena, V Manushree, Ankita Ghosh, Sahil Khose

Sketches are a medium to convey a visual scene from an individual's creative perspective. The addition of color substantially enhances the overall expressivity of a sketch. This paper proposes two methods to mimic human-drawn colored sketches by utilizing the Contour Drawing Dataset. Our first approach renders colored outline sketches by applying image processing techniques aided by k-means color clustering. The second method uses a generative adversarial network to develop a model that can generate colored sketches from previously unobserved images. We assess the results obtained through quantitative and qualitative evaluations.
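
The color-clustering step corresponds to standard k-means palette quantization, sketched below with scikit-learn; the paper's full pipeline adds further image-processing stages around this idea.

```python
# Palette extraction via k-means (standard color quantization; the
# paper builds additional image-processing steps around this idea).
import numpy as np
from sklearn.cluster import KMeans

def quantize_colors(image, k=8):
    """image: (H, W, 3) uint8 array -> same shape, k dominant colors."""
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3).astype(np.float32)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    palette = km.cluster_centers_.astype(np.uint8)  # k dominant colors
    return palette[km.labels_].reshape(h, w, 3)

# toy usage on a random "image"
img = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
print(quantize_colors(img, k=4).shape)  # (64, 64, 3)
```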


Generating Information Extraction Patterns from Overlapping and Variable Length Annotations using Sequence Alignment

Aug 09, 2019
Frank Meng, Craig A. Morioka, Danne C. Elbers

Sequence alignments are used to capture patterns whose elements represent multiple conceptual levels, by aligning sequences that contain overlapping and variable-length annotations. The alignments also determine the proper context window of words and phrases that most directly affects the meaning of a given target within a sentence, eliminating the need to predefine a fixed context window around the targets. We evaluated the system using the CoNLL-2003 named entity recognition (NER) task.
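
As the simplest instance of the alignment idea, a plain Needleman-Wunsch global alignment over token sequences looks like this; the paper's version aligns elements carrying overlapping, multi-level annotations rather than bare tokens.

```python
# Needleman-Wunsch global alignment over tokens (toy version; the
# paper aligns elements with overlapping, multi-level annotations).
def align(a, b, match=2, mismatch=-1, gap=-1):
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # substitute
                              score[i - 1][j] + gap,      # delete
                              score[i][j - 1] + gap)      # insert
    return score[n][m]

print(align("PER lives in LOC".split(), "PER works in LOC".split()))  # 5
```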


COVID-Fact: Fact Extraction and Verification of Real-World Claims on COVID-19 Pandemic

Jun 07, 2021
Arkadiy Saakyan, Tuhin Chakrabarty, Smaranda Muresan

We introduce COVID-Fact, a FEVER-like dataset of 4,086 claims concerning the COVID-19 pandemic. The dataset contains claims, evidence for the claims, and contradictory claims refuted by the evidence. Unlike previous approaches, we automatically detect true claims and their source articles and then generate counter-claims using automatic methods rather than employing human annotators. Along with our constructed resource, we formally present the task of identifying relevant evidence for the claims and verifying whether the evidence refutes or supports a given claim. In addition to scientific claims, our data contains simplified general claims from media sources, making it better suited for detecting general misinformation regarding COVID-19. Our experiments indicate that COVID-Fact will provide a challenging testbed for the development of new systems, and our approach will reduce the costs of building domain-specific datasets for detecting misinformation.

* ACL 2021 Camera Ready 
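
A FEVER-like record pairs a claim with evidence and a veracity label. The sketch below shows one plausible record shape and a stub verifier interface; the field names and labels are assumptions, not the released schema.

```python
# One plausible record shape for a FEVER-like dataset (field names and
# labels are assumptions, not necessarily the released COVID-Fact schema).
example = {
    "claim": "Masks reduce transmission of COVID-19.",
    "evidence": [
        "Sentence 1 from the source article.",
        "Sentence 2 from the source article.",
    ],
    "label": "SUPPORTED",  # or "REFUTED"
}

def verify(claim: str, evidence: list) -> str:
    """Stub interface: a real verifier scores the claim against evidence."""
    return "SUPPORTED" if evidence else "REFUTED"

assert verify(example["claim"], example["evidence"]) == example["label"]
```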

Towards Incorporating Entity-specific Knowledge Graph Information in Predicting Drug-Drug Interactions

Dec 21, 2020
Ishani Mondal

Off-the-shelf biomedical embeddings obtained from various recently released pre-trained language models (such as BERT and XLNet) have demonstrated state-of-the-art accuracy on various natural language understanding (NLU) tasks in the biomedical domain. Relation Classification (RC) is one of the most critical of these tasks. In this paper, we explore how to incorporate domain knowledge about biomedical entities (such as drugs, diseases, and genes), obtained from Knowledge Graph (KG) embeddings, for predicting Drug-Drug Interactions from a textual corpus. We propose a new method, BERTKG-DDI, which combines drug embeddings, obtained from each drug's interactions with other biomedical entities, with a domain-specific BioBERT embedding-based RC architecture. Experiments conducted on the DDIExtraction 2013 corpus clearly indicate that this strategy improves over other baseline architectures by 4.1% macro F1-score.
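
The fusion step can be sketched as concatenating a contextual sentence vector (BioBERT-derived in the paper) with the two drugs' KG embeddings ahead of a classification layer; the dimensions and modules below are illustrative stand-ins, not the released architecture.

```python
# Sketch of fusing text and KG features for relation classification.
# In the paper the text vector comes from BioBERT and the drug vectors
# from KG embeddings; here both are random stand-ins.
import torch
import torch.nn as nn

text_dim, kg_dim, n_relations = 768, 100, 5

classifier = nn.Sequential(
    nn.Linear(text_dim + 2 * kg_dim, 256),
    nn.ReLU(),
    nn.Linear(256, n_relations),  # e.g. DDI types plus "no interaction"
)

sentence_vec = torch.randn(1, text_dim)  # stand-in for a BioBERT [CLS] vector
drug1_kg = torch.randn(1, kg_dim)        # stand-in for drug 1's KG embedding
drug2_kg = torch.randn(1, kg_dim)        # stand-in for drug 2's KG embedding

features = torch.cat([sentence_vec, drug1_kg, drug2_kg], dim=1)
logits = classifier(features)
print(logits.shape)  # torch.Size([1, 5])
```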

