Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Information Extraction": models, code, and papers

Multimodal Approach for Metadata Extraction from German Scientific Publications

Nov 10, 2021
Azeddine Bouabdallah, Jorge Gavilan, Jennifer Gerbl, Prayuth Patumcharoenpol

Nowadays, metadata information is often given by the authors themselves upon submission. However, a significant part of already existing research papers have missing or incomplete metadata information. German scientific papers come in a large variety of layouts which makes the extraction of metadata a non-trivial task that requires a precise way to classify the metadata extracted from the documents. In this paper, we propose a multimodal deep learning approach for metadata extraction from scientific papers in the German language. We consider multiple types of input data by combining natural language processing and image vision processing. This model aims to increase the overall accuracy of metadata extraction compared to other state-of-the-art approaches. It enables the utilization of both spatial and contextual features in order to achieve a more reliable extraction. Our model for this approach was trained on a dataset consisting of around 8800 documents and is able to obtain an overall F1-score of 0.923.

* 8 pages, 5 figures, 4 tables 
Access Paper or Ask Questions

RadGraph: Extracting Clinical Entities and Relations from Radiology Reports

Jun 28, 2021
Saahil Jain, Ashwin Agrawal, Adriel Saporta, Steven QH Truong, Du Nguyen Duong, Tan Bui, Pierre Chambon, Yuhao Zhang, Matthew P. Lungren, Andrew Y. Ng, Curtis P. Langlotz, Pranav Rajpurkar

Extracting structured clinical information from free-text radiology reports can enable the use of radiology report information for a variety of critical healthcare applications. In our work, we present RadGraph, a dataset of entities and relations in full-text chest X-ray radiology reports based on a novel information extraction schema we designed to structure radiology reports. We release a development dataset, which contains board-certified radiologist annotations for 500 radiology reports from the MIMIC-CXR dataset (14,579 entities and 10,889 relations), and a test dataset, which contains two independent sets of board-certified radiologist annotations for 100 radiology reports split equally across the MIMIC-CXR and CheXpert datasets. Using these datasets, we train and test a deep learning model, RadGraph Benchmark, that achieves a micro F1 of 0.82 and 0.73 on relation extraction on the MIMIC-CXR and CheXpert test sets respectively. Additionally, we release an inference dataset, which contains annotations automatically generated by RadGraph Benchmark across 220,763 MIMIC-CXR reports (around 6 million entities and 4 million relations) and 500 CheXpert reports (13,783 entities and 9,908 relations) with mappings to associated chest radiographs. Our freely available dataset can facilitate a wide range of research in medical natural language processing, as well as computer vision and multi-modal learning when linked to chest radiographs.

Access Paper or Ask Questions

Coarse-to-Fine Entity Representations for Document-level Relation Extraction

Dec 04, 2020
Damai Dai, Jing Ren, Shuang Zeng, Baobao Chang, Zhifang Sui

Document-level Relation Extraction (RE) requires extracting relations expressed within and across sentences. Recent works show that graph-based methods, usually constructing a document-level graph that captures document-aware interactions, can obtain useful entity representations thus helping tackle document-level RE. These methods either focus more on the entire graph, or pay more attention to a part of the graph, e.g., paths between the target entity pair. However, we find that document-level RE may benefit from focusing on both of them simultaneously. Therefore, to obtain more comprehensive entity representations, we propose the \textbf{C}oarse-to-\textbf{F}ine \textbf{E}ntity \textbf{R}epresentation model (\textbf{CFER}) that adopts a coarse-to-fine strategy involving two phases. First, CFER uses graph neural networks to integrate global information in the entire graph at a coarse level. Next, CFER utilizes the global information as a guidance to selectively aggregate path information between the target entity pair at a fine level. In classification, we combine the entity representations from both two levels into more comprehensive representations for relation extraction. Experimental results on a large-scale document-level RE dataset show that CFER achieves better performance than previous baseline models. Further, we verify the effectiveness of our strategy through elaborate model analysis.

Access Paper or Ask Questions

Speaker activity driven neural speech extraction

Jan 14, 2021
Marc Delcroix, Katerina Zmolikova, Tsubasa Ochiai, Keisuke Kinoshita, Tomohiro Nakatani

Target speech extraction, which extracts the speech of a target speaker in a mixture given auxiliary speaker clues, has recently received increased interest. Various clues have been investigated such as pre-recorded enrollment utterances, direction information, or video of the target speaker. In this paper, we explore the use of speaker activity information as an auxiliary clue for single-channel neural network-based speech extraction. We propose a speaker activity driven speech extraction neural network (ADEnet) and show that it can achieve performance levels competitive with enrollment-based approaches, without the need for pre-recordings. We further demonstrate the potential of the proposed approach for processing meeting-like recordings, where speaker activity obtained from a diarization system is used as a speaker clue for ADEnet. We show that this simple yet practical approach can successfully extract speakers after diarization, which leads to improved ASR performance when using a single microphone, especially in high overlapping conditions, with a relative word error rate reduction of up to 25 %.

* Submitted to ICASSP'21 
Access Paper or Ask Questions

Event Coreference Resolution via a Multi-loss Neural Network without Using Argument Information

Sep 22, 2020
Xinyu Zuo, Yubo Chen, Kang Liu, Jun Zhao

Event coreference resolution(ECR) is an important task in Natural Language Processing (NLP) and nearly all the existing approaches to this task rely on event argument information. However, these methods tend to suffer from error propagation from the stage of event argument extraction. Besides, not every event mention contains all arguments of an event, and argument information may confuse the model that events have arguments to detect event coreference in real text. Furthermore, the context information of an event is useful to infer the coreference between events. Thus, in order to reduce the errors propagated from event argument extraction and use context information effectively, we propose a multi-loss neural network model that does not need any argument information to do the within-document event coreference resolution task and achieve a significant performance than the state-of-the-art methods.

* SCIENCE CHINA Information Sciences, Volume 62, Issue 11:212101(2019) 
* Published on SCIENCE CHINA Information Sciences 
Access Paper or Ask Questions

Building Extraction at Scale using Convolutional Neural Network: Mapping of the United States

May 23, 2018
Hsiuhan Lexie Yang, Jiangye Yuan, Dalton Lunga, Melanie Laverdiere, Amy Rose, Budhendra Bhaduri

Establishing up-to-date large scale building maps is essential to understand urban dynamics, such as estimating population, urban planning and many other applications. Although many computer vision tasks has been successfully carried out with deep convolutional neural networks, there is a growing need to understand their large scale impact on building mapping with remote sensing imagery. Taking advantage of the scalability of CNNs and using only few areas with the abundance of building footprints, for the first time we conduct a comparative analysis of four state-of-the-art CNNs for extracting building footprints across the entire continental United States. The four CNN architectures namely: branch-out CNN, fully convolutional neural network (FCN), conditional random field as recurrent neural network (CRFasRNN), and SegNet, support semantic pixel-wise labeling and focus on capturing textural information at multi-scale. We use 1-meter resolution aerial images from National Agriculture Imagery Program (NAIP) as the test-bed, and compare the extraction results across the four methods. In addition, we propose to combine signed-distance labels with SegNet, the preferred CNN architecture identified by our extensive evaluations, to advance building extraction results to instance level. We further demonstrate the usefulness of fusing additional near IR information into the building extraction framework. Large scale experimental evaluations are conducted and reported using metrics that include: precision, recall rate, intersection over union, and the number of buildings extracted. With the improved CNN model and no requirement of further post-processing, we have generated building maps for the United States. The quality of extracted buildings and processing time demonstrated the proposed CNN-based framework fits the need of building extraction at scale.

* Accepted by IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 
Access Paper or Ask Questions

A Hierarchical Approach for Joint Multi-view Object Pose Estimation and Categorization

Mar 04, 2015
Mete Ozay, Krzysztof Walas, Ales Leonardis

We propose a joint object pose estimation and categorization approach which extracts information about object poses and categories from the object parts and compositions constructed at different layers of a hierarchical object representation algorithm, namely Learned Hierarchy of Parts (LHOP). In the proposed approach, we first employ the LHOP to learn hierarchical part libraries which represent entity parts and compositions across different object categories and views. Then, we extract statistical and geometric features from the part realizations of the objects in the images in order to represent the information about object pose and category at each different layer of the hierarchy. Unlike the traditional approaches which consider specific layers of the hierarchies in order to extract information to perform specific tasks, we combine the information extracted at different layers to solve a joint object pose estimation and categorization problem using distributed optimization algorithms. We examine the proposed generative-discriminative learning approach and the algorithms on two benchmark 2-D multi-view image datasets. The proposed approach and the algorithms outperform state-of-the-art classification, regression and feature extraction algorithms. In addition, the experimental results shed light on the relationship between object categorization, pose estimation and the part realizations observed at different layers of the hierarchy.

* Proceedings of IEEE International Conference on Robotics and Automation (ICRA), pp. 5480 - 5487, Hong Kong, 2014 
* 7 Figures 
Access Paper or Ask Questions

Continuous-Time Relationship Prediction in Dynamic Heterogeneous Information Networks

Oct 08, 2018
Sina Sajadmanesh, Sogol Bazargani, Jiawei Zhang, Hamid R. Rabiee

Online social networks, World Wide Web, media and technological networks, and other types of so-called information networks are ubiquitous nowadays. These information networks are inherently heterogeneous and dynamic. They are heterogeneous as they consist of multi-typed objects and relations, and they are dynamic as they are constantly evolving over time. One of the challenging issues in such heterogeneous and dynamic environments is to forecast those relationships in the network that will appear in the future. In this paper, we try to solve the problem of continuous-time relationship prediction in dynamic and heterogeneous information networks. This implies predicting the time it takes for a relationship to appear in the future, given its features that have been extracted by considering both heterogeneity and temporal dynamics of the underlying network. To this end, we first introduce a feature extraction framework that combines the power of meta-path-based modeling and recurrent neural networks to effectively extract features suitable for relationship prediction regarding heterogeneity and dynamicity of the networks. Next, we propose a supervised non-parametric approach, called Non-Parametric Generalized Linear Model (NP-GLM), which infers the hidden underlying probability distribution of the relationship building time given its features. We then present a learning algorithm to train NP-GLM and an inference method to answer time-related queries. Extensive experiments conducted on synthetic data and three real-world datasets, namely Delicious, MovieLens, and DBLP, demonstrate the effectiveness of NP-GLM in solving continuous-time relationship prediction problem vis-a-vis competitive baselines

* Submitted to ACM Transactions on Knowledge Discovery from Data. arXiv admin note: text overlap with arXiv:1706.06783 
Access Paper or Ask Questions

A Probabilistic Model with Commonsense Constraints for Pattern-based Temporal Fact Extraction

Jun 11, 2020
Yang Zhou, Tong Zhao, Meng Jiang

Textual patterns (e.g., Country's president Person) are specified and/or generated for extracting factual information from unstructured data. Pattern-based information extraction methods have been recognized for their efficiency and transferability. However, not every pattern is reliable: A major challenge is to derive the most complete and accurate facts from diverse and sometimes conflicting extractions. In this work, we propose a probabilistic graphical model which formulates fact extraction in a generative process. It automatically infers true facts and pattern reliability without any supervision. It has two novel designs specially for temporal facts: (1) it models pattern reliability on two types of time signals, including temporal tag in text and text generation time; (2) it models commonsense constraints as observable variables. Experimental results demonstrate that our model significantly outperforms existing methods on extracting true temporal facts from news data.

* 7 pages, 1 figure 
Access Paper or Ask Questions

Auto-context Convolutional Neural Network (Auto-Net) for Brain Extraction in Magnetic Resonance Imaging

Jun 19, 2017
Seyed Sadegh Mohseni Salehi, Deniz Erdogmus, Ali Gholipour

Brain extraction or whole brain segmentation is an important first step in many of the neuroimage analysis pipelines. The accuracy and robustness of brain extraction, therefore, is crucial for the accuracy of the entire brain analysis process. With the aim of designing a learning-based, geometry-independent and registration-free brain extraction tool in this study, we present a technique based on an auto-context convolutional neural network (CNN), in which intrinsic local and global image features are learned through 2D patches of different window sizes. In this architecture three parallel 2D convolutional pathways for three different directions (axial, coronal, and sagittal) implicitly learn 3D image information without the need for computationally expensive 3D convolutions. Posterior probability maps generated by the network are used iteratively as context information along with the original image patches to learn the local shape and connectedness of the brain, to extract it from non-brain tissue. The brain extraction results we have obtained from our algorithm are superior to the recently reported results in the literature on two publicly available benchmark datasets, namely LPBA40 and OASIS, in which we obtained Dice overlap coefficients of 97.42% and 95.40%, respectively. Furthermore, we evaluated the performance of our algorithm in the challenging problem of extracting arbitrarily-oriented fetal brains in reconstructed fetal brain magnetic resonance imaging (MRI) datasets. In this application our algorithm performed much better than the other methods (Dice coefficient: 95.98%), where the other methods performed poorly due to the non-standard orientation and geometry of the fetal brain in MRI. Our CNN-based method can provide accurate, geometry-independent brain extraction in challenging applications.

* This manuscripts has been submitted to TMI 
Access Paper or Ask Questions