Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Information Extraction": models, code, and papers

Deep covariate-learning: optimising information extraction from terrain texture for geostatistical modelling applications

May 22, 2020
Charlie Kirkwood

Where data is available, it is desirable in geostatistical modelling to make use of additional covariates, for example terrain data, in order to improve prediction accuracy in the modelling task. While elevation itself may be important, additional explanatory power for any given problem can be sought (but not necessarily found) by filtering digital elevation models to extact higher-order derivatives such as slope angles, curvatures, and roughness. In essence, it would be beneficial to extract as much task-relevant information as possible from the elevation grid. However, given the complexities of the natural world, chance dictates that the use of 'off-the-shelf' filters is unlikely to derive covariates that provide strong explanatory power to the target variable at hand, and any attempt to manually design informative covariates is likely to be a trial-and-error process -- not optimal. In this paper we present a solution to this problem in the form of a deep learning approach to automatically deriving optimal task-specific terrain texture covariates from a standard SRTM 90m gridded digital elevation model (DEM). For our target variables we use point-sampled geochemical data from the British Geological Survey: concentrations of potassium, calcium and arsenic in stream sediments. We find that our deep learning approach produces covariates for geostatistical modelling that have surprisingly strong explanatory power on their own, with R-squared values around 0.6 for all three elements (with arsenic on the log scale). These results are achieved without the neural network being provided with easting, northing, or absolute elevation as inputs, and purely reflect the capacity of our deep neural network to extract task-specific information from terrain texture. We hope that these results will inspire further investigation into the capabilities of deep learning within geostatistical applications.

* 13 pages, 8 figures, to be submitted to journal 

Dual Stream Computer-Generated Image Detection Network Based On Channel Joint And Softpool

Jul 07, 2022
Ziyi Xi, Hao Lin, Weiqi Luo

With the development of computer graphics technology, the images synthesized by computer software become more and more closer to the photographs. While computer graphics technology brings us a grand visual feast in the field of games and movies, it may also be utilized by someone with bad intentions to guide public opinions and cause political crisis or social unrest. Therefore, how to distinguish the computer-generated graphics (CG) from the photographs (PG) has become an important topic in the field of digital image forensics. This paper proposes a dual stream convolutional neural network based on channel joint and softpool. The proposed network architecture includes a residual module for extracting image noise information and a joint channel information extraction module for capturing the shallow semantic information of image. In addition, we also design a residual structure to enhance feature extraction and reduce the loss of information in residual flow. The joint channel information extraction module can obtain the shallow semantic information of the input image which can be used as the information supplement block of the residual module. The whole network uses SoftPool to reduce the information loss of down-sampling for image. Finally, we fuse the two flows to get the classification results. Experiments on SPL2018 and DsTok show that the proposed method outperforms existing methods, especially on the DsTok dataset. For example, the performance of our model surpasses the state-of-the-art by a large margin of 3%.

* 7 pages, 4 figures 

Data Mining of Causal Relations from Text: Analysing Maritime Accident Investigation Reports

Jul 09, 2015
Santosh Tirunagari

Text mining is a process of extracting information of interest from text. Such a method includes techniques from various areas such as Information Retrieval (IR), Natural Language Processing (NLP), and Information Extraction (IE). In this study, text mining methods are applied to extract causal relations from maritime accident investigation reports collected from the Marine Accident Investigation Branch (MAIB). These causal relations provide information on various mechanisms behind accidents, including human and organizational factors relating to the accident. The objective of this study is to facilitate the analysis of the maritime accident investigation reports, by means of extracting contributory causes with more feasibility. A careful investigation of contributory causes from the reports provide opportunity to improve safety in future. Two methods have been employed in this study to extract the causal relations. They are 1) Pattern classification method and 2) Connectives method. The earlier one uses naive Bayes and Support Vector Machines (SVM) as classifiers. The latter simply searches for the words connecting cause and effect in sentences. The causal patterns extracted using these two methods are compared to the manual (human expert) extraction. The pattern classification method showed a fair and sensible performance with F-measure(average) = 65% when compared to connectives method with F-measure(average) = 58%. This study is an evidence, that text mining methods could be employed in extracting causal relations from marine accident investigation reports.


Clinical Relationships Extraction Techniques from Patient Narratives

Jun 21, 2013
Wafaa Tawfik Abdel-moneim, Mohamed Hashem Abdel-Aziz, Mohamed Monier Hassan

The Clinical E-Science Framework (CLEF) project was used to extract important information from medical texts by building a system for the purpose of clinical research, evidence-based healthcare and genotype-meets-phenotype informatics. The system is divided into two parts, one part concerns with the identification of relationships between clinically important entities in the text. The full parses and domain-specific grammars had been used to apply many approaches to extract the relationship. In the second part of the system, statistical machine learning (ML) approaches are applied to extract relationship. A corpus of oncology narratives that hand annotated with clinical relationships can be used to train and test a system that has been designed and implemented by supervised machine learning (ML) approaches. Many features can be extracted from these texts that are used to build a model by the classifier. Multiple supervised machine learning algorithms can be applied for relationship extraction. Effects of adding the features, changing the size of the corpus, and changing the type of the algorithm on relationship extraction are examined. Keywords: Text mining; information extraction; NLP; entities; and relations.

* IJCSI International Journal of Computer Science Issues, Vol.10, Issue 1, January 2013 
* 15 pages 13 figures 7 tables 

Method of Tibetan Person Knowledge Extraction

Apr 11, 2016
Yuan Sun, Zhen Zhu

Person knowledge extraction is the foundation of the Tibetan knowledge graph construction, which provides support for Tibetan question answering system, information retrieval, information extraction and other researches, and promotes national unity and social stability. This paper proposes a SVM and template based approach to Tibetan person knowledge extraction. Through constructing the training corpus, we build the templates based the shallow parsing analysis of Tibetan syntactic, semantic features and verbs. Using the training corpus, we design a hierarchical SVM classifier to realize the entity knowledge extraction. Finally, experimental results prove the method has greater improvement in Tibetan person knowledge extraction.

* 6 pages 

Extraction Of Technical Information From Normative Documents Using Automated Methods Based On Ontologies : Application To The Iso 15531 Mandate Standard - Methodology And First Results

Jun 16, 2018
A. F. Cutting-Decelle, A. Digeon, R. I. Young, J. L. Barraud, P. Lamboley

Problems faced by international standardization bodies become more and more crucial as the number and the size of the standards they produce increase. Sometimes, also, the lack of coordination among the committees in charge of the development of standards may lead to overlaps, mistakes or incompatibilities in the documents. The aim of this study is to present a methodology enabling an automatic extraction of the technical concepts (terms) found in normative documents, through the use of semantic tools coming from the field of language processing. The first part of the paper provides a description of the standardization world, its structure, its way of working and the problems faced; we then introduce the concepts of semantic annotation, information extraction and the software tools available in this domain. The next section explains the concept of ontology and its potential use in the field of standardization. We propose here a methodology enabling the extraction of technical information from a given normative corpus, based on a semantic annotation process done according to reference ontologies. The application to the ISO 15531 MANDATE corpus provides a first use case of the methodology described in this paper. The paper ends with the description of the first experimental results produced by this approach, and with some issues and perspectives, notably its application to other standards and, or Technical Committees and the possibility offered to create pre-defined technical dictionaries of terms.

* 28 pages, 11 figures 

Noun-Phrase Analysis in Unrestricted Text for Information Retrieval

May 13, 1996
David A. Evans, Chengxiang Zhai

Information retrieval is an important application area of natural-language processing where one encounters the genuine challenge of processing large quantities of unrestricted natural-language text. This paper reports on the application of a few simple, yet robust and efficient noun-phrase analysis techniques to create better indexing phrases for information retrieval. In particular, we describe a hybrid approach to the extraction of meaningful (continuous or discontinuous) subcompounds from complex noun phrases using both corpus statistics and linguistic heuristics. Results of experiments show that indexing based on such extracted subcompounds improves both recall and precision in an information retrieval system. The noun-phrase analysis techniques are also potentially useful for book indexing and automatic thesaurus extraction.

* Proceedings of the 34th Annual Meeting of Association for Computational Linguistics, Santa Cruz, California, June 24-28, 1996. 17-24. 
* 8 pages, gzipped, uuencoded Postscript file, to appear in ACL'96 

DeepKE: A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population

Jan 24, 2022
Ningyu Zhang, Xin Xu, Liankuan Tao, Haiyang Yu, Hongbin Ye, Xin Xie, Xiang Chen, Zhoubo Li, Lei Li, Xiaozhuan Liang, Yunzhi Yao, Shumin Deng, Wen Zhang, Zhenru Zhang, Chuanqi Tan, Fei Huang, Guozhou Zheng, Huajun Chen

We present a new open-source and extensible knowledge extraction toolkit, called DeepKE (Deep learning based Knowledge Extraction), supporting standard fully supervised, low-resource few-shot and document-level scenarios. DeepKE implements various information extraction tasks, including named entity recognition, relation extraction and attribute extraction. With a unified framework, DeepKE allows developers and researchers to customize datasets and models to extract information from unstructured texts according to their requirements. Specifically, DeepKE not only provides various functional modules and model implementation for different tasks and scenarios but also organizes all components by consistent frameworks to maintain sufficient modularity and extensibility. Besides, we present an online platform in for real-time extraction of various tasks. DeepKE has been equipped with Google Colab tutorials and comprehensive documents for beginners. We release the source code at, with a demo video.

* work in progress 

CompactIE: Compact Facts in Open Information Extraction

May 05, 2022
Farima Fatahi Bayat, Nikita Bhutani, H. V. Jagadish

A major drawback of modern neural OpenIE systems and benchmarks is that they prioritize high coverage of information in extractions over compactness of their constituents. This severely limits the usefulness of OpenIE extractions in many downstream tasks. The utility of extractions can be improved if extractions are compact and share constituents. To this end, we study the problem of identifying compact extractions with neural-based methods. We propose CompactIE, an OpenIE system that uses a novel pipelined approach to produce compact extractions with overlapping constituents. It first detects constituents of the extractions and then links them to build extractions. We train our system on compact extractions obtained by processing existing benchmarks. Our experiments on CaRB and Wire57 datasets indicate that CompactIE finds 1.5x-2x more compact extractions than previous systems, with high precision, establishing a new state-of-the-art performance in OpenIE.


Few-Shot Segmentation with Global and Local Contrastive Learning

Aug 11, 2021
Weide Liu, Zhonghua Wu, Henghui Ding, Fayao Liu, Jie Lin, Guosheng Lin

In this work, we address the challenging task of few-shot segmentation. Previous few-shot segmentation methods mainly employ the information of support images as guidance for query image segmentation. Although some works propose to build cross-reference between support and query images, their extraction of query information still depends on the support images. We here propose to extract the information from the query itself independently to benefit the few-shot segmentation task. To this end, we first propose a prior extractor to learn the query information from the unlabeled images with our proposed global-local contrastive learning. Then, we extract a set of predetermined priors via this prior extractor. With the obtained priors, we generate the prior region maps for query images, which locate the objects, as guidance to perform cross interaction with support features. In such a way, the extraction of query information is detached from the support branch, overcoming the limitation by support, and could obtain more informative query clues to achieve better interaction. Without bells and whistles, the proposed approach achieves new state-of-the-art performance for the few-shot segmentation task on PASCAL-5$^{i}$ and COCO datasets.