Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Information Extraction": models, code, and papers

Differentiable Supervector Extraction for Encoding Speaker and Phrase Information in Text Dependent Speaker Verification

Dec 22, 2018
Victoria Mingote, Antonio Miguel, Alfonso Ortega, Eduardo Lleida

In this paper, we propose a new differentiable neural network alignment mechanism for text-dependent speaker verification which uses alignment models to produce a supervector representation of an utterance. Unlike previous works with similar approaches, we do not extract the embedding of an utterance from the mean reduction of the temporal dimension. Our system replaces the mean by a phrase alignment model to keep the temporal structure of each phrase which is relevant in this application since the phonetic information is part of the identity in the verification task. Moreover, we can apply a convolutional neural network as front-end, and thanks to the alignment process being differentiable, we can train the whole network to produce a supervector for each utterance which will be discriminative with respect to the speaker and the phrase simultaneously. As we show, this choice has the advantage that the supervector encodes the phrase and speaker information providing good performance in text-dependent speaker verification tasks. In this work, the process of verification is performed using a basic similarity metric, due to simplicity, compared to other more elaborate models that are commonly used. The new model using alignment to produce supervectors was tested on the RSR2015-Part I database for text-dependent speaker verification, providing competitive results compared to similar size networks using the mean to extract embeddings.

* 5 pages, IberSPEECH 2018 
Access Paper or Ask Questions

Major Limitations of Satellite images

Jul 09, 2013
Firouz A. Al-Wassai, N. V. Kalyankar

Remote sensing has proven to be a powerful tool for the monitoring of the Earth surface to improve our perception of our surroundings has led to unprecedented developments in sensor and information technologies. However, technologies for effective use of the data and for extracting useful information from the data of Remote sensing are still very limited since no single sensor combines the optimal spectral, spatial and temporal resolution. This paper briefly reviews the limitations of satellite remote sensing. Also, reviews on the problems of image fusion techniques. The conclusion of this, According to literature, the remote sensing is still the lack of software tools for effective information extraction from remote sensing data. The trade-off in spectral and spatial resolution will remain and new advanced data fusion approaches are needed to make optimal use of remote sensors for extract the most useful information.

* Journal of Global Research in Computer Science, 4 (5), May 2013, 51-59 
Access Paper or Ask Questions

Mining Frequent Patterns in Process Models

Oct 11, 2017
David Chapela-Campa, Manuel Mucientes, Manuel Lama

Process mining has emerged as a way to analyze the behavior of an organization by extracting knowledge from event logs and by offering techniques to discover, monitor and enhance real processes. In the discovery of process models, retrieving a complex one, i.e., a hardly readable process model, can hinder the extraction of information. Even in well-structured process models, there is information that cannot be obtained with the current techniques. In this paper, we present WoMine, an algorithm to retrieve frequent behavioural patterns from the model. Our approach searches in process models extracting structures with sequences, selections, parallels and loops, which are frequently executed in the logs. This proposal has been validated with a set of process models, including some from BPI Challenges, and compared with the state of the art techniques. Experiments have validated that WoMine can find all types of patterns, extracting information that cannot be mined with the state of the art techniques.

Access Paper or Ask Questions

Webpage Segmentation for Extracting Images and Their Surrounding Contextual Information

May 18, 2020
F. Fauzi, H. J. Long, M. Belkhatir

Web images come in hand with valuable contextual information. Although this information has long been mined for various uses such as image annotation, clustering of images, inference of image semantic content, etc., insufficient attention has been given to address issues in mining this contextual information. In this paper, we propose a webpage segmentation algorithm targeting the extraction of web images and their contextual information based on their characteristics as they appear on webpages. We conducted a user study to obtain a human-labeled dataset to validate the effectiveness of our method and experiments demonstrated that our method can achieve better results compared to an existing segmentation algorithm.

* arXiv admin note: substantial text overlap with arXiv:2005.02156 
Access Paper or Ask Questions

An Effective Approach to Biomedical Information Extraction with Limited Training Data

Sep 12, 2011
Siddhartha Jonnalagadda

Overall, the two main contributions of this work include the application of sentence simplification to association extraction as described above, and the use of distributional semantics for concept extraction. The proposed work on concept extraction amalgamates for the first time two diverse research areas -distributional semantics and information extraction. This approach renders all the advantages offered in other semi-supervised machine learning systems, and, unlike other proposed semi-supervised approaches, it can be used on top of different basic frameworks and algorithms.

* Jonnalagadda S. An effective approach to biomedical information extraction with limited training data (PhD Dissertation, Arizona State University). 2011; 
* This paper has been withdrawn 
Access Paper or Ask Questions

A Real World Implementation of Answer Extraction

Jan 14, 2000
D. Molla, J. Berri, M. Hess

In this paper we describe ExtrAns, an answer extraction system. Answer extraction (AE) aims at retrieving those exact passages of a document that directly answer a given user question. AE is more ambitious than information retrieval and information extraction in that the retrieval results are phrases, not entire documents, and in that the queries may be arbitrarily specific. It is less ambitious than full-fledged question answering in that the answers are not generated from a knowledge base but looked up in the text of documents. The current version of ExtrAns is able to parse unedited Unix "man pages", and derive the logical form of their sentences. User queries are also translated into logical forms. A theorem prover then retrieves the relevant phrases, which are presented through selective highlighting in their context.

* Proc. of 9th International Conference and Workshop on Database and Expert Systems. Workshop "Natural Language and Information Systems" (NLIS'98). Vienna: 1998 
* 5 pages 
Access Paper or Ask Questions

An Integrated, Conditional Model of Information Extraction and Coreference with Applications to Citation Matching

Jul 11, 2012
Ben Wellner, Andrew McCallum, Fuchun Peng, Michael Hay

Although information extraction and coreference resolution appear together in many applications, most current systems perform them as ndependent steps. This paper describes an approach to integrated inference for extraction and coreference based on conditionally-trained undirected graphical models. We discuss the advantages of conditional probability training, and of a coreference model structure based on graph partitioning. On a data set of research paper citations, we show significant reduction in error by using extraction uncertainty to improve coreference citation matching accuracy, and using coreference to improve the accuracy of the extracted fields.

* Appears in Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI2004) 
Access Paper or Ask Questions

Faithful to the Original: Fact Aware Neural Abstractive Summarization

Nov 13, 2017
Ziqiang Cao, Furu Wei, Wenjie Li, Sujian Li

Unlike extractive summarization, abstractive summarization has to fuse different parts of the source text, which inclines to create fake facts. Our preliminary study reveals nearly 30% of the outputs from a state-of-the-art neural summarization system suffer from this problem. While previous abstractive summarization approaches usually focus on the improvement of informativeness, we argue that faithfulness is also a vital prerequisite for a practical abstractive summarization system. To avoid generating fake facts in a summary, we leverage open information extraction and dependency parse technologies to extract actual fact descriptions from the source text. The dual-attention sequence-to-sequence framework is then proposed to force the generation conditioned on both the source text and the extracted fact descriptions. Experiments on the Gigaword benchmark dataset demonstrate that our model can greatly reduce fake summaries by 80%. Notably, the fact descriptions also bring significant improvement on informativeness since they often condense the meaning of the source text.

* 8 pages, 3 figures, AAAI 2018 
Access Paper or Ask Questions

Mining Measured Information from Text

May 05, 2015
Arun S. Maiya, Dale Visser, Andrew Wan

We present an approach to extract measured information from text (e.g., a 1370 degrees C melting point, a BMI greater than 29.9 kg/m^2 ). Such extractions are critically important across a wide range of domains - especially those involving search and exploration of scientific and technical documents. We first propose a rule-based entity extractor to mine measured quantities (i.e., a numeric value paired with a measurement unit), which supports a vast and comprehensive set of both common and obscure measurement units. Our method is highly robust and can correctly recover valid measured quantities even when significant errors are introduced through the process of converting document formats like PDF to plain text. Next, we describe an approach to extracting the properties being measured (e.g., the property "pixel pitch" in the phrase "a pixel pitch as high as 352 {\mu}m"). Finally, we present MQSearch: the realization of a search engine with full support for measured information.

* 4 pages; 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '15) 
Access Paper or Ask Questions

High Order Local Directional Pattern Based Pyramidal Multi-structure for Robust Face Recognition

Dec 12, 2020
Almabrok Essa, Vijayan Asari

Derived from a general definition of texture in a local neighborhood, local directional pattern (LDP) encodes the directional information in the small local 3x3 neighborhood of a pixel, which may fail to extract detailed information especially during changes in the input image due to illumination variations. Therefore, in this paper we introduce a novel feature extraction technique that calculates the nth order direction variation patterns, named high order local directional pattern (HOLDP). The proposed HOLDP can capture more detailed discriminative information than the conventional LDP. Unlike the LDP operator, our proposed technique extracts nth order local information by encoding various distinctive spatial relationships from each neighborhood layer of a pixel in the pyramidal multi-structure way. Then we concatenate the feature vector of each neighborhood layer to form the final HOLDP feature vector. The performance evaluation of the proposed HOLDP algorithm is conducted on several publicly available face databases and observed the superiority of HOLDP under extreme illumination conditions.

* 9 pages, 10 figures 
Access Paper or Ask Questions