Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Information Extraction": models, code, and papers

Shift-of-Perspective Identification Within Legal Cases

Jun 06, 2019
Gathika Ratnayaka, Thejan Rupasinghe, Nisansa de Silva, Viraj Salaka Gamage, Menuka Warushavithana, Amal Shehan Perera

Arguments, counter-arguments, facts, and evidence obtained via documents related to previous court cases are of essential need for legal professionals. Therefore, the process of automatic information extraction from documents containing legal opinions related to court cases can be considered to be of significant importance. This study is focused on the identification of sentences in legal opinion texts which convey different perspectives on a certain topic or entity. We combined several approaches based on semantic analysis, open information extraction, and sentiment analysis to achieve our objective. Then, our methodology was evaluated with the help of human judges. The outcomes of the evaluation demonstrate that our system is successful in detecting situations where two sentences deliver different opinions on the same topic or entity. The proposed methodology can be used to facilitate other information extraction tasks related to the legal domain. One such task is the automated detection of counter arguments for a given argument. Another is the identification of opponent parties in a court case.

Access Paper or Ask Questions

High-Throughput and Language-Agnostic Entity Disambiguation and Linking on User Generated Data

Mar 13, 2017
Preeti Bhargava, Nemanja Spasojevic, Guoning Hu

The Entity Disambiguation and Linking (EDL) task matches entity mentions in text to a unique Knowledge Base (KB) identifier such as a Wikipedia or Freebase id. It plays a critical role in the construction of a high quality information network, and can be further leveraged for a variety of information retrieval and NLP tasks such as text categorization and document tagging. EDL is a complex and challenging problem due to ambiguity of the mentions and real world text being multi-lingual. Moreover, EDL systems need to have high throughput and should be lightweight in order to scale to large datasets and run on off-the-shelf machines. More importantly, these systems need to be able to extract and disambiguate dense annotations from the data in order to enable an Information Retrieval or Extraction task running on the data to be more efficient and accurate. In order to address all these challenges, we present the Lithium EDL system and algorithm - a high-throughput, lightweight, language-agnostic EDL system that extracts and correctly disambiguates 75% more entities than state-of-the-art EDL systems and is significantly faster than them.

* 10 pages, 7 figures, 5 tables, WWW2017, Linked Data on the Web workshop 2017, LDOW'17 
Access Paper or Ask Questions

Information Extraction Using the Structured Language Model

Aug 29, 2001
Ciprian Chelba, Milind Mahajan

The paper presents a data-driven approach to information extraction (viewed as template filling) using the structured language model (SLM) as a statistical parser. The task of template filling is cast as constrained parsing using the SLM. The model is automatically trained from a set of sentences annotated with frame/slot labels and spans. Training proceeds in stages: first a constrained syntactic parser is trained such that the parses on training data meet the specified semantic spans, then the non-terminal labels are enriched to contain semantic information and finally a constrained syntactic+semantic parser is trained on the parse trees resulting from the previous stage. Despite the small amount of training data used, the model is shown to outperform the slot level accuracy of a simple semantic grammar authored manually for the MiPad --- personal information management --- task.

* EMNLP/NAACL 2001 Conference Proceedings 
* EMNLP'01, Pittsburgh; 8 pages 
Access Paper or Ask Questions

Multi-Dimension Fusion Network for Light Field Spatial Super-Resolution using Dynamic Filters

Aug 26, 2020
Qingyan Sun, Shuo Zhang, Song Chang, Lixi Zhu, Youfang Lin

Light field cameras have been proved to be powerful tools for 3D reconstruction and virtual reality applications. However, the limited resolution of light field images brings a lot of difficulties for further information display and extraction. In this paper, we introduce a novel learning-based framework to improve the spatial resolution of light fields. First, features from different dimensions are parallelly extracted and fused together in our multi-dimension fusion architecture. These features are then used to generate dynamic filters, which extract subpixel information from micro-lens images and also implicitly consider the disparity information. Finally, more high-frequency details learned in the residual branch are added to the upsampled images and the final super-resolved light fields are obtained. Experimental results show that the proposed method uses fewer parameters but achieves better performances than other state-of-the-art methods in various kinds of datasets. Our reconstructed images also show sharp details and distinct lines in both sub-aperture images and epipolar plane images.

Access Paper or Ask Questions

Edge-aware Guidance Fusion Network for RGB Thermal Scene Parsing

Dec 09, 2021
Wujie Zhou, Shaohua Dong, Caie Xu, Yaguan Qian

RGB thermal scene parsing has recently attracted increasing research interest in the field of computer vision. However, most existing methods fail to perform good boundary extraction for prediction maps and cannot fully use high level features. In addition, these methods simply fuse the features from RGB and thermal modalities but are unable to obtain comprehensive fused features. To address these problems, we propose an edge-aware guidance fusion network (EGFNet) for RGB thermal scene parsing. First, we introduce a prior edge map generated using the RGB and thermal images to capture detailed information in the prediction map and then embed the prior edge information in the feature maps. To effectively fuse the RGB and thermal information, we propose a multimodal fusion module that guarantees adequate cross-modal fusion. Considering the importance of high level semantic information, we propose a global information module and a semantic information module to extract rich semantic information from the high-level features. For decoding, we use simple elementwise addition for cascaded feature fusion. Finally, to improve the parsing accuracy, we apply multitask deep supervision to the semantic and boundary maps. Extensive experiments were performed on benchmark datasets to demonstrate the effectiveness of the proposed EGFNet and its superior performance compared with state of the art methods. The code and results can be found at

* Accepted by AAAI2022 
Access Paper or Ask Questions

A Measure of Similarity in Textual Data Using Spearman's Rank Correlation Coefficient

Nov 26, 2019
Nino Arsov, Milan Dukovski, Blagoja Evkoski, Stefan Cvetkovski

In the last decade, many diverse advances have occurred in the field of information extraction from data. Information extraction in its simplest form takes place in computing environments, where structured data can be extracted through a series of queries. The continuous expansion of quantities of data have therefore provided an opportunity for knowledge extraction (KE) from a textual document (TD). A typical problem of this kind is the extraction of common characteristics and knowledge from a group of TDs, with the possibility to group such similar TDs in a process known as clustering. In this paper we present a technique for such KE among a group of TDs related to the common characteristics and meaning of their content. Our technique is based on the Spearman's Rank Correlation Coefficient (SRCC), for which the conducted experiments have proven to be comprehensive measure to achieve a high-quality KE.

Access Paper or Ask Questions

VoxSegNet: Volumetric CNNs for Semantic Part Segmentation of 3D Shapes

Sep 01, 2018
Zongji Wang, Feng Lu

Voxel is an important format to represent geometric data, which has been widely used for 3D deep learning in shape analysis due to its generalization ability and regular data format. However, fine-grained tasks like part segmentation require detailed structural information, which increases voxel resolution and thus causes other issues such as the exhaustion of computational resources. In this paper, we propose a novel volumetric convolutional neural network, which could extract discriminative features encoding detailed information from voxelized 3D data under a limited resolution. To this purpose, a spatial dense extraction (SDE) module is designed to preserve the spatial resolution during the feature extraction procedure, alleviating the loss of detail caused by sub-sampling operations such as max-pooling. An attention feature aggregation (AFA) module is also introduced to adaptively select informative features from different abstraction scales, leading to segmentation with both semantic consistency and high accuracy of details. Experiment results on the large-scale dataset demonstrate the effectiveness of our method in 3D shape part segmentation.

* 11 pages, 10 figures 
Access Paper or Ask Questions

Comparative evaluation of CNN architectures for Image Caption Generation

Feb 23, 2021
Sulabh Katiyar, Samir Kumar Borgohain

Aided by recent advances in Deep Learning, Image Caption Generation has seen tremendous progress over the last few years. Most methods use transfer learning to extract visual information, in the form of image features, with the help of pre-trained Convolutional Neural Network models followed by transformation of the visual information using a Caption Generator module to generate the output sentences. Different methods have used different Convolutional Neural Network Architectures and, to the best of our knowledge, there is no systematic study which compares the relative efficacy of different Convolutional Neural Network architectures for extracting the visual information. In this work, we have evaluated 17 different Convolutional Neural Networks on two popular Image Caption Generation frameworks: the first based on Neural Image Caption (NIC) generation model and the second based on Soft-Attention framework. We observe that model complexity of Convolutional Neural Network, as measured by number of parameters, and the accuracy of the model on Object Recognition task does not necessarily co-relate with its efficacy on feature extraction for Image Caption Generation task.

* in International Journal of Advanced Computer Science and Applications, 11(12), 2020 
* Article Published in International Journal of Advanced Computer Science and Applications(IJACSA), Volume 11 Issue 12, 2020 
Access Paper or Ask Questions

ScopeIt: Scoping Task Relevant Sentences in Documents

Feb 23, 2020
Vishwas Suryanarayanan, Barun Patra, Pamela Bhattacharya, Chala Fufa, Charles Lee

Intelligent assistants like Cortana, Siri, Alexa, and Google Assistant are trained to parse information when the conversation is synchronous and short; however, for email-based conversational agents, the communication is asynchronous, and often contains information irrelevant to the assistant. This makes it harder for the system to accurately detect intents, extract entities relevant to those intents and thereby perform the desired action. We present a neural model for scoping relevant information for the agent from a large query. We show that when used as a preprocessing step, the model improves performance of both intent detection and entity extraction tasks. We demonstrate the model's impact on Scheduler (Cortana is the persona of the agent, while Scheduler is the name of the service. We use them interchangeably in the context of this paper.) - a virtual conversational meeting scheduling assistant that interacts asynchronously with users through email. The model helps the entity extraction and intent detection tasks requisite by Scheduler achieve an average gain of 35% in precision without any drop in recall. Additionally, we demonstrate that the same approach can be used for component level analysis in large documents, such as signature block identification.

Access Paper or Ask Questions

Quantifying (Hyper) Parameter Leakage in Machine Learning

Oct 31, 2019
Vasisht Duddu, D. Vijay Rao

Black Box Machine Learning models leak information about the proprietary model parameters and architecture, both through side channels and output predictions. An adversary can thus, exploit this leakage to reconstruct a substitute architecture similar to the target model, violating the model privacy and Intellectual Property. However, all such attacks, infer a subset of the target model attributes and identifying the rest of the architecture and parameters (optimally) is a search problem. Extracting the exact target model is not possible owing to the uncertainty in the inference attack outputs and stochastic nature of the training process. In this work, we propose a probabilistic framework, Airavata, to estimate the leakage in such model extraction attacks. Specifically, we use Bayesian Networks to capture the uncertainty, under the subjective notion of probability, in estimating the target model attributes using various model extraction attacks. We experimentally validate the model under different adversary assumptions commonly adopted by various model extraction attacks to reason about the attack efficacy. Further, this provides a practical approach of inferring actionable knowledge about extracting black box models and identify the best combination of attacks which maximise the knowledge extracted (information leaked) from the target model.

Access Paper or Ask Questions