Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Information Extraction": models, code, and papers

A Proposed Artificial intelligence Model for Real-Time Human Action Localization and Tracking

Nov 09, 2019
Ahmed Ali Hammam, Mona Soliman, Aboul Ella Hassanien

In recent years, artificial intelligence (AI) based on deep learning (DL) has sparked tremendous global interest. DL is widely used today and has expanded into various interesting areas. It is becoming more popular in cross-subject research, such as studies of smart city systems, which combine computer science with engineering applications. Human action detection is one of these areas. Human action detection is an interesting challenge due to its stringent requirements in terms of computing speed and accuracy. High-accuracy real-time object tracking is also considered a significant challenge. This paper integrates the YOLO detection network, which is considered a state-of-the-art tool for real-time object detection, with motion vectors and the Coyote Optimization Algorithm (COA) to construct a real-time human action localization and tracking system. The proposed system starts with the extraction of motion information from a compressed video stream and the extraction of appearance information from RGB frames using an object detector. Then, a fusion step between the two streams is performed, and the results are fed into the proposed action tracking model. The COA is used in object tracking due to its accuracy and fast convergence. The basic foundation of the proposed model is the utilization of motion vectors, which already exist in a compressed video bit stream and provide sufficient information to improve the localization of the target action without requiring high consumption of computational resources compared with other popular methods of extracting motion information, such as optical flows. This advantage allows the proposed approach to be implemented in challenging environments where the computational resources are limited, such as Internet of Things (IoT) systems.


Neural Medication Extraction: A Comparison of Recent Models in Supervised and Semi-supervised Learning Settings

Oct 19, 2021
Ali Can Kocabiyikoglu, François Portet, Raheel Qader, Jean-Marc Babouchkine

Drug prescriptions are essential information that must be encoded in electronic medical records. However, much of this information is hidden within free-text reports. This is why the medication extraction task has emerged. To date, most of the research effort has focused on small amount of data and has only recently considered deep learning methods. In this paper, we present an independent and comprehensive evaluation of state-of-the-art neural architectures on the I2B2 medical prescription extraction task both in the supervised and semi-supervised settings. The study shows the very competitive performance of simple DNN models on the task as well as the high interest of pre-trained models. Adapting the latter models on the I2B2 dataset enables to push medication extraction performances above the state-of-the-art. Finally, the study also confirms that semi-supervised techniques are promising to leverage large amounts of unlabeled data in particular in low resource setting when labeled data is too costly to acquire.

* IEEE International Conference on Healthcare Informatics (ICHI 2021) 

Pixel-Wise PolSAR Image Classification via a Novel Complex-Valued Deep Fully Convolutional Network

Sep 29, 2019
Yice Cao, Yan Wu, Peng Zhang, Wenkai Liang, Ming Li

Although complex-valued (CV) neural networks have shown better classification results compared to their real-valued (RV) counterparts for polarimetric synthetic aperture radar (PolSAR) classification, the extension of pixel-level RV networks to the complex domain has not yet thoroughly examined. This paper presents a novel complex-valued deep fully convolutional neural network (CV-FCN) designed for PolSAR image classification. Specifically, CV-FCN uses PolSAR CV data that includes the phase information and utilizes the deep FCN architecture that performs pixel-level labeling. It integrates the feature extraction module and the classification module in a united framework. Technically, for the particularity of PolSAR data, a dedicated complex-valued weight initialization scheme is defined to initialize CV-FCN. It considers the distribution of polarization data to conduct CV-FCN training from scratch in an efficient and fast manner. CV-FCN employs a complex downsampling-then-upsampling scheme to extract dense features. To enrich discriminative information, multi-level CV features that retain more polarization information are extracted via the complex downsampling scheme. Then, a complex upsampling scheme is proposed to predict dense CV labeling. It employs complex max-unpooling layers to greatly capture more spatial information for better robustness to speckle noise. In addition, to achieve faster convergence and obtain more precise classification results, a novel average cross-entropy loss function is derived for CV-FCN optimization. Experiments on real PolSAR datasets demonstrate that CV-FCN achieves better classification performance than other state-of-art methods.

* 17 pages, 12 figures, first submission on May 20th, 2019 

End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures

Jun 08, 2016
Makoto Miwa, Mohit Bansal

We present a novel end-to-end neural model to extract entities and relations between them. Our recurrent neural network based model captures both word sequence and dependency tree substructure information by stacking bidirectional tree-structured LSTM-RNNs on bidirectional sequential LSTM-RNNs. This allows our model to jointly represent both entities and relations with shared parameters in a single model. We further encourage detection of entities during training and use of entity information in relation extraction via entity pretraining and scheduled sampling. Our model improves over the state-of-the-art feature-based model on end-to-end relation extraction, achieving 12.1% and 5.7% relative error reductions in F1-score on ACE2005 and ACE2004, respectively. We also show that our LSTM-RNN based model compares favorably to the state-of-the-art CNN based model (in F1-score) on nominal relation classification (SemEval-2010 Task 8). Finally, we present an extensive ablation analysis of several model components.

* Accepted for publication at the Association for Computational Linguistics (ACL), 2016. 13 pages, 1 figure, 6 tables 

Spontaneous Facial Micro-Expression Recognition using Discriminative Spatiotemporal Local Binary Pattern with an Improved Integral Projection

Aug 07, 2016
Xiaohua Huang, Sujing Wang, Xin Liu, Guoying Zhao, Xiaoyi Feng, Matti Pietikainen

Recently, there are increasing interests in inferring mirco-expression from facial image sequences. Due to subtle facial movement of micro-expressions, feature extraction has become an important and critical issue for spontaneous facial micro-expression recognition. Recent works usually used spatiotemporal local binary pattern for micro-expression analysis. However, the commonly used spatiotemporal local binary pattern considers dynamic texture information to represent face images while misses the shape attribute of face images. On the other hand, their works extracted the spatiotemporal features from the global face regions, which ignore the discriminative information between two micro-expression classes. The above-mentioned problems seriously limit the application of spatiotemporal local binary pattern on micro-expression recognition. In this paper, we propose a discriminative spatiotemporal local binary pattern based on an improved integral projection to resolve the problems of spatiotemporal local binary pattern for micro-expression recognition. Firstly, we develop an improved integral projection for preserving the shape attribute of micro-expressions. Furthermore, an improved integral projection is incorporated with local binary pattern operators across spatial and temporal domains. Specifically, we extract the novel spatiotemporal features incorporating shape attributes into spatiotemporal texture features. For increasing the discrimination of micro-expressions, we propose a new feature selection based on Laplacian method to extract the discriminative information for facial micro-expression recognition. Intensive experiments are conducted on three availably published micro-expression databases. We compare our method with the state-of-the-art algorithms. Experimental results demonstrate that our proposed method achieves promising performance for micro-expression recognition.

* 13pages, 8 figures, 5 tables, submitted to IEEE Transactions on Image Processing 

CERES: Distantly Supervised Relation Extraction from the Semi-Structured Web

Apr 12, 2018
Colin Lockard, Xin Luna Dong, Arash Einolghozati, Prashant Shiralkar

The web contains countless semi-structured websites, which can be a rich source of information for populating knowledge bases. Existing methods for extracting relations from the DOM trees of semi-structured webpages can achieve high precision and recall only when manual annotations for each website are available. Although there have been efforts to learn extractors from automatically-generated labels, these methods are not sufficiently robust to succeed in settings with complex schemas and information-rich websites. In this paper we present a new method for automatic extraction from semi-structured websites based on distant supervision. We automatically generate training labels by aligning an existing knowledge base with a web page and leveraging the unique structural characteristics of semi-structured websites. We then train a classifier based on the potentially noisy and incomplete labels to predict new relation instances. Our method can compete with annotation-based techniques in the literature in terms of extraction quality. A large-scale experiment on over 400,000 pages from dozens of multi-lingual long-tail websites harvested 1.25 million facts at a precision of 90%.

* Expanded version of paper under review for VLDB 

Task-Oriented Communication for Multi-Device Cooperative Edge Inference

Sep 01, 2021
Jiawei Shao, Yuyi Mao, Jun Zhang

This paper investigates task-oriented communication for multi-device cooperative edge inference, where a group of distributed low-end edge devices transmit the extracted features of local samples to a powerful edge server for inference. While cooperative edge inference can overcome the limited sensing capability of a single device, it substantially increases the communication overhead and may incur excessive latency. To enable low-latency cooperative inference, we propose a learning-based communication scheme that optimizes local feature extraction and distributed feature encoding in a task-oriented manner, i.e., to remove data redundancy and transmit information that is essential for the downstream inference task rather than reconstructing the data samples at the edge server. Specifically, we leverage an information bottleneck (IB) principle to extract the task-relevant feature at each edge device and adopt a distributed information bottleneck (DIB) framework to formalize a single-letter characterization of the optimal rate-relevance tradeoff for distributed feature encoding. To admit flexible control of the communication overhead, we extend the DIB framework to a distributed deterministic information bottleneck (DDIB) objective that explicitly incorporates the representational costs of the encoded features. As the IB-based objectives are computationally prohibitive for high-dimensional data, we adopt variational approximations to make the optimization problems tractable. To compensate the potential performance loss due to the variational approximations, we also develop a selective retransmission (SR) mechanism to identify the redundancy in the encoded features of multiple edge devices to attain additional communication overhead reduction. Extensive experiments evidence that the proposed task-oriented communication scheme achieves a better rate-relevance tradeoff than baseline methods.


An Empirical Study on Leveraging Position Embeddings for Target-oriented Opinion Words Extraction

Sep 02, 2021
Samuel Mensah, Kai Sun, Nikolaos Aletras

Target-oriented opinion words extraction (TOWE) (Fan et al., 2019b) is a new subtask of target-oriented sentiment analysis that aims to extract opinion words for a given aspect in text. Current state-of-the-art methods leverage position embeddings to capture the relative position of a word to the target. However, the performance of these methods depends on the ability to incorporate this information into word representations. In this paper, we explore a variety of text encoders based on pretrained word embeddings or language models that leverage part-of-speech and position embeddings, aiming to examine the actual contribution of each component in TOWE. We also adapt a graph convolutional network (GCN) to enhance word representations by incorporating syntactic information. Our experimental results demonstrate that BiLSTM-based models can effectively encode position information into word representations while using a GCN only achieves marginal gains. Interestingly, our simple methods outperform several state-of-the-art complex neural structures.

* Accepted at EMNLP 2021 

Multi scale Feature Extraction and Fusion for Online Knowledge Distillation

Jun 16, 2022
Panpan Zou, Yinglei Teng, Tao Niu

Online knowledge distillation conducts knowledge transfer among all student models to alleviate the reliance on pre-trained models. However, existing online methods rely heavily on the prediction distributions and neglect the further exploration of the representational knowledge. In this paper, we propose a novel Multi-scale Feature Extraction and Fusion method (MFEF) for online knowledge distillation, which comprises three key components: Multi-scale Feature Extraction, Dual-attention and Feature Fusion, towards generating more informative feature maps for distillation. The multiscale feature extraction exploiting divide-and-concatenate in channel dimension is proposed to improve the multi-scale representation ability of feature maps. To obtain more accurate information, we design a dual-attention to strengthen the important channel and spatial regions adaptively. Moreover, we aggregate and fuse the former processed feature maps via feature fusion to assist the training of student models. Extensive experiments on CIF AR-10, CIF AR-100, and CINIC-10 show that MFEF transfers more beneficial representational knowledge for distillation and outperforms alternative methods among various network architectures

* 12 pages, 3 figures 

United adversarial learning for liver tumor segmentation and detection of multi-modality non-contrast MRI

Jan 07, 2022
Jianfeng Zhao, Dengwang Li, Shuo Li

Simultaneous segmentation and detection of liver tumors (hemangioma and hepatocellular carcinoma (HCC)) by using multi-modality non-contrast magnetic resonance imaging (NCMRI) are crucial for the clinical diagnosis. However, it is still a challenging task due to: (1) the HCC information on NCMRI is invisible or insufficient makes extraction of liver tumors feature difficult; (2) diverse imaging characteristics in multi-modality NCMRI causes feature fusion and selection difficult; (3) no specific information between hemangioma and HCC on NCMRI cause liver tumors detection difficult. In this study, we propose a united adversarial learning framework (UAL) for simultaneous liver tumors segmentation and detection using multi-modality NCMRI. The UAL first utilizes a multi-view aware encoder to extract multi-modality NCMRI information for liver tumor segmentation and detection. In this encoder, a novel edge dissimilarity feature pyramid module is designed to facilitate the complementary multi-modality feature extraction. Second, the newly designed fusion and selection channel is used to fuse the multi-modality feature and make the decision of the feature selection. Then, the proposed mechanism of coordinate sharing with padding integrates the multi-task of segmentation and detection so that it enables multi-task to perform united adversarial learning in one discriminator. Lastly, an innovative multi-phase radiomics guided discriminator exploits the clear and specific tumor information to improve the multi-task performance via the adversarial learning strategy. The UAL is validated in corresponding multi-modality NCMRI (i.e. T1FS pre-contrast MRI, T2FS MRI, and DWI) and three phases contrast-enhanced MRI of 255 clinical subjects. The experiments show that UAL has great potential in the clinical diagnosis of liver tumors.