With increasing urbanization, in recent years there has been a growing interest and need in monitoring and analyzing urban flood events. Social media, as a new data source, can provide real-time information for flood monitoring. The social media posts with locations are often referred to as Volunteered Geographic Information (VGI), which can reveal the spatial pattern of such events. Since more images are shared on social media than ever before, recent research focused on the extraction of flood-related posts by analyzing images in addition to texts. Apart from merely classifying posts as flood relevant or not, more detailed information, e.g. the flood severity, can also be extracted based on image interpretation. However, it has been less tackled and has not yet been applied for flood severity mapping. In this paper, we propose a novel three-step pipeline method to extract and map flood severity information. First, flood relevant images are retrieved with the help of pre-trained convolutional neural networks as feature extractors. Second, the images containing people are further classified into four severity levels by observing the relationship between body parts and their partial inundation, i.e. images are classified according to the water level with respect to different body parts, namely ankle, knee, hip, and chest. Lastly, locations of the Tweets are used for generating a map of estimated flood extent and severity. This pipeline was applied to an image dataset collected during Hurricane Harvey in 2017, as a proof of concept. The results show that VGI can be used as a supplement to remote sensing observations for flood extent mapping and is beneficial, especially for urban areas, where the infrastructure is often occluding water. Based on the extracted water level information, an integrated overview of flood severity can be provided for the early stages of emergency response.
This paper focuses on the problem of unsupervised relation extraction. Existing probabilistic generative model-based relation extraction methods work by extracting sentence features and using these features as inputs to train a generative model. This model is then used to cluster similar relations. However, these methods do not consider correlations between sentences with the same entity pair during training, which can negatively impact model performance. To address this issue, we propose a Clustering-based Unsupervised generative Relation Extraction (CURE) framework that leverages an "Encoder-Decoder" architecture to perform self-supervised learning so the encoder can extract relation information. Given multiple sentences with the same entity pair as inputs, self-supervised learning is deployed by predicting the shortest path between entity pairs on the dependency graph of one of the sentences. After that, we extract the relation information using the well-trained encoder. Then, entity pairs that share the same relation are clustered based on their corresponding relation information. Each cluster is labeled with a few words based on the words in the shortest paths corresponding to the entity pairs in each cluster. These cluster labels also describe the meaning of these relation clusters. We compare the triplets extracted by our proposed framework (CURE) and baseline methods with a ground-truth Knowledge Base. Experimental results show that our model performs better than state-of-the-art models on both New York Times (NYT) and United Nations Parallel Corpus (UNPC) standard datasets.
Keyphrase extraction is a fundamental task in natural language processing and information retrieval that aims to extract a set of phrases with important information from a source document. Identifying important keyphrase is the central component of the keyphrase extraction task, and its main challenge is how to represent information comprehensively and discriminate importance accurately. In this paper, to address these issues, we design a new hyperbolic matching model (HyperMatch) to represent phrases and documents in the same hyperbolic space and explicitly estimate the phrase-document relevance via the Poincar\'e distance as the important score of each phrase. Specifically, to capture the hierarchical syntactic and semantic structure information, HyperMatch takes advantage of the hidden representations in multiple layers of RoBERTa and integrates them as the word embeddings via an adaptive mixing layer. Meanwhile, considering the hierarchical structure hidden in the document, HyperMatch embeds both phrases and documents in the same hyperbolic space via a hyperbolic phrase encoder and a hyperbolic document encoder. This strategy can further enhance the estimation of phrase-document relevance due to the good properties of hyperbolic space. In this setting, the keyphrase extraction can be taken as a matching problem and effectively implemented by minimizing a hyperbolic margin-based triplet loss. Extensive experiments are conducted on six benchmarks and demonstrate that HyperMatch outperforms the state-of-the-art baselines.
Attention mechanism enables the Graph Neural Networks(GNNs) to learn the attention weights between the target node and its one-hop neighbors, the performance is further improved. However, the most existing GNNs are oriented to homogeneous graphs and each layer can only aggregate the information of one-hop neighbors. Stacking multi-layer networks will introduce a lot of noise and easily lead to over smoothing. We propose a Multi-hop Heterogeneous Neighborhood information Fusion graph representation learning method (MHNF). Specifically, we first propose a hybrid metapath autonomous extraction model to efficiently extract multi-hop hybrid neighbors. Then, we propose a hop-level heterogeneous Information aggregation model, which selectively aggregates different-hop neighborhood information within the same hybrid metapath. Finally, a hierarchical semantic attention fusion model (HSAF) is proposed, which can efficiently integrate different-hop and different-path neighborhood information respectively. This paper can solve the problem of aggregating the multi-hop neighborhood information and can learn hybrid metapaths for target task, reducing the limitation of manually specifying metapaths. In addition, HSAF can extract the internal node information of the metapaths and better integrate the semantic information of different levels. Experimental results on real datasets show that MHNF is superior to state-of-the-art methods in node classification and clustering tasks (10.94% - 69.09% and 11.58% - 394.93% relative improvement on average, respectively).
Scanned receipts OCR and key information extraction (SROIE) represent the processeses of recognizing text from scanned receipts and extracting key texts from them and save the extracted tests to structured documents. SROIE plays critical roles for many document analysis applications and holds great commercial potentials, but very little research works and advances have been published in this area. In recognition of the technical challenges, importance and huge commercial potentials of SROIE, we organized the ICDAR 2019 competition on SROIE. In this competition, we set up three tasks, namely, Scanned Receipt Text Localisation (Task 1), Scanned Receipt OCR (Task 2) and Key Information Extraction from Scanned Receipts (Task 3). A new dataset with 1000 whole scanned receipt images and annotations is created for the competition. In this report we will presents the motivation, competition datasets, task definition, evaluation protocol, submission statistics, performance of submitted methods and results analysis.
Knowledge about entities and their interrelations is a crucial factor of success for tasks like question answering or text summarization. Publicly available knowledge graphs like Wikidata or DBpedia are, however, far from being complete. In this paper, we explore how information extracted from similar entities that co-occur in structures like tables or lists can help to increase the coverage of such knowledge graphs. In contrast to existing approaches, we do not focus on relationships within a listing (e.g., between two entities in a table row) but on the relationship between a listing's subject entities and the context of the listing. To that end, we propose a descriptive rule mining approach that uses distant supervision to derive rules for these relationships based on a listing's context. Extracted from a suitable data corpus, the rules can be used to extend a knowledge graph with novel entities and assertions. In our experiments we demonstrate that the approach is able to extract up to 3M novel entities and 30M additional assertions from listings in Wikipedia. We find that the extracted information is of high quality and thus suitable to extend Wikipedia-based knowledge graphs like DBpedia, YAGO, and CaLiGraph. For the case of DBpedia, this would result in an increase of covered entities by roughly 50%.
Visually rich documents (VRDs) are ubiquitous in daily business and life. Examples are purchase receipts, insurance policy documents, custom declaration forms and so on. In VRDs, visual and layout information is critical for document understanding, and texts in such documents cannot be serialized into the one-dimensional sequence without losing information. Classic information extraction models such as BiLSTM-CRF typically operate on text sequences and do not incorporate visual features. In this paper, we introduce a graph convolution based model to combine textual and visual information presented in VRDs. Graph embeddings are trained to summarize the context of a text segment in the document, and further combined with text embeddings for entity extraction. Extensive experiments have been conducted to show that our method outperforms BiLSTM-CRF baselines by significant margins, on two real-world datasets. Additionally, ablation studies are also performed to evaluate the effectiveness of each component of our model.
We introduce an advanced information extraction pipeline to automatically process very large collections of unstructured textual data for the purpose of investigative journalism. The pipeline serves as a new input processor for the upcoming major release of our New/s/leak 2.0 software, which we develop in cooperation with a large German news organization. The use case is that journalists receive a large collection of files up to several Gigabytes containing unknown contents. Collections may originate either from official disclosures of documents, e.g. Freedom of Information Act requests, or unofficial data leaks. Our software prepares a visually-aided exploration of the collection to quickly learn about potential stories contained in the data. It is based on the automatic extraction of entities and their co-occurrence in documents. In contrast to comparable projects, we focus on the following three major requirements particularly serving the use case of investigative journalism in cross-border collaborations: 1) composition of multiple state-of-the-art NLP tools for entity extraction, 2) support of multi-lingual document sets up to 40 languages, 3) fast and easy-to-use extraction of full-text, metadata and entities from various file formats.
In this paper, we present a novel multi-channel speech extraction system to simultaneously extract multiple clean individual sources from a mixture in noisy and reverberant environments. The proposed method is built on an improved multi-channel time-domain speech separation network which employs speaker embeddings to identify and extract multiple targets without label permutation ambiguity. To efficiently inform the speaker information to the extraction model, we propose a new speaker conditioning mechanism by designing an additional speaker branch for receiving external speaker embeddings. Experiments on 2-channel WHAMR! data show that the proposed system improves by 9% relative the source separation performance over a strong multi-channel baseline, and it increases the speech recognition accuracy by more than 16% relative over the same baseline.