Information extraction from conversational data is particularly challenging because the task-centric nature of conversation allows for effective communication of implicit information by humans, but is challenging for machines. The challenges may differ between utterances depending on the role of the speaker within the conversation, especially when relevant expertise is distributed asymmetrically across roles. Further, the challenges may also increase over the conversation as more shared context is built up through information communicated implicitly earlier in the dialogue. In this paper, we propose the novel modeling approach MedFilter, which addresses these insights in order to increase performance at identifying and categorizing task-relevant utterances, and in so doing, positively impacts performance at a downstream information extraction task. We evaluate this approach on a corpus of nearly 7,000 doctor-patient conversations where MedFilter is used to identify medically relevant contributions to the discussion (achieving a 10% improvement over SOTA baselines in terms of area under the PR curve). Identifying task-relevant utterances benefits downstream medical processing, achieving improvements of 15%, 105%, and 23% respectively for the extraction of symptoms, medications, and complaints.
Event Extraction plays an important role in information-extraction to understand the world. Event extraction could be split into two subtasks: one is event trigger extraction, the other is event arguments extraction. However, the F-Score of event arguments extraction is much lower than that of event trigger extraction, i.e. in the most recent work, event trigger extraction achieves 80.7%, while event arguments extraction achieves only 58%. In pipelined structures, the difficulty of event arguments extraction lies in its lack of classification feature, and the much higher computation consumption. In this work, we proposed a novel Event Extraction approach based on multi-layer Dilate Gated Convolutional Neural Network (EE-DGCNN) which has fewer parameters. In addition, enhanced local information is incorporated into word features, to assign event arguments roles for triggers predicted by the first subtask. The numerical experiments demonstrated significant performance improvement beyond state-of-art event extraction approaches on real-world datasets. Further analysis of extraction procedure is presented, as well as experiments are conducted to analyze impact factors related to the performance improvement.
Purpose: Terminology is the set of technical words or expressions used in specific contexts, which denotes the core concept in a formal discipline and is usually applied in the fields of machine translation, information retrieval, information extraction and text categorization, etc. Bilingual terminology extraction plays an important role in the application of bilingual dictionary compilation, bilingual Ontology construction, machine translation and cross-language information retrieval etc. This paper addresses the issues of monolingual terminology extraction and bilingual term alignment based on multi-level termhood. Design/methodology/approach: A method based on multi-level termhood is proposed. The new method computes the termhood of the terminology candidate as well as the sentence that includes the terminology by the comparison of the corpus. Since terminologies and general words usually have differently distribution in the corpus, termhood can also be used to constrain and enhance the performance of term alignment when aligning bilingual terms on the parallel corpus. In this paper, bilingual term alignment based on termhood constraints is presented. Findings: Experiment results show multi-level termhood can get better performance than existing method for terminology extraction. If termhood is used as constrain factor, the performance of bilingual term alignment can be improved.
In the speaker extraction problem, it is found that additional information from the target speaker contributes to the tracking and extraction of the target speaker, which includes voiceprint, lip movement, facial expression, and spatial information. However, no one cares for the cue of sound onset, which has been emphasized in the auditory scene analysis and psychology. Inspired by it, we explicitly modeled the onset cue and verified the effectiveness in the speaker extraction task. We further extended to the onset/offset cues and got performance improvement. From the perspective of tasks, our onset/offset-based model completes the composite task, a complementary combination of speaker extraction and speaker-dependent voice activity detection. We also combined voiceprint with onset/offset cues. Voiceprint models voice characteristics of the target while onset/offset models the start/end information of the speech. From the perspective of auditory scene analysis, the combination of two perception cues can promote the integrity of the auditory object. The experiment results are also close to state-of-the-art performance, using nearly half of the parameters. We hope that this work will inspire communities of speech processing and psychology, and contribute to communication between them. Our code will be available in https://github.com/aispeech-lab/wase/.
During the past decade, neural networks have become prominent in Natural Language Processing (NLP), notably for their capacity to learn relevant word representations from large unlabeled corpora. These word embeddings can then be transferred and finetuned for diverse end applications during a supervised training phase. More recently, in 2018, the transfer of entire pretrained Language Models and the preservation of their contextualization capacities enabled to reach unprecedented performance on virtually every NLP benchmark, sometimes even outperforming human baselines. However, as models reach such impressive scores, their comprehension abilities still appear as shallow, which reveal limitations of benchmarks to provide useful insights on their factors of performance and to accurately measure understanding capabilities. In this thesis, we study the behaviour of state-of-the-art models regarding generalization to facts unseen during training in two important Information Extraction tasks: Named Entity Recognition (NER) and Relation Extraction (RE). Indeed, traditional benchmarks present important lexical overlap between mentions and relations used for training and evaluating models, whereas the main interest of Information Extraction is to extract previously unknown information. We propose empirical studies to separate performance based on mention and relation overlap with the training set and find that pretrained Language Models are mainly beneficial to detect unseen mentions, in particular out-of-domain. While this makes them suited for real use cases, there is still a gap in performance between seen and unseen mentions that hurts generalization to new facts. In particular, even state-of-the-art ERE models rely on a shallow retention heuristic, basing their prediction more on arguments surface forms than context.
When a disaster occurs, maintaining and restoring community lifelines subsequently require collective efforts from various stakeholders. Aiming at reducing the efforts associated with generating Stakeholder Collaboration Networks (SCNs), this paper proposes a systematic approach to reliable information extraction for stakeholder collaboration and automated network generation. Specifically, stakeholders and their interactions are extracted from texts through Named Entity Recognition (NER), one of the techniques in natural language processing. Once extracted, the collaboration information is transformed into structured datasets to generate the SCNs automatically. A case study of stakeholder collaboration during Hurricane Harvey was investigated and it had demonstrated the feasibility and applicability of the proposed method. Hence, the proposed approach was proved to significantly reduce practitioners' interpretation and data collection workloads. In the end, discussions and future work are provided.
In this work, we aim to address the challenging task of open set recognition (OSR). Many recent OSR methods rely on auto-encoders to extract class-specific features by a reconstruction strategy, requiring the network to restore the input image on pixel-level. This strategy is commonly over-demanding for OSR since class-specific features are generally contained in target objects, not in all pixels. To address this shortcoming, here we discard the pixel-level reconstruction strategy and pay more attention to improving the effectiveness of class-specific feature extraction. We propose a mutual information-based method with a streamlined architecture, Maximal Mutual Information Open Set Recognition (M2IOSR). The proposed M2IOSR only uses an encoder to extract class-specific features by maximizing the mutual information between the given input and its latent features across multiple scales. Meanwhile, to further reduce the open space risk, latent features are constrained to class conditional Gaussian distributions by a KL-divergence loss function. In this way, a strong function is learned to prevent the network from mapping different observations to similar latent features and help the network extract class-specific features with desired statistical characteristics. The proposed method significantly improves the performance of baselines and achieves new state-of-the-art results on several benchmarks consistently.
Information Extraction (IE) refers to automatically extracting structured relation tuples from unstructured texts. Common IE solutions, including Relation Extraction (RE) and open IE systems, can hardly handle cross-sentence tuples, and are severely restricted by limited relation types as well as informal relation specifications (e.g., free-text based relation tuples). In order to overcome these weaknesses, we propose a novel IE framework named QA4IE, which leverages the flexible question answering (QA) approaches to produce high quality relation triples across sentences. Based on the framework, we develop a large IE benchmark with high quality human evaluation. This benchmark contains 293K documents, 2M golden relation triples, and 636 relation types. We compare our system with some IE baselines on our benchmark and the results show that our system achieves great improvements.
The urgency of mitigating COVID-19 has spawned a large and diverse body of scientific literature that is challenging for researchers to navigate. This explosion of information has stimulated interest in automated tools to help identify useful knowledge. We have pursued the use of methods for extracting diverse forms of mechanism relations from the natural language of scientific papers. We seek to identify concepts in COVID-19 and related literature which represent activities, functions, associations and causal relations, ranging from cellular processes to economic impacts. We formulate a broad, coarse-grained schema targeting mechanism relations between open, free-form entities. Our approach strikes a balance between expressivity and breadth that supports generalization across diverse concepts. We curate a dataset of scientific papers annotated according to our novel schema. Using an information extraction model trained on this new corpus, we construct a knowledge base (KB) of 2M mechanism relations, which we make publicly available. Our model is able to extract relations at an F1 at least twice that of baselines such as open IE or related scientific IE systems. We conduct experiments examining the ability of our system to retrieve relevant information on viral mechanisms of action, and on applications of AI to COVID-19 research. In both cases, our system identifies relevant information from our automatically-constructed knowledge base with high precision.
A deep learning-based monocular depth estimation (MDE) technique is proposed for selection of most informative frames (key frames) of an endoscopic video. In most of the cases, ground truth depth maps of polyps are not readily available and that is why the transfer learning approach is adopted in our method. An endoscopic modalities generally capture thousands of frames. In this scenario, it is quite important to discard low-quality and clinically irrelevant frames of an endoscopic video while the most informative frames should be retained for clinical diagnosis. In this view, a key-frame selection strategy is proposed by utilizing the depth information of polyps. In our method, image moment, edge magnitude, and key-points are considered for adaptively selecting the key frames. One important application of our proposed method could be the 3D reconstruction of polyps with the help of extracted key frames. Also, polyps are localized with the help of extracted depth maps.