Unstructured documents that serve as external knowledge for dialogues help generate more informative responses. Previous research has focused on knowledge selection (KS) from the document based on the dialogue. However, dialogue history that is not related to the current dialogue may introduce noise into the KS process. In this paper, we propose a Compare Aggregate Transformer (CAT) to jointly denoise the dialogue context and aggregate the document information for response generation. We design two comparison mechanisms to reduce noise, one applied before decoding and one during decoding. In addition, we propose two word-overlap-based metrics for evaluating how efficiently the document is utilized. Experimental results on the CMUDoG dataset show that the proposed CAT model outperforms the state-of-the-art approach and strong baselines.
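The abstract does not define the two word-overlap metrics, so what follows is only a minimal sketch of what such a metric could look like. Both functions, `doc_overlap_ratio` and `doc_only_overlap_ratio`, are hypothetical: the first measures the share of distinct response words that also appear in the document, and the second credits only document words that are absent from the dialogue history, so that words merely echoed from the context do not count as document use. Lowercasing and whitespace tokenization are further assumptions.

```python
# Hypothetical word-overlap metrics for document utilization (not the paper's
# exact definitions). Tokenization is naive: lowercase + whitespace split.

def tokens(text: str) -> set[str]:
    return set(text.lower().split())


def doc_overlap_ratio(response: str, document: str) -> float:
    """Share of distinct response words that also occur in the document."""
    resp = tokens(response)
    if not resp:
        return 0.0
    return len(resp & tokens(document)) / len(resp)


def doc_only_overlap_ratio(response: str, document: str, history: str) -> float:
    """Like doc_overlap_ratio, but ignores document words already in the history."""
    resp = tokens(response)
    if not resp:
        return 0.0
    doc_only = tokens(document) - tokens(history)
    return len(resp & doc_only) / len(resp)


history = "tell me about the film"
document = "forrest gump stars tom hanks as the lead"
response = "the film stars tom hanks"
print(doc_overlap_ratio(response, document))                # 0.8
print(doc_only_overlap_ratio(response, document, history))  # 0.6
```

In this toy example, the word "the" appears in both the document and the history, so the first ratio credits it but the second does not.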
Maintaining a consistent attribute profile is crucial for dialogue agents to converse naturally with humans. Existing studies on improving attribute consistency have mainly explored how to incorporate attribute information into responses, but few efforts have been made to identify the consistency relations between responses and attribute profiles. To facilitate the study of profile consistency identification, we create a large-scale human-annotated dataset with over 110K single-turn conversations and their key-value attribute profiles, in which the explicit relation between each response and its profile is manually labeled. We also propose a BERT model enriched with key-value structure information to identify profile consistency, which achieves improvements over strong baselines. Further evaluations on downstream tasks demonstrate that the profile consistency identification model is conducive to improving dialogue consistency.
We introduce N-LTP, an open-source Python toolkit for Chinese natural language processing that supports five basic tasks: Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, and semantic dependency parsing. N-LTP adopts a multi-task framework built on a pre-trained model to capture the knowledge shared across these Chinese tasks. In addition, we propose using knowledge distillation, in which single-task models teach the multi-task model, helping the multi-task model surpass its single-task teachers. Finally, we provide an API for the fundamental tasks and a visualization tool that make it easy for users to run the toolkit and view the processing results directly. To the best of our knowledge, this is the first toolkit to support all fundamental Chinese NLP tasks. Source code, documentation, and pre-trained models are available at https://github.com/HIT-SCIR/ltp.
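The distillation idea can be illustrated with a generic sketch. It assumes the common temperature-scaled formulation (a weighted sum of hard-label cross-entropy and cross-entropy against the teacher's softened distribution), which is not necessarily the exact objective N-LTP uses; `distillation_loss`, `alpha`, `temperature`, and the logits are illustrative names and values. For the multi-task case, the sketch assumes each task's single-task teacher supervises that task's predictions from the shared student, and the per-task losses are summed.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, gold, alpha=0.5, temperature=2.0):
    """Generic KD loss: alpha * hard-label CE + (1 - alpha) * soft CE vs. the teacher.

    Illustrative only; N-LTP's exact objective may differ.
    """
    hard = -math.log(softmax(student_logits)[gold])
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    soft = -sum(q * math.log(p) for q, p in zip(teacher, student))
    return alpha * hard + (1 - alpha) * soft


def multitask_distillation_loss(task_batches):
    """Sum per-task KD losses; each task has its own single-task teacher.

    task_batches maps a task name to (student_logits, teacher_logits, gold).
    """
    return sum(distillation_loss(s, t, g) for s, t, g in task_batches.values())


# Hypothetical logits for one token on two tasks handled by the same student.
batch = {
    "pos": ([2.0, 0.5, -1.0], [3.0, 0.0, -2.0], 0),
    "ner": ([0.2, 1.5], [0.0, 2.5], 1),
}
print(multitask_distillation_loss(batch))
```

Hinton et al.'s original formulation also multiplies the soft term by T² to keep gradient magnitudes comparable across temperatures; that factor is omitted here for brevity.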
Few-shot learning (FSL) is one of the key future steps in machine learning and has attracted a lot of attention. However, in contrast to its rapid development in other domains, such as Computer Vision, the progress of FSL in Natural Language Processing (NLP) is much slower. One of the key reasons for this is the lack of public benchmarks. NLP FSL studies often report new results on their own self-constructed few-shot datasets, which makes results hard to compare and thus impedes cumulative progress. In this paper, we present FewJoint, a novel few-shot learning benchmark for NLP. Unlike most NLP FSL research, which focuses only on simple N-way classification problems, our benchmark introduces few-shot joint dialogue language understanding, which additionally covers structure prediction and multi-task reliance problems. This allows our benchmark to reflect real-world NLP complexity beyond simple N-way classification. Our benchmark was used in the few-shot learning contest of SMP2020-ECDT task-1. We also provide a compatible FSL platform to ease experiment setup.
In dialog systems, dialog act recognition and sentiment classification are two correlated tasks for capturing speakers' intentions, where dialog act and sentiment indicate the explicit and the implicit intentions, respectively. Most existing systems either treat them as separate tasks or only model the two tasks jointly by implicitly sharing parameters, without explicitly modeling their mutual interaction and relation. To address this problem, we propose a Deep Co-Interactive Relation Network (DCR-Net) that explicitly considers the cross-impact and models the interaction between the two tasks by introducing a co-interactive relation layer. In addition, the proposed relation layer can be stacked to gradually capture mutual knowledge over multiple steps of interaction. In particular, we thoroughly study different relation layers and their effects. Experimental results on two public datasets (Mastodon and Dailydialog) show that our model outperforms the state-of-the-art joint model by 4.3% and 3.4% in F1 score on the dialog act recognition task and by 5.7% and 12.4% on sentiment classification, respectively. Comprehensive analysis empirically verifies the effectiveness of explicitly modeling the relation between the two tasks and of the multi-step interaction mechanism. Finally, we employ Bidirectional Encoder Representations from Transformers (BERT) in our framework, which further boosts performance on both tasks.
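The abstract does not specify the internals of the co-interactive relation layer, so the following is only a hypothetical sketch of the general idea: each task's utterance representations attend over the other task's representations, the attended context is added back residually, and the layer is applied several times to mimic stacking. Plain dot-product attention, the absence of learned projections or normalization, and all function names are assumptions, not DCR-Net's actual design.

```python
import math


def softmax(xs):
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys):
    """For each query vector, add a dot-product-attention summary of `keys` (residual)."""
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) for k in keys])
        context = [sum(w * k[d] for w, k in zip(weights, keys)) for d in range(len(q))]
        out.append([qd + cd for qd, cd in zip(q, context)])
    return out


def co_interactive_layer(act_states, sent_states):
    """One hypothetical interaction step: each task attends to the other task."""
    new_act = attend(act_states, sent_states)   # dialog-act side reads sentiment
    new_sent = attend(sent_states, act_states)  # sentiment side reads dialog acts
    return new_act, new_sent


def stacked_interaction(act_states, sent_states, steps):
    """Apply the layer `steps` times to mimic stacking relation layers."""
    for _ in range(steps):
        act_states, sent_states = co_interactive_layer(act_states, sent_states)
    return act_states, sent_states


# Hypothetical 2-d utterance representations for a 3-utterance dialog.
act = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
sent = [[0.2, 0.8], [0.9, 0.1], [0.4, 0.4]]
act2, sent2 = stacked_interaction(act, sent, steps=2)
print(act2)
print(sent2)
```

Both updates in `co_interactive_layer` read the states from the previous step, so neither task sees the other's already-updated states within the same step; this symmetric choice is itself an assumption.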
Deep neural networks are well known to be vulnerable to adversarial attacks and backdoor attacks, in which minor modifications to the input can mislead the models into giving wrong results. Although defenses against adversarial attacks have been widely studied, research on mitigating backdoor attacks is still at an early stage. It is unknown whether there are any connections or common characteristics between the defenses against these two attacks. In this paper, we present a unified framework for detecting malicious examples and protecting the inference results of deep learning models. This framework is based on our observation that both adversarial examples and backdoor examples exhibit anomalies during the inference process that make them highly distinguishable from benign samples. Accordingly, we repurpose and revise four existing adversarial defense methods to detect backdoor examples. Extensive evaluations indicate that these approaches provide reliable protection against backdoor attacks, detecting backdoor examples with higher accuracy than they detect adversarial examples. These solutions also reveal the relations among adversarial examples, backdoor examples, and normal samples in terms of model sensitivity, activation space, and feature space. This can enhance our understanding of the inherent features of these two attacks, as well as of the defense opportunities.
Noun phrases and relational phrases in Open Knowledge Bases are often not canonical, leading to redundant and ambiguous facts. In this work, we integrate structural information (which tuple and which sentence a phrase comes from) and semantic information (semantic similarity) to perform canonicalization. We represent the two types of information as a multi-layered graph: the structural information forms the links across the sentence, relational phrase, and noun phrase layers, while the semantic information forms weighted intra-layer links within each layer. We propose a graph neural network model to aggregate the representations of noun phrases and relational phrases through the multi-layered meta-graph structure. Experiments show that our model outperforms existing approaches on a public general-domain dataset.
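As an illustration only, one round of message passing over such a multi-layered graph might combine a node's own vector, the similarity-weighted average of its intra-layer neighbors, and the mean of its cross-layer neighbors. This uniform, untrained averaging is an assumed simplification rather than the paper's graph neural network, which would use learned parameters; the node names, layer prefixes, similarity weight, and vectors are hypothetical.

```python
from collections import defaultdict


def aggregate(features, intra_edges, cross_edges):
    """One round of untrained message passing on a multi-layered graph.

    features:    node -> vector (nodes are prefixed by layer, e.g. "np:", "rp:", "s:")
    intra_edges: (u, v, weight) links inside a layer, weighted by semantic similarity
    cross_edges: (u, v) structural links between layers
    """
    intra, cross = defaultdict(list), defaultdict(list)
    for u, v, w in intra_edges:
        intra[u].append((v, w))
        intra[v].append((u, w))
    for u, v in cross_edges:
        cross[u].append(v)
        cross[v].append(u)

    dim = len(next(iter(features.values())))
    updated = {}
    for node, vec in features.items():
        parts = [vec]
        total = sum(w for _, w in intra[node])
        if total > 0:  # similarity-weighted average of same-layer neighbors
            parts.append([sum(w * features[n][d] for n, w in intra[node]) / total
                          for d in range(dim)])
        if cross[node]:  # plain average of neighbors in other layers
            parts.append([sum(features[n][d] for n in cross[node]) / len(cross[node])
                          for d in range(dim)])
        updated[node] = [sum(p[d] for p in parts) / len(parts) for d in range(dim)]
    return updated


features = {
    "np:Barack Obama": [1.0, 0.0],
    "np:Obama": [0.0, 1.0],
    "rp:was born in": [1.0, 1.0],
}
intra_links = [("np:Barack Obama", "np:Obama", 0.9)]
cross_links = [("np:Barack Obama", "rp:was born in")]
print(aggregate(features, intra_links, cross_links))
```

After aggregation, `np:Barack Obama` and `np:Obama` move closer together; a canonicalization step could then cluster noun phrases whose updated vectors are similar, although the abstract does not specify the clustering method.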
Person re-identification (re-ID) plays an important role in applications such as public security and video surveillance. Recently, learning from synthetic data, which has benefited from the popularity of synthetic data engines, has achieved remarkable performance. However, existing synthetic datasets are small and lack diversity, which hinders the development of person re-ID in real-world scenarios. To address this problem, we first develop a large-scale synthetic data engine whose salient characteristic is controllability. Based on it, we build a large-scale synthetic dataset that is diversified and customized across different attributes, such as illumination and viewpoint. Second, we quantitatively analyze the influence of dataset attributes on re-ID systems. To the best of our knowledge, this is the first attempt to explicitly dissect person re-ID from the perspective of attributes on a synthetic dataset. Comprehensive experiments give us a deeper understanding of the fundamental problems in person re-ID. Our research also provides useful insights for dataset construction and future practical use.
In this paper, we explore slot tagging with only a few labeled support sentences (i.e., the few-shot setting). Few-shot slot tagging faces a unique challenge compared to other few-shot classification problems, as it calls for modeling the dependencies between labels. However, it is hard to apply previously learned label dependencies to an unseen domain because the label sets differ. To tackle this, we introduce a collapsed dependency transfer mechanism into the conditional random field (CRF) to transfer abstract label dependency patterns as transition scores. In the few-shot setting, the emission score of the CRF can be calculated as a word's similarity to the representation of each label. To calculate such similarity, we propose a Label-enhanced Task-Adaptive Projection Network (L-TapNet) based on the state-of-the-art few-shot classification model TapNet, which leverages label name semantics in representing labels. Experimental results show that our model significantly outperforms the strongest few-shot learning baseline by 14.64 F1 points in the one-shot setting.
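A simplified sketch of how abstract transitions could be expanded to a new domain's label set and then used in Viterbi decoding. It assumes BIO labels and abstract transition types built from O, B, and I with same-slot (`s`) and different-slot (`d`) variants, so that, for example, `B→sI` covers `B-city→I-city`. The abstract scores and word vectors are made-up numbers, emissions are plain dot products with fixed one-hot label prototypes rather than L-TapNet's learned projection, and start/end transitions are omitted.

```python
# Hypothetical abstract transition scores (made-up values, not learned ones).
# Keys are (previous abstract tag, current abstract tag); "s"/"d" mean the
# two labels share the same slot type or have different slot types.
ABSTRACT = {
    ("O", "O"): 0.5, ("O", "B"): 0.2, ("O", "I"): -5.0,
    ("B", "O"): 0.1, ("B", "sB"): -1.0, ("B", "dB"): 0.0,
    ("B", "sI"): 1.0, ("B", "dI"): -5.0,
    ("I", "O"): 0.1, ("I", "sB"): -1.0, ("I", "dB"): 0.0,
    ("I", "sI"): 0.5, ("I", "dI"): -5.0,
}


def split_label(label):
    """'B-city' -> ('B', 'city'); 'O' -> ('O', None)."""
    if label == "O":
        return "O", None
    tag, slot = label.split("-", 1)
    return tag, slot


def abstract_key(prev, cur):
    """Collapse a concrete label transition into its abstract pattern."""
    p_tag, p_slot = split_label(prev)
    c_tag, c_slot = split_label(cur)
    if p_tag == "O" or c_tag == "O":
        return p_tag, c_tag
    prefix = "s" if p_slot == c_slot else "d"
    return p_tag, prefix + c_tag


def expand_transitions(labels, abstract):
    """Build a concrete transition table for an unseen label set."""
    return {(p, c): abstract[abstract_key(p, c)] for p in labels for c in labels}


def emission_scores(word_vecs, prototypes):
    """Emission = dot-product similarity between a word and each label prototype."""
    return [
        {lab: sum(a * b for a, b in zip(w, proto)) for lab, proto in prototypes.items()}
        for w in word_vecs
    ]


def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence (no start/end transitions)."""
    score = dict(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointer = {}, {}
        for cur in labels:
            best = max(labels, key=lambda prev: score[prev] + transitions[(prev, cur)])
            new_score[cur] = score[best] + transitions[(best, cur)] + emit[cur]
            pointer[cur] = best
        score = new_score
        backpointers.append(pointer)
    path = [max(labels, key=lambda lab: score[lab])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1]


labels = ["O", "B-city", "I-city", "B-time", "I-time"]
# Hypothetical one-hot label prototypes and word vectors for "fly to new york".
prototypes = {lab: [1.0 if i == j else 0.0 for j in range(len(labels))]
              for i, lab in enumerate(labels)}
words = [[2.0, 0, 0, 0, 0], [2.0, 0, 0, 0, 0], [0, 2.0, 0, 0, 0], [0, 0, 1.0, 0, 1.2]]
trans = expand_transitions(labels, ABSTRACT)
print(viterbi(emission_scores(words, prototypes), trans, labels))
# ['O', 'O', 'B-city', 'I-city']
```

Emission scores alone would tag "york" as `I-time` (1.2 > 1.0), but the expanded `B-city→I-time` transition (a `B→dI` pattern) is heavily penalized in this toy table, so decoding returns a consistent city span.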