Xiaodan Liang

REM-Net: Recursive Erasure Memory Network for Commonsense Evidence Refinement

Jan 03, 2021

Yinya Huang, Meng Fang, Xunlin Zhan, Qingxing Cao, Xiaodan Liang, Liang Lin

Abstract: When answering a question, people often draw on rich world knowledge in addition to the particular context. While recent works retrieve supporting facts/evidence from commonsense knowledge bases to supply additional information for each question, there is still ample room to improve the quality of that evidence. This matters because evidence quality is key to answering commonsense questions and can even determine the upper bound on a QA system's performance. In this paper, we propose a recursive erasure memory network (REM-Net) to improve evidence quality. REM-Net is equipped with a module that refines the evidence by recursively erasing low-quality evidence that does not explain the question answering. In addition, instead of retrieving evidence from existing knowledge bases, REM-Net leverages a pre-trained generative model to generate candidate evidence customized for the question. We conduct experiments on two commonsense question answering datasets, WIQA and CosmosQA. The results demonstrate the effectiveness of REM-Net and show that the refined evidence is explainable.

* Accepted by AAAI 2021
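The erasure idea can be illustrated with a toy loop: score every remaining piece of evidence, drop the weakest, and repeat on what survives. This is a minimal sketch, not REM-Net's learned module; the word-overlap scorer `overlap_score` and the function `recursive_erase` are hypothetical stand-ins introduced only for illustration.

```python
from typing import Callable, List, Sequence


def overlap_score(question: str) -> Callable[[str], float]:
    """Hypothetical relevance scorer: counts words shared with the question."""
    q_words = set(question.lower().split())
    return lambda ev: float(len(q_words & set(ev.lower().split())))


def recursive_erase(
    evidence: Sequence[str],
    score: Callable[[str], float],
    num_steps: int,
    erase_per_step: int = 1,
) -> List[str]:
    """Erase the lowest-scoring evidence, one round at a time.

    Each round re-ranks only the evidence that survived the previous round,
    and at least one piece of evidence is always kept.
    """
    kept = list(evidence)
    for _ in range(num_steps):
        if len(kept) <= erase_per_step:
            break  # erasing now would leave nothing to answer from
        # Indices of the remaining evidence, ordered from least to most relevant.
        order = sorted(range(len(kept)), key=lambda i: score(kept[i]))
        drop = set(order[:erase_per_step])
        kept = [ev for i, ev in enumerate(kept) if i not in drop]
    return kept


question = "what happens if more rain falls"
evidence = [
    "rain makes the ground wet",
    "cats like to sleep",
    "more rain falls causes flooding",
]
print(recursive_erase(evidence, overlap_score(question), num_steps=1))
# prints ['rain makes the ground wet', 'more rain falls causes flooding']
```

With a fixed scorer like this one, recursive erasure is equivalent to a single top-k cut; the recursion only pays off when the scores can change as the remaining evidence changes, as they may in a learned memory model.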

Adversarial Meta Sampling for Multilingual Low-Resource Speech Recognition

Dec 23, 2020

Yubei Xiao, Ke Gong, Pan Zhou, Guolin Zheng, Xiaodan Liang, Liang Lin

Abstract: Low-resource automatic speech recognition (ASR) is challenging because the limited target-language data cannot adequately train an ASR model. To address this, meta-learning splits ASR for each source language into many small ASR tasks and meta-learns a model initialization over all tasks from the different source languages, enabling fast adaptation to unseen target languages. However, task quantity and difficulty vary greatly across source languages because of their different data scales and diverse phonological systems. This leads to task-quantity and task-difficulty imbalance and, in turn, to the failure of multilingual meta-learning ASR (MML-ASR). In this work, we address the problem with a novel adversarial meta sampling (AMS) approach that improves MML-ASR. When sampling tasks in MML-ASR, AMS adaptively determines the task sampling probability for each source language. Specifically, a large query loss for a source language indicates that its tasks have not been sampled well enough, in terms of quantity and difficulty, to train the ASR model, so they should be sampled more frequently for extra learning. Motivated by this observation, we feed the historical task query losses of all source-language domains into a network that learns a task sampling policy to adversarially increase the current query loss of MML-ASR. The learned policy thus tracks the learning state of each language and predicts sampling probabilities that make learning more effective. Experiments on two multilingual datasets show significant performance improvements when AMS is applied to MML-ASR, and also demonstrate the applicability of AMS to other low-resource speech tasks and to transfer-learning ASR approaches. Our code is available at: https://github.com/iamxiaoyubei/AMS.

* Accepted by AAAI 2021
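The principle that "a language with a large query loss should be sampled more often" can be shown with a fixed heuristic: a softmax over each language's mean historical query loss. This is a simplified stand-in, not AMS itself, which learns the sampling policy with a network trained adversarially; the language keys and loss values below are hypothetical.

```python
import math
import random
from typing import Dict, List


def sampling_probs(
    loss_history: Dict[str, List[float]], temperature: float = 1.0
) -> Dict[str, float]:
    """Softmax over each language's mean historical query loss.

    Higher recent query loss -> larger probability of sampling that
    language's tasks. A lower temperature makes the distribution sharper.
    """
    means = {lang: sum(h) / len(h) for lang, h in loss_history.items()}
    top = max(means.values())  # subtract the max so exp() cannot overflow
    weights = {lang: math.exp((m - top) / temperature) for lang, m in means.items()}
    total = sum(weights.values())
    return {lang: w / total for lang, w in weights.items()}


def sample_language(probs: Dict[str, float], rng: random.Random) -> str:
    """Draw the source language for the next meta-training task."""
    langs = list(probs)
    return rng.choices(langs, weights=[probs[lang] for lang in langs], k=1)[0]


# Hypothetical query-loss histories for three source languages.
history = {"lang_a": [2.0, 1.8], "lang_b": [0.9, 1.1], "lang_c": [1.2, 1.4]}
probs = sampling_probs(history)
print(sample_language(probs, random.Random(0)))
```

A fixed softmax reacts only to the loss level; a learned policy, as described in the abstract, can also pick up on how losses evolve over time.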

Graph-Evolving Meta-Learning for Low-Resource Medical Dialogue Generation

Dec 22, 2020

Shuai Lin, Pan Zhou, Xiaodan Liang, Jianheng Tang, Ruihui Zhao, Ziliang Chen, Liang Lin

Abstract: Human doctors with well-structured medical knowledge can diagnose a disease through just a few conversations with patients about their symptoms. In contrast, existing knowledge-grounded dialogue systems often require a large number of dialogue instances to learn, because they fail to capture the correlations between different diseases and neglect the diagnostic experience shared among them. To address this issue, we propose a more natural and practical paradigm, low-resource medical dialogue generation, which transfers diagnostic experience from source diseases to target ones with only a handful of data for adaptation. It capitalizes on a commonsense knowledge graph to characterize prior disease-symptom relations. In addition, we develop a Graph-Evolving Meta-Learning (GEML) framework that learns to evolve the commonsense graph for reasoning about disease-symptom correlations in a new disease, which effectively alleviates the need for a large number of dialogues. More importantly, by dynamically evolving disease-symptom graphs, GEML also addresses the real-world challenge that the disease-symptom correlations of each disease may vary or evolve as more diagnostic cases accumulate. Extensive experimental results on the CMDD dataset and our newly collected Chunyu dataset demonstrate the superiority of our approach over state-of-the-art approaches. Moreover, GEML can generate an enriched, dialogue-sensitive knowledge graph in an online manner, which could benefit other tasks grounded on knowledge graphs.

* Accepted by AAAI 2021

Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding

Dec 14, 2020

Qingxing Cao, Bailin Li, Xiaodan Liang, Keze Wang, Liang Lin

Abstract: Although knowledge VQA datasets help encourage Visual Question Answering (VQA) models to discover underlying knowledge by exploiting input-output correlations beyond the image and text contexts, existing ones are mostly annotated through crowdsourcing, e.g., by collecting questions and external reasons from different users over the internet. In addition to the challenge of knowledge reasoning, how to deal with annotator bias also remains unsolved; this bias often leads to superficial, over-fitted correlations between questions and answers. To address this issue, we propose a novel dataset named Knowledge-Routed Visual Question Reasoning for VQA model evaluation. Considering that a desirable VQA model should correctly perceive the image context, understand the question, and incorporate its learned knowledge, our proposed dataset aims to cut off the shortcut learning exploited by current deep embedding models and push the research boundary of knowledge-based visual question reasoning. Specifically, we generate question-answer pairs based on both the Visual Genome scene graph and an external knowledge base, using controlled programs that disentangle the knowledge from other biases. The programs can select one or two triplets from the scene graph or knowledge base to push multi-step reasoning, avoid answer ambiguity, and balance the answer distribution. In contrast to existing VQA datasets, we further impose the following two major constraints on the programs to incorporate knowledge reasoning: i) multiple knowledge triplets can be related to the question, but only one knowledge triplet relates to the image object, which forces the VQA model to correctly perceive the image instead of guessing the knowledge from the question alone; ii) all questions are based on different knowledge, but the candidate answers are the same for both the training and test sets.

* To appear in TNNLS 2021
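Constraint i) can be made concrete with a toy generator that chains one scene-graph triplet with one knowledge-base triplet and keeps the question only when the text alone is ambiguous (several KB triplets share the asked relation) but exactly one of them concerns the object actually in the image. This is a hypothetical sketch: `make_question`, its question template, and the sample triplets are illustrative assumptions, not the dataset's actual programs.

```python
from typing import List, Optional, Tuple

Triplet = Tuple[str, str, str]  # (head, relation, tail)


def make_question(
    scene: List[Triplet],
    kb: List[Triplet],
    subject: str,
    relation: str,
    kb_relation: str,
) -> Optional[Tuple[str, str]]:
    """Chain one scene-graph triplet with one knowledge-base triplet.

    The question is kept only if several KB triplets match the question's
    relation (so the text alone is ambiguous) but exactly one of them is
    about the object actually present in the image.
    """
    candidates = [t for t in kb if t[1] == kb_relation]
    for head, rel, obj in scene:
        if (head, rel) != (subject, relation):
            continue
        grounded = [t for t in candidates if t[0] == obj]
        if len(candidates) >= 2 and len(grounded) == 1:
            question = f"What is the thing the {subject} is {relation} {kb_relation}?"
            return question, grounded[0][2]
    return None


scene = [("person", "holding", "umbrella"), ("dog", "on", "grass")]
kb = [
    ("umbrella", "used for", "keeping dry"),
    ("knife", "used for", "cutting"),
    ("grass", "is a", "plant"),
]
print(make_question(scene, kb, "person", "holding", "used for"))
# prints ('What is the thing the person is holding used for?', 'keeping dry')
print(make_question(scene, kb, "dog", "on", "is a"))
# prints None
```

The second call is rejected because only one KB triplet uses "is a", so a model could answer from the question text without looking at the image.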

Ada-Segment: Automated Multi-loss Adaptation for Panoptic Segmentation

Dec 07, 2020

Gengwei Zhang, Yiming Gao, Hang Xu, Hao Zhang, Zhenguo Li, Xiaodan Liang

Abstract: Panoptic segmentation, which unifies instance segmentation and semantic segmentation, has recently attracted increasing attention. While most existing methods focus on designing novel architectures, we take a different perspective: performing automated multi-loss adaptation (named Ada-Segment) on the fly to flexibly adjust multiple training losses over the course of training, using a controller trained to capture the learning dynamics. This offers several advantages: it bypasses manual tuning of the sensitive loss combination, a decisive factor for panoptic segmentation; it explicitly models the learning dynamics and reconciles the learning of multiple objectives (up to ten in our experiments); and, with an end-to-end architecture, it generalizes to different datasets without laboriously re-tuning hyperparameters or re-adjusting the training process. Ada-Segment brings a 2.7% panoptic quality (PQ) improvement over the vanilla baseline on the COCO val split, achieving a state-of-the-art 48.5% PQ on the COCO test-dev split and 32.9% PQ on the ADE20K dataset. Extensive ablation studies reveal ever-changing dynamics throughout the training process, which necessitates the automated and adaptive learning strategy presented in this paper.

* Accepted by AAAI 2021
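On-the-fly reweighting of several training losses can be illustrated with a well-known fixed rule from the multi-task literature, Dynamic Weight Averaging (DWA; Liu, Johns and Davison, CVPR 2019). DWA is swapped in here purely for illustration: Ada-Segment instead trains a controller to set the weights, and the loss names and values below are hypothetical.

```python
import math
from typing import Dict


def dwa_weights(
    prev: Dict[str, float], prev2: Dict[str, float], temperature: float = 2.0
) -> Dict[str, float]:
    """Dynamic Weight Averaging.

    A loss that shrank less between the last two epochs (ratio closer to 1)
    gets a larger weight. The weights sum to the number of losses K.
    """
    ratios = {name: prev[name] / prev2[name] for name in prev}
    exps = {name: math.exp(r / temperature) for name, r in ratios.items()}
    total = sum(exps.values())
    k = len(prev)
    return {name: k * e / total for name, e in exps.items()}


def combined_loss(losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum that would be back-propagated at the current step."""
    return sum(weights[name] * value for name, value in losses.items())


# Hypothetical per-epoch averages for three of the losses in a panoptic model.
prev2 = {"mask": 1.0, "cls": 1.0, "sem": 1.0}
prev = {"mask": 0.9, "cls": 0.5, "sem": 0.8}
weights = dwa_weights(prev, prev2)
print(combined_loss({"mask": 0.85, "cls": 0.45, "sem": 0.75}, weights))
```

A hand-written rule like this reacts only to recent loss ratios; a learned controller, as described in the abstract, can in principle capture richer training dynamics.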

Continuous Transition: Improving Sample Efficiency for Continuous Control Problems via MixUp

Nov 30, 2020

Junfan Lin, Zhongzhan Huang, Keze Wang, Xiaodan Liang, Weiwei Chen, Liang Lin

Abstract: Although deep reinforcement learning (RL) has been successfully applied to a variety of robotic control tasks, it is still challenging to apply it to real-world tasks because of its poor sample efficiency. To overcome this shortcoming, several works reuse the collected trajectory data during training by decomposing trajectories into a set of policy-irrelevant discrete transitions. However, their improvements are somewhat marginal because i) the number of transitions is usually small, and ii) value assignment only happens at the joint states. To address these issues, this paper introduces a concise yet powerful method to construct Continuous Transition, which exploits trajectory information by mining the potential transitions along the trajectory. Specifically, we propose to synthesize new transitions for training by linearly interpolating consecutive transitions. To keep the constructed transitions authentic, we also develop a discriminator that guides the construction process automatically. Extensive experiments demonstrate that our method achieves a significant improvement in sample efficiency on various complex continuous robotic control problems in MuJoCo and outperforms advanced model-based and model-free RL methods.
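The interpolation step can be sketched as a MixUp over two consecutive transitions from the same episode, applied field by field. This is a minimal sketch under stated assumptions: the `Transition` layout and the function names are hypothetical, the mixing coefficient `lam` is simply passed in, and the discriminator that keeps synthesized transitions authentic is not modeled.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Transition:
    state: List[float]
    action: List[float]
    reward: float
    next_state: List[float]


def _lerp(a: Sequence[float], b: Sequence[float], lam: float) -> List[float]:
    """Element-wise lam * a + (1 - lam) * b."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def continuous_transition(t0: Transition, t1: Transition, lam: float) -> Transition:
    """Mix two consecutive transitions from the same episode.

    t1 should start where t0 ended (t0.next_state == t1.state);
    lam = 1 returns t0 and lam = 0 returns t1.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return Transition(
        state=_lerp(t0.state, t1.state, lam),
        action=_lerp(t0.action, t1.action, lam),
        reward=lam * t0.reward + (1.0 - lam) * t1.reward,
        next_state=_lerp(t0.next_state, t1.next_state, lam),
    )


t0 = Transition([0.0, 0.0], [1.0], 1.0, [1.0, 0.0])
t1 = Transition([1.0, 0.0], [3.0], 0.0, [2.0, 1.0])
print(continuous_transition(t0, t1, 0.5))
# prints Transition(state=[0.5, 0.0], action=[2.0], reward=0.5, next_state=[1.5, 0.5])
```

Because both endpoints are real transitions that share a boundary state, the mixed sample lies on a path between observed states rather than on an arbitrary pair; pairs that straddle an episode boundary should not be mixed.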

Towards Robust Medical Image Segmentation on Small-Scale Data with Incomplete Labels

Nov 28, 2020

Nanqing Dong, Michael Kampffmeyer, Xiaodan Liang, Min Xu, Irina Voiculescu, Eric P. Xing

Abstract: The data-driven nature of deep learning models for semantic segmentation requires a large number of pixel-level annotations. However, large-scale, fully labeled medical datasets are often unavailable for practical tasks. Recently, partially supervised methods have been proposed that use images with incomplete labels to mitigate data scarcity in the medical domain. In this emerging research area, the breakthroughs made by existing methods rely on either large-scale data or complex model designs, which makes them 1) less practical for certain real-life tasks and 2) less robust on small-scale data. It is time to step back and consider the robustness of partially supervised methods and how to make the most of small-scale, partially labeled data for medical image segmentation tasks. To bridge the methodological gaps in label-efficient deep learning with partial supervision, we propose RAMP, a simple yet efficient data augmentation framework for partially supervised medical image segmentation that exploits the assumption that patients share anatomical similarities. We systematically evaluate RAMP and previous methods on various controlled multi-structure segmentation tasks. Compared to mainstream approaches, RAMP consistently improves the performance of traditional segmentation networks on small-scale partially labeled data and can utilize additional image-wise weak annotations.

Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation

Oct 30, 2020

Yangxin Wu, Gengwei Zhang, Hang Xu, Xiaodan Liang, Liang Lin

Abstract: Panoptic segmentation has become a popular new test-bed for state-of-the-art holistic scene understanding methods, as it requires simultaneously segmenting both foreground things and background stuff. State-of-the-art panoptic segmentation networks exhibit high structural complexity across their components, i.e., the backbone, the proposal-based foreground branch, the segmentation-based background branch, and the cross-branch feature fusion module, and their design relies heavily on expert knowledge and tedious trials. In this work, we propose an efficient, cooperative, and highly automated framework that simultaneously searches for all main components, including the backbone, segmentation branches, and feature fusion module, in a unified panoptic segmentation pipeline based on the prevailing one-shot Network Architecture Search (NAS) paradigm. Notably, we extend common single-task NAS to the multi-component scenario by taking advantage of a newly proposed intra-modular search space and a problem-oriented inter-modular search space, which helps us obtain an optimal network architecture that not only performs well on both instance segmentation and semantic segmentation but is also aware of the reciprocal relations between foreground things and background stuff classes. To relieve the vast computational burden of applying NAS to complicated network architectures, we present a novel path-priority greedy search policy that finds a robust, transferable architecture with significantly reduced search overhead. Our searched architecture, namely Auto-Panoptic, achieves new state-of-the-art results on the challenging COCO and ADE20K benchmarks. Moreover, extensive experiments demonstrate the effectiveness of the path-priority policy and the transferability of Auto-Panoptic across different datasets. Code and models are available at: https://github.com/Jacobew/AutoPanoptic.

* NeurIPS 2020

Towards Interpretable Natural Language Understanding with Explanations as Latent Variables

Oct 24, 2020

Wangchunshu Zhou, Jinyi Hu, Hanlin Zhang, Xiaodan Liang, Maosong Sun, Chenyan Xiong, Jian Tang

Abstract: Recently, generating natural language explanations has shown very promising results, not only offering interpretable explanations but also providing additional information and supervision for prediction. However, existing approaches usually require a large set of human-annotated explanations for training, and collecting such a set is both time-consuming and expensive. In this paper, we develop a general framework for interpretable natural language understanding that requires only a small set of human-annotated explanations for training. Our framework treats natural language explanations as latent variables that model the underlying reasoning process of a neural model. We develop a variational EM framework for optimization in which an explanation generation module and an explanation-augmented prediction module are alternately optimized and mutually enhance each other. Moreover, we propose an explanation-based self-training method under this framework for semi-supervised learning. It alternates between assigning pseudo-labels to unlabeled data and generating new explanations, so that each iteratively improves the other. Experiments on two natural language understanding tasks demonstrate that our framework not only makes effective predictions in both supervised and semi-supervised settings but also generates good natural language explanations.

Iterative Graph Self-Distillation

Oct 23, 2020

Hanlin Zhang, Shuai Lin, Weiyang Liu, Pan Zhou, Jian Tang, Xiaodan Liang, Eric P. Xing

Abstract: How to discriminatively vectorize graphs is a fundamental challenge that has attracted increasing attention in recent years. Inspired by the recent success of unsupervised contrastive learning, we aim to learn graph-level representations in an unsupervised manner. Specifically, we propose a novel unsupervised graph learning paradigm called Iterative Graph Self-Distillation (IGSD), which iteratively performs teacher-student distillation with graph augmentations. Unlike conventional knowledge distillation, IGSD constructs the teacher as an exponential moving average of the student model and distills its own knowledge. The intuition behind IGSD is to predict the teacher network's representation of graph pairs under different augmented views. As a natural extension, we also apply IGSD to semi-supervised scenarios by jointly regularizing the network with both supervised and unsupervised contrastive losses. Finally, we show that fine-tuning IGSD-trained models with self-training can further improve graph representation power. Empirically, we achieve significant and consistent performance gains on various graph datasets in both unsupervised and semi-supervised settings, which validates the superiority of IGSD.
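The teacher construction described above, an exponential moving average (EMA) of the student, reduces to a one-line update per parameter. In this minimal sketch the parameters are represented as a hypothetical flat dictionary of floats rather than real tensors, and the student is held fixed so that only the teacher update is visible.

```python
from typing import Dict


def ema_update(
    teacher: Dict[str, float], student: Dict[str, float], momentum: float = 0.99
) -> Dict[str, float]:
    """Return teacher <- momentum * teacher + (1 - momentum) * student, per parameter."""
    if not 0.0 <= momentum <= 1.0:
        raise ValueError("momentum must lie in [0, 1]")
    return {
        name: momentum * teacher[name] + (1.0 - momentum) * student[name]
        for name in teacher
    }


# In training, the student would be updated by gradient descent between
# EMA steps; here it stays fixed so the teacher's drift toward it is easy to see.
teacher = {"w": 1.0}
student = {"w": 0.0}
for _ in range(3):
    teacher = ema_update(teacher, student, momentum=0.9)
print(teacher)
```

A momentum close to 1 makes the teacher a slowly moving, smoothed copy of the student, which gives the student a stable target to predict across augmented views.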
