Yichong Xu

KG-FiD: Infusing Knowledge Graph in Fusion-in-Decoder for Open-Domain Question Answering

Oct 08, 2021
Donghan Yu, Chenguang Zhu, Yuwei Fang, Wenhao Yu, Shuohang Wang, Yichong Xu, Xiang Ren, Yiming Yang, Michael Zeng

The current Open-Domain Question Answering (ODQA) paradigm typically consists of a retrieving module and a reading module. Given an input question, the reading module predicts the answer from the relevant passages retrieved by the retriever. The recently proposed Fusion-in-Decoder (FiD), which is built on top of the pretrained generative model T5, achieves state-of-the-art performance in the reading module. Although effective, it remains constrained by inefficient attention over all retrieved passages, which contain a lot of noise. In this work, we propose a novel method, KG-FiD, which filters noisy passages by leveraging the structural relationships among the retrieved passages with a knowledge graph. We initialize the passage node embeddings from the FiD encoder and then use a graph neural network (GNN) to update the representations for reranking. To improve efficiency, we build the GNN on top of the intermediate-layer output of the FiD encoder and pass only a few top-reranked passages into the higher layers of the encoder and the decoder for answer generation. We also apply the proposed GNN-based reranking method to enhance the passage retrieval results in the retrieving module. Extensive experiments on common ODQA benchmark datasets (Natural Questions and TriviaQA) demonstrate that KG-FiD can improve vanilla FiD by up to 1.5% on answer exact match score and achieve comparable performance with FiD at only 40% of the computation cost.
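
The graph-based reranking idea in this abstract can be illustrated with a small sketch. It is not the authors' model: the toy passage embeddings, the edge list, the single mean-aggregation step standing in for a GNN layer, and the dot-product scorer are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def propagate(embeddings: List[List[float]],
              edges: List[Tuple[int, int]]) -> List[List[float]]:
    """One mean-aggregation step over an undirected passage graph (self-loops included)."""
    n = len(embeddings)
    neighbors: Dict[int, Set[int]] = {i: {i} for i in range(n)}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    dim = len(embeddings[0])
    updated = []
    for i in range(n):
        group = neighbors[i]
        updated.append([sum(embeddings[j][d] for j in group) / len(group)
                        for d in range(dim)])
    return updated


def rerank_top_k(embeddings: List[List[float]],
                 edges: List[Tuple[int, int]],
                 query: List[float], k: int) -> List[int]:
    """Score graph-updated passage embeddings against the query; keep the top-k indices."""
    updated = propagate(embeddings, edges)
    scores = [sum(q * x for q, x in zip(query, emb)) for emb in updated]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy data (hypothetical): passages 0-1 and 2-3 are linked in the knowledge graph.
passages = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]]
edges = [(0, 1), (2, 3)]
query = [1.0, 0.0]
kept = rerank_top_k(passages, edges, query, k=2)  # only these would go on to later layers
print(kept)
```

In this toy setting only the passages in `kept` would be passed on for answer generation, which is where the computational saving described in the abstract would come from.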

DialogLM: Pre-trained Model for Long Dialogue Understanding and Summarization

Sep 06, 2021
Ming Zhong, Yang Liu, Yichong Xu, Chenguang Zhu, Michael Zeng

Dialogue is an essential part of human communication and cooperation. Existing research mainly focuses on short dialogue scenarios in a one-on-one fashion. However, multi-person interactions in the real world, such as meetings or interviews, frequently exceed a few thousand words. There is still a lack of corresponding research and powerful tools to understand and process such long dialogues. Therefore, in this work, we present a pre-training framework for long dialogue understanding and summarization. Considering the nature of long conversations, we propose a window-based denoising approach for generative pre-training. For a dialogue, it corrupts a window of text with dialogue-inspired noise and guides the model to reconstruct this window based on the content of the remaining conversation. Furthermore, to process longer input, we augment the model with sparse attention, which is combined with conventional attention in a hybrid manner. We conduct extensive experiments on five datasets of long dialogues, covering the tasks of dialogue summarization, abstractive question answering and topic segmentation. Experimentally, we show that our pre-trained model DialogLM significantly surpasses the state-of-the-art models across datasets and tasks.
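
A minimal sketch of how a window-based denoising example could be built is shown below. The abstract does not list the noise types, so the two used here (masking speakers and shuffling turn order inside the window) are assumptions, as are the function name `corrupt_window` and the `[MASK_SPEAKER]` token.

```python
import random
from typing import List, Tuple

Turn = Tuple[str, str]  # (speaker, utterance)


def corrupt_window(dialogue: List[Turn], start: int, size: int,
                   rng: random.Random) -> Tuple[List[Turn], List[Turn]]:
    """Return (noisy dialogue, original window the model should reconstruct)."""
    end = min(start + size, len(dialogue))
    target = dialogue[start:end]
    # Assumed noise 1: hide who is speaking inside the window.
    noisy_window = [("[MASK_SPEAKER]", text) for _, text in target]
    # Assumed noise 2: shuffle the order of turns inside the window.
    rng.shuffle(noisy_window)
    # Turns outside the window are left untouched as context.
    return dialogue[:start] + noisy_window + dialogue[end:], target


dialogue = [
    ("A", "Shall we start?"),
    ("B", "Yes, the budget first."),
    ("C", "We are over by ten percent."),
    ("A", "Then we cut travel."),
    ("B", "Agreed."),
]
noisy, target = corrupt_window(dialogue, start=1, size=3, rng=random.Random(0))
```

The pair `(noisy, target)` would serve as one input/output example for a sequence-to-sequence model: the encoder reads the corrupted dialogue and the decoder is trained to produce the original window.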

* Preprint

Want To Reduce Labeling Cost? GPT-3 Can Help

Aug 30, 2021
Shuohang Wang, Yang Liu, Yichong Xu, Chenguang Zhu, Michael Zeng

Data annotation is a time-consuming and labor-intensive process for many NLP tasks. Although various methods exist to produce pseudo data labels, they are often task-specific and require a decent amount of labeled data to start with. Recently, the immense language model GPT-3, with 175 billion parameters, has achieved tremendous improvement across many few-shot learning tasks. In this paper, we explore ways to leverage GPT-3 as a low-cost data labeler to train other models. We find that, to make the downstream model achieve the same performance on a variety of NLU and NLG tasks, it costs 50% to 96% less to use labels from GPT-3 than to use labels from humans. Furthermore, we propose a novel framework for combining pseudo labels from GPT-3 with human labels, which leads to even better performance with a limited labeling budget. These results present a cost-effective data labeling methodology that is generalizable to many practical applications.
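
The abstract does not say how pseudo labels and human labels are combined. One plausible scheme, sketched below purely as an assumption, is to pseudo-label everything and spend the remaining budget on human labels for the items where the pseudo-labeler is least confident. The function name, the confidence scores, and the costs (in integer cents) are all hypothetical.

```python
from typing import List


def plan_labeling(confidences: List[float], budget: int,
                  human_cost: int, pseudo_cost: int) -> List[str]:
    """Pseudo-label every item, then upgrade the least confident ones to human labels."""
    n = len(confidences)
    remaining = budget - n * pseudo_cost
    if remaining < 0:
        raise ValueError("budget does not cover pseudo-labeling every item")
    # Items sorted from least to most confident pseudo label.
    by_confidence = sorted(range(n), key=lambda i: confidences[i])
    n_human = min(n, remaining // human_cost)
    human = set(by_confidence[:n_human])
    return ["human" if i in human else "pseudo" for i in range(n)]


# Hypothetical numbers: 4 items, 240 cents total, 100 per human label, 10 per pseudo label.
plan = plan_labeling([0.95, 0.40, 0.70, 0.55], budget=240, human_cost=100, pseudo_cost=10)
print(plan)  # ['pseudo', 'human', 'pseudo', 'human']
```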

* Findings of EMNLP 2021, 11 pages

Retrieval Enhanced Model for Commonsense Generation

May 24, 2021
Han Wang, Yang Liu, Chenguang Zhu, Linjun Shou, Ming Gong, Yichong Xu, Michael Zeng

Commonsense generation is the challenging task of generating a plausible sentence describing an everyday scenario using provided concepts. Its requirement of reasoning over commonsense knowledge and its demand for compositional generalization challenge even strong pre-trained language generation models. We propose a novel framework using retrieval methods to enhance both the pre-training and fine-tuning for commonsense generation. We retrieve prototype sentence candidates by concept matching and use them as auxiliary input. For fine-tuning, we further boost its performance with a trainable sentence retriever. We demonstrate experimentally on the large-scale CommonGen benchmark that our approach achieves new state-of-the-art results.
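
Retrieval by concept matching can be sketched in a few lines. The version below ranks candidate sentences by how many of the given concepts they contain, using exact lowercase token matching; that matching rule, the tiny corpus, and the function name are assumptions for illustration, not the paper's retriever.

```python
from typing import List


def retrieve_prototypes(concepts: List[str], corpus: List[str], top_k: int) -> List[str]:
    """Rank sentences by the number of concepts they mention; drop sentences with no match."""
    wanted = {c.lower() for c in concepts}
    scored = []
    for sentence in corpus:
        tokens = set(sentence.lower().split())
        overlap = len(wanted & tokens)
        if overlap:
            scored.append((overlap, sentence))
    # Stable sort: sentences with equal overlap keep their corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored[:top_k]]


corpus = [
    "a dog catches a frisbee in the park",
    "the dog sleeps",
    "a man throws a frisbee to his dog",
    "rain falls",
]
prototypes = retrieve_prototypes(["dog", "frisbee", "throws"], corpus, top_k=2)
print(prototypes)
# ['a man throws a frisbee to his dog', 'a dog catches a frisbee in the park']
```

The retrieved sentences would then be appended to the concept list as auxiliary input to the generator.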

* Findings of ACL-IJCNLP 2021

Fusing Context Into Knowledge Graph for Commonsense Reasoning

Dec 09, 2020
Yichong Xu, Chenguang Zhu, Ruochen Xu, Yang Liu, Michael Zeng, Xuedong Huang

Commonsense reasoning requires a model to make presumptions about world events via language understanding. Many methods couple pre-trained language models with knowledge graphs in order to combine the merits of language modeling and entity-based relational learning. However, although a knowledge graph contains rich structural information, it lacks the context to provide a more precise understanding of the concepts and relations. This creates a gap when fusing knowledge graphs into language modeling, especially when paired text-knowledge data is insufficient. In this paper, we propose to utilize external entity descriptions to provide contextual information for graph entities. For the CommonsenseQA task, our model first extracts concepts from the question and choice, and then finds a related triple between these concepts. Next, it retrieves the descriptions of these concepts from Wiktionary and feeds them as additional input to a pre-trained language model, together with the triple. The resulting model attains much more effective commonsense reasoning capability, achieving state-of-the-art results on the CommonsenseQA dataset with an accuracy of 80.7% (single model) and 83.3% (ensemble model) on the official leaderboard.
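
The input assembly described above (question, choice, triple, and concept descriptions fed together to a language model) might look roughly like the sketch below. The `[SEP]` layout, the function name, and the example triple are assumptions, and the descriptions are made-up placeholders rather than real Wiktionary entries.

```python
from typing import Dict, Tuple


def build_input(question: str, choice: str,
                triple: Tuple[str, str, str],
                descriptions: Dict[str, str]) -> str:
    """Concatenate question, choice, triple, and any available concept descriptions."""
    head, relation, tail = triple
    parts = [question, choice, f"{head} {relation} {tail}"]
    for concept in (head, tail):
        if concept in descriptions:
            parts.append(f"{concept}: {descriptions[concept]}")
    return " [SEP] ".join(parts)


# Placeholder descriptions (hypothetical, not taken from Wiktionary).
descriptions = {
    "scissors": "a tool with two blades",
    "cut": "to divide with a sharp edge",
}
text = build_input(
    question="What do people use to cut paper?",
    choice="scissors",
    triple=("scissors", "UsedFor", "cut"),
    descriptions=descriptions,
)
print(text)
```

One such string would be built per answer choice, and the model would score each one.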

Preference-based Reinforcement Learning with Finite-Time Guarantees

Jun 16, 2020
Yichong Xu, Ruosong Wang, Lin F. Yang, Aarti Singh, Artur Dubrawski

Preference-based Reinforcement Learning (PbRL) replaces the reward values of traditional reinforcement learning with preferences to better elicit human opinion on the target objective, especially when numerical reward values are hard to design or interpret. Despite promising results in applications, the theoretical understanding of PbRL is still in its infancy. In this paper, we present the first finite-time analysis for general PbRL problems. We first show that a unique optimal policy may not exist if preferences over trajectories are deterministic for PbRL. If preferences are stochastic, and the preference probability relates to the hidden reward values, we present algorithms for PbRL, both with and without a simulator, that are able to identify the best policy up to accuracy $\varepsilon$ with high probability. Our method explores the state space by navigating to under-explored states, and solves PbRL using a combination of dueling bandits and policy search. Experiments show the efficacy of our method when it is applied to real-world problems.
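
To make the stochastic-preference setting concrete, the sketch below ties the probability of preferring one trajectory over another to their hidden rewards through a logistic link, then decides a comparison by majority vote over repeated duels. The logistic (Bradley-Terry style) link and the majority-vote rule are assumptions chosen for illustration; the abstract only states that the preference probability relates to the hidden rewards.

```python
import math
import random


def preference_probability(reward_a: float, reward_b: float) -> float:
    """Assumed logistic link: probability that trajectory a is preferred over b."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


def duel_winner(reward_a: float, reward_b: float,
                num_duels: int, rng: random.Random) -> str:
    """Query the noisy preference num_duels times and return the majority winner."""
    p = preference_probability(reward_a, reward_b)
    wins_a = sum(rng.random() < p for _ in range(num_duels))
    return "a" if 2 * wins_a >= num_duels else "b"


rng = random.Random(0)
print(preference_probability(1.0, 1.0))  # 0.5: equal rewards give a coin flip
print(duel_winner(2.0, 0.0, num_duels=101, rng=rng))
```

The closer the two hidden rewards are, the closer the preference probability is to 0.5 and the more duels are needed to tell the trajectories apart, which is why finite-time guarantees depend on the target accuracy $\varepsilon$.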

* 22 pages, 2 figures

Zeroth Order Non-convex optimization with Dueling-Choice Bandits

Nov 03, 2019
Yichong Xu, Aparna Joshi, Aarti Singh, Artur Dubrawski

We consider a novel setting of zeroth-order non-convex optimization, where in addition to querying the function value at a given point, we can also duel two points and get the point with the larger function value. We refer to this setting as optimization with dueling-choice bandits, since both direct queries and duels are available for optimization. We give the COMP-GP-UCB algorithm based on GP-UCB (Srinivas et al., 2009), where instead of directly querying the point with the maximum Upper Confidence Bound (UCB), we perform a constrained optimization and use comparisons to filter out suboptimal points. COMP-GP-UCB comes with a theoretical guarantee of $O(\frac{\Phi}{\sqrt{T}})$ on simple regret, where $T$ is the number of direct queries and $\Phi$ is an improved information gain corresponding to a comparison-based constraint set that restricts the search space for the optimum. In contrast, in the direct-query-only setting, $\Phi$ depends on the entire domain. Finally, we present experimental results to show the efficacy of our algorithm.
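
For readers unfamiliar with the term, the simple regret in the guarantee above can be written out as follows. This is one common form rather than necessarily the paper's exact definition; the objective $f$, the domain $\mathcal{X}$, and the point $\hat{x}_T$ recommended after $T$ direct queries are notation assumed here.

```latex
\[
r_T \;=\; f(x^\ast) - f(\hat{x}_T),
\qquad x^\ast \in \arg\max_{x \in \mathcal{X}} f(x),
\qquad r_T \;=\; O\!\left(\frac{\Phi}{\sqrt{T}}\right).
\]
```

Under this reading, a smaller $\Phi$ (from a comparison-restricted search space) directly tightens the bound for the same number of direct queries $T$.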

* 19 pages, 3 figures

Active Learning for Graph Neural Networks via Node Feature Propagation

Oct 16, 2019
Yuexin Wu, Yichong Xu, Aarti Singh, Yiming Yang, Artur Dubrawski

Graph Neural Networks (GNNs) for prediction tasks like node classification or edge prediction have received increasing attention in recent machine learning on graph-structured data. However, a large quantity of labeled graphs is difficult to obtain, which significantly limits the true success of GNNs. Although active learning has been widely studied for addressing label-sparse issues with other data types such as text and images, how to make it effective over graphs is an open research question. In this paper, we present an investigation of active learning with GNNs for node classification tasks. Specifically, we propose a new method, which uses node feature propagation followed by K-Medoids clustering of the nodes for instance selection in active learning. We justify the design choice of our approach with a theoretical bound analysis. In our experiments on four benchmark datasets, the proposed method outperforms other representative baseline methods consistently and significantly.
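
The two-stage selection described above (propagate node features, then cluster with K-Medoids and label the medoids) can be sketched as follows. The mean-over-neighbors propagation rule, the initialization with the first `k` nodes, and the toy path graph are all assumptions; the paper's exact propagation operator and clustering setup may differ.

```python
from typing import List


def propagate_features(features: List[List[float]],
                       adjacency: List[List[int]], steps: int) -> List[List[float]]:
    """Repeatedly replace each node's features with the mean over itself and its neighbors."""
    current = [list(f) for f in features]
    for _ in range(steps):
        nxt = []
        for node, row in enumerate(current):
            group = [node] + adjacency[node]
            nxt.append([sum(current[j][d] for j in group) / len(group)
                        for d in range(len(row))])
        current = nxt
    return current


def sq_dist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_medoids(points: List[List[float]], k: int, iterations: int = 10) -> List[int]:
    """Plain alternating K-Medoids, initialized with the first k points (an assumption)."""
    medoids = list(range(k))
    for _ in range(iterations):
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            # A medoid always stays in its own cluster, so no cluster is ever empty.
            nearest = i if i in clusters else min(
                medoids, key=lambda m: sq_dist(p, points[m]))
            clusters[nearest].append(i)
        new_medoids = [
            min(members, key=lambda c: sum(sq_dist(points[c], points[j]) for j in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)


# Toy path graph 0-1-2-3-4-5 whose two halves have different raw features.
adjacency = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
features = [[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]]
smoothed = propagate_features(features, adjacency, steps=1)
to_label = k_medoids(smoothed, k=2)  # nodes whose labels would be requested
print(to_label)
```

Clustering the propagated features rather than the raw ones means the selected nodes are representative of their graph neighborhoods, not just of their own attributes.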

* 16 pages, 5 figures

Thresholding Bandit Problem with Both Duels and Pulls

Oct 14, 2019
Yichong Xu, Xi Chen, Aarti Singh, Artur Dubrawski

The Thresholding Bandit Problem (TBP) aims to find the set of arms with mean rewards greater than a given threshold. We consider a new setting of TBP, where in addition to pulling arms, one can also duel two arms and get the arm with the greater mean. In our motivating application from crowdsourcing, dueling two arms can be more cost- and time-efficient than direct pulls. We refer to this problem as TBP with Dueling Choices (TBP-DC). This paper provides an algorithm called Rank-Search (RS) for solving TBP-DC by alternating between ranking and binary search. We prove theoretical guarantees for RS, and also give lower bounds to show its optimality. Experiments show that RS outperforms previous baseline algorithms that only use pulls or duels.
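
The intuition behind combining ranking with binary search can be shown in a noise-free toy version: duels are used to order the arms, and pulls are then needed only along a binary search for the threshold boundary. This is a simplification and not the RS algorithm itself, which has to handle noisy duels and pulls and alternates between the two phases; the function name and the exact oracles are assumptions.

```python
from functools import cmp_to_key
from typing import List, Tuple


def rank_then_search(means: List[float], threshold: float) -> Tuple[List[int], int]:
    """Return (arms with mean above threshold, number of pulls used), noise-free."""

    def duel(a: int, b: int) -> int:
        # Exact duel oracle: negative if arm a has the smaller mean, positive if larger.
        return (means[a] > means[b]) - (means[a] < means[b])

    # Ranking phase: sort arms in ascending order of mean using duels only.
    ranked = sorted(range(len(means)), key=cmp_to_key(duel))

    # Search phase: binary search for the first ranked arm above the threshold.
    pulls = 0
    lo, hi = 0, len(ranked)
    while lo < hi:
        mid = (lo + hi) // 2
        pulls += 1  # one exact pull reveals the mean of ranked[mid]
        if means[ranked[mid]] > threshold:
            hi = mid
        else:
            lo = mid + 1
    return sorted(ranked[lo:]), pulls


above, pulls = rank_then_search([0.2, 0.9, 0.5, 0.7, 0.1], threshold=0.6)
print(above, pulls)  # [1, 3] 3
```

Here five arms are classified with three pulls instead of five, which is the kind of saving that makes cheap duels attractive when pulls are expensive.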

* 21 pages, 8 figures

DoubleTransfer at MEDIQA 2019: Multi-Source Transfer Learning for Natural Language Understanding in the Medical Domain

Jun 11, 2019
Yichong Xu, Xiaodong Liu, Chunyuan Li, Hoifung Poon, Jianfeng Gao

This paper describes the system we entered in the MEDIQA-2019 competition. We use a multi-source transfer learning approach to transfer the knowledge from MT-DNN and SciBERT to natural language understanding tasks in the medical domain. For fine-tuning, we use multi-task learning on NLI, RQE and QA tasks in the general and medical domains to improve performance. The proposed methods prove effective for natural language understanding in the medical domain, and we rank first on the QA task.

* Proceedings of the BioNLP 2019 workshop, ACL 2019; 7 pages, 5 tables, 1 figure
