Bing Xiang

ContraGen: Effective Contrastive Learning For Causal Language Model

Oct 03, 2022
Nihal Jain, Dejiao Zhang, Wasi Uddin Ahmad, Zijian Wang, Feng Nan, Xiaopeng Li, Ming Tan, Ramesh Nallapati, Baishakhi Ray, Parminder Bhatia, Xiaofei Ma, Bing Xiang

Despite exciting progress in large-scale language generation, the expressiveness of its representations is severely limited by the anisotropy issue where the hidden representations are distributed into a narrow cone in the vector space. To address this issue, we present ContraGen, a novel contrastive learning framework to improve the representation with better uniformity and discrimination. We assess ContraGen on a wide range of downstream tasks in natural and programming languages. We show that ContraGen can effectively enhance both uniformity and discrimination of the representations and lead to the desired improvement on various language understanding tasks where discriminative representations are crucial for attaining good performance. Specifically, we attain 44% relative improvement on the Semantic Textual Similarity tasks and 34% on Code-to-Code Search tasks. Furthermore, by improving the expressiveness of the representations, ContraGen also boosts the source code generation capability with 9% relative improvement on execution accuracy on the HumanEval benchmark.

* 10 pages
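
The "narrow cone" described in the abstract above can be illustrated with a small sketch. It uses mean pairwise cosine similarity as a rough proxy for anisotropy; that proxy, the function names, and the toy vectors are assumptions for illustration and are not taken from the paper.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all unordered pairs of vectors.

    Values near 1 suggest the vectors sit in a narrow cone (anisotropic);
    lower values suggest they are spread more uniformly.
    """
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Hypothetical toy representations (not real model outputs).
narrow_cone = [[1.0, 0.1], [1.0, 0.2], [1.0, 0.0]]
spread_out = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

print(mean_pairwise_cosine(narrow_cone))
print(mean_pairwise_cosine(spread_out))
```

The first set of vectors points in nearly the same direction and scores close to 1, while the second set is spread across the plane and scores much lower.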

DecAF: Joint Decoding of Answers and Logical Forms for Question Answering over Knowledge Bases

Sep 30, 2022
Donghan Yu, Sheng Zhang, Patrick Ng, Henghui Zhu, Alexander Hanbo Li, Jun Wang, Yiqun Hu, William Wang, Zhiguo Wang, Bing Xiang

Question answering over knowledge bases (KBs) aims to answer natural language questions with factual information such as entities and relations in KBs. Previous methods either generate logical forms that can be executed over KBs to obtain final answers or predict answers directly. Empirical results show that the former often produces more accurate answers, but it suffers from non-execution issues due to potential syntactic and semantic errors in the generated logical forms. In this work, we propose a novel framework DecAF that jointly generates both logical forms and direct answers, and then combines the merits of them to get the final answers. Moreover, different from most of the previous methods, DecAF is based on simple free-text retrieval without relying on any entity linking tools -- this simplification eases its adaptation to different datasets. DecAF achieves new state-of-the-art accuracy on WebQSP, FreebaseQA, and GrailQA benchmarks, while getting competitive results on the ComplexWebQuestions benchmark.
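
One way the two output types described above might be combined is sketched below. The abstract does not spell out the combination rule, so the rule used here (prefer the first logical form that executes to a non-empty result, otherwise fall back to the top direct answer) is an assumption, as are the names `combine_answers`, `toy_execute`, and `TOY_KB`.

```python
def combine_answers(logical_forms, direct_answers, execute):
    """Pick final answers from ranked logical forms and ranked direct answers.

    Assumed rule: return the result of the first logical form that executes
    without error and yields a non-empty result; if none does, return the
    top-ranked direct answer (as a one-element list).
    """
    for form in logical_forms:
        try:
            result = execute(form)
        except Exception:
            # Non-executable form (syntactic or semantic error): try the next.
            continue
        if result:
            return result
    return direct_answers[:1]


# Hypothetical toy knowledge base keyed by (entity, relation).
TOY_KB = {("France", "capital"): ["Paris"]}


def toy_execute(form):
    """Toy executor: raises on malformed forms or missing facts."""
    entity, relation = form  # ValueError if the form is malformed
    return TOY_KB[(entity, relation)]  # KeyError if the fact is missing


print(combine_answers([("France", "capital")], ["Lyon"], toy_execute))
print(combine_answers([("France",)], ["Lyon"], toy_execute))
```

The second call shows the fallback path: the only logical form fails to execute, so the direct answer is used instead.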

Improving Text-to-SQL Semantic Parsing with Fine-grained Query Understanding

Sep 28, 2022
Jun Wang, Patrick Ng, Alexander Hanbo Li, Jiarong Jiang, Zhiguo Wang, Ramesh Nallapati, Bing Xiang, Sudipta Sengupta

Most recent research on Text-to-SQL semantic parsing relies on either the parser itself or a simple heuristic-based approach to understand the natural language query (NLQ). When synthesizing a SQL query, no explicit semantic information about the NLQ is available to the parser, which leads to undesirable generalization performance. In addition, without lexical-level fine-grained query understanding, linking between the query and the database can rely only on fuzzy string matching, which leads to suboptimal performance in real applications. In view of this, we present a general-purpose, modular neural semantic parsing framework based on token-level fine-grained query understanding. Our framework consists of three modules: a named entity recognizer (NER), a neural entity linker (NEL), and a neural semantic parser (NSP). By jointly modeling the query and the database, the NER model analyzes user intents and identifies entities in the query. The NEL model links typed entities to the schema and cell values in the database. The parser model leverages the available semantic information and linking results to synthesize tree-structured SQL queries based on a dynamically generated grammar. Experiments on SQUALL, a newly released semantic parsing dataset, show that we achieve 56.8% execution accuracy on the WikiTableQuestions (WTQ) test set, outperforming the state-of-the-art model by 2.7%.

* EMNLP Industry Track 2022

REKnow: Enhanced Knowledge for Joint Entity and Relation Extraction

Jun 20, 2022
Sheng Zhang, Patrick Ng, Zhiguo Wang, Bing Xiang

Relation extraction is an important but challenging task that aims to extract all hidden relational facts from the text. With the development of deep language models, relation extraction methods have achieved good performance on various benchmarks. However, we observe two shortcomings of previous methods: first, there is no unified framework that works well under various relation extraction settings; second, effectively utilizing external knowledge as background information is absent. In this work, we propose a knowledge-enhanced generative model to mitigate these two issues. Our generative model is a unified framework to sequentially generate relational triplets under various relation extraction settings and explicitly utilizes relevant knowledge from Knowledge Graph (KG) to resolve ambiguities. Our model achieves superior performance on multiple benchmarks and settings, including WebNLG, NYT10, and TACRED.

Learning Dialogue Representations from Consecutive Utterances

May 26, 2022
Zhihan Zhou, Dejiao Zhang, Wei Xiao, Nicholas Dingwall, Xiaofei Ma, Andrew O. Arnold, Bing Xiang

Learning high-quality dialogue representations is essential for solving a variety of dialogue-oriented tasks, especially considering that dialogue systems often suffer from data scarcity. In this paper, we introduce Dialogue Sentence Embedding (DSE), a self-supervised contrastive learning method that learns effective dialogue representations suitable for a wide range of dialogue tasks. DSE learns from dialogues by taking consecutive utterances of the same dialogue as positive pairs for contrastive learning. Despite its simplicity, DSE achieves significantly better representation capability than other dialogue representation and universal sentence representation models. We evaluate DSE on five downstream dialogue tasks that examine dialogue representation at different semantic granularities. Experiments in few-shot and zero-shot settings show that DSE outperforms baselines by a large margin. For example, it achieves 13 average performance improvement over the strongest unsupervised baseline in 1-shot intent classification on 6 datasets. We also provide analyses on the benefits and limitations of our model.

* NAACL 2022 main conference
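
The positive-pair construction described in the abstract above (consecutive utterances of the same dialogue) can be sketched as follows. The function names and the toy dialogues are hypothetical, and the sketch covers only pair construction, not the contrastive loss or the encoder.

```python
def consecutive_pairs(dialogue):
    """Return (utterance_i, utterance_{i+1}) pairs from one dialogue."""
    return list(zip(dialogue, dialogue[1:]))


def build_positive_pairs(dialogues):
    """Collect consecutive-utterance pairs across several dialogues.

    Pairs never span two different dialogues, since each dialogue is
    processed separately.
    """
    pairs = []
    for dialogue in dialogues:
        pairs.extend(consecutive_pairs(dialogue))
    return pairs


# Hypothetical toy dialogues for illustration only.
dialogues = [
    ["I want to book a flight.", "Where are you flying to?", "To Boston."],
    ["My card was declined.", "Let me check your account."],
]

for anchor, positive in build_positive_pairs(dialogues):
    print(anchor, "->", positive)
```

A dialogue with n utterances yields n - 1 positive pairs, so the two toy dialogues above produce three pairs in total.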

DQ-BART: Efficient Sequence-to-Sequence Model via Joint Distillation and Quantization

Mar 21, 2022
Zheng Li, Zijian Wang, Ming Tan, Ramesh Nallapati, Parminder Bhatia, Andrew Arnold, Bing Xiang, Dan Roth

Large-scale pre-trained sequence-to-sequence models like BART and T5 achieve state-of-the-art performance on many generative NLP tasks. However, such models pose a great challenge in resource-constrained scenarios owing to their large memory requirements and high latency. To alleviate this issue, we propose to jointly distill and quantize the model, where knowledge is transferred from the full-precision teacher model to the quantized and distilled low-precision student model. Empirical analyses show that, despite the challenging nature of generative tasks, we were able to achieve a 16.5x model footprint compression ratio with little performance drop relative to the full-precision counterparts on multiple summarization and QA datasets. We further pushed the limit of compression ratio to 27.7x and presented the performance-efficiency trade-off for generative tasks using pre-trained models. To the best of our knowledge, this is the first work aiming to effectively distill and quantize sequence-to-sequence pre-trained models for language generation tasks.

* ACL 2022

Contrastive Document Representation Learning with Graph Attention Networks

Oct 20, 2021
Peng Xu, Xinchi Chen, Xiaofei Ma, Zhiheng Huang, Bing Xiang

Recent progress in pretrained Transformer-based language models has shown great success in learning contextual representation of text. However, due to the quadratic self-attention complexity, most of the pretrained Transformers models can only handle relatively short text. It is still a challenge when it comes to modeling very long documents. In this work, we propose to use a graph attention network on top of the available pretrained Transformers model to learn document embeddings. This graph attention network allows us to leverage the high-level semantic structure of the document. In addition, based on our graph document model, we design a simple contrastive learning strategy to pretrain our models on a large amount of unlabeled corpus. Empirically, we demonstrate the effectiveness of our approaches in document classification and document retrieval tasks.

* Findings of EMNLP 2021

Attention-guided Generative Models for Extractive Question Answering

Oct 12, 2021
Peng Xu, Davis Liang, Zhiheng Huang, Bing Xiang

We propose a novel method for applying Transformer models to extractive question answering (QA) tasks. Recently, pretrained generative sequence-to-sequence (seq2seq) models have achieved great success in question answering. Contributing to the success of these models are internal attention mechanisms such as cross-attention. We propose a simple strategy to obtain an extractive answer span from the generative model by leveraging the decoder cross-attention patterns. Viewing cross-attention as an architectural prior, we apply joint training to further improve QA performance. Empirical results show that on open-domain question answering datasets like NaturalQuestions and TriviaQA, our method approaches state-of-the-art performance on both generative and extractive inference, all while using much fewer parameters. Furthermore, this strategy allows us to perform hallucination-free inference while conferring significant improvements to the model's ability to rerank relevant passages.

* 10 pages

Multiplicative Position-aware Transformer Models for Language Understanding

Sep 27, 2021
Zhiheng Huang, Davis Liang, Peng Xu, Bing Xiang

Transformer models, which leverage architectural improvements like self-attention, perform remarkably well on Natural Language Processing (NLP) tasks. The self-attention mechanism is position agnostic. In order to capture positional ordering information, various flavors of absolute and relative position embeddings have been proposed. However, there is no systematic analysis on their contributions and a comprehensive comparison of these methods is missing in the literature. In this paper, we review major existing position embedding methods and compare their accuracy on downstream NLP tasks, using our own implementations. We also propose a novel multiplicative embedding method which leads to superior accuracy when compared to existing methods. Finally, we show that our proposed embedding method, served as a drop-in replacement of the default absolute position embedding, can improve the RoBERTa-base and RoBERTa-large models on SQuAD1.1 and SQuAD2.0 datasets.

* arXiv admin note: text overlap with arXiv:2009.13658

Pairwise Supervised Contrastive Learning of Sentence Representations

Sep 12, 2021
Dejiao Zhang, Shang-Wen Li, Wei Xiao, Henghui Zhu, Ramesh Nallapati, Andrew O. Arnold, Bing Xiang

Many recent successes in sentence representation learning have been achieved by simply fine-tuning on the Natural Language Inference (NLI) datasets with triplet loss or siamese loss. Nevertheless, they share a common weakness: sentences in a contradiction pair are not necessarily from different semantic categories. Therefore, optimizing the semantic entailment and contradiction reasoning objective alone is inadequate to capture the high-level semantic structure. The drawback is compounded by the fact that the vanilla siamese or triplet losses only learn from individual sentence pairs or triplets, which often suffer from bad local optima. In this paper, we propose PairSupCon, an instance discrimination based approach aiming to bridge semantic entailment and contradiction understanding with high-level categorical concept encoding. We evaluate PairSupCon on various downstream tasks that involve understanding sentence semantics at different granularities. We outperform the previous state-of-the-art method with 10%-13% averaged improvement on eight clustering tasks, and 5%-6% averaged improvement on seven semantic textual similarity (STS) tasks.

* 9 pages, EMNLP 2021
