Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jian-Guang Lou

CodeT: Code Generation with Generated Tests

Jul 21, 2022

Bei Chen, Fengji Zhang, Anh Nguyen, Daoguang Zan, Zeqi Lin, Jian-Guang Lou, Weizhu Chen

Figure 1 for CodeT: Code Generation with Generated Tests

Figure 2 for CodeT: Code Generation with Generated Tests

Figure 3 for CodeT: Code Generation with Generated Tests

Figure 4 for CodeT: Code Generation with Generated Tests

Abstract:Given a programming problem, pre-trained language models such as Codex have demonstrated the ability to generate multiple different code solutions via sampling. However, selecting a correct or best solution from those samples still remains a challenge. While an easy way to verify the correctness of a code solution is through executing test cases, producing high-quality test cases is prohibitively expensive. In this paper, we explore the use of pre-trained language models to automatically generate test cases, calling our method CodeT: Code generation with generated Tests. CodeT executes the code solutions using the generated test cases, and then chooses the best solution based on a dual execution agreement with both the generated test cases and other generated solutions. We evaluate CodeT on five different pre-trained models with both HumanEval and MBPP benchmarks. Extensive experimental results demonstrate CodeT can achieve significant, consistent, and surprising improvements over previous methods. For example, CodeT improves the pass@1 on HumanEval to 65.8%, an increase of absolute 18.8% on the code-davinci-002 model, and an absolute 20+% improvement over previous state-of-the-art results.

Via

Access Paper or Ask Questions

CERT: Continual Pre-Training on Sketches for Library-Oriented Code Generation

Jun 14, 2022

Daoguang Zan, Bei Chen, Dejian Yang, Zeqi Lin, Minsu Kim, Bei Guan, Yongji Wang, Weizhu Chen, Jian-Guang Lou

Figure 1 for CERT: Continual Pre-Training on Sketches for Library-Oriented Code Generation

Figure 2 for CERT: Continual Pre-Training on Sketches for Library-Oriented Code Generation

Figure 3 for CERT: Continual Pre-Training on Sketches for Library-Oriented Code Generation

Figure 4 for CERT: Continual Pre-Training on Sketches for Library-Oriented Code Generation

Abstract:Code generation is a longstanding challenge, aiming to generate a code snippet based on a natural language description. Usually, expensive text-code paired data is essential for training a code generation model. Recently, thanks to the success of pre-training techniques, large language models are trained on large-scale unlabelled code corpora and perform well in code generation. In this paper, we investigate how to leverage an unlabelled code corpus to train a model for library-oriented code generation. Since it is a common practice for programmers to reuse third-party libraries, in which case the text-code paired data are harder to obtain due to the huge number of libraries. We observe that library-oriented code snippets are more likely to share similar code sketches. Hence, we present CERT with two steps: a sketcher generates the sketch, then a generator fills the details in the sketch. Both the sketcher and the generator are continually pre-trained upon a base model using unlabelled data. Furthermore, we craft two benchmarks named PandasEval and NumpyEval to evaluate library-oriented code generation. Experimental results demonstrate the impressive performance of CERT. For example, it surpasses the base model by an absolute 15.67% improvement in terms of pass@1 on PandasEval. Our work is available at https://github.com/microsoft/PyCodeGPT.

* Accepted for publication at IJCAI-ECAI 2022

Via

Access Paper or Ask Questions

On the Advance of Making Language Models Better Reasoners

Jun 07, 2022

Yifei Li, Zeqi Lin, Shizhuo Zhang, Qiang Fu, Bei Chen, Jian-Guang Lou, Weizhu Chen

Figure 1 for On the Advance of Making Language Models Better Reasoners

Figure 2 for On the Advance of Making Language Models Better Reasoners

Figure 3 for On the Advance of Making Language Models Better Reasoners

Figure 4 for On the Advance of Making Language Models Better Reasoners

Abstract:Large language models such as GPT-3 and PaLM have shown remarkable performance in few-shot learning. However, they still struggle with reasoning tasks such as the arithmetic benchmark GSM8K. Recent advances deliberately guide the language model to generate a chain of reasoning steps before producing the final answer, successfully boosting the GSM8K benchmark from 17.9% to 58.1% in terms of problem solving rate. In this paper, we propose a new approach, DiVeRSe (Diverse Verifier on Reasoning Step), to further advance their reasoning capability. DiVeRSe first explores different prompts to enhance the diversity in reasoning paths. Second, DiVeRSe introduces a verifier to distinguish good answers from bad answers for a better weighted voting. Finally, DiVeRSe verifies the correctness of each single step rather than all the steps in a whole. We conduct extensive experiments using the latest language model code-davinci-002 and demonstrate that DiVeRSe can achieve new state-of-the-art performance on six out of eight reasoning benchmarks (e.g., GSM8K 74.4% to 83.2%), outperforming the PaLM model with 540B parameters.

Via

Access Paper or Ask Questions

DialogZoo: Large-Scale Dialog-Oriented Task Learning

May 25, 2022

Zhi Chen, Jijia Bao, Lu Chen, Yuncong Liu, Da Ma, Bei Chen, Mengyue Wu, Su Zhu, Jian-Guang Lou, Kai Yu

Figure 1 for DialogZoo: Large-Scale Dialog-Oriented Task Learning

Figure 2 for DialogZoo: Large-Scale Dialog-Oriented Task Learning

Figure 3 for DialogZoo: Large-Scale Dialog-Oriented Task Learning

Figure 4 for DialogZoo: Large-Scale Dialog-Oriented Task Learning

Abstract:Building unified conversational agents has been a long-standing goal of the dialogue research community. Most previous works only focus on a subset of various dialogue tasks. In this work, we aim to build a unified foundation model which can solve massive diverse dialogue tasks. To achieve this goal, we first collect a large-scale well-labeled dialogue dataset from 73 publicly available datasets. In addition to this dataset, we further propose two dialogue-oriented self-supervised tasks, and finally use the mixture of supervised and self-supervised datasets to train our foundation model. The supervised examples make the model learn task-specific skills, while the self-supervised examples make the model learn more general skills. We evaluate our model on various downstream dialogue tasks. The experimental results show that our method not only improves the ability of dialogue generation and knowledge distillation, but also the representation ability of models.

* Work in Progress

Via

Access Paper or Ask Questions

LogiGAN: Learning Logical Reasoning via Adversarial Pre-training

May 18, 2022

Xinyu Pi, Wanjun Zhong, Yan Gao, Nan Duan, Jian-Guang Lou

Figure 1 for LogiGAN: Learning Logical Reasoning via Adversarial Pre-training

Figure 2 for LogiGAN: Learning Logical Reasoning via Adversarial Pre-training

Figure 3 for LogiGAN: Learning Logical Reasoning via Adversarial Pre-training

Figure 4 for LogiGAN: Learning Logical Reasoning via Adversarial Pre-training

Abstract:We present LogiGAN, an unsupervised adversarial pre-training framework for improving logical reasoning abilities of language models. Upon automatic identifying logical reasoning phenomena in massive text corpus via detection heuristics, we train language models to predict the masked-out logical statements. Inspired by the facilitation effect of reflective thinking in human learning, we analogically simulate the learning-thinking process with an adversarial Generator-Verifier architecture to assist logic learning. LogiGAN implements a novel sequential GAN approach that (a) circumvents the non-differentiable challenge of the sequential GAN by leveraging the Generator as a sentence-level generative likelihood scorer with a learning objective of reaching scoring consensus with the Verifier; (b) is computationally feasible for large-scale pre-training with arbitrary target length. Both base and large size language models pre-trained with LogiGAN demonstrate obvious performance improvement on 12 datasets requiring general reasoning abilities, revealing the fundamental role of logic in broad reasoning, as well as the effectiveness of LogiGAN. Ablation studies on LogiGAN components reveal the relative orthogonality between linguistic and logic abilities and suggest that reflective thinking's facilitation effect might also generalize to machine learning.

* 19 pages

Via

Access Paper or Ask Questions

GL-CLeF: A Global-Local Contrastive Learning Framework for Cross-lingual Spoken Language Understanding

Apr 18, 2022

Libo Qin, Qiguang Chen, Tianbao Xie, Qixin Li, Jian-Guang Lou, Wanxiang Che, Min-Yen Kan

Figure 1 for GL-CLeF: A Global-Local Contrastive Learning Framework for Cross-lingual Spoken Language Understanding

Figure 2 for GL-CLeF: A Global-Local Contrastive Learning Framework for Cross-lingual Spoken Language Understanding

Figure 3 for GL-CLeF: A Global-Local Contrastive Learning Framework for Cross-lingual Spoken Language Understanding

Figure 4 for GL-CLeF: A Global-Local Contrastive Learning Framework for Cross-lingual Spoken Language Understanding

Abstract:Due to high data demands of current methods, attention to zero-shot cross-lingual spoken language understanding (SLU) has grown, as such approaches greatly reduce human annotation effort. However, existing models solely rely on shared parameters, which can only perform implicit alignment across languages. We present Global--Local Contrastive Learning Framework (GL-CLeF) to address this shortcoming. Specifically, we employ contrastive learning, leveraging bilingual dictionaries to construct multilingual views of the same utterance, then encourage their representations to be more similar than negative example pairs, which achieves to explicitly aligned representations of similar sentences across languages. In addition, a key step in GL-CLeF is a proposed Local and Global component, which achieves a fine-grained cross-lingual transfer (i.e., sentence-level Local intent transfer, token-level Local slot transfer, and semantic-level Global transfer across intent and slot). Experiments on MultiATIS++ show that GL-CLeF achieves the best performance and successfully pulls representations of similar sentences across languages closer.

* Accepted at ACL2022 Main Conference

Via

Access Paper or Ask Questions

UniSAr: A Unified Structure-Aware Autoregressive Language Model for Text-to-SQL

Apr 13, 2022

Longxu Dou, Yan Gao, Mingyang Pan, Dingzirui Wang, Wanxiang Che, Dechen Zhan, Jian-Guang Lou

Figure 1 for UniSAr: A Unified Structure-Aware Autoregressive Language Model for Text-to-SQL

Figure 2 for UniSAr: A Unified Structure-Aware Autoregressive Language Model for Text-to-SQL

Figure 3 for UniSAr: A Unified Structure-Aware Autoregressive Language Model for Text-to-SQL

Figure 4 for UniSAr: A Unified Structure-Aware Autoregressive Language Model for Text-to-SQL

Abstract:Existing text-to-SQL semantic parsers are typically designed for particular settings such as handling queries that span multiple tables, domains or turns which makes them ineffective when applied to different settings. We present UniSAr (Unified Structure-Aware Autoregressive Language Model), which benefits from directly using an off-the-shelf language model architecture and demonstrates consistently high performance under different settings. Specifically, UniSAr extends existing autoregressive language models to incorporate three non-invasive extensions to make them structure-aware: (1) adding structure mark to encode database schema, conversation context, and their relationships; (2) constrained decoding to decode well structured SQL for a given database schema; and (3) SQL completion to complete potential missing JOIN relationships in SQL based on database schema. On seven well-known text-to-SQL datasets covering multi-domain, multi-table and multi-turn, UniSAr demonstrates highly comparable or better performance to the most advanced specifically-designed text-to-SQL models. Importantly, our UniSAr is non-invasive, such that other core model advances in text-to-SQL can also adopt our extensions to further enhance performance.

* Codes and checkpoints are available at https://github.com/microsoft/ContextualSP/tree/master/unified_parser_text_to_sql

Via

Access Paper or Ask Questions

UniDU: Towards A Unified Generative Dialogue Understanding Framework

Apr 10, 2022

Zhi Chen, Lu Chen, Bei Chen, Libo Qin, Yuncong Liu, Su Zhu, Jian-Guang Lou, Kai Yu

Figure 1 for UniDU: Towards A Unified Generative Dialogue Understanding Framework

Figure 2 for UniDU: Towards A Unified Generative Dialogue Understanding Framework

Figure 3 for UniDU: Towards A Unified Generative Dialogue Understanding Framework

Figure 4 for UniDU: Towards A Unified Generative Dialogue Understanding Framework

Abstract:With the development of pre-trained language models, remarkable success has been witnessed in dialogue understanding (DU) direction. However, the current DU approaches just employ an individual model for each DU task, independently, without considering the shared knowledge across different DU tasks. In this paper, we investigate a unified generative dialogue understanding framework, namely UniDU, to achieve information exchange among DU tasks. Specifically, we reformulate the DU tasks into unified generative paradigm. In addition, to consider different training data for each task, we further introduce model-agnostic training strategy to optimize unified model in a balanced manner. We conduct the experiments on ten dialogue understanding datasets, which span five fundamental tasks: dialogue summary, dialogue completion, slot filling, intent detection and dialogue state tracking. The proposed UniDU framework outperforms task-specific well-designed methods on all 5 tasks. We further conduct comprehensive analysis experiments to study the effect factors. The experimental results also show that the proposed method obtains promising performance on unseen dialogue domain.

* 13 pages, 9 figures

Via

Access Paper or Ask Questions

Input-Tuning: Adapting Unfamiliar Inputs to Frozen Pretrained Models

Mar 07, 2022

Shengnan An, Yifei Li, Zeqi Lin, Qian Liu, Bei Chen, Qiang Fu, Weizhu Chen, Nanning Zheng, Jian-Guang Lou

Figure 1 for Input-Tuning: Adapting Unfamiliar Inputs to Frozen Pretrained Models

Figure 2 for Input-Tuning: Adapting Unfamiliar Inputs to Frozen Pretrained Models

Figure 3 for Input-Tuning: Adapting Unfamiliar Inputs to Frozen Pretrained Models

Figure 4 for Input-Tuning: Adapting Unfamiliar Inputs to Frozen Pretrained Models

Abstract:Recently the prompt-tuning paradigm has attracted significant attention. By only tuning continuous prompts with a frozen pre-trained language model (PLM), prompt-tuning takes a step towards deploying a shared frozen PLM to serve numerous downstream tasks. Although prompt-tuning shows good performance on certain natural language understanding (NLU) tasks, its effectiveness on natural language generation (NLG) tasks is still under-explored. In this paper, we argue that one of the factors hindering the development of prompt-tuning on NLG tasks is the unfamiliar inputs (i.e., inputs are linguistically different from the pretraining corpus). For example, our preliminary exploration reveals a large performance gap between prompt-tuning and fine-tuning when unfamiliar inputs occur frequently in NLG tasks. This motivates us to propose input-tuning, which fine-tunes both the continuous prompts and the input representations, leading to a more effective way to adapt unfamiliar inputs to frozen PLMs. Our proposed input-tuning is conceptually simple and empirically powerful. Experimental results on seven NLG tasks demonstrate that input-tuning is significantly and consistently better than prompt-tuning. Furthermore, on three of these tasks, input-tuning can achieve a comparable or even better performance than fine-tuning.

Via

Access Paper or Ask Questions

Reasoning Like Program Executors

Jan 27, 2022

Xinyu Pi, Qian Liu, Bei Chen, Morteza Ziyadi, Zeqi Lin, Yan Gao, Qiang Fu, Jian-Guang Lou, Weizhu Chen

Figure 1 for Reasoning Like Program Executors

Figure 2 for Reasoning Like Program Executors

Figure 3 for Reasoning Like Program Executors

Figure 4 for Reasoning Like Program Executors

Abstract:Reasoning over natural language is a long-standing goal for the research community. However, studies have shown that existing language models are inadequate in reasoning. To address the issue, we present POET, a new pre-training paradigm. Through pre-training language models with programs and their execution results, POET empowers language models to harvest the reasoning knowledge possessed in program executors via a data-driven approach. POET is conceptually simple and can be instantiated by different kinds of programs. In this paper, we show three empirically powerful instances, i.e., POET-Math, POET-Logic, and POET-SQL. Experimental results on six benchmarks demonstrate that POET can significantly boost model performance on natural language reasoning, such as numerical reasoning, logical reasoning, and multi-hop reasoning. Taking the DROP benchmark as a representative example, POET improves the F1 metric of BART from 69.2% to 80.6%. Furthermore, POET shines in giant language models, pushing the F1 metric of T5-11B to 87.6% and achieving a new state-of-the-art performance on DROP. POET opens a new gate on reasoning-enhancement pre-training and we hope our analysis would shed light on the future research of reasoning like program executors.

* Work in progress.The first two authors contributed equally

Via

Access Paper or Ask Questions