Wasi Uddin Ahmad

Pre-trained Language Models for Keyphrase Generation: A Thorough Empirical Study

Dec 20, 2022
Di Wu, Wasi Uddin Ahmad, Kai-Wei Chang

Neural models that do not rely on pre-training have excelled at keyphrase generation when large annotated datasets are available. Meanwhile, newer approaches have incorporated pre-trained language models (PLMs) for their data efficiency. However, a systematic study of how the two types of approaches compare, and of how different design choices affect the performance of PLM-based models, has been lacking. To fill this knowledge gap and facilitate a more informed use of PLMs for keyphrase extraction and keyphrase generation, we present an in-depth empirical study. Formulating keyphrase extraction as sequence labeling and keyphrase generation as sequence-to-sequence generation, we perform extensive experiments in three domains. After showing that PLMs achieve competitive high-resource performance and state-of-the-art low-resource performance, we investigate important design choices, including in-domain PLMs, PLMs with different pre-training objectives, using PLMs under a parameter budget, and different formulations for present keyphrases. Our results further show that (1) in-domain BERT-like PLMs can be used to build strong and data-efficient keyphrase generation models; (2) under a fixed parameter budget, prioritizing model depth over width and allocating more layers to the encoder leads to better encoder-decoder models; and (3) with the four in-domain PLMs we introduce, we achieve competitive performance in the news domain and state-of-the-art performance in the scientific domain.
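
As a concrete illustration of the sequence-to-sequence formulation above, the sketch below loads a generic pre-trained encoder-decoder with HuggingFace transformers and decodes a separator-delimited keyphrase sequence. The checkpoint, the ";" separator, and the decoding settings are illustrative assumptions, not the paper's exact setup; the model would first need fine-tuning on annotated keyphrase data.

```python
# Minimal sketch: keyphrase generation as seq2seq with a generic PLM.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "facebook/bart-base"  # assumed stand-in for a PLM encoder-decoder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

document = "Pre-trained language models are widely used for keyphrase generation ..."

# Assumption: the fine-tuned model emits all keyphrases as one
# separator-delimited sequence, e.g. "language models ; keyphrase generation".
inputs = tokenizer(document, return_tensors="pt", truncation=True, max_length=512)
output_ids = model.generate(**inputs, max_length=64, num_beams=4)
keyphrases = tokenizer.decode(output_ids[0], skip_special_tokens=True).split(";")
print([kp.strip() for kp in keyphrases])
```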

PLUE: Language Understanding Evaluation Benchmark for Privacy Policies in English

Dec 20, 2022
Jianfeng Chi, Wasi Uddin Ahmad, Yuan Tian, Kai-Wei Chang

Privacy policies provide individuals with information about their rights and how their personal information is handled. Natural language understanding (NLU) technologies can support individuals and practitioners in better understanding the privacy practices described in lengthy, complex documents. However, existing efforts that apply NLU technologies are limited to processing the language for a single task focused on certain privacy practices. To this end, we introduce the Privacy Policy Language Understanding Evaluation (PLUE) benchmark, a multi-task benchmark for evaluating privacy policy language understanding across various tasks. We also collect a large corpus of privacy policies to enable privacy-policy-specific language model pre-training. We demonstrate that domain-specific pre-training offers performance improvements across all tasks. We release the benchmark to encourage future research in this domain.
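
The domain-specific pre-training the benchmark evaluates can be approximated with continued masked-language-model training on a privacy-policy corpus. The sketch below is a minimal recipe using HuggingFace transformers and datasets; the corpus file privacy_policies.txt and all hyperparameters are assumptions for illustration, not the authors' recipe.

```python
# Minimal sketch: continued MLM pre-training on an in-domain corpus.
from transformers import (
    AutoTokenizer, AutoModelForMaskedLM,
    DataCollatorForLanguageModeling, Trainer, TrainingArguments,
)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Hypothetical text file with one privacy-policy passage per line.
corpus = load_dataset("text", data_files={"train": "privacy_policies.txt"})
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="privacy-bert", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()
```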

CoCoMIC: Code Completion By Jointly Modeling In-file and Cross-file Context

Dec 20, 2022
Yangruibo Ding, Zijian Wang, Wasi Uddin Ahmad, Murali Krishna Ramanathan, Ramesh Nallapati, Parminder Bhatia, Dan Roth, Bing Xiang

While pre-trained language models (LMs) for code have achieved great success in code completion, they generate code conditioned only on the contents of the current file, i.e., the in-file context, and ignore the rich semantics in other files of the same project, i.e., the cross-file context, a critical source of information that is especially useful in modern modular software development. This omission constrains code LMs' capacity for code completion and leads to unexpected behaviors such as hallucinated class member functions or function calls with incorrect arguments. In this work, we develop CCFINDER, a cross-file context finder tool that effectively locates and retrieves the most relevant cross-file context. We propose CoCoMIC, a framework that learns from in-file and cross-file context jointly on top of pre-trained code LMs. When cross-file context is provided, CoCoMIC improves the existing code LM with a 19.30% relative increase in exact match and a 15.41% relative increase in identifier matching for code completion.
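
To make the cross-file context idea concrete, here is a heavily simplified, hypothetical stand-in for what a finder like CCFINDER does: scan a project for the definitions a file imports and return their source as extra prompt context. The real tool builds a richer project-level context representation; nothing below is the authors' implementation.

```python
# Minimal sketch: collect cross-file context for symbols a file imports.
import ast
from pathlib import Path

def find_cross_file_context(target_file: str, project_root: str) -> str:
    """Return the source of top-level defs whose names the target file imports."""
    imported = set()
    for node in ast.walk(ast.parse(Path(target_file).read_text())):
        if isinstance(node, ast.ImportFrom):
            imported.update(alias.name for alias in node.names)

    snippets = []
    for path in Path(project_root).rglob("*.py"):
        if path.resolve() == Path(target_file).resolve():
            continue
        source = path.read_text()
        for node in ast.parse(source).body:
            if isinstance(node, (ast.FunctionDef, ast.ClassDef)) and node.name in imported:
                snippets.append(f"# from {path}\n{ast.get_source_segment(source, node)}")
    return "\n\n".join(snippets)

# The retrieved snippets are prepended to the in-file context before
# prompting the code LM, so completions can reference cross-file symbols.
```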

Multi-lingual Evaluation of Code Generation Models

Oct 26, 2022
Ben Athiwaratkun, Sanjay Krishna Gouda, Zijian Wang, Xiaopeng Li, Yuchen Tian, Ming Tan, Wasi Uddin Ahmad, Shiqi Wang, Qing Sun, Mingyue Shang, Sujan Kumar Gonugondla, Hantian Ding, Varun Kumar, Nathan Fulton, Arash Farahani, Siddhartha Jain, Robert Giaquinto, Haifeng Qian, Murali Krishna Ramanathan, Ramesh Nallapati, Baishakhi Ray, Parminder Bhatia, Sudipta Sengupta, Dan Roth, Bing Xiang

We present MBXP, an execution-based code completion benchmark in 10+ programming languages. This collection of datasets is generated by our conversion framework, which translates prompts and test cases from the original MBPP dataset into the corresponding data in a target language. Based on this benchmark, we can evaluate code generation models in a multi-lingual fashion and, in particular, discover the generalization ability of language models to out-of-domain languages, the advantages of large multi-lingual models over mono-lingual ones, the benefits of few-shot prompting, and zero-shot translation abilities. In addition, we use our code generation model to perform large-scale bootstrapping and obtain synthetic canonical solutions in several languages. These solutions can be used for other code-related evaluations, such as insertion-based completion, summarization, or code translation tasks, on which we demonstrate results and which we release as part of our benchmark.
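
Execution-based evaluation of the kind MBXP performs can be sketched as: assemble the prompt, the model completion, and the translated test cases into one program, run it, and count it correct only if the tests pass. The Python-only runner, file layout, and lack of sandboxing below are simplifying assumptions; a real harness isolates execution per language.

```python
# Minimal sketch: a completion passes if prompt + completion + tests runs cleanly.
import subprocess
import tempfile

def passes_tests(prompt: str, completion: str, tests: str, timeout: float = 10.0) -> bool:
    program = "\n".join([prompt, completion, tests])
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        result = subprocess.run(["python", path], capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0

# pass@1 over a list of (prompt, completion, tests) triples:
# sum(passes_tests(p, c, t) for p, c, t in samples) / len(samples)
```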

* Code and data release: https://github.com/amazon-research/mbxp-exec-eval 

ContraGen: Effective Contrastive Learning For Causal Language Model

Oct 03, 2022
Nihal Jain, Dejiao Zhang, Wasi Uddin Ahmad, Zijian Wang, Feng Nan, Xiaopeng Li, Ming Tan, Ramesh Nallapati, Baishakhi Ray, Parminder Bhatia, Xiaofei Ma, Bing Xiang

Despite exciting progress in large-scale language generation, the expressiveness of the learned representations is severely limited by the anisotropy issue, where hidden representations are distributed in a narrow cone in the vector space. To address this issue, we present ContraGen, a novel contrastive learning framework that improves the representations with better uniformity and discrimination. We assess ContraGen on a wide range of downstream tasks in natural and programming languages. We show that ContraGen effectively enhances both the uniformity and the discrimination of the representations and leads to the desired improvements on various language understanding tasks where discriminative representations are crucial for attaining good performance. Specifically, we attain a 44% relative improvement on Semantic Textual Similarity tasks and a 34% relative improvement on Code-to-Code Search tasks. Furthermore, by improving the expressiveness of the representations, ContraGen also boosts source code generation capability, with a 9% relative improvement in execution accuracy on the HumanEval benchmark.
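
A standard in-batch contrastive (InfoNCE) objective of the kind such frameworks add on top of the causal LM loss looks like the sketch below: it pulls two views of the same sequence together and pushes different sequences apart, directly targeting uniformity and discrimination. How ContraGen actually forms positive pairs and weights the two losses is not specified here; treat this as a generic illustration.

```python
# Minimal sketch: in-batch InfoNCE over sequence representations.
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.05) -> torch.Tensor:
    """z1, z2: [batch, dim] representations of two views of the same sequences."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau                           # pairwise cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)  # positives on the diagonal
    return F.cross_entropy(logits, labels)

# Combined objective (weighting is an assumption):
# total_loss = lm_loss + contrastive_weight * info_nce(z1, z2)
```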

* 10 pages 

FixEval: Execution-based Evaluation of Program Fixes for Competitive Programming Problems

Jun 15, 2022
Md Mahim Anjum Haque, Wasi Uddin Ahmad, Ismini Lourentzou, Chris Brown

Source code repositories consist of large codebases, often containing error-prone programs. The increasing complexity of software has led to a drastic rise in the time and cost of identifying and fixing these defects. Various methods exist to automatically generate fixes for buggy code, but due to the large combinatorial space of possible solutions for a particular bug, few tools and datasets are available to evaluate generated code effectively. In this work, we introduce FixEval, a benchmark comprising buggy code submissions to competitive programming problems and their respective fixes. We provide a rich test suite to evaluate and assess the correctness of model-generated program fixes. We consider two Transformer language models pretrained on programming languages as our baselines and compare them using match-based and execution-based evaluation metrics. Our experiments show that match-based metrics do not accurately reflect the quality of model-generated program fixes, whereas execution-based metrics evaluate a program against all the test cases and scenarios designed for its problem. We therefore believe FixEval provides a step toward real-world automatic bug fixing and model-generated code evaluation.
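
The contrast between the two metric families can be made concrete: exact match rewards only reproducing the reference fix verbatim, while an execution-based check accepts any candidate that passes the problem's tests. The stdin/stdout judge format below is an assumption for illustration, modeled on competitive programming, not FixEval's exact harness.

```python
# Minimal sketch: match-based vs. execution-based evaluation of a fix.
import subprocess
import tempfile

def exact_match(candidate_fix: str, reference_fix: str) -> bool:
    return candidate_fix.strip() == reference_fix.strip()

def passes_test_suite(candidate_fix: str, tests: list[tuple[str, str]]) -> bool:
    """tests: hypothetical (stdin, expected stdout) pairs for the problem."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_fix)
        path = f.name
    for stdin_data, expected in tests:
        try:
            out = subprocess.run(["python", path], input=stdin_data,
                                 capture_output=True, text=True, timeout=5)
        except subprocess.TimeoutExpired:
            return False
        if out.stdout.strip() != expected.strip():
            return False
    return True
```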

BanglaNLG: Benchmarks and Resources for Evaluating Low-Resource Natural Language Generation in Bangla

May 24, 2022
Abhik Bhattacharjee, Tahmid Hasan, Wasi Uddin Ahmad, Rifat Shahriyar

This work presents BanglaNLG, a comprehensive benchmark for evaluating natural language generation (NLG) models in Bangla, a widely spoken yet low-resource language in the web domain. We aggregate three challenging conditional text generation tasks under the BanglaNLG benchmark. Then, using a clean corpus of 27.5 GB of Bangla data, we pretrain BanglaT5, a sequence-to-sequence Transformer model for Bangla. BanglaT5 achieves state-of-the-art performance in all of these tasks, outperforming mT5 (base) by up to 5.4%. We are making the BanglaT5 language model and a leaderboard publicly available in the hope of advancing future research and evaluation on Bangla NLG. The resources can be found at https://github.com/csebuetnlp/BanglaNLG.
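
A minimal usage sketch with HuggingFace transformers follows. The hub id csebuetnlp/banglat5 matches the project's GitHub organization but is an assumption here; consult the linked repository for the released checkpoint and any required text normalization.

```python
# Minimal sketch: running a Bangla seq2seq model for conditional generation.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "csebuetnlp/banglat5"  # assumed hub id; see the BanglaNLG repository
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

inputs = tokenizer("<a Bangla source text>", return_tensors="pt")
output_ids = model.generate(**inputs, max_length=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```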

Summarize and Generate to Back-translate: Unsupervised Translation of Programming Languages

May 23, 2022
Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang

Back-translation is widely known for its effectiveness in neural machine translation when little or no parallel data is available. In this approach, a source-to-target model is coupled with a target-to-source model trained in parallel: the target-to-source model generates noisy sources, and the source-to-target model is trained to reconstruct the targets, and vice versa. Recent multilingual pre-trained sequence-to-sequence models for programming languages have proven very effective across a broad spectrum of downstream software engineering tasks, so it is compelling to train them to build programming language translation systems via back-translation. However, these models cannot be further trained via back-translation because, during pre-training, they learn to output sequences in the same language as their inputs. As an alternative, we propose performing back-translation via code summarization and generation. In code summarization, a model learns to generate natural language (NL) summaries given code snippets; in code generation, the model learns to do the opposite. Target-to-source generation in back-translation can therefore be viewed as target-to-NL-to-source generation. We show that our proposed approach performs competitively with state-of-the-art methods.
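
The proposed target-to-NL-to-source pipeline reduces to two seq2seq calls per monolingual example, as in the sketch below. Here, summarizer and generator stand for hypothetical fine-tuned code summarization and code generation models; the names and interface are illustrative, not the paper's code.

```python
# Minimal sketch: back-translation routed through natural language.
def back_translate(target_code: str, summarizer, generator) -> tuple[str, str]:
    """Turn target-language code into a pseudo-parallel (source, target) pair."""
    nl_summary = summarizer(target_code)   # code -> NL, via a summarization model
    noisy_source = generator(nl_summary)   # NL -> source-language code
    return noisy_source, target_code

# The source->target translator is then trained to reconstruct target_code
# from noisy_source, exactly as in standard back-translation:
# pairs = [back_translate(t, summarizer, generator) for t in monolingual_target_code]
```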

* Work in progress 

Retrieval Enhanced Data Augmentation for Question Answering on Privacy Policies

Apr 28, 2022
Md Rizwan Parvez, Jianfeng Chi, Wasi Uddin Ahmad, Yuan Tian, Kai-Wei Chang

Prior studies on privacy policies frame the question answering (QA) task as identifying the most relevant text segment or a list of sentences from a policy document for a user query. Annotating such a dataset is challenging, however, as it requires specific domain expertise (e.g., legal experts). Even with a small-scale dataset, a remaining bottleneck is that the labeled data are heavily imbalanced (only a few segments are relevant), limiting the gains in this domain. In this paper, we therefore develop a novel data augmentation framework based on an ensemble of retriever models that captures relevant text segments from unlabeled policy documents and expands the positive examples in the training set. In addition, to improve the diversity and quality of the augmented data, we leverage multiple pre-trained language models (LMs) and cascade them with noise reduction oracles. Using our augmented data on the PrivacyQA benchmark, we elevate the existing baseline by a large margin (10% F1) and achieve a new state-of-the-art F1 score of 50%. Our ablation studies provide further insights into the effectiveness of our approach.
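
The ensembling idea can be sketched as a voting scheme: several retrievers each nominate top-k segments from unlabeled policies, and only segments that enough retrievers agree on become new positive examples. The TF-IDF retriever and the vote threshold below stand in for the paper's pre-trained LM retrievers and noise reduction oracles, purely for illustration.

```python
# Minimal sketch: retriever-ensemble voting to mine new positive examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def augment_positives(question, segments, retrievers, min_votes=2, k=5):
    """Keep segments nominated by at least min_votes retrievers."""
    votes = {}
    for retrieve in retrievers:                  # each returns top-k indices
        for idx in retrieve(question, segments, k):
            votes[idx] = votes.get(idx, 0) + 1
    return [segments[i] for i, v in votes.items() if v >= min_votes]

def tfidf_retriever(question, segments, k):
    vec = TfidfVectorizer().fit(segments + [question])
    sims = cosine_similarity(vec.transform([question]), vec.transform(segments))[0]
    return sims.argsort()[::-1][:k].tolist()

# augmented = augment_positives(q, unlabeled_segments, [tfidf_retriever, ...])
```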
