Alert button
Picture for Mingyue Shang

Mingyue Shang

Alert button

Few-Shot Data-to-Text Generation via Unified Representation and Multi-Source Learning

Aug 10, 2023
Alexander Hanbo Li, Mingyue Shang, Evangelia Spiliopoulou, Jie Ma, Patrick Ng, Zhiguo Wang, Bonan Min, William Wang, Kathleen McKeown, Vittorio Castelli, Dan Roth, Bing Xiang

Figure 1 for Few-Shot Data-to-Text Generation via Unified Representation and Multi-Source Learning
Figure 2 for Few-Shot Data-to-Text Generation via Unified Representation and Multi-Source Learning
Figure 3 for Few-Shot Data-to-Text Generation via Unified Representation and Multi-Source Learning
Figure 4 for Few-Shot Data-to-Text Generation via Unified Representation and Multi-Source Learning

We present a novel approach for structured data-to-text generation that addresses the limitations of existing methods that primarily focus on specific types of structured data. Our proposed method aims to improve performance in multi-task training, zero-shot and few-shot scenarios by providing a unified representation that can handle various forms of structured data such as tables, knowledge graph triples, and meaning representations. We demonstrate that our proposed approach can effectively adapt to new structured forms, and can improve performance in comparison to current methods. For example, our method resulted in a 66% improvement in zero-shot BLEU scores when transferring models trained on table inputs to a knowledge graph dataset. Our proposed method is an important step towards a more general data-to-text generation framework.

Viaarxiv icon

Greener yet Powerful: Taming Large Code Generation Models with Quantization

Mar 09, 2023
Xiaokai Wei, Sujan Gonugondla, Wasi Ahmad, Shiqi Wang, Baishakhi Ray, Haifeng Qian, Xiaopeng Li, Varun Kumar, Zijian Wang, Yuchen Tian, Qing Sun, Ben Athiwaratkun, Mingyue Shang, Murali Krishna Ramanathan, Parminder Bhatia, Bing Xiang

Figure 1 for Greener yet Powerful: Taming Large Code Generation Models with Quantization
Figure 2 for Greener yet Powerful: Taming Large Code Generation Models with Quantization
Figure 3 for Greener yet Powerful: Taming Large Code Generation Models with Quantization
Figure 4 for Greener yet Powerful: Taming Large Code Generation Models with Quantization

ML-powered code generation aims to assist developers to write code in a more productive manner, by intelligently generating code blocks based on natural language prompts. Recently, large pretrained deep learning models have substantially pushed the boundary of code generation and achieved impressive performance. Despite their great power, the huge number of model parameters poses a significant threat to adapting them in a regular software development environment, where a developer might use a standard laptop or mid-size server to develop her code. Such large models incur significant resource usage (in terms of memory, latency, and dollars) as well as carbon footprint. Model compression is a promising approach to address these challenges. Several techniques are proposed to compress large pretrained models typically used for vision or textual data. Out of many available compression techniques, we identified that quantization is mostly applicable for code generation task as it does not require significant retraining cost. As quantization represents model parameters with lower-bit integer (e.g., int8), the model size and runtime latency would both benefit from such int representation. We extensively study the impact of quantized model on code generation tasks across different dimension: (i) resource usage and carbon footprint, (ii) accuracy, and (iii) robustness. To this end, through systematic experiments we find a recipe of quantization technique that could run even a $6$B model in a regular laptop without significant accuracy or robustness degradation. We further found the recipe is readily applicable to code summarization task as well.

* 10 pages, 7 figures, 10 tables 
Viaarxiv icon

ReCode: Robustness Evaluation of Code Generation Models

Dec 20, 2022
Shiqi Wang, Zheng Li, Haifeng Qian, Chenghao Yang, Zijian Wang, Mingyue Shang, Varun Kumar, Samson Tan, Baishakhi Ray, Parminder Bhatia, Ramesh Nallapati, Murali Krishna Ramanathan, Dan Roth, Bing Xiang

Figure 1 for ReCode: Robustness Evaluation of Code Generation Models
Figure 2 for ReCode: Robustness Evaluation of Code Generation Models
Figure 3 for ReCode: Robustness Evaluation of Code Generation Models
Figure 4 for ReCode: Robustness Evaluation of Code Generation Models

Code generation models have achieved impressive performance. However, they tend to be brittle as slight edits to a prompt could lead to very different generations; these robustness properties, critical for user experience when deployed in real-life applications, are not well understood. Most existing works on robustness in text or code tasks have focused on classification, while robustness in generation tasks is an uncharted area and to date there is no comprehensive benchmark for robustness in code generation. In this paper, we propose ReCode, a comprehensive robustness evaluation benchmark for code generation models. We customize over 30 transformations specifically for code on docstrings, function and variable names, code syntax, and code format. They are carefully designed to be natural in real-life coding practice, preserve the original semantic meaning, and thus provide multifaceted assessments of a model's robustness performance. With human annotators, we verified that over 90% of the perturbed prompts do not alter the semantic meaning of the original prompt. In addition, we define robustness metrics for code generation models considering the worst-case behavior under each type of perturbation, taking advantage of the fact that executing the generated code can serve as objective evaluation. We demonstrate ReCode on SOTA models using HumanEval, MBPP, as well as function completion tasks derived from them. Interesting observations include: better robustness for CodeGen over InCoder and GPT-J; models are most sensitive to syntax perturbations; more challenging robustness evaluation on MBPP over HumanEval.

* Code and data available at https://github.com/amazon-science/recode 
Viaarxiv icon

Multi-lingual Evaluation of Code Generation Models

Oct 26, 2022
Ben Athiwaratkun, Sanjay Krishna Gouda, Zijian Wang, Xiaopeng Li, Yuchen Tian, Ming Tan, Wasi Uddin Ahmad, Shiqi Wang, Qing Sun, Mingyue Shang, Sujan Kumar Gonugondla, Hantian Ding, Varun Kumar, Nathan Fulton, Arash Farahani, Siddhartha Jain, Robert Giaquinto, Haifeng Qian, Murali Krishna Ramanathan, Ramesh Nallapati, Baishakhi Ray, Parminder Bhatia, Sudipta Sengupta, Dan Roth, Bing Xiang

Figure 1 for Multi-lingual Evaluation of Code Generation Models
Figure 2 for Multi-lingual Evaluation of Code Generation Models
Figure 3 for Multi-lingual Evaluation of Code Generation Models
Figure 4 for Multi-lingual Evaluation of Code Generation Models

We present MBXP, an execution-based code completion benchmark in 10+ programming languages. This collection of datasets is generated by our conversion framework that translates prompts and test cases from the original MBPP dataset to the corresponding data in a target language. Based on this benchmark, we are able to evaluate code generation models in a multi-lingual fashion, and in particular discover generalization ability of language models on out-of-domain languages, advantages of large multi-lingual models over mono-lingual, benefits of few-shot prompting, and zero-shot translation abilities. In addition, we use our code generation model to perform large-scale bootstrapping to obtain synthetic canonical solutions in several languages. These solutions can be used for other code-related evaluations such as insertion-based, summarization, or code translation tasks where we demonstrate results and release as part of our benchmark.

* Code and data release: https://github.com/amazon-research/mbxp-exec-eval 
Viaarxiv icon

Ape210K: A Large-Scale and Template-Rich Dataset of Math Word Problems

Oct 09, 2020
Wei Zhao, Mingyue Shang, Yang Liu, Liang Wang, Jingming Liu

Figure 1 for Ape210K: A Large-Scale and Template-Rich Dataset of Math Word Problems
Figure 2 for Ape210K: A Large-Scale and Template-Rich Dataset of Math Word Problems
Figure 3 for Ape210K: A Large-Scale and Template-Rich Dataset of Math Word Problems
Figure 4 for Ape210K: A Large-Scale and Template-Rich Dataset of Math Word Problems

Automatic math word problem solving has attracted growing attention in recent years. The evaluation datasets used by previous works have serious limitations in terms of scale and diversity. In this paper, we release a new large-scale and template-rich math word problem dataset named Ape210K. It consists of 210K Chinese elementary school-level math problems, which is 9 times the size of the largest public dataset Math23K. Each problem contains both the gold answer and the equations needed to derive the answer. Ape210K is also of greater diversity with 56K templates, which is 25 times more than Math23K. Our analysis shows that solving Ape210K requires not only natural language understanding but also commonsense knowledge. We expect Ape210K to be a benchmark for math word problem solving systems. Experiments indicate that state-of-the-art models on the Math23K dataset perform poorly on Ape210K. We propose a copy-augmented and feature-enriched sequence to sequence (seq2seq) model, which outperforms existing models by 3.2% on the Math23K dataset and serves as a strong baseline of the Ape210K dataset. The gap is still significant between human and our baseline model, calling for further research efforts. We make Ape210K dataset publicly available at https://github.com/yuantiku/ape210k

* We decide to withdraw this paper, since the proposed Ape210K dataset is not going public, the experiments in this paper is meaningless and irreproducible without access to the dataset. Please contact wangliang01@fenbi.com if you have any questions 
Viaarxiv icon

Semi-supervised Text Style Transfer: Cross Projection in Latent Space

Sep 25, 2019
Mingyue Shang, Piji Li, Zhenxin Fu, Lidong Bing, Dongyan Zhao, Shuming Shi, Rui Yan

Figure 1 for Semi-supervised Text Style Transfer: Cross Projection in Latent Space
Figure 2 for Semi-supervised Text Style Transfer: Cross Projection in Latent Space
Figure 3 for Semi-supervised Text Style Transfer: Cross Projection in Latent Space
Figure 4 for Semi-supervised Text Style Transfer: Cross Projection in Latent Space

Text style transfer task requires the model to transfer a sentence of one style to another style while retaining its original content meaning, which is a challenging problem that has long suffered from the shortage of parallel data. In this paper, we first propose a semi-supervised text style transfer model that combines the small-scale parallel data with the large-scale nonparallel data. With these two types of training data, we introduce a projection function between the latent space of different styles and design two constraints to train it. We also introduce two other simple but effective semi-supervised methods to compare with. To evaluate the performance of the proposed methods, we build and release a novel style transfer dataset that alters sentences between the style of ancient Chinese poem and the modern Chinese.

* EMNLP 2019 
Viaarxiv icon

Find a Reasonable Ending for Stories: Does Logic Relation Help the Story Cloze Test?

Dec 13, 2018
Mingyue Shang, Zhenxin Fu, Hongzhi Yin, Bo Tang, Dongyan Zhao, Rui Yan

Figure 1 for Find a Reasonable Ending for Stories: Does Logic Relation Help the Story Cloze Test?
Figure 2 for Find a Reasonable Ending for Stories: Does Logic Relation Help the Story Cloze Test?
Figure 3 for Find a Reasonable Ending for Stories: Does Logic Relation Help the Story Cloze Test?

Natural language understanding is a challenging problem that covers a wide range of tasks. While previous methods generally train each task separately, we consider combining the cross-task features to enhance the task performance. In this paper, we incorporate the logic information with the help of the Natural Language Inference (NLI) task to the Story Cloze Test (SCT). Previous work on SCT considered various semantic information, such as sentiment and topic, but lack the logic information between sentences which is an essential element of stories. Thus we propose to extract the logic information during the course of the story to improve the understanding of the whole story. The logic information is modeled with the help of the NLI task. Experimental results prove the strength of the logic information.

* Student Abstract in AAAI-2019 
Viaarxiv icon

One "Ruler" for All Languages: Multi-Lingual Dialogue Evaluation with Adversarial Multi-Task Learning

May 08, 2018
Xiaowei Tong, Zhenxin Fu, Mingyue Shang, Dongyan Zhao, Rui Yan

Figure 1 for One "Ruler" for All Languages: Multi-Lingual Dialogue Evaluation with Adversarial Multi-Task Learning
Figure 2 for One "Ruler" for All Languages: Multi-Lingual Dialogue Evaluation with Adversarial Multi-Task Learning
Figure 3 for One "Ruler" for All Languages: Multi-Lingual Dialogue Evaluation with Adversarial Multi-Task Learning
Figure 4 for One "Ruler" for All Languages: Multi-Lingual Dialogue Evaluation with Adversarial Multi-Task Learning

Automatic evaluating the performance of Open-domain dialogue system is a challenging problem. Recent work in neural network-based metrics has shown promising opportunities for automatic dialogue evaluation. However, existing methods mainly focus on monolingual evaluation, in which the trained metric is not flexible enough to transfer across different languages. To address this issue, we propose an adversarial multi-task neural metric (ADVMT) for multi-lingual dialogue evaluation, with shared feature extraction across languages. We evaluate the proposed model in two different languages. Experiments show that the adversarial multi-task neural metric achieves a high correlation with human annotation, which yields better performance than monolingual ones and various existing metrics.

* To appear in IJCAI 2018 
Viaarxiv icon