Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Nanyun Peng

Shammie

On Compositionality and Improved Training of NADO

Jun 20, 2023

Sidi Lu, Wenbo Zhao, Chenyang Tao, Arpit Gupta, Shanchan Wu, Tagyoung Chung, Nanyun Peng

Abstract:NeurAlly-Decomposed Oracle (NADO) is a powerful approach for controllable generation with large language models. Differentiating from finetuning/prompt tuning, it has the potential to avoid catastrophic forgetting of the large base model and achieve guaranteed convergence to an entropy-maximized closed-form solution without significantly limiting the model capacity. Despite its success, several challenges arise when applying NADO to more complex scenarios. First, the best practice of using NADO for the composition of multiple control signals is under-explored. Second, vanilla NADO suffers from gradient vanishing for low-probability control signals and is highly reliant on the forward-consistency regularization. In this paper, we study the aforementioned challenges when using NADO theoretically and empirically. We show we can achieve guaranteed compositional generalization of NADO with a certain practice, and propose a novel alternative parameterization of NADO to perfectly guarantee the forward-consistency. We evaluate the improved training of NADO, i.e. NADO++, on CommonGen. Results show that NADO++ improves the effectiveness of the algorithm in multiple aspects.

Via

Access Paper or Ask Questions

Open-Domain Text Evaluation via Meta Distribution Modeling

Jun 20, 2023

Sidi Lu, Asli Celikyilmaz, Tianlu Wang, Nanyun Peng

Figure 1 for Open-Domain Text Evaluation via Meta Distribution Modeling

Figure 2 for Open-Domain Text Evaluation via Meta Distribution Modeling

Figure 3 for Open-Domain Text Evaluation via Meta Distribution Modeling

Figure 4 for Open-Domain Text Evaluation via Meta Distribution Modeling

Abstract:Recent advances in open-domain text generation models powered by large pre-trained language models (LLMs) have achieved remarkable performance. However, evaluating and controlling these models for desired attributes remains a challenge, as traditional reference-based metrics such as BLEU, ROUGE, and METEOR are insufficient for open-ended generation tasks. Similarly, while trainable discriminator-based evaluation metrics show promise, obtaining high-quality training data is a non-trivial task. In this paper, we introduce a novel approach to evaluate open-domain generation - the Meta-Distribution Methods (MDM). Drawing on the correlation between the rising parameter counts and the improving performance of LLMs, MDM creates a mapping from the contrast of two probabilistic distributions -- one known to be superior to the other -- to quality measures, which can be viewed as a distribution of distributions i.e. Meta-Distribution. We investigate MDM for open-domain text generation evaluation under two paradigms: 1) \emph{Generative} MDM, which leverages the Meta-Distribution Methods to generate in-domain negative samples for training discriminator-based metrics; 2) \emph{Discriminative} MDM, which directly uses distribution discrepancies between two language models for evaluation. Our experiments on multi-turn dialogue and factuality in abstractive summarization demonstrate that MDMs correlate better with human judgment than existing automatic evaluation metrics on both tasks, highlighting the strong performance and generalizability of such methods.

Via

Access Paper or Ask Questions

Unsupervised Melody-to-Lyric Generation

May 30, 2023

Yufei Tian, Anjali Narayan-Chen, Shereen Oraby, Alessandra Cervone, Gunnar Sigurdsson, Chenyang Tao, Wenbo Zhao, Tagyoung Chung, Jing Huang, Nanyun Peng

Figure 1 for Unsupervised Melody-to-Lyric Generation

Figure 2 for Unsupervised Melody-to-Lyric Generation

Figure 3 for Unsupervised Melody-to-Lyric Generation

Figure 4 for Unsupervised Melody-to-Lyric Generation

Abstract:Automatic melody-to-lyric generation is a task in which song lyrics are generated to go with a given melody. It is of significant practical interest and more challenging than unconstrained lyric generation as the music imposes additional constraints onto the lyrics. The training data is limited as most songs are copyrighted, resulting in models that underfit the complicated cross-modal relationship between melody and lyrics. In this work, we propose a method for generating high-quality lyrics without training on any aligned melody-lyric data. Specifically, we design a hierarchical lyric generation framework that first generates a song outline and second the complete lyrics. The framework enables disentanglement of training (based purely on text) from inference (melody-guided text generation) to circumvent the shortage of parallel data. We leverage the segmentation and rhythm alignment between melody and lyrics to compile the given melody into decoding constraints as guidance during inference. The two-step hierarchical design also enables content control via the lyric outline, a much-desired feature for democratizing collaborative song creation. Experimental results show that our model can generate high-quality lyrics that are more on-topic, singable, intelligible, and coherent than strong baselines, for example SongMASS, a SOTA model trained on a parallel dataset, with a 24% relative overall quality improvement based on human ratings. O

* Accepted to ACL 23. arXiv admin note: substantial text overlap with arXiv:2305.07760

Via

Access Paper or Ask Questions

Are Fairy Tales Fair? Analyzing Gender Bias in Temporal Narrative Event Chains of Children's Fairy Tales

May 26, 2023

Paulina Toro Isaza, Guangxuan Xu, Akintoye Oloko, Yufang Hou, Nanyun Peng, Dakuo Wang

Abstract:Social biases and stereotypes are embedded in our culture in part through their presence in our stories, as evidenced by the rich history of humanities and social science literature analyzing such biases in children stories. Because these analyses are often conducted manually and at a small scale, such investigations can benefit from the use of more recent natural language processing methods that examine social bias in models and data corpora. Our work joins this interdisciplinary effort and makes a unique contribution by taking into account the event narrative structures when analyzing the social bias of stories. We propose a computational pipeline that automatically extracts a story's temporal narrative verb-based event chain for each of its characters as well as character attributes such as gender. We also present a verb-based event annotation scheme that can facilitate bias analysis by including categories such as those that align with traditional stereotypes. Through a case study analyzing gender bias in fairy tales, we demonstrate that our framework can reveal bias in not only the unigram verb-based events in which female and male characters participate but also in the temporal narrative order of such event participation.

* acl 23

Via

Access Paper or Ask Questions

AMPERE: AMR-Aware Prefix for Generation-Based Event Argument Extraction Model

May 26, 2023

I-Hung Hsu, Zhiyu Xie, Kuan-Hao Huang, Prem Natarajan, Nanyun Peng

Figure 1 for AMPERE: AMR-Aware Prefix for Generation-Based Event Argument Extraction Model

Figure 2 for AMPERE: AMR-Aware Prefix for Generation-Based Event Argument Extraction Model

Figure 3 for AMPERE: AMR-Aware Prefix for Generation-Based Event Argument Extraction Model

Figure 4 for AMPERE: AMR-Aware Prefix for Generation-Based Event Argument Extraction Model

Abstract:Event argument extraction (EAE) identifies event arguments and their specific roles for a given event. Recent advancement in generation-based EAE models has shown great performance and generalizability over classification-based models. However, existing generation-based EAE models mostly focus on problem re-formulation and prompt design, without incorporating additional information that has been shown to be effective for classification-based models, such as the abstract meaning representation (AMR) of the input passages. Incorporating such information into generation-based models is challenging due to the heterogeneous nature of the natural language form prevalently used in generation-based models and the structured form of AMRs. In this work, we study strategies to incorporate AMR into generation-based EAE models. We propose AMPERE, which generates AMR-aware prefixes for every layer of the generation model. Thus, the prefix introduces AMR information to the generation-based EAE model and then improves the generation. We also introduce an adjusted copy mechanism to AMPERE to help overcome potential noises brought by the AMR graph. Comprehensive experiments and analyses on ACE2005 and ERE datasets show that AMPERE can get 4% - 10% absolute F1 score improvements with reduced training data and it is in general powerful across different training sizes.

* Paper accepted by ACL2023 as a main conference paper. The first two authors contribute equally. Code can be publicly accessible at https://github.com/PlusLabNLP/AMPERE

Via

Access Paper or Ask Questions

Unsupervised Melody-Guided Lyrics Generation

May 26, 2023

Yufei Tian, Anjali Narayan-Chen, Shereen Oraby, Alessandra Cervone, Gunnar Sigurdsson, Chenyang Tao, Wenbo Zhao, Tagyoung Chung, Jing Huang, Nanyun Peng

Figure 1 for Unsupervised Melody-Guided Lyrics Generation

Figure 2 for Unsupervised Melody-Guided Lyrics Generation

Figure 3 for Unsupervised Melody-Guided Lyrics Generation

Figure 4 for Unsupervised Melody-Guided Lyrics Generation

Abstract:Automatic song writing is a topic of significant practical interest. However, its research is largely hindered by the lack of training data due to copyright concerns and challenged by its creative nature. Most noticeably, prior works often fall short of modeling the cross-modal correlation between melody and lyrics due to limited parallel data, hence generating lyrics that are less singable. Existing works also lack effective mechanisms for content control, a much desired feature for democratizing song creation for people with limited music background. In this work, we propose to generate pleasantly listenable lyrics without training on melody-lyric aligned data. Instead, we design a hierarchical lyric generation framework that disentangles training (based purely on text) from inference (melody-guided text generation). At inference time, we leverage the crucial alignments between melody and lyrics and compile the given melody into constraints to guide the generation process. Evaluation results show that our model can generate high-quality lyrics that are more singable, intelligible, coherent, and in rhyme than strong baselines including those supervised on parallel data.

* Presented at AAAI23 CreativeAI workshop (Non-Archival). A later version is accepted to ACL23

Via

Access Paper or Ask Questions

Code-Switched Text Synthesis in Unseen Language Pairs

May 26, 2023

I-Hung Hsu, Avik Ray, Shubham Garg, Nanyun Peng, Jing Huang

Abstract:Existing efforts on text synthesis for code-switching mostly require training on code-switched texts in the target language pairs, limiting the deployment of the models to cases lacking code-switched data. In this work, we study the problem of synthesizing code-switched texts for language pairs absent from the training data. We introduce GLOSS, a model built on top of a pre-trained multilingual machine translation model (PMMTM) with an additional code-switching module. This module, either an adapter or extra prefixes, learns code-switching patterns from code-switched data during training, while the primary component of GLOSS, i.e., the PMMTM, is frozen. The design of only adjusting the code-switching module prevents our model from overfitting to the constrained training data for code-switching. Hence, GLOSS exhibits the ability to generalize and synthesize code-switched texts across a broader spectrum of language pairs. Additionally, we develop a self-training algorithm on target language pairs further to enhance the reliability of GLOSS. Automatic evaluations on four language pairs show that GLOSS achieves at least 55% relative BLEU and METEOR scores improvements compared to strong baselines. Human evaluations on two language pairs further validate the success of GLOSS.

* Paper accepted by ACL2023 as a Finding paper

Via

Access Paper or Ask Questions

Do Models Really Learn to Follow Instructions? An Empirical Study of Instruction Tuning

May 25, 2023

Po-Nien Kung, Nanyun Peng

Figure 1 for Do Models Really Learn to Follow Instructions? An Empirical Study of Instruction Tuning

Figure 2 for Do Models Really Learn to Follow Instructions? An Empirical Study of Instruction Tuning

Figure 3 for Do Models Really Learn to Follow Instructions? An Empirical Study of Instruction Tuning

Figure 4 for Do Models Really Learn to Follow Instructions? An Empirical Study of Instruction Tuning

Abstract:Recent works on instruction tuning (IT) have achieved great performance with zero-shot generalizability to unseen tasks. With additional context (e.g., task definition, examples) provided to models for fine-tuning, they achieved much higher performance than untuned models. Despite impressive performance gains, what models learn from IT remains understudied. In this work, we analyze how models utilize instructions during IT by comparing model training with altered vs. original instructions. Specifically, we create simplified task definitions by removing all semantic components and only leaving the output space information, and delusive examples that contain incorrect input-output mapping. Our experiments show that models trained on simplified task definition or delusive examples can achieve comparable performance to the ones trained on the original instructions and examples. Furthermore, we introduce a random baseline to perform zeroshot classification tasks, and find it achieves similar performance (42.6% exact-match) as IT does (43% exact-match) in low resource setting, while both methods outperform naive T5 significantly (30% per exact-match). Our analysis provides evidence that the impressive performance gain of current IT models can come from picking up superficial patterns, such as learning the output format and guessing. Our study highlights the urgent need for more reliable IT methods and evaluation.

* Proceedings of the 61th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Via

Access Paper or Ask Questions

Gender Biases in Automatic Evaluation Metrics: A Case Study on Image Captioning

May 24, 2023

Haoyi Qiu, Zi-Yi Dou, Tianlu Wang, Asli Celikyilmaz, Nanyun Peng

Figure 1 for Gender Biases in Automatic Evaluation Metrics: A Case Study on Image Captioning

Figure 2 for Gender Biases in Automatic Evaluation Metrics: A Case Study on Image Captioning

Figure 3 for Gender Biases in Automatic Evaluation Metrics: A Case Study on Image Captioning

Figure 4 for Gender Biases in Automatic Evaluation Metrics: A Case Study on Image Captioning

Abstract:Pretrained model-based evaluation metrics have demonstrated strong performance with high correlations with human judgments in various natural language generation tasks such as image captioning. Despite the impressive results, their impact on fairness is under-explored -- it is widely acknowledged that pretrained models can encode societal biases, and utilizing them for evaluation purposes may inadvertently manifest and potentially amplify biases. In this paper, we conduct a systematic study in gender biases of model-based evaluation metrics with a focus on image captioning tasks. Specifically, we first identify and quantify gender biases in different evaluation metrics regarding profession, activity, and object concepts. Then, we demonstrate the negative consequences of using these biased metrics, such as favoring biased generation models in deployment and propagating the biases to generation models through reinforcement learning. We also present a simple but effective alternative to reduce gender biases by combining n-gram matching-based and pretrained model-based evaluation metrics.

Via

Access Paper or Ask Questions

STAR: Boosting Low-Resource Event Extraction by Structure-to-Text Data Generation with Large Language Models

May 24, 2023

Mingyu Derek Ma, Xiaoxuan Wang, Po-Nien Kung, P. Jeffrey Brantingham, Nanyun Peng, Wei Wang

Figure 1 for STAR: Boosting Low-Resource Event Extraction by Structure-to-Text Data Generation with Large Language Models

Figure 2 for STAR: Boosting Low-Resource Event Extraction by Structure-to-Text Data Generation with Large Language Models

Figure 3 for STAR: Boosting Low-Resource Event Extraction by Structure-to-Text Data Generation with Large Language Models

Figure 4 for STAR: Boosting Low-Resource Event Extraction by Structure-to-Text Data Generation with Large Language Models

Abstract:Structure prediction tasks such as event extraction require an in-depth understanding of the output structure and sub-task dependencies, thus they still heavily rely on task-specific training data to obtain reasonable performance. Due to the high cost of human annotation, low-resource event extraction, which requires minimal human cost, is urgently needed in real-world information extraction applications. We propose to synthesize data instances given limited seed demonstrations to boost low-resource event extraction performance. We propose STAR, a structure-to-text data generation method that first generates complicated event structures (Y) and then generates input passages (X), all with Large Language Models. We design fine-grained step-by-step instructions and the error cases and quality issues identified through self-reflection can be self-refined. Our experiments indicate that data generated by STAR can significantly improve the low-resource event extraction performance and they are even more effective than human-curated data points in some cases.

Via

Access Paper or Ask Questions