Yufei Tian

Rethinking Creativity Evaluation: A Critical Analysis of Existing Creativity Evaluations

Aug 07, 2025

Li-Chun Lu, Miri Liu, Pin-Chun Lu, Yufei Tian, Shao-Hua Sun, Nanyun Peng

Abstract: We systematically examine, analyze, and compare representative creativity measures--creativity index, perplexity, syntactic templates, and LLM-as-a-Judge--across diverse creative domains, including creative writing, unconventional problem-solving, and research ideation. Our analyses reveal that these metrics exhibit limited consistency, capturing different dimensions of creativity. We highlight key limitations, including the creativity index's focus on lexical diversity, perplexity's sensitivity to model confidence, and syntactic templates' inability to capture conceptual creativity. Additionally, LLM-as-a-Judge shows instability and bias. Our findings underscore the need for more robust, generalizable evaluation frameworks that better align with human judgments of creativity.

* 15 pages, 6 figures
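
As a rough illustration of the perplexity measure named in the abstract above, the sketch below computes perplexity from per-token log-probabilities. It is not the paper's evaluation code: the function name `perplexity` and the sample probabilities are hypothetical, chosen only to show why the score tracks how confident a model is in each token.

```python
import math


def perplexity(token_logprobs):
    """Return exp of the mean negative log-likelihood (natural log) per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)


# Hypothetical per-token probabilities, not taken from any real model.
confident = [math.log(0.9)] * 4   # model assigns high probability to each token
uncertain = [math.log(0.1)] * 4   # model assigns low probability to each token

low_ppl = perplexity(confident)
high_ppl = perplexity(uncertain)
```

Here `high_ppl` is larger than `low_ppl` for text of the same length, which is the sense in which the score depends on model confidence rather than on the text alone.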

Detecting Machine-Generated Long-Form Content with Latent-Space Variables

Oct 04, 2024

Yufei Tian, Zeyu Pan, Nanyun Peng

Abstract: The increasing capability of large language models (LLMs) to generate fluent long-form texts is presenting new challenges in distinguishing machine-generated outputs from human-written ones, which is crucial for ensuring authenticity and trustworthiness of expressions. Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts, including different prompting and decoding strategies, and adversarial attacks. We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts by training a latent-space model on sequences of events or topics derived from human-written texts. In three different domains, machine-generated texts, which are originally inseparable from human texts on the token level, can be better distinguished with our latent-space model, leading to a 31% improvement over strong baselines such as DetectGPT. Our analysis further reveals that, unlike humans, modern LLMs like GPT-4 generate event triggers and their transitions differently, an inherent disparity that helps our method to robustly detect machine-generated texts.

REFFLY: Melody-Constrained Lyrics Editing Model

Aug 30, 2024

Songyan Zhao, Bingxuan Li, Yufei Tian, Nanyun Peng

Abstract: Automatic melody-to-lyric generation aims to produce lyrics that align with a given melody. Although previous methods can generate lyrics based on high-level control signals, such as keywords or genre, they often struggle with three challenges: (1) lack of controllability, as prior works are only able to produce lyrics from scratch, with little or no control over the content; (2) inability to generate fully structured songs with the desired format; and (3) failure to align prominent words in the lyrics with prominent notes in the melody, resulting in poor lyrics-melody alignment. In this work, we introduce REFFLY (REvision Framework For Lyrics), the first revision framework designed to edit plain-text drafts of arbitrary form into high-quality, full-fledged song lyrics. Our approach ensures that the generated lyrics retain the original meaning of the draft, align with the melody, and adhere to the desired song structures. We demonstrate that REFFLY performs well in diverse task settings, such as lyrics revision and song translation. Experimental results show that our model outperforms strong baselines, such as Lyra (Tian et al. 2023) and GPT-4, by 25% in both musicality and text quality.

Are Large Language Models Capable of Generating Human-Level Narratives?

Jul 18, 2024

Yufei Tian, Tenghao Huang, Miri Liu, Derek Jiang, Alexander Spangher, Muhao Chen, Jonathan May, Nanyun Peng

Abstract: This paper investigates the capability of LLMs in storytelling, focusing on narrative development and plot progression. We introduce a novel computational framework to analyze narratives through three discourse-level aspects: i) story arcs, ii) turning points, and iii) affective dimensions, including arousal and valence. By leveraging expert and automatic annotations, we uncover significant discrepancies between LLM- and human-written stories. While human-written stories are suspenseful, arousing, and diverse in narrative structures, LLM stories are homogeneously positive and lack tension. Next, we measure narrative reasoning skills as a precursor to generative capacities, concluding that most LLMs fall short of human abilities in discourse understanding. Finally, we show that explicit integration of the aforementioned discourse features can enhance storytelling, as demonstrated by an improvement of over 40% in neural storytelling in terms of diversity, suspense, and arousal.

MacGyver: Are Large Language Models Creative Problem Solvers?

Nov 16, 2023

Yufei Tian, Abhilasha Ravichander, Lianhui Qin, Ronan Le Bras, Raja Marjieh, Nanyun Peng, Yejin Choi, Thomas L. Griffiths, Faeze Brahman

Abstract: We explore the creative problem-solving capabilities of modern large language models (LLMs) in a constrained setting. The setting requires circumventing a cognitive bias known in psychology as "functional fixedness" to use familiar objects in innovative or unconventional ways. To this end, we create MacGyver, an automatically generated dataset consisting of 1,600 real-world problems that deliberately trigger functional fixedness and require thinking "out of the box". We then present our collection of problems to both LLMs and humans to compare and contrast their problem-solving abilities. We show that MacGyver is challenging for both groups, but in unique and complementary ways. For example, humans typically excel in solving problems that they are familiar with but may struggle with tasks requiring domain-specific knowledge, leading to a higher variance. On the other hand, LLMs, being exposed to a variety of highly specialized knowledge, attempt broader problems but are prone to overconfidence and propose actions that are physically infeasible or inefficient. We also provide a detailed error analysis of LLMs, and demonstrate the potential of enhancing their problem-solving ability with novel prompting techniques such as iterative step-wise reflection and divergent-convergent thinking. This work provides insight into the creative problem-solving capabilities of humans and AI and illustrates how psychological paradigms can be extended into large-scale tasks for comparing humans and machines.

BOOST: Harnessing Black-Box Control to Boost Commonsense in LMs' Generation

Oct 25, 2023

Yufei Tian, Felix Zhang, Nanyun Peng

Abstract: Large language models (LLMs) such as GPT-3 have demonstrated a strong capability to generate coherent and contextually relevant text. However, amidst their successes, a crucial issue persists: their generated outputs still lack commonsense at times. Moreover, fine-tuning the entire LLM towards more commonsensical outputs is computationally expensive if not infeasible. In this paper, we present a computation-efficient framework that steers a frozen Pre-Trained Language Model (PTLM) towards more commonsensical generation (i.e., producing a plausible output that incorporates a list of concepts in a meaningful way). Specifically, we first construct a reference-free evaluator that assigns a commonsense score to a sentence by grounding the sentence to a dynamic commonsense knowledge base from four different relational aspects. We then use the scorer as the oracle for commonsense knowledge, and extend the controllable generation method called NADO to train an auxiliary head that guides a fixed PTLM to better satisfy the oracle. We test our framework on a series of GPT-2-, Flan-T5-, and Alpaca-based language models (LMs) on two constrained concept-to-sentence benchmarks. Human evaluation results demonstrate that our method consistently leads to the most commonsensical outputs.

* EMNLP 2023

Evaluating Large Language Models on Controlled Generation Tasks

Oct 23, 2023

Jiao Sun, Yufei Tian, Wangchunshu Zhou, Nan Xu, Qian Hu, Rahul Gupta, John Frederick Wieting, Nanyun Peng, Xuezhe Ma

Abstract: While recent studies have looked into the abilities of large language models in various benchmark tasks, including question generation, reading comprehension, and multilingual tasks, few studies have looked into the controllability of large language models on generation tasks. We present an extensive analysis of various benchmarks, including a sentence planning benchmark with different granularities. After comparing large language models against state-of-the-art finetuned smaller models, we present a spectrum showing where large language models fall behind, are comparable to, or exceed the ability of smaller models. We conclude that **large language models struggle at meeting fine-grained hard constraints**.

* EMNLP 2023

Unsupervised Melody-to-Lyric Generation

May 30, 2023

Yufei Tian, Anjali Narayan-Chen, Shereen Oraby, Alessandra Cervone, Gunnar Sigurdsson, Chenyang Tao, Wenbo Zhao, Tagyoung Chung, Jing Huang, Nanyun Peng

Abstract: Automatic melody-to-lyric generation is a task in which song lyrics are generated to go with a given melody. It is of significant practical interest and more challenging than unconstrained lyric generation as the music imposes additional constraints onto the lyrics. The training data is limited as most songs are copyrighted, resulting in models that underfit the complicated cross-modal relationship between melody and lyrics. In this work, we propose a method for generating high-quality lyrics without training on any aligned melody-lyric data. Specifically, we design a hierarchical lyric generation framework that first generates a song outline and then the complete lyrics. The framework enables disentanglement of training (based purely on text) from inference (melody-guided text generation) to circumvent the shortage of parallel data. We leverage the segmentation and rhythm alignment between melody and lyrics to compile the given melody into decoding constraints as guidance during inference. The two-step hierarchical design also enables content control via the lyric outline, a much-desired feature for democratizing collaborative song creation. Experimental results show that our model can generate high-quality lyrics that are more on-topic, singable, intelligible, and coherent than strong baselines, for example SongMASS, a SOTA model trained on a parallel dataset, with a 24% relative overall quality improvement based on human ratings.

* Accepted to ACL 23. arXiv admin note: substantial text overlap with arXiv:2305.07760

Unsupervised Melody-Guided Lyrics Generation

May 26, 2023

Yufei Tian, Anjali Narayan-Chen, Shereen Oraby, Alessandra Cervone, Gunnar Sigurdsson, Chenyang Tao, Wenbo Zhao, Tagyoung Chung, Jing Huang, Nanyun Peng

Abstract: Automatic song writing is a topic of significant practical interest. However, its research is largely hindered by the lack of training data due to copyright concerns and challenged by its creative nature. Most noticeably, prior works often fall short of modeling the cross-modal correlation between melody and lyrics due to limited parallel data, hence generating lyrics that are less singable. Existing works also lack effective mechanisms for content control, a much-desired feature for democratizing song creation for people with limited music background. In this work, we propose to generate pleasantly listenable lyrics without training on melody-lyric aligned data. Instead, we design a hierarchical lyric generation framework that disentangles training (based purely on text) from inference (melody-guided text generation). At inference time, we leverage the crucial alignments between melody and lyrics and compile the given melody into constraints to guide the generation process. Evaluation results show that our model can generate high-quality lyrics that are more singable, intelligible, coherent, and in rhyme than strong baselines including those supervised on parallel data.

* Presented at AAAI23 CreativeAI workshop (Non-Archival). A later version is accepted to ACL23

A Unified Framework for Pun Generation with Humor Principles

Oct 24, 2022

Yufei Tian, Divyanshu Sheth, Nanyun Peng

Abstract: We propose a unified framework to generate both homophonic and homographic puns to resolve the split-up in existing works. Specifically, we incorporate three linguistic attributes of puns into the language models: ambiguity, distinctiveness, and surprise. Our framework consists of three parts: 1) a context words/phrases selector to promote the aforementioned attributes, 2) a generation model trained on non-pun sentences to incorporate the context words/phrases into the generation output, and 3) a label predictor that learns the structure of puns, which is used to steer the generation model at inference time. Evaluation results on both pun types demonstrate the efficacy of our model over strong baselines.

* Findings of EMNLP 2022
