Yixuan Su

Instruct-SCTG: Guiding Sequential Controlled Text Generation through Instructions

Dec 19, 2023

Yinhong Liu, Yixuan Su, Ehsan Shareghi, Nigel Collier

Abstract:Instruction-tuned large language models have shown remarkable performance in aligning generated text with user intentions across various tasks. However, maintaining human-like discourse structure in the generated text remains a challenging research question. In this paper, we propose Instruct-SCTG, a flexible and effective sequential framework that harnesses instruction-tuned language models to generate structurally coherent text in both fine-tuned and zero-shot setups. Our framework generates articles in a section-by-section manner, aligned with the desired human structure using natural language instructions. Furthermore, we introduce a new automatic metric that measures discourse divergence in a fuzzy manner. Extensive experiments on three datasets from representative domains of news and recipes demonstrate the state-of-the-art performance of our framework in imposing discourse structure during text generation, as verified by both automatic and human evaluation. Our code will be available on Github.

Specialist or Generalist? Instruction Tuning for Specific NLP Tasks

Oct 23, 2023

Chufan Shi, Yixuan Su, Cheng Yang, Yujiu Yang, Deng Cai

Abstract:The potential of large language models (LLMs) to simultaneously perform a wide range of natural language processing (NLP) tasks has been the subject of extensive research. Although instruction tuning has proven to be a data-efficient method for transforming LLMs into such generalist models, their performance still lags behind specialist models trained exclusively for specific tasks. In this paper, we investigate whether incorporating broad-coverage generalist instruction tuning can contribute to building a specialist model. We hypothesize that its efficacy depends on task specificity and skill requirements. Our experiments assess four target tasks with distinct coverage levels, revealing that integrating generalist instruction tuning consistently enhances model performance when the task coverage is broad. The effect is particularly pronounced when the amount of task-specific training data is limited. Further investigation into three target tasks focusing on different capabilities demonstrates that generalist instruction tuning improves understanding and reasoning abilities. However, for tasks requiring factual knowledge, generalist data containing hallucinatory information may negatively affect the model's performance. Overall, our work provides a systematic guide for developing specialist models with general instruction tuning. Our code and other related resources can be found at https://github.com/DavidFanzz/Generalist_or_Specialist.

* Accepted to EMNLP 2023

Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective

Oct 16, 2023

Huayang Li, Tian Lan, Zihao Fu, Deng Cai, Lemao Liu, Nigel Collier, Taro Watanabe, Yixuan Su

Abstract:There are a number of diverging hypotheses about the neural text degeneration problem, i.e., generating repetitive and dull loops, which makes this problem both interesting and confusing. In this work, we aim to advance our understanding by presenting a straightforward and fundamental explanation from the data perspective. Our preliminary investigation reveals a strong correlation between the degeneration issue and the presence of repetitions in training data. Subsequent experiments also demonstrate that by selectively dropping out the attention to repetitive words in training data, degeneration can be significantly minimized. Furthermore, our empirical analysis illustrates that prior works addressing the degeneration issue from various standpoints, such as the high-inflow words, the likelihood objective, and the self-reinforcement phenomenon, can be interpreted by one simple explanation. That is, penalizing the repetitions in training data is a common and fundamental factor for their effectiveness. Moreover, our experiments reveal that penalizing the repetitions in training data remains critical even when considering larger model sizes and instruction tuning.

* Accepted to NeurIPS 2023
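
The abstract above links degeneration to repetitions in the training data but does not say how repetition is measured. As a minimal sketch, assuming the widely used rep-n statistic (the share of duplicate n-grams in a token sequence) stands in for "repetition", the quantity could be computed as follows. The function name `rep_n` and the whitespace tokenization in the usage lines are illustrative choices, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def rep_n(tokens: Sequence[str], n: int = 2) -> float:
    """Fraction of n-grams that duplicate an earlier n-gram in the sequence.

    Illustrative measure only: 0.0 means every n-gram is unique, and values
    closer to 1.0 mean the sequence is dominated by repeated n-grams.
    """
    ngrams: List[Tuple[str, ...]] = [
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    ]
    if not ngrams:
        # Sequence shorter than n: nothing can repeat.
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


# Usage: compare a looping sequence with a non-repetitive one.
looping = "a b a b a b".split()
varied = "a b c d".split()
print(rep_n(looping), rep_n(varied))
```

Computing such a score over training documents and over model outputs is one simple way to check for the kind of correlation the abstract describes.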

Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models

Aug 31, 2023

Yupan Huang, Zaiqiao Meng, Fangyu Liu, Yixuan Su, Nigel Collier, Yutong Lu

Abstract:Large language models exhibit enhanced zero-shot performance on various tasks when fine-tuned with instruction-following data. Multimodal instruction-following models extend these capabilities by integrating both text and images. However, existing models such as MiniGPT-4 face challenges in maintaining dialogue coherence in scenarios involving multiple images. A primary reason is the lack of a specialized dataset for this critical application. To bridge these gaps, we present SparklesChat, a multimodal instruction-following model for open-ended dialogues across multiple images. To support the training, we introduce SparklesDialogue, the first machine-generated dialogue dataset tailored for word-level interleaved multi-image and text interactions. Furthermore, we construct SparklesEval, a GPT-assisted benchmark for quantitatively assessing a model's conversational competence across multiple images and dialogue turns. Our experiments validate the effectiveness of SparklesChat in understanding and reasoning across multiple images and dialogue turns. Specifically, SparklesChat outperformed MiniGPT-4 on established vision-and-language benchmarks, including the BISON binary image selection task and the NLVR2 visual reasoning task. Moreover, SparklesChat scored 8.56 out of 10 on SparklesEval, substantially exceeding MiniGPT-4's score of 3.91 and nearing GPT-4's score of 9.26. Qualitative evaluations further demonstrate SparklesChat's generality in handling real-world applications. All resources will be available at https://github.com/HYPJUDY/Sparkles.

PandaGPT: One Model To Instruction-Follow Them All

May 25, 2023

Yixuan Su, Tian Lan, Huayang Li, Jialu Xu, Yan Wang, Deng Cai

Abstract:We present PandaGPT, an approach to emPower large lANguage moDels with visual and Auditory instruction-following capabilities. Our pilot experiments show that PandaGPT can perform complex tasks such as detailed image description generation, writing stories inspired by videos, and answering questions about audios. More interestingly, PandaGPT can take multimodal inputs simultaneously and compose their semantics naturally. For example, PandaGPT can connect how objects look in an image/video and how they sound in an audio. To do so, PandaGPT combines the multimodal encoders from ImageBind and the large language models from Vicuna. Notably, only aligned image-text pairs are required for the training of PandaGPT. Thanks to the strong capability of ImageBind in embedding data from different modalities into the same space, PandaGPT displays emergent, i.e. zero-shot, cross-modal behaviors for data other than image and text (e.g., video, audio, depth, thermal, and IMU). We hope that PandaGPT serves as an initial step toward building AGI that can perceive and understand inputs in different modalities holistically, as we humans do. Our project page is at https://panda-gpt.github.io/.

* Technical report, work in progress. Our project page is at https://panda-gpt.github.io/

Biomedical Named Entity Recognition via Dictionary-based Synonym Generalization

May 22, 2023

Zihao Fu, Yixuan Su, Zaiqiao Meng, Nigel Collier

Abstract:Biomedical named entity recognition is one of the core tasks in biomedical natural language processing (BioNLP). To tackle this task, numerous supervised/distantly supervised approaches have been proposed. Despite their remarkable success, these approaches inescapably demand laborious human effort. To alleviate the need for human effort, dictionary-based approaches have been proposed to extract named entities simply based on a given dictionary. However, one downside of existing dictionary-based approaches is that they struggle to identify concept synonyms that are not listed in the given dictionary, which we refer to as the synonym generalization problem. In this study, we propose a novel Synonym Generalization (SynGen) framework that recognizes the biomedical concepts contained in the input text using span-based predictions. In particular, SynGen introduces two regularization terms, namely, (1) a synonym distance regularizer; and (2) a noise perturbation regularizer, to minimize the synonym generalization error. To demonstrate the effectiveness of our approach, we provide a theoretical analysis of the bound of synonym generalization error. We extensively evaluate our approach on a wide range of benchmarks and the results verify that SynGen outperforms previous dictionary-based models by notable margins. Lastly, we provide a detailed analysis to further reveal the merits and inner workings of our approach.

COFFEE: A Contrastive Oracle-Free Framework for Event Extraction

Mar 25, 2023

Meiru Zhang, Yixuan Su, Zaiqiao Meng, Zihao Fu, Nigel Collier

Abstract:Event extraction is a complex information extraction task that involves extracting events from unstructured text. Prior classification-based methods require comprehensive entity annotations for joint training, while newer generation-based methods rely on heuristic templates containing oracle information such as event type, which is often unavailable in real-world scenarios. In this study, we consider a more realistic setting of this task, namely the Oracle-Free Event Extraction (OFEE) task, where only the input context is given without any oracle information, including event type, event ontology and trigger word. To solve this task, we propose a new framework, called COFFEE, which extracts the events solely based on the document context without referring to any oracle information. In particular, a contrastive selection model is introduced in COFFEE to rectify the generated triggers and handle multi-event instances. The proposed COFFEE outperforms state-of-the-art approaches under the oracle-free setting of the event extraction task, as evaluated on a public event extraction benchmark ACE05.

Plug-and-Play Recipe Generation with Content Planning

Dec 09, 2022

Yinhong Liu, Yixuan Su, Ehsan Shareghi, Nigel Collier

Abstract:Recent pre-trained language models have shown promising capabilities in generating fluent and realistic natural language text. However, generating multi-sentence text with global content planning has been a long-standing research question. Current approaches for controlled text generation can hardly address this issue, as they usually condition on single known control attributes. In this study, we propose a low-cost yet effective framework which explicitly models the global content plan of the generated text. Specifically, it optimizes the joint distribution of the natural language sequence and the global content plan in a plug-and-play manner. We conduct extensive experiments on the well-established Recipe1M+ benchmark. Both automatic and human evaluations verify that our model achieves state-of-the-art performance on the task of recipe generation.

* Paper accepted by EMNLP 2022 GEM workshop

Momentum Decoding: Open-ended Text Generation As Graph Exploration

Dec 05, 2022

Tian Lan, Yixuan Su, Shuhang Liu, Heyan Huang, Xian-Ling Mao

Abstract:Open-ended text generation with autoregressive language models (LMs) is one of the core tasks in natural language processing. However, maximization-based decoding methods (e.g., greedy/beam search) often lead to the degeneration problem, i.e., the generated text is unnatural and contains undesirable repetitions. Existing solutions to this problem either introduce randomness prone to incoherence or require a look-ahead mechanism that demands extra computational overhead. In this study, we formulate open-ended text generation from a new perspective, i.e., we view it as an exploration process within a directed graph. Thereby, we understand the phenomenon of degeneration as circular loops within the directed graph. Based on our formulation, we propose a novel decoding method, momentum decoding, which encourages the LM to greedily explore new nodes outside the current graph. Meanwhile, it also allows the LM to return to the existing nodes with a momentum downgraded by a pre-defined resistance function. We extensively test our approach on three benchmarks from different domains through automatic and human evaluations. The results show that momentum decoding performs comparably with the current state of the art while notably improving inference speed and reducing computation FLOPs. Furthermore, we conduct a detailed analysis to reveal the merits and inner workings of our approach. Our code and other related resources are publicly available at https://github.com/gmftbyGMFTBY/MomentumDecoding.

* Work in progress
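
The decoding idea in the abstract above (take the greedy token when it leads to a new node, and otherwise allow a return to visited nodes at a cost set by a resistance function) can be sketched as a toy. This is not the authors' implementation: the linear `resistance` function, the `alpha` and `top_k` parameters, treating each token as a graph node, and the hand-written bigram table are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence


def resistance(times_visited: int, alpha: float = 0.2) -> float:
    """Hypothetical resistance: the penalty grows linearly with each visit."""
    return alpha * times_visited


def momentum_step(probs: Dict[str, float], visits: Dict[str, int],
                  top_k: int = 3) -> str:
    """Pick the next token from a next-token distribution (toy rule)."""
    candidates = sorted(probs, key=probs.get, reverse=True)[:top_k]
    greedy = candidates[0]
    if visits.get(greedy, 0) == 0:
        # The greedy choice is a new node: explore it directly.
        return greedy
    # Otherwise every candidate is scored by its probability minus the
    # resistance accumulated from earlier visits (zero for unseen tokens).
    return max(candidates,
               key=lambda t: probs[t] - resistance(visits.get(t, 0)))


def decode(next_probs: Callable[[Sequence[str]], Dict[str, float]],
           prompt: List[str], steps: int) -> List[str]:
    """Run the toy rule for a fixed number of steps."""
    tokens = list(prompt)
    visits: Dict[str, int] = {}
    for tok in tokens:
        visits[tok] = visits.get(tok, 0) + 1
    for _ in range(steps):
        tok = momentum_step(next_probs(tokens), visits)
        tokens.append(tok)
        visits[tok] = visits.get(tok, 0) + 1
    return tokens


# Usage: a hand-written bigram table in which plain greedy decoding would
# cycle between "a" and "b" forever.
TABLE = {
    "a": {"b": 0.6, "c": 0.4},
    "b": {"a": 0.6, "d": 0.4},
    "c": {"a": 0.5, "d": 0.5},
    "d": {"a": 0.5, "c": 0.5},
}
print(decode(lambda toks: TABLE[toks[-1]], ["a"], steps=4))
```

Because the rule stays deterministic and needs no look-ahead, it costs about the same per step as greedy search, which is consistent with the speed claim in the abstract.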

An Empirical Study On Contrastive Search And Contrastive Decoding For Open-ended Text Generation

Nov 19, 2022

Yixuan Su, Jialu Xu

Abstract:In this study, we empirically compare two recently proposed decoding methods, i.e., Contrastive Search (CS) and Contrastive Decoding (CD), for open-ended text generation. The automatic evaluation results suggest that, while CS performs worse than CD on the MAUVE metric, it substantially surpasses CD on the diversity and coherence metrics. More notably, extensive human evaluations across three different domains demonstrate that human annotators universally favor CS over CD by substantial margins. The contradictory results between MAUVE and human evaluations reveal that MAUVE does not accurately reflect human preferences. Therefore, we call upon the research community to develop better evaluation metrics for open-ended text generation. To ensure the reproducibility of our work, we have open-sourced all our code, evaluation results, as well as human annotations at https://github.com/yxuansu/Contrastive_Search_versus_Contrastive_Decoding.

* Technical report with 9 pages, 5 tables, and 6 figures
