Huayang Li

Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective

Oct 16, 2023
Huayang Li, Tian Lan, Zihao Fu, Deng Cai, Lemao Liu, Nigel Collier, Taro Watanabe, Yixuan Su

There are a number of diverging hypotheses about the neural text degeneration problem, i.e., the generation of repetitive and dull loops, which makes this problem both interesting and confusing. In this work, we aim to advance our understanding by presenting a straightforward and fundamental explanation from the data perspective. Our preliminary investigation reveals a strong correlation between the degeneration issue and the presence of repetitions in the training data. Subsequent experiments also demonstrate that selectively dropping out the attention to repetitive words in the training data significantly reduces degeneration. Furthermore, our empirical analysis illustrates that prior works addressing the degeneration issue from various standpoints, such as high-inflow words, the likelihood objective, and the self-reinforcement phenomenon, can be unified under one simple explanation: penalizing the repetitions in the training data is a common and fundamental factor behind their effectiveness. Moreover, our experiments reveal that penalizing the repetitions in the training data remains critical even for larger model sizes and instruction tuning.
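
The central intervention, dropping attention to tokens that repeat earlier context during training, can be sketched as an additive attention mask. The snippet below is a minimal illustration, not the authors' released code; the function name, the per-token drop probability, and its default value are our own assumptions.

import torch

def repetition_attention_mask(input_ids: torch.Tensor, drop_prob: float = 1.0) -> torch.Tensor:
    # input_ids: (seq_len,) token ids of one training sequence.
    # Returns a (seq_len, seq_len) additive mask for the attention logits.
    seq_len = input_ids.size(0)
    seen = set()
    is_repeat = torch.zeros(seq_len, dtype=torch.bool)
    for i, tok in enumerate(input_ids.tolist()):
        if tok in seen:
            is_repeat[i] = True   # this token already occurred earlier in the context
        seen.add(tok)
    # With probability drop_prob, forbid attending to the repeated positions.
    drop = is_repeat & (torch.rand(seq_len) < drop_prob)
    mask = torch.zeros(seq_len, seq_len)
    mask[:, drop] = float("-inf")   # block attention to the dropped key positions
    causal = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()
    mask[causal] = float("-inf")    # keep the usual causal mask
    return mask                     # add to the attention scores before softmax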

* Accepted to NeurIPS 2023 

TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild

Sep 19, 2023
Huayang Li, Siheng Li, Deng Cai, Longyue Wang, Lemao Liu, Taro Watanabe, Yujiu Yang, Shuming Shi

Large language models with instruction-following abilities have revolutionized the field of artificial intelligence. These models show exceptional generalizability in tackling various real-world tasks through their natural language interfaces. However, their performance heavily relies on high-quality exemplar data, which is often difficult to obtain. This challenge is further exacerbated when it comes to multimodal instruction following. We introduce TextBind, an almost annotation-free framework for empowering large language models with multi-turn interleaved multimodal instruction-following capabilities. Our approach requires only image-caption pairs and generates multi-turn multimodal instruction-response conversations from a language model. To accommodate interleaved image-text inputs and outputs, we devise MIM, a language model-centric architecture that seamlessly integrates image encoder and decoder models. We release our dataset, model, and demo to foster future research in the area of multimodal instruction following.
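
As a rough sketch of the almost annotation-free data construction, one can prompt a text-only language model with image captions and ask it to write a multi-turn conversation that interleaves references to those images. The prompt wording and function below are our own illustration, not the TextBind pipeline itself.

def build_conversation_prompt(captions):
    # captions: list of captions describing the images available to the conversation.
    numbered = "\n".join(f"<image{i}>: {c}" for i, c in enumerate(captions, 1))
    return (
        "You are given the following images, described by their captions:\n"
        f"{numbered}\n\n"
        "Write a natural multi-turn conversation between a user and an assistant "
        "in which the images above are referred to by their <imageN> tags, "
        "interleaved with the text."
    )

prompt = build_conversation_prompt([
    "a golden retriever catching a frisbee in a park",
    "a close-up of a frisbee on green grass",
])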

* work in progress. https://textbind.github.io/ 

PandaGPT: One Model To Instruction-Follow Them All

May 25, 2023
Yixuan Su, Tian Lan, Huayang Li, Jialu Xu, Yan Wang, Deng Cai

We present PandaGPT, an approach to emPower large lANguage moDels with visual and Auditory instruction-following capabilities. Our pilot experiments show that PandaGPT can perform complex tasks such as detailed image description generation, writing stories inspired by videos, and answering questions about audio. More interestingly, PandaGPT can take multimodal inputs simultaneously and compose their semantics naturally. For example, PandaGPT can connect how objects look in an image/video with how they sound in audio. To do so, PandaGPT combines the multimodal encoders from ImageBind and the large language models from Vicuna. Notably, only aligned image-text pairs are required for the training of PandaGPT. Thanks to the strong capability of ImageBind in embedding data from different modalities into the same space, PandaGPT displays emergent, i.e., zero-shot, cross-modal behaviors for data other than image and text (e.g., video, audio, depth, thermal, and IMU). We hope that PandaGPT serves as an initial step toward building AGI that can perceive and understand inputs in different modalities holistically, as we humans do. Our project page is at https://panda-gpt.github.io/.
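
At a high level, this kind of bridging can be sketched as a single trainable projection that maps a frozen multimodal encoder's embedding into the LLM's input space and prepends it to the prompt. The sketch below is our own illustration under that assumption, with made-up dimensions, and is not the PandaGPT release.

import torch
import torch.nn as nn

class MultimodalPrefixAdapter(nn.Module):
    # Bridges a frozen multimodal encoder and a frozen LLM with one trainable projection.
    def __init__(self, encoder_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(encoder_dim, llm_dim)   # the only trainable part in this sketch

    def forward(self, modality_embedding: torch.Tensor, text_embeddings: torch.Tensor) -> torch.Tensor:
        # modality_embedding: (batch, encoder_dim) from an ImageBind-style encoder
        # text_embeddings:    (batch, seq_len, llm_dim) token embeddings of the text prompt
        prefix = self.proj(modality_embedding).unsqueeze(1)   # (batch, 1, llm_dim)
        return torch.cat([prefix, text_embeddings], dim=1)    # prepend as a soft prefix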

* Technical report, work in progress. Our project page is at https://panda-gpt.github.io/ 

A Frustratingly Simple Decoding Method for Neural Text Generation

May 22, 2023
Haoran Yang, Deng Cai, Huayang Li, Wei Bi, Wai Lam, Shuming Shi

We introduce a frustratingly simple, highly efficient, and surprisingly effective decoding method, which we call Frustratingly Simple Decoding (FSD), for neural text generation. The idea behind FSD is straightforward: we build an anti-LM based on previously generated text and use this anti-LM to penalize the future generation of what has already been generated. The anti-LM can be implemented as simply as an n-gram language model or a vectorized variant. In this way, FSD introduces no extra model parameters and negligible computational overhead (FSD can be as fast as greedy search). Despite its simplicity, FSD is surprisingly effective: experiments show that FSD can outperform the canonical methods to date (i.e., nucleus sampling) as well as several strong baselines proposed recently.
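
A minimal sketch of one decoding step in this spirit is shown below: an n-gram anti-LM is estimated from the tokens generated so far, and its probability is subtracted, with a weight, from the base model's scores before a greedy choice. The helper name, the hyperparameter values, and the exact way the two scores are combined are our own assumptions and may differ from the paper.

from collections import defaultdict

def fsd_step(lm_logprobs, generated, n=3, alpha=0.6):
    # lm_logprobs: dict mapping candidate token -> log-prob from the base LM.
    # generated:   list of tokens produced so far.
    counts, context_counts = defaultdict(int), defaultdict(int)
    for i in range(len(generated) - n + 1):              # n-gram anti-LM over the generated prefix
        ctx, nxt = tuple(generated[i:i + n - 1]), generated[i + n - 1]
        counts[(ctx, nxt)] += 1
        context_counts[ctx] += 1
    ctx = tuple(generated[-(n - 1):])                    # current (n-1)-token context
    def anti_prob(tok):
        return counts[(ctx, tok)] / context_counts[ctx] if context_counts[ctx] else 0.0
    # Penalized score: base score minus a weighted anti-LM probability.
    return max(lm_logprobs, key=lambda t: lm_logprobs[t] - alpha * anti_prob(t))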

Unified Text Structuralization with Instruction-tuned Language Models

Mar 30, 2023
Xuanfan Ni, Piji Li, Huayang Li

Text structuralization is an important field of natural language processing (NLP) that consists of information extraction (IE) and structure formalization. However, current studies of text structuralization suffer from a shortage of manually annotated, high-quality datasets across domains and languages, whose annotation requires specialized professional knowledge. In addition, most IE methods are designed for a specific type of structured data, e.g., entities, relations, or events, making them hard to generalize to others. In this work, we propose a simple and efficient approach to instruct a large language model (LLM) to extract a variety of structures from text. More concretely, we add a prefix and a suffix instruction to indicate the desired IE task and structure type, respectively, before feeding the text into an LLM. Experiments on two LLMs show that this approach enables language models to perform comparably to other state-of-the-art methods on datasets covering a variety of languages and knowledge, and to generalize to other IE sub-tasks by changing the content of the instruction. Another benefit of our approach is that it can help researchers build datasets in low-resource and domain-specific scenarios, e.g., finance and law, at low cost.
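
A minimal sketch of the prefix/suffix instruction scheme is given below; the prompt wording, function name, and example task are our own illustration rather than the paper's templates.

def build_structuralization_prompt(text: str, task: str, structure: str) -> str:
    prefix = f"Perform {task} on the following text."       # prefix: which IE task to run
    suffix = f"Return the result as {structure}."           # suffix: which output structure to emit
    return f"{prefix}\n\n{text}\n\n{suffix}"

prompt = build_structuralization_prompt(
    text="Alice joined Acme Corp. in 2020.",
    task="relation extraction",
    structure="a list of (head entity, relation, tail entity) triples",
)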

* 13 pages, 5 figures 

N-gram Is Back: Residual Learning of Neural Text Generation with n-gram Language Model

Nov 03, 2022
Huayang Li, Deng Cai, Jin Xu, Taro Watanabe

N-gram language models (LMs) have been largely superseded by neural LMs, as the latter exhibit better performance. However, we find that n-gram models can achieve satisfactory performance on a large proportion of testing cases, indicating that they have already captured abundant knowledge of the language at relatively low computational cost. With this observation, we propose to learn a neural LM that fits the residual between an n-gram LM and the real-data distribution. The combination of n-gram and neural LMs not only allows the neural part to focus on a deeper understanding of language but also provides a flexible way to customize an LM by switching the underlying n-gram model without changing the neural model. Experimental results on three typical language tasks (i.e., language modeling, machine translation, and summarization) demonstrate that our approach consistently attains additional performance gains over popular standalone neural models. We also show that our approach allows for effective domain adaptation by simply switching to a domain-specific n-gram model, without any extra training. Our code is released at https://github.com/ghrua/NgramRes.
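
One common way to realize such a residual combination is to add the n-gram LM's log-probabilities to the neural model's logits before the final softmax; the sketch below illustrates that general idea and is not necessarily the paper's exact parameterization.

import torch
import torch.nn.functional as F

def residual_next_token_logprobs(neural_logits: torch.Tensor,
                                 ngram_logprobs: torch.Tensor) -> torch.Tensor:
    # neural_logits, ngram_logprobs: (batch, vocab_size).
    # The neural part only needs to model what the n-gram LM gets wrong, and
    # swapping in a domain-specific n-gram LM changes the base distribution
    # without retraining the neural model.
    return F.log_softmax(neural_logits + ngram_logprobs, dim=-1)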

* Accepted to Findings of EMNLP 2022 

Learning to Break the Loop: Analyzing and Mitigating Repetitions for Neural Text Generation

Jun 06, 2022
Jin Xu, Xiaojiang Liu, Jianhao Yan, Deng Cai, Huayang Li, Jian Li

While large-scale neural language models, such as GPT2 and BART, have achieved impressive results on various text generation tasks, they tend to get stuck in undesirable sentence-level loops with maximization-based decoding algorithms (e.g., greedy search). This phenomenon is counter-intuitive since there are few consecutive sentence-level repetitions in human corpora (e.g., 0.02% in Wikitext-103). To investigate the underlying reasons for generating consecutive sentence-level repetitions, we study the relationship between the probabilities of the repetitive tokens and their previous repetitions in the context. Through our quantitative experiments, we find that 1) language models have a preference to repeat the previous sentence; 2) sentence-level repetitions have a self-reinforcement effect: the more times a sentence is repeated in the context, the higher the probability of continuing to generate that sentence; 3) sentences with higher initial probabilities usually have a stronger self-reinforcement effect. Motivated by our findings, we propose a simple and effective training method, DITTO (Pseudo-Repetition Penalization), where the model learns to penalize probabilities of sentence-level repetitions from pseudo repetitive data. Although our method is motivated by mitigating repetitions, experiments show that DITTO not only mitigates the repetition issue without sacrificing perplexity, but also achieves better generation quality. Extensive experiments on open-ended text generation (Wikitext-103) and text summarization (CNN/DailyMail) demonstrate the generality and effectiveness of our method.
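
As a rough illustration of the pseudo repetitive data that DITTO-style training penalizes, a sentence sampled from the training corpus can be repeated many times to form a synthetic training sequence. The helper below is our own sketch of that construction, with an arbitrary cap on the number of repetitions, and omits the penalization loss itself.

import random

def make_pseudo_repetitive_sample(corpus_sentences, max_repeats=24):
    # corpus_sentences: list of tokenized sentences (each a list of token ids).
    sentence = random.choice(corpus_sentences)
    n_repeats = random.randint(2, max_repeats)
    # The model is then trained to *lower* the probability of repeating the
    # sentence again, rather than reinforcing it, on this synthetic sequence.
    return sentence * n_repeats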

Visualizing the Relationship Between Encoded Linguistic Information and Task Performance

Mar 29, 2022
Jiannan Xiang, Huayang Li, Defu Lian, Guoping Huang, Taro Watanabe, Lemao Liu

Probing is a popular way to analyze whether linguistic information can be captured by a well-trained deep neural model, but it is hard to answer how changes in the encoded linguistic information affect task performance. To this end, we study the dynamic relationship between the encoded linguistic information and task performance from the viewpoint of Pareto optimality. The key idea is to obtain a set of models that are Pareto-optimal with respect to both objectives. From this viewpoint, we propose a method to obtain such Pareto-optimal models by formalizing the problem as multi-objective optimization. We conduct experiments on two popular NLP tasks, i.e., machine translation and language modeling, and investigate the relationship between several kinds of linguistic information and task performance. Experimental results demonstrate that the proposed method outperforms a baseline method. Our empirical findings suggest that some syntactic information is helpful for NLP tasks, whereas encoding more syntactic information does not necessarily lead to better performance, because the model architecture is also an important factor.
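
One simple way to trace such a trade-off curve, shown below purely as our own illustration and not necessarily the paper's optimization method, is linear scalarization: train a model for each weight on the two objectives and keep only the non-dominated points.

def scalarized_loss(task_loss: float, probe_loss: float, w: float) -> float:
    # w in [0, 1] trades off task performance against encoded linguistic information.
    return w * task_loss + (1.0 - w) * probe_loss

def pareto_front(points):
    # points: list of (task_loss, probe_loss) pairs from models trained with different w.
    # Keep the points that no other point matches or beats on both objectives.
    return [p for p in points
            if not any(q != p and q[0] <= p[0] and q[1] <= p[1] for q in points)]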

* Findings of ACL 2022 

Investigating Data Variance in Evaluations of Automatic Machine Translation Metrics

Mar 29, 2022
Jiannan Xiang, Huayang Li, Yahui Liu, Lemao Liu, Guoping Huang, Defu Lian, Shuming Shi

Current practices in metric evaluation focus on a single dataset, e.g., the Newstest dataset in each year's WMT Metrics Shared Task. However, in this paper, we qualitatively and quantitatively show that the performance of metrics is sensitive to the data: the ranking of metrics varies when the evaluation is conducted on different datasets. This paper then further investigates two potential hypotheses, i.e., insignificant data points and deviation from the independent and identically distributed (i.i.d.) assumption, which may be responsible for the issue of data variance. In conclusion, our findings suggest that when evaluating automatic translation metrics, researchers should take data variance into account and be cautious about claiming results on a single dataset, because they may be inconsistent with results on most other datasets.
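
As a rough sketch of the kind of check this finding motivates, one can compare how metrics rank on two datasets with a rank-correlation statistic; the scores below are made-up placeholders, not results from the paper.

from scipy.stats import kendalltau

# Made-up correlation-with-human scores for three metrics on two datasets.
dataset_a = {"metric1": 0.31, "metric2": 0.28, "metric3": 0.25}
dataset_b = {"metric1": 0.24, "metric2": 0.29, "metric3": 0.27}

metrics = sorted(dataset_a)
tau, _ = kendalltau([dataset_a[m] for m in metrics],
                    [dataset_b[m] for m in metrics])
print(f"Agreement of metric rankings across the two datasets (Kendall's tau): {tau:.2f}")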

* Findings of ACL 2022 

A Survey on Retrieval-Augmented Text Generation

Feb 13, 2022
Huayang Li, Yixuan Su, Deng Cai, Yan Wang, Lemao Liu

Recently, retrieval-augmented text generation has attracted increasing attention from the computational linguistics community. Compared with conventional generation models, retrieval-augmented text generation has remarkable advantages and, in particular, has achieved state-of-the-art performance in many NLP tasks. This paper aims to conduct a survey of retrieval-augmented text generation. It first highlights the generic paradigm of retrieval-augmented generation, and then reviews notable approaches according to different tasks, including dialogue response generation, machine translation, and other generation tasks. Finally, it points out some important directions on top of recent methods to facilitate future research.
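
The generic paradigm the survey highlights can be sketched in a few lines: retrieve material relevant to the input, then condition the generator on both. The helper names and the simple concatenation below are our own illustrative assumptions.

def retrieval_augmented_generate(query, retriever, generator, k: int = 3):
    # retriever: callable returning the k most relevant texts for the query
    # generator: callable mapping a prompt string to generated text
    retrieved = retriever(query, top_k=k)
    context = "\n".join(retrieved)
    # Condition the generator on both the retrieved material and the input.
    return generator(f"{context}\n\n{query}")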

* all authors contributed equally 