Shuming Shi

Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration

Jun 15, 2023

Chenyang Lyu, Minghao Wu, Longyue Wang, Xinting Huang, Bingshuai Liu, Zefeng Du, Shuming Shi, Zhaopeng Tu

Abstract:Although instruction-tuned large language models (LLMs) have exhibited remarkable capabilities across various NLP tasks, their effectiveness on other data modalities beyond text has not been fully studied. In this work, we propose Macaw-LLM, a novel multi-modal LLM that seamlessly integrates visual, audio, and textual information. Macaw-LLM consists of three main components: a modality module for encoding multi-modal data, a cognitive module for harnessing pretrained LLMs, and an alignment module for harmonizing diverse representations. Our novel alignment module seamlessly bridges multi-modal features to textual features, simplifying the adaptation process from the modality modules to the cognitive module. In addition, we construct a large-scale multi-modal instruction dataset in the form of multi-turn dialogue, including 69K image instances and 50K video instances. We have made our data, code and model publicly available, which we hope can pave the way for future research in multi-modal LLMs and expand the capabilities of LLMs to handle diverse data modalities and address complex real-world scenarios.

* Longyue Wang is the corresponding author. Our project page is at https://github.com/lyuchenyang/Macaw-LLM

Rethinking Translation Memory Augmented Neural Machine Translation

Jun 12, 2023

Hongkun Hao, Guoping Huang, Lemao Liu, Zhirui Zhang, Shuming Shi, Rui Wang

Abstract:This paper rethinks translation memory augmented neural machine translation (TM-augmented NMT) from two perspectives, i.e., a probabilistic view of retrieval and the variance-bias decomposition principle. The findings demonstrate that TM-augmented NMT is good at fitting the data (i.e., lower bias) but is more sensitive to fluctuations in the training data (i.e., higher variance), which provides an explanation for a recently reported contradictory phenomenon on the same translation task: TM-augmented NMT substantially improves over vanilla NMT under the high-resource scenario whereas it fails under the low-resource scenario. Then we propose a simple yet effective TM-augmented NMT model to promote the variance and address the contradictory phenomenon. Extensive experiments show that the proposed TM-augmented NMT achieves consistent gains over both conventional NMT and existing TM-augmented NMT under two variance-preferable (low-resource and plug-and-play) scenarios as well as the high-resource scenario.

* 15 pages, 2 figures, accepted by ACL2023 findings
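
For readers unfamiliar with the decomposition the abstract refers to, the textbook squared-error form is given below. This is only the standard regression version, shown as background; the paper's own formulation for translation models may differ. The symbols are assumptions introduced here: $f$ is the true function, $\hat f_D$ is a model trained on a random dataset $D$, and $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$.

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\big(y - \hat f_D(x)\big)^2\right]
  = \underbrace{\big(\mathbb{E}_D[\hat f_D(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\big(\hat f_D(x) - \mathbb{E}_D[\hat f_D(x)]\big)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```

Under this reading, a model with lower bias but higher variance can win when training data is plentiful and lose when it is scarce, which matches the contrast between high-resource and low-resource results described in the abstract.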

Sen2Pro: A Probabilistic Perspective to Sentence Embedding from Pre-trained Language Model

Jun 04, 2023

Lingfeng Shen, Haiyun Jiang, Lemao Liu, Shuming Shi

Abstract:Sentence embedding is one of the most fundamental tasks in Natural Language Processing and plays an important role in various tasks. The recent breakthrough in sentence embedding is achieved by pre-trained language models (PLMs). Despite its success, an embedded vector (Sen2Vec) representing a point estimate does not naturally express uncertainty in a task-agnostic way. This paper thereby proposes an efficient framework on probabilistic sentence embedding (Sen2Pro) from PLMs, and it represents a sentence as a probability density distribution in an embedding space to reflect both model uncertainty and data uncertainty (i.e., many-to-one nature) in the sentence representation. The proposed framework performs in a plug-and-play way without retraining PLMs, and it is easy to implement and generally applicable on top of any PLM. The superiority of Sen2Pro over Sen2Vec has been theoretically verified and practically illustrated on different NLP tasks.

* Accepted to ACL2023 workshop Rep4NLP

Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

May 30, 2023

Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Zhaopeng Tu, Shuming Shi

Abstract:Modern large language models (LLMs) like ChatGPT have shown remarkable performance on general language tasks but still struggle on complex reasoning tasks, which drives the research on cognitive behaviors of LLMs to explore human-like problem-solving strategies. Along this direction, one representative strategy is self-reflection, which asks an LLM to refine the solution with the feedback generated by itself iteratively. However, our study shows that such reflection-style methods suffer from the Degeneration-of-Thought (DoT) problem: once the LLM has established confidence in its solutions, it is unable to generate novel thoughts later through reflection even if its initial stance is incorrect. To address the DoT problem, we propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in the state of "tit for tat" and a judge manages the debate process to obtain a final solution. Clearly, our MAD framework encourages divergent thinking in LLMs which would be helpful for tasks that require deep levels of contemplation. Experimental results on two challenging datasets, commonsense machine translation and counter-intuitive arithmetic reasoning, demonstrate the effectiveness of our MAD framework. Extensive analyses suggest that the adaptive break of debate and the modest level of "tit for tat" state are required for MAD to obtain good performance. Moreover, we find that an LLM might not be a fair judge if different LLMs are used for agents. Code: https://github.com/Skytliang/Multi-Agents-Debate

* Work in progress
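
The debate loop described in the abstract (agents argue in turn, a judge may stop the debate early) can be sketched roughly as follows. This is a minimal illustration, not the code from the linked repository: `debate`, `Agent`, `Judge`, and the toy agents are hypothetical names, the agents are plain functions standing in for LLM calls, and falling back to the last argument when the judge never decides is an assumption made here.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces: an agent maps (question, transcript) to an argument;
# a judge returns a final answer, or None to let the debate continue.
Agent = Callable[[str, List[str]], str]
Judge = Callable[[str, List[str]], Optional[str]]


def debate(question: str, agents: List[Agent], judge: Judge,
           max_rounds: int = 3) -> Tuple[str, List[str]]:
    """Run rounds of debate until the judge decides or rounds run out."""
    transcript: List[str] = []
    for _ in range(max_rounds):
        for agent in agents:
            # Each agent sees everything said so far before responding.
            transcript.append(agent(question, transcript))
        verdict = judge(question, transcript)
        if verdict is not None:  # adaptive break: stop as soon as the judge decides
            return verdict, transcript
    # Assumption: with no verdict, fall back to the last argument made.
    return transcript[-1], transcript


# Toy stand-ins for LLM-backed agents.
def affirmative(question: str, transcript: List[str]) -> str:
    return "A"


def negative(question: str, transcript: List[str]) -> str:
    # Disagrees in the first round, then concedes.
    return "B" if len(transcript) < 2 else "A"


def agreement_judge(question: str, transcript: List[str]) -> Optional[str]:
    # Decides once the two most recent arguments agree.
    if len(transcript) >= 2 and transcript[-1] == transcript[-2]:
        return transcript[-1]
    return None


answer, log = debate("toy question", [affirmative, negative], agreement_judge)
print(answer, log)
```

In a real setup each stub would wrap a model call with a role-specific prompt; the loop structure is the only part this sketch is meant to convey.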

Improved Visual Story Generation with Adaptive Context Modeling

May 26, 2023

Zhangyin Feng, Yuchen Ren, Xinmiao Yu, Xiaocheng Feng, Duyu Tang, Shuming Shi, Bing Qin

Abstract:Diffusion models developed on top of powerful text-to-image generation models like Stable Diffusion achieve remarkable success in visual story generation. However, the best-performing approach considers historically generated results as flattened memory cells, ignoring the fact that not all preceding images contribute equally to the generation of the characters and scenes at the current stage. To address this, we present a simple method that improves the leading system with adaptive context modeling, which is not only incorporated in the encoder but also adopted as additional guidance in the sampling stage to boost the global consistency of the generated story. We evaluate our model on PororoSV and FlintstonesSV datasets and show that our approach achieves state-of-the-art FID scores on both story visualization and continuation scenarios. We conduct detailed model analysis and show that our model excels at generating semantically consistent images for stories.

Enhancing Grammatical Error Correction Systems with Explanations

May 25, 2023

Yuejiao Fei, Leyang Cui, Sen Yang, Wai Lam, Zhenzhong Lan, Shuming Shi

Abstract:Grammatical error correction systems improve written communication by detecting and correcting language mistakes. To help language learners better understand why the GEC system makes a certain correction, the causes of errors (evidence words) and the corresponding error types are two key factors. To enhance GEC systems with explanations, we introduce EXPECT, a large dataset annotated with evidence words and grammatical error types. We propose several baselines and analyses to understand this task. Furthermore, human evaluation verifies our explainable GEC system's explanations can assist second-language learners in determining whether to accept a correction suggestion and in understanding the associated grammar rule.

* 9 pages, 7 figures, accepted to the main conference of ACL 2023

A Frustratingly Simple Decoding Method for Neural Text Generation

May 22, 2023

Haoran Yang, Deng Cai, Huayang Li, Wei Bi, Wai Lam, Shuming Shi

Abstract:We introduce a frustratingly simple, super efficient and surprisingly effective decoding method, which we call Frustratingly Simple Decoding (FSD), for neural text generation. The idea behind FSD is straightforward: we build an anti-LM based on previously generated text and use this anti-LM to penalize future generation of what has been generated. The anti-LM can be implemented as simply as an n-gram language model or a vectorized variant. In this way, FSD introduces no extra model parameters and negligible computational overhead (FSD can be as fast as greedy search). Despite the simplicity, FSD is surprisingly effective; experiments show that FSD can outperform the canonical methods to date (i.e., nucleus sampling) as well as several strong baselines that were proposed recently.
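
The anti-LM idea can be illustrated with a small sketch. It builds an n-gram model from the tokens generated so far and subtracts its probability from the main model's score at each step. This is only an illustration under stated assumptions: the scoring rule `(1 - alpha) * p_lm - alpha * p_anti`, the function names, and the fixed toy distribution are all introduced here and are not taken from the abstract.

```python
from collections import Counter


def anti_lm_probs(generated, n=2):
    """Estimate P(next | last n-1 tokens) from the tokens generated so far."""
    if len(generated) < n:
        return {}
    context = tuple(generated[-(n - 1):]) if n > 1 else ()
    counts = Counter()
    for i in range(len(generated) - n + 1):
        gram = tuple(generated[i:i + n])
        if gram[:-1] == context:
            counts[gram[-1]] += 1
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()} if total else {}


def penalized_step(lm_probs, generated, alpha=0.5, n=2):
    """Pick the token maximizing LM probability minus the anti-LM penalty."""
    penalty = anti_lm_probs(generated, n)
    # Assumed scoring rule for illustration only.
    scores = {tok: (1 - alpha) * p - alpha * penalty.get(tok, 0.0)
              for tok, p in lm_probs.items()}
    return max(scores, key=scores.get)


def decode(lm, steps, alpha=0.5, n=2):
    """Greedy decoding with the anti-LM penalty applied at every step."""
    generated = []
    for _ in range(steps):
        generated.append(penalized_step(lm(generated), generated, alpha, n))
    return generated


def toy_lm(prefix):
    # Hypothetical stand-in for a neural LM: a fixed next-token distribution.
    return {"a": 0.5, "b": 0.4, "c": 0.1}


print(decode(toy_lm, 6, alpha=0.0))  # plain greedy search repeats one token
print(decode(toy_lm, 6, alpha=0.5))  # the penalty breaks up the repetition
```

With `alpha=0.0` the penalty vanishes and the loop reduces to greedy search, which makes the overhead claim easy to see: the only extra work per step is counting n-grams over the text generated so far.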

Deepfake Text Detection in the Wild

May 22, 2023

Yafu Li, Qintong Li, Leyang Cui, Wei Bi, Longyue Wang, Linyi Yang, Shuming Shi, Yue Zhang

Abstract:Recent advances in large language models have enabled them to reach a level of text generation comparable to that of humans. These models show powerful capabilities across a wide range of content, including news article writing, story generation, and scientific writing. Such capability further narrows the gap between human-authored and machine-generated texts, highlighting the importance of deepfake text detection to avoid potential risks such as fake news propagation and plagiarism. However, previous work has been limited in that it tests methods on testbeds restricted to specific domains or particular language models. In practical scenarios, the detector faces texts from various domains or LLMs without knowing their sources. To this end, we build a wild testbed by gathering texts from various human writings and deepfake texts generated by different LLMs. Human annotators are only slightly better than random guessing at identifying machine-generated texts. Empirical results on automatic detection methods further showcase the challenges of deepfake text detection in a wild testbed. In addition, the out-of-distribution setting poses a greater challenge for a detector to be employed in realistic application scenarios. We release our resources at https://github.com/yafuly/DeepfakeTextDetect.

* Work in progress

A Survey on Zero Pronoun Translation

May 17, 2023

Longyue Wang, Siyou Liu, Mingzhou Xu, Linfeng Song, Shuming Shi, Zhaopeng Tu

Abstract:Zero pronouns (ZPs) are frequently omitted in pro-drop languages (e.g. Chinese, Hungarian, and Hindi), but should be recalled in non-pro-drop languages (e.g. English). This phenomenon has been studied extensively in machine translation (MT), as it poses a significant challenge for MT systems due to the difficulty in determining the correct antecedent for the pronoun. This survey paper highlights the major works that have been undertaken in zero pronoun translation (ZPT) after the neural revolution, so that researchers can recognise the current state and future directions of this field. We provide an organisation of the literature based on evolution, dataset, method and evaluation. In addition, we compare and analyze competing models and evaluation metrics on different benchmarks. We uncover a number of insightful findings such as: 1) ZPT is in line with the development trend of large language models; 2) data limitation causes learning bias in languages and domains; 3) performance improvements are often reported on single benchmarks, but advanced methods are still far from real-world use; 4) general-purpose metrics are not reliable on nuances and complexities of ZPT, emphasizing the necessity of targeted metrics; 5) apart from commonly-cited errors, ZPs will cause risks of gender bias.

* ACL2023 Main Conference Long Paper. Longyue Wang and Siyou Liu contributed equally to this work

A Simple and Plug-and-play Method for Unsupervised Sentence Representation Enhancement

May 13, 2023

Lingfeng Shen, Haiyun Jiang, Lemao Liu, Shuming Shi

Abstract:Generating proper sentence embeddings in an unsupervised way is beneficial to semantic matching and retrieval problems in real-world scenarios. This paper presents Representation ALchemy (RepAL), an extremely simple post-processing method that enhances sentence representations. The basic idea in RepAL is to de-emphasize redundant information in sentence embeddings generated by pre-trained models. Through comprehensive experiments, we show that RepAL is free of training and is a plug-and-play method that can be combined with most existing unsupervised sentence learning models. We also conducted in-depth analysis to understand RepAL.
