Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Lemao Liu

Rethinking Targeted Adversarial Attacks For Neural Machine Translation

Jul 07, 2024

Junjie Wu, Lemao Liu, Wei Bi, Dit-Yan Yeung

Figure 1 for Rethinking Targeted Adversarial Attacks For Neural Machine Translation

Figure 2 for Rethinking Targeted Adversarial Attacks For Neural Machine Translation

Figure 3 for Rethinking Targeted Adversarial Attacks For Neural Machine Translation

Figure 4 for Rethinking Targeted Adversarial Attacks For Neural Machine Translation

Abstract:Targeted adversarial attacks are widely used to evaluate the robustness of neural machine translation systems. Unfortunately, this paper first identifies a critical issue in the existing settings of NMT targeted adversarial attacks, where their attacking results are largely overestimated. To this end, this paper presents a new setting for NMT targeted adversarial attacks that could lead to reliable attacking results. Under the new setting, it then proposes a Targeted Word Gradient adversarial Attack (TWGA) method to craft adversarial examples. Experimental results demonstrate that our proposed setting could provide faithful attacking results for targeted adversarial attacks on NMT systems, and the proposed TWGA method can effectively attack such victim NMT systems. In-depth analyses on a large-scale dataset further illustrate some valuable findings. 1 Our code and data are available at https://github.com/wujunjie1998/TWGA.

* 5 pages, 2 figures, accepted by ICASSP 2024

Via

Access Paper or Ask Questions

On the Transformations across Reward Model, Parameter Update, and In-Context Prompt

Jun 24, 2024

Deng Cai, Huayang Li, Tingchen Fu, Siheng Li, Weiwen Xu, Shuaiyi Li, Bowen Cao, Zhisong Zhang, Xinting Huang, Leyang Cui(+4 more)

Abstract:Despite the general capabilities of pre-trained large language models (LLMs), they still need further adaptation to better serve practical applications. In this paper, we demonstrate the interchangeability of three popular and distinct adaptation tools: parameter updating, reward modeling, and in-context prompting. This interchangeability establishes a triangular framework with six transformation directions, each of which facilitates a variety of applications. Our work offers a holistic view that unifies numerous existing studies and suggests potential research directions. We envision our work as a useful roadmap for future research on LLMs.

Via

Access Paper or Ask Questions

On the Hallucination in Simultaneous Machine Translation

Jun 11, 2024

Meizhi Zhong, Kehai Chen, Zhengshan Xue, Lemao Liu, Mingming Yang, Min Zhang

Abstract:It is widely known that hallucination is a critical issue in Simultaneous Machine Translation (SiMT) due to the absence of source-side information. While many efforts have been made to enhance performance for SiMT, few of them attempt to understand and analyze hallucination in SiMT. Therefore, we conduct a comprehensive analysis of hallucination in SiMT from two perspectives: understanding the distribution of hallucination words and the target-side context usage of them. Intensive experiments demonstrate some valuable findings and particularly show that it is possible to alleviate hallucination by decreasing the over usage of target-side information for SiMT.

Via

Access Paper or Ask Questions

Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction

May 22, 2024

Tingchen Fu, Deng Cai, Lemao Liu, Shuming Shi, Rui Yan

Figure 1 for Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction

Figure 2 for Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction

Figure 3 for Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction

Figure 4 for Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction

Abstract:Supervised fine-tuning (SFT) on instruction-following corpus is a crucial approach toward the alignment of large language models (LLMs). However, the performance of LLMs on standard knowledge and reasoning benchmarks tends to suffer from deterioration at the latter stage of the SFT process, echoing the phenomenon of alignment tax. Through our pilot study, we put a hypothesis that the data biases are probably one cause behind the phenomenon. To address the issue, we introduce a simple disperse-then-merge framework. To be concrete, we disperse the instruction-following data into portions and train multiple sub-models using different data portions. Then we merge multiple models into a single one via model merging techniques. Despite its simplicity, our framework outperforms various sophisticated methods such as data curation and training regularization on a series of standard knowledge and reasoning benchmarks.

* Accepted to the findings of ACL2024

Via

Access Paper or Ask Questions

Cross-lingual Contextualized Phrase Retrieval

Mar 25, 2024

Huayang Li, Deng Cai, Zhi Qu, Qu Cui, Hidetaka Kamigaito, Lemao Liu, Taro Watanabe

Abstract:Phrase-level dense retrieval has shown many appealing characteristics in downstream NLP tasks by leveraging the fine-grained information that phrases offer. In our work, we propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval, which aims to augment cross-lingual applications by addressing polysemy using context information. However, the lack of specific training data and models are the primary challenges to achieve our goal. As a result, we extract pairs of cross-lingual phrases using word alignment information automatically induced from parallel sentences. Subsequently, we train our Cross-lingual Contextualized Phrase Retriever (CCPR) using contrastive learning, which encourages the hidden representations of phrases with similar contexts and semantics to align closely. Comprehensive experiments on both the cross-lingual phrase retrieval task and a downstream task, i.e, machine translation, demonstrate the effectiveness of CCPR. On the phrase retrieval task, CCPR surpasses baselines by a significant margin, achieving a top-1 accuracy that is at least 13 points higher. When utilizing CCPR to augment the large-language-model-based translator, it achieves average gains of 0.7 and 1.5 in BERTScore for translations from X=>En and vice versa, respectively, on WMT16 dataset. Our code and data are available at \url{https://github.com/ghrua/ccpr_release}.

* preprint

Via

Access Paper or Ask Questions

BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models

Feb 21, 2024

Xueliang Zhao, Xinting Huang, Tingchen Fu, Qintong Li, Shansan Gong, Lemao Liu, Wei Bi, Lingpeng Kong

Figure 1 for BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models

Figure 2 for BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models

Figure 3 for BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models

Figure 4 for BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models

Abstract:Multimodal reasoning stands as a pivotal capability for large vision-language models (LVLMs). The integration with Domain-Specific Languages (DSL), offering precise visual representations, equips these models with the opportunity to execute more accurate reasoning in complex and professional domains. However, the vanilla Chain-of-Thought (CoT) prompting method faces challenges in effectively leveraging the unique strengths of visual and DSL representations, primarily due to their differing reasoning mechanisms. Additionally, it often falls short in addressing critical steps in multi-step reasoning tasks. To mitigate these challenges, we introduce the \underline{B}i-Modal \underline{B}ehavioral \underline{A}lignment (BBA) prompting method, designed to maximize the potential of DSL in augmenting complex multi-modal reasoning tasks. This method initiates by guiding LVLMs to create separate reasoning chains for visual and DSL representations. Subsequently, it aligns these chains by addressing any inconsistencies, thus achieving a cohesive integration of behaviors from different modalities. Our experiments demonstrate that BBA substantially improves the performance of GPT-4V(ision) on geometry problem solving ($28.34\% \to 34.22\%$), chess positional advantage prediction ($42.08\% \to 46.99\%$) and molecular property prediction ($77.47\% \to 83.52\%$).

* Preprint

Via

Access Paper or Ask Questions

When Graph Data Meets Multimodal: A New Paradigm for Graph Understanding and Reasoning

Dec 16, 2023

Qihang Ai, Jianwu Zhou, Haiyun Jiang, Lemao Liu, Shuming Shi

Abstract:Graph data is ubiquitous in the physical world, and it has always been a challenge to efficiently model graph structures using a unified paradigm for the understanding and reasoning on various graphs. Moreover, in the era of large language models, integrating complex graph information into text sequences has become exceptionally difficult, which hinders the ability to interact with graph data through natural language instructions.The paper presents a new paradigm for understanding and reasoning about graph data by integrating image encoding and multimodal technologies. This approach enables the comprehension of graph data through an instruction-response format, utilizing GPT-4V's advanced capabilities. The study evaluates this paradigm on various graph types, highlighting the model's strengths and weaknesses, particularly in Chinese OCR performance and complex reasoning tasks. The findings suggest new direction for enhancing graph data processing and natural language interaction.

* 15 pages, 10 figures, 9 tables

Via

Access Paper or Ask Questions

Context Consistency between Training and Testing in Simultaneous Machine Translation

Nov 13, 2023

Meizhi Zhong, Lemao Liu, Kehai Chen, Mingming Yang, Min Zhang

Abstract:Simultaneous Machine Translation (SiMT) aims to yield a real-time partial translation with a monotonically growing the source-side context. However, there is a counterintuitive phenomenon about the context usage between training and testing: e.g., the wait-k testing model consistently trained with wait-k is much worse than that model inconsistently trained with wait-k' (k' is not equal to k) in terms of translation quality. To this end, we first investigate the underlying reasons behind this phenomenon and uncover the following two factors: 1) the limited correlation between translation quality and training (cross-entropy) loss; 2) exposure bias between training and testing. Based on both reasons, we then propose an effective training approach called context consistency training accordingly, which makes consistent the context usage between training and testing by optimizing translation quality and latency as bi-objectives and exposing the predictions to the model during the training. The experiments on three language pairs demonstrate our intuition: our system encouraging context consistency outperforms that existing systems with context inconsistency for the first time, with the help of our context consistency training approach.

Via

Access Paper or Ask Questions

Hint-enhanced In-Context Learning wakes Large Language Models up for knowledge-intensive tasks

Nov 03, 2023

Yifan Wang, Qingyan Guo, Xinzhe Ni, Chufan Shi, Lemao Liu, Haiyun Jiang, Yujiu Yang

Abstract:In-context learning (ICL) ability has emerged with the increasing scale of large language models (LLMs), enabling them to learn input-label mappings from demonstrations and perform well on downstream tasks. However, under the standard ICL setting, LLMs may sometimes neglect query-related information in demonstrations, leading to incorrect predictions. To address this limitation, we propose a new paradigm called Hint-enhanced In-Context Learning (HICL) to explore the power of ICL in open-domain question answering, an important form in knowledge-intensive tasks. HICL leverages LLMs' reasoning ability to extract query-related knowledge from demonstrations, then concatenates the knowledge to prompt LLMs in a more explicit way. Furthermore, we track the source of this knowledge to identify specific examples, and introduce a Hint-related Example Retriever (HER) to select informative examples for enhanced demonstrations. We evaluate HICL with HER on 3 open-domain QA benchmarks, and observe average performance gains of 2.89 EM score and 2.52 F1 score on gpt-3.5-turbo, 7.62 EM score and 7.27 F1 score on LLaMA-2-Chat-7B compared with standard setting.

Via

Access Paper or Ask Questions

DistillCSE: Distilled Contrastive Learning for Sentence Embeddings

Oct 30, 2023

Jiahao Xu, Wei Shao, Lihui Chen, Lemao Liu

Figure 1 for DistillCSE: Distilled Contrastive Learning for Sentence Embeddings

Figure 2 for DistillCSE: Distilled Contrastive Learning for Sentence Embeddings

Figure 3 for DistillCSE: Distilled Contrastive Learning for Sentence Embeddings

Figure 4 for DistillCSE: Distilled Contrastive Learning for Sentence Embeddings

Abstract:This paper proposes the DistillCSE framework, which performs contrastive learning under the self-training paradigm with knowledge distillation. The potential advantage of DistillCSE is its self-enhancing feature: using a base model to provide additional supervision signals, a stronger model may be learned through knowledge distillation. However, the vanilla DistillCSE through the standard implementation of knowledge distillation only achieves marginal improvements due to severe overfitting. The further quantitative analyses demonstrate the reason that the standard knowledge distillation exhibits a relatively large variance of the teacher model's logits due to the essence of contrastive learning. To mitigate the issue induced by high variance, this paper accordingly proposed two simple yet effective solutions for knowledge distillation: a Group-P shuffling strategy as an implicit regularization and the averaging logits from multiple teacher components. Experiments on standard benchmarks demonstrate that the proposed DistillCSE outperforms many strong baseline methods and yields a new state-of-the-art performance.

Via

Access Paper or Ask Questions