Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jia-Chen Gu

Synchronous Faithfulness Monitoring for Trustworthy Retrieval-Augmented Generation

Jun 19, 2024

Di Wu, Jia-Chen Gu, Fan Yin, Nanyun Peng, Kai-Wei Chang

Abstract:Retrieval-augmented language models (RALMs) have shown strong performance and wide applicability in knowledge-intensive tasks. However, there are significant trustworthiness concerns as RALMs are prone to generating unfaithful outputs, including baseless information or contradictions with the retrieved context. This paper proposes SynCheck, a lightweight monitor that leverages fine-grained decoding dynamics including sequence likelihood, uncertainty quantification, context influence, and semantic alignment to synchronously detect unfaithful sentences. By integrating efficiently measurable and complementary signals, SynCheck enables accurate and immediate feedback and intervention, achieving 0.85 AUROC in detecting faithfulness errors across six long-form retrieval-augmented generation tasks, improving prior best method by 4%. Leveraging SynCheck, we further introduce FOD, a faithfulness-oriented decoding algorithm guided by beam search for long-form retrieval-augmented generation. Empirical results demonstrate that FOD outperforms traditional strategies such as abstention, reranking, or contrastive decoding significantly in terms of faithfulness, achieving over 10% improvement across six datasets.

Via

Access Paper or Ask Questions

Perturbation-Restrained Sequential Model Editing

May 27, 2024

Jun-Yu Ma, Hong Wang, Hao-Xiang Xu, Zhen-Hua Ling, Jia-Chen Gu

Figure 1 for Perturbation-Restrained Sequential Model Editing

Figure 2 for Perturbation-Restrained Sequential Model Editing

Figure 3 for Perturbation-Restrained Sequential Model Editing

Figure 4 for Perturbation-Restrained Sequential Model Editing

Abstract:Model editing is an emerging field that focuses on updating the knowledge embedded within large language models (LLMs) without extensive retraining. However, current model editing methods significantly compromise the general abilities of LLMs as the number of edits increases, and this trade-off poses a substantial challenge to the continual learning of LLMs. In this paper, we first theoretically analyze that the factor affecting the general abilities in sequential model editing lies in the condition number of the edited matrix. The condition number of a matrix represents its numerical sensitivity, and therefore can be used to indicate the extent to which the original knowledge associations stored in LLMs are perturbed after editing. Subsequently, statistical findings demonstrate that the value of this factor becomes larger as the number of edits increases, thereby exacerbating the deterioration of general abilities. To this end, a framework termed Perturbation Restraint on Upper bouNd for Editing (PRUNE) is proposed, which applies the condition number restraints in sequential editing. These restraints can lower the upper bound on perturbation to edited models, thus preserving the general abilities. Systematically, we conduct experiments employing three popular editing methods on three LLMs across four representative downstream tasks. Evaluation results show that PRUNE can preserve considerable general abilities while maintaining the editing performance effectively in sequential model editing. The code and data are available at https://github.com/mjy1111/PRUNE.

Via

Access Paper or Ask Questions

Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection

May 25, 2024

Yun Zhu, Jia-Chen Gu, Caitlin Sikora, Ho Ko, Yinxiao Liu, Chu-Cheng Lin, Lei Shu, Liangchen Luo, Lei Meng, Bang Liu(+1 more)

Figure 1 for Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection

Figure 2 for Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection

Figure 3 for Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection

Figure 4 for Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection

Abstract:Large language models (LLMs) augmented with retrieval exhibit robust performance and extensive versatility by incorporating external contexts. However, the input length grows linearly in the number of retrieved documents, causing a dramatic increase in latency. In this paper, we propose a novel paradigm named Sparse RAG, which seeks to cut computation costs through sparsity. Specifically, Sparse RAG encodes retrieved documents in parallel, which eliminates latency introduced by long-range attention of retrieved documents. Then, LLMs selectively decode the output by only attending to highly relevant caches auto-regressively, which are chosen via prompting LLMs with special control tokens. It is notable that Sparse RAG combines the assessment of each individual document and the generation of the response into a single process. The designed sparse mechanism in a RAG system can facilitate the reduction of the number of documents loaded during decoding for accelerating the inference of the RAG system. Additionally, filtering out undesirable contexts enhances the model's focus on relevant context, inherently improving its generation quality. Evaluation results of two datasets show that Sparse RAG can strike an optimal balance between generation quality and computational efficiency, demonstrating its generalizability across both short- and long-form generation tasks.

Via

Access Paper or Ask Questions

Multiscale Matching Driven by Cross-Modal Similarity Consistency for Audio-Text Retrieval

Mar 15, 2024

Qian Wang, Jia-Chen Gu, Zhen-Hua Ling

Figure 1 for Multiscale Matching Driven by Cross-Modal Similarity Consistency for Audio-Text Retrieval

Figure 2 for Multiscale Matching Driven by Cross-Modal Similarity Consistency for Audio-Text Retrieval

Figure 3 for Multiscale Matching Driven by Cross-Modal Similarity Consistency for Audio-Text Retrieval

Figure 4 for Multiscale Matching Driven by Cross-Modal Similarity Consistency for Audio-Text Retrieval

Abstract:Audio-text retrieval (ATR), which retrieves a relevant caption given an audio clip (A2T) and vice versa (T2A), has recently attracted much research attention. Existing methods typically aggregate information from each modality into a single vector for matching, but this sacrifices local details and can hardly capture intricate relationships within and between modalities. Furthermore, current ATR datasets lack comprehensive alignment information, and simple binary contrastive learning labels overlook the measurement of fine-grained semantic differences between samples. To counter these challenges, we present a novel ATR framework that comprehensively captures the matching relationships of multimodal information from different perspectives and finer granularities. Specifically, a fine-grained alignment method is introduced, achieving a more detail-oriented matching through a multiscale process from local to global levels to capture meticulous cross-modal relationships. In addition, we pioneer the application of cross-modal similarity consistency, leveraging intra-modal similarity relationships as soft supervision to boost more intricate alignment. Extensive experiments validate the effectiveness of our approach, outperforming previous methods by significant margins of at least 3.9% (T2A) / 6.9% (A2T) R@1 on the AudioCaps dataset and 2.9% (T2A) / 5.4% (A2T) R@1 on the Clotho dataset.

* 5 pages, accepted to ICASSP2024

Via

Access Paper or Ask Questions

Neighboring Perturbations of Knowledge Editing on Large Language Models

Jan 31, 2024

Jun-Yu Ma, Jia-Chen Gu, Ningyu Zhang, Zhen-Hua Ling

Figure 1 for Neighboring Perturbations of Knowledge Editing on Large Language Models

Figure 2 for Neighboring Perturbations of Knowledge Editing on Large Language Models

Figure 3 for Neighboring Perturbations of Knowledge Editing on Large Language Models

Figure 4 for Neighboring Perturbations of Knowledge Editing on Large Language Models

Abstract:Despite their exceptional capabilities, large language models (LLMs) are prone to generating unintended text due to false or outdated knowledge. Given the resource-intensive nature of retraining LLMs, there has been a notable increase in the development of knowledge editing. However, current approaches and evaluations rarely explore the perturbation of editing on neighboring knowledge. This paper studies whether updating new knowledge to LLMs perturbs the neighboring knowledge encapsulated within them. Specifically, we seek to figure out whether appending a new answer into an answer list to a factual question leads to catastrophic forgetting of original correct answers in this list, as well as unintentional inclusion of incorrect answers. A metric of additivity is introduced and a benchmark dubbed as Perturbation Evaluation of Appending Knowledge (PEAK) is constructed to evaluate the degree of perturbation to neighboring knowledge when appending new knowledge. Besides, a plug-and-play framework termed Appending via Preservation and Prevention (APP) is proposed to mitigate the neighboring perturbation by maintaining the integrity of the answer list. Experiments demonstrate the effectiveness of APP coupling with four editing methods on three LLMs.

Via

Access Paper or Ask Questions

Corrective Retrieval Augmented Generation

Jan 29, 2024

Shi-Qi Yan, Jia-Chen Gu, Yun Zhu, Zhen-Hua Ling

Figure 1 for Corrective Retrieval Augmented Generation

Figure 2 for Corrective Retrieval Augmented Generation

Figure 3 for Corrective Retrieval Augmented Generation

Figure 4 for Corrective Retrieval Augmented Generation

Abstract:Large language models (LLMs) inevitably exhibit hallucinations since the accuracy of generated texts cannot be secured solely by the parametric knowledge they encapsulate. Although retrieval-augmented generation (RAG) is a practicable complement to LLMs, it relies heavily on the relevance of retrieved documents, raising concerns about how the model behaves if retrieval goes wrong. To this end, we propose the Corrective Retrieval Augmented Generation (CRAG) to improve the robustness of generation. Specifically, a lightweight retrieval evaluator is designed to assess the overall quality of retrieved documents for a query, returning a confidence degree based on which different knowledge retrieval actions can be triggered. Since retrieval from static and limited corpora can only return sub-optimal documents, large-scale web searches are utilized as an extension for augmenting the retrieval results. Besides, a decompose-then-recompose algorithm is designed for retrieved documents to selectively focus on key information and filter out irrelevant information in them. CRAG is plug-and-play and can be seamlessly coupled with various RAG-based approaches. Experiments on four datasets covering short- and long-form generation tasks show that CRAG can significantly improve the performance of RAG-based approaches.

Via

Access Paper or Ask Questions

Leveraging Large Language Models for NLG Evaluation: A Survey

Jan 13, 2024

Zhen Li, Xiaohan Xu, Tao Shen, Can Xu, Jia-Chen Gu, Chongyang Tao

Figure 1 for Leveraging Large Language Models for NLG Evaluation: A Survey

Figure 2 for Leveraging Large Language Models for NLG Evaluation: A Survey

Figure 3 for Leveraging Large Language Models for NLG Evaluation: A Survey

Figure 4 for Leveraging Large Language Models for NLG Evaluation: A Survey

Abstract:In the rapidly evolving domain of Natural Language Generation (NLG) evaluation, introducing Large Language Models (LLMs) has opened new avenues for assessing generated content quality, e.g., coherence, creativity, and context relevance. This survey aims to provide a thorough overview of leveraging LLMs for NLG evaluation, a burgeoning area that lacks a systematic analysis. We propose a coherent taxonomy for organizing existing LLM-based evaluation metrics, offering a structured framework to understand and compare these methods. Our detailed exploration includes critically assessing various LLM-based methodologies, as well as comparing their strengths and limitations in evaluating NLG outputs. By discussing unresolved challenges, including bias, robustness, domain-specificity, and unified evaluation, this survey seeks to offer insights to researchers and advocate for fairer and more advanced NLG evaluation techniques.

* 19pages

Via

Access Paper or Ask Questions

A Comprehensive Study of Knowledge Editing for Large Language Models

Jan 09, 2024

Ningyu Zhang, Yunzhi Yao, Bozhong Tian, Peng Wang, Shumin Deng, Mengru Wang, Zekun Xi, Shengyu Mao, Jintian Zhang, Yuansheng Ni(+12 more)

Figure 1 for A Comprehensive Study of Knowledge Editing for Large Language Models

Figure 2 for A Comprehensive Study of Knowledge Editing for Large Language Models

Figure 3 for A Comprehensive Study of Knowledge Editing for Large Language Models

Figure 4 for A Comprehensive Study of Knowledge Editing for Large Language Models

Abstract:Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication. However, a primary limitation lies in the significant computational demands during training, arising from their extensive parameterization. This challenge is further intensified by the dynamic nature of the world, necessitating frequent updates to LLMs to correct outdated information or integrate new knowledge, thereby ensuring their continued relevance. Note that many applications demand continual model adjustments post-training to address deficiencies or undesirable behaviors. There is an increasing interest in efficient, lightweight methods for on-the-fly model modifications. To this end, recent years have seen a burgeoning in the techniques of knowledge editing for LLMs, which aim to efficiently modify LLMs' behaviors within specific domains while preserving overall performance across various inputs. In this paper, we first define the knowledge editing problem and then provide a comprehensive review of cutting-edge approaches. Drawing inspiration from educational and cognitive research theories, we propose a unified categorization criterion that classifies knowledge editing methods into three groups: resorting to external knowledge, merging knowledge into the model, and editing intrinsic knowledge. Furthermore, we introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches. Additionally, we provide an in-depth analysis of knowledge location, which can give a deeper understanding of the knowledge structures inherent within LLMs. Finally, we discuss several potential applications of knowledge editing, outlining its broad and impactful implications.

* Ongoing work; 52 pages, 282 citations; benchmark is available at https://huggingface.co/datasets/zjunlp/KnowEdit code is available at https://github.com/zjunlp/EasyEdit paper list is available at https://github.com/zjunlp/KnowledgeEditingPapers

Via

Access Paper or Ask Questions

Model Editing Can Hurt General Abilities of Large Language Models

Jan 09, 2024

Jia-Chen Gu, Hao-Xiang Xu, Jun-Yu Ma, Pan Lu, Zhen-Hua Ling, Kai-Wei Chang, Nanyun Peng

Figure 1 for Model Editing Can Hurt General Abilities of Large Language Models

Figure 2 for Model Editing Can Hurt General Abilities of Large Language Models

Figure 3 for Model Editing Can Hurt General Abilities of Large Language Models

Figure 4 for Model Editing Can Hurt General Abilities of Large Language Models

Abstract:Recent advances in large language models (LLMs) have opened up new paradigms for accessing the knowledge stored in their parameters. One critical challenge that has emerged is the presence of hallucinations in LLM outputs due to false or outdated knowledge. Since retraining LLMs with updated information is resource-intensive, there has been a growing interest in model editing. However, many model editing methods, while effective in various scenarios, tend to overemphasize aspects such as efficacy, generalization, and locality in editing performance, often overlooking potential side effects on the general abilities of LLMs. In this paper, we raise concerns that the improvement of model factuality may come at the cost of a significant degradation of these general abilities, which is not conducive to the sustainable development of LLMs. Systematically, we analyze side effects by evaluating four popular editing methods on two LLMs across eight representative task categories. Extensive empirical research reveals that model editing does improve model factuality but at the expense of substantially impairing general abilities. Therefore, we advocate for more research efforts to minimize the loss of general abilities acquired during LLM pre-training and to ultimately preserve them during model editing.

Via

Access Paper or Ask Questions

Is ChatGPT a Good Multi-Party Conversation Solver?

Oct 25, 2023

Chao-Hong Tan, Jia-Chen Gu, Zhen-Hua Ling

Figure 1 for Is ChatGPT a Good Multi-Party Conversation Solver?

Figure 2 for Is ChatGPT a Good Multi-Party Conversation Solver?

Figure 3 for Is ChatGPT a Good Multi-Party Conversation Solver?

Figure 4 for Is ChatGPT a Good Multi-Party Conversation Solver?

Abstract:Large Language Models (LLMs) have emerged as influential instruments within the realm of natural language processing; nevertheless, their capacity to handle multi-party conversations (MPCs) -- a scenario marked by the presence of multiple interlocutors involved in intricate information exchanges -- remains uncharted. In this paper, we delve into the potential of generative LLMs such as ChatGPT and GPT-4 within the context of MPCs. An empirical analysis is conducted to assess the zero-shot learning capabilities of ChatGPT and GPT-4 by subjecting them to evaluation across three MPC datasets that encompass five representative tasks. The findings reveal that ChatGPT's performance on a number of evaluated MPC tasks leaves much to be desired, whilst GPT-4's results portend a promising future. Additionally, we endeavor to bolster performance through the incorporation of MPC structures, encompassing both speaker and addressee architecture. This study provides an exhaustive evaluation and analysis of applying generative LLMs to MPCs, casting a light upon the conception and creation of increasingly effective and robust MPC agents. Concurrently, this work underscores the challenges implicit in the utilization of LLMs for MPCs, such as deciphering graphical information flows and generating stylistically consistent responses.

* Accepted by Findings of EMNLP 2023

Via

Access Paper or Ask Questions