Dongfang Li

Medico: Towards Hallucination Detection and Correction with Multi-source Evidence Fusion

Oct 14, 2024

Xinping Zhao, Jindi Yu, Zhenyu Liu, Jifang Wang, Dongfang Li, Yibin Chen, Baotian Hu, Min Zhang

Abstract:As we all know, hallucinations prevail in Large Language Models (LLMs), where the generated content is coherent but factually incorrect, which inflicts a heavy blow on the widespread application of LLMs. Previous studies have shown that LLMs could confidently state non-existent facts rather than answering "I don't know". Therefore, it is necessary to resort to external knowledge to detect and correct the hallucinated content. Since manual detection and correction of factual errors is labor-intensive, an automatic end-to-end hallucination-checking approach is clearly needed. To this end, we present Medico, a Multi-source evidence fusion enhanced hallucination detection and correction framework. It fuses diverse evidence from multiple sources, detects whether the generated content contains factual errors, provides the rationale behind the judgment, and iteratively revises the hallucinated content. Experimental results on evidence retrieval (0.964 HR@5, 0.908 MRR@5), hallucination detection (0.927-0.951 F1), and hallucination correction (0.973-0.979 approval rate) demonstrate the potential of Medico. A video demo of Medico can be found at https://youtu.be/RtsO6CSesBI.

* 12 pages, 3 figures, 6 tables. Accepted by EMNLP 2024's demo track
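
The retrieval figures quoted in the abstract above are HR@5 and MRR@5. As a rough illustration of what these metrics measure, here is a minimal sketch using their common textbook definitions; the function names and the data layout (one ranked list and one set of relevant items per query) are assumptions made for this example and may not match the paper's evaluation protocol.

```python
from typing import Hashable, Sequence, Set


def hit_rate_at_k(ranked: Sequence[Sequence[Hashable]],
                  relevant: Sequence[Set[Hashable]],
                  k: int = 5) -> float:
    """Fraction of queries with at least one relevant item in the top k."""
    hits = sum(
        1 for results, rel in zip(ranked, relevant)
        if any(item in rel for item in results[:k])
    )
    return hits / len(ranked)


def mrr_at_k(ranked: Sequence[Sequence[Hashable]],
             relevant: Sequence[Set[Hashable]],
             k: int = 5) -> float:
    """Mean reciprocal rank of the first relevant item in the top k.

    A query with no relevant item in its top k contributes 0.
    """
    total = 0.0
    for results, rel in zip(ranked, relevant):
        for position, item in enumerate(results[:k], start=1):
            if item in rel:
                total += 1.0 / position
                break
    return total / len(ranked)
```

Under these definitions HR@k only asks whether a relevant item appears in the top k, while MRR@k also rewards placing it near the top, so MRR@k can never exceed HR@k.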

FunnelRAG: A Coarse-to-Fine Progressive Retrieval Paradigm for RAG

Oct 14, 2024

Xinping Zhao, Yan Zhong, Zetian Sun, Xinshuo Hu, Zhenyu Liu, Dongfang Li, Baotian Hu, Min Zhang

Abstract:Retrieval-Augmented Generation (RAG) prevails in Large Language Models. It mainly consists of retrieval and generation. The retrieval modules (a.k.a. retrievers) aim to find useful information used to facilitate generation modules (a.k.a. generators). As such, generators' performance largely depends on the effectiveness and efficiency of retrievers. However, the retrieval paradigm that we design and use remains flat, which treats the retrieval procedures as a one-off deal with constant granularity. Despite effectiveness, we argue that they suffer from two limitations: (1) flat retrieval exerts a significant burden on one retriever; (2) constant granularity limits the ceiling of retrieval performance. In this work, we propose a progressive retrieval paradigm with coarse-to-fine granularity for RAG, termed FunnelRAG, so as to balance effectiveness and efficiency. Specifically, FunnelRAG establishes a progressive retrieval pipeline by collaborating coarse-to-fine granularity, large-to-small quantity, and low-to-high capacity, which can relieve the burden on one retriever and also promote the ceiling of retrieval performance. Extensive experiments manifest that FunnelRAG achieves comparable retrieval performance while the time overhead is reduced by nearly 40 percent.

* 18 pages, 6 figures, 13 tables
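
To make the coarse-to-fine idea in the abstract above concrete, here is a toy two-stage sketch. It is not the FunnelRAG implementation: the cluster layout, the lexical scorers, and names such as `funnel_retrieve`, `keep_clusters`, and `keep_passages` are all assumptions made for illustration. A cheap scorer first ranks many coarse units (clusters of passages), then a slightly more discriminative scorer ranks the few passages that survive.

```python
from typing import Dict, List


def tokens(text: str) -> set:
    """Lowercased whitespace tokens of a text."""
    return set(text.lower().split())


def overlap(query: str, text: str) -> int:
    """Cheap coarse score: number of tokens shared with the query."""
    return len(tokens(query) & tokens(text))


def jaccard(query: str, text: str) -> float:
    """Finer score: shared tokens normalized by the size of the union."""
    q, t = tokens(query), tokens(text)
    return len(q & t) / len(q | t) if q | t else 0.0


def funnel_retrieve(query: str,
                    clusters: Dict[str, List[str]],
                    keep_clusters: int = 2,
                    keep_passages: int = 2) -> List[str]:
    # Stage 1: coarse granularity, large quantity, cheap scorer.
    ranked_clusters = sorted(
        clusters,
        key=lambda name: overlap(query, " ".join(clusters[name])),
        reverse=True,
    )[:keep_clusters]
    # Stage 2: fine granularity, small quantity. A real system would
    # use a stronger and costlier model here; Jaccard is a stand-in.
    candidates = [p for name in ranked_clusters for p in clusters[name]]
    return sorted(candidates, key=lambda p: jaccard(query, p),
                  reverse=True)[:keep_passages]
```

The design point is that the expensive scorer never sees passages from clusters discarded in the first stage, which is where the time saving in a progressive pipeline would come from.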

In-Context Learning State Vector with Inner and Momentum Optimization

Apr 17, 2024

Dongfang Li, Zhenyu Liu, Xinshuo Hu, Zetian Sun, Baotian Hu, Min Zhang

Abstract:Large Language Models (LLMs) have exhibited an impressive ability to perform In-Context Learning (ICL) from only a few examples. Recent works have indicated that the functions learned by ICL can be represented through compressed vectors derived from the transformer. However, the working mechanisms and optimization of these vectors are yet to be thoroughly explored. In this paper, we address this gap by presenting a comprehensive analysis of these compressed vectors, drawing parallels to the parameters trained with gradient descent, and introduce the concept of state vector. Inspired by the works on model soup and momentum-based gradient descent, we propose inner and momentum optimization methods that are applied to refine the state vector progressively as test-time adaptation. Moreover, we simulate state vector aggregation in the multiple example setting, where demonstrations comprising numerous examples are usually too lengthy for regular ICL, and further propose a divide-and-conquer aggregation method to address this challenge. We conduct extensive experiments using Llama-2 and GPT-J in both zero-shot setting and few-shot setting. The experimental results show that our optimization method effectively enhances the state vector and achieves the state-of-the-art performance on diverse tasks. Code is available at https://github.com/HITsz-TMG/ICL-State-Vector

* 17 pages, 7 figures, 5 tables
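
The abstract above cites model soup and momentum-based gradient descent as inspirations for refining the state vector. The sketch below shows only those two generic ideas applied to a sequence of vectors; it is an assumption-laden illustration, not the paper's formulation (see the linked repository for that). The names `average_vectors` and `momentum_refine`, and the choice to treat successive differences as pseudo-gradients, are hypothetical.

```python
from typing import List

Vector = List[float]


def average_vectors(vectors: List[Vector]) -> Vector:
    """Model-soup-style uniform average of several vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def momentum_refine(vectors: List[Vector], beta: float = 0.9) -> Vector:
    """Extrapolate the last vector along a momentum-smoothed direction.

    Assumption for this sketch: the difference between consecutive
    vectors plays the role of a gradient step.
    """
    velocity = [0.0] * len(vectors[0])
    for previous, current in zip(vectors, vectors[1:]):
        velocity = [
            beta * v + (c - p)
            for v, c, p in zip(velocity, current, previous)
        ]
    return [x + v for x, v in zip(vectors[-1], velocity)]
```

Averaging smooths out noise between individual vectors, whereas the momentum variant pushes further in the direction the sequence has been moving.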

Improving Attributed Text Generation of Large Language Models via Preference Learning

Mar 27, 2024

Dongfang Li, Zetian Sun, Baotian Hu, Zhenyu Liu, Xinshuo Hu, Xuebo Liu, Min Zhang

Abstract:Large language models have been widely adopted in natural language processing, yet they face the challenge of generating unreliable content. Recent works aim to reduce misinformation and hallucinations by resorting to attribution as a means to provide evidence (i.e., citations). However, current attribution methods usually focus on the retrieval stage and automatic evaluation that neglect mirroring the citation mechanisms in human scholarly writing to bolster credibility. In this paper, we address these challenges by modelling the attribution task as preference learning and introducing an Automatic Preference Optimization (APO) framework. First, we create a curated collection for post-training with 6,330 examples by collecting and filtering from existing datasets. Second, considering the high cost of labelling preference data, we further propose an automatic method to synthesize attribution preference data resulting in 95,263 pairs. Moreover, inspired by the human citation process, we further propose a progressive preference optimization method by leveraging fine-grained information. Extensive experiments on three datasets (i.e., ASQA, StrategyQA, and ELI5) demonstrate that APO achieves state-of-the-art citation F1 with higher answer quality.

* 23 pages, 15 tables, 2 figures

SelectIT: Selective Instruction Tuning for Large Language Models via Uncertainty-Aware Self-Reflection

Feb 26, 2024

Liangxin Liu, Xuebo Liu, Derek F. Wong, Dongfang Li, Ziyi Wang, Baotian Hu, Min Zhang

Abstract:Instruction tuning (IT) is crucial to tailoring large language models (LLMs) towards human-centric interactions. Recent advancements have shown that the careful selection of a small, high-quality subset of IT data can significantly enhance the performance of LLMs. Despite this, common approaches often rely on additional models or data sets, which increases costs and limits widespread adoption. In this work, we propose a novel approach, termed SelectIT, that capitalizes on the foundational capabilities of the LLM itself. Specifically, we exploit the intrinsic uncertainty present in LLMs to more effectively select high-quality IT data, without the need for extra resources. Furthermore, we introduce a novel IT dataset, the Selective Alpaca, created by applying SelectIT to the Alpaca-GPT4 dataset. Empirical results demonstrate that IT using Selective Alpaca leads to substantial model ability enhancement. The robustness of SelectIT has also been corroborated in various foundation models and domain-specific tasks. Our findings suggest that longer and more computationally intensive IT data may serve as superior sources of IT, offering valuable insights for future research in this area. Data, code, and scripts are freely available at https://github.com/Blue-Raincoat/SelectIT.

Does the Generator Mind its Contexts? An Analysis of Generative Model Faithfulness under Context Transfer

Feb 22, 2024

Xinshuo Hu, Baotian Hu, Dongfang Li, Xiaoguang Li, Lifeng Shang

Abstract:The present study introduces the knowledge-augmented generator, which is specifically designed to produce information that remains grounded in contextual knowledge, regardless of alterations in the context. Previous research has predominantly focused on examining hallucinations stemming from static input, such as in the domains of summarization or machine translation. However, our investigation delves into the faithfulness of generative question answering in the presence of dynamic knowledge. Our objective is to explore the existence of hallucinations arising from parametric memory when contextual knowledge undergoes changes, while also analyzing the underlying causes for their occurrence. In order to efficiently address this issue, we propose a straightforward yet effective measure for detecting such hallucinations. Intriguingly, our investigation uncovers that all models exhibit a tendency to generate previous answers as hallucinations. To gain deeper insights into the underlying causes of this phenomenon, we conduct a series of experiments that verify the critical role played by context in hallucination, both during training and testing, from various perspectives.

* LREC-Coling 2024

Towards Faithful Explanations for Text Classification with Robustness Improvement and Explanation Guided Training

Dec 29, 2023

Dongfang Li, Baotian Hu, Qingcai Chen, Shan He

Abstract:Feature attribution methods highlight the important input tokens as explanations to model predictions, which have been widely applied to deep neural networks towards trustworthy AI. However, recent works show that explanations provided by these methods face challenges of being faithful and robust. In this paper, we propose a method with Robustness improvement and Explanation Guided training towards more faithful EXplanations (REGEX) for text classification. First, we improve model robustness by input gradient regularization technique and virtual adversarial training. Secondly, we use salient ranking to mask noisy tokens and maximize the similarity between model attention and feature attribution, which can be seen as a self-training procedure without importing other external information. We conduct extensive experiments on six datasets with five attribution methods, and also evaluate the faithfulness in the out-of-domain setting. The results show that REGEX improves fidelity metrics of explanations in all settings and further achieves consistent gains based on two randomization tests. Moreover, we show that using highlight explanations produced by REGEX to train select-then-predict models results in comparable task performance to the end-to-end method.

Temporal Knowledge Question Answering via Abstract Reasoning Induction

Nov 15, 2023

Ziyang Chen, Dongfang Li, Xiang Zhao, Baotian Hu, Min Zhang

Abstract:In this paper, we tackle the significant challenge of temporal knowledge reasoning in Large Language Models (LLMs), an area where such models frequently encounter difficulties. These difficulties often result in the generation of misleading or incorrect information, primarily due to their limited capacity to process evolving factual knowledge and complex temporal logic. In response, we propose a novel, constructivism-based approach that advocates for a paradigm shift in LLM learning towards an active, ongoing process of knowledge synthesis and customization. At the heart of our proposal is the Abstract Reasoning Induction (ARI) framework, which divides temporal reasoning into two distinct phases: Knowledge-agnostic and Knowledge-based. This division aims to reduce instances of hallucinations and improve LLMs' capacity for integrating abstract methodologies derived from historical data. Our approach achieves remarkable improvements, with relative gains of 29.7% and 9.27% on two temporal QA datasets, underscoring its efficacy in advancing temporal reasoning in LLMs. The code will be released at https://github.com/czy1999/ARI.

* 17 pages, 10 figures

Towards Reasoning in Large Language Models via Multi-Agent Peer Review Collaboration

Nov 14, 2023

Zhenran Xu, Senbao Shi, Baotian Hu, Jindi Yu, Dongfang Li, Min Zhang, Yuxiang Wu

Abstract:Large Language Models (LLMs) have shown remarkable capabilities in general natural language processing tasks but often fall short in complex reasoning tasks. Recent studies have explored human-like problem-solving strategies, such as self-correct, to push further the boundary of single-model reasoning ability. In this work, we let a single model "step outside the box" by engaging multiple models to correct each other. We introduce a multi-agent collaboration strategy that emulates the academic peer review process. Each agent independently constructs its own solution, provides reviews on the solutions of others, and assigns confidence levels to its reviews. Upon receiving peer reviews, agents revise their initial solutions. Extensive experiments on three different types of reasoning tasks show that our collaboration approach delivers superior accuracy across all ten datasets compared to existing methods. Further study demonstrates the effectiveness of integrating confidence in the reviews for math reasoning, and suggests a promising direction for human-mimicking multi-agent collaboration process.

* 9 pages, 3 figures, 8 tables. Work in progress

A Survey of Large Language Models Attribution

Nov 07, 2023

Dongfang Li, Zetian Sun, Xinshuo Hu, Zhenyu Liu, Ziyang Chen, Baotian Hu, Aiguo Wu, Min Zhang

Abstract:Open-domain generative systems have gained significant attention in the field of conversational AI (e.g., generative search engines). This paper presents a comprehensive review of the attribution mechanisms employed by these systems, particularly large language models. Although attribution or citation improves factuality and verifiability, issues like ambiguous knowledge reservoirs, inherent biases, and the drawbacks of excessive attribution can hinder the effectiveness of these systems. The aim of this survey is to provide valuable insights for researchers, aiding in the refinement of attribution methodologies to enhance the reliability and veracity of responses generated by open-domain generative systems. We believe that this field is still in its early stages; hence, we maintain a repository to keep track of ongoing studies at https://github.com/HITsz-TMG/awesome-llm-attributions.
