Yong Jiang

Exploring Knowledge Boundaries in Large Language Models for Retrieval Judgment

Nov 09, 2024

Zhen Zhang, Xinyu Wang, Yong Jiang, Zhuo Chen, Feiteng Mu, Mengting Hu, Pengjun Xie, Fei Huang

Abstract: Large Language Models (LLMs) are increasingly recognized for their practical applications. However, these models often encounter challenges in dynamically changing knowledge, as well as in managing unknown static knowledge. Retrieval-Augmented Generation (RAG) tackles this challenge and has shown a significant impact on LLMs. We find that the impact of RAG on the question answering capabilities of LLMs can be categorized into three groups: beneficial, neutral, and harmful. By minimizing retrieval requests that yield neutral or harmful results, we can effectively reduce both time and computational costs, while also improving the overall performance of LLMs. This insight motivates us to differentiate between types of questions using certain metrics as indicators, to decrease the retrieval ratio without compromising performance. In our work, we propose a method that identifies different types of questions from this view by training a Knowledge Boundary Model (KBM). Experiments conducted on 11 English and Chinese datasets illustrate that the KBM effectively delineates the knowledge boundary, significantly decreasing the proportion of retrievals required for optimal end-to-end performance. Specifically, we evaluate the effectiveness of KBM in three complex scenarios: dynamic knowledge, long-tail static knowledge, and multi-hop problems, as well as its functionality as an external LLM plug-in.

Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent

Nov 05, 2024

Yangning Li, Yinghui Li, Xingyu Wang, Yong Jiang, Zhen Zhang, Xinran Zheng, Hui Wang, Hai-Tao Zheng, Philip S. Yu, Fei Huang(+1 more)

Abstract: Multimodal Retrieval Augmented Generation (mRAG) plays an important role in mitigating the "hallucination" issue inherent in multimodal large language models (MLLMs). Although promising, existing heuristic mRAGs typically predefine fixed retrieval processes, which causes two issues: (1) non-adaptive retrieval queries and (2) overloaded retrieval queries. However, these flaws cannot be adequately reflected by current knowledge-seeking visual question answering (VQA) datasets, since most of the required knowledge can be readily obtained with a standard two-step retrieval. To bridge the dataset gap, we first construct the Dyn-VQA dataset, consisting of three types of "dynamic" questions, which require complex knowledge retrieval strategies variable in query, tool, and time: (1) questions with rapidly changing answers, (2) questions requiring multi-modal knowledge, and (3) multi-hop questions. Experiments on Dyn-VQA reveal that existing heuristic mRAGs struggle to provide sufficient and precisely relevant knowledge for dynamic questions due to their rigid retrieval processes. Hence, we further propose the first self-adaptive planning agent for multimodal retrieval, OmniSearch. The underlying idea is to emulate human problem-solving behavior by dynamically decomposing complex multimodal questions into sub-question chains with retrieval actions. Extensive experiments demonstrate the effectiveness of OmniSearch and provide direction for advancing mRAG. The code and dataset will be open-sourced at https://github.com/Alibaba-NLP/OmniSearch.

An Adaptive Framework for Generating Systematic Explanatory Answer in Online Q&A Platforms

Oct 23, 2024

Ziyang Chen, Xiaobin Wang, Yong Jiang, Jinzhi Liao, Pengjun Xie, Fei Huang, Xiang Zhao

Abstract: Question Answering (QA) systems face challenges in handling complex questions that require multi-domain knowledge synthesis. Naive RAG models, although effective in information retrieval, struggle with complex questions that require comprehensive and in-depth answers. We define this new task as explanatory answer generation, which entails challenges such as the requirement for comprehensive information and logical coherence within the generated context. To address these issues, we refer to systematic thinking theory and propose SynthRAG, an innovative framework designed to enhance QA performance. SynthRAG improves on conventional models by employing adaptive outlines for dynamic content structuring, generating systematic information to ensure detailed coverage, and producing customized answers tailored to specific user inquiries. This structured approach guarantees logical coherence and thorough integration of information, yielding responses that are both insightful and methodically organized. Empirical evaluations underscore SynthRAG's effectiveness, demonstrating its superiority in handling complex questions, overcoming the limitations of naive RAG models, and significantly improving answer quality and depth. Furthermore, an online deployment on the Zhihu platform revealed that SynthRAG's answers achieved notable user engagement, with each response averaging 5.73 upvotes and surpassing the performance of 79.8% of human contributors, highlighting the practical relevance and impact of the proposed framework. Our code is available at https://github.com/czy1999/SynthRAG.

* 10 pages, 6 figures

Benchmarking Agentic Workflow Generation

Oct 10, 2024

Shuofei Qiao, Runnan Fang, Zhisong Qiu, Xiaobin Wang, Ningyu Zhang, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen

Abstract: Large Language Models (LLMs), with their exceptional ability to handle a wide range of tasks, have driven significant advancements in tackling reasoning and planning tasks, wherein decomposing complex problems into executable workflows is a crucial step. Existing workflow evaluation frameworks either focus solely on holistic performance or suffer from limitations such as restricted scenario coverage, simplistic workflow structures, and lax evaluation standards. To this end, we introduce WorFBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures. Additionally, we present WorFEval, a systemic evaluation protocol utilizing subsequence and subgraph matching algorithms to accurately quantify the LLM agent's workflow generation capabilities. Through comprehensive evaluations across different types of LLMs, we discover distinct gaps between the sequence planning capabilities and graph planning capabilities of LLM agents, with even GPT-4 exhibiting a gap of around 15%. We also train two open-source models and evaluate their generalization abilities on held-out tasks. Furthermore, we observe that the generated workflows can enhance downstream tasks, enabling them to achieve superior performance with less time during inference. Code and dataset will be available at https://github.com/zjunlp/WorFBench.

* Work in progress
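
The abstract above mentions subsequence matching as one way to score a generated workflow against a reference. As a rough illustration only, the sketch below scores a predicted node chain against a gold chain with a longest-common-subsequence F1. The function names, the use of exact string equality between node labels, and the F1 aggregation are all assumptions made for this example; they are not taken from WorFEval, whose exact procedure is not described in this listing.

```python
from typing import Sequence


def lcs_length(pred: Sequence[str], gold: Sequence[str]) -> int:
    """Length of the longest common subsequence of two node chains."""
    prev = [0] * (len(gold) + 1)
    for p in pred:
        curr = [0]
        for j, g in enumerate(gold, start=1):
            if p == g:  # assumption: nodes match only on exact label equality
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def chain_f1(pred: Sequence[str], gold: Sequence[str]) -> float:
    """Hypothetical chain score: F1 over the matched subsequence length."""
    if not pred or not gold:
        return 0.0
    matched = lcs_length(pred, gold)
    if matched == 0:
        return 0.0
    precision = matched / len(pred)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold_chain = ["search", "compare", "filter", "book"]
    pred_chain = ["search", "filter", "book"]
    print(chain_f1(pred_chain, gold_chain))
```

A subsequence score of this kind rewards nodes that appear in the right order even when steps are missing, which is why a graph-level check would still be needed for workflows with parallel branches.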

Effective Demonstration Annotation for In-Context Learning via Language Model-Based Determinantal Point Process

Aug 04, 2024

Peng Wang, Xiaobin Wang, Chao Lou, Shengyu Mao, Pengjun Xie, Yong Jiang

Abstract: In-context learning (ICL) is a few-shot learning paradigm that involves learning mappings through input-output pairs and appropriately applying them to new instances. Despite the remarkable ICL capabilities demonstrated by Large Language Models (LLMs), existing works are highly dependent on large-scale labeled support sets, which are not always feasible in practical scenarios. To refine this approach, we focus primarily on an innovative selective annotation mechanism, which precedes the standard demonstration retrieval. We introduce the Language Model-based Determinantal Point Process (LM-DPP) that simultaneously considers the uncertainty and diversity of unlabeled instances for optimal selection. Consequently, this yields a subset for annotation that strikes a trade-off between the two factors. We apply LM-DPP to various language models, including GPT-J, LLaMA, and GPT-3. Experimental results on 9 NLU and 2 generation datasets demonstrate that LM-DPP can effectively select canonical examples. Further analysis reveals that LLMs benefit most significantly from subsets that combine low uncertainty with high diversity.

Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation

Jul 26, 2024

Chaoyi Ai, Yong Jiang, Shen Huang, Pengjun Xie, Kewei Tu

Abstract: Named entity recognition (NER) models often struggle with noisy inputs, such as those with spelling mistakes or errors generated by Optical Character Recognition processes, and learning a robust NER model is challenging. Existing robust NER models utilize both noisy text and its corresponding gold text for training, which is infeasible in many real-world applications in which gold text is not available. In this paper, we consider a more realistic setting in which only noisy text and its NER labels are available. We propose to retrieve relevant text of the noisy text from a knowledge corpus and use it to enhance the representation of the original noisy input. We design three retrieval methods: sparse retrieval based on lexicon similarity, dense retrieval based on semantic similarity, and self-retrieval based on task-specific text. After retrieving relevant text, we concatenate the retrieved text with the original noisy text and encode them with a transformer network, utilizing self-attention to enhance the contextual token representations of the noisy text using the retrieved text. We further employ a multi-view training framework that improves robust NER without retrieving text during inference. Experiments show that our retrieval-augmented model achieves significant improvements in various noisy NER settings.

Knowledge Mechanisms in Large Language Models: A Survey and Perspective

Jul 22, 2024

Mengru Wang, Yunzhi Yao, Ziwen Xu, Shuofei Qiao, Shumin Deng, Peng Wang, Xiang Chen, Jia-Chen Gu, Yong Jiang, Pengjun Xie(+3 more)

Abstract: Understanding knowledge mechanisms in Large Language Models (LLMs) is crucial for advancing towards trustworthy AGI. This paper reviews knowledge mechanism analysis from a novel taxonomy including knowledge utilization and evolution. Knowledge utilization delves into the mechanism of memorization, comprehension and application, and creation. Knowledge evolution focuses on the dynamic progression of knowledge within individual and group LLMs. Moreover, we discuss what knowledge LLMs have learned, the reasons for the fragility of parametric knowledge, and the potential dark knowledge (hypothesis) that will be challenging to address. We hope this work can help understand knowledge in LLMs and provide insights for future research.

* Ongoing work (v1); 34 pages, 5 figures

Retrieved In-Context Principles from Previous Mistakes

Jul 08, 2024

Hao Sun, Yong Jiang, Bo Wang, Yingyan Hou, Yan Zhang, Pengjun Xie, Fei Huang

Abstract: In-context learning (ICL) has been instrumental in adapting Large Language Models (LLMs) to downstream tasks using correct input-output examples. Recent advances have attempted to improve model performance through principles derived from mistakes, yet these approaches suffer from a lack of customization and inadequate error coverage. To address these limitations, we propose Retrieved In-Context Principles (RICP), a novel teacher-student framework. In RICP, the teacher model analyzes mistakes from the student model to generate reasons and insights for preventing similar mistakes. These mistakes are clustered based on their underlying reasons for developing task-level principles, enhancing the error coverage of principles. During inference, the most relevant mistakes for each question are retrieved to create question-level principles, improving the customization of the provided guidance. RICP is orthogonal to existing prompting methods and does not require intervention from the teacher model during inference. Experimental results across seven reasoning benchmarks reveal that RICP effectively enhances performance when applied to various prompting strategies.

ProductAgent: Benchmarking Conversational Product Search Agent with Asking Clarification Questions

Jul 01, 2024

Jingheng Ye, Yong Jiang, Xiaobin Wang, Yinghui Li, Yangning Li, Hai-Tao Zheng, Pengjun Xie, Fei Huang

Abstract: This paper introduces the task of product demand clarification within an e-commerce scenario, where the user commences the conversation with ambiguous queries and the task-oriented agent is designed to achieve more accurate and tailored product searching by asking clarification questions. To address this task, we propose ProductAgent, a conversational information seeking agent equipped with the abilities of strategic clarification question generation and dynamic product retrieval. Specifically, we develop the agent with strategies for product feature summarization, query generation, and product retrieval. Furthermore, we propose the benchmark called PROCLARE to evaluate the agent's performance both automatically and qualitatively with the aid of an LLM-driven user simulator. Experiments show that ProductAgent interacts positively with the user and enhances retrieval performance with increasing dialogue turns, where user demands become gradually more explicit and detailed. All source code will be released after the review anonymity period.

* 17 pages, 13 tables, 6 figures. Under review

Adaptive Selection for Homogeneous Tools: An Instantiation in the RAG Scenario

Jun 18, 2024

Feiteng Mu, Yong Jiang, Liwen Zhang, Chu Liu, Wenjie Li, Pengjun Xie, Fei Huang

Abstract: Current research on tool learning primarily focuses on selecting the most effective tool from a wide array of options, often overlooking cost-effectiveness, a crucial factor in human problem-solving. In this paper, we address the selection of homogeneous tools by predicting both their performance and the associated cost required to accomplish a given task. We then assign queries to the optimal tools in a cost-effective manner. Our experimental results demonstrate that our method achieves higher performance at a lower cost compared to strong baseline approaches.
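
The abstract above describes predicting each tool's performance and cost and then routing a query to a cost-effective choice. A minimal sketch of that routing step follows, assuming the predictions already exist as numbers. The `ToolEstimate` class, the tool names, the example values, and the linear "score minus weighted cost" rule are hypothetical choices for illustration; the abstract does not state the assignment rule the authors actually use.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ToolEstimate:
    name: str
    predicted_score: float  # hypothetical predicted answer quality in [0, 1]
    predicted_cost: float   # hypothetical predicted cost in arbitrary units


def select_tool(estimates: Sequence[ToolEstimate], cost_weight: float = 0.5) -> ToolEstimate:
    """Pick the tool with the best predicted score after a linear cost penalty.

    The linear trade-off is an assumption made for this sketch.
    """
    if not estimates:
        raise ValueError("need at least one tool estimate")
    return max(estimates, key=lambda e: e.predicted_score - cost_weight * e.predicted_cost)


if __name__ == "__main__":
    # Made-up predictions for one query and two homogeneous retrieval tools.
    candidates = [
        ToolEstimate("small_retriever", predicted_score=0.70, predicted_cost=0.10),
        ToolEstimate("large_retriever", predicted_score=0.80, predicted_cost=0.50),
    ]
    print(select_tool(candidates, cost_weight=0.5).name)
```

Setting `cost_weight` to zero reduces the rule to picking the tool with the highest predicted score, which makes the weight a simple knob for trading quality against spend.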
