Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jiafeng Guo

On the Robustness of Generative Information Retrieval Models

Dec 25, 2024

Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Changjiang Zhou, Maarten de Rijke, Xueqi Cheng

Figure 1 for On the Robustness of Generative Information Retrieval Models

Figure 2 for On the Robustness of Generative Information Retrieval Models

Figure 3 for On the Robustness of Generative Information Retrieval Models

Figure 4 for On the Robustness of Generative Information Retrieval Models

Abstract:Generative information retrieval methods retrieve documents by directly generating their identifiers. Much effort has been devoted to developing effective generative IR models. Less attention has been paid to the robustness of these models. It is critical to assess the out-of-distribution (OOD) generalization of generative IR models, i.e., how would such models generalize to new distributions? To answer this question, we focus on OOD scenarios from four perspectives in retrieval problems: (i)query variations; (ii)unseen query types; (iii)unseen tasks; and (iv)corpus expansion. Based on this taxonomy, we conduct empirical studies to analyze the OOD robustness of representative generative IR models against dense retrieval models. Our empirical results indicate that the OOD robustness of generative IR models is in need of improvement. By inspecting the OOD robustness of generative IR models we aim to contribute to the development of more reliable IR models. The code is available at \url{https://github.com/Davion-Liu/GR_OOD}.

* Accepted by ECIR 2025. arXiv admin note: substantial text overlap with arXiv:2306.12756

Via

Access Paper or Ask Questions

HEC-GCN: Hypergraph Enhanced Cascading Graph Convolution Network for Multi-Behavior Recommendation

Dec 19, 2024

Yabo Yin, Xiaofei Zhu, Wenshan Wang, Yihao Zhang, Pengfei Wang, Yixing Fan, Jiafeng Guo

Figure 1 for HEC-GCN: Hypergraph Enhanced Cascading Graph Convolution Network for Multi-Behavior Recommendation

Figure 2 for HEC-GCN: Hypergraph Enhanced Cascading Graph Convolution Network for Multi-Behavior Recommendation

Figure 3 for HEC-GCN: Hypergraph Enhanced Cascading Graph Convolution Network for Multi-Behavior Recommendation

Figure 4 for HEC-GCN: Hypergraph Enhanced Cascading Graph Convolution Network for Multi-Behavior Recommendation

Abstract:Multi-behavior recommendation (MBR) has garnered growing attention recently due to its ability to mitigate the sparsity issue by inferring user preferences from various auxiliary behaviors to improve predictions for the target behavior. Although existing research on MBR has yielded impressive results, they still face two major limitations. First, previous methods mainly focus on modeling fine-grained interaction information between users and items under each behavior, which may suffer from sparsity issue. Second, existing models usually concentrate on exploiting dependencies between two consecutive behaviors, leaving intra- and inter-behavior consistency largely unexplored. To the end, we propose a novel approach named Hypergraph Enhanced Cascading Graph Convolution Network for multi-behavior recommendation (HEC-GCN). To be specific, we first explore both fine- and coarse-grained correlations among users or items of each behavior by simultaneously modeling the behavior-specific interaction graph and its corresponding hypergraph in a cascaded manner. Then, we propose a behavior consistency-guided alignment strategy that ensures consistent representations between the interaction graph and its associated hypergraph for each behavior, while also maintaining representation consistency across different behaviors. Extensive experiments and analyses on three public benchmark datasets demonstrate that our proposed approach is consistently superior to previous state-of-the-art methods due to its capability to effectively attenuate the sparsity issue as well as preserve both intra- and inter-behavior consistencies. The code is available at https://github.com/marqu22/HEC-GCN.git.

Via

Access Paper or Ask Questions

AlignXIE: Improving Multilingual Information Extraction by Cross-Lingual Alignment

Nov 07, 2024

Yuxin Zuo, Wenxuan Jiang, Wenxuan Liu, Zixuan Li, Long Bai, Hanbin Wang, Yutao Zeng, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng

Figure 1 for AlignXIE: Improving Multilingual Information Extraction by Cross-Lingual Alignment

Figure 2 for AlignXIE: Improving Multilingual Information Extraction by Cross-Lingual Alignment

Figure 3 for AlignXIE: Improving Multilingual Information Extraction by Cross-Lingual Alignment

Figure 4 for AlignXIE: Improving Multilingual Information Extraction by Cross-Lingual Alignment

Abstract:Empirical evidence suggests that LLMs exhibit spontaneous cross-lingual alignment. Our findings suggest that although LLMs also demonstrate promising cross-lingual alignment in Information Extraction, there remains significant imbalance across languages, revealing an underlying deficiency in the IE alignment. To address this issue, we propose AlignXIE, a powerful code-based LLM that significantly enhances cross-lingual IE alignment through two strategies. Firstly, AlignXIE formulates IE across different languages, especially non-English ones, as code generation tasks, standardizing the representation of various schemas using Python classes to ensure consistency of the same ontology in different languages and align the schema. Secondly, it incorporates an IE cross-lingual alignment phase through a translated instance prediction task proposed in this paper to align the extraction process, utilizing ParallelNER, an IE bilingual parallel dataset with 257,190 samples, generated by our proposed LLM-based automatic pipeline for IE parallel data construction, with manual annotation to ensure quality. Ultimately, we obtain AlignXIE through multilingual IE instruction tuning. Although without training in 9 unseen languages, AlignXIE surpasses ChatGPT by $30.17\%$ and SoTA by $20.03\%$, thereby demonstrating superior cross-lingual IE capabilities. Comprehensive evaluations on 63 IE benchmarks in Chinese and English under various settings, demonstrate that AlignXIE significantly enhances cross-lingual and multilingual IE through boosting the IE alignment.

* Work in progress

Via

Access Paper or Ask Questions

CausalDiff: Causality-Inspired Disentanglement via Diffusion Model for Adversarial Defense

Oct 30, 2024

Mingkun Zhang, Keping Bi, Wei Chen, Quanrun Chen, Jiafeng Guo, Xueqi Cheng

Abstract:Despite ongoing efforts to defend neural classifiers from adversarial attacks, they remain vulnerable, especially to unseen attacks. In contrast, humans are difficult to be cheated by subtle manipulations, since we make judgments only based on essential factors. Inspired by this observation, we attempt to model label generation with essential label-causative factors and incorporate label-non-causative factors to assist data generation. For an adversarial example, we aim to discriminate the perturbations as non-causative factors and make predictions only based on the label-causative factors. Concretely, we propose a casual diffusion model (CausalDiff) that adapts diffusion models for conditional data generation and disentangles the two types of casual factors by learning towards a novel casual information bottleneck objective. Empirically, CausalDiff has significantly outperformed state-of-the-art defense methods on various unseen attacks, achieving an average robustness of 86.39% (+4.01%) on CIFAR-10, 56.25% (+3.13%) on CIFAR-100, and 82.62% (+4.93%) on GTSRB (German Traffic Sign Recognition Benchmark).

* accepted by NeurIPS 2024

Via

Access Paper or Ask Questions

A Claim Decomposition Benchmark for Long-form Answer Verification

Oct 16, 2024

Zhihao Zhang, Yixing Fan, Ruqing Zhang, Jiafeng Guo

Figure 1 for A Claim Decomposition Benchmark for Long-form Answer Verification

Figure 2 for A Claim Decomposition Benchmark for Long-form Answer Verification

Figure 3 for A Claim Decomposition Benchmark for Long-form Answer Verification

Figure 4 for A Claim Decomposition Benchmark for Long-form Answer Verification

Abstract:The advancement of LLMs has significantly boosted the performance of complex long-form question answering tasks. However, one prominent issue of LLMs is the generated "hallucination" responses that are not factual. Consequently, attribution for each claim in responses becomes a common solution to improve the factuality and verifiability. Existing researches mainly focus on how to provide accurate citations for the response, which largely overlook the importance of identifying the claims or statements for each response. To bridge this gap, we introduce a new claim decomposition benchmark, which requires building system that can identify atomic and checkworthy claims for LLM responses. Specifically, we present the Chinese Atomic Claim Decomposition Dataset (CACDD), which builds on the WebCPM dataset with additional expert annotations to ensure high data quality. The CACDD encompasses a collection of 500 human-annotated question-answer pairs, including a total of 4956 atomic claims. We further propose a new pipeline for human annotation and describe the challenges of this task. In addition, we provide experiment results on zero-shot, few-shot and fine-tuned LLMs as baselines. The results show that the claim decomposition is highly challenging and requires further explorations. All code and data are publicly available at \url{https://github.com/FBzzh/CACDD}.

* Accepted by CCIR 2024

Via

Access Paper or Ask Questions

On the Capacity of Citation Generation by Large Language Models

Oct 15, 2024

Haosheng Qian, Yixing Fan, Ruqing Zhang, Jiafeng Guo

Figure 1 for On the Capacity of Citation Generation by Large Language Models

Figure 2 for On the Capacity of Citation Generation by Large Language Models

Figure 3 for On the Capacity of Citation Generation by Large Language Models

Figure 4 for On the Capacity of Citation Generation by Large Language Models

Abstract:Retrieval-augmented generation (RAG) appears as a promising method to alleviate the "hallucination" problem in large language models (LLMs), since it can incorporate external traceable resources for response generation. The essence of RAG in combating the hallucination issue lies in accurately attributing claims in responses to the corresponding retrieved documents. However, most of existing works focus on improving the quality of generated responses from the LLM, while largely overlooked its ability to attribute sources accurately. In this study, we conduct a systematic analysis about the capabilities of LLMs in generating citations within response generation, and further introduce a novel method to enhance their citation generation abilities. Specifically, we evaluate both the correctness and citation quality for seven widely-used LLMs on two benchmark datasets. Meanwhile, we introduce new citation evaluation metrics to eliminate the over-penalization of unnecessary and excessive citations in existing metrics. Furthermore, we propose a Generate-then-Refine method that completes relevant citations and removes irrelevant ones without altering the response text. The results on WebGLM-QA, ASQA and ELI5 datasets show that our method substantially improves the quality of citations in responses generated by LLMs.

* Accepted by CCIR 2024

Via

Access Paper or Ask Questions

Generative Retrieval Meets Multi-Graded Relevance

Sep 27, 2024

Yubao Tang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Wei Chen, Xueqi Cheng

Figure 1 for Generative Retrieval Meets Multi-Graded Relevance

Figure 2 for Generative Retrieval Meets Multi-Graded Relevance

Figure 3 for Generative Retrieval Meets Multi-Graded Relevance

Figure 4 for Generative Retrieval Meets Multi-Graded Relevance

Abstract:Generative retrieval represents a novel approach to information retrieval. It uses an encoder-decoder architecture to directly produce relevant document identifiers (docids) for queries. While this method offers benefits, current approaches are limited to scenarios with binary relevance data, overlooking the potential for documents to have multi-graded relevance. Extending generative retrieval to accommodate multi-graded relevance poses challenges, including the need to reconcile likelihood probabilities for docid pairs and the possibility of multiple relevant documents sharing the same identifier. To address these challenges, we introduce a framework called GRaded Generative Retrieval (GR$^2$). GR$^2$ focuses on two key components: ensuring relevant and distinct identifiers, and implementing multi-graded constrained contrastive training. First, we create identifiers that are both semantically relevant and sufficiently distinct to represent individual documents effectively. This is achieved by jointly optimizing the relevance and distinctness of docids through a combination of docid generation and autoencoder models. Second, we incorporate information about the relationship between relevance grades to guide the training process. We use a constrained contrastive training strategy to bring the representations of queries and the identifiers of their relevant documents closer together, based on their respective relevance grades. Extensive experiments on datasets with both multi-graded and binary relevance demonstrate the effectiveness of GR$^2$.

* Accepted by the NeurIPS 2024 (Spotlight)

Via

Access Paper or Ask Questions

Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework

Sep 24, 2024

Lu Chen, Ruqing Zhang, Jiafeng Guo, Yixing Fan, Xueqi Cheng

Figure 1 for Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework

Figure 2 for Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework

Figure 3 for Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework

Figure 4 for Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework

Abstract:Retrieval-augmented generation (RAG) has emerged as a popular solution to mitigate the hallucination issues of large language models. However, existing studies on RAG seldom address the issue of predictive uncertainty, i.e., how likely it is that a RAG model's prediction is incorrect, resulting in uncontrollable risks in real-world applications. In this work, we emphasize the importance of risk control, ensuring that RAG models proactively refuse to answer questions with low confidence. Our research identifies two critical latent factors affecting RAG's confidence in its predictions: the quality of the retrieved results and the manner in which these results are utilized. To guide RAG models in assessing their own confidence based on these two latent factors, we develop a counterfactual prompting framework that induces the models to alter these factors and analyzes the effect on their answers. We also introduce a benchmarking procedure to collect answers with the option to abstain, facilitating a series of experiments. For evaluation, we introduce several risk-related metrics and the experimental results demonstrate the effectiveness of our approach.

Via

Access Paper or Ask Questions

QUITO-X: An Information Bottleneck-based Compression Algorithm with Cross-Attention

Aug 20, 2024

Yihang Wang, Xu Huang, Bowen Tian, Yixing Fan, Jiafeng Guo

Abstract:Generative LLM have achieved significant success in various industrial tasks and can effectively adapt to vertical domains and downstream tasks through ICL. However, with tasks becoming increasingly complex, the context length required by ICL is also getting longer, and two significant issues arise: (i) The excessively long context leads to high costs and inference delays. (ii) A substantial amount of task-irrelevant information introduced by long contexts exacerbates the "lost in the middle" problem. Recently, compressing prompts by removing tokens according to some metric obtained from some causal language models, such as llama-7b, has emerged as an effective approach to mitigate these issues. However, the metric used by prior method such as self-information or PPL do not fully align with the objective of distinuishing the most important tokens when conditioning on query. In this work, we introduce information bottleneck theory to carefully examine the properties required by the metric. Inspired by this, we use cross-attention in encoder-decoder architecture as a new metric. Our simple method leads to significantly better performance in smaller models with lower latency. We evaluate our method on four datasets: DROP, CoQA, SQuAD, and Quoref. The experimental results show that, while maintaining the same performance, our compression rate can improve by nearly 25% over previous SOTA. Remarkably, in experiments where 25% of the tokens are removed, our model's EM score for answers sometimes even exceeds that of the control group using uncompressed text as context.

Via

Access Paper or Ask Questions

Contextual Dual Learning Algorithm with Listwise Distillation for Unbiased Learning to Rank

Aug 19, 2024

Lulu Yu, Keping Bi, Shiyu Ni, Jiafeng Guo

Figure 1 for Contextual Dual Learning Algorithm with Listwise Distillation for Unbiased Learning to Rank

Figure 2 for Contextual Dual Learning Algorithm with Listwise Distillation for Unbiased Learning to Rank

Figure 3 for Contextual Dual Learning Algorithm with Listwise Distillation for Unbiased Learning to Rank

Figure 4 for Contextual Dual Learning Algorithm with Listwise Distillation for Unbiased Learning to Rank

Abstract:Unbiased Learning to Rank (ULTR) aims to leverage biased implicit user feedback (e.g., click) to optimize an unbiased ranking model. The effectiveness of the existing ULTR methods has primarily been validated on synthetic datasets. However, their performance on real-world click data remains unclear. Recently, Baidu released a large publicly available dataset of their web search logs. Subsequently, the NTCIR-17 ULTRE-2 task released a subset dataset extracted from it. We conduct experiments on commonly used or effective ULTR methods on this subset to determine whether they maintain their effectiveness. In this paper, we propose a Contextual Dual Learning Algorithm with Listwise Distillation (CDLA-LD) to simultaneously address both position bias and contextual bias. We utilize a listwise-input ranking model to obtain reconstructed feature vectors incorporating local contextual information and employ the Dual Learning Algorithm (DLA) method to jointly train this ranking model and a propensity model to address position bias. As this ranking model learns the interaction information within the documents list of the training set, to enhance the ranking model's generalization ability, we additionally train a pointwise-input ranking model to learn the listwise-input ranking model's capability for relevance judgment in a listwise manner. Extensive experiments and analysis confirm the effectiveness of our approach.

* 12 pages, 2 figures

Via

Access Paper or Ask Questions