Kevin Chen-Chuan Chang

Grainger College of Engineering, University of Illinois at Urbana-Champaign

AttentionRetriever: Attention Layers are Secretly Long Document Retrievers

Feb 12, 2026

David Jiahao Fu, Lam Thanh Do, Jiayu Li, Kevin Chen-Chuan Chang

Abstract: Retrieval-augmented generation (RAG) has been widely adopted to help Large Language Models (LLMs) process tasks involving long documents. However, existing retrieval models are not designed for long document retrieval and fail to address several of its key challenges, including context-awareness, causal dependence, and scope of retrieval. In this paper, we propose AttentionRetriever, a novel long document retrieval model that leverages the attention mechanism and entity-based retrieval to build context-aware embeddings for long documents and determine the scope of retrieval. Through extensive experiments, we find that AttentionRetriever outperforms existing retrieval models on long document retrieval datasets by a large margin while remaining as efficient as dense retrieval models.

IRB: Automated Generation of Robust Factuality Benchmarks

Feb 08, 2026

Lam Thanh Do, Bhagyashree Taleka, Hozaifa Ammar Bhutta, Vikram Sharma Mailthody, Kevin Chen-Chuan Chang, Wen-mei Hwu

Abstract: Static benchmarks for RAG systems often suffer from rapid saturation and require significant manual effort to maintain robustness. To address this, we present IRB, a framework for automatically generating benchmarks to evaluate the factuality of RAG systems. IRB employs a structured generation pipeline that uses a factual scaffold and an algorithmic scaffold. We use IRB to construct a benchmark and evaluate frontier LLMs and retrievers. Our results demonstrate that IRB poses a significant challenge for frontier LLMs in the closed-book setting. Furthermore, our evaluation suggests that reasoning LLMs are more reliable, and that improving the retrieval component may yield more cost-effective gains in RAG system correctness than scaling the generator.

* Code: https://github.com/Hozaifa-Bhutta/IRB

On Recommending Category: A Cascading Approach

Dec 17, 2025

Qihao Wang, Pritom Saha Akash, Varvara Kollia, Kevin Chen-Chuan Chang, Biwei Jiang, Vadim Von Brzeski

Abstract: Recommendation plays a key role in e-commerce, enhancing user experience and boosting commercial success. Existing works mainly focus on recommending a set of items, but online e-commerce platforms have recently begun to explore users' potential interests at the category level. Category-level recommendation allows e-commerce platforms to promote user engagement by expanding users' interests to different types of items. In addition, it complements item-level recommendation when the latter becomes extremely challenging for users with little known information and few past interactions. Furthermore, it facilitates item-level recommendation in existing works, where the predicted category, called intention in those works, aids the exploration of item-level preference. However, such category-level preference prediction has mostly been accomplished by applying item-level models, and this simplistic adaptation ignores some key differences between item-level and category-level recommendation. In this paper, we propose a cascading category recommender (CCRec) model with a variational autoencoder (VAE) that encodes item-level information to perform category-level recommendation. Experiments show the advantages of this model over methods designed for item-level recommendation.

Understanding Cross-Domain Adaptation in Low-Resource Topic Modeling

Jun 09, 2025

Pritom Saha Akash, Kevin Chen-Chuan Chang

Abstract: Topic modeling plays a vital role in uncovering hidden semantic structures within text corpora, but existing models struggle in low-resource settings where limited target-domain data leads to unstable and incoherent topic inference. We address this challenge by formally introducing domain adaptation for low-resource topic modeling, where a high-resource source domain informs a low-resource target domain without overwhelming it with irrelevant content. We establish a finite-sample generalization bound showing that effective knowledge transfer depends on robust performance in both domains, minimizing latent-space discrepancy, and preventing overfitting to the data. Guided by these insights, we propose DALTA (Domain-Aligned Latent Topic Adaptation), a new framework that employs a shared encoder for domain-invariant features, specialized decoders for domain-specific nuances, and adversarial alignment to selectively transfer relevant information. Experiments on diverse low-resource datasets demonstrate that DALTA consistently outperforms state-of-the-art methods in terms of topic coherence, stability, and transferability.

ERU-KG: Efficient Reference-aligned Unsupervised Keyphrase Generation

May 30, 2025

Lam Thanh Do, Aaditya Bodke, Pritom Saha Akash, Kevin Chen-Chuan Chang

Abstract: Unsupervised keyphrase prediction has gained growing interest in recent years. However, existing methods typically rely on heuristically defined importance scores, which may lead to inaccurate informativeness estimation. In addition, they give little consideration to time efficiency. To solve these problems, we propose ERU-KG, an unsupervised keyphrase generation (UKG) model that consists of an informativeness module and a phraseness module. The former estimates the relevance of keyphrase candidates, while the latter generates those candidates. The informativeness module innovates by learning to model informativeness through references (e.g., queries, citation contexts, and titles) and at the term level, thereby 1) capturing how the key concepts of documents are perceived in different contexts and 2) estimating the informativeness of phrases more efficiently by aggregating term informativeness, removing the need for explicit modeling of the candidates. ERU-KG demonstrates its effectiveness on keyphrase generation benchmarks by outperforming unsupervised baselines and achieving on average 89% of the performance of a supervised model for top 10 predictions. Additionally, to highlight its practical utility, we evaluate the model on text retrieval tasks and show that keyphrases generated by ERU-KG are effective when employed as query and document expansions. Furthermore, inference speed tests reveal that ERU-KG is the fastest among baselines of similar model sizes. Finally, our proposed model can switch between keyphrase generation and extraction by adjusting hyperparameters, catering to diverse application requirements.

* Accepted to ACL 2025

Writing Like the Best: Exemplar-Based Expository Text Generation

May 24, 2025

Yuxiang Liu, Kevin Chen-Chuan Chang

Abstract: We introduce the Exemplar-Based Expository Text Generation task, which aims to generate an expository text on a new topic using an exemplar on a similar topic. Current methods fall short due to their reliance on extensive exemplar data, difficulty in adapting topic-specific content, and issues with long-text coherence. To address these challenges, we propose the concept of Adaptive Imitation and present a novel Recurrent Plan-then-Adapt (RePA) framework. RePA leverages large language models (LLMs) for effective adaptive imitation through a fine-grained plan-then-adapt process. RePA also enables recurrent segment-by-segment imitation, supported by two memory structures that enhance input clarity and output coherence. We also develop task-specific evaluation metrics (imitativeness, adaptiveness, and adaptive-imitativeness) using LLMs as evaluators. Experimental results across three diverse datasets that we collected demonstrate that RePA surpasses existing baselines in producing factual, consistent, and relevant texts for this task.

* Accepted to ACL 2025. Camera-ready version

RL-based Query Rewriting with Distilled LLM for online E-Commerce Systems

Jan 29, 2025

Duy A. Nguyen, Rishi Kesav Mohan, Van Yang, Pritom Saha Akash, Kevin Chen-Chuan Chang

Abstract: Query rewriting (QR) is a critical technique in e-commerce search, addressing the lexical gap between user queries and product descriptions to enhance search performance. Existing QR approaches typically fall into two categories: discriminative models and generative methods leveraging large language models (LLMs). Discriminative models often struggle with natural language understanding and offer limited flexibility in rewriting, while generative LLMs, despite producing high-quality rewrites, face high inference latency and cost in online settings. These limitations force offline deployment, making them vulnerable to issues like information staleness and semantic drift. To overcome these challenges, we propose a novel hybrid pipeline for QR that balances efficiency and effectiveness. Our approach combines offline knowledge distillation, which creates a lightweight but efficient student model, with online reinforcement learning (RL), which refines query rewriting dynamically using real-time feedback. A key innovation is the use of LLMs as simulated human feedback, enabling scalable reward signals and cost-effective evaluation without manual annotations. Experimental results on the Amazon ESCI dataset demonstrate significant improvements in query relevance, diversity, and adaptability, as well as positive feedback from the LLM simulation. This work contributes to advancing LLM capabilities for domain-specific applications, offering a robust solution for dynamic and complex e-commerce search environments.

Query Optimization for Parametric Knowledge Refinement in Retrieval-Augmented Large Language Models

Nov 13, 2024

Youan Cong, Cheng Wang, Pritom Saha Akash, Kevin Chen-Chuan Chang

Abstract: We introduce the Extract-Refine-Retrieve-Read (ERRR) framework, a novel approach designed to bridge the pre-retrieval information gap in Retrieval-Augmented Generation (RAG) systems through query optimization tailored to meet the specific knowledge requirements of Large Language Models (LLMs). Unlike conventional query optimization techniques used in RAG, the ERRR framework begins by extracting parametric knowledge from LLMs, followed by using a specialized query optimizer for refining these queries. This process ensures the retrieval of only the most pertinent information essential for generating accurate responses. Moreover, to enhance flexibility and reduce computational costs, we propose a trainable scheme for our pipeline that utilizes a smaller, tunable model as the query optimizer, which is refined through knowledge distillation from a larger teacher model. Our evaluations on various question-answering (QA) datasets and with different retrieval systems show that ERRR consistently outperforms existing baselines, proving to be a versatile and cost-effective module for improving the utility and accuracy of RAG systems.

ConTReGen: Context-driven Tree-structured Retrieval for Open-domain Long-form Text Generation

Oct 20, 2024

Kashob Kumar Roy, Pritom Saha Akash, Kevin Chen-Chuan Chang, Lucian Popa

Abstract: Open-domain long-form text generation requires generating coherent, comprehensive responses that address complex queries with both breadth and depth. This task is challenging due to the need to accurately capture diverse facets of input queries. Existing iterative retrieval-augmented generation (RAG) approaches often struggle to delve deeply into each facet of complex queries and integrate knowledge from various sources effectively. This paper introduces ConTReGen, a novel framework that employs a context-driven, tree-structured retrieval approach to enhance the depth and relevance of retrieved content. ConTReGen integrates a hierarchical, top-down in-depth exploration of query facets with a systematic bottom-up synthesis, ensuring comprehensive coverage and coherent integration of multifaceted information. Extensive experiments on multiple datasets, including LFQA and ODSUM, alongside a newly introduced dataset, ODSUM-WikiHow, demonstrate that ConTReGen outperforms existing state-of-the-art RAG models.

* Accepted at EMNLP'24 Findings

Enhancing Short-Text Topic Modeling with LLM-Driven Context Expansion and Prefix-Tuned VAEs

Oct 04, 2024

Pritom Saha Akash, Kevin Chen-Chuan Chang

Abstract: Topic modeling is a powerful technique for uncovering hidden themes within a collection of documents. However, the effectiveness of traditional topic models often relies on sufficient word co-occurrence, which is lacking in short texts. Therefore, existing approaches, whether probabilistic or neural, frequently struggle to extract meaningful patterns from such data, resulting in incoherent topics. To address this challenge, we propose a novel approach that leverages large language models (LLMs) to extend short texts into more detailed sequences before applying topic modeling. To further improve efficiency and address the semantic inconsistency of LLM-generated texts, we propose using prefix tuning to train a smaller language model coupled with a variational autoencoder for short-text topic modeling. Our method significantly improves short-text topic modeling performance, as demonstrated by extensive experiments on real-world datasets with extreme data sparsity, outperforming current state-of-the-art topic models.

* EMNLP Findings 2024. arXiv admin note: substantial text overlap with arXiv:2310.15420
