Paolo Papotti

EURECOM

Parallel Context-of-Experts Decoding for Retrieval Augmented Generation

Jan 13, 2026

Giulio Corallo, Paolo Papotti

Abstract: Retrieval Augmented Generation faces a trade-off: concatenating documents in a long prompt enables multi-document reasoning but creates prefill bottlenecks, while encoding document KV caches separately offers speed but breaks cross-document interaction. We propose Parallel Context-of-Experts Decoding (Pced), a training-free framework that shifts evidence aggregation from the attention mechanism to decoding. Pced treats retrieved documents as isolated "experts", synchronizing their predictions via a novel retrieval-aware contrastive decoding rule that weighs expert logits against the model prior. This approach recovers cross-document reasoning capabilities without constructing a shared attention across documents.
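The abstract does not spell out the decoding rule, so the following is only a minimal sketch of what a retrieval-aware contrastive step could look like, not the authors' formula. The function names, the `alpha` and `beta` parameters, and the way the retrieval score enters as a log bonus are all assumptions made for illustration.

```python
import math

def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def contrastive_expert_step(expert_logits, prior_logits, retrieval_scores,
                            alpha=1.0, beta=1.0):
    """Pick the next token from per-document experts (hypothetical rule).

    expert_logits: one logit vector per retrieved document (model + that doc).
    prior_logits: logit vector from the model with no document in context.
    retrieval_scores: positive relevance score per document (assumed given).
    Returns (token_id, expert_id) of the highest-scoring candidate.
    """
    prior_lp = log_softmax(prior_logits)
    best = None  # (score, token_id, expert_id)
    for k, logits in enumerate(expert_logits):
        lp = log_softmax(logits)
        bonus = beta * math.log(retrieval_scores[k])
        for tok, (e, p) in enumerate(zip(lp, prior_lp)):
            # Reward tokens the document makes more likely than the prior does.
            score = (1 + alpha) * e - alpha * p + bonus
            if best is None or score > best[0]:
                best = (score, tok, k)
    return best[1], best[2]
```

In this sketch each document is scored in isolation, so no attention is computed across documents; the only point where evidence from different documents meets is the final maximum over candidate tokens.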

Combating Misinformation in the Arab World: Challenges & Opportunities

Jun 05, 2025

Azza Abouzied, Firoj Alam, Raian Ali, Paolo Papotti

Abstract: Misinformation and disinformation pose significant risks globally, with the Arab region facing unique vulnerabilities due to geopolitical instabilities, linguistic diversity, and cultural nuances. We explore these challenges through the key facets of combating misinformation: detection, tracking, mitigation, and community engagement. We shed light on how connecting with grass-roots fact-checking organizations, understanding cultural norms, promoting social correction, and creating strong collaborative information networks can create opportunities for a more resilient information ecosystem in the Arab world.

* disinformation, misinformation, factuality, harmfulness, fake news

RelationalFactQA: A Benchmark for Evaluating Tabular Fact Retrieval from Large Language Models

May 27, 2025

Dario Satriani, Enzo Veltri, Donatello Santoro, Paolo Papotti

Abstract: Factuality in Large Language Models (LLMs) is a persistent challenge. Current benchmarks often assess short factual answers, overlooking the critical ability to generate structured, multi-record tabular outputs from parametric knowledge. We demonstrate that this relational fact retrieval is substantially more difficult than isolated point-wise queries, even when individual facts are known to the model, exposing distinct failure modes sensitive to output dimensionality (e.g., number of attributes or records). To systematically evaluate this under-explored capability, we introduce RelationalFactQA, a new benchmark featuring diverse natural language questions (paired with SQL) and gold-standard tabular answers, specifically designed to assess knowledge retrieval in a structured format. RelationalFactQA enables analysis across varying query complexities, output sizes, and data characteristics. Our experiments reveal that even state-of-the-art LLMs struggle significantly, not exceeding 25% factual accuracy in generating relational outputs, with performance notably degrading as output dimensionality increases. These findings underscore critical limitations in current LLMs' ability to synthesize structured factual knowledge and establish RelationalFactQA as a crucial resource for measuring future progress in LLM factuality.

Community Moderation and the New Epistemology of Fact Checking on Social Media

May 26, 2025

Isabelle Augenstein, Michiel Bakker, Tanmoy Chakraborty, David Corney, Emilio Ferrara, Iryna Gurevych, Scott Hale, Eduard Hovy, Heng Ji, Irene Larraz (+6 more)

Abstract: Social media platforms have traditionally relied on internal moderation teams and partnerships with independent fact-checking organizations to identify and flag misleading content. Recently, however, platforms including X (formerly Twitter) and Meta have shifted towards community-driven content moderation by launching their own versions of crowd-sourced fact-checking -- Community Notes. If effectively scaled and governed, such crowd-checking initiatives have the potential to combat misinformation with increased scale and speed as successfully as community-driven efforts once did with spam. Nevertheless, general content moderation, especially for misinformation, is inherently more complex. Public perceptions of truth are often shaped by personal biases, political leanings, and cultural contexts, complicating consensus on what constitutes misleading content. This suggests that community efforts, while valuable, cannot replace the indispensable role of professional fact-checkers. Here we systematically examine the current approaches to misinformation detection across major platforms, explore the emerging role of community-driven moderation, and critically evaluate both the promises and challenges of crowd-checking at scale.

* 1 Figure, 2 tables

Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL

Apr 21, 2025

Simone Papicchio, Simone Rossi, Luca Cagliero, Paolo Papotti

Abstract: Large Language Models (LLMs) have shown impressive capabilities in transforming natural language questions about relational databases into SQL queries. Despite recent improvements, small LLMs struggle to handle questions involving multiple tables and complex SQL patterns under a Zero-Shot Learning (ZSL) setting. Supervised Fine-Tuning (SFT) partially compensates for the knowledge deficits in pretrained models but falls short when dealing with queries involving multi-hop reasoning. To bridge this gap, different LLM training strategies to reinforce reasoning capabilities have been proposed, ranging from leveraging a thinking process within ZSL, to including reasoning traces in SFT, to adopting Reinforcement Learning (RL) strategies. However, the influence of reasoning on Text2SQL performance is still largely unexplored. This paper investigates to what extent LLM reasoning capabilities influence their Text2SQL performance on four benchmark datasets. To this end, it considers the following LLM settings: (1) ZSL, with or without general-purpose reasoning; (2) SFT, with and without task-specific reasoning traces; (3) RL, leveraging execution accuracy as the primary reward function; (4) SFT+RL, i.e., a two-stage approach that combines SFT and RL. The results show that general-purpose reasoning under ZSL proves to be ineffective in tackling complex Text2SQL cases. Small LLMs benefit from SFT with reasoning much more than larger ones, bridging the gap of their (weaker) model pretraining. RL is generally beneficial across all tested models and datasets, particularly when SQL queries involve multi-hop reasoning and multiple tables. Small LLMs with SFT+RL excel on most complex datasets thanks to a strategic balance between generality of the reasoning process and optimization of the execution accuracy. Thanks to RL, the 7B Qwen-Coder-2.5 model performs on par with models of 100+ billion parameters on the Bird dataset.
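To make "execution accuracy as reward" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes a binary reward and an order-insensitive comparison of result rows; the abstract does not specify either choice, and the function names and the toy schema are hypothetical.

```python
import sqlite3

def execution_reward(conn, predicted_sql, gold_sql):
    """Return 1.0 if both queries yield the same multiset of rows, else 0.0."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return 0.0  # Queries that fail to execute earn no reward.
    gold_rows = conn.execute(gold_sql).fetchall()
    # Compare order-insensitively; repr gives a total order across mixed types.
    return float(sorted(predicted_rows, key=repr) == sorted(gold_rows, key=repr))

def demo_database():
    """Build a tiny in-memory database (illustrative schema, not from the paper)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE paper (id INTEGER PRIMARY KEY, author_id INTEGER, year INTEGER);
        INSERT INTO author VALUES (1, 'Ada'), (2, 'Bob');
        INSERT INTO paper VALUES (10, 1, 2024), (11, 1, 2025), (12, 2, 2025);
    """)
    return conn
```

A reward of this kind scores the query by what it returns rather than by how it is written, so a correlated subquery and an equivalent join with `GROUP BY` receive the same score.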

* 15 pages

Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning

Mar 06, 2025

Giulio Corallo, Orion Weller, Fabio Petroni, Paolo Papotti

Abstract: Incorporating external knowledge in large language models (LLMs) enhances their utility across diverse applications, but existing methods have trade-offs. Retrieval-Augmented Generation (RAG) fetches evidence via similarity search, but key information may fall outside the top-ranked results. Long-context models can process multiple documents but are computationally expensive and limited by context window size. Inspired by students condensing study material for open-book exams, we propose task-aware key-value (KV) cache compression, which compresses external knowledge in a zero- or few-shot setup. This enables LLMs to reason efficiently over a compacted representation of all relevant information. Experiments show our approach outperforms both RAG and task-agnostic compression methods. On LongBench v2, it improves accuracy by up to 7 absolute points over RAG with a 30x compression rate, while reducing inference latency from 0.43s to 0.16s. A synthetic dataset highlights that RAG performs well when sparse evidence suffices, whereas task-aware compression is superior for broad knowledge tasks.

Latent Abstractions in Generative Diffusion Models

Oct 04, 2024

Giulio Franzese, Mattia Martini, Giulio Corallo, Paolo Papotti, Pietro Michiardi

Abstract: In this work we study how diffusion-based generative models produce high-dimensional data, such as an image, by implicitly relying on a manifestation of a low-dimensional set of latent abstractions that guide the generative process. We present a novel theoretical framework that extends non-linear filtering (NLF) and offers a unique perspective on SDE-based generative models. The development of our theory relies on a novel formulation of the joint (state and measurement) dynamics, and an information-theoretic measure of the influence of the system state on the measurement process. According to our theory, diffusion models can be cast as a system of SDEs, describing a non-linear filter in which the evolution of unobservable latent abstractions steers the dynamics of an observable measurement process (corresponding to the generative pathways). In addition, we present an empirical study to validate our theory and previous empirical results on the emergence of latent abstractions at different stages of the generative process.

Finch: Prompt-guided Key-Value Cache Compression

Jul 31, 2024

Giulio Corallo, Paolo Papotti

Abstract: Recent large language model applications, such as Retrieval-Augmented Generation and chatbots, have led to an increased need to process longer input contexts. However, this requirement is hampered by inherent limitations. Architecturally, models are constrained by a context window defined during training. Additionally, processing extensive texts requires substantial GPU memory. We propose a novel approach, Finch, to compress the input context by leveraging the model's pre-trained self-attention weights. Given a prompt and a long text, Finch iteratively identifies the most relevant Key (K) and Value (V) pairs over chunks of the text conditioned on the prompt. Only such pairs are stored in the KV cache, which, within the space constrained by the context window, ultimately contains a compressed version of the long text. Our proposal enables models to consume large inputs even with high compression (up to 93x) while preserving semantic integrity without the need for fine-tuning.
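The chunk-by-chunk selection described above can be sketched in plain Python. This is an illustration under stated assumptions, not Finch itself: relevance is taken to be the attention mass that the prompt's query vectors place on each key, a single attention head is used, and the names `select_kv_pairs`, `compress_text`, and `budget` are hypothetical.

```python
import math

def select_kv_pairs(prompt_queries, keys, values, keep):
    """Keep the `keep` key-value pairs that the prompt attends to most."""
    dim = len(keys[0])
    relevance = [0.0] * len(keys)
    for q in prompt_queries:
        # Scaled dot-product attention of one prompt query over the keys.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        for i, w in enumerate(weights):
            relevance[i] += w / total
    top = sorted(range(len(keys)), key=lambda i: relevance[i], reverse=True)[:keep]
    kept = sorted(top)  # Restore the original token order.
    return [keys[i] for i in kept], [values[i] for i in kept]

def compress_text(prompt_queries, chunks, budget):
    """Process chunks in order, re-compressing the running cache to `budget` pairs."""
    cache_k, cache_v = [], []
    for chunk_k, chunk_v in chunks:
        cache_k, cache_v = select_kv_pairs(
            prompt_queries, cache_k + chunk_k, cache_v + chunk_v,
            min(budget, len(cache_k) + len(chunk_k)))
    return cache_k, cache_v
```

Because the cache is re-compressed after every chunk, its size never exceeds `budget` plus one chunk, which is what lets the long text fit within a fixed context window in this sketch.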

Retrieve, Merge, Predict: Augmenting Tables with Data Lakes

Feb 13, 2024

Riccardo Cappuzzo, Gael Varoquaux, Aimee Coelho, Paolo Papotti

Abstract: We present an in-depth analysis of data discovery in data lakes, focusing on table augmentation for given machine learning tasks. We analyze alternative methods used in the three main steps: retrieving joinable tables, merging information, and predicting with the resultant table. As data lakes, the paper uses YADL (Yet Another Data Lake) -- a novel dataset we developed as a tool for benchmarking this data discovery task -- and Open Data US, a well-referenced real data lake. Through systematic exploration on both lakes, our study outlines the importance of accurately retrieving join candidates and the efficiency of simple merging methods. We report new insights on the benefits of existing solutions and on their limitations, aiming at guiding future research in this space.
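As a rough illustration of the first two steps, the sketch below ranks a candidate table by key containment and then merges it with a left join. The paper compares several alternative methods; containment and a first-match left join are chosen here only as simple examples, and the function names and toy tables are hypothetical.

```python
def containment(base_keys, candidate_keys):
    """Fraction of distinct base-table key values that also appear in the candidate."""
    base = set(base_keys)
    return len(base & set(candidate_keys)) / len(base) if base else 0.0

def left_join(base_rows, candidate_rows, key):
    """Augment each base row with the first matching candidate row, if any."""
    lookup = {}
    for row in candidate_rows:
        lookup.setdefault(row[key], row)  # Keep only the first match per key.
    extra = sorted({c for row in candidate_rows for c in row} - {key})
    merged = []
    for row in base_rows:
        match = lookup.get(row[key], {})
        # Unmatched base rows get None for every added column.
        merged.append({**row, **{c: match.get(c) for c in extra}})
    return merged
```

The merged rows keep every base record, so the table passed to the prediction step has the same number of rows as the original and only gains columns.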

* 12 pages + references, 11 figures. Under submission at VLDB2024 (EA&B track)

Variable Selection in Maximum Mean Discrepancy for Interpretable Distribution Comparison

Nov 02, 2023

Kensuke Mitsuzawa, Motonobu Kanagawa, Stefano Bortoli, Margherita Grossi, Paolo Papotti

Abstract: Two-sample testing decides whether two datasets are generated from the same distribution. This paper studies variable selection for two-sample testing, the task being to identify the variables (or dimensions) responsible for the discrepancies between the two distributions. This task is relevant to many problems of pattern analysis and machine learning, such as dataset shift adaptation, causal inference and model validation. Our approach builds on a two-sample test that uses the Maximum Mean Discrepancy (MMD). We optimise the Automatic Relevance Detection (ARD) weights defined for individual variables to maximise the power of the MMD-based test. For this optimisation, we introduce sparse regularisation and propose two methods for dealing with the issue of selecting an appropriate regularisation parameter. One method determines the regularisation parameter in a data-driven way, and the other aggregates the results of different regularisation parameters. We confirm the validity of the proposed methods by systematic comparisons with baseline methods, and demonstrate their usefulness in exploratory analysis of high-dimensional traffic simulation data. Preliminary theoretical analyses are also provided, including a rigorous definition of variable selection for two-sample testing.
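To show how per-variable weights enter the MMD, the sketch below evaluates a biased estimate of the squared MMD for fixed weights. The Gaussian kernel form with one weight per variable is an assumption about the parameterisation, and the optimisation of the weights with sparse regularisation described in the abstract is not implemented here.

```python
import math

def ard_kernel(x, y, weights):
    """Gaussian kernel with one non-negative relevance weight per variable."""
    return math.exp(-sum((w * (a - b)) ** 2 for w, a, b in zip(weights, x, y)))

def mmd_squared(xs, ys, weights):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    def mean_kernel(left, right):
        return sum(ard_kernel(a, b, weights) for a in left for b in right) / (
            len(left) * len(right))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)
```

Setting a weight to zero removes that variable from the kernel, so a variable whose weight can be driven to zero without lowering the discrepancy is one that does not help distinguish the two samples.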
