Miguel Couceiro

INESC-ID

EntmaxKV: Support-Aware Decoding for Entmax Attention

May 20, 2026

Gonçalo Duarte, Miguel Couceiro, Marcos V. Treviso

Abstract:Long-context decoding is increasingly limited by KV-cache memory traffic since each generated token attends over a cache whose size grows linearly with context length. Existing sparse decoding methods reduce this cost by selecting subsets of tokens or pages, but are designed for softmax attention, whose dense tails make any truncation discard nonzero probability mass. In contrast, $α$-entmax produces exact zeros, turning sparse decoding from dense-tail approximation into support recovery: if the selected candidates contain the entmax support, sparse decoding remains exact. While recent entmax kernels enable efficient training, they do not address the autoregressive decoding bottleneck, where dense inference still streams the full KV cache before sparsity is known. In this work, we introduce EntmaxKV, an entmax-native sparse decoding framework that exploits sparsity before KV pages are loaded. EntmaxKV combines query-aware page scoring, support-aware candidate selection, and sparse entmax attention. We analyze truncation error through the dropped probability mass $δ$, showing that output error is controlled by $δ$ and vanishes when the entmax support is recovered. We further introduce a Gaussian-aware entmax selector that estimates the entmax threshold from lightweight page statistics, adapting the selected budget to the score distribution. Empirically, EntmaxKV drops less probability mass, retains more support tokens, and achieves lower output error than softmax-based sparse decoding at matched KV budgets. On long-context and language modeling benchmarks, it closely matches full-cache entmax while using a small fraction of the KV cache, achieving up to $3.36\times$ (softmax) and $5.43\times$ (entmax) speedup over full attention baselines at 1M context length. Code available at: https://github.com/deep-spin/entmaxkv.
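The support-recovery argument in the abstract can be illustrated with a small sketch. It uses sparsemax, which is the $α = 2$ special case of $α$-entmax, purely because it has a simple closed form; the scores, the candidate set, and the helper names (`sparsemax`, `softmax`, `dropped_mass`) are illustrative assumptions and are not taken from the EntmaxKV code.

```python
import math


def sparsemax(scores):
    """Sparsemax (alpha-entmax with alpha = 2): projection onto the simplex."""
    ordered = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, z_k in enumerate(ordered, start=1):
        cumsum += z_k
        # The k-th largest score is in the support while this holds.
        if 1.0 + k * z_k > cumsum:
            tau = (cumsum - 1.0) / k
    return [max(s - tau, 0.0) for s in scores]


def softmax(scores):
    """Dense softmax, shown only for contrast."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dropped_mass(probs, candidates):
    """Probability mass (delta) assigned to tokens outside the candidate set."""
    keep = set(candidates)
    return sum(p for i, p in enumerate(probs) if i not in keep)


# Hypothetical attention scores for one query over five cached tokens.
scores = [2.0, 1.5, 0.1, -1.0, -2.0]
candidates = [0, 1, 2]  # a selected subset that happens to contain the support

full_sparse = sparsemax(scores)
truncated_sparse = sparsemax([scores[i] for i in candidates])

delta_sparse = dropped_mass(full_sparse, candidates)       # exactly zero here
delta_soft = dropped_mass(softmax(scores), candidates)     # strictly positive
```

Because the candidate set contains every token with nonzero sparsemax weight, attention recomputed on the candidates alone matches the full computation, whereas softmax always leaves some mass on the discarded tokens.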

Which Are the Low-Resource Languages of the Semantic Web?

May 07, 2026

Ndeye-Emilie Mbengue, Pierre Monnin, Miguel Couceiro, Fabien Gandon

Abstract:Emerging digital technologies are exacerbating the existing divide in Open Access Data (OAD) between high- and low-resource languages, excluding many communities from the global digital transformation. Multilingual Linked Open Data Knowledge Graphs (LOD KGs) could contribute to mitigating this divide through cross-lingual transfer; however, no clear quantitative definition of low-resource languages has yet been established in the context of LOD KGs. In this poster, we present a methodology to analyze the distribution of languages across LOD KGs and propose a preliminary multi-level categorization based on DBpedia, BabelNet, and Wikidata. This categorization is used to provide a formal definition of low-, medium-, and high-resource languages that could later be leveraged to select cross-lingual transfer candidates.

* ESWC 2026 - 23rd European Semantic Web Conference, May 2026, Dubrovnik, Croatia

FrameNet Semantic Role Classification by Analogy

Mar 20, 2026

Van-Duy Ngo, Stergos Afantenos, Emiliano Lorini, Miguel Couceiro

Abstract:In this paper, we adopt a relational view of analogies applied to Semantic Role Classification in FrameNet. We define analogies as formal relations over the Cartesian product of frame-evoking lexical units (LUs) and frame element (FEs) pairs, which we use to construct a new dataset. Each element of this binary relation is labelled as a valid analogical instance if the frame elements share the same semantic role, or as invalid otherwise. This formulation allows us to transform Semantic Role Classification into binary classification and train a lightweight Artificial Neural Network (ANN) that exhibits rapid convergence with minimal parameters. Unconventionally, no Semantic Role information is introduced to the neural network during training. We recover semantic roles during inference by computing probability distributions over candidates of all semantic roles within a given frame through random sampling and analogical transfer. This approach allows us to surpass previous state-of-the-art results while maintaining computational efficiency and frugality.

* Paper to be presented at LREC 2026

Generalizing Analogical Inference from Boolean to Continuous Domains

Nov 13, 2025

Francisco Cunha, Yves Lepage, Zied Bouraoui, Miguel Couceiro

Abstract:Analogical reasoning is a powerful inductive mechanism, widely used in human cognition and increasingly applied in artificial intelligence. Formal frameworks for analogical inference have been developed for Boolean domains, where inference is provably sound for affine functions and approximately correct for functions close to affine. These results have informed the design of analogy-based classifiers. However, they do not extend to regression tasks or continuous domains. In this paper, we revisit analogical inference from a foundational perspective. We first present a counterexample showing that existing generalization bounds fail even in the Boolean setting. We then introduce a unified framework for analogical reasoning in real-valued domains based on parameterized analogies defined via generalized means. This model subsumes both Boolean classification and regression, and supports analogical inference over continuous functions. We characterize the class of analogy-preserving functions in this setting and derive both worst-case and average-case error bounds under smoothness assumptions. Our results offer a general theory of analogical inference across discrete and continuous domains.

* 11 pages, to appear in AAAI 2026, extended version

Comparing representations of long clinical texts for the task of patient note-identification

Mar 31, 2025

Safa Alsaidi, Marc Vincent, Olivia Boyer, Nicolas Garcelon, Miguel Couceiro, Adrien Coulet

Abstract:In this paper, we address the challenge of patient-note identification, which involves accurately matching an anonymized clinical note to its corresponding patient, represented by a set of related notes. This task has broad applications, including duplicate record detection and patient similarity analysis, which require robust patient-level representations. We explore various embedding methods, including Hierarchical Attention Networks (HAN), three-level Hierarchical Transformer Networks (HTN), LongFormer, and advanced BERT-based models, focusing on their ability to process medium-to-long clinical texts effectively. Additionally, we evaluate different pooling strategies (mean, max, and mean_max) for aggregating word-level embeddings into patient-level representations, and we examine the impact of sliding windows on model performance. Our results indicate that BERT-based embeddings outperform traditional and hierarchical models, particularly in processing lengthy clinical notes and capturing nuanced patient representations. Among the pooling strategies, mean_max pooling consistently yields the best results, highlighting its ability to capture critical features from clinical notes. Furthermore, the reproduction of our results on both the MIMIC dataset and the Necker hospital data warehouse illustrates the generalizability of these approaches to real-world applications, emphasizing the importance of both embedding methods and aggregation strategies in optimizing patient-note identification and enhancing patient-level modeling.
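The three pooling strategies named in the abstract can be sketched as follows. The abstract does not define mean_max, so this sketch assumes it concatenates the mean-pooled and max-pooled vectors; the function names and the toy embeddings are hypothetical.

```python
def mean_pool(vectors):
    """Average each embedding dimension across all word vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def max_pool(vectors):
    """Take the largest value of each embedding dimension."""
    return [max(column) for column in zip(*vectors)]


def mean_max_pool(vectors):
    """Assumed definition: concatenate the mean-pooled and max-pooled vectors."""
    return mean_pool(vectors) + max_pool(vectors)


# Two toy 2-dimensional word embeddings standing in for one clinical note.
word_embeddings = [[1.0, 4.0], [3.0, 0.0]]

pooled_mean = mean_pool(word_embeddings)          # [2.0, 2.0]
pooled_max = max_pool(word_embeddings)            # [3.0, 4.0]
pooled_mean_max = mean_max_pool(word_embeddings)  # [2.0, 2.0, 3.0, 4.0]
```

Under this assumption the mean_max representation is twice as wide as the input embeddings, which is one plausible reason it can retain both average and peak features of a note.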

Unveiling Biases while Embracing Sustainability: Assessing the Dual Challenges of Automatic Speech Recognition Systems

Mar 02, 2025

Ajinkya Kulkarni, Atharva Kulkarni, Miguel Couceiro, Isabel Trancoso

Abstract:In this paper, we present a bias- and sustainability-focused investigation of Automatic Speech Recognition (ASR) systems, namely Whisper and Massively Multilingual Speech (MMS), which have achieved state-of-the-art (SOTA) performance. Despite their improved performance in controlled settings, there remains a critical gap in understanding their efficacy and equity in real-world scenarios. We analyze ASR biases w.r.t. gender, accent, and age group, as well as their effect on downstream tasks. In addition, we examine the environmental impact of ASR systems, scrutinizing the effect of using large acoustic models on carbon emissions and energy consumption. We also provide insights into our empirical analyses, offering a valuable contribution to the claims surrounding bias and sustainability in ASR systems.

* Interspeech 2024

Do Large Language Models with Reasoning and Acting Meet the Needs of Task-Oriented Dialogue?

Dec 02, 2024

Michelle Elizabeth, Morgan Veyret, Miguel Couceiro, Ondrej Dusek, Lina M. Rojas-Barahona

Abstract:Large language models (LLMs) have gained immense popularity due to their impressive capabilities in unstructured conversations. However, they underperform compared to previous approaches in task-oriented dialogue (TOD), wherein reasoning and accessing external information are crucial. Empowering LLMs with advanced prompting strategies such as reasoning and acting (ReAct) has shown promise in solving complex tasks traditionally requiring reinforcement learning. In this work, we apply the ReAct strategy to guide LLMs performing TOD. We evaluate ReAct-based LLMs (ReAct-LLMs) both in simulation and with real users. While ReAct-LLMs seem to underperform state-of-the-art approaches in simulation, human evaluation indicates a higher user satisfaction rate compared to handcrafted systems despite a lower success rate.

KGPrune: a Web Application to Extract Subgraphs of Interest from Wikidata with Analogical Pruning

Aug 26, 2024

Pierre Monnin, Cherif-Hassan Nousradine, Lucas Jarnac, Laurel Zuckerman, Miguel Couceiro

Abstract:Knowledge graphs (KGs) have become ubiquitous publicly available knowledge sources, and nowadays cover an ever-increasing array of domains. However, not all knowledge represented is useful or pertinent when considering a new application or specific task. Also, due to their increasing size, handling large KGs in their entirety entails scalability issues. These two aspects call for efficient methods to extract subgraphs of interest from existing KGs. To this aim, we introduce KGPrune, a Web Application that, given seed entities of interest and properties to traverse, extracts their neighboring subgraphs from Wikidata. To avoid topical drift, KGPrune relies on a frugal pruning algorithm based on analogical reasoning to only keep relevant neighbors while pruning irrelevant ones. The interest of KGPrune is illustrated by two concrete applications, namely, bootstrapping an enterprise KG and extracting knowledge related to looted artworks.

* Accepted as a demo paper at ECAI 2024

REFINE-LM: Mitigating Language Model Stereotypes via Reinforcement Learning

Aug 18, 2024

Rameez Qureshi, Naïm Es-Sebbani, Luis Galárraga, Yvette Graham, Miguel Couceiro, Zied Bouraoui

Abstract:With the introduction of (large) language models, there has been significant concern about the unintended bias such models may inherit from their training data. A number of studies have shown that such models propagate gender stereotypes, as well as geographical and racial bias, among other biases. While existing works tackle this issue by preprocessing data and debiasing embeddings, the proposed methods require a lot of computational resources and annotation effort while being limited to certain types of biases. To address these issues, we introduce REFINE-LM, a debiasing method that uses reinforcement learning to handle different types of biases without any fine-tuning. By training a simple model on top of the word probability distribution of an LM, our bias-agnostic reinforcement learning method enables model debiasing without human annotations or significant computational resources. Experiments conducted on a wide range of models, including several LMs, show that our method (i) significantly reduces stereotypical biases while preserving LM performance; (ii) is applicable to different types of biases, generalizing across contexts such as gender, ethnicity, religion, and nationality-based biases; and (iii) is inexpensive to train.

Any four real numbers are on all fours with analogy

Jul 26, 2024

Yves Lepage, Miguel Couceiro

Abstract:This work presents a formalization of analogy on numbers that relies on generalized means. It is motivated by recent advances in artificial intelligence and applications of machine learning, where the notion of analogy is used to infer results, create data, and even as an assessment tool for object representations, or embeddings, that are basically collections of numbers (vectors, matrices, tensors). This extended use of analogy calls for mathematical foundations and a clear understanding of the notion of analogy between numbers. We propose a unifying view of analogies that relies on generalized means defined in terms of a power parameter. In particular, we show that any four increasing positive real numbers form an analogy in a unique suitable power. In addition, we show that any such analogy can be reduced to an equivalent arithmetic analogy and that any analogical equation has a solution for increasing numbers, which generalizes without restriction to complex numbers. These foundational results provide a better understanding of analogies in areas where representations are numerical.
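The claim that four increasing positive numbers form an analogy in a unique power can be explored numerically. The abstract does not spell out the definition, so this sketch assumes that a : b :: c : d holds in power p when the power-p generalized mean of the extremes (a, d) equals that of the means (b, c), with the geometric mean as the p = 0 case. The function names, the search bracket, and the bisection approach are all illustrative choices, not the authors' method.

```python
import math


def mean_gap(a, b, c, d, p):
    """Quantity with the same sign as M_p(a, d) - M_p(b, c) for positive inputs."""
    if p == 0:
        # Limit case: geometric means, compared through logarithms.
        return math.log(a * d) - math.log(b * c)
    # Dividing by p keeps the sign consistent for negative powers.
    return (a ** p + d ** p - b ** p - c ** p) / p


def analogy_power(a, b, c, d, lo=-64.0, hi=64.0, iterations=200):
    """Bisection search for the power p (assumed bracket: lo <= p <= hi)."""
    if mean_gap(a, b, c, d, lo) > 0 or mean_gap(a, b, c, d, hi) < 0:
        raise ValueError("no sign change inside the assumed bracket")
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        gap = mean_gap(a, b, c, d, mid)
        if gap == 0:
            return mid
        if gap < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


arithmetic_power = analogy_power(1.0, 2.0, 3.0, 4.0)  # 1 + 4 == 2 + 3
geometric_power = analogy_power(1.0, 2.0, 4.0, 8.0)   # 1 * 8 == 2 * 4
```

Under the stated assumption, the familiar arithmetic and geometric analogies come out as powers 1 and 0 respectively; very large inputs may overflow at the bracket ends, so the default bracket would need adjusting for such cases.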
