Alexandre Misrahi

DiffLoRA: Differential Low-Rank Adapters for Large Language Models

Jul 31, 2025

Alexandre Misrahi, Nadezhda Chirkova, Maxime Louis, Vassilina Nikoulina

Abstract: Differential Transformer has recently been proposed to improve the performance of Transformer models by canceling out attention noise through a differential (denoising) attention mechanism. In this work, we introduce DiffLoRA, a parameter-efficient adaptation of the differential attention mechanism that places low-rank adapters on both the positive and negative attention terms. This approach retains the efficiency of LoRA while aiming to benefit from the performance gains of differential attention. We evaluate DiffLoRA across a broad range of NLP tasks, including general benchmarks, many-shot in-context learning, RAG, and long-context tests. Although DiffLoRA falls short of other parameter-efficient fine-tuning methods on most evaluation tasks, it shows interesting results in certain domains (+11 points over LoRA on HumanEval). We analyze the attention patterns after fine-tuning to identify the reasons for this behavior.
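The abstract describes differential attention with low-rank adapters on both the positive and negative terms. In the Differential Transformer formulation, the attention output is (softmax(Q1 K1^T / sqrt(d)) - lambda * softmax(Q2 K2^T / sqrt(d))) V. The NumPy sketch below shows one head under explicitly assumed choices: both terms reuse the same frozen query/key weights plus their own LoRA factors, lambda is a fixed scalar, and causal masking and normalization are omitted. The helper names (`lora_proj`, `diff_lora_attention`) are hypothetical. DiffLoRA's actual parameterization, for example how lambda is learned or whether the negative term has separate base weights, may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def lora_proj(x, W, A, B, scale=1.0):
    # Frozen weight W (d_in x d_out) plus a trainable low-rank update A @ B
    # with A: (d_in x r) and B: (r x d_out)
    return x @ W + scale * (x @ A) @ B

def diff_lora_attention(x, Wq, Wk, Wv, pos, neg, lam):
    """Single-head differential attention with LoRA adapters on both terms.

    ASSUMPTION (not confirmed by the source): the positive and negative terms
    share the frozen Wq/Wk and differ only through their own LoRA factors.
    `pos` and `neg` are dicts holding the factors "Aq", "Bq", "Ak", "Bk".
    """
    d = Wq.shape[1]
    q1 = lora_proj(x, Wq, pos["Aq"], pos["Bq"])
    k1 = lora_proj(x, Wk, pos["Ak"], pos["Bk"])
    q2 = lora_proj(x, Wq, neg["Aq"], neg["Bq"])
    k2 = lora_proj(x, Wk, neg["Ak"], neg["Bk"])
    v = x @ Wv
    a_pos = softmax(q1 @ k1.T / np.sqrt(d))  # positive attention map
    a_neg = softmax(q2 @ k2.T / np.sqrt(d))  # negative ("noise") attention map
    return (a_pos - lam * a_neg) @ v

# Usage with random weights (no causal mask, for brevity)
rng = np.random.default_rng(0)
n, d_model, d, r = 4, 8, 8, 2
x = rng.standard_normal((n, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d)) for _ in range(3))

def init_adapters():
    # B starts at zero, as in standard LoRA initialization
    return {"Aq": rng.standard_normal((d_model, r)), "Bq": np.zeros((r, d)),
            "Ak": rng.standard_normal((d_model, r)), "Bk": np.zeros((r, d))}

pos, neg = init_adapters(), init_adapters()
out = diff_lora_attention(x, Wq, Wk, Wv, pos, neg, lam=0.5)
print(out.shape)  # (4, 8)
```

Under this sketch's assumptions, every B factor starts at zero, so the two attention maps are identical at initialization and the output is simply (1 - lambda) times ordinary attention. The positive and negative adapters must diverge during fine-tuning before the negative term cancels anything.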

Adapting Large Language Models for Multi-Domain Retrieval-Augmented-Generation

Apr 03, 2025

Alexandre Misrahi, Nadezhda Chirkova, Maxime Louis, Vassilina Nikoulina

Abstract: Retrieval-Augmented Generation (RAG) enhances LLM factuality, but multi-domain applications face challenges such as a lack of diverse benchmarks and poor out-of-domain generalization. The first contribution of this work is a diverse benchmark comprising a variety of question-answering tasks from 8 sources and covering 13 domains. Our second contribution is a systematic test of out-of-domain generalization for typical RAG tuning strategies. Our findings reveal that standard fine-tuning fails to generalize effectively, whereas sequence-level distillation with teacher-generated labels improves out-of-domain performance by providing more coherent supervision. These findings highlight key strategies for improving multi-domain RAG robustness.
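The abstract contrasts standard fine-tuning on gold labels with sequence-level distillation, where the student is trained on the full sequences a teacher model generates for the same inputs. The sketch below shows only how the two kinds of training pairs differ. The prompt template, the `RagExample` type, and the `teacher` callable are hypothetical stand-ins, not the paper's actual setup; in practice the teacher would be a larger LLM, and the student would then be fine-tuned with ordinary cross-entropy on the resulting (prompt, target) pairs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class RagExample:
    question: str
    passages: List[str]  # retrieved documents
    gold_answer: str     # reference label, often short

def build_prompt(ex: RagExample) -> str:
    # Hypothetical RAG prompt template: numbered passages, then the question
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(ex.passages))
    return f"Context:\n{context}\n\nQuestion: {ex.question}\nAnswer:"

def make_standard_set(examples: List[RagExample]) -> List[Tuple[str, str]]:
    # Standard fine-tuning: the target is the gold label
    return [(build_prompt(ex), ex.gold_answer) for ex in examples]

def make_distillation_set(examples: List[RagExample],
                          teacher: Callable[[str], str]) -> List[Tuple[str, str]]:
    # Sequence-level distillation: the target is the teacher's generated sequence
    pairs = []
    for ex in examples:
        prompt = build_prompt(ex)
        pairs.append((prompt, teacher(prompt)))
    return pairs

# Usage with a toy teacher (hypothetical stand-in for a large LLM)
def toy_teacher(prompt: str) -> str:
    return "William Shakespeare wrote Hamlet."

ex = RagExample("Who wrote Hamlet?",
                ["Hamlet is a tragedy by William Shakespeare."],
                "Shakespeare")
standard = make_standard_set([ex])
distilled = make_distillation_set([ex], toy_teacher)
print(standard[0][1])   # Shakespeare
print(distilled[0][1])  # William Shakespeare wrote Hamlet.
```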

* 25 pages, 8 figures, 21 tables
