Yannick Teglia

Backdoor Unlearning Generalization: A Path Toward the Removal of Unknown Triggers in LLMs

Jun 04, 2026

Lisa Bouger, Théo Lasnier, Philippe Loubet Moundi, Yannick Teglia, Djamé Seddah

Abstract: Backdoor attacks in Large Language Models (LLMs) are a growing security concern, where models can generate adversary-chosen content. Existing defenses target backdoors one at a time and typically require knowledge of the trigger, leaving the defender at a structural disadvantage when unknown backdoors may exist in a model. We show that backdoor neutralization through unlearning generalizes across backdoors: training a model to ignore a single trigger can also suppress other backdoors that were never explicitly targeted. We study this phenomenon across three model families, whose backdoors were injected via pretraining or continual pretraining, by analyzing the models obtained after removing one backdoor at a time. To understand why unlearning certain backdoors induces the suppression of others, we introduce the Cross Activation Shift Distance to quantify the distance between model changes induced by different trainings. Our results open a new direction for LLM safety, as defenders could deliberately inject controlled backdoors and then remove them, leveraging cross-backdoor transfer to also suppress unknown backdoors that an attacker may have previously introduced in the model.

* 22 pages, 28 figures

Backdoor Attacks on Deep Learning Face Detection

Aug 01, 2025

Quentin Le Roux, Yannick Teglia, Teddy Furon, Philippe Loubet-Moundi

Abstract: Face Recognition Systems that operate in unconstrained environments capture images under varying conditions, such as inconsistent lighting or diverse face poses. These challenges require including a Face Detection module that regresses bounding boxes and landmark coordinates for proper Face Alignment. This paper shows the effectiveness of Object Generation Attacks on Face Detection, dubbed Face Generation Attacks, and demonstrates for the first time a Landmark Shift Attack that backdoors the coordinate regression task performed by face detectors. We then offer mitigations against these vulnerabilities.

Survivability of Backdoor Attacks on Unconstrained Face Recognition Systems

Jul 02, 2025

Quentin Le Roux, Yannick Teglia, Teddy Furon, Philippe Loubet-Moundi, Eric Bourbao

Abstract: The widespread use of deep learning face recognition raises several security concerns. Although prior works point to existing vulnerabilities, DNN backdoor attacks against real-life, unconstrained systems dealing with images captured in the wild remain a blind spot of the literature. This paper conducts the first system-level study of backdoors in deep learning-based face recognition systems. It yields four contributions by exploring the feasibility of DNN backdoors on these pipelines in a holistic fashion. We demonstrate for the first time two backdoor attacks on the face detection task: face generation and face landmark shift attacks. We then show that face feature extractors trained with large margin losses also fall victim to backdoor attacks. Combining our models, we show, using 20 possible pipeline configurations and 15 attack cases, that a single backdoor enables an attacker to bypass the entire function of a system. Finally, we provide stakeholders with several best practices and countermeasures.

Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models

Oct 18, 2024

Cody Clop, Yannick Teglia

Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in generating coherent text but remain limited by the static nature of their training data. Retrieval Augmented Generation (RAG) addresses this issue by combining LLMs with up-to-date information retrieval, but also expands the attack surface of the system. This paper investigates prompt injection attacks on RAG, focusing on malicious objectives beyond misinformation, such as inserting harmful links, promoting unauthorized services, and initiating denial-of-service behaviors. We build upon existing corpus poisoning techniques and propose a novel backdoor attack aimed at the fine-tuning process of the dense retriever component. Our experiments reveal that corpus poisoning can achieve significant attack success rates through the injection of a small number of compromised documents into the retriever corpus. In contrast, backdoor attacks demonstrate even higher success rates but necessitate a more complex setup, as the victim must fine-tune the retriever using the attacker-poisoned dataset.

* 12 pages, 5 figures
