
Nicolas Flammarion

LIENS, SIERRA

Does Refusal Training in LLMs Generalize to the Past Tense?

Jul 16, 2024

Implicit Bias of Mirror Flow on Separable Data

Jun 18, 2024

Is In-Context Learning Sufficient for Instruction Following in LLMs?

May 30, 2024

Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs

Apr 22, 2024

Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks

Apr 02, 2024

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Mar 28, 2024

Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks

Mar 08, 2024

Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning

Feb 07, 2024

Early alignment in two-layer networks training is a two-edged sword

Jan 19, 2024

Why Do We Need Weight Decay in Modern Deep Learning?

Oct 06, 2023