Maksym Andriushchenko

Saarland University

Improving Alignment and Robustness with Short Circuiting

Jun 06, 2024

Is In-Context Learning Sufficient for Instruction Following in LLMs?

May 30, 2024

Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs

Apr 22, 2024

Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks

Apr 02, 2024

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Mar 28, 2024

Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning

Feb 07, 2024

Scaling Compute Is Not All You Need for Adversarial Robustness

Dec 20, 2023

The Effects of Overparameterization on Sharpness-aware Minimization: An Empirical and Theoretical Analysis

Nov 29, 2023

Why Do We Need Weight Decay in Modern Deep Learning?

Oct 06, 2023

Transferable Adversarial Robustness for Categorical Data via Universal Robust Embeddings

Jun 06, 2023