Aengus Lynch

How Do Large Language Monkeys Get Their Power (Laws)?

Feb 24, 2025
Via arXiv

Best-of-N Jailbreaking

Dec 04, 2024
Via arXiv

Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs

Jul 22, 2024
Via arXiv

Analyzing the Generalization and Reliability of Steering Vectors -- ICML 2024

Jul 17, 2024
Via arXiv

Eight Methods to Evaluate Robust Unlearning in LLMs

Feb 26, 2024
Via arXiv

Towards Automated Circuit Discovery for Mechanistic Interpretability

Apr 28, 2023
Via arXiv

Spawrious: A Benchmark for Fine Control of Spurious Correlation Biases

Mar 09, 2023
Via arXiv

Causal Machine Learning: A Survey and Open Problems

Jun 30, 2022
Via arXiv