Jannik Brinkmann

Mitigating Adaptive Attacks against Reasoning Models with Activation Consistency Training

May 27, 2026

Agents of Chaos

Feb 23, 2026

Mechanisms of AI Protein Folding in ESMFold

Feb 05, 2026

In-Context Algebra

Dec 18, 2025

In-Context Learning Without Copying

Nov 07, 2025

Jailbreak Strength and Model Similarity Predict Transferability

Jun 15, 2025

Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages

Jan 10, 2025

NSA: Neuro-symbolic ARC Challenge

Jan 08, 2025

The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability

Aug 02, 2024

Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models

Jul 31, 2024