Filip Sondej

Robust LLM Unlearning with MUDMAN: Meta-Unlearning with Disruption Masking And Normalization

Jun 14, 2025
Via arXiv

Multi-Agent Security Tax: Trading Off Security and Collaboration Capabilities in Multi-Agent Systems

Feb 26, 2025
Via arXiv

Ablation is Not Enough to Emulate DPO: How Neuron Dynamics Drive Toxicity Reduction

Nov 10, 2024
Via arXiv