Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Katarzyna Zaleska

TADA! Tuning Audio Diffusion Models through Activation Steering

Feb 12, 2026

Łukasz Staniszewski, Katarzyna Zaleska, Mateusz Modrzejewski, Kamil Deja

Abstract:Audio diffusion models can synthesize high-fidelity music from text, yet their internal mechanisms for representing high-level concepts remain poorly understood. In this work, we use activation patching to demonstrate that distinct semantic musical concepts, such as the presence of specific instruments, vocals, or genre characteristics, are controlled by a small, shared subset of attention layers in state-of-the-art audio diffusion architectures. Next, we demonstrate that applying Contrastive Activation Addition and Sparse Autoencoders in these layers enables more precise control over the generated audio, indicating a direct benefit of the specialization phenomenon. By steering activations of the identified layers, we can alter specific musical elements with high precision, such as modulating tempo or changing a track's mood.

* Preprint. Preliminary work

Via

Access Paper or Ask Questions

Low-Rank Continual Personalization of Diffusion Models

Oct 07, 2024

Łukasz Staniszewski, Katarzyna Zaleska, Kamil Deja

Figure 1 for Low-Rank Continual Personalization of Diffusion Models

Figure 2 for Low-Rank Continual Personalization of Diffusion Models

Figure 3 for Low-Rank Continual Personalization of Diffusion Models

Figure 4 for Low-Rank Continual Personalization of Diffusion Models

Abstract:Recent personalization methods for diffusion models, such as Dreambooth, allow fine-tuning pre-trained models to generate new concepts. However, applying these techniques across multiple tasks in order to include, e.g., several new objects or styles, leads to mutual interference between their adapters. While recent studies attempt to mitigate this issue by combining trained adapters across tasks after fine-tuning, we adopt a more rigorous regime and investigate the personalization of large diffusion models under a continual learning scenario, where such interference leads to catastrophic forgetting of previous knowledge. To that end, we evaluate the na\"ive continual fine-tuning of customized models and compare this approach with three methods for consecutive adapters' training: sequentially merging new adapters, merging orthogonally initialized adapters, and updating only relevant parameters according to the task. In our experiments, we show that the proposed approaches mitigate forgetting when compared to the na\"ive approach.

Via

Access Paper or Ask Questions