Daniil Gavrilov

Train One Sparse Autoencoder Across Multiple Sparsity Budgets to Preserve Interpretability and Accuracy

May 30, 2025

Train Sparse Autoencoders Efficiently by Utilizing Features Correlation

May 28, 2025

Steering LLM Reasoning Through Bias-Only Adaptation

May 24, 2025

You Do Not Fully Utilize Transformer's Representation Capacity

Feb 13, 2025

Analyze Feature Flow to Enhance Interpretation and Steering in Language Models

Feb 06, 2025

The Differences Between Direct Alignment Algorithms are a Blur

Feb 03, 2025

Mechanistic Permutability: Match Features Across Layers

Oct 10, 2024

Learn Your Reference Model for Real Good Alignment

Apr 15, 2024

Linear Transformers with Learnable Kernel Functions are Better In-Context Models

Feb 16, 2024

Ahead-of-Time P-Tuning

May 18, 2023