Felix Dangel

Dataless Weight Disentanglement in Task Arithmetic via Kronecker-Factored Approximate Curvature

Feb 19, 2026
Via arXiv

Understanding and Improving the Shampoo Optimizer via Kullback-Leibler Minimization

Sep 03, 2025
Via arXiv

Collapsing Taylor Mode Automatic Differentiation

May 19, 2025
Via arXiv

Improving Energy Natural Gradient Descent through Woodbury, Momentum, and Randomization

May 17, 2025
Via arXiv

Hide & Seek: Transformer Symmetries Obscure Sharpness & Riemannian Geometry Finds It

May 08, 2025
Via arXiv

Spectral-factorized Positive-definite Curvature Learning for NN Training

Feb 10, 2025
Via arXiv

Position: Curvature Matrices Should Be Democratized via Linear Operators

Jan 31, 2025
Via arXiv

What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis

Oct 14, 2024
Via arXiv

Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning

Jun 05, 2024
Via arXiv

Lowering PyTorch's Memory Consumption for Selective Differentiation

Apr 15, 2024
Via arXiv