Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ekaterina Grishina

DyKAF: Dynamical Kronecker Approximation of the Fisher Information Matrix for Gradient Preconditioning

Nov 09, 2025

Nikolay Yudin, Ekaterina Grishina, Andrey Veprikov, Alexandr Beznosikov, Maxim Rakhuba

Figure 1 for DyKAF: Dynamical Kronecker Approximation of the Fisher Information Matrix for Gradient Preconditioning

Figure 2 for DyKAF: Dynamical Kronecker Approximation of the Fisher Information Matrix for Gradient Preconditioning

Figure 3 for DyKAF: Dynamical Kronecker Approximation of the Fisher Information Matrix for Gradient Preconditioning

Figure 4 for DyKAF: Dynamical Kronecker Approximation of the Fisher Information Matrix for Gradient Preconditioning

Abstract:Recently, optimizers that explicitly treat weights as matrices, rather than flattened vectors, have demonstrated their effectiveness. This perspective naturally leads to structured approximations of the Fisher matrix as preconditioners, where the matrix view induces a Kronecker-factorized form that enables memory-efficient representation. However, constructing such approximations both efficiently and accurately remains an open challenge, since obtaining the optimal factorization is resource-intensive and practical methods therefore rely on heuristic design choices. In this work, we introduce a novel approach that leverages projector-splitting integrators to construct effective preconditioners. Our optimizer, DyKAF (Dynamical Kronecker Approximation of the Fisher Matrix), consistently improves the Fisher matrix approximation quality. Experiments on large language model pre-training and fine-tuning demonstrate that DyKAF outperforms existing optimizers across a range of evaluation metrics.

Via

Access Paper or Ask Questions

Tight and Efficient Upper Bound on Spectral Norm of Convolutional Layers

Sep 18, 2024

Ekaterina Grishina, Mikhail Gorbunov, Maxim Rakhuba

Abstract:Controlling the spectral norm of the Jacobian matrix, which is related to the convolution operation, has been shown to improve generalization, training stability and robustness in CNNs. Existing methods for computing the norm either tend to overestimate it or their performance may deteriorate quickly with increasing the input and kernel sizes. In this paper, we demonstrate that the tensor version of the spectral norm of a four-dimensional convolution kernel, up to a constant factor, serves as an upper bound for the spectral norm of the Jacobian matrix associated with the convolution operation. This new upper bound is independent of the input image resolution, differentiable and can be efficiently calculated during training. Through experiments, we demonstrate how this new bound can be used to improve the performance of convolutional architectures.

* ECCV 2024

Via

Access Paper or Ask Questions