Wu Lin

Reparametrizing Shampoo and SOAP for Subspace Basis Updates and BFloat16 Storage

May 25, 2026

Understanding and Improving the Shampoo Optimizer via Kullback-Leibler Minimization

Sep 03, 2025

Spectral-factorized Positive-definite Curvature Learning for NN Training

Feb 10, 2025

Training Data Attribution via Approximate Unrolled Differentiation

May 21, 2024

Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective

Feb 13, 2024

Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC for Large Neural Nets

Dec 16, 2023

Simplifying Momentum-based Riemannian Submanifold Optimization

Feb 20, 2023

Structured second-order methods via natural gradient descent

Jul 22, 2021

Tractable structured natural gradient descent using local parameterizations

Mar 04, 2021

Handling the Positive-Definite Constraint in the Bayesian Learning Rule

Mar 08, 2020