
Kazuki Osawa


ASDL: A Unified Interface for Gradient Preconditioning in PyTorch

May 08, 2023
Kazuki Osawa, Satoki Ishikawa, Rio Yokota, Shigang Li, Torsten Hoefler

Via arXiv

PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices

Nov 25, 2022
Kazuki Osawa, Shigang Li, Torsten Hoefler

Via arXiv

Understanding Gradient Regularization in Deep Learning: Efficient Finite-Difference Computation and Implicit Bias

Oct 06, 2022
Ryo Karakida, Tomoumi Takase, Tomohiro Hayase, Kazuki Osawa

Via arXiv

Neural Graph Databases

Sep 20, 2022
Maciej Besta, Patrick Iff, Florian Scheidl, Kazuki Osawa, Nikoli Dryden, Michal Podstawski, Tiancheng Chen, Torsten Hoefler

Via arXiv

Efficient Quantized Sparse Matrix Operations on Tensor Cores

Sep 14, 2022
Shigang Li, Kazuki Osawa, Torsten Hoefler

Via arXiv

Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks

Oct 23, 2020
Ryo Karakida, Kazuki Osawa

Via arXiv

Scalable and Practical Natural Gradient for Large-Scale Deep Learning

Feb 13, 2020
Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Chuan-Sheng Foo, Rio Yokota

Via arXiv

Practical Deep Learning with Bayesian Principles

Jun 06, 2019
Kazuki Osawa, Siddharth Swaroop, Anirudh Jain, Runa Eschenhagen, Richard E. Turner, Rio Yokota, Mohammad Emtiyaz Khan

Via arXiv

Second-order Optimization Method for Large Mini-batch: Training ResNet-50 on ImageNet in 35 Epochs

Dec 05, 2018
Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Rio Yokota, Satoshi Matsuoka

Via arXiv