Ryo Karakida

Hierarchical Associative Memory, Parallelized MLP-Mixer, and Symmetry Breaking

Jun 18, 2024

Self-attention Networks Localize When QK-eigenspectrum Concentrates

Feb 03, 2024

On the Parameterization of Second-Order Optimization Effective Towards the Infinite Width

Dec 19, 2023

MLP-Mixer as a Wide and Sparse MLP

Jun 02, 2023

Attention in a family of Boltzmann machines emerging from modern Hopfield networks

Dec 09, 2022

Understanding Gradient Regularization in Deep Learning: Efficient Finite-Difference Computation and Implicit Bias

Oct 06, 2022

Deep Learning in Random Neural Fields: Numerical Experiments via Neural Tangent Kernel

Feb 10, 2022

Learning Curves for Sequential Training of Neural Networks: Self-Knowledge Transfer and Forgetting

Dec 03, 2021

Self-paced Data Augmentation for Training Neural Networks

Oct 29, 2020

Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks

Oct 23, 2020