Taiji Suzuki

Transformers Provably Solve Parity Efficiently with Chain of Thought
Oct 11, 2024

On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent
Oct 07, 2024

Unveil Benign Overfitting for Transformer in Vision: Training Dynamics, Convergence, and Generalization
Sep 28, 2024

Transformers are Minimax Optimal Nonparametric In-Context Learners
Aug 22, 2024

Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations
Jun 17, 2024

Provably Neural Active Learning Succeeds via Prioritizing Perplexing Samples
Jun 06, 2024

High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization
Jun 05, 2024

Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit
Jun 03, 2024

Flow matching achieves minimax optimal convergence
May 31, 2024

State Space Models are Comparable to Transformers in Estimating Functions with Dynamic Smoothness
May 29, 2024