Taiji Suzuki

On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD

Mar 11, 2026

Takeuchi's Information Criteria as Generalization Measures for DNNs Close to NTK Regime

Feb 26, 2026

Transformers as Measure-Theoretic Associative Memory: A Statistical Perspective and Minimax Optimality

Feb 02, 2026

Inference-Aware Meta-Alignment of LLMs via Non-Linear GRPO

Feb 02, 2026

A Relative-Budget Theory for Reinforcement Learning with Verifiable Rewards in Large Language Model Reasoning

Feb 02, 2026

Zero-Flow Encoders

Jan 31, 2026

From Shortcut to Induction Head: How Data Diversity Shapes Algorithm Selection in Transformers

Dec 21, 2025

Sliding Window Recurrences for Sequence Models

Dec 15, 2025

Towards a Unified Analysis of Neural Networks in Nonparametric Instrumental Variable Regression: Optimization and Generalization

Nov 18, 2025

Provable Benefit of Curriculum in Transformer Tree-Reasoning Post-Training

Nov 10, 2025