Antonio Orvieto

ETH Zurich

Deriving Hyperparameter Scaling Laws via Modern Optimization Theory

Mar 16, 2026
Via arXiv

GASP: Guided Asymmetric Self-Play For Coding LLMs

Mar 16, 2026
Via arXiv

Improved state mixing in higher-order and block diagonal linear recurrent networks

Feb 12, 2026
Via arXiv

Explaining Grokking in Transformers through the Lens of Inductive Bias

Feb 06, 2026
Via arXiv

Universal Dynamics of Warmup Stable Decay: understanding WSD beyond Transformers

Jan 13, 2026
Via arXiv

Scaling Behavior of Discrete Diffusion Language Models

Dec 11, 2025
Via arXiv

Design Principles for Sequence Models via Coefficient Dynamics

Oct 10, 2025
Via arXiv

How does the optimizer implicitly bias the model merging loss landscape?

Oct 06, 2025
Via arXiv

When recalling in-context, Transformers are not SSMs

Aug 26, 2025
Via arXiv

Enhancing Optimizer Stability: Momentum Adaptation of The NGN Step-size

Aug 20, 2025
Via arXiv