Antonio Orvieto

ETH Zurich

Towards Understanding Self-Pretraining for Sequence Classification

May 20, 2026

GRASP: Deterministic argument ranking in interaction graphs

May 18, 2026

Muown: Row-Norm Control for Muon Optimization

May 11, 2026

GASP: Guided Asymmetric Self-Play For Coding LLMs

Mar 16, 2026

Deriving Hyperparameter Scaling Laws via Modern Optimization Theory

Mar 16, 2026

Improved state mixing in higher-order and block diagonal linear recurrent networks

Feb 12, 2026

Explaining Grokking in Transformers through the Lens of Inductive Bias

Feb 06, 2026

Universal Dynamics of Warmup Stable Decay: understanding WSD beyond Transformers

Jan 13, 2026

Scaling Behavior of Discrete Diffusion Language Models

Dec 11, 2025

Design Principles for Sequence Models via Coefficient Dynamics

Oct 10, 2025