
Antonio Orvieto

ETH Zurich

When recalling in-context, Transformers are not SSMs

Aug 26, 2025
Via arXiv

Enhancing Optimizer Stability: Momentum Adaptation of The NGN Step-size

Aug 20, 2025
Via arXiv

GitChameleon: Evaluating AI Code Generation Against Python Library Version Incompatibilities

Jul 16, 2025
Via arXiv

Is your batch size the problem? Revisiting the Adam-SGD gap in language modeling

Jun 14, 2025
Via arXiv

In Search of Adam's Secret Sauce

May 27, 2025
Via arXiv

Can you Finetune your Binoculars? Embedding Text Watermarks into the Weights of Large Language Models

Apr 08, 2025
Via arXiv

Fixed-Point RNNs: From Diagonal to Dense in a Few Iterations

Mar 13, 2025
Via arXiv

Generalized Interpolating Discrete Diffusion

Mar 06, 2025
Via arXiv

An Uncertainty Principle for Linear Recurrent Neural Networks

Feb 13, 2025
Via arXiv

When, Where and Why to Average Weights?

Feb 10, 2025
Via arXiv