
Marco Bondaschi

The Conditional Regret-Capacity Theorem for Batch Universal Prediction

Aug 14, 2025
Via arXiv

What One Cannot, Two Can: Two-Layer Transformers Provably Represent Induction Heads on Any-Order Markov Chains

Aug 10, 2025
Via arXiv

Batch Normalization Decomposed

Dec 03, 2024
Via arXiv

Transformers on Markov Data: Constant Depth Suffices

Jul 25, 2024
Via arXiv

Fundamental Limits of Prompt Compression: A Rate-Distortion Framework for Black-Box Language Models

Jul 22, 2024
Via arXiv

Local to Global: Learning Dynamics and Effect of Initialization for Transformers

Jun 05, 2024
Via arXiv

Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains

Feb 06, 2024
Via arXiv

Batch Universal Prediction

Feb 06, 2024
Via arXiv

LASER: Linear Compression in Wireless Distributed Optimization

Oct 19, 2023
Via arXiv