Quanquan Gu

Transformers Trained via Gradient Descent Can Provably Learn a Class of Teacher Models

Mar 24, 2026
Via arXiv

Dimension-Independent Convergence of Underdamped Langevin Monte Carlo in KL Divergence

Mar 02, 2026
Via arXiv

Near-Optimal Regret for KL-Regularized Multi-Armed Bandits

Mar 02, 2026
Via arXiv

Protein Autoregressive Modeling via Multiscale Structure Generation

Feb 04, 2026
Via arXiv

Scalable Spatio-Temporal SE(3) Diffusion for Long-Horizon Protein Dynamics

Feb 02, 2026
Via arXiv

Deep Delta Learning

Jan 01, 2026
Via arXiv

Group Representational Position Encoding

Dec 08, 2025
Via arXiv

Higher-order Linear Attention

Oct 31, 2025
Via arXiv

Causal Attention with Lookahead Keys

Sep 09, 2025
Via arXiv

SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving

May 29, 2025
Via arXiv