Attention Is All You Need


Attention Sinks Are Provably Necessary in Softmax Transformers: Evidence from Trigger-Conditional Tasks

Mar 12, 2026

STAIRS-Former: Spatio-Temporal Attention with Interleaved Recursive Structure Transformer for Offline Multi-task Multi-agent Reinforcement Learning

Mar 12, 2026

Towards Intelligent Spectrum Management: Spectrum Demand Estimation Using Graph Neural Networks

Mar 11, 2026

Telogenesis: Goal Is All U Need

Mar 10, 2026

Attention Gathers, MLPs Compose: A Causal Analysis of an Action-Outcome Circuit in VideoViT

Mar 11, 2026

Alfa: Attentive Low-Rank Filter Adaptation for Structure-Aware Cross-Domain Personalized Gaze Estimation

Mar 09, 2026

SCDP: Learning Humanoid Locomotion from Partial Observations via Mixed-Observation Distillation

Mar 10, 2026

Beyond Hungarian: Match-Free Supervision for End-to-End Object Detection

Mar 09, 2026

BuildMamba: A Visual State-Space Based Model for Multi-Task Building Segmentation and Height Estimation from Satellite Images

Mar 09, 2026

Benchmark of Benchmarks: Unpacking Influence and Code Repository Quality in LLM Safety Benchmarks

Mar 03, 2026