Sliding Window Attention


Personality Prediction from Life Stories using Language Models

Jun 24, 2025
Via arXiv

RATTENTION: Towards the Minimal Sliding Window Size in Local-Global Attention Models

Jun 18, 2025
Via arXiv

Two heads are better than one: simulating large transformers with small ones

Jun 13, 2025
Via arXiv

CodeBrain: Bridging Decoupled Tokenizer and Multi-Scale Architecture for EEG Foundation Model

Jun 10, 2025
Via arXiv

LoLA: Low-Rank Linear Attention With Sparse Caching

May 29, 2025
Via arXiv

Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction

May 16, 2025
Via arXiv

HydraNet: Momentum-Driven State Space Duality for Multi-Granularity Tennis Tournaments Analysis

May 29, 2025
Via arXiv

A Network Science Approach to Granular Time Series Segmentation

May 23, 2025
Via arXiv

DeepConvContext: A Multi-Scale Approach to Timeseries Classification in Human Activity Recognition

May 27, 2025
Via arXiv

Efficient Pretraining Length Scaling

Apr 21, 2025
Via arXiv