Dalton Jones

KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments

Apr 23, 2025
Via arXiv

KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments

Apr 21, 2025
Via arXiv

CAOTE: KV Caching through Attention Output Error based Token Eviction

Apr 18, 2025
Via arXiv

PADRe: A Unifying Polynomial Attention Drop-in Replacement for Efficient Vision Transformer

Jul 16, 2024
Via arXiv