Video Understanding


SlowFocus: Enhancing Fine-grained Temporal Understanding in Video LLM

Feb 03, 2026
Via arXiv

KTV: Keyframes and Key Tokens Selection for Efficient Training-Free Video LLMs

Feb 03, 2026
Via arXiv

Morphe: High-Fidelity Generative Video Streaming with Vision Foundation Model

Feb 03, 2026
Via arXiv

Contribution-aware Token Compression for Efficient Video Understanding via Reinforcement Learning

Feb 02, 2026
Via arXiv

FreshMem: Brain-Inspired Frequency-Space Hybrid Memory for Streaming Video Understanding

Feb 02, 2026
Via arXiv

LongVPO: From Anchored Cues to Self-Reasoning for Long-Form Video Preference Optimization

Feb 02, 2026
Via arXiv

Thinking with Comics: Enhancing Multimodal Reasoning through Structured Visual Storytelling

Feb 03, 2026
Via arXiv

DuoGen: Towards General Purpose Interleaved Multimodal Generation

Feb 03, 2026
Via arXiv

Hand3R: Online 4D Hand-Scene Reconstruction in the Wild

Feb 03, 2026
Via arXiv

Self-Supervised Learning from Structural Invariance

Feb 02, 2026
Via arXiv