Video temporal grounding, the task of localizing the start and end times of the moment described by a natural language query in untrimmed video, requires capturing both global context and fine-grained temporal detail. This challenge is particularly pronounced in long videos, where existing methods often compromise temporal fidelity by over-downsampling or by relying on fixed windows. We present HieraMamba, a hierarchical architecture that preserves temporal structure and semantic richness across scales. At its core are Anchor-MambaPooling (AMP) blocks, which use Mamba's selective scanning to produce compact anchor tokens that summarize video content at multiple granularities. Two complementary objectives, an anchor-conditioned and a segment-pooled contrastive loss, encourage anchors to retain local detail while remaining globally discriminative. HieraMamba sets a new state of the art on Ego4D-NLQ, MAD, and TACoS, demonstrating precise, temporally faithful localization in long, untrimmed videos.
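To make the anchor-pooling idea concrete, the following is a minimal PyTorch sketch of how frame tokens could be compressed into anchor tokens at multiple granularities. It is an illustrative assumption, not the paper's implementation: the `SimpleSelectiveScan` gated recurrence merely stands in for Mamba's hardware-aware selective scan, and the class names, stride schedule, and dimensions are invented for the example.

```python
import torch
import torch.nn as nn


class SimpleSelectiveScan(nn.Module):
    """Input-dependent gated recurrence standing in for a Mamba selective scan.

    h_t = a_t * h_{t-1} + b_t * x_t, with gates a_t, b_t predicted from the
    input, so the running state selectively retains or forgets content.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, 2 * dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim) -> per-step states: (batch, time, dim)
        a, b = torch.sigmoid(self.gate(x)).chunk(2, dim=-1)
        h = torch.zeros_like(x[:, 0])
        states = []
        for t in range(x.shape[1]):
            h = a[:, t] * h + b[:, t] * x[:, t]
            states.append(h)
        return torch.stack(states, dim=1)


class AnchorPool(nn.Module):
    """Compress a token sequence into anchor tokens at one granularity.

    After scanning, every `stride`-th state is kept as an anchor, so each
    anchor summarizes the frames leading up to its position.
    """

    def __init__(self, dim: int, stride: int):
        super().__init__()
        self.scan = SimpleSelectiveScan(dim)
        self.stride = stride

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        states = self.scan(x)
        return states[:, self.stride - 1 :: self.stride]


if __name__ == "__main__":
    frames = torch.randn(2, 64, 256)  # batch of 2, 64 frame tokens, 256-d
    pools = nn.ModuleList(AnchorPool(256, stride=2) for _ in range(3))
    level = frames
    for pool in pools:  # hierarchy: 64 -> 32 -> 16 -> 8 anchor tokens
        level = pool(level)
        print(tuple(level.shape))
```

In this sketch each pooling level halves the temporal resolution, yielding progressively coarser anchors; the paper's contrastive objectives would then be applied so that these anchors retain local detail while staying globally discriminative.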