Fine-Grained Action Detection


DiGIT: Multi-Dilated Gated Encoder and Central-Adjacent Region Integrated Decoder for Temporal Action Detection Transformer

May 09, 2025

LogisticsVLN: Vision-Language Navigation For Low-Altitude Terminal Delivery Based on Agentic UAVs

May 06, 2025

TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action

May 02, 2025

Learning Streaming Video Representation via Multitask Training

Apr 28, 2025

Detecting Actionable Requests and Offers on Social Media During Crises Using LLMs

Apr 22, 2025

LazyReview A Dataset for Uncovering Lazy Thinking in NLP Peer Reviews

Apr 15, 2025

F³Set: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos

Apr 15, 2025

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting

Apr 09, 2025

MultiTSF: Transformer-based Sensor Fusion for Human-Centric Multi-view and Multi-modal Action Recognition

Apr 03, 2025

FakeScope: Large Multimodal Expert Model for Transparent AI-Generated Image Forensics

Mar 31, 2025