Limin Wang

UniDDT: Unifying Multimodal Understanding and Generation with Decoupled Diffusion Transformer

Jun 15, 2026

HYDRA-X: Native Unified Multimodal Models with Holistic Visual Tokenizers

Jun 11, 2026

SpikeTAD: Spiking Neural Networks for End-to-End Temporal Action Detection

Jun 10, 2026

InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning

Jun 10, 2026

Explainable Forensics of Manipulated Segments in Untrimmed Long Videos

Jun 01, 2026

StreamOV: Streaming Omni-Video Understanding via Evidence-Guided Memory and Response Triggering

May 25, 2026

USV: Towards Understanding the User-generated Short-form Videos

May 20, 2026

Towards Multimodal Lifelong Understanding: A Dataset and Agentic Baseline

Mar 05, 2026

RIVER: A Real-Time Interaction Benchmark for Video LLMs

Mar 04, 2026

LongVPO: From Anchored Cues to Self-Reasoning for Long-Form Video Preference Optimization

Feb 02, 2026