Yan Huang

PanopticQuery: Unified Query-Time Reasoning for 4D Scenes

Apr 07, 2026, via arXiv

Multi-View Video Diffusion Policy: A 3D Spatio-Temporal-Aware Video Action Model

Apr 03, 2026, via arXiv

FloorPlan-VLN: A New Paradigm for Floor Plan Guided Vision-Language Navigation

Mar 18, 2026, via arXiv

Towards Visual Query Segmentation in the Wild

Mar 09, 2026, via arXiv

Towards Long-Form Spatio-Temporal Video Grounding

Feb 26, 2026, via arXiv

Beyond Closed-Pool Video Retrieval: A Benchmark and Agent Framework for Real-World Video Search and Moment Localization

Feb 10, 2026, via arXiv

PaperX: A Unified Framework for Multimodal Academic Presentation Generation with Scholar DAG

Feb 05, 2026, via arXiv

BridgeV2W: Bridging Video Generation Models to Embodied World Models via Embodiment Masks

Feb 03, 2026, via arXiv

ShotFinder: Imagination-Driven Open-Domain Video Shot Retrieval via Web Search

Jan 30, 2026, via arXiv

VERM: Leveraging Foundation Models to Create a Virtual Eye for Efficient 3D Robotic Manipulation

Dec 18, 2025, via arXiv