Yan Huang

Towards Long-Form Spatio-Temporal Video Grounding

Feb 26, 2026
Via arXiv

Beyond Closed-Pool Video Retrieval: A Benchmark and Agent Framework for Real-World Video Search and Moment Localization

Feb 10, 2026
Via arXiv

PaperX: A Unified Framework for Multimodal Academic Presentation Generation with Scholar DAG

Feb 05, 2026
Via arXiv

BridgeV2W: Bridging Video Generation Models to Embodied World Models via Embodiment Masks

Feb 03, 2026
Via arXiv

ShotFinder: Imagination-Driven Open-Domain Video Shot Retrieval via Web Search

Jan 30, 2026
Via arXiv

VERM: Leveraging Foundation Models to Create a Virtual Eye for Efficient 3D Robotic Manipulation

Dec 18, 2025
Via arXiv

DP-CSGP: Differentially Private Stochastic Gradient Push with Compressed Communication

Dec 15, 2025
Via arXiv

UniBYD: A Unified Framework for Learning Robotic Manipulation Across Embodiments Beyond Imitation of Human Demonstrations

Dec 12, 2025
Via arXiv

VP-AutoTest: A Virtual-Physical Fusion Autonomous Driving Testing Platform

Dec 08, 2025
Via arXiv

Unified Video Editing with Temporal Reasoner

Dec 08, 2025
Via arXiv