Hao Yang

ViSRA: A Video-based Spatial Reasoning Agent for Multi-modal Large Language Models

May 11, 2026
Via arXiv

Open-H-Embodiment: A Large-Scale Dataset for Enabling Foundation Models in Medical Robotics

Apr 22, 2026
Via arXiv

Visual Reasoning through Tool-supervised Reinforcement Learning

Apr 21, 2026
Via arXiv

Low Light Image Enhancement Challenge at NTIRE 2026

Apr 19, 2026
Via arXiv

OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation

Apr 13, 2026
Via arXiv

NTIRE 2026 The Second Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

Apr 12, 2026
Via arXiv

ReAlign: Optimizing the Visual Document Retriever with Reasoning-Guided Fine-Grained Alignment

Apr 08, 2026
Via arXiv

LongCat-Next: Lexicalizing Modalities as Discrete Tokens

Mar 29, 2026
Via arXiv

Towards Dynamic Model Identification and Gravity Compensation for the dVRK-Si Patient Side Manipulator

Mar 12, 2026
Via arXiv

Resurfacing Paralinguistic Awareness in Large Audio Language Models

Mar 12, 2026
Via arXiv