
Yi Yang

The Hong Kong University of Science and Technology, Hong Kong SAR, China

RSATalker: Realistic Socially-Aware Talking Head Generation for Multi-Turn Conversation

Jan 15, 2026

V-Zero: Self-Improving Multimodal Reasoning with Zero Annotation

Jan 15, 2026

EvasionBench: Detecting Evasive Answers in Financial Q&A via Multi-Model Consensus and LLM-as-Judge

Jan 14, 2026

CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving

Jan 05, 2026

Unified Generation and Self-Verification for Vision-Language Models via Advantage Decoupled Preference Optimization

Jan 04, 2026

KV-Embedding: Training-free Text Embedding via Internal KV Re-routing in Decoder-only LLMs

Jan 03, 2026

DeMoGen: Towards Decompositional Human Motion Generation with Energy-Based Diffusion Models

Dec 26, 2025

Recurrent Video Masked Autoencoders

Dec 15, 2025

UniBYD: A Unified Framework for Learning Robotic Manipulation Across Embodiments Beyond Imitation of Human Demonstrations

Dec 12, 2025

See Once, Then Act: Vision-Language-Action Model with Task Learning from One-Shot Video Demonstrations

Dec 08, 2025