Yu Qiao

ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, SIAT Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society

SIN-Bench: Tracing Native Evidence Chains in Long-Context Multimodal Scientific Interleaved Literature

Jan 15, 2026
Via arXiv

OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent

Jan 12, 2026
Via arXiv

InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation

Jan 05, 2026
Via arXiv

Yume-1.5: A Text-Controlled Interactive World Generation Model

Dec 26, 2025
Via arXiv

UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture

Dec 25, 2025
Via arXiv

OS-Oracle: A Comprehensive Framework for Cross-Platform GUI Critic Models

Dec 18, 2025
Via arXiv

LongVie 2: Multimodal Controllable Ultra-Long Video World Model

Dec 15, 2025
Via arXiv

ShotDirector: Directorially Controllable Multi-Shot Video Generation with Cinematographic Transitions

Dec 11, 2025
Via arXiv

P1: Mastering Physics Olympiads with Reinforcement Learning

Nov 17, 2025
Via arXiv

ViCO: A Training Strategy towards Semantic Aware Dynamic High-Resolution

Oct 14, 2025
Via arXiv