Yu Qiao

ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, SIAT Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society

ShotDirector: Directorially Controllable Multi-Shot Video Generation with Cinematographic Transitions

Dec 11, 2025

P1: Mastering Physics Olympiads with Reinforcement Learning

Nov 17, 2025

ViCO: A Training Strategy towards Semantic Aware Dynamic High-Resolution

Oct 14, 2025

Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks

Oct 09, 2025

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

Sep 26, 2025

ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data

Sep 18, 2025

GenExam: A Multidisciplinary Text-to-Image Exam

Sep 17, 2025

A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

Aug 28, 2025

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Aug 25, 2025

LIA-X: Interpretable Latent Portrait Animator

Aug 13, 2025