Yan Zhou

Department of Radiology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China

Kling-Omni Technical Report

Dec 18, 2025

KlingAvatar 2.0 Technical Report

Dec 15, 2025

UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation

Dec 08, 2025

Few to Big: Prototype Expansion Network via Diffusion Learner for Point Cloud Few-shot Semantic Segmentation

Sep 16, 2025

MIDAS: Multimodal Interactive Digital-humAn Synthesis via Real-time Autoregressive Video Generation

Aug 28, 2025

A Physics-Driven Neural Network with Parameter Embedding for Generating Quantitative MR Maps from Weighted Images

Aug 11, 2025

DiffCap: Diffusion-based Real-time Human Motion Capture using Sparse IMUs and a Monocular Camera

Aug 08, 2025

Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model

Jun 16, 2025

AgentAlign: Navigating Safety Alignment in the Shift from Informative to Agentic Large Language Models

May 29, 2025

Can Multimodal Large Language Models Understand Spatial Relations?

May 25, 2025