Zhou Zhao

OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use

Aug 06, 2025, via arXiv

EC-Diff: Fast and High-Quality Edge-Cloud Collaborative Inference for Diffusion Models

Jul 16, 2025, via arXiv

STARS: A Unified Framework for Singing Transcription, Alignment, and Refined Style Annotation

Jul 09, 2025, via arXiv

ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing

Jun 26, 2025, via arXiv

GenSpace: Benchmarking Spatially-Aware Image Generation

May 30, 2025, via arXiv

IRBridge: Solving Image Restoration Bridge with Pre-trained Generative Diffusion Models

May 30, 2025, via arXiv

TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis

May 20, 2025, via arXiv

T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback

May 15, 2025, via arXiv

Depth Anything with Any Prior

May 15, 2025, via arXiv

WavReward: Spoken Dialogue Models With Generalist Reward Evaluators

May 14, 2025, via arXiv