Manyuan Zhang

Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis

Apr 01, 2026
Via arXiv

Gen-Searcher: Reinforcing Agentic Search for Image Generation

Mar 30, 2026
Via arXiv

LongCat-Next: Lexicalizing Modalities as Discrete Tokens

Mar 29, 2026
Via arXiv

AutoWeather4D: Autonomous Driving Video Weather Conversion via G-Buffer Dual-Pass Editing

Mar 27, 2026
Via arXiv

MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data

Mar 26, 2026
Via arXiv

RPiAE: A Representation-Pivoted Autoencoder Enhancing Both Image Generation and Editing

Mar 19, 2026
Via arXiv

Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence

Mar 08, 2026
Via arXiv

OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence

Feb 09, 2026
Via arXiv

OmniVideo-R1: Reinforcing Audio-visual Reasoning with Query Intention and Modality Attention

Feb 05, 2026
Via arXiv

Exploring Reasoning Reward Model for Agents

Jan 29, 2026
Via arXiv