Picture for Zheng Ge

Zheng Ge

DGAE: Diffusion-Guided Autoencoder for Efficient Latent Representation Learning

Add code
Jun 11, 2025
Viaarxiv icon

Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model

Add code
Jun 10, 2025
Viaarxiv icon

Step1X-Edit: A Practical Framework for General Image Editing

Add code
Apr 24, 2025
Viaarxiv icon

Perception-R1: Pioneering Perception Policy with Reinforcement Learning

Add code
Apr 10, 2025
Viaarxiv icon

Perception in Reflection

Add code
Apr 09, 2025
Viaarxiv icon

M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization?

Add code
Mar 27, 2025
Viaarxiv icon

Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model

Add code
Mar 14, 2025
Viaarxiv icon

Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

Add code
Feb 18, 2025
Viaarxiv icon

Unhackable Temporal Rewarding for Scalable Video MLLMs

Add code
Feb 17, 2025
Viaarxiv icon

PerPO: Perceptual Preference Optimization via Discriminative Rewarding

Add code
Feb 05, 2025
Viaarxiv icon