Picture for Ziyu Guo

Ziyu Guo

Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos

Add code
Jun 05, 2025
Viaarxiv icon

Adaptive Classifier-Free Guidance via Dynamic Low-Confidence Masking

Add code
May 26, 2025
Viaarxiv icon

Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO

Add code
May 22, 2025
Viaarxiv icon

CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms

Add code
May 22, 2025
Viaarxiv icon

UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens

Add code
May 20, 2025
Viaarxiv icon

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

Add code
May 01, 2025
Viaarxiv icon

StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion

Add code
Mar 27, 2025
Viaarxiv icon

Less is More: Improving Motion Diffusion Models with Sparse Keyframes

Add code
Mar 18, 2025
Viaarxiv icon

HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model

Add code
Mar 13, 2025
Viaarxiv icon

SciVerse: Unveiling the Knowledge Comprehension and Visual Reasoning of LMMs on Multi-modal Scientific Problems

Add code
Mar 13, 2025
Viaarxiv icon