Zhengrong Yue

VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning

Jun 06, 2025

V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents

Mar 15, 2025

LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents

Mar 13, 2025

TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning

Oct 25, 2024

MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration

Aug 21, 2024