Rongyao Fang

CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms

May 22, 2025, via arXiv

GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning

May 22, 2025, via arXiv

SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving

May 22, 2025, via arXiv

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Mar 13, 2025, via arXiv

StreamChat: Chatting with Streaming Video

Dec 11, 2024, via arXiv

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

Oct 17, 2024, via arXiv

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

Mar 19, 2024, via arXiv

InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation

Nov 30, 2023, via arXiv

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking

Mar 09, 2023, via arXiv

FeatAug-DETR: Enriching One-to-Many Matching for DETRs with Feature Augmentation

Mar 02, 2023, via arXiv