Rongyao Fang

Unified Multimodal Autoregressive Modeling with Shared Context-Visual Tokenizer is Key to Unification

Jun 17, 2026

HomeWorld: A Unified Floorplan-to-Furnished Framework for Generating Controllable, Densely Interactive Whole-Home Scenes

Jun 04, 2026

MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning

Oct 16, 2025

FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark

Sep 11, 2025

T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation

Aug 24, 2025

GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning

May 22, 2025

CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms

May 22, 2025

SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving

May 22, 2025

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Mar 13, 2025

StreamChat: Chatting with Streaming Video

Dec 11, 2024