Jiaming Han

GIDE: Unlocking Diffusion LLMs for Precise Training-Free Image Editing

Mar 22, 2026

Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens

Mar 19, 2026

UniWeTok: An Unified Binary Tokenizer with Codebook Size $\mathit{2^{128}}$ for Unified Multimodal Large Language Model

Feb 15, 2026

BitDance: Scaling Autoregressive Generative Models with Binary Tokens

Feb 15, 2026

Growing Visual Generative Capacity for Pre-Trained MLLMs

Oct 02, 2025

ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents

Jul 30, 2025

Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations

Jun 23, 2025

CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms

May 22, 2025

Multimodal Long Video Modeling Based on Temporal Dynamic Context

Apr 14, 2025

Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation

Feb 23, 2025