Jiuxiang Gu

MiLDEdit: Reasoning-Based Multi-Layer Design Document Editing

Jan 08, 2026

MMGR: Multi-Modal Generative Reasoning

Dec 17, 2025

Sparse-LaViDa: Sparse Multimodal Discrete Diffusion Language Models

Dec 16, 2025

More Than the Final Answer: Improving Visual Extraction and Logical Consistency in Vision-Language Models

Dec 13, 2025

OIDA-QA: A Multimodal Benchmark for Analyzing the Opioid Industry Documents Archive

Nov 14, 2025

Image Tokenizer Needs Post-Training

Sep 15, 2025

MS4UI: A Dataset for Multi-modal Summarization of User Interface Instructional Videos

Jun 14, 2025

Refer to Anything with Vision-Language Prompts

Jun 05, 2025

R-KV: Redundancy-aware KV Cache Compression for Training-Free Reasoning Models Acceleration

May 30, 2025

Towards Visual Text Grounding of Multimodal Large Language Model

Apr 07, 2025