Jiuxiang Gu

MMGR: Multi-Modal Generative Reasoning

Dec 17, 2025

Sparse-LaViDa: Sparse Multimodal Discrete Diffusion Language Models

Dec 16, 2025

More Than the Final Answer: Improving Visual Extraction and Logical Consistency in Vision-Language Models

Dec 13, 2025

OIDA-QA: A Multimodal Benchmark for Analyzing the Opioid Industry Documents Archive

Nov 14, 2025

Image Tokenizer Needs Post-Training

Sep 15, 2025

MS4UI: A Dataset for Multi-modal Summarization of User Interface Instructional Videos

Jun 14, 2025

Refer to Anything with Vision-Language Prompts

Jun 05, 2025

R-KV: Redundancy-aware KV Cache Compression for Training-Free Reasoning Models Acceleration

May 30, 2025

Towards Visual Text Grounding of Multimodal Large Language Model

Apr 07, 2025

QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge

Mar 20, 2025