Mingrui Wu

Alibaba Group

Test-Time Computing for Referring Multimodal Large Language Models

Feb 23, 2026

MICON-Bench: Benchmarking and Enhancing Multi-Image Context Image Generation in Unified Multimodal Models

Feb 23, 2026

From Atoms to Trees: Building a Structured Feature Forest with Hierarchical Sparse Autoencoders

Feb 12, 2026

Do Images Speak Louder than Words? Investigating the Effect of Textual Misinformation in VLMs

Jan 27, 2026

From Indoor to Open World: Revealing the Spatial Reasoning Gap in MLLMs

Dec 22, 2025

PromptEnhancer: A Simple Approach to Enhance Text-to-Image Models via Chain-of-Thought Prompt Rewriting

Sep 04, 2025

MIHBench: Benchmarking and Mitigating Multi-Image Hallucinations in Multimodal Large Language Models

Aug 01, 2025

RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning

May 23, 2025

ParamΔ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost

Apr 23, 2025

Vision Calorimeter for Anti-neutron Reconstruction: A Baseline

Aug 20, 2024