Yuwei Wu

Large Language Models are Demonstration Pre-Selectors for Themselves

Jun 06, 2025
Via arXiv

Multi-Sourced Compositional Generalization in Visual Question Answering

May 29, 2025
Via arXiv

Chain-of-Focus: Adaptive Visual Search and Zooming for Multimodal Reasoning via RL

May 21, 2025
Via arXiv

Memory-Centric Embodied Question Answer

May 20, 2025
Via arXiv

Diving into the Fusion of Monocular Priors for Generalized Stereo Matching

May 20, 2025
Via arXiv

Multi-Label Stereo Matching for Transparent Scene Depth Estimation

May 20, 2025
Via arXiv

3D Visual Illusion Depth Estimation

May 19, 2025
Via arXiv

LLM-Land: Large Language Models for Context-Aware Drone Landing

May 09, 2025
Via arXiv

Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning

May 06, 2025
Via arXiv

Iterative Trajectory Exploration for Multimodal Agents

Apr 30, 2025
Via arXiv