Yu-Xiong Wang

Refer to Anything with Vision-Language Prompts

Jun 05, 2025
Via arXiv

Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought

May 29, 2025
Via arXiv

MR. Video: "MapReduce" is the Principle for Long Video Understanding

Apr 22, 2025
Via arXiv

Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception

Apr 15, 2025
Via arXiv

AgMMU: A Comprehensive Agricultural Multimodal Understanding and Reasoning Benchmark

Apr 14, 2025
Via arXiv

GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation

Apr 10, 2025
Via arXiv

V2Edit: Versatile Video Diffusion Editor for Videos and 3D Scenes

Mar 13, 2025
Via arXiv

InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions

Feb 27, 2025
Via arXiv

Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents

Dec 17, 2024
Via arXiv

Can We Generate Visual Programs Without Prompting LLMs?

Dec 11, 2024
Via arXiv