Zeyuan Chen

MTA-Agent: An Open Recipe for Multimodal Deep Search Agents

Apr 07, 2026
Via arXiv

How Far Are Vision-Language Models from Constructing the Real World? A Benchmark for Physical Generative Reasoning

Mar 25, 2026
Via arXiv

Real Money, Fake Models: Deceptive Model Claims in Shadow APIs

Mar 05, 2026
Via arXiv

CADGrasp: Learning Contact and Collision Aware General Dexterous Grasping in Cluttered Scenes

Jan 21, 2026
Via arXiv

Soft Tail-dropping for Adaptive Visual Tokenization

Jan 20, 2026
Via arXiv

APEX: Academic Poster Editing Agentic Expert

Jan 08, 2026
Via arXiv

CVP: Central-Peripheral Vision-Inspired Multimodal Model for Spatial Reasoning

Dec 09, 2025
Via arXiv

C3Editor: Achieving Controllable Consistency in 2D Model for 3D Editing

Oct 06, 2025
Via arXiv

WALT: Web Agents that Learn Tools

Oct 01, 2025
Via arXiv

SCUBA: Salesforce Computer Use Benchmark

Sep 30, 2025
Via arXiv