Wentao Zhang

BrowseComp-$V^3$: A Visual, Vertical, and Verifiable Benchmark for Multimodal Browsing Agents

Feb 13, 2026
Via arXiv

GENIUS: Generative Fluid Intelligence Evaluation Suite

Feb 11, 2026
Via arXiv

Canvas-of-Thought: Grounding Reasoning via Mutable Structured States

Feb 11, 2026
Via arXiv

SciFlow-Bench: Evaluating Structure-Aware Scientific Diagram Generation via Inverse Parsing

Feb 10, 2026
Via arXiv

EvoCodeBench: A Human-Performance Benchmark for Self-Evolving LLM-Driven Coding Systems

Feb 10, 2026
Via arXiv

M2A: Multimodal Memory Agent with Dual-Layer Hybrid Memory for Long-Term Personalized Interactions

Feb 07, 2026
Via arXiv

AD-MIR: Bridging the Gap from Perception to Persuasion in Advertising Video Understanding via Structured Reasoning

Feb 07, 2026
Via arXiv

Guided Verifier: Collaborative Multimodal Reasoning via Dynamic Process Supervision

Feb 04, 2026
Via arXiv

Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers

Feb 03, 2026
Via arXiv

From Knowing to Doing Precisely: A General Self-Correction and Termination Framework for VLA models

Feb 02, 2026
Via arXiv