Wentao Zhang

M2A: Multimodal Memory Agent with Dual-Layer Hybrid Memory for Long-Term Personalized Interactions

Feb 07, 2026
Via arXiv

AD-MIR: Bridging the Gap from Perception to Persuasion in Advertising Video Understanding via Structured Reasoning

Feb 07, 2026
Via arXiv

Guided Verifier: Collaborative Multimodal Reasoning via Dynamic Process Supervision

Feb 04, 2026
Via arXiv

Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers

Feb 03, 2026
Via arXiv

From Knowing to Doing Precisely: A General Self-Correction and Termination Framework for VLA models

Feb 02, 2026
Via arXiv

Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks

Feb 02, 2026
Via arXiv

Exploring Information Seeking Agent Consolidation

Jan 31, 2026
Via arXiv

ProphetKV: User-Query-Driven Selective Recomputation for Efficient KV Cache Reuse in Retrieval-Augmented Generation

Jan 31, 2026
Via arXiv

DataCross: A Unified Benchmark and Agent Framework for Cross-Modal Heterogeneous Data Analysis

Jan 29, 2026
Via arXiv

MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods

Jan 29, 2026
Via arXiv