Tong Xu

VehicleMemBench: An Executable Benchmark for Multi-User Long-Term Memory in In-Vehicle Agents

Mar 25, 2026
Via arXiv

PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments

Mar 24, 2026
Via arXiv

How Foundational Skills Influence VLM-based Embodied Agents: A Native Perspective

Feb 24, 2026
Via arXiv

PRIME: A Process-Outcome Alignment Benchmark for Verifiable Reasoning in Mathematics and Engineering

Feb 12, 2026
Via arXiv

Mock Worlds, Real Skills: Building Small Agentic Language Models with Synthetic Tasks, Simulated Environments, and Rubric-Based Rewards

Jan 30, 2026
Via arXiv

Token-level Collaborative Alignment for LLM-based Generative Recommendation

Jan 26, 2026
Via arXiv

From Tags to Trees: Structuring Fine-Grained Knowledge for Controllable Data Selection in LLM Instruction Tuning

Jan 20, 2026
Via arXiv

VIGIL: Defending LLM Agents Against Tool Stream Injection via Verify-Before-Commit

Jan 09, 2026
Via arXiv

DynaDebate: Breaking Homogeneity in Multi-Agent Debate with Dynamic Path Generation

Jan 09, 2026
Via arXiv

Look As You Think: Unifying Reasoning and Visual Evidence Attribution for Verifiable Document RAG via Reinforcement Learning

Nov 15, 2025
Via arXiv