Ze Xu

Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents

Feb 15, 2026

P-GenRM: Personalized Generative Reward Model with Test-time User-based Scaling

Feb 12, 2026

Unmasking Reasoning Processes: A Process-aware Benchmark for Evaluating Structural Mathematical Reasoning in LLMs

Jan 31, 2026

EcomBench: Towards Holistic Evaluation of Foundation Agents in E-commerce

Dec 11, 2025