Picture for Ze Xu

Ze Xu

Unmasking Reasoning Processes: A Process-aware Benchmark for Evaluating Structural Mathematical Reasoning in LLMs

Add code
Jan 31, 2026
Viaarxiv icon

EcomBench: Towards Holistic Evaluation of Foundation Agents in E-commerce

Add code
Dec 11, 2025
Figure 1 for EcomBench: Towards Holistic Evaluation of Foundation Agents in E-commerce
Figure 2 for EcomBench: Towards Holistic Evaluation of Foundation Agents in E-commerce
Figure 3 for EcomBench: Towards Holistic Evaluation of Foundation Agents in E-commerce
Figure 4 for EcomBench: Towards Holistic Evaluation of Foundation Agents in E-commerce
Viaarxiv icon