Xiangru Tang

SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks

Jul 01, 2025
Via arXiv

Scaling Test-time Compute for LLM Agents

Jun 15, 2025
Via arXiv

Med-PRM: Medical Reasoning Models with Stepwise, Guideline-verified Process Rewards

Jun 13, 2025
Via arXiv

MedAgentGym: Training LLM Agents for Code-Based Medical Reasoning at Scale

Jun 04, 2025
Via arXiv

Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations

May 27, 2025
Via arXiv

ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

May 26, 2025
Via arXiv

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation

May 21, 2025
Via arXiv

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Mar 31, 2025
Via arXiv

LocAgent: Graph-Guided LLM Agents for Code Localization

Mar 12, 2025
Via arXiv

MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning

Mar 10, 2025
Via arXiv