Xiangru Tang

Agentic Reasoning for Large Language Models

Jan 18, 2026 (via arXiv)

Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows

Dec 18, 2025 (via arXiv)

Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving

Jul 08, 2025 (via arXiv)

SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks

Jul 01, 2025 (via arXiv)

Scaling Test-time Compute for LLM Agents

Jun 15, 2025 (via arXiv)

Med-PRM: Medical Reasoning Models with Stepwise, Guideline-verified Process Rewards

Jun 13, 2025 (via arXiv)

MedAgentGym: Training LLM Agents for Code-Based Medical Reasoning at Scale

Jun 04, 2025 (via arXiv)

Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations

May 27, 2025 (via arXiv)

ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

May 26, 2025 (via arXiv)

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation

May 21, 2025 (via arXiv)