Xuanjing Huang

Can RL Improve Generalization of LLM Agents? An Empirical Study

Mar 12, 2026
Via arXiv

LifeSim: Long-Horizon User Life Simulator for Personalized Assistant Evaluation

Mar 12, 2026
Via arXiv

MagicAgent: Towards Generalized Agent Planning

Feb 22, 2026
Via arXiv

SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents

Feb 13, 2026
Via arXiv

Advancing Block Diffusion Language Models for Test-Time Scaling

Feb 11, 2026
Via arXiv

Mirror: A Multi-Agent System for AI-Assisted Ethics Review

Feb 09, 2026
Via arXiv

Emergent Structured Representations Support Flexible In-Context Inference in Large Language Models

Feb 08, 2026
Via arXiv

DFPO: Scaling Value Modeling via Distributional Flow towards Robust and Generalizable LLM Post-Training

Feb 05, 2026
Via arXiv

OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions

Feb 05, 2026
Via arXiv

Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models

Feb 04, 2026
Via arXiv