Xuanjing Huang

Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses

Apr 28, 2026 (via arXiv)

Beyond Rating: A Comprehensive Evaluation and Benchmark for AI Reviews

Apr 22, 2026 (via arXiv)

EVPO: Explained Variance Policy Optimization for Adaptive Critic Utilization in LLM Post-Training

Apr 21, 2026 (via arXiv)

Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges

Apr 15, 2026 (via arXiv)

MM-Doc-R1: Training Agents for Long Document Visual Question Answering through Multi-turn Reinforcement Learning

Apr 15, 2026 (via arXiv)

Steering the Verifiability of Multimodal AI Hallucinations

Apr 08, 2026 (via arXiv)

Beyond Attention Magnitude: Leveraging Inter-layer Rank Consistency for Efficient Vision-Language-Action Models

Mar 26, 2026 (via arXiv)

FinToolSyn: A forward synthesis Framework for Financial Tool-Use Dialogue Data with Dynamic Tool Retrieval

Mar 25, 2026 (via arXiv)

JFTA-Bench: Evaluate LLM's Ability of Tracking and Analyzing Malfunctions Using Fault Trees

Mar 24, 2026 (via arXiv)

CCTU: A Benchmark for Tool Use under Complex Constraints

Mar 16, 2026 (via arXiv)