Jiajun Chai

CDRRM: Contrast-Driven Rubric Generation for Reliable and Interpretable Reward Modeling

Mar 09, 2026
Via arXiv

SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs without Training

Mar 03, 2026
Via arXiv

Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards

Feb 09, 2026
Via arXiv

Your Group-Relative Advantage Is Biased

Jan 13, 2026
Via arXiv

AWPO: Enhancing Tool-Use of Large Language Models through Explicit Integration of Reasoning Rewards

Dec 23, 2025
Via arXiv

ToolForge: A Data Synthesis Pipeline for Multi-Hop Search without Real-World APIs

Dec 18, 2025
Via arXiv

LocalSearchBench: Benchmarking Agentic Search in Real-World Local Life Services

Dec 08, 2025
Via arXiv

From Experience to Strategy: Empowering LLM Agents with Trainable Graph Memory

Nov 11, 2025
Via arXiv

Promoting Efficient Reasoning with Verifiable Stepwise Reward

Aug 14, 2025
Via arXiv

SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning

Jun 24, 2025
Via arXiv