Jiajun Chai

AWPO: Enhancing Tool-Use of Large Language Models through Explicit Integration of Reasoning Rewards

Dec 23, 2025, via arXiv

ToolForge: A Data Synthesis Pipeline for Multi-Hop Search without Real-World APIs

Dec 18, 2025, via arXiv

LocalSearchBench: Benchmarking Agentic Search in Real-World Local Life Services

Dec 08, 2025, via arXiv

From Experience to Strategy: Empowering LLM Agents with Trainable Graph Memory

Nov 11, 2025, via arXiv

Promoting Efficient Reasoning with Verifiable Stepwise Reward

Aug 14, 2025, via arXiv

SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning

Jun 24, 2025, via arXiv

DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy

Jun 11, 2025, via arXiv

A Hierarchical Deep Reinforcement Learning Framework for 6-DOF UCAV Air-to-Air Combat

Dec 05, 2022, via arXiv