Yali Du

NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning

May 21, 2025
Via arXiv

Automatic Dataset Generation for Knowledge Intensive Question Answering Tasks

May 20, 2025
Via arXiv

Post-Incorporating Code Structural Knowledge into LLMs via In-Context Learning for Code Translation

Mar 28, 2025
Via arXiv

SocialJax: An Evaluation Suite for Multi-agent Reinforcement Learning in Sequential Social Dilemmas

Mar 18, 2025
Via arXiv

GRU: Mitigating the Trade-off between Unlearning and Retention for Large Language Models

Mar 12, 2025
Via arXiv

M3HF: Multi-agent Reinforcement Learning from Multi-phase Human Feedback of Mixed Quality

Mar 06, 2025
Via arXiv

ATLaS: Agent Tuning via Learning Critical Steps

Mar 04, 2025
Via arXiv

$\text{M}^3\text{HF}$: Multi-agent Reinforcement Learning from Multi-phase Human Feedback of Mixed Quality

Mar 03, 2025
Via arXiv

CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation

Feb 28, 2025
Via arXiv

Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones?

Feb 26, 2025
Via arXiv