Rujun Han

RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards

May 11, 2026
Via arXiv

SkillOS: Learning Skill Curation for Self-Evolving Agents

May 07, 2026
Via arXiv

SAGE: Steerable Agentic Data Generation for Deep Search with Execution Feedback

Jan 26, 2026
Via arXiv

Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning

Oct 29, 2025
Via arXiv

In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents

Mar 11, 2025
Via arXiv

Reverse Thinking Makes LLMs Stronger Reasoners

Nov 29, 2024
Via arXiv

Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling

Oct 15, 2024
Via arXiv

Dancing in Chains: Reconciling Instruction Following and Faithfulness in Language Models

Jul 31, 2024
Via arXiv

RAG-QA Arena: Evaluating Domain Robustness for Long-form Retrieval Augmented Question Answering

Jul 19, 2024
Via arXiv

ACCENT: An Automatic Event Commonsense Evaluation Metric for Open-Domain Dialogue Systems

May 12, 2023
Via arXiv