Yali Du

M3HF: Multi-agent Reinforcement Learning from Multi-phase Human Feedback of Mixed Quality

Mar 06, 2025

ATLaS: Agent Tuning via Learning Critical Steps

Mar 04, 2025

CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation

Feb 28, 2025

Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones?

Feb 26, 2025

VLP: Vision-Language Preference Learning for Embodied Manipulation

Feb 17, 2025

RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors

Dec 14, 2024

RuAG: Learned-rule-augmented Generation for Large Language Models

Nov 04, 2024

A Joint Learning Model with Variational Interaction for Multilingual Program Translation

Aug 25, 2024

Explaining an Agent's Future Beliefs through Temporally Decomposing Future Reward Estimators

Aug 15, 2024