Simon S. Du

Frank

Anytime Acceleration of Gradient Descent

Nov 26, 2024

Learning to Cooperate with Humans using Generative Agents

Nov 21, 2024

The Crucial Role of Samplers in Online Direct Preference Optimization

Sep 29, 2024

Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques

Sep 04, 2024

Understanding the Gains from Repeated Self-Distillation

Jul 05, 2024

Toward Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixture Models

Jun 29, 2024

Rethinking Transformers in Solving POMDPs

May 30, 2024

Horizon-Free Regret for Linear Markov Decision Processes

Mar 15, 2024

Transferable Reinforcement Learning via Generalized Occupancy Models

Mar 10, 2024

Reflect-RL: Two-Player Online RL Fine-Tuning for LMs

Feb 20, 2024