Alec Koppel

SAIL: Self-Improving Efficient Online Alignment of Large Language Models

Jun 21, 2024
Via arXiv

Compressed Online Learning of Conditional Mean Embedding

May 13, 2024
Via arXiv

Global Optimality without Mixing Time Oracles in Average-reward RL via Multi-level Actor-Critic

Mar 18, 2024
Via arXiv

Independent RL for Cooperative-Competitive Agents: A Mean-Field Perspective

Mar 17, 2024
Via arXiv

Beyond Joint Demonstrations: Personalized Expert Guidance for Efficient Multi-Agent Reinforcement Learning

Mar 13, 2024
Via arXiv

MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences

Feb 14, 2024
Via arXiv

Near-Optimal Fair Resource Allocation for Strategic Agents without Money: A Data-Driven Approach

Nov 18, 2023
Via arXiv

Byzantine-Resilient Decentralized Multi-Armed Bandits

Oct 11, 2023
Via arXiv

Aligning Agent Policy with Externalities: Reward Design via Bilevel RL

Aug 03, 2023
Via arXiv

Limited-Memory Greedy Quasi-Newton Method with Non-asymptotic Superlinear Convergence Rate

Jun 27, 2023
Via arXiv