
Andrea Zanette

ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL

Feb 29, 2024

Is Offline Decision Making Possible with Only Few Samples? Reliable Decisions in Data-Starved Bandits via Trust Region Enhancement

Feb 24, 2024

Policy Finetuning in Reinforcement Learning via Design of Experiments using Offline Data

Jul 10, 2023

When is Realizability Sufficient for Off-Policy Reinforcement Learning?

Nov 10, 2022

Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning

Jun 01, 2022

Bellman Residual Orthogonalization for Offline Reinforcement Learning

Mar 24, 2022

Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning

Aug 19, 2021

Design of Experiments for Stochastic Contextual Linear Bandits

Jul 22, 2021

Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation

Mar 24, 2021

Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL

Dec 14, 2020