Aldo Pacchiano

Second Order Bounds for Contextual Bandits with Function Approximation

Sep 24, 2024

Learning Rate-Free Reinforcement Learning: A Case for Model Selection with Non-Stationary Objectives

Aug 07, 2024

Provable Interactive Learning with Hindsight Instruction Feedback

Apr 14, 2024

Multiple-policy Evaluation via Density Estimation

Mar 29, 2024

Provably Sample Efficient RLHF via Active Preference Optimization

Feb 16, 2024

A Framework for Partially Observed Reward-States in RLHF

Feb 05, 2024

Contextual Bandits with Stage-wise Constraints

Jan 15, 2024

Experiment Planning with Function Approximation

Jan 10, 2024

Unbiased Decisions Reduce Regret: Adversarial Domain Adaptation for the Bank Loan Problem

Aug 15, 2023

Anytime Model Selection in Linear Bandits

Jul 24, 2023