Sayak Ray Chowdhury

Provably Robust DPO: Aligning Language Models with Noisy Feedback

Mar 01, 2024

Provably Sample Efficient RLHF via Active Preference Optimization

Feb 16, 2024

GAR-meets-RAG Paradigm for Zero-Shot Information Retrieval

Oct 31, 2023

Differentially Private Reward Estimation with Preference Feedback

Oct 30, 2023

Differentially Private Episodic Reinforcement Learning with Heavy-tailed Rewards

Jun 05, 2023

On Differentially Private Federated Linear Contextual Bandits

Feb 27, 2023

Exploration in Linear Bandits with Rich Action Sets and its Implications for Inference

Jul 23, 2022

Model Selection in Reinforcement Learning with General Function Approximations

Jul 06, 2022

Distributed Differential Privacy in Multi-Armed Bandits

Jun 12, 2022

Shuffle Private Linear Contextual Bandits

Feb 11, 2022