Sayak Ray Chowdhury

Why DPO is a Misspecified Estimator and How to Fix It
Oct 23, 2025

KITE: Kernelized and Information Theoretic Exemplars for In-Context Learning
Sep 19, 2025

DP-NCB: Privacy Preserving Fair Bandits
Aug 05, 2025

Right Now, Wrong Then: Non-Stationary Direct Preference Optimization under Preference Drift
Jul 26, 2024

Provably Robust DPO: Aligning Language Models with Noisy Feedback
Mar 01, 2024

Provably Sample Efficient RLHF via Active Preference Optimization
Feb 16, 2024

GAR-meets-RAG Paradigm for Zero-Shot Information Retrieval
Oct 31, 2023

Differentially Private Reward Estimation with Preference Feedback
Oct 30, 2023

Differentially Private Episodic Reinforcement Learning with Heavy-tailed Rewards
Jun 05, 2023

On Differentially Private Federated Linear Contextual Bandits
Feb 27, 2023