Subhojyoti Mukherjee

Learning to Clarify by Reinforcement Learning Through Reward-Weighted Fine-Tuning

Jun 08, 2025
Via arXiv

A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations

May 20, 2025
Via arXiv

From Selection to Generation: A Survey of LLM-based Active Learning

Feb 17, 2025
Via arXiv

Multi-Objective Alignment of Large Language Models Through Hypervolume Maximization

Dec 06, 2024
Via arXiv

Off-Policy Evaluation from Logged Human Feedback

Jun 14, 2024
Via arXiv

Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning

Jun 07, 2024
Via arXiv

SaVeR: Optimal Data Collection Strategy for Safe Policy Evaluation in Tabular MDP

Jun 04, 2024
Via arXiv

Optimal Design for Human Feedback

Apr 22, 2024
Via arXiv

Experimental Design for Active Transductive Inference in Large Language Models

Apr 12, 2024
Via arXiv

Multi-task Representation Learning for Pure Exploration in Bilinear Bandits

Nov 01, 2023
Via arXiv