Dipendra Misra

Principled Fine-tuning of LLMs from User-Edits: A Medley of Preference, Supervision, and Reward

Jan 27, 2026

Is Behavior Cloning All You Need? Understanding Horizon in Imitation Learning

Jul 20, 2024

Aligning LLM Agents by Learning Latent Preference from User Edits

Apr 23, 2024

Dataset Reset Policy Optimization for RLHF

Apr 15, 2024

Provable Interactive Learning with Hindsight Instruction Feedback

Apr 14, 2024

Towards Principled Representation Learning from Videos for Reinforcement Learning

Mar 20, 2024

Policy Improvement using Language Feedback Models

Feb 25, 2024

The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction

Dec 21, 2023

LLF-Bench: Benchmark for Interactive Learning from Language Feedback

Dec 13, 2023

Learning to Generate Better Than Your LLM

Jun 20, 2023