Archit Sharma

A Critical Evaluation of AI Feedback for Aligning Large Language Models

Feb 19, 2024
Archit Sharma, Sedrick Keh, Eric Mitchell, Chelsea Finn, Kushal Arora, Thomas Kollar

RLVF: Learning from Verbal Feedback without Overgeneralization

Feb 16, 2024
Moritz Stephan, Alexander Khazatsky, Eric Mitchell, Annie S Chen, Sheryl Hsu, Archit Sharma, Chelsea Finn

SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning

Feb 01, 2024
Jianlan Luo, Zheyuan Hu, Charles Xu, You Liang Tan, Jacob Berg, Archit Sharma, Stefan Schaal, Chelsea Finn, Abhishek Gupta, Sergey Levine

Adapt On-the-Go: Behavior Modulation for Single-Life Robot Deployment

Nov 02, 2023
Annie S. Chen, Govind Chada, Laura Smith, Archit Sharma, Zipeng Fu, Sergey Levine, Chelsea Finn

Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning

Oct 23, 2023
Jingyun Yang, Max Sobol Mark, Brandon Vu, Archit Sharma, Jeannette Bohg, Chelsea Finn

An Emulator for Fine-Tuning Large Language Models using Small Language Models

Oct 19, 2023
Eric Mitchell, Rafael Rafailov, Archit Sharma, Chelsea Finn, Christopher D. Manning

Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias

Oct 12, 2023
Max Sobol Mark, Archit Sharma, Fahim Tajwar, Rafael Rafailov, Sergey Levine, Chelsea Finn

Waypoint-Based Imitation Learning for Robotic Manipulation

Jul 26, 2023
Lucy Xiaoyang Shi, Archit Sharma, Tony Z. Zhao, Chelsea Finn

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

May 29, 2023
Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn

Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback

May 24, 2023
Katherine Tian, Eric Mitchell, Allan Zhou, Archit Sharma, Rafael Rafailov, Huaxiu Yao, Chelsea Finn, Christopher D. Manning
