Sambit Sahu

T1: A Tool-Oriented Conversational Dataset for Multi-Turn Agentic Planning

May 22, 2025
Via arXiv

Critique-Guided Distillation: Improving Supervised Fine-tuning via Better Distillation

May 16, 2025
Via arXiv

Continual Pre-training of MoEs: How robust is your router?

Mar 06, 2025
Via arXiv

RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization

Oct 05, 2024
Via arXiv

Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey

Sep 17, 2024
Via arXiv