Arun Verma

Stochastic Multi-Armed Bandits with Limited Control Variates

Mar 02, 2026
Via arXiv

BarrierSteer: LLM Safety via Learning Barrier Steering

Feb 23, 2026
Via arXiv

Uncovering Scaling Laws for Large Language Models via Inverse Problems

Sep 09, 2025
Via arXiv

COBRA: Contextual Bandit Algorithm for Ensuring Truthful Strategic Agents

May 29, 2025
Via arXiv

ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment

May 25, 2025
Via arXiv

Active Human Feedback Collection via Neural Contextual Dueling Bandits

Apr 16, 2025
Via arXiv

TETRIS: Optimal Draft Token Selection for Batch Speculative Decoding

Feb 21, 2025
Via arXiv

Online Fair Division with Contextual Bandits

Aug 23, 2024
Via arXiv

Neural Dueling Bandits

Jul 24, 2024
Via arXiv

Understanding the Relationship between Prompts and Response Uncertainty in Large Language Models

Jul 20, 2024
Via arXiv