Pranav Mahajan

Soft $Q(λ)$: A multi-step off-policy method for entropy regularised reinforcement learning using eligibility traces

Apr 15, 2026, via arXiv

Mind the Gap: How Elicitation Protocols Shape the Stated-Revealed Preference Gap in Language Models

Jan 29, 2026, via arXiv

Neural Associative Skill Memories for safer robotics and modelling human sensorimotor repertoires

May 14, 2025, via arXiv