Sambit Sahu

Ask, Don't Judge: Binary Questions for Interpretable LLM Evaluation and Self-Improvement

Jun 25, 2026
Via arXiv

SAFARI: Scaling Long Horizon Agentic Fault Attribution via Active Investigation

Jun 23, 2026
Via arXiv

Towards Scalable Customization and Deployment of Multi-Agent Systems for Enterprise Applications

Jun 16, 2026
Via arXiv

A History-Aware Visually Grounded Critic for Computer Use Agents

Jun 09, 2026
Via arXiv

T1-Bench: Benchmarking Multi-Scenario Agents in Real-World Domains

Jun 09, 2026
Via arXiv

MemGym: a Long-Horizon Memory Environment for LLM Agents

May 20, 2026
Via arXiv

AVSD: Adaptive-View Self-Distillation by Balancing Consensus and Teacher-Specific Privileged Signals

May 20, 2026
Via arXiv

CoT-Guard: Small Models for Strong Monitoring

May 12, 2026
Via arXiv

Your Model Diversity, Not Method, Determines Reasoning Strategy

Apr 12, 2026
Via arXiv

Decomposing the Delta: What Do Models Actually Learn from Preference Pairs?

Apr 09, 2026
Via arXiv