Radha Poovendran

CyberChainBench: Can AI Agents Secure Smart Contracts Against Real-World On-Chain Vulnerabilities?

Jun 24, 2026 (via arXiv)

AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks?

Jun 03, 2026 (via arXiv)

Agents' Last Exam

Jun 03, 2026 (via arXiv)

The Strongest Teacher Is Not Always the Best Teacher: Student-Centric Answer Selection

May 26, 2026 (via arXiv)

JobBench: Aligning Agent Work With Human Will

May 25, 2026 (via arXiv)

The WidthWall: A Strict Expressivity Hierarchy for Hypergraph Neural Networks

May 13, 2026 (via arXiv)

Polyhedral Instability Governs Regret in Online Learning

May 13, 2026 (via arXiv)

Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty?

May 12, 2026 (via arXiv)

VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL

May 29, 2025 (via arXiv)

SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge

May 27, 2025 (via arXiv)