Ameya Prabhu

Michael Pokorny

FutureSim: Replaying World Events to Evaluate Adaptive Agents

May 14, 2026

LongCoT: Benchmarking Long-Horizon Chain-of-Thought Reasoning

Apr 15, 2026

Round-Trip Translation Reveals What Frontier Multilingual Benchmarks Miss

Apr 14, 2026

Personalizing Text-to-Image Generation to Individual Taste

Apr 08, 2026

PostTrainBench: Can LLM Agents Automate LLM Post-Training?

Mar 10, 2026

Modular Memory is the Key to Continual Learning Agents

Mar 02, 2026

Intrinsic Credit Assignment for Long Horizon Interaction

Feb 12, 2026

Scaling Open-Ended Reasoning to Predict the Future

Dec 31, 2025

Un-Attributability: Computing Novelty From Retrieval & Semantic Similarity

Oct 31, 2025

Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols

Oct 10, 2025