Paula Rodriguez

PRBench: Large-Scale Expert Rubrics for Evaluating High-Stakes Professional Reasoning

Nov 14, 2025
Via arXiv

Reliable Weak-to-Strong Monitoring of LLM Agents

Aug 26, 2025
Via arXiv