Fazl Barez

The Capability Frontier: Benchmarks Miss 82% of Model Performance

Jun 25, 2026
Via arXiv

From Democracies to Autocracies: How AI Systems Enable Authoritarianism by Design

Jun 15, 2026
Via arXiv

Running the Gauntlet: Re-evaluating the Capabilities of Agents Beyond Familiar Environments

Jun 12, 2026
Via arXiv

Position: Don't Just "Fix it in Post": A Science of AI Must Study Training Dynamics

Jun 03, 2026
Via arXiv

Same Answer, Different Representations: Hidden instability in VLMs

Feb 06, 2026
Via arXiv

Chain-of-Thought Hijacking

Oct 30, 2025
Via arXiv

Rethinking Safety in LLM Fine-tuning: An Optimization Perspective

Aug 17, 2025
Via arXiv

Establishing Best Practices for Building Rigorous Agentic Benchmarks

Jul 03, 2025
Via arXiv

The Singapore Consensus on Global AI Safety Research Priorities

Jun 25, 2025
Via arXiv

Beyond Linear Steering: Unified Multi-Attribute Control for Language Models

May 30, 2025
Via arXiv