Chris Russell

LLMs Encode Their Failures: Predicting Success from Pre-Generation Activations

Feb 10, 2026
Via arXiv

Agent Benchmarks Fail Public Sector Requirements

Jan 28, 2026
Via arXiv

Evaluating the Ability of Explanations to Disambiguate Models in a Rashomon Set

Jan 13, 2026
Via arXiv

It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

Dec 29, 2025
Via arXiv

OxEnsemble: Fair Ensembles for Low-Data Classification

Dec 10, 2025
Via arXiv

CAST: Compositional Analysis via Spectral Tracking for Understanding Transformer Layer Functions

Oct 16, 2025
Via arXiv

LLMs Don't Know Their Own Decision Boundaries: The Unreliability of Self-Generated Counterfactual Explanations

Sep 11, 2025
Via arXiv

Evaluating Model Explanations without Ground Truth

May 15, 2025
Via arXiv

Deepfakes on Demand: the rise of accessible non-consensual deepfake image generators

May 06, 2025
Via arXiv

The Fourth Monocular Depth Estimation Challenge

Apr 24, 2025
Via arXiv