Amanda Dsouza

Shammie

RIFT: A RubrIc Failure Mode Taxonomy and Automated Diagnostics

Apr 01, 2026
Via arXiv

Benchmarking Agents in Insurance Underwriting Environments

Jan 31, 2026
Via arXiv

Evaluating Language Model Context Windows: A "Working Memory" Test and Inference-time Correction

Jul 04, 2024
Via arXiv

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Jun 10, 2022
Via arXiv