Charese H. Smiley

How Reliable are Confidence Estimators for Large Reasoning Models? A Systematic Benchmark on High-Stakes Domains

Jan 13, 2026
Via arXiv

Calibrating LLM Confidence by Probing Perturbed Representation Stability

May 27, 2025
Via arXiv

FinNLI: Novel Dataset for Multi-Genre Financial Natural Language Inference Benchmarking

Apr 22, 2025
Via arXiv