chatbots


BankMathBench: A Benchmark for Numerical Reasoning in Banking Scenarios

Feb 19, 2026
Via arXiv

Helpful to a Fault: Measuring Illicit Assistance in Multi-Turn, Multilingual LLM Agents

Feb 19, 2026
Via arXiv

Key Considerations for Domain Expert Involvement in LLM Design and Evaluation: An Ethnographic Study

Feb 16, 2026
Via arXiv

"Not Human, Funnier": How Machine Identity Shapes Humor Perception in Online AI Stand-up Comedy

Feb 13, 2026
Via arXiv

RGAlign-Rec: Ranking-Guided Alignment for Latent Query Reasoning in Recommendation Systems

Feb 13, 2026
Via arXiv

SPILLage: Agentic Oversharing on the Web

Feb 13, 2026
Via arXiv

SCOPE: Selective Conformal Optimized Pairwise LLM Judging

Feb 13, 2026
Via arXiv

Differentiable Modal Logic for Multi-Agent Diagnosis, Orchestration and Communication

Feb 12, 2026
Via arXiv

Self-Regulated Reading with AI Support: An Eight-Week Study with Students

Feb 10, 2026
Via arXiv

Don't Shoot The Breeze: Topic Continuity Model Using Nonlinear Naive Bayes With Attention

Feb 10, 2026
Via arXiv