chatbots


Evaluation format, not model capability, drives triage failure in the assessment of consumer health AI

Mar 12, 2026
Via arXiv

Stop Listening to Me! How Multi-turn Conversations Can Degrade Diagnostic Reasoning

Mar 12, 2026
Via arXiv

Designing Service Systems from Textual Evidence

Mar 11, 2026
Via arXiv

End-to-End Chatbot Evaluation with Adaptive Reasoning and Uncertainty Filtering

Mar 11, 2026
Via arXiv

Naïve Exposure of Generative AI Capabilities Undermines Deepfake Detection

Mar 11, 2026
Via arXiv

Context Engineering: From Prompts to Corporate Multi-Agent Architecture

Mar 10, 2026
Via arXiv

Towards Provably Unbiased LLM Judges via Bias-Bounded Evaluation

Mar 05, 2026
Via arXiv

Survive at All Costs: Exploring LLM's Risky Behaviors under Survival Pressure

Mar 05, 2026
Via arXiv

Understanding Parents' Desires in Moderating Children's Interactions with GenAI Chatbots through LLM-Generated Probes

Mar 04, 2026
Via arXiv

Stan: An LLM-based thermodynamics course assistant

Mar 04, 2026
Via arXiv